A model for the genetic architecture of quantitative traits under

Amodelforthegeneticarchitectureofquantitativetraitsunder
stabilizingselection
YuvalB.Simonsa,1,KevinBullaugheyb,RichardR.Hudsonb,andGuySellaa,1
aDepartmentofBiologicalSciences,ColumbiaUniversity,NewYork,NY10027;
bDepartmentofEcology&Evolution,UniversityofChicago,Chicago,Illinois60637;
1Towhomcorrespondenceshouldbeaddressed:[email protected],
[email protected]
Abstract
Genome-wideassociationstudies(GWAS)inhumansarerevealingthegeneticarchitecture
ofbiomedical,lifehistoryandanthropomorphictraits,i.e.,thefrequenciesandeffectsizes
of variants contributing to heritable variation in a trait. To interpret these findings, we
need to understand how genetic architecture is shaped by basic population genetics
processes—notably, by mutation, natural selection and genetic drift. Because many
quantitativetraitsaresubjecttostabilizingselectionandgeneticvariationthataffectsone
traitoftenaffectsmanyothers,wemodelthegeneticarchitectureofafocaltraitthatarises
under stabilizing selection in a multi-dimensional trait space. We solve the model for the
phenotypicdistributionandallelicdynamicsatsteadystateandderiverobust,closedform
solutionsforsummariesofgeneticarchitecture.Ourresultssuggestthatthedistributionof
genetic variance among the loci discovered in GWAS take a simple form that depends on
one evolutionary parameter, and provide a simple interpretation for missing heritability
and why it varies among traits. We test our predictions against the results of GWAS for
heightandbodymassindex(BMI)andfindthattheyfitthedatawell,allowingustomake
inferencesaboutthedegreeofpleiotropyandthemutationaltargetsize.Ourfindingshelp
tounderstandwhyGWASforheightexplainmoreoftheheritablevariancethansimilarlysizedGWASforBMI,andtopredicthowfutureincreasesinsamplesizewilltranslateinto
explainedheritability.
1
Much of the phenotypic variation in human populations, including variation in
morphological, life history and biomedical traits, is “quantitative”, meaning that heritable
variation in the trait is largely due to small contributions from many genetic variants
segregatinginthepopulation(1,2).Quantitativetraitshavebeenstudiedsincethebirthof
biometrics over a century ago (1-3), but only in the past decades have technological
advancesmadeitpossibletosystematicallydissecttheirgeneticbasis(4-6).Notably,since
2007,genome-wideassociationstudies(GWAS)inhumanshaveledtotheidentificationof
thousands of variants reproducibly associated with hundreds of quantitative traits,
includingsusceptibilitytoawidevarietyofdiseases(4).Whilestillongoing,thesestudies
alreadyprovideimportantinsightsintothegeneticarchitectureofquantitativetraits,i.e.,
the number of variants that contribute to heritable variation and their frequencies and
effectsizes.
Perhaps the most striking observation to emerge from these studies is that, despite the
large sample size of many GWAS, all variants significantly associated with any given trait
typicallyaccountforless(oftenmuchless)than25%ofthenarrowsenseheritability(4,7,
8) (but see (9)). (Henceforth, we use “heritability” to refer to narrow sense heritability).
Whilemanyfactorshavebeenhypothesizedtocontributetothe“missingheritability”(7,8,
10-14),themoststraightforwardexplanationandtheemergingconsensusisthatmuchof
the heritable variation derives from variants with frequencies that are too low or effect
sizesthataretoosmallforcurrentstudiestodetect.Comparisonsamongtraitsalsosuggest
that there are substantial differences in architectures. For example, recent meta-analyses
GWASuncoveredseventimesasmanyvariantsforheight(697)thanforbodymassindex
(97), and together the variants for height account for more than four times the heritable
varianceforbodymassindex(~20%vs.~3-5%,respectively),despitethesmallersample
sizeoftheGWAS(250kcomparedto340k,respectively)(15,16).
These first glimpses underscore the need for theory that explains how genetic
architecturesareshapedinthecourseofevolutionandwhytheymightdifferamongtraits.
Such theory should allow us to use GWAS findings to make inferences about underlying
evolutionary parameters, helping to answer enduring questions about the processes that
maintainphenotypicvariationinquantitativetraits(5,17).Fromapracticalperspective,it
2
willhelptointerpretGWASfindingsandspecificallyitmayinformsolutionstothe“missing
heritability” problem (18-21). In so doing, it will inform the design of future mapping
studiesandphenotypicpredictions(18,20-22).
Development of such theory can be guided by empirical observations and first-principles
considerations. First, we know that the architecture of a trait arises from genetic and
populationgeneticprocesses.Newmutationsaffectingatraitariseataratethatdepends
onits“mutationaltargetsize”(i.e.,thenumberofsitesatwhichamutationwouldaffectthe
trait).Oncetheyarise,thetrajectoriesofvariantsthroughthepopulationaredetermined
bytheinterplaybetweengeneticdrift,demographicprocesses,andnaturalselectionacting
on them. These processes determine the number and frequencies of segregating variants
underlying variation in the trait. The genetic architecture further depends on the
relationship between the selection on variants and their effects on the trait. Notably,
selection on variants depends not only on their effect on the focal trait but also on their
pleiotropic effects on other traits. We would therefore expect both direct and pleiotropic
selectiontoshapethejointdistributionofallelefrequenciesandeffectsizes.
Multiple lines of evidence suggest that many quantitative traits are subject to stabilizing
selection, i.e., selection favoring an intermediate trait value (5, 23-27). For instance, a
declineinfitnesscomponents(e.g.,viabilityandfecundity)isobservedwithdisplacement
frommeanvaluesforavarietyoftraitsinhumanpopulations(28-30),inotherspeciesin
the wild (31, 32) and in experimental manipulations (31, 33). While less is known about
theselectionactingoncomplexdiseases,somelikelyreflectlargedeviationsfromoptimal
valuesofunderlyingtraitsunderstabilizingselection.Whatremainsunclearistheextent
towhichstabilizingselectionisactingdirectlyonvariationinagiventraitorresultsfrom
pleiotropiceffectsofthisvariationonothertraits.
Yet other lines of evidence suggest that pleiotropy is pervasive. For one, theoretical
considerations about the variance in fitness in natural populations and its accompanying
geneticloadsuggestthatonlyamoderatenumberofindependenttraitscanbeeffectively
selected on at the same time (34). Thus, the aforementioned relationships between the
value of a focal trait and fitness are likely heavily affected by the pleiotropic effects of
genetic variation on other traits (26, 34-36). Second, many of the variants detected in
3
human GWAS have been found to be associated with more than one trait (37-41). For
example,arecentanalysisofGWASrevealedthatvariantsthatdelaytheageofmenarchein
women tend to delay the age of voice drop in men, decrease body mass index, increase
adultheight,anddecreaseriskofmalepatternbaldness(37).Moregenerally,theextentof
pleiotropy revealed by GWAS appears to be increasing rapidly with improvements in the
powerandmethodology(37,42-44).Suchconsiderationspointtothepotentialimportance
ofpleiotropicselectiononquantitativegeneticvariationformanytraits.
Moreover, the discoveries emerging from human GWAS suggest that genetic variance is
predominated by additive contributions from numerous variants with small effect sizes.
Specifically, most or all of the heritability explained in GWAS of many traits derives from
variants with squared effect sizes that are substantially smaller than the total genetic
variance(e.g.(15,16,45,46)).Moreover,statisticalquantificationsofthegeneticvariance
tagged by genotyping suggest that such variants may account for the bulk of heritable
variance in many traits (9, 47, 48). Other analyses of GWAS results suggest that nonadditive interactions, e.g., dominance and epistasis, have only minor contributions to
heritable variance, consistent with theory and other lines of evidence (49-55). Notably,
considerable efforts to detect epistatic interactions in human GWAS have, by and large,
cameupempty-handed(9,56,57),withfewcounter-examplesmostlyinvolvingvariantsin
theMHCregion(52,56,58,59).Motivatedbytheseconsiderations,wemodelhowdirect
and pleiotropic stabilizing selection shape the genetic architecture of continuous,
quantitative traits by considering additive variants of small effects and assuming that
togethertheyaccountformostoftheheritablevariance.
There has been relatively little theoretical work aimed at relating population genetics
processeswiththeresultsemergingfromGWAS.Moreover,thefewexistingmodelshave
reached divergent predictions about genetic architecture, largely because of making
differentassumptionsabouttheeffectsofpleiotropy.Pritchard(20)consideredthe“purely
pleiotropic” extreme, in which selection on variants is independent of their effect on the
traitbeingconsidered(hefocusedondiseasesusceptibility).Inthiscase,wewouldexpect
the largest contribution to genetic variance in a trait to come from mutations that have
largeeffectsizesbutarealsoweaklyselectedorneutral,allowingthemtoascendtohigher
4
frequenciesthanothermutations.Otherstudieshaveconsideredtheoppositeextreme,in
which selection on variants stems entirely from their effect on the trait under
consideration (27, 60-64), and have shown that the greatest contribution to genetic
variancearisesfromstronglyselectedmutations(61,62)(wereturntothiscasebelow).
Inpractice,wewouldexpectmosttraitstofallsomewhereinbetweenthesetwoextremes.
Whiletherearecompellingreasonstobelievethatquantitativegeneticvariationishighly
pleiotropic,theeffectsofvariantsondifferenttraitsarelikelytobecorrelated.Ifso,evenif
agiventraitisnotsubjecttoselection,variantsthathavealargeeffectonitwillalsotend
tohavelargereffectsontraitsthatareunderselection(e.g.,bycausingalargeperturbation
to pathways that affect multiple traits) (36). Motivated by such considerations, EyreWalker (2010) and Caballero et al. (2015) considered models in which the correlation
between the strength of selection on an allele and its effect size can vary between the
purelypleiotropicanddirectselectionextremes(alsocf.65).Thesemodelsdivergeintheir
predictions about architecture, however. Assuming, as seems plausible, an intermediate
correlationbetweenthestrengthofselectionandeffectsize,Eyre-Walkerfindsthatgenetic
variance should be dominated by strongly selected mutations, whereas Caballero et al.
conclude the greatest contribution should arise from weakly selected ones (19). Their
conclusionsdifferbecauseofhowtheychosetomodeltherelationshipbetweenselection
and effect size, a choice based largely on mathematical convenience. We approach this
problem by explicitly modeling stabilizing selection on multiple traits, thereby learning,
ratherthanassuming,therelationshipbetweenselectionandeffectsizes.
TheModel
We model an individual’s phenotype as a vector in an n-dimensional Euclidian space, in
whicheachdimensioncorrespondstoanadditive,continuous,quantitativetrait.Wefocus
onthearchitectureofoneofthesetraits(say,the1stdimension),wherethetotalnumberof
traits parameterizes pleiotropy. Fitness is assumed to decline with distance from the
optimal phenotype positioned at the origin, thereby introducing stabilizing selection into
themodel.Specifically,weassumethatabsolutefitnesstakestheform
5
W 𝑟 = exp −
/0
12 0
,
(1)
where𝑟is the (n-dimensional) phenotype,𝑟 = 𝑟 is the distance from the origin and w
parameterizes the strength of stabilizing selection. However, we later show that the
specificformofthefitnessfunctiondoesn’tmatter.Moreover,theadditiveenvironmental
contribution to the phenotype can be absorbed into w (see (66) and SI Section 1.1); we
thereforeconsideronlythegeneticcontribution.
Wefurtherassumethestandardpopulationdynamicforadiploid,panmicticpopulationof
constantsizeN,includingWright-Fishersamplingwithviabilityselection(asdescribedby
Eq.(1)),mutation,freerecombinationandMendeliansegregation.Mutationsaffectingthe
phenotypeareintroducedaccordingtoaninfinitesitesmodel.Thenumberpergamete,per
generation follows a Poisson distribution with mean U, reflecting both the mutational
targetsizeandthemutationratepersite.Thesizeofmutationsinthen-dimensionaltrait
space, 𝑎1 = 𝑎
1
,isdrawnfromsomedistribution,assumingonlythat𝑎1 ≪ 𝑤 1 .Welater
show that this requirement is equivalent to the standard assumption about selection
coefficients satisfying𝑠 ≪ 1(also cf. Supplementary Section 4.3). The directions of
mutationsareassumedtobeuniformly(isotropically)distributed,althoughwelatershow
thatourresultsarerobusttorelaxingthisassumption.Anindividual’sphenotypecanthen
beobtainedaddingupthephenotypiceffectsofhisorheralleles,i.e.,
𝑟=
8
1
;
:<8 𝑔: 𝑎: ,
(2)
where L is the number of sites with fixed or segregating mutations,𝑎: is the phenotypic
differencebetweenhomozygotesfortheancestralandderivedallelesatsitel(becauseof
theinfinitesitesassumption,thereareatmosttwoalleles)and𝑔: = 0, 1or2isthenumber
ofderivedallelesatsitel.
Results
Thephenotypicdistribution.Inthefirstthreesections,wedevelopthetoolsthatwelater
use to study genetic architecture. We start by considering the equilibrium distribution of
phenotypesinthepopulationandgeneralizepreviousresultsforthecasewithasingletrait
6
(27, 60, 61, 64). Under biologically sensible conditions on the rate and size of mutations
(i.e., when 1 ≫ 𝑈 ≫ 1/2𝑁 and 𝑎1 ≪ 𝑤 1 ; SI Section 4), this distribution is well
approximated by a tight multivariate normal centered at the optimum, namely by the
probabilitydensity:
f 𝑟 =
8
F
1DE 0 0
exp −
/0
1E 0
(3)
wherethevarianceofthedistributionsatisfies𝜎 1 ≪ 𝑤 1 (seeSISection4).Intuitively,the
phenotypic distribution is normal because it derives from additive and (approximately)
i.i.d.contributionsfrommanysegregatingsites.Thedistributionstaystightlyconcentrated
around the optimum because stabilizing selection becomes stronger with increasing
displacement from it, and because the population responds to such selection by minor
changestoallelefrequenciesatmanysegregatingsites,rapidlyoffsettingthedisplacement.
With phenotypes close to the optimum, only the curvature of the fitness function at the
optimum (i.e., the multi-dimensional second derivative) affects the selection acting on
individuals.Inaddition,itisalwayspossibletochooseanorthonormalcoordinatesystem
centered at the optimum, in which the trait under consideration varies along the first
coordinateandaunitchangeinothertraits(inothercoordinates)neartheoptimumhave
the same effect on fitness. These considerations suggest that the equilibrium behavior is
insensitivetoourchoiceoffitnessfunction.Moreover,intheSI(Section5),weshowthat
perturbations of the population mean from the optimum are rapidly offset by minor
changestoallelefrequenciesatnumeroussites,andthatthisbehaviorlendsrobustnessto
theequilibriumdynamicswithrespecttothepresenceofmajorloci,changesintheoptimal
phenotypeovertime,andmoderateasymmetriesinthemutationaldistribution.
Allelic dynamic. Next, we consider the dynamics at a segregating site, and generalize
previousresultsforthecasewithasingletrait(62-64).Thedynamicscanbedescribedin
terms of the first two moments of change in allele frequency in a single generation (cf.
(67)). For an allele with phenotypic effect𝑎and frequency q, the distribution of
contributions to the phenotype from the genetic background,𝑅, follows from the
7
equilibrium phenotypic distribution (Eq. 3) and is well approximated by probability
density
8
f 𝑅 𝑎, 𝑞 =
1DE 0
exp −
F/0
KLMN
1E 0
0
.
(4)
By averaging the fitness of the three genotypes over the distribution of genetic
backgrounds,wefindthatthefirstmomentiswellapproximatedby
E Δ𝑞 ≈
N0
S2 0
𝑝𝑞 𝑞 −
8
1
,
(5)
assumingthat𝑎1 and𝜎 1 ≪ 𝑤 1 (cf.SISection4).Bythesametoken,wefindthat
V Δ𝑞 ≈
VM
1W
, (6)
whichisthestandardsecondmomentwithgeneticdrift.
The functional form of the first moment is equivalent to that of the standard viability
selectionmodelwithunder-dominance.Thisresultisahallmarkofstabilizingselectionon
(additive) quantitative traits: with the population mean at the optimum, the dynamics at
differentsitesaredecoupledandselectionatagivensitereducesthephenotypicvariance
(i.e., ½a2pq), thereby pushing rare alleles to loss. Comparison with the standard viability
selection model shows that the selection coefficient in our model is s=a2/4w2, or
S=2Ns=Na2/2w2 in scaled units. In other words, the selection acting on an allele is
proportionaltoitssize-squaredinthen-dimensionaltraitspace(wherewtranslateseffect
sizeintounitsoffitness).
Therelationshipbetweenselectionandeffectsize.Thestatisticalrelationshipbetween
thestrengthofselectionactingonmutationsandtheireffectonagiventraitfollowsfrom
theaforementionedgeometricinterpretationofselection.Specifically,allmutationswitha
given selection coefficient, s, lie on a(𝑛 − 1)– dimensional hypersphere with radius𝑎 =
2𝑤 𝑠,andanygivenmutationsatisfies
𝑠=
8
S2 0
𝑎1 =
8
S2 0
\
1
[<8 𝑎[ ,
(7)
8
where𝑎[ is the allele’s effect on the i-th trait (Fig. 1A). Our assumption that mutation is
isotropic then implies that the probability density of mutations on the hypersphere is
uniform.
(b)
Trait2
(n-1)-dimensional
hypersphere
!
a
(n-2)-dimensional
cross-section
Probability density
(a)
a1
Trait1
(focaltrait)
a
Tr
3
it
0.4
n=1
n=4
n=10
n→∞
0.3
0.2
0.1
Effect size Hin units of
0.0
-4
-2
0
2
a ën
2
L 4
Fig. 1. The distribution of effect sizes corresponding to a given selection coefficient. (a)
Mutations with selection coefficient, s, lie on a(𝑛 − 1)– dimensional hypersphere with
radius𝑎 = 2𝑤 𝑠;andtheprobabilityofsuchmutationswitheffectsizea1isproportional
to the volume of the(𝑛 − 2)– dimensional cross section of that hyper-sphere with
projection𝑎8 . (b) The distribution of effect sizes corresponding to a given selection
coefficient,measuredinunitsofthedistribution’sstandarddeviation.
The distribution of effect sizes on a focal trait, a1, corresponding to a given selection
coefficient,s,follows.Giventhatmutationissymmetricinanygiventrait,𝐸 𝑎8 𝑠 = 0,and
giventhatitissymmetricamongtraits,
E 𝑎81 𝑠 = 𝑎1 𝑛 = 2𝑤 1 𝑛 𝑠.
(8)
Moregenerally,theprobabilitydensitycorrespondingtoaneffectsizea1isproportionalto
thevolumeofthe(𝑛 − 2)–dimensionalcrosssectionofthathyper-spherewithprojection
a1(Fig.1A).Forasingletrait,thisimpliesthat𝑎8 = ±𝑎withprobability½,andfor𝑛 > 1,it
impliestheprobabilitydensity
φ\ 𝑎8 𝑎 =
b \ 1 /b (\c8) 1
\/1
8
1D N 0 \
1−
8
Nd0
\ N0 \
Fef
0
(9)
(cf. SI Section 1.2). Intriguingly, when the number of traits n increases, this density
approachesanormaldistribution,i.e.,
9
Nd
N 0 /\
~N(0,1).
(10)
Thislimitisalreadywellapproximatedforamoderatenumberoftraits(e.g.,n=10;Fig.1B).
The limit behavior also holds when we relax the assumption of isotropic mutation. This
generalizationisanimportantone,becausehavingchosenaparameterizationoftraitsin
whichthefitnessfunctionneartheoptimumisisotropic,wecannolongerassumethatthe
distributionofmutationsisalsoisotropic(68).Specifically,mutationsmighttendtohave
larger effects on some traits than on others and their effects on different traits might be
correlated.InSISection5,weshow,however,thatexcludingpathologicalcases,thelimit
distribution (Eq. 10) also holds for anisotropic mutation. To this end, we introduce the
conceptofaneffectivenumberoftraits,whichcantakeanyrealvalue≥1andisdefinedas
thenumberoftraits,withexpectedfitnesseffectequaltothatofthefocaltrait,requiredto
satisfy Eq. 8. In the limit of high pleiotropy, the selection strength on a mutation and the
magnitude of its effect on a focal trait are correlated, implying that the kind of “purely
pleiotropic” extreme postulated in previous work cannot arise under our model (19-21).
Mountingevidencethatgeneticvariationishighlypleiotropic(seeIntroduction)alongside
the robustness of our model raise the intriguing possibility that this limit form applies
quitegenerally.
Genetic architecture. We can now derive closed forms for summaries of genetic
architecture (cf. SI Section 2). For mutations with a given selection coefficient, the
frequency distribution follows from the diffusion approximation based on the first two
moments of change in allele frequency (Eq. 5 and 6; (67)), and the distribution of effect
sizesfollowsfromthegeometricconsiderationsoftheprevioussection.Conditionalonthe
selection coefficient, these distributions are independent and therefore the joint
distribution of frequency and effect size equals their product. Summaries of architecture
canbeexpressedasexpectationsoverthejointdistributionoffrequenciesandeffectsizes
for a given selection coefficient, and then weighted according to the distribution of
selection coefficients. Given that we know little about the distribution of selection
10
coefficientsofmutationsaffectingquantitativetraits,weexaminehowsummariesdepend
onthestrengthofselection.
Expected variance per site. We focus on the distribution of additive genetic variance
among sites, a central feature of architecture that is key to connecting our model with
GWASresults.Westartbyconsideringhowselectionaffectstheexpectedcontributionofa
site to additive genetic variance in a focal trait. We include monomorphic sites in the
expectation,suchthattheexpectedtotalvarianceisgivenbytheproductoftheexpectation
per-site and the population mutation rate (i.e., 2NU). Under the infinite sites assumption,
sitescanbemonomorphicorbi-allelicandtheirexpectedcontributionis
E d0𝑎81 𝑝𝑞 𝑆 = E 𝑎81 𝑆 E d0𝑝𝑞 𝑆 =
12 0
\W
𝑆E d0𝑝𝑞 𝑆 (11)
(expressed in terms of the scaled selection coefficient S). Thus, the degree of pleiotropy
only affects the expectation through a multiplicative constant. This effect of pleiotropy is
not identifiable from data, because even if we could measure genetic variance in units of
fitness(e.g.,asopposedtounitsofthetotalphenotypicvariance),westillwouldn’tbeable
todistinguishbetweenwandn.Weinsteadfocusontheproportionaleffectofselectionon
thecontributiontovariance,whichisinsensitivetothedegreeofpleiotropy.
Theproportionalcontributiontogeneticvarianceasafunctionoftheselectioncoefficient
wasdescribedbyKeighleyandHill(intheonedimensionalcase;(62))andisshowninFig.
2A.Whenselectionisstrong(S>>1),itseffectonallelefrequency(whichscaleswith1/S)is
canceled out by its relationship with the effect size, yielding a constant contribution to
geneticvariancepersite,vs=2w2/(nN),regardlessoftheselectioncoefficient(SISection3;
Fig. 2A). Henceforth, we measure genetic variance relative to vs. When selection is
effectively neutral (𝑆 ≪ 1) and thus too weak to affect the allele frequency, the expected
contribution of a site to genetic variance scales with the effect size and is ½S, which is
considerably lower than under strong selection (SI Section 3; Fig. 2A). In between these
selection regimes, allele frequency is affected by under-dominance, resulting in a more
complexdependencyontheselectioncoefficient(SISection3).Themaximalcontribution
togeneticvariancepersiteoccurswithinthisrange(at𝑆 ≈ 10)andisslightlyhigher(by
~30%)thanunderstrongselection(Fig.2A).Henceforth,werefertothisselectionregime
11
as intermediate, to distinguish it from the considerably weaker selection in the nearly
neutralregime.Theseresultssuggestthateffectivelyneutralsitesshouldcontributemuch
lesstogeneticvariancethanintermediateandstronglyselectedones(61,62).
withpleiotropy
(a)
Strong
1
0.75
0.5
0.25
0
Effectively
neutral
Scaled selection coefficient HSL
0.1
1
10
100
1000
100
(b)
80
60
Strongselection(S=100)
Intermediateselection(S=10)
Effectivelyneutral(S=0.5)
40
20
0
0
Threshold variance Hrelative to vs L
1
2
3
4
5
% variance from loci > v
1.25
Intermediate
% variance from loci > v
Expected variance
Hin units of vs L
withoutpleiotropy
1.5
100
80
(c)
6
4
2
60
0
5
10
15
40
20
0
0
Threshold variance Hrelative to vs L
1
2
3
4
5
Fig. 2. The distribution of additive genetic variance among sites. In (a), we plot the
expectedcontributionasafunctionofthescaledselectioncoefficient.In(b)&(c),weshow
theproportionofgeneticvariancestemmingfromsitesthatcontributemorethanthevalue
onthex-axis,forasingletrait(b)andinthepleiotropiclimit(c).
Distribution of variance among sites. Next, we consider how genetic variance is
distributed among sites with a given selection coefficient. We focus on the distribution
amongsegregatingsites(includingmonomorphiceffectswouldjustaddapointmassat0).
This distribution is especially relevant to interpreting the results of GWAS, because, to a
firstapproximation,astudywilldetectonlysiteswithcontributionstovarianceexceeding
a certain threshold (which decreases as the study size increases; see Discussion). We
therefore represent the distribution in terms of the proportion of genetic variance, G(v),
arisingfromsiteswhosecontributiontogeneticvarianceexceedsathresholdv.
Webeginwiththecaseofasingletrait(n=1),inwhichselectiononanalleledeterminesits
effect size (Fig 2B). When selection is strong (S>>1), the proportion of genetic variance
exceeding a threshold v is also insensitive to the selection coefficient and takes a simple
form,with
G 𝑣 = exp(−2𝑣) (12)
(SISection3).Incontrast,intheeffectivelyneutralrange(𝑆 ≪ 1),
G 𝑣 =
1 − 𝑣 𝑣lNm ,
(13)
12
8
wherethedependencyontheselectioncoefficiententersthrough𝑣lNm = 𝑆,whichisthe
n
maximalcontributiontovarianceforasitewithselectioncoefficientSthatisobtainedwith
an allele frequency of ½ (SI Section 3). In the intermediate selection regime,G 𝑣 is also
intermediate and takes a more elaborate functional form (SI Section 3). These results
suggest how genetic variance would be distributed among sites given any distribution of
selectioncoefficients(Fig2B):startingfromsitesthatcontributethemost,thedistribution
would at first be dominated by contributions from strongly selected sites, then the
intermediate selected sites would begin to contribute, whereas effectively neutral sites
8
wouldenteronlyfor𝑣 < 𝑆 ≪ 1.
n
The effect of pleiotropy is to cause sites with a given selection coefficient to have a
distributionofeffectsizesonthefocaltrait,therebyincreasingthecontributiontogenetic
varianceofsomesitesanddecreasingitforothers.InSISection3,weshowthatincreasing
the degree of pleiotropy, n, increases the proportion of genetic variance,𝐺 𝑣 , for any
threshold,v,regardlessofthedistributionofselectioncoefficients(Fig.S2).Whenvariation
inatraitissufficientlypleiotropicforthedistributionofeffectsizestoattainthelimitform
(Eq.10):
G 𝑣 = 1 + 2 𝑣 exp −2 𝑣 (14)
(15)
forstronglyselectedsitesand
G 𝑣 = exp −4 𝑣 𝑆 foreffectivelyneutralones(Fig.2C).Forintermediatelyselectedsites,𝐺 𝑣 issimilartothe
caseofstrongselection,withmeasurabledifferencesonlywhen𝑣 ≫ 𝑣r (seeinsetinFig.2C
andSISection3).WewouldthereforeexpectthatasthesamplesizeofGWASincreaseand
thethresholdcontributiontovariancedecreases,intermediateandstronglyselectedsites
willbediscoveredfirst,andeffectivelyneutralsiteswillbediscoveredmuchlater.IntheSI
(SISection3andFig.S13),wealsoderivecorollariesoftheseresultsforthedistributionof
numbersofsegregatingsitesthatmakeagivencontributiontogeneticvariance.
13
Discussion
Inhumans,GWASformanytraitsdisplayasimilarbehavior:whensamplesizesaresmall,
the studies discovered almost nothing, but once they exceeded a threshold sample size,
boththenumberofassociationsdiscoveredandtheheritabilityexplainedsuddenlybegan
to increase rapidly (4, 69). Intriguingly though, both the threshold study size and rate of
increase vary among traits. These observations raiseseveral questions, including: How is
the threshold study size determined? How should the number of associations and
explained heritability increase with study size once this threshold is exceeded? With an
orderofmagnitudeincreaseinstudysizesintothemillionsimminent,howmuchmoreof
the genetic variance in traits should we expect to explain? The theory that we developed
offerstentativeanswerstothesequestions.
To relate our theory to GWAS, we must first account for the power to detect loci that
contributetoquantitativegeneticvariation.Instudiesofcontinuoustraits,thepowercan
beapproximatedbyastepfunctioninthecontributionoflocitoadditivegeneticvariance
(ignoringtheeffectsofgenotyping,whichareconsideredbelow,andsomeotherpotential
complications,e.g.(70)).Inotherwords,locithatcontributemorethanathresholdvalue
𝑣 ∗ shouldbedetectedandthosethatcontributelessshouldnot.Thethresholdisaffected
bythestudysize,𝑚,andbythetotalphenotypicvarianceinthetrait,𝑉v ,where𝑣 ∗ 𝑚, 𝑉v ∝
𝑉v /𝑚(SI Section 6) (69). Given a trait and study size, the number of associations
discoveredandheritabilityexplainedshouldthenfollowfromthetailsofthedistributions
thatwehaveconsidered,correspondingtothethreshold𝑣 ∗ 𝑚, 𝑉v .
Our results therefore suggest that when genetic variation in a trait is sufficiently
pleiotropic, the first loci to be discovered in GWAS will be intermediate or strongly
selected,withcorrespondinglylargeeffectsizes(i.e.,𝑆 ≈
W\
12 0
𝑎81 ≫ 1).Thethresholdstudy
size for their discovery is proportional to𝑉v /𝑣r , i.e., to the total phenotypic variance
measuredinunitsoftheexpectedcontributionofastronglyselectedsitetovariance.This
result makes intuitive sense, as the total phenotypic variance can be viewed as the
backgroundnoiseforthediscoveryofindividualstrongly(andintermediate)selectedsites.
Oncethethresholdstudysizeisexceeded,theincreaseinthenumberofassociationsand
14
explained heritability with study size directly follow from the functional forms that we
derivedforintermediateandstronglyselectedsites(Eq.14andS41andFigs.2C,S13B,3A
andS15A).Thetraitspecificbehaviorboilsdowntothesameparameter,𝑉v /𝑣r ,relatingthe
studysizewiththethreshold𝑣 ∗ .Someresultsaremodifiedwhenvariationinatraitisonly
weakly pleiotropic, which is probably less common: notably, intermediate selected sites
wouldbegintobediscoveredonlyafterstronglyselectedones(cf.Fig.2B,S13A,andS14)
andthethresholdstudysizeforstronglyselectedsiteswouldbehigher(cf.Eq.12andFig.
S14A).
30
30
(a)
80 Strongselection(S=100)
Intermediateselection(S=10)
60 Effectivelyneutral(S=0.5)
40
20
20
10
10
20
0
%
explained
% heritability
heritability explained
100
1
00
50 100 200 11
Study size Hin units of VP êvs L
2
5
10
20
(b)
Re-sequencing
g
pin
oty
Gen
% heritability explained
10
30
100
33
10
30
100 300
300
Scaled selection
selection coefficient
Scaled
coefficientHSL
HSL
Fig.3.PredictionsforGWASofcontinuoustraits,inthepleiotropiclimit(cf.SISection3
forderivations).(a)Theproportionofheritabilityexplainedasafunctionofstudysize.(b)
The heritability explained in re-sequencing and genotyping studies as a function of the
scaled selection coefficient. The study size was chosen such that a re-sequencing study
would capture 25% of the strongly selected variance (corresponding to a study size of
~16𝑉v /𝑣r ).Forthecasewithoutpleiotropy,seeFig.S14A-B.
Our results further suggest that for missing heritability to be mainly due to the limited
powerofgenotypingtodetectrarevariants(14,71),selectiononthegeneticvarianceina
trait would have to be very strong. GWAS impute allele frequencies at loci that are not
includedinthegenotypingarray,andtheaccuracyofthisimputationdeclinesrapidlywhen
minorallelefrequenciesbecometoosmalltobewellrepresentedintheimputationpanel
(72, 73). This decline in power is dealt with in several ways, all of which are similar in
effect to considering alleles above a threshold frequency that allows them to be imputed
accurately (74). For example, a study using an Illumina 1M SNP array and the 1000
15
genomes phase III as an imputation panel would include only loci with MAF above ~1%
(75).Eveniflocibelowthatfrequencywereimputedwithperfectaccuracy,however,they
would only be detected if their contribution to variance exceeded the study’s threshold,
implying that they would also have to have extremely large effect sizes; and under our
model,extremelylargeeffectsizesimplyverystrongselection(Fig.3B).Asanillustration,
if a re-sequencing study identified 25% of the strongly selected (S>>1) variance, a
genotyping study with the same sample size would suffer a 50% decrease in explained
heritabilityonlyif𝑆 ≈ 200(assumingthesamegenotypingplatformandimputationpanel
as above). If we further assuming a constant effective population size of 104 this would
imply a (heterozygote) selection disadvantage of 1%. We note, however, that recent
populationgrowthofthekindthathasbeeninferredforEuropeanpopulations(76,77)is
expectedtoreducethefrequencyofallelesunderverystrongselection(77,78),leadingto
agreaterlossofpowerduetogenotyping(seebelow).
While some heritable variation may also arise from effectively neutral mutations, our
results indicate that their expected contribution to genetic variance per site is much
smallerthanforstronglyselectedandintermediateones(Fig.2;(61,62)).Oneimplication
is that for the contribution of effectively neutral variation to be comparable to that of
intermediate and strongly selected variation, they would need to have a much larger
mutationaltargetsize.Itisdifficulttoevaluatewhetherthisisplausible,becausecurrent
GWASarelikelyseverelyunderpoweredtodetectthem(butsee(79)).Forexample,sites
withS=0.1areexpectedtocontributeonly1/20thofthegeneticvariancethatarisesfrom
strongly selected ones; in the high pleiotropy limit, only 1% of that variance would be
mappedwhenstudysizesarelargeenoughtomap95%ofthestronglyselectedvariance.
Importantly,thesetheoreticalpredictionscanbeusedtomakeinferencesbasedonGWAS
results.Asanillustration,weconsiderheightandbodymassindex(BMI)inEuropeans,two
traits for which GWAS have discovered a sufficiently large number of genome-wide
significantassociationstoallowforwellpoweredtestsandinferences(697forheight(16)
and 97 for BMI (15)). We fit our theoretical predictions for the distribution of variance
among loci to the distributions observed for the genome-wide significant associations
reported for each of these traits (cf. Supplement Section 7 for details). In so doing, we
assume that the mapped loci are under intermediate or strong selection, because our
16
theory suggests that they should be the first to be mapped. We also make the plausible
assumptionthatgeneticvarianceforthesetraitsishighlypleiotropic(seeIntroduction;37,
42) and therefore assume the limit form for the distribution of effect sizes. Under these
assumptions,weexpectthedistributionofvarianceamonglocitobewellapproximatedby
asimpleform(Eq.S89),whichdependsonasingleparameter(vs).Whenweestimatethis
parameter,wefindthatthetheoreticaldistributionfitsthedatawell(Fig.4A).Specifically,
wecannotrejectourmodelbasedonthedata,suggestingthatwithonlyoneparameter,we
obtain good fits to the data for both traits (by a Kolmogorov-Smirnov test,𝑝 = 0.14for
heightand𝑝 = 0.54forBMI;cf.SupplementSection7).Bycomparison,withoutpleiotropy
(n=1),ourpredictionsprovideapoorfittothesedata(byaKolmogorov-Smirnovtest,𝑝 <
10cx forheightand𝑝 = 0.05forBMI;Fig.S11).
(a)
25
Model
+95%CI
20
15 Height
10
BMI
5
0
Data
0
Explainedheritability(%)
Heritability (%)
%heritabilityfromloci>v
% heritability
2
4
6
Threshold variance
Thresholdvariance(relativetov
s)
(b)
40
Height
30
20
10
0
BMI
0
200
400
600
800
Study size (in thousands)
1000
Studysize(inthousands)
Fig.4.ModelfitandpredictionsforheightandBMI.In(a),weshowthefitforassociated
loci. In (b), we present our predictions for future increases in the heritability explained
with GWAS size. The predicted number of associations is shown in Fig. S16. 95% CIs are
basedonbootstrap;seeSupplementSectionS7fordetails.
By fitting the model to GWAS results, we can also make inferences about evolutionary
parametersunderlyingquantitativegeneticvariation.Estimatingthedegreeofpleiotropy
(n) as an additional parameter in the model, we find that for both height and BMI, n is
sufficientlylargeforittobeindistinguishablefromthehighpleiotropylimit.Basedonthe
shape of the fitted distributions in this limit and the number of loci that fall above the
17
threshold contribution to variance of the studies, we can also estimate the mutational
target size and proportion of the heritable variance arising from mutations within the
rangeofselectioncoefficientsvisibleintheseGWAS.Theseestimatessuggestthat,within
this range, height has a target size of ~5Mb, which account for ~50% of the heritable
variance,whereasBMIhasatargetsizeof~1Mb,whichexplains~15%ofthevariance(see
SITableS2).
TheseparameterestimatescanhelptointerpretGWASresultsandmakepredictionsabout
futurestudies.TheysuggestthattheGWASforheightsucceededinmappingasubstantially
greater proportion of the heritable variance (~20% compared to ~3-5%) despite their
smaller sample size (~250K compared to ~340K), because the proportion of variance
arisingfrommutationswithintherangeofdetectableselectioneffectsismuchgreaterfor
heightthanforBMI(~50%comparedto~15%).Furthermore,theestimatesoftargetsizes
and the relationship between sample size and threshold contribution to variance can be
usedtopredicthowtheexplainedheritabilityandnumberofassociationsshouldincrease
with sample size (Figs. 4B and S16). These predictions are likely under-estimates as the
rangeofselectioneffectsitselfshouldalsoincreasewithsamplesize.
Tomaketheseinferencesandpredictionsmorereliable,animportantnextstepwillbeto
account for the effects of recent human demography on the distributions of genetic
variance(78,80-82).Notably,therecentgrowthofEuropeanpopulations(76,77)would
have affected the distribution of genetic variance arising from very strongly selected
mutations. Specifically, it will have reduced its expectation, because many of these
mutationswouldhaveenteredthepopulationsincetheonsetofgrowthandthuswillbeat
lowerfrequencies(77,78,82).Thisconsiderationalongsidetheeffectofgenotypingwould
suggestthattherangeofselectioneffectsrevealedbycurrentGWASisinfactboundfrom
above, and includes only moderately selected loci. However, moving beyond qualitative
considerationswouldrequireincorporatingtheeffectsofdemographyintothemodel(77,
78,82).DoingsomayallowustopredicttheoutcomeoffutureGWASaswellastolearn
about the relative importance of different evolutionary and genetic forces in shaping
quantitativegeneticvariationfordifferenttraits.
18
Acknowledgements.WehavebenefitedhugelyfromdiscussionsandcommentsfromGuy
Amster, Nick Barton, Jeremy Berg, Graham Coop, Laura Hayward, David Murphy, Joe
Pickrell, Jonathan Pritchard and Molly Przeworski. The work was funded by
NIHgrantGM115889toGS.
References:
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
FalconerDS&MackayTFC(1996) Introductiontoquantitativegenetics(Benjamin
Cummings,Essex,England)4thEd.
Lynch M & Walsh B (1998) Genetics and analysis of quantitative traits (Sinauer
Sunderland,MA).
Provine WB (2001) The origins of theoretical population genetics: With a new
afterword(UniversityofChicagoPress).
VisscherPM,BrownMA,McCarthyMI,&YangJ(2012)Fiveyearsofgwasdiscovery.
AmJHumGenet90:7-24.
Barton NH & Keightley PD (2002) Understanding quantitative genetic variation.
NatureReviewsGenetics3:11-21.
ManolioTA(2010)Genomewideassociationstudiesandassessmentoftheriskof
disease.NewEnglJMed363(2):166-176.
Bloom JS, Ehrenreich IM, Loo WT, Lite T-LV, & Kruglyak L (2013) Finding the
sourcesofmissingheritabilityinayeastcross.Nature494(7436):234-237.
Manolio TA, et al. (2009) Finding the missing heritability of complex diseases.
Nature461:747-753.
Zaitlen N, et al. (2013) Using extended genealogy to estimate components of
heritabilityfor23quantitativeanddichotomoustraits.PLoSGenet9(5):e1003520.
Gibson G (2012) Rare and common variants: Twenty arguments. Nat Rev Genet
13(2):135-145.
Eichler EE, et al. (2010) Missing heritability and strategies for finding the
underlyingcausesofcomplexdisease.NatRevGenet11(6):446-450.
Zuk O, Hechter E, Sunyaev SR, & Lander ES (2012) The mystery of missing
heritability:Geneticinteractionscreatephantomheritability.ProcNatlAcadSciUSA
109(4):1193-1198.
Lee SH, Wray NR, Goddard ME, & Visscher PM (2011) Estimating missing
heritability for disease from genome-wide association studies. Am J Hum Genet
88(3):294-305.
Zuk O, et al. (2014) Searching for missing heritability: Designing rare variant
associationstudies.ProcNatlAcadSciUSA111:E455-E464.
Locke AE, et al. (2015) Genetic studies of body mass index yield new insights for
obesitybiology.Nature518:197-206.
Wood AR, et al. (2014) Defining the role of common variation in the genomic and
biologicalarchitectureofadulthumanheight.NatGenet46(11):1173–1186.
de Vladar HP & Barton N (2014) Stability and response of polygenic traits to
stabilizingselectionandmutation.Genetics197(2):749-767.
19
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
30.
31.
32.
33.
34.
35.
36.
37.
38.
39.
AgarwalaV,FlannickJ,SunyaevS,ConsortiumGD,&AltshulerD(2013)Evaluating
empiricalboundsoncomplexdiseasegeneticarchitecture.NatGenet45:1418-1427.
Caballero A, Tenesa A, & Keightley PD (2015) The nature of genetic variation for
complex traits revealed by gwas and regional heritability mapping analyses.
Genetics201(4):1601-1613.
Pritchard JK (2001) Are rare variants responsible for susceptibility to complex
diseases?AmJHumGenet69:124–137.
Eyre-Walker A (2010) Genetic architecture of a complex trait and its implications
forfitnessandgenome-wideassociationstudies. ProcNatlAcadSciUSA107:1752–
1756.
SpeedD&BaldingDJ(2015)Relatednessinthepost-genomicera:Isitstilluseful?
NatRevGenet16(1):33-44.
TurelliM(1984)Heritablegeneticvariationviamutation-selectionbalance:Lerch's
zetameetstheabdominalbristle.TheorPopulBiol25(2):138-193.
BartonNH&TurelliM(1989)Evolutionaryquantitativegenetics:Howlittledowe
know?AnnuRevGenet23:337–370.
Hill WG & Kirkpatrick M (2010) What animal breeding has taught us about
evolution.AnnuRevEcolEvolSyst41:1-19.
Johnson T & Barton N (2005) Theoretical models of selection and mutation on
quantitativetraits.PhilosTransRSocLondBBiolSci360(1459):1411-1425.
LandeR(1976)Naturalselectionandrandomgeneticdriftinphenotypicevolution.
Evolution30(2):314-334.
Byars SG, Ewbank D, Govindaraju DR, & Stearns SC (2010) Natural selection in a
contemporaryhumanpopulation.ProcNatlAcadSciUSA107(suppl1):1787-1792.
StulpG,PolletTV,VerhulstS,&BuunkAP(2012)Acurvilineareffectofheighton
reproductivesuccessinhumanmales.BehavEcolSociobiol66(3):375-384.
Frederick DA & Haselton MG (2007) Why is muscularity sexy? Tests of the fitness
indicatorhypothesis.PersonalityandSocialPsychologyBulletin33(8):1167-1183.
EndlerJA(1986)Naturalselectioninthewild(PrincetonUniversityPress).
Kingsolver JG, et al. (2001) The strength of phenotypic selection in natural
populations.AmNat157(3):245–261.
Charlesworth B, Lande R, & Slatkin M (1982) A neo-darwinian commentary on
macroevolution.Evolution36(3):474-498.
Barton NH (1990) Pleiotropic models of quantitative variation. Genetics
124(3):773-782.
Kondrashov AS & Turelli M (1992) Deleterious mutations, apparent stabilizing
selectionandthemaintenanceofquantitativevariation.Genetics132(2):603-618.
Wagner GP (1996) Apparent stabilizing selection and the maintenance of neutral
geneticvariation.Genetics143(1):617-619.
PickrellJK,etal.(2016)Detectionandinterpretationofsharedgeneticinfluenceson
42humantraits.NatGenet48(7):709-717.
Sivakumaran S,etal. (2011) Abundant pleiotropy in human complex diseases and
traits.AmJHumGenet89(5):607-618.
Solovieff N, Cotsapas C, Lee PH, Purcell SM, & Smoller JW (2013) Pleiotropy in
complextraits:Challengesandstrategies.NatRevGenet14(7):483-495.
20
40.
41.
42.
43.
44.
45.
46.
47.
48.
49.
50.
51.
52.
53.
54.
55.
56.
57.
58.
59.
CotsapasC,etal.(2011)Pervasivesharingofgeneticeffectsinautoimmunedisease.
PLoSGenet7(8):e1002254.
Andreassen OA, et al. (2013) Improved detection of common variants associated
withschizophreniaandbipolardisorderusingpleiotropy-informedconditionalfalse
discoveryrate.PLoSGenet9(4):e1003455.
Bulik-SullivanB,etal.(2015)Anatlasofgeneticcorrelationsacrosshumandiseases
andtraits.NatGenet47(11):1236-1241.
Lee SH, Yang J, Goddard ME, Visscher PM, & Wray NR (2012) Estimation of
pleiotropy between complex diseases using single-nucleotide polymorphismderived genomic relationships and restricted maximum likelihood. Bioinformatics
28(19):2540-2542.
Cross-Disorder Group of the Psychiatric Genomics Consortium (2013) Genetic
relationshipbetweenfivepsychiatricdisordersestimatedfromgenome-widesnps.
NatGenet45(9):984-994.
Perry JRB, et al. (2014) Parent-of-origin-specific allelic associations among 106
genomiclociforageatmenarche.Nature514(7520):92-97.
ScottRA,etal.(2012)Large-scaleassociationanalysesidentifynewlociinfluencing
glycemic traits and provide insight into the underlying biological pathways. Nat
Genet44(9):991-1005.
Shi H, Kichaev G, & Pasaniuc B (2016) Contrasting the genetic architecture of 30
complextraitsfromsummaryassociationdata.AmJHumGenet99(1):139-153.
YangJ,etal.(2010)Commonsnpsexplainalargeproportionoftheheritabilityfor
humanheight.NatGenet42(7):565-569.
TeslovichTM,etal.(2010)Biological,clinicalandpopulationrelevanceof95locifor
bloodlipids.Nature466(7307):707-713.
Charlesworth B (1979) Evidence against fisher's theory of dominance. Nature
278(5707):848-849.
Crow JF (2010) On epistasis: Why it is unimportant in polygenic directional
selection.PhilosTransRSocLondBBiolSci365(1544):1241-1244.
Clayton DG (2009) Prediction and interaction in complex disease genetics:
Experienceintype1diabetes.PLoSGenet5(7):e1000540.
Hill WG, Goddard ME, & Visscher PM (2008) Data and theory point to mainly
additivegeneticvarianceforcomplextraits.PLoSGenet4(2):e1000008.
Ávila V, et al. (2014) The action of stabilizing selection, mutation, and drift on
epistaticquantitativetraits.Evolution68(7):1974-1987.
WrightS(1929)Theevolutionofdominance.AmNat63(689):556-561.
WeiW-H,HemaniG,&HaleyCS(2014)Detectingepistasisinhumancomplextraits.
NatRevGenet15(11):722-733.
Wood AR, et al. (2014) Another explanation for apparent epistasis. Nature
514(7520):E3-E5.
Evans DM, et al. (2011) Interaction between erap1 and hla-b27 in ankylosing
spondylitis implicates peptide handling in the mechanism for hla-b27 in disease
susceptibility.NatGenet43(8):761-767.
The International Multiple Sclerosis Genetics Consortium (2015) Class ii hla
interactions modulate genetic risk for multiple sclerosis. Nat Genet 47(10):11071113.
21
60.
61.
62.
63.
64.
65.
66.
67.
68.
69.
70.
71.
72.
73.
74.
75.
76.
77.
78.
79.
80.
Barton N & Turelli M (1987) Adaptive landscapes, genetic distance and the
evolutionofquantitativecharacters.GenetRes49(02):157-173.
BulmerMG(1972)Thegeneticvariabilityofpolygeniccharactersunderoptimizing
selection,mutationanddrift.GenetRes19(01):17-25.
Keightley PD & Hill WG (1988) Quantitative genetic variability maintained by
mutation-stabilizingselectionbalanceinfinitepopulations.GenetRes52(01):33-43.
Robertson A (1956) The effect of selection against extreme deviants based on
deviationoronhomozygosis.JGenet54(2):236–248.
Wright S (1935) The analysis of variance and the correlations between relatives
withrespecttodeviationsfromanoptimum.JGenet30(2):243-256.
Keightley PD & Hill WG (1990) Variation maintained in quantitative traits with
mutation-selection balance: Pleiotropic side-effects on fitness traits. Proc R Soc
LondBBiolSci242(1304):95-100.
Lande R (1975) The maintenance of geneticvariability by mutation in a polygenic
characterwithlinkedloci.GenetRes26(3):221-235.
Ewens WJ (2004) Mathematical population genetics 1: I. Theoretical introduction
(SpringerScience&BusinessMedia).
Martin G & Lenormand T (2006) A general multivariate extension of fisher's
geometrical model and the distribution of mutation fitness effects across species.
Evolution60(5):893-907.
Sham PC & Purcell SM (2014) Statistical power and significance testing in largescalegeneticstudies.NatRevGenet15(5):335-346.
DicksonSP,WangK,KrantzI,HakonarsonH,&GoldsteinDB(2010)Rarevariants
createsyntheticgenome-wideassociations.PLoSBiol8(1):e1000294.
CirulliET&GoldsteinDB(2010)Uncoveringtherolesofrarevariantsincommon
diseasethroughwhole-genomesequencing.NatRevGenet11(6):415-425.
Li Y, Willer C, Sanna S, & Abecasis G (2009) Genotype imputation. Annu Rev
GenomicsHumGenet10:387-406.
PritchardJK&PrzeworskiM(2001)Linkagedisequilibriuminhumans:Modelsand
data.AmJHumGenet69(1):1-14.
Evangelou E & Ioannidis JPA (2013) Meta-analysis methods for genome-wide
associationstudiesandbeyond.NatRevGenet14(6):379-389.
The1000GenomesProjectConsortium(2015)Aglobalreferenceforhumangenetic
variation.Nature526(7571):68-74.
TennessenJA,etal.(2012)Evolutionandfunctionalimpactofrarecodingvariation
fromdeepsequencingofhumanexomes.Science337(6090):64-69.
Keinan A & Clark AG (2012) Recent explosive human population growth has
resultedinanexcessofraregeneticvariants.Science336(6082):740-743.
Simons YB, Turchin MC, Pritchard JK, & Sella G (2014) The deleterious mutation
loadisinsensitivetorecentpopulationhistory.NatGenet46(3):220-224.
LohP-R,etal.(2015)Contrastinggeneticarchitecturesofschizophreniaandother
complexdiseasesusingfastvariance-componentsanalysis. NatGenet47(12):13851392.
GazaveE,ChangD,ClarkAG,&KeinanA(2013)Populationgrowthinflatestheperindividualnumberofdeleteriousmutationsandreducestheirmeaneffect. Genetics
195(3):969-978.
22
81.
82.
Uricchio LH, Torres R, Witte JS, & Hernandez RD (2015) Population genetic
simulations of complex phenotypes with implications for rare variant association
tests.GenetEpidemiol39(1):35-44.
LohmuellerKE(2014)Theimpactofpopulationdemographyandselectiononthe
geneticarchitectureofcomplextraits.PLoSGenet10(5):e1004379.
23
Amodelforthegeneticarchitectureofquantitativetraitsunderstabilizing
selection
SupplementaryInformation
Simons,Y.B.,Bullaughey,K.,Hudson,R.R.andG.Sella
TableofContents
1. Themodel...................................................................................................................................................2
1.1. Absorbingtheenvironmentalcontributionintothefitnessfunction..........................2
1.2. Thedistributionofmutationaleffectsizesonagiventrait..............................................2
2. Solvingforsummariesofgeneticarchitecture..........................................................................4
2.1. Thefirsttwomomentsofchangeinallelefrequency.........................................................5
2.2. Sojourntime..........................................................................................................................................6
2.3. Calculatingexpectationsofsummariesofarchitecture.....................................................6
3. Additivegeneticvarianceandnumberofsegregatingsites................................................8
3.1. Expectations..........................................................................................................................................8
3.2. Densities...............................................................................................................................................10
3.3. Comparingpredictionsagainstsimulations.........................................................................14
4. Justificationforassumptions..........................................................................................................15
4.1. Normalandisotropicphenotypicdistributionaroundtheoptimum.......................16
4.2. Thephenotypicvariancesatisfies𝜎 " ≪ 𝑤 " .........................................................................16
4.3. Mutationaleffectsizessatisfy𝑎 " ≪ 𝑤 " ................................................................................17
4.4. Deviationsofthemeanphenotypefromtheoptimumcanbeneglected................17
5. Modelrobustness.................................................................................................................................19
5.1. Changestotheoptimalphenotype...........................................................................................19
5.2. Asymmetricmutationalinput.....................................................................................................22
5.3. Majoreffectloci.................................................................................................................................27
5.4. Anisotropicmutation......................................................................................................................28
6. ThepowertodetectlociinGWAS................................................................................................33
6.1. Re-sequencingstudies...................................................................................................................34
6.2. Genotyping..........................................................................................................................................36
7. Inference..................................................................................................................................................38
7.1. Thecompositelikelihood.............................................................................................................38
7.2. Determining𝑣 * andremovingoutliers...................................................................................40
7.3. Estimatingtargetsizeandexplainedvariance....................................................................41
7.4. Estimatingconfidenceintervals................................................................................................42
7.5. Testinggoodnessoffit...................................................................................................................44
8. Glossaryofnotation............................................................................................................................46
9. Bibliography...........................................................................................................................................47
10. Additionalfigures..............................................................................................................................49
1
1.
Themodel
1.1.
Absorbingtheenvironmentalcontributionintothefitnessfunction
Here,weshowthattheadditiveenvironmentalcontributiontothephenotypecan
beabsorbedintothefitnessfunction,whichjustifiesourconsideringonlythe
additivegeneticcontributioninouranalysis.Thisresulthasbeenderivedmultiple
timesfortheonedimensionalcase(e.g.,(1)).Theargumentinthemultidimensionalcaseissimilarandincludedforcompleteness.
First,assumethattheadditiveenvironmentalcontributiontothephenotype,𝑟* ,is
distributedasamulti-normalwithmean0andisotropicvariance𝑉* .Theexpected
absolutefitnessofanindividualwithadditivegeneticcontributiontothe
phenotype,𝑟- ,isgivenbyaveragingfitnessoverthedistributionofenvironmental
contributions.Namely,
W 𝑟- =
=
0
𝑒
63 "123 4 5
0
𝑒
0?23 /A 5 4/5
8
9 5
8 3
5:3
W 𝑟- + 𝑟* =
5
9<
5(>5 =:3 )
.
0
𝑒
63 "123 4 5
9 5
8 3
5:3
𝑒
8
9< =93
5>5
5
(S1)
Giventhatabsolutefitnessisdefineduptoamultiplicativeconstant,wecan
thereforeabsorbtheadditiveenvironmentalcontributionbyusingtheGaussian
fitnessfunction
W 𝑟- = exp −
6<
5
"A 5
,
(S2)
where𝑤 " = 𝑤 " + 𝑉* .Evenwhentheenvironmentalcontributionisanisotropic,we
canalwayschooseacoordinatesysteminwhichtheeffectivefitnessfunctiontakes
anisotropicformaroundthefitnesspeak(Eq.1).
1.2.
Thedistributionofmutationaleffectsizesonagiventrait
Inthemaintext,wedefinethedistributionofphenotypiceffectsofnewlyarising
mutationsinthen-dimensionaltraitspace,𝑎.Here,weconsidertheprojectionof
2
theseeffectsonagiventrait,𝑎0 ,takenwithoutlossofgeneralitytobeonthe1st
dimension.Thedistributionofeffectsizesonafocaltraitwilldependonthedegree
ofpleiotropy,n,andtheformofthisdependencybecomesimportantwhenwe
considerhowpleiotropyaffectsgeneticarchitecture.
Wewanttocalculatethedistributionofeffectsizesonthefocaltrait,𝑎0 ,conditional
ontheiroveralleffect,𝑎 = 𝑎 .Weassumethatthedistributionofeffectsofdenovo
mutationsisisotropicintraitspace.Theeffectofamutation,𝑎,thereforehasequal
probabilitytooccupyanypointonann-dimensionalspherewithradius𝑎.LetSI 𝑥 denotethesurfaceareaofanm-dimensionalsphereofradius𝑥and𝜃denotethe
anglebetweenthevector𝑎anditsprojection𝑎0 ,i.e.,𝑎0 = 𝑎 cos 𝜃.Intheseterms,
thesurfaceareaelementcorrespondingtoangle𝜃is
SO80 𝑎 sin 𝜃 𝑎𝑑𝜃,
(S3)
andbya changeofvariables,thesurfaceareaelementcorrespondingtoprojection
𝑎0 onthefocaltraitis
since𝑑𝑎0 =
UST
UV
S
𝑎" − 𝑎0"
SO80 𝑎 sin 𝜃 𝑎𝑑𝜃 = SO80
𝑑𝜃 = 𝑎 sin 𝜃 𝑑𝜃 =
S 5 8ST5
𝑑𝑎0 , (S4)
𝑎" − 𝑎0" 𝑑𝜃.Thisresultimpliesthatthe
probabilitydensityof𝑎0 is
φO 𝑎0 𝑎 = S 5 8ST5 X4YT
X4 S
S
S 5 8ST5
Z
=
[Z
4
5
4YT
5
1−
ST5
S5
4Y]
5
0
S
(S5)
(forasimilarderivation,see(2)).
Next,weconsiderthehighpleiotropylimitformofthisdistribution.Foranydegree
ofpleiotropy,thesymmetryofthemutationaldistributionimpliesthat
E 𝑎0 𝑎 = 0
(S6)
andtheequivalenceamongtraitsimpliesthat
3
V 𝑎0 𝑎 = 𝑎" 𝑛
(S7)
(cf.maintextformoredetails).Itfollowsthatwhennbecomessufficientlylarge,
𝑎0 𝑎 ≪ 1andtherefore
1−
Inaddition,Γ
ST5
4Y]
5
S5
O
"
/Γ
O ST5
" S5
≈ exp − O80
"
≈
.
(S8)
𝑛/2.SubstitutingtheseexpressionsintoEq.S5,we
findthatforsufficientlylargenthedistributionofeffectsizesapproachesthe
normaldistribution
φO 𝑎0 𝑎 ≈
0
"1
S5
0
O
exp − ST5
" S5 O
.
(S9)
Asweelaborateinthemaintext,importantimplicationsaboutquantitativegenetic
variationfollowfromthishighpleiotropylimit.Thelimitalsoholdsquitegenerally
whenthedistributionofeffectsizesisanisotropic(seeSectionS5.4).
2.
Solvingforsummariesofgeneticarchitecture
Here,wederiveclosedformsforsummariesofgeneticarchitectureunderour
model.Webeginbyderivingthefirsttwomomentsofchangeinallelefrequencyina
singlegeneration.Withthesemomentsathand,weusethediffusionapproximation
tocalculatethesojourntimeforallelesthatcontributetoquantitativegenetic
variation(3).Togetherwiththedistributionofeffectsizesderivedintheprevious
section,thesojourntimeallowsustoobtainclosedformsforsummariesofgenetic
architecture.Specifically,wecanobtainaclosedformforanysummarythatcanbe
describedasafunctionofallelefrequenciesandeffectsizesatsitescontributingto
quantitativegeneticvariation.Weusetheseexpressionstocalculatethesummaries
usedinthemaintext,forexampletheexpectedadditivegeneticvarianceandits
distributionacrosssites.
4
2.1.
Thefirsttwomomentsofchangeinallelefrequency
Weassumethat:
•
Thephenotypicdistributionatsteadystateiswellapproximatedbyan
isotropicmultivariatenormaldistributioncenteredattheoptimum,namely
bytheprobabilitydensityis
f 𝑟 =
•
0
"1e 5 4
𝑒𝑥𝑝 −
5
65
. "e 5
(S10)
Thatboth𝑎" and𝜎 " ≪ 𝑤 " .
TheseassumptionsarejustifiedinSectionS4.
Werelyontheseassumptionstocalculatethefirsttwomomentsofchangein
frequencyinasinglegenerationforanallelewithphenotypiceffect𝑎andfrequency
q.Thefitnessesofthethreegenotypesatthesitedependonitsdistributionof
geneticbackgrounds,i.e.,onthetotalphenotypiccontributionofsitesotherthanthe
focalone𝑅.FollowingEq.S10andassumingeveryallelecontributesonlyasmall
proportionofthegeneticvariance,thedistributionof𝑅iswellapproximatedby
f 𝑅 𝑎, 𝑞 =
5
0
exp
"1e 5 4/5
−
j?kS "e 5
.
(S11)
Theexpectedfitnessesofthethreegenotypesthenfollowfromintegratingover
backgrounds:
O
A
𝑊nn =
f 𝑅 𝑎, 𝑞 W 𝑅 =
j
𝑊n0 =
f 𝑅 𝑎, 𝑞 W 𝑅 + 𝑎 2 =
j
𝑊00 =
j
A 5 ?e 5
exp −
O
A
A 5 ?e 5
S5 k5
exp −
(S12)
(S13)
. (S14)
− 𝑞 ,
(S15)
,
" A 5 ?e 5
S 5 (k80 ")5
"(A 5 ?e 5 )
and
f 𝑅 𝑎, 𝑞 W 𝑅 + 𝑎 =
A
A 5 ?e 5
O
exp −
S 5 (k80)5
"(A 5 ?e 5 )
Thefirstmomentofchangeinallelefrequencyisthen
E Δ𝑞 = −𝑝𝑞
p qrr 8qrT ?k qrT 8qTT
q
≈−
S5
sA 5
𝑝𝑞
0
"
5
relyingonourassumptionsthat𝑎" and𝜎 " ≪ 𝑤 " .Thefunctionalformofthefirst
momentisequivalenttothatofthestandardviabilityselectionmodelwithunderdominanceandselectioncoefficient𝑠 =
𝑆=
vS 5
"A 5
.
S5
sA 5
orscaledselectioncoefficient
(S16)
, (S17)
Similarly,wefindthat
V Δ𝑞 ≈
pk
"v
whichisthestandardsecondmomentwithgeneticdrift.
2.2.
Sojourntime
Basedonthefirsttwomoments,wecanusethediffusionapproximationtocalculate
thesojourntimeasafunctionofallelefrequency,i.e.,thedensityofthetimethatan
allelespendsatagivenfrequency𝑞beforeitfixesorislost(3).Foramutantallele
withinitialfrequency1/2𝑁andscaledselectioncoefficientS,thesojourntimeis
v 1 y
τ 𝑞𝑆 =
z{|
y "
v 1 y
z{|
y "
5
* } TY5~ /•
€(08€)
5
* } TY5~ /•
€(08€)
f8 𝑆, 𝑞 f? 𝑆, 1 2𝑁
0 ≤ 𝑞 ≤ 1/2𝑁
f? 𝑆, 𝑞 f8 𝑆, 1 2𝑁
1 2𝑁 ≤ 𝑞 ≤ 1
whereerfistheerrorfunctionandf± 𝑆, 𝑦 ≡ erf
𝑆 2 ± erf
(S18)
𝑆 1 − 2𝑦 2 .
Thesojourntimetakessimplelimitingformswhenselectioniseffectivelyneutral
(𝑆 ≪ 1)orstrong(𝑆 ≫ 1).Intheeffectivelyneutralrange,itiswellapproximated
"
byτ 𝑞 𝑆 = ,andinthestronglyselectedrange,itiswellapproximatedby
k
"
τ 𝑞 𝑆 = exp −𝑆𝑞 .
k
2.3.
Calculatingexpectationsofsummariesofarchitecture
Manysummariesofinterestcanbeexpressedassumsoversegregatingsitesof
somefunction𝑐(𝑞, 𝑎1),where𝑞isthederivedallelefrequencyanda1istheeffect
sizeonthetrait.Forexample,theadditivegeneticvarianceinatraitisgivenbythe
6
sumof𝑣(𝑞, 𝑎0 ) = ½𝑎0" 𝑞(1 − 𝑞)oversites.Theexpectationoversuchsummaries
canbeexpressedas
E 𝐶 = 2𝑁𝑈
𝑞 𝑎1
𝑐(𝑞, 𝑎0 )𝜌(𝑞, 𝑎0 ),
(S19)
where𝐶isthesummerysummedoverallsites,2NUisthepopulationmutationrate
pergenerationand𝜌(𝑞, 𝑎0 )isthedensityofsiteswiththecorrespondingfrequency
andeffectsizeperunitmutationalinput.
Thedensity𝜌(𝑞, 𝑎0 )canbebrokendownintocontributionsfromsiteswithdifferent
selectioncoefficients,i.e.,
𝜌 𝑞, 𝑎0 =
𝑆
f(𝑆) τ 𝑞 𝑆 η 𝑎0 𝑆 , (S20)
wheref(𝑆)isthedistributionofselectioncoefficientsand𝜏 𝑞 𝑆 isthesojourntime
ofamutationwithselectioncoefficient𝑆(Eq.S18).Theprobabilitydensityη 𝑎0 𝑆 ofeffectsizesgivenselectioncoefficient𝑆followsfromEqs.S5andS16
η 𝑎0 𝑆 = φŽ 𝑎0 𝑎(𝑆) = φŽ 𝑎0
2𝑤 " 𝑁 𝑆 . (S21)
Thisallowsustobreakdownoursummariesintocontributionsfromsiteswith
differentselectioncoefficients
E 𝐶 = 2𝑁𝑈
𝑆
f(𝑆)E 𝐶 𝑆 (S22)
(S23)
with
E 𝐶𝑆 =
𝑞 𝑎1
𝑐(𝑞, 𝑎0 )τ 𝑞 𝑆 η 𝑎0 𝑆 .
WeuseEq.S23tostudyhowsummariesofarchitecturedependonthestrengthsof
selection,andhowthesesummarieswilldependondifferentdistributionsof
selectioncoefficients.Thisallowsustodrawgeneralimplicationsaboutgenetic
architecturedespiteourlimitedknowledgeaboutthisdistribution.
7
3.
Additivegeneticvarianceandnumberofsegregatingsites
Thedistributionsofadditivegeneticvarianceandofthenumberofsegregatingsites
arecriticaltounderstandinggeneticarchitectureandspecificallytointerpreting
resultsofGWAS.Herewederiveclosedformsforbothdistributionsaswellas
simpleapproximationsunderstrongandeffectivelyneutralselection.
3.1.
Expectations
Webeginbyconsideringtheexpectedcontributionofasitetoadditivegenetic
variance.Substitutingthecontributiontovariancefromasinglesite𝑣(𝑞, 𝑎0 ) =
0
"
𝑎0" 𝑞(1 − 𝑞)intoEq.S23,wefindthat
E 𝑉𝑆 =
1
=
=
k2
1
k2
𝑞 𝑎1
𝑣 𝑞, 𝑎0 τ 𝑞 𝑆 η 𝑎0 𝑆 𝑞 1−𝑞 τ 𝑞 𝑆
𝑞 1−𝑞 τ 𝑞 𝑆
ST
2𝑤2
𝑆
𝑛𝑁
𝑎21 η 𝑎1 𝑆 =
2𝑤2
1
𝑆𝑞
𝑛𝑁 k 2
1 − 𝑞 τ 𝑞 𝑆 (S24).
(S25)
Thetotaladditivegeneticvarianceis
𝜎 " = 2𝑁𝑈
y
f(𝑆)E 𝑉 𝑆 .
TheclosedformforE 𝑉 𝑆 (Eq.S24)wasintegratednumericallytoobtainFig.2ain
themaintext.Wecanusetheresultsof(4)toobtainananalyticapproximationfor
E 𝑉 𝑆 :
E 𝑉𝑆 =
2𝑤2
𝑛𝑁
𝜋𝑆
erfi
4
𝑆
4
𝑆
exp − 4 + O
1
2𝑁
,
(S26)
witherfibeingtheimaginaryerrorfunction(erfi 𝑥 = erf 𝑖𝑥 /𝑖).
Intheeffectivelyneutralandstrongselectionlimits,wecanusesimplelimiting
formsforthesojourntimetoderivesimpleformsfor𝐸 𝑉 𝑆 .Intheeffectively
neutrallimit,τ 𝑞 𝑆 ≈ 2/𝑞andso
E 𝑉𝑆
y≪0
≈
"A 5 y
Ov "
.
(S27)
8
Inthestrongselectionlimit,τ 𝑞 𝑆 ≈ 2 exp(−𝑆𝑞) /𝑞andso
E 𝑉𝑆
y≫0
"A 5
Theconstant
Ov
≈
"A 5
Ov
.
(S28)
,whichrecursinourderivations(e.g.,Eq.S24),thushasasimple
interpretation:itistheexpectedcontributionofstronglyselectedsitestoadditive
geneticvariance(perunitmutationalinput).Wethereforedenoteitby
𝑣” =
"A 5
Ov
(S29)
andhenceforthmeasurevarianceintheseunits.
Wenextconsiderhowtheexpectednumberofsegregatingsitesdependsonthe
strengthofselection.Thisexpectation(perunitmutationalinput)issimplythe
meansojourntimeofanewlyarisingmutation.Formally,itfollowsfrom
substitutingc 𝑞, 𝑎0 = 1intoEq.S23,i.e.,
E 𝐾𝑆 =
𝑞 𝑎1
τ 𝑞 𝑆 η 𝑎0 𝑆 =
𝑞
τ 𝑞𝑆
𝑎1
η 𝑎0 𝑆 =
𝑞
τ 𝑞 𝑆 .
(S30)
InFig.S1,wehavecalculatedthisintegralnumericallyfordifferentvaluesofSto
showthatthenumberofsegregatingsitesdependsonlyweaklyon𝑆.
segregating sites
Number of
30
FigureS1.Thenumberofsegregatingsites
20
perunitmutationalinput(orequivalently
theexpectedsojourntimeofanewly
10
0
0
arisingmutation),Eq.S30,isonlyweakly
10
20
30
40
dependentonthestrengthofselection.
Scaled selection coefficient Calculatedforapopulationsizeof20,000.
9
3.2.
Densities
Here,weconsiderhowtheadditivegeneticvarianceisdistributedamongsites.The
densityofsegregatingsiteswithagivencontributiontovariancevcanbecalculated
bysubstitutingDirac’sdeltafunctionδ 𝑣 − ½𝑎0" 𝑞 1 − 𝑞 intoEq.S23,yielding
𝜌 𝑣|𝑆 = E 𝛿 𝑣 − ½𝑎0" 𝑞 1 − 𝑞
=
𝑎1
=
τ q? 𝑣, 𝑎0 𝑆
𝑎1
"
šœ
0
"
𝑞 𝑎1
δ 𝑣 − ½𝑎0" 𝑞 1 − 𝑞 τ 𝑞 𝑆 η 𝑎0 𝑆 =
+ τ q8 𝑣, 𝑎0 𝑆
τ q? 𝑣, 𝑎0 𝑆 + τ q8 𝑣, 𝑎0 𝑆
whereq± 𝑣, 𝑎0 =
0
š›= œ,ST
𝑆 =
š›Y œ,ST
"
ST5 08•œ/ST5
η 𝑎0 𝑆 šœ
η 𝑎0 𝑆 ,
(S31)
1 ± 1 − 8𝑣/𝑎0" arethetwofrequenciesthatyield𝑣 =
𝑎0" 𝑞 1 − 𝑞 .ThisintegralcanbecalculatednumericallyforanySandη 𝑎0 𝑆 (e.g.,
fordifferentdegreesofpleiotropy).
AswediscussinthemaintextandinSectionS6,thetailsofthisdistributionare
especiallyimportantfortheinterpretationofGWAS.Notably,toafirst
approximation,thelocicapturedinagivenGWASwouldbethosewhose
contributiontoadditivevarianceexceedssomethresholdcontribution𝑣 ∗ .
Weareparticularlyinterestedintheproportionofadditivegeneticvariance
capturedinGWAS,whichweapproximateintermsoftheadditivevariancecarried
bysitesinthe𝑣 > 𝑣 ∗ tail.Theexpectedvarianceinsuchatailis
E 𝑉 𝑣∗, 𝑆 =
œ¡œ ∗
𝑣𝜌 𝑣 𝑆 ,
(S32)
(S33)
anditsproportionofthetotaladditivegeneticvarianceis
G 𝑣∗ 𝑆 ≡
£(2|œ * ,y)
£(2|y)
=
¥¦¥∗
œ¤(œ|y)
œ¤(œ|y)
¥
. Foradistributionofselectioncoefficients,theproportionofvariancebecomes
G 𝑣∗ =
}
§ œ ∗ |y £(2|y)|(y)
£(2|y)|(y)
}
=
y
G 𝑣 ∗ |𝑆
£(2|y)|(y)
}
£(2|y)|(y)
. (S34)
10
Forcaseswithoutpleiotropyorwithextensivepleiotropy,wecangreatlysimplify
theseexpressionsinthelimitsofeffectivelyneutralandstrongselection.We
illustratehowtheseapproximationscanbeobtainedfortheproportionofadditive
varianceinatail,G 𝑣 ∗ 𝑆 (TableS1).Whenselectioniseffectivelyneutral,then
τ 𝑞 𝑆 = 2/𝑞andthus
¥¦¥∗
G 𝑣 ∗ |𝑆 =
¥
=
=
"
y œ¡œ ∗ 𝑎1
"
y œ¡œ ∗ 𝑎1
"
= y
𝑣
𝑎1
›= œ,ST
"
𝑣
0?
"
𝑣>𝑣∗
"
08•œ/ST5
08•œ/ST5
+
œ¤(œ|y)
œ¤(œ|y)
"
"
›Y œ,ST
ST5 08•œ/ST5
"
+
η 𝑎0 𝑆 =
08
"
08•œ/ST5
𝑎1 > •œ ∗
η 𝑎0 𝑆 ST5
y
ST5
08•œ/ST5
η 𝑎0 𝑆 1 − 8𝑣 ∗ /𝑎0" η 𝑎0 𝑆 ,
(S35)
withvariancemeasuredinunitsof𝑣” andeffectsizemeasuredinunitsof 𝑣” .
Withoutpleiotropy(𝑛 ≈ 1),weknowthat𝑎0 = ± 𝑆andtheexpressionforthe
proportionofvariancesimplifiesto
G 𝑣 ∗ |𝑆 =
1 − 8𝑣 ∗ /𝑆, (S36)
where𝑆/8isthemaximalcontributiontovarianceforamutationwithselection
coefficient𝑆.Whenthereisahighdegreeofpleiotropy(𝑛 ≫ 1),𝑎0 isapproximately
normallydistributedwithmean0andvariance𝑆andtheexpressionforthe
proportionofvariancesimplifiesto
G 𝑣 ∗ |𝑆 =
ST5
𝑎1 > •œ ∗ y
1 − 8𝑣 ∗ /𝑎0"
0
"1y
exp(−𝑎0" /2𝑆) = exp − 4𝑣∗ 𝑆 .(S37)
Whenselectionisstrong,derivedallelesarerare(𝑞 ≪ 1),implyingthat𝑣 ≪ 𝑎0" and
𝑞 ≈ 2𝑣/𝑎0" ,andthesojourntimeiswellapproximatedbyτ 𝑞 𝑆 ≈ 2 exp(−𝑆𝑞)/𝑞.
Theproportionofvariancethensimplifiesto
11
G 𝑣 ∗ |𝑆 ≈
≈
œ¡œ ∗ 𝑎1
𝑣>𝑣∗ ST
=
𝑣>𝑣∗ ST
=
ST5
𝑎1 y
𝑣
"
𝑣τ 𝑞 𝑣, 𝑎0 𝑆
𝑎21
exp
𝑣
−2𝑆𝑣/𝑎21
ST5
2
η
𝑎21
η 𝑎0 𝑆 𝑎1 𝑆 2 exp −2𝑆𝑣/𝑎21 η 𝑎1 𝑆 exp −2𝑆𝑣 ∗ /𝑎0" η 𝑎0 𝑆 .
(S38)
(S39)
Withoutpleiotropy,thisexpressionfurthersimplifiesto
G 𝑣 ∗ |𝑆 ≈ exp(−2𝑣∗ ),
andwhenthereisahighdegreeofpleiotropy(𝑛 ≫ 1),then
G 𝑣 ∗ |𝑆 ≈
ST5
𝑎1 y
exp −
"yœ ∗
ST5
0
"1y
exp −
= 1 + 2 𝑣∗ exp −2 𝑣∗ .
ST5
"y
(S40)
Selection
Effectivelyneutral(𝑆 ≪ 1)
Stronglyselected(𝑆 ≫ 1)
#oftraits
𝑛 ≈ 1
𝑛 ≫ 1
𝑛 ≈ 1
𝑛 ≫ 1
E 𝑉𝑆 𝑆/2
𝑆/2
1
1
exp −4𝑣 ∗ /𝑆 exp(−2𝑣 ∗ )
1 + 2 𝑣 ∗ exp(−2 𝑣 ∗ )
G 𝑣∗ 𝑆 1 − 8𝑣 ∗ /𝑆
TableS1.Simplelimitsfortheexpectedproportionofvariancearisingfromsites
withadditivecontributiongreaterthan𝑣 ∗ .
Anoteworthypropertyoftheproportionofvariancefromsiteswithcontributions
𝑣 > 𝑣 ∗ ,G 𝑣 ∗ |𝑆 ,isthatitalwaysappearstoincreasewiththedegreeofpleiotropy,n
(FigureS2).Wedonothaveaproofforthispropertybutcansuggestanintuitive
explanation.Withoutpleiotropy(n=1),theselectioncoefficientdeterminesthe
effectsize,suchthatanycontribution𝑣 ∗ togeneticvariancecorrespondstoa
specificminorallelefrequency𝑞∗ .Thesiteswithcontributions𝑣 > 𝑣 ∗ aretherefore
12
thosewithminorallelefrequencies𝑞 > 𝑞∗ .Pleiotropycausessiteswithagiven
selectioncoefficienttohaveadistributionofeffectsizesonthetraitunder
consideration.Asaresult,somesiteswithfrequenciesabove𝑞∗ endupwith
contributionstovariancebelow𝑣 ∗ whileothersexceed𝑣 ∗ .Tounderstandhowthis
affectsG 𝑣 ∗ |𝑆 ,recallthatforanyselectioncoefficient,thedensityofvariants
alwaysrapidlyincreasesas𝑞∗ decreases.Aslongasthecontribution𝑣 ∗ andthe
correspondingfrequencywithoutpleiotropy𝑞∗ arenotcloseto0,wemaytherefore
expectthatintroducingpleiotropywouldresultinpushingmoresitesabove𝑣 ∗ than
thosepushedbelow𝑣 ∗ ,resultinginanetincreasetotheproportionG 𝑣 ∗ |𝑆 .
S=1
0.6
0.4
0.8
0.6
0.4
0.5
1
v
1.5
*
0.0
2 xvs 0
S=100
1.0
n=1
n=2
n=3
n=10
n ∞
0.8
0.6
0.4
0.2
0.2
0.2
0.0
0
n=1
n=2
n=3
n=10
n ∞
GHv* L
0.8
S=10
1.0
n=1
n=2
n=3
n=10
n ∞
GHv* L
G(v*|S)
GHv* L
1.0
2
4
6
0.0
10 xvs 0
8
2
4
Thresholdcontributiontovariance(v*)
v
6
8
v*
*
10 xvs
FigureS2.Theeffectofpleiotropyontheproportionofthevarianceexplainedby
sitescontributingmorethen𝑣 ∗ tothevariance,𝐺(𝑣 ∗ |𝑆)(seeEq.S33).Forall
selectioncoefficients,theproportionofvarianceexplainedincreasesasthenumber
oftraits,i.e.thedegreeofpleiotropy,increases.
Lastly,weconsiderhowthenumberofsegregatingsitesdependsontheir
contributiontogeneticvariancevaryasafunctionofthiscontribution(Fig.S13).
ThenumberofsegregatingsitesKwith𝑣 > 𝑣 ∗ perunitmutationalinputis
E 𝐾 𝑣∗, 𝑆 =
œ¡œ ∗
𝜌 𝑣 𝑆 , (S41)
(S42)
andthereforethedistributionofvarianceatthesesitesis
f(𝑣) = 𝜌 𝑣 𝑆 /
𝑣>𝑣∗
𝜌 𝑣 𝑆 . Weusetheseresultsbelow,whenweestimatemodelparametersfromGWAS
results(SectionS7).Theyalsoallowustotranslatethepropertiesofthedensityof
13
geneticvarianceintopropertiesofthedensityofsites(Fig.S13).Notably,by
applyingthesamereasoningonecanshowthatthedensityofsitesisinsensitiveto
thedistributionofstrongselectioncoefficientsandpleiotropyalwaysincreasesthe
numberofsitesabove𝑣 ∗ (i.e.,forany𝑣 ∗ >0).
3.3.
Comparingpredictionsagainstsimulations
Wetestedourtheoreticalderivationsforthetotalgeneticvarianceandits
distributionamongsitesagainstcomputersimulations.Tothisend,weranforward
simulationofthefullmodel(thecodeisavailableathttps://github.com/sellalab)
witharangeofparametervalueschosentobalancebetweenbiologicallyplausibility
andmaintainingmanageablerunningtimes.Notably,weusedapopulationsizeof
𝑁 = 1000,withaburn-intimeof10,000generations.Wevariedthenumberof
traitstoincluden=1,3,and10,andvariedthemutationrateperhaploidgenome
pergenerationwithintherange1/2N ≤ 𝑈 ≤ 1.Selectioncoefficientswerechosen
fromanexponentialdistributionwithmeanE(S)=0.1,10,and50.Forsimplicity,we
took𝑤 " = 1,whichistantamounttochoosingtheunitsthatweusetomeasure
effectsizes.Forthesakeofcomputationalefficiency,thesimulationswere
performedwithfecundityselection,butweransimulationswithviabilityselection
forasubsetoftheparameterstoensurethatthischoicedoesnotaffectourresults
(notshown).
Ourtheoreticalderivationsforboththetotalgeneticvarianceanditsdistribution
amongsitesagreewellwithresultsfromsimulations(Fig.S3).Notably,wefound
thataslongas1 2𝑁 ≪ 𝜎 " /𝑤 " ≪ 1(cf.SectionS3.3),ourtheoreticalpredictionsfor
thetotalgeneticvariance(Eq.S25)areindistinguishablefromthesimulationresults
(Fig.S3A).Theoreticalandsimulationresultsseemtoagreeevenwhen𝜎 " /𝑤 " ≤
1/2N,althoughweconsiderthisrangetobeoflesserbiologicalrelevance(cf.
SectionS4.4).Wetestedourpredictionsforthedistributionofvarianceacrosssites
intermsofG 𝑣 ∗ (Eq.S34),i.e.,theproportionofthevariancearisingfromsitesthat
contributemorethan𝑣 ∗ (Fig.S3B),andalsofoundthemtobehighlyaccurate.
14
10-5
1
0.5
10
10-2
10-3
10-2
10-1
Mutation rate (U)
Mutationrate(U)
E(S)=0.1
100
10-4
10-3
1
10-2
10-1
100
Mutation
rate (U)
Mutationrate(U)
E(S)=10
0.5
0
2
Totalgeneticvariance(!
Total
genetic variance (<2))
10-3
E(S)=10
E(S)=10
0
E(S)=50
E(S)=50
10
0
0
0.05
0.1
0.15
0.2 Xvs
0
1
2
3
4
5 xvs
Threshold
contribution to variance (v*)
Threshold
contribution to variance (v*)
Thresholdcontributiontovariance(v*)
Thresholdcontributiontovariance(v*)
0
10-2
10
-4
10
1
G(v*)
G(v*)
E(S)=0.1
E(S)=0.1
-1
G(v*)
G(v*)
G(v*)
G(v*)
(b)
10
2
Total
genetic variance (<2)
)
Totalgeneticvariance(!
2
Total
genetic variance (<2))
Totalgeneticvariance(!
(a)
-3
-2
-1
10
10
Mutation rate (U)
Mutationrate(U)
E(S)=50
10
0
0.5
0
0
1
2
3
4
5 xvs
Threshold
contribution to variance (v*)
Thresholdcontributiontovariance(v*)
n=1 n=3 n=10Data Theory
FigureS3.Testingtheoreticalpredictionsforgeneticvarianceagainstsimulations.
(a)Totalgeneticvariance(inunitsof𝑤 " )asafunctionofthemutationrate.The
rangeofrelevantmutationrates,1/2𝑁 ≪ 𝑈 ≪ 1,andtheexpectedrangeforthe
validityofourderivation,𝑤 " /2𝑁 ≪ 𝜎 " ≪ 𝑤 " ,aremarkedbyagreybox.(b)The
distributionofvarianceamongsites,for𝑈 = 0.01;G(𝑣 ∗ )istheproportionof
variancefromsitescontributingmorethan𝑣 ∗ (Eq.S34).Errorbarsrepresentone
standarddeviation.Foreachsetofparameters,thenumberofsimulationswas
chosentoobtainstandarddeviationsbelow10%.Inpractice,weoftenreached
standarddeviations≪ 10%,whichiswhyerrorbarsaremostlytoosmalltobe
visible.
4.
Justificationforassumptions
Here,wejustifytheassumptionsthatwerelieduponinderivingthefirsttwo
momentsofchangeinallelefrequency(cf.SectionS2.1;modelingassumptionsare
motivatedintheintroductiontothemaintext).Werelyinpartonself-consistency
arguments,whichshouldnotbemistakenforbeingcircular:specifically,wemake
assumptionsaboutthebehaviorofthesystemandshowthatthesolutiontowhich
wearrivesatisfiestheseassumptions.
15
4.1.
Normalandisotropicphenotypicdistributionaroundtheoptimum
Theassumptionthatthephenotypicdistributioniswellapproximatedbyanormal
distributionstemsfromanadditivemodelofquantitativetraits.Byassumingthat
thephenotypearisesfrommanyadditivecontributionsandthattheseadditive
contributionsarisefromsomeunderlyingdistribution,normalityfollowsfromthe
lawoflargenumbers.Intermsofmodelparameters,wewouldexpectnormalityto
holdiftherateofmutationsaffectingthetraitissufficientlylarge,i.e.,when2𝑁𝑈 ≫
1.
Wefurtherassumethatthephenotypicdistributionisisotropicanditsmeanisat
theoptimum.Isotropyofthephenotypicdistributionfollowsfromassuming
isotropyinthemutationalinput.InSectionS5.4,weexploretheconsequencesof
anisotropyinthemutationalinput.InSectionS4.4,wefurthershowthatthe
fluctuationsofthemeanphenotypearoundtheoptimumovertimehavenegligible
effectsonallelicdynamics;asimilarargumentappliestofluctuationsinthe
variance.
4.2.
Thephenotypicvariancesatisfies𝝈𝟐 ≪ 𝒘𝟐 Withthemeanphenotypecenteredattheoptimum,requiringthat𝜎 " ≪ 𝑤 " is
equivalenttoassumingthatmovingastandarddeviationawayfromthemean
phenotypeentailsonlyaminorreductioninfitness.Thisseemsplausibleformany
phenotypes:if,forexample,thisassumptiondidnotholdforhumanheight,then
individualswhoseheightisastandarddeviationormoreawayfromthepopulation
meanwouldsufferfromasubstantialreductioninfitness.Arguably,deviationsfrom
themeanheightwouldthenberecognizedasaverycommonandseveredisease.
Anotherlineofargumentthatitislikelythat𝜎 " ≪ 𝑤 " isbasedonourresults.Ifwe
assumethatmutationsarestronglyselected,thenourresultssuggestthat
𝜎 " = 2𝑁𝑈 ∙ 𝑣” =
s¯
O
𝑤 " .
(S43)
16
Itfollowsthatiftherateofmutationsaffectingthephenotypeunderconsideration
satisfies𝑈 ≪ 1then𝜎 " ≪ 𝑤 " .Thenumberofmutationsperdiploidhumangenome
pergenerationisestimatedtobe~60(5),andlessthan10%ofthegenomeis
assumedtobefunctional(6),suggestingthatthenumberofdenovomutationswith
anyeffectonfunctionislessthan3perhaploidpergeneration.Itthenseems
plausiblethatthe(haploid)mutationrateaffectingaspecifictraitsatisfies𝑈 ≪ 1.
AssumingthatmutationsareweaklyselectedincreasesthevarianceinEq.S43only
moderatelyandassumingthemutationsareeffectivelyneutralwouldsuggestitis
muchsmaller,leavingtheaboveargumentintact.
4.3.
Mutationaleffectsizessatisfy𝒂𝟐 ≪ 𝒘𝟐 Aswearguedintheintroductionofthemaintext,variantsforwhichthestronger
condition𝑎" ≪ 𝜎 " holdsaccountformostoralloftheheritabilityexplainedin
GWASformanytraits(e.g.(7-10)).Moreover,evidenceformanytraitssuggeststhat
thesameistrueforthevariantsthatunderlietheheritabilitythatremainstobe
explained(11-14).Indeed,forthisassumptiontobeviolated,muchofthegenetic
variancewouldhavetoarisefrommutationsthathaveaverylargeimpactonfitness
(i.e.,withsontheorderof1).Whilethismaybethecaseforsomediseases(e.g.,
autism(15)),itdoesnotappeartobethecaseformostphenotypesthathavebeen
examined.
4.4.
Deviationsofthemeanphenotypefromtheoptimumcanbeneglected
Inreality,themeanphenotypeofthepopulationfluctuatesaroundtheoptimum.
Here,wederiveequationsforthedynamicofthemeanphenotypeinorderto
estimatethemagnitudeandtimescaleofthesefluctuations.Wethenshowthatthese
fluctuationshaveanegligibleeffectonthefirsttwomomentsofchangeinallele
frequencyandthusontheresultsthatfollowfromthesemoments.
Webeginbyderivingthefirstandsecondmomentofchangeinmeanphenotype.To
thisend,weassumethedistributionofphenotypesisamultivariatenormal
centeredaroundameanphenotype,𝑟,i.e.that
17
𝑓 𝑟 =
0
"1e 5 4 5
𝑒𝑥𝑝 −
686 5
"e 5
. (S44)
Theexpectedchangeinmeanphenotypeduetoselectioninonegenerationis
therefore
E Δ𝑟 =
9
| 6 ² 6 6
9
| 6 ² 6
−𝑟 =−
e5
A 5 ?e 5
𝑟≈−
e5
A5
𝑟. (S45)
Bythesametoken,thevarianceinΔ𝑟issimplythesamplingvariance
V Δ𝑟 ≈
e5
v
,
(S46)
whereinbothcaseswereliedontheassumptionthat𝜎 " ≪ 𝑤 " .
ThesetwomomentsdefineanOrnstein-Uhlenbeckprocessin𝑟,allowingustorely
onwell-knownresults(16).Notably,whenthemeanphenotype𝑟startsfarfromthe
optimum,itdecaysexponentiallytotheoptimumwithexponent𝜎 " /𝑤 " (seeSection
S5.1below).Atsteadystate,𝑟willfluctuatewithmeanzeroand𝐸 𝑟 " = 𝑛 𝑤 " 2𝑁
overatimescaleof𝑤 " 𝜎 " generations.Thetypicaldisplacementof𝑟inanygiven
directionwillbe 𝑤 " 2𝑁,reflectingabalancebetweendriftandthepullof
selectiontowardtheoptimum.
Nextweshowthatthesefluctuationsofthemeanhavenegligibleeffectsonallelic
trajectories.Tothisend,wederivethefirsttwomomentsofchangeinallele
frequency,butthistime,weincludetheeffectofthedisplacementof𝑟fromthe
optimum.Whilethesecondmomentremainsthesame,thefirstmomentbecomes
E Δ𝑞 ≈ −
6⋅S
"A 5
𝑝𝑞 −
S5
sA 5
𝑝𝑞
0
"
−𝑞 =−
6´
y
A 5 /"v "v
𝑝𝑞 −
y
"v
𝑝𝑞
0
"
− 𝑞 ,(S47)
where𝑟S is𝑟’scomponentinthedirectionof𝑎.However,ouranalysisestablishes
that
6´
A 5 /"v
isascalaroftheorderof1,whichfluctuatesaroundzeroonatimescale
of𝑤 " 𝜎 " .Wecanthereforecomparethefirsttermintheaboveequation,which
representsdirectionalselection,andthesecondterm,whichrepresentsstabilizing
selection.Whenstabilizingselectionisstrong,𝑆 ≫ 1,thestabilizingselectionterm
18
dominatesoverthedirectionalselectionterm.Incontrast,whenselectionisweak,
i.e.,𝑆 ≈ 1orsmaller,theninanygivengeneration,thedirectionaltermisnot
necessarilynegligible.However,inthiscase,bothtermsaffectsubstantialchangein
allelefrequencyonlyoveratimescaleof2𝑁generations;onthistimescale,if2𝑁 ≫
𝑤 " 𝜎 " ,thedirectionaleffectwouldaveragetozero.Thedirectionaltermwill
becomeimportantonlywhen2𝑁 ≤ 𝑤 " 𝜎 " ,thatis𝜎 " ≤ 𝑤 " 2𝑁.For𝜎 " tobethat
small,virtuallyallallelesmusthave𝑆 ≪ 1,suchthattheirtrajectorieswillbe
determinedbydrift,notselection.Insummary,regardlessoftheselectionactingon
anallele,fluctuationsofthemeanphenotypearoundtheoptimumwillhavea
negligibleeffectonitstrajectory.
5.
Modelrobustness
Inthissection,weconsiderthesensitivityofourresultstorelaxingsomeofthe
simplifyingmodelingassumptionsaboutselectionandmutation.Specifically,we
showourresultstoberobusttomoderatechangestotheoptimalphenotype;small
asymmetryinthemutationalinput;thepresenceofmajorlocimaintainedathigh
frequencybyselectionontraitsthatarenotincludedinthemodel;aswellasto
mostformsofanisotropicmutation.
5.1.
Changestotheoptimalphenotype
Wefirstconsiderhowchangestotheoptimalphenotypeovertimewouldaffectour
results.ItiseasytoimaginehoweventssuchasmigrationfromAfricatoEuropeor
theonsetofagriculturemayhaveintroducedrapidchangesinbeneficial
phenotypes.Inordertoevaluatethepotentialimpactofsuchevents,weconsider
howaninstantaneouschangetotheoptimalphenotypewouldaffecttheallelic
dynamics.
Webeginbyconsideringhowsuchaninstantaneouschangetotheoptimumwould
affectthemeanphenotype.Iftheshifttotheoptimumissmall,ontheorderofthe
fluctuationsinthemeanphenotypeatsteadystateorsmaller,thenthearguments
providedinSectionS4.4willstillholdandtheshiftwouldhaveanegligiblysmall
effectonourresults.Wethereforeassumethattheshiftinoptimum,𝑧,islarge
19
comparedtothescaleoffluctuations(𝑧 " ≫ 𝑤 " /2𝑁).Thisassumptionmeansthat
wecanuseadeterministicapproximation(basedonEq.S45)anddescribethe
changeinmeanphenotypeinasinglegenerationby
Δ𝑟 ≈ E Δ𝑟 = −
e5
𝑟−𝑧 A5
(S48)
(neglectinghighermoments).Furtherassumingthatthemeanphenotypewasatthe
optimum,0,beforetheoptimumshifted(attime𝑡 = 0)andneglectingchangesto
thegeneticvariance𝜎,wefindthat
𝑟 𝑡 = 𝑧 1 − exp −
e5
A5
𝑡
(S49).
Thus,themean𝑟adaptstothenewoptimumonatimescaleof𝑤 " /𝜎 " generations
(cf.(17)forasimilarderivation).
Wecanrelyonthisapproximationtolearnwhenashiftinoptimumwillhave
negligibleeffectsonalleletrajectories.RecallingEq.S47,thefirstmomentofchange
inallelefrequencyisgivenby
E Δ𝑞 ≈ −
{ · 8¸ ⋅S
"A 5
𝑝𝑞 −
S5
sA 5
𝑝𝑞
0
"
− 𝑞 ,
(S50)
where,basedonourapproximation(Eq.S49),thedirectionalselectionterm
introducedbytheshiftinoptimumtakesthetime-dependentform
E Δ¹ 𝑞 = −
{ · 8¸ ⋅S
"A 5
𝑝𝑞 ≈
¸⋅S
e5
"A
A5
exp −
5
𝑡 𝑝𝑞.
(S51)
Theeffectofthisdirectionaltermovertheentireadaptivetrajectorycanbe
quantifiedbycomparingtheexpectedallelefrequencyafteradaptationtotheshift,
𝑞¹ ,withinitialfrequencybeforetheshift,𝑞n ,i.e.,
ln 𝑞¹ 𝑞n =
·
£ »¼ k
k(·)
=
¸⋅S
"A 5
·
𝑝 𝑡 exp −
e5
A5
𝑡 <
¸⋅S
"A 5
·
exp −
e5
A5
𝑡 =
¸⋅S
"e 5
.(S52)
Thisresultsuggeststhattherelativechangeinallelefrequencywillbenegligibleso
longas
20
¸⋅S
"e 5
≪ 1.
(S53)
Thisconditionsuggeststhatmutationswithsmallereffectswouldbelessaffectedby
theshiftintheoptimum.Itfurthersuggeststhatallelesthatsatisfy𝑎" ≪ 𝜎 " ,as
appearstobethecaseformostlocidiscoveredinGWAS(e.g.,(7-10)),willbe
negligiblyaffectedbyshiftsinoptimumontheorderofthetotalgeneticvariation
(i.e.,𝑧 ≤ 𝜎).Theseanalyticpredictionsareconfirmedbysimulations(Fig.S4).
G(v*)
G(v*)
1
z E(a 2 )
= 0
2σ 2
0.5
1
2.5
0.5
0
0
5
10
15 xvs
Thresholdcontributiontovariance(v*)
Threshold
contibution to variance (v*))
FigureS4.Distributionofthecontributionsofsitestovarianceafterashiftinthe
optimum.They-axisistheproportionofthevarianceexplainedbysitesthat
contributemorethan𝑣 ∗ tothevariance.Thedashedblackisthetheoretical
predictionwithoutadaptationandincolorsimulationresultswithadaptation.
Whentherootmeansquareof
¸⋅S
"e 5
becomeslargerthan1,directionalselection
substantiallyaffectsallelefrequenciesandthereforethecontributionsofsitesto
variance,aspredictedbyEq.S53.(Sincemutationissymmetricthemeanof
zeroandwequantifyitscharacteristicvaluebyitsrootmeansquare
¸ ¾ S5
"e 5
¸⋅S
"e 5
is
.)
Simulationswererunwithanexponentialdistributionofselectioncoefficientswith
E(𝑆) = 25,𝑁 = 1,000,𝑛 = 1,𝑈 = 0.01,andaburn-intimeof10,000generations.
Resultsweretaken50generationsaftertheshiftinoptimum,which,forthese
parameters,isjustafterthepopulationmeanhasreachedthenewoptimum.
21
5.2.
Asymmetricmutationalinput
Inthissection,weconsiderthesensitivityofourresultstoasymmetriesinthe
mutationalinput,i.e.,tothecaseinwhichmutationsinagivendirectionintrait
space,𝑏,aremorelikelytoarisethanmutationsintheoppositedirection,−𝑏(see
(18)fortreatmentofthisprobleminthelimitofhighper-sitemutationrate).
Anasymmetricmutationalinputintroducesashiftinthemeanphenotypeevery
generation.Withnewmutationsarisingatfrequency1/2𝑁,theexpectedshiftis
ΔÁ 𝑟 = 2𝑁𝑈EÁ 𝑎 ∙ 1 2𝑁 = 𝑈EÁ 𝑎 , (S54)
whereEÁ istheexpectationovernewlyarisingmutations.Foreachtrait,effects
haveacharacteristicsize E 𝑎"
𝑛= 𝑣” E(𝑆).Thecharacteristiceffectsizesets
thescaleforthemaximalshiftinanydirection,thatis ΔÂ 𝑟 isoftheorderof
𝑈 E 𝑎"
𝑛orsmaller.Wethereforeparameterizetheshiftinmeanphenotypedue
tonewmutationsby
ΔÁ 𝑟 = 𝑈EÁ 𝑎 = 𝑈 E 𝑎2
𝑛 𝑏, (S55)
wherethevector𝑏parameterizesthestrengthanddirectionofthebiasand𝑏 = 𝑏 isassumedtobe<<1.
Atsteadystate,themutationalshiftmustbeoffsetbyselection,suchthat
ΔÁ 𝑟 + ΔÃ 𝑟 + ΔX 𝑟 = 0,
(S56)
whereΔÃ 𝑟andΔX 𝑟aretheexpectedshiftsduetodirectionalandstabilizing
selection,respectively.Wepreviouslyfoundthattheexpecteddirectionalshiftis
Δ¹ 𝑟 = −
e5
A5
𝑟,
(S57)
where𝑟denotesthemeanphenotype(cf.Eq.S45).Asweshownext,when
mutationsarestronglyselected,stabilizingselectionoffsetsthemutationalshiftto
maintainthemeanphenotypeattheoptimum,implyingthatdirectionalselectionis
negligible.Incontrast,whenmutationsareeffectivelyneutral,stabilizingselectionis
22
negligibleandadirectionaltermmightnotbenegligiblebycomparison.However,
aslongasasymmetryissmall,𝑏 ≪ 1,weshowthatthisdirectionaltermisnotlarge
enoughtochangethealleledynamics,bothwhenallmutationsareeffectively
neutralandwhensomemutationsarestronglyselected.
First,weconsidertheshiftinmeanphenotypeduetostabilizingselection.Thisshift
arisesbecause,withasymmetricmutationalinput,thedistributionofphenotypes
becomesskewed.Therefore,evenifthemeanphenotypeisattheoptimum,
individualswithagivenfitnessmayhaveanasymmetricdistributionofphenotypes
aroundtheoptimum,leadingstabilizingselectiontochangethemeanphenotype.
Wehavealreadyshown(Eq.S15)thattheexpectedchangeinallelefrequenciesper
generationduetostabilizingselectionatanygivensiteiis
𝐸 Δ𝑞Ä = −
SÅ5
sA 5
𝑝Ä 𝑞Ä
0
"
− 𝑞Ä .
(S58)
Theexpectedchangeinmeanphenotypecanthenbecalculatedbyaddingupthe
contributionsoversites
SÅ5
Ä 𝑎Ä sA 5 𝑝Ä 𝑞Ä
ΔX 𝑟 = −E
0
"
− 𝑞Ä
.
(S59)
Theright-handsideofthisequationreflectstheskewnessofthephenotypic
distribution.Indeed,inonedimension,itcanbeshownthat
Δy 𝑟 = −
Æ] 6
"A 5
,
(S60)
withµÈ (𝑟)beingthethirdcentralmomentofthephenotypicdistribution.Inndimensions,foreverydirection𝑥,
Δy 𝑟€ = −
withµÈ 𝑟
ÊÉË
0
"A 5
𝐸 𝑟−𝑟
=E 𝑟−𝑟
Ê
€
𝑟−𝑟
𝑟−𝑟
É
"
=−
𝑟−𝑟
0
"A 5
Ë
É µÈ
𝑟
€ÉÉ ,
(S61)
.
Whensitesareunderstrongselection,ΔX 𝑟takesasimpleform.Assumingthe
asymmetryissmall,theshiftduetostabilizingselectioncanbeexpandedinorders
of𝑏.Theleadingterminthefrequencydistributiontakesthesameformasitdoes
23
withoutthebias.Forstronglyselectedalleleswithnobias,𝑞 ≪ 1andthereforethe
frequencydependenceinthistermcanbeapproximatedby𝑝𝑞
0
"
0
− 𝑞 ≈ 𝑞.
"
Moreover,qscaleswith1/a2,implyingthatthedistributionof𝑎" 𝑞isindependentof
𝑎andthatE 𝑎 " 𝑞 = 4𝑤 " /𝑁(cf.SectionS3.1).Therefore,whenallsitesarestrongly
selected,theleadingtermintheshiftduetostabilizingselectionis
SÅ5 kÅ
Ä 𝑎Ä sA 5 " Δny 𝑟 = −E
= −ΔÂ 𝑟. =−
¾ S5 k
•A 5
E
Ä 𝑎Ä =−
0
"v
2𝑁𝑈EÂ 𝑎 = −𝑈 𝑣” E 𝑆 𝑏
(S62)
Thus,toafirstorderin𝑏,theshiftofthemeanphenotypeduetostabilizing
selectionoffsetsthemutationalshift,implyingthattherewillbenodirectionalterm
andthatthealleledynamicswillnotbeaffectedbyasymmetry.
Whenallelesareinsteadeffectivelyneutral,then𝑎" /4𝑤 " ≪ 1/2𝑁(cf.SectionS2.2)
andallelefrequenciesarewellapproximatedbytheneutralsojourntime,𝜏 𝑞 ≈
2/𝑞.Theshiftduetostabilizingselectionthensatisfies
Δy 𝑟 = −E
Ä 𝑎Ä
SÅ5
0
sA
"
𝑝𝑞
5 Ä Ä
SÅ5
Ä 𝑎Ä sA 5
0
− E
Ì
≪−
− 𝑞Ä
0
"v
E
≈ −E 𝑝𝑞
Ä 𝑎Ä 0
"
−𝑞
= −ΔÂ 𝑟,
E
Ä 𝑎Ä
SÅ5
=
sA 5
(S63)
implyingthatitmakesanegligiblecontributiontooffsettingthemutationalshift.In
thiscase,themutationaleffectonthemeanphenotypeisthereforeoffsetby
directionalselection,where
Δ¹ 𝑟 = −
e5
A5
𝑟 ≈ −ΔÂ 𝑟,
(S64)
indicatingadisplacementofthemeanphenotypefromtheoptimum
𝑟=
A5
e5
Δ𝑀 𝑟. (S65)
Thisdisplacementintroducesadirectionalselectiontermintothefirstmomentof
changeinallelefrequencythat,iflargeenough,couldalteralleledynamics(cf.
SectionS4.4).
However,whenallallelesareeffectivelyneutral,wehave
24
A5
𝑟=
A5
"A 5
1
Δ𝑀 𝑟 = "v¯œ ¾(y) " 𝑈 𝑣𝑠 E(𝑆)𝑏 = 2𝑁
e5
𝑣𝑠 E(𝑆)
Î
𝑏,
(S66)
andthereforethescaleddirectionalselectioncoefficient,foranallelewitheffectsize
𝑎andscaledstabilizingselectioncoefficient𝑆 =
2𝑁
6⋅S
"A 5
=
Ï´ 𝑎
œÎ £(y)
~
(Ï / 𝑛)𝑎
œÎ £(y)
=
Ï œÎ y
œÎ £(y)
vS 5
"A 5
y
=
£ y
,isoftheorderof
𝑏, (S67)
with𝑏S ~𝑏/ 𝑛beingtheprojectionof𝑏inthedirectionof𝑎.Since𝑏 ≪ 1,forall
allelesotherthanthosewithunusuallylargeselectioncoefficients,thescaled
directionalselectioncoefficientwillbemuchsmallerthan1andthetrajectorieswill
stillbedeterminedbydriftandnotselection.Eveninthiscase,therefore,wedonot
expectasymmetrytoaffectalleledynamics.
Next,weconsiderthecasewherethereisamixofeffectivelyneutralandstrongly
selectedmutations.Theexistenceofstronglyselectedmutationsinadditionto
effectivelyneutralonesreducesthedeviationofthemeanphenotypefromthe
optimum.Denotingtheproportionofstronglyselectedmutationsby𝑝” ,wehave
𝑟=
𝑤2
𝜎2
𝑈 1 − 𝑝”
𝑣” E 𝑆 *.O. 𝑏,
(S68)
whereE 𝑆 z.Ž. ≤ 1isthemeanscaledstabilizingselectioncoefficientforeffectively
neutralmutations.Since𝜎 " > 2𝑁𝑈𝑝” 𝑣” ,wecanthenobtainanupperboundtothe
magnitudeofscaleddirectionalselectioncoefficientforanallelewitheffectsize𝑎
andscaledstabilizingselectioncoefficient𝑆 =
2𝑁
<
0
1
" 𝑈𝑝𝑠 𝑣𝑠
6⋅S
"A 5
= 2𝑁
𝑈 1 − 𝑝”
0 1
" 𝜎2
𝑈 1 − 𝑝”
vS 5
"A 5
:
𝑣” E 𝑆 *.O. 𝑏 ⋅ 𝑎 𝑣” 𝐸 𝑆 *.O. 𝑏 ⋅ 𝑎~
1−pÎ
2𝑝𝑠
E 𝑆 *.O. 𝑆𝑏. Withasubstantialproportionofstronglyselectedsites,
08pÑ
"pÎ
(S69)
isoftheorderof1,and
08𝑝
therefore "p 𝑠 E 𝑆𝑒.𝑛. 𝑏 ≪ 1.Thisconditionimpliesthatforeffectivelyneutral
Î
alleles(i.e.,𝑆 ≤ 1),thescaleddirectionalselectioncoefficientis≪ 1andallele
trajectorieswillbedeterminedbygeneticdrift,whereasforstronglyselectedalleles
25
(i.e.,when𝑆 ≫ 1),thescaleddirectionalselectioncoefficientis≪ 𝑆andtherefore
negligiblecomparedtothescaledstabilizingselectioncoefficient.
Weaklyselectedalleles(with𝑆~10)behavelargelylikestronglyselectedalleles
exceptthatstabilizingselectiononthemonlypartiallycancelsoutthemutational
bias(forexample,for𝑆 = 10only85%ofthebiasiscanceled).Therestofthebiasis
canceledbydirectionalselectionandthereforeinducesasmallshiftinthemean
phenotype.Itisstraightforwardtorepeattheargumentsgivenaboveandshowthat
theshiftinthemeanphenotypeforatraitwithonlyweaklyselectedallelesora
mixturethatincludesweaklyselectedallelesnegligiblyaffectsalleletrajectories.
Thus,weconcludethatsmallasymmetryinmutationwillnotaffecttheallelic
dynamic(seeFig.S5).
*
G(v*)
% heritability
from loci v>v *
1
0.8
0.6
0.4
0.2
0
0.6
0.4
0.4
0.6
0.80.8
Asymmetry
Asymmetry(b)
Bias (b)
b=
b=
0
0.05
0.1
0.2
0.35
0.5
0.7
0.9
1
1
0.5
0.5
00 00
00 00 0.2
0.20.2
0.4
0.6 0.8
0.8
0.8 11
0.2 0.4
0.4
0.4
0.6
0.6
0.8
Asymmetry
Asymmetry
(b)
Bias
Bias (b)
Asymmetry(b)
(b)
11
0
0.02 0.04 0.06 0.08
0.1 xvs
Threshold contribution to variance (v* )
Thresholdcontributiontovariance(v*)
0.8
0.8
0.6
0.6
0.4
0.4
0.2
0.2
Meanphenotype
phenotype
Mean
0.5
0.5
0.5 0.5
0.5
0.5
b=
b=
b=
00
0.05
0.05
0.1
0.1
0.2
0.2
0.35
0.35
0.5
0.5
0.7
0.7
0.9
0.9
11
Meanphenotype
phenotype
Mean
11
vvss
11
1.5
1.5
11
0.5
0.5 0.5
0.5
00 0
00 0 000
0
0
0.2 0.2 0.4
0.4
0.6
0.8
0.4
0.8
0.6
0.6 0.8
0.8
0.8 111 11 1 00 00 0.2
0.20.2
000 0.20.2
0.2
0.2 0.4
0.4 0.6
0.6
0.8
0.2
Asymmetry
(b)
Bias
Asymmetry
Asymmetry
Asymmetry(b)
Bias
Bias (b)(b)
1
111
(b)0 0
0.8
0.8
0.8
0.6
0.6
0.6
0.4
0.4
0.4
0.2
0.2
0.2
b=
b=
b=
b=
000
0.05
0.05
0.05
0.1
0.1
0.1
0.2
0.2
0.2
0.35
0.35
0.35
0.5
0.5
0.5
0.7
0.7
0.7
0.9
0.9
0.9
xvs
00
000
0.02
0.04
0.06
0.08
5
00
0.02
1
0.04
2
0.06
3
0.08
4
0.1
5 xvs
000
11
22
33
44
50.1
**
* **))
Threshold
contribution
to
variance
(v
Thresholdcontributiontovariance(v*)
Threshold
contribution
to
variance
(v
Thresholdcontributiontovariance(v*)
Threshold
contribution
to
variance
(v
)
Threshold contribution to variance (v )
Threshold contribution to variance (v )
0.8
0.6
0.4
0.4
0.6
0.4
0.6
0.4
0.6
0.4
0.6
Asymmetry
(b)
Bias
Asymmetry
Bias (b)
0.8
0.8
0.80.
b=
0
0.
0.
0.
0.
0.
0.
0.
0.2
0
FigureS5.Theeffectofasymmetricmutationalinputonmeanphenotypeandthe
contributionofsitestovariance.(a)Themeanphenotype𝑟,inunitsof 𝑣” ,asa
functionofmutationalbias𝑏.(b)Proportionofgeneticvarianceasafunctionofthe
thresholdcontributiontovariance𝑣 ∗ ,i.e.,G(𝑣 ∗ ),fordifferentbiasstrengths.
Simulationswererunwith𝑁 = 1,000,𝑛 = 1andwithdifferentmixturesof
effectivelyneutral(distributedaround𝑆 = 0.1)andstrong(distributedaround𝑆 =
50)selectioncoefficients.Asymmetrywassimulatedbyhavingmoretraitincreasing
10% strong
100%
strong
22
1.5
1.5
1.5
1.5
1.5
11 1
111
1.5
1.5
XX
22 2.5
2.5
*
(b)
0 0.2 0.2
222
1.5
1.5
1.5
11
0.5
0.5
0.5
0
2.5
22 22.5
2.5
2.5
2.5
0%strong
strong
100%
strong
100%strong
10%
% heritability from loci v>v*
1
0
s
ss
Mean
Meanphenotype
phenotype
Mean phenotype
Mean
phenotype
Mean
phenotype
Meanphenotype
Mean phenotype
Mean phenotype
Mean phenotype
1.5
*
0
2.5
2.5
2.5
(a) XXX vvv
10%strong
10%
0%
strong
strong
22
1.5
1.5
1
0.5
vvs s
22 2.5
2.5
2
1.5
XX
*
%
from
G(v*)
%heritability
heritability
fromloci
lociv>v
v>v***
%
heritability
from
loci
v>v
2.5
2.5
(a)
0%strong
0%
strong
2.5
Meanphenotype
Mean phenotype
Mean phenotype
2
vs
*
G(v*)
%
from
% heritability
heritability
from loci
loci v>v
v>v *
X
Meanphenotype
Mean
Mean phenotype
phenotype
2.5
(a)
26
0
1
2
3
4
Threshold contribution to variance
thantraitdecreasingmutations;if𝛽istheproportionoftraitincreasingmutations
thentheasymmetrycoefficientis𝑏 = 2𝛽 − 1.Asexpected,onlyforlargebiases
(when𝑏~1),aretheresubstantialchangesinthedistributionofthecontributionof
sitestovariance.Simulationswererunwitha10,000generationsburn-inperiod
withoutasymmetryandthen10,000generationswithasymmetryandaveraged
overmanyruns(>300),withthenumberofrunsvariedacrossplotskeeperrorsin
(b)below1%.
5.3.
Majoreffectloci
Inthissection,weshowthatourresultsareinsensitivetothepresenceofmajorloci,
i.e.,individuallocithatcontributesubstantiallytoquantitativegeneticvariation.We
haveinmind,forexample,lociwhoseallelesaremaintainedathighfrequenciesby
balancingselectiononaMendeliantraitbuthavepleiotropiceffectsonthe
quantitativetraitsunderconsideration(e.g.,MHCloci;(19,20)).Whilesuchloci
violateourassumptions,weshowthattheydonotaffectthedynamicsatotherloci
thatfulfillthem.
Tothisend,wecalculatethefirsttwomomentsofchangeinallelefrequencyinthe
presenceofamajorlocus.Wedenotethefrequencyandeffectsizeofthefocalallele
by𝑞and𝑎,andthefrequencyandeffectsizeofthemajoralleleby𝑞Â and𝑎Â ,
respectively.Asinourpreviousderivations(SectionS2),thedistributionof
backgroundphenotypiccontributionfromallotherloci,𝑅,iswellapproximatedby
thenormaldistribution
f 𝑅 𝑎Â , 𝑞Â , 𝑎, 𝑞 =
0
5)
"1(e 5 8eÓ
4/5 exp −
j?kS?kÓ SÓ
5)
"(e 5 8eÓ
5
, (S70)
where𝜎Â" isthecontributiontogeneticvariancefromthemajorlocus.The
populationmeanremainsclosetotheoptimumbecauseanyshiftcausedbythe
majorlocusisquicklycompensatedforbytheotherloci(seeSectionS4.4).Wethen
averageoverboththisdistributionandthethreegenotypesatthemajorlocusto
calculatethemeanfitnessassociatedwitheachgenotypeatthefocallocus.Namely,
27
𝑊nn = 1 − 𝑞Â
+2𝑞Â 1 − 𝑞Â
"
+𝑞Â
j
"
j
f 𝑅 𝑎Â , 𝑞Â , 𝑎, 𝑞 W 𝑅 + 𝑎Â 2 j
f 𝑅 𝑎Â , 𝑞Â , 𝑎, 𝑞 W 𝑅 f 𝑅 𝑎Â , 𝑞Â , 𝑎, 𝑞 W 𝑅 + 𝑎Â ,
(S71)
andsimilarlyfortheothergenotypes.Inthisway,weobtainthefirstmomentofthe
changeinallelefrequency
E Δ𝑞 = −𝑝𝑞
p qrr 8qrT ?k qrT 8qTT
q
≈−
S5
sA 5
𝑝𝑞 𝑞 −
0
"
, (S72)
whichisthesameaswederivedintheabsenceofamajorlocus(Eq.S15).Similarly,
wefindthesecondmomenttobeunaffected.
5.4.
Anisotropicmutation
Inthissection,weconsiderhowrelaxingtheassumptionthatthedistributionof
newlyarisingmutationsisisotropicintraitspacewouldaffectourresults.Asnoted,
wecanalwayschooseanorthonormalcoordinatesystemcenteredattheoptimum,
inwhichthetraitunderconsiderationvariesalongthefirstcoordinateandaunit
changeinothertraits(i.e.,inothercoordinates)neartheoptimumhavethesame
effectonfitness.Thereis,however,noobviousreasonforthedistributionofnewly
arisingmutationstobeisotropicinthiscoordinatesystem(see(21)for
generalizationsofFisher’sGeometricModelalongsimilarlines).
Anisotropyinmutationdoesnotaffectthemomentsofchangeinallelefrequency,as
thesedependonlyontheselectiononanalleleorequivalentlyonitseffectsizebut
notonitsdirectionintraitspace.Anisotropycouldaffectthedistributionofallelic
effectsizesonthefocaltraitconditionalontheselectionactingonthem.Here,we
provideheuristicargumentssuggestingthat,barringextremecases,wecandefine
aneffectivenumberoftraits𝑛* andaneffectivestrengthofselectionwe2forwhich
therelationshipbetweenselectionandeffectsizeinanisotropicmodelsiswell
approximatedbytherelationshipfoundforisotropicones(SectionS1.2).
28
Wefocusonafamilyofanisotropicmutationaldistributionsthatcanbedescribed
asaprojectionofamultivariatenormaldistributionontheunitsphereintrait
space.Namely,wedrawthesizeofamutation𝑎 = 𝑎 fromsomedistributionand
toobtainitsdirection,wedrawavector𝛼fromamulti-variatenormaldistribution
MVN(0, 𝜮)andnormalizeit,i.e.,
×
𝑎 = 𝑎 ,
×
(S73)
(S74)
andtherefore
𝑎0 = 𝑎
×T
×
.
Thisfamilyofmutationaldistributionsgivesusamathematicallytractable
frameworkwithwhichtoexaminethebehaviorofourmodelwithanisotropy.
Withanisotropy,thebehaviorofourmodelgreatlydependsontherelative
contributionofthefocaltraittoselection,whichweparameterizeby
𝛾0 ≡
£ ×T5
£ ×5
=
𝜮TT
Ù{ 𝜮
.
(S75)
Whenselectionactsmainlyonourfocaltrait,i.e.when𝛾0 ≈ 1,then|𝛼0 | ≈ 𝛼and
therefore𝑎0 ≈ ±𝑎.Sucharelationshipbetweenthestrengthofselectionandeffect
sizeiswellapproximatedbyanisotropicmodelwith𝑛* = 1.Wethereforefocuson
casesinwhichthereisasignificantpleiotropiccontributiontoselection,i.e.,𝛾0 is
substantiallylessthan1.Anisotropythenhastwoeffects:thefirstistointroduce
heterogeneityinthestrengthofselectionondifferenttraitsandthesecondisto
introducecorrelationsintheeffectsofamutationondifferenttraits,notably
betweenthefocaltraitandothers.
Wefirstconsiderthecaseinwhichthestrengthofselectiondiffersamongtraits,but
traitsareuncorrelated,correspondingtoadiagonalcovariancematrix,𝜮.When
manytraitshaveanon-negligiblecontributiontoselection,𝛼 " = 𝛼0" + 𝛼"" + ⋯ 𝛼O" wouldhaveasmallcoefficientofvariation,i.e.,CÜ" 𝛼 " = V 𝛼 " /E 𝛼 "
"
≪ 1,
becauseofthelawoflargenumbers.Inthiscase,
29
𝑎0 = 𝑎
=
×T
×
=𝑎
S
0/ÝT
×T
¾ ×5
×T
£ ×T5
,
1 + O CÜ" 𝛼 "
≈𝑎
Since𝛼0 / E 𝛼0" ~N(0,1)and𝑆 =
×T
£ ×5
v
"A 5
(S76)
𝑎" thisimpliesthat,conditionalonthe
selectioncoefficient𝑆,theeffectsizeonthefocaltraitwillbedistributedas
𝑎0 ~N 0,
"A 5
O3 v
𝑆 (S77)
with𝑛* = 1/𝛾0 .Thisisthesamerelationshipbetweenselectionandeffectsizeas
thehighpleiotropyisotropicmodelwith𝑛 = 𝑛* (Eq.8&10ofmaintext).Thisresult
suggeststheconceptofaneffectivenumberoftraits,whichcanbethoughtofasthe
numberoftraitsthathavethesameeffectonfitnessasthefocaloneandare
requiredtoproducethesamestrengthofselectiononalleles.Theeffectivenumber
oftraitsdescribesthedistributionofeffectsizesbothinthelimitofhighpleiotropy
𝑛* ≫ 1andlowpleiotropy𝑛* ≈ 1andsimulationsshowthatitdescribesthe
distribution,atleastqualitatively,alsoforintermediatevaluesof𝑛* (Fig.S6).
However,thereisanextremescenarioinwhichaneffectivenumberoftraitscannot
describethedistributionofeffectsizes.ThishappenswhenCÜ" 𝛼 " ≥ 1,thatiswhen
selectionactsmainlyonasmallnumberoftraitsbutourfocaltraitcontributesvery
littletoselection(𝛾0 ≪ 1).Inthiscase,wemightbetemptedtouse𝑛* = 1/𝛾0 ≫ 1
but,asEq.S76suggests,thehighpleiotropylimitwouldbeinadequate.Infact,the
varianceinselectiononnewly-arisingmutations(duetothecontributionofthe
selectedtraits)willresultinalong-taileddistributionofeffectsizesonthefocal
trait,whichisnotwell-approximatedbyanyisotropicmodel.Insummary,except
fortheseextremecases,isotropicmodelsprovideagoodapproximationforthe
relationshipbetweenselectionandeffectsize,evenwhenthereisheterogeneityin
thestrengthofselectionondifferenttraits.
Toillustratetheeffectofheterogeneityinthestrengthofselectionamongtraits,we
considerasimpleexampleinwhichallnon-focaltraitsmakethesamecontribution
toselectionandthereforecanbemodeledby
30
1 0
"
𝜮= 0 𝑐
0 0
⋮
0
0
𝑐"
⋯
,
(S78)
⋱
with𝑐 " beingtheratiobetweentheexpectedcontributionofanon-focaltraitto
selectionandtheexpectedcontributionofthefocaltraittoselection.Itiseasyto
seethatinthismodel𝑛* = 1/𝛾0 = 1 + (𝑛 − 1)𝑐 " .Numericalresultsforthismodel
areshowninFig.S6.
Largecontribution
offocaltrait
c2=0.4
CV2(!2)≈0.2
"1=0.2
Manytraits
n=11
Density
0.4
0.3
(b) 0.5
0.4
ne=5
Density
(a) 0.5
0.2
0
-4
-2
0
Effectsize
2
4×
0.2
(d) 0.5
CV2(!2)≈1
"1≈0.5
0.4
0.6
ne≈2
0.4
0.2
0
-4
0.3
0
-4
a2
n
Density
0.8
Density
Fewtraits
n=3
1
CV2(!2)≈0.2
"1≈0.02
ne≈50
0.1
0.1
(c)
Smallcontribution
offocaltrait
c2=5
-2
0
2
Effectsize
CV2(!2)≈1
"1≈0.1
4
×
a2
n
×
a2
n
ne≈10
0.3
0.2
0.1
-2
0
Effectsize
2
4
×
a2
n
0
-4
-2
0
2
Effectsize
4
Anisotropicmodel Effectiveisotropicmodel
FigureS6.Theeffectsofheterogeneityinthestrengthofselectionondifferenttraits
onthedistributionofeffectsizesinthefocaltrait.Numericalresultsformodelswith
thecorrelationmatrixdefinedinEq.S78areshowninblueandthecorresponding
isotropicmodelinblackdashes.Whentherearemanyselectedtraits,anisotropic
modelwith𝑛* = 1/𝛾0 = 1 + (𝑛 − 1)𝑐 " providesagoodapproximationofthe
distributionofeffectsizes,bothwhenthefocaltraitcontributessubstantiallyto
selection(a)andwhenitdoesnot(b).Whentherearefewtraits,anisotropicmodel
31
with𝑛* = 1/𝛾0 = 1 + (𝑛 − 1)𝑐 " providesagoodapproximationonlywhenthefocal
traitcontributessubstantiallytoselection(c&d).
Next,weconsiderthecaseinwhichtheeffectsizesondifferenttraitsarecorrelated,
i.e.,whenthecovariancematrix𝜮hasoff-diagonalterms.𝑎0" ∝ 𝛼0" /𝛼 " andtherefore
weparameterizetheeffectofthesetermsusingthecorrelationbetween𝛼0" and𝛼 " ,
𝜌" ≡ corr α" , 𝛼0" .Ifthecorrelationissmall,𝜌" ≪ 1,thenourpreviousreasoning
holds.Intheotherextreme,whenallselectedtraitsarehighlycorrelatedwiththe
focaltrait,i.e.𝜌" ≈ 1,thentheproportionalcontributionofthefocaltraitto
×T5
selectionisconstant,
×5
≈
£(×T5 )
£(× 5 )
= 𝛾0 ,andtheeffectsizeis𝑎0 = ± 𝛾0 𝑎.Thismodel
isthereforeequivalenttoanisotropiconewith𝑛* = 1and𝑤*" = 𝛾0 𝑤 " ;thelatter
changecorrespondstoincreasingthestrengthofselectiononthefocaltraitto
accountforselectionontheothertraitswhicharehighlycorrelatedwiththefocal
trait.Intermediatecasesaremorecomplex:whileeffectsizesarestilloftheorderof
𝛾0 𝑎,theshapeofthedistributionofeffectsizesisintermediatebetweenthesingle
traitandhighpleiotropylimits.Isotropicmodelswithaneffectivenumberoftraits,
𝑛* < 1/𝛾0 ,andincreasedselection𝑤*" = 𝛾0 𝑛* 𝑤 " canqualitativelydescribethese
casesbutmaynotcompletelycapturethedistributionofeffectsizes.Thevalueof𝑛* wouldchangefrom1when𝜌" → 1to1/𝛾0 when𝜌" → 0.Notethatwithalarge
numberoftraits,verystrongcorrelationsamongmanyofthetraitswillbe
necessaryinordertocreatealargeenough𝜌" tohaveasignificanteffecton𝑛* (see
Fig.S7).
Toillustratetheeffectofcorrelationsamongtraits,weconsiderthefollowingsimple
example(FigureS7).Weassumethecorrelationmatrix𝜮takestheform
1
"
𝜮= 𝑟
𝑟"
𝑟"
1
𝑟"
⋮
𝑟"
𝑟"
1
⋯
,(S79)
⋱
meaningthatthatalltraitscontributeequallytothefitnessandeverypairoftraits
hasthesamecorrelationcoefficient𝑟 " .When𝑟 " = 0thisbecomesanisotropic
32
model.When𝑟 " = 1,effectsizesarealwaysidenticalforeverytrait;thus,thiscaseis
equivalenttohavingonlyonetraitwithselectionthatisincreased𝑛-fold.
Intermediatecasescanbeapproximatedbyfindinganeffectivenumberoftraits
𝑛* < 1/𝛾0 = 𝑛,suchthatanisotropicmodelwith𝑛* and𝑤*" = 𝛾0 𝑛* 𝑤 " = 𝑤 " 𝑛* /𝑛
qualitativelydescribesthedistributionofeffectsizes.Numericalresultsofthis
modelareshowninFig.S7.
(a) 0.5
0.4
ne=n=50
0.3
Density
Density
0.4
(b)0.5
Mediumcorrelations
r2=0.5,!2≈0.5
0.2
0.1
0.2
-2
0
Effectsize
2
4
×
a2
n
0
-4
ne=1
1
0.5
0.1
0
-4
Highcorrelations
r2=0.99,!2≈0.99
1.5
ne=5
0.3
(c) 2
Density
Lowcorrelations
r2=0.2,!2≈0.25
-2
Anisotropicmodel
0
2
Effectsize
4×
a2
n
0
-4
-2
0
Effectsize
2
4×
a2
n
Effectiveisotropicmodel
FigureS7.Effectsofcorrelationsamongtraitsonthedistributionofeffectsizes.
NumericalresultsforourmodelwiththecorrelationmatrixdefinedinEq.S79and
𝑛 = 50traitsareshowninblueandthecorrespondingisotropicmodelinblack
dashes.(a)Whencorrelationsarelow,theisotropicmodelapproximatesthe
distributionofeffectsizeswell.(b)Withlargecorrelations,weneedtousean
effectivenumberoftraits,inthisexample𝑛* = 5,andrescaleselection,inthiscase
to𝑤*" = 𝑤 " 𝑛* /𝑛 = 𝑤 " /10,inordertoapproximatethedistributionofeffectsizes.
(c)Whenthecorrelationsapproach1,thedistributionofeffectsizesbecomes
singularandapproachesthedistributionforanisotropicmodelwith𝑛* = 1and
𝑤*" = 𝑤 " /𝑛 = 𝑤 " /50.
6.
ThepowertodetectlociinGWAS
Inthissection,wesummarizetheresultsthatwerelyoninconnectingour
theoreticalresultswiththeobservationsinGWAS(seeDiscussioninmaintext).
TheseresultsprovideafirstapproximationtothepowertodetectlociinGWASin
re-sequencingandgenotypingstudies.Theyneglectsomepotentialcomplications,
whichliebeyondthescopeofthisstudy(e.g.,syntheticassociations(22,23)).
33
6.1.
Re-sequencingstudies
First,weconsiderhowthepowertodetectalocusinaGWASdependsonits
contributiontogeneticvariance.Tothisend,wefollowShamandPurcell(22)in
assumingasimplifiedmodelforaGWASinwhichlociaredetectedusingalinear
regressionofthephenotypeagainstthegenotypeatindividualloci,andthe
dependenceofphenotypeongenotypefollowsanadditivemodel.Theslopeofthe
regression(theregressioncoefficient),whichisalsoourestimateoftheeffectsize,
𝑎0 isthenapproximatelynormallydistributedas
𝑎0 ~N 𝑎0 ,
2ä
,
å€ 08€ /"
(S80)
where𝑎0 isthetrueeffectsizeandxistheminorallelefrequencyatthelocus
(which,duetothelargestudysizes,weassumeisestimatedwithouterror),𝑉æ isthe
totalphenotypicvariance,and𝑚thestudysize(whichinrealitymaybeaneffective
sizereflectingstudydesign,e.g.,whenthesamplewassplitintodiscoveryand
validationpanels)(22).
Underthenullhypothesis,theeffectsizeis0,meaningthat
𝑎0 Žèéé ~N 0,
2ä
å€ 08€ /"
(S81)
andthereforetheestimatedcontributiontovariancehasachi-squareddistribution
œêëìì
2ä /å
=
T 5
S
€
5 T êëìì
08€
2ä /å
~𝜒0" .
(S82)
Thepowertoidentifyalocusassignificantwithp-value𝑝∗ istheprobabilitythatthe
estimatedcontributionofthelocustovariance,𝑣,islargeenoughthat
Pr 𝑣Žèéé > 𝑣 < 𝑝∗ . (S83)
Thisconditioncanbetranslatedintoathresholdcontributiontovariance𝑣 ∗ for
whichlociwith𝑣 > 𝑣 ∗ areconsideredsignificant,i.e.Pr 𝑣Žèéé > 𝑣 ∗ = 𝑝∗ ,with𝑣 ∗ givenby
34
œ∗
2ä /å
= 2 erf 80 1 − 𝑝∗
"
, (S84)
anderfdenotingtheerrorfunction.Thepowertoidentifyalocusassignificant
wouldthenbePr 𝑣 > 𝑣 ∗ ,andthedistribution𝑣givenby
œ
2ä /å
T 5
S €
=5
08€
2ä /å
~χ0"
œ
2ä /å
,
(S85)
with𝜒0" denotinganon-centralchi-squareddistributionwithonedegreeoffreedom.
Therefore,powerisgivenby
œ
H 𝑣, 𝑝∗ = Pr 𝑣 > 𝑣 ∗ = h?
withh± 𝑦, 𝑝∗ =
0
"
1 ± erf
2ä /å
, 𝑝∗ + h8
œ
2ä /å
, 𝑝 ∗ , (S86)
𝑦/2 ∓ erf 80 1 − 𝑝∗ andthetwotermscorrespond
totheestimatedandtrueeffectsizeshavingthesameoroppositesign.
Theformofthepowerfunctioncarriesimportantimplications(Eq.S86andFig.S8).
Notably,itshowsthat(inthisapproximation)powerdependsonlyonthe
contributionofalocustovariance,andthiscontributionshouldbemeasured
relativeto,orinunitsof,VP/m.Thisscalemakesintuitivesense,becausethetotal
phenotypicvariancegeneratesthebackgroundnoisefordetectinganyindividual
locus,andthebackgroundnoisedecreasesisinverseproportionaltothestudysize.
Inparticular,thethresholdcontributiontovariancev*,asdefinedabove,is
proportionalto𝑉æ 𝑚andisalsothecontributiontovarianceatwhichpoweris
50%,i.e.,
H 𝑣 ∗ , 𝑝∗ = 1 2.
(S87)
Thepowerfunctioncanthenbeapproximatedbyastepfunction(seeFig.S8)
H 𝑣 ≈ Θ 𝑣 − 𝑣∗ =
1 𝑣 > 𝑣∗
.
0 𝑣 < 𝑣∗
(S88)
35
Thiswillbeagoodapproximationwhenthenumberoflocithatfallatintermediate
range(e.g.,withpowerbetween0.1and0.9)isnegligiblecomparedtothenumber
thatfallsoutsidethisrange.
Power
Power
1.0
0.8
Exact
0.6
0.4
Stepfunction
approximation
0.2
0.0
1
Contributiontovariance
Contribution
to variance HvL
(inunitsofVP/m)
3
10
30
100 300 1000
FigureS8.Thepowertodetectlociasafunctionoftheircontributiontogenetic
variance(giveninunitsof𝑉æ /𝑚).Shownaretheexactpowerfunction(Eq.S86)and
itsstepfunctionapproximation(Eq.S88)for𝑝 = 5 ⋅ 108• .
Furtherinsightscomefromconsideringthispowerfunctioninconjunctionwithour
theoreticalresults(SectionS3).Notably,ourresultssuggestthatthefirstlocitobe
detected,thosethatcontributethemosttovariance,areweaklyandstrongly
selected,andthattheircontributionstovarianceareonthescaleofvs.Wetherefore
expectGWAStobegintoidentifyloci(andaccountforgeneticvariance)whenthe
studysizeissuchthat𝑣 ∗ ∝ 𝑉æ 𝑚isontheorderof𝑣” ,i.e.,when𝑚~ 𝑉æ 𝑣” .We
wouldfurtherexpecttherateofincreaseinidentifyingnewloci(andaccountingfor
thevariance)tobesimilarfordifferenttraitswhenthevarianceismeasuredin
unitsof𝑣” .
6.2.
Genotyping
MostcurrentGWASrelyongenotypinginsteadofre-sequencing,resultinginan
additionallossofpower(24).Specifically,thesestudiesimputetheallelesatloci
36
thatarenotincludedinthegenotypingplatform(25),andtheimputationbecomes
imprecisewhentheimputedallelesarerare(Fig.S9).Ifcausallociwithrarealleles
areincludedinGWAS,thisimprecisionleadstoanunder-estimationoftheireffect
size,resultinginalossofpower(24).ForlociwithMAFxandeffectsizea,the
expectedestimateoftheeffectsizewouldbereducedbyafactorof𝑟(𝑥),where
𝑟 " (𝑥)isthemeancorrelationbetweentheimputedandrealalleles(26),andthe
distributionofestimatescanbeapproximatedby
𝑎~N 𝑟𝑎,
2ä
å€ 08€ /"
.
(S89)
Employingthereasoningoftheprevioussubsection,wecanthereforeapproximate
thepowertodetectalocusbyH 𝑟 " 𝑣, 𝑝∗ ,whereHisthepowerfunctiondefinedin
Eq.S86.
1
0.9
r2
0.8
0.7
Cutoff
frequency
0.6
0.5
0.2
Minor allele frequency H%L
0.5
1
2
5
10 20
50
FigureS9.TheprecisionofimputationdecreaseswithMAF.Specificallyweshowthe
meancorrelationbetweenimputedandrealgenotypesasfunctionofminor allele frequency,forastudyusinganIllumina1MSNParrayandthe1000genomesphase
IIIasanimputationpanel(basedonExtendedFig.9Ain(27)).Weapproximatethe
effectonpowerbyexcludinglociwithMAF<1%andassumingthatlociwith
greaterMAFsareimputedcorrectly.
Inpractice,GWASoftenincludeonlylociwithMAFaboveathreshold,whichis
chosentoensurepreciseimputation.Wethereforeapproximatetheeffectof
37
genotypingonpowerbyexcludinglocibelowathresholdMAFandassumethatloci
thatexceedthisthresholdareimputedcorrectly.
7.
Inference
Inthissection,wedescribehowweusedourmodeltomakeinferencesbasedon
GWASresultsforheightandbodymassindex(BMI).AswenoteintheDiscussion,
theseinferencesaremeantasanillustrationanddonotincorporatetheeffectsof
demographyandafewotherfactors(e.g.,genotypinganderrorsintheestimationof
effectsizes(22,24)),whichliebeyondthescopeofthisstudy.
7.1.
Thecompositelikelihood
Ourinferencesarebasedonacomposite-likelihoodapproach.Webeginby
describingthecomposite-likelihoodfunctionanditsmaximization,whentheloci
detectedbyGWASarestronglyselectedandcanbedescribedbythehigh-pleiotropy
limit.Inthiscase,wehaveshownthatthedistributionofvarianceamonglociis
insensitivetothedistributionofselectioncoefficients,dependsonasingle
parameter𝑣” ,andiswellapproximatedbytheprobabilitydensity
𝜌 𝑣 =
" zôõ 8" œ/œÎ œ
(S90)
(SectionS3.2).FurtherapproximatingthepowerinGWASasastepfunction(cf.
SectionS6),wefindthattheprobabilitydensityofsitesthatexceedathreshold𝑣 ∗ canbeapproximatedby
f 𝑣 𝑣” , 𝑣 ∗ =
whereI 𝑥 ≡
·¡€
¤ œ
¥¦¥∗
¤ œ
=
zôõ(8" œ/œÎ )
"œö(" œ ∗ /œÎ )
,
(S91)
exp(−𝑡)/𝑡(cf.Eq.S42).Wethereforeapproximatethelog-
composite-likelihoodof𝑣” giventhecontributionstovarianceoftheKlocidetected
inaGWAS, 𝑣Ä
ù
Äø0 ,by
LCL 𝑣” 𝑣Ä
=− 2
𝑣”
ù
Äø0
ù
∗
Äø0 , 𝑣
=
ù
Äø0 log
f 𝑣Ä 𝑣”
𝑣Ä − 𝐾log I 2 𝑣 ∗ 𝑣”
−
= ù
Äø0 log
𝑣Ä . (S92)
38
Itfollowsthatthecomposite-likelihoodismaximizedwhen
𝑣” + log I 2 𝑣 ∗ 𝑣”
𝑣” = argminœÎ 2 𝑣
where 𝑣 ≡
0
ù
Äø0
ù
,
(S93)
𝑣Ä .
Wealsoconsiderthemodelswithoutpleiotropyandinwhichthedegreeof
pleiotropyisaparameter.Inthecasewithoutpleiotropy,
𝜌 𝑣 =
" zôõ 8"œ/œÎ
œ
(S94)
(cf.SectionS3.2).Byfollowingthesamesteps,wefindthatthecomposite-likelihood
isthenmaximizedwhen
𝑣” = argminœÎ 2𝑣 𝑣” + log I 2 𝑣 ∗ 𝑣”
where𝑣 ≡
0
, (S95)
ù
Äø0 𝑣Ä .Whenthedegreeofpleiotropy𝑛isaparameterofthemodel,
ù
wefoundthat
𝜌O 𝑣 =
"
exp
ST œ
−
"œ/œÎ
ST5 /(S 5 /O)
φO 𝑎0 𝑎 (S96)
(cf.SectionS6).Again,followingthesamesteps,wefindthattheprobabilitydensity
ofsitesthatexceedathreshold𝑣 ∗ is
fŽ 𝑣 𝑣” =
¤4 œ
¤
¥¦¥∗ 4
œ
(S97)
andthelog-composite-likelihoodis
LCL 𝑣” , 𝑛 𝑣Ä
ù
∗
Äø0 , 𝑣
=
ù
Äø0 log
𝜌O 𝑣Ä
− 𝐾 log
œ¡œ ∗
𝜌O 𝑣 . (S98)
Inthelattercase,weusednumericalmaximizationtoshowthatthecompositelikelihoodestimatesforheightandBMIconvergetothehighpleiotropylimit.
Specifically,wemaximizedthecomposite-likelihoodspecifyinganintervalof
[1,1000]forn,whereforbothtraitstheestimatesconvergedtotheupperlimitof
1000.Whilenumericaloptimizationdoesnotallowustospecifyaninfiniteinterval,
thelikelihoodfunctionandmaximalvalueforn=1000areindistinguishablefrom
thoseinthehigh-pleiotropylimit.
39
7.2.
Determining𝒗∗ andremovingoutliers
Ourlikelihoodmaximizationrequiresustospecifythevalueofthethreshold𝑣 ∗ .We
choosethisthresholdbasedontheempiricaldistributionsofthecontributionsto
varianceamonggenome-widesignificantassociations(Fig.S10AandB).Specifically,
whenthecontributionstovarianceapproachthelowerboundaryfordiscovery,we
observeadeclineinthedensityofloci.Thisislikelyduetoagradualreductionin
powerandsuggeststhatourapproximationforpower(asastepfunction)breaks
downforthesevaluesofvariance.Wethereforechoosethresholdsthatappeartobe
abovethisdecline(𝑣 ∗ = 1.4 ⋅ 108s 𝑉æ forheightand𝑣 ∗ = 1.35 ⋅ 108s 𝑉æ forBMI;Fig.
S10AandB),resultingintheremovalof53lociforheightand11forBMI.Wealso
examinehowourestimatesof𝑣” dependonthechoiceof𝑣 ∗ ,andfindthattheyare
muchmoresensitivetoreducingthethresholdthantoincreasingit;infact,the
estimatesweobtainbyincreasingthethresholdarewithintheconfidenceintervals
oftheestimatewiththechosenthresholds(Fig.S10CandD).Thisanalysisfurther
supportsourchoicetoexcludethelociwiththelowestcontributiontovariance.For
BMI,wealsodroppedthelocuswiththelargestcontributiontovariance(nearFTO),
whichappearstobeanoutlier(Fig.S10B)andhasbeensuggestedtobeunder
balancingselection(28).
40
10
(a)
21
20
19
18
17
1
5
1.4
1.6
1.8
2
0
10
20
30
Contribution to variance (v)
4
3
2
2
1
Estimatedvs+95%CI
1
1.2
1.4
1.6
1.8
2
Cutoff
contribution
to
variance
(
)
Thresholdcontributiontovariance
3
(b)
2.5
2
0.8
1
1.2
1.4
1.6
1.8
Contribution to variance (v)
nearFTO
0
0
10
20
30
Contribution to variance (v)
(d)
di Estim
ffe
re ated
nt
th vs re fo
sh r
old
s
3
1
0
3
4
(c)
E
di stim
ffe
re ated
nt
th vs re for
sh
old s
Estimated
0
1.2
Contribution to variance (v)
% variance from loci > v
15
BMI
22
% variance from loci > v
% variance from loci > v
20
Estimated
Estimatedvs
(inunitsof10-4×VP)
% variance from loci > v
%variancefromloci>v
Height
2
Estimatedvs+95%CI
1
0
0.8
1
1.2
1.4
1.6
1.8
Cutoff
contribution
to
variance
(
)
Thresholdcontributiontovariance
(inunitsof10-4×VP)
(inunitsof10-4×VP)
FigureS10.Determining𝑣 ∗ andremovingoutliers.Thetotalvariancefrom
significantassociationsasafunctionofthethresholdcontributiontovariance,for
height(a)andBMI(b).Theinsetsshowacloseupofthelowerrangeof
contributionstovariance,highlightingthedeclineinthedensityofdiscoveredloci.
Ourchosenthresholdsareshownbythedashedverticalline(inallgraphs).Our
estimatesof𝑣” asafunctionofthechosenthreshold,forheight(c)andBMI(d).
Whenweincreasethethreshold,theestimatesremainwithinthe95%CIofthe
estimatewithourchosenthreshold.
7.3.
Estimatingtargetsizeandexplainedvariance
Weestimatethetargetsizeandthevarianceexplained,bothforvaryingstudysize
andtotal,basedonourestimatesof𝑣” .Thepopulation-scaledmutationalinputper
generationfromstronglyselectedloci,2𝑁𝑈” ,isestimatedby
2𝑁𝑈” = 𝐾/
œ¡œ ∗
𝜌 𝑣 𝑣” ,
(S99)
(cf.Eq.S41)andthecorrespondingestimateforthetargetsizeis
41
𝐿” = 2𝑁𝑈” 2𝑁𝑢,
(S100)
wheretheestimateforthepopulationscaledmutationratepersitepergeneration
2𝑁𝑢 ≈ 0.5isbasedonheterozygosity(27).Theexplainedvariancecorrespondingto
GWASwithstudysize𝑚isestimatedby
𝜎”" 𝑚 = 2𝑁𝑈”
œ¡œ ∗ å
𝑣𝜌 𝑣 𝑣” = 𝐾
œ¡œ ∗ å
𝑣𝜌 𝑣 𝑣”
œ¡œ ∗ å
𝜌 𝑣 𝑣” ,(S101)
whereweapproximatethethresholdcorrespondingtostudysize𝑚basedonthe
studysize,𝑚n ,andthreshold,𝑣 ∗ ,incurrentGWAS,by
𝑣 ∗ 𝑚 = 𝑣 ∗ ⋅ 𝑚n 𝑚 .
(S102)
Toestimatethetotalvariancearisingfromstronglyselectedloci,wesimplysetthe
thresholdinEq.(S101)to0.
7.4.
Estimatingconfidenceintervals
Weuseacombinationofnon-parametricandparametricbootstraptoestimate
confidenceintervals(CI).Weusenon-parametricbootstraptoestimatetheCIfor
themodelparameters𝑣” and𝐿” :specifically,weperform10,000iterations,inwhich
weresamplethelociidentifiedbyGWASandrepeattheestimationof𝑣” .Weuse
parametricbootstraptoestimatetheconfidenceintervalsinFig.5A,describingthe
explainedvarianceasafunctionofthresholdbasedonourmodel.Tothatend,we
relyonourmodelwiththepointestimatesfor𝑣” and𝐿y ,togenerate10,000samples
fromGWASwiththespecifiedthreshold,andthencalculatethetotalvariance
explainedbythesesamples.Weuseacombinationofnon-parametricand
parametricbootstraptocalculatetheCIformodelpredictions,includingthetotal
variance,𝜎y" ,andtheexplainedvariance,𝜎y" 𝑚 ,andnumberoflociasafunctionof
studysize(Figs.5BandS16).Inthiscase,wegenerate10,000samplesby:i)
estimating𝑣” basedonaresampledsetofGWASloci(similartothenon-parametric
procedure),andii)usingtheestimated𝑣” andcorresponding𝐿y togenerateaGWAS
samplebasedonourmodel(similartotheparametricprocedure);wethencalculate
theappropriatesummarybasedonthelattersamples.Thistwostageprocedureis
intendedtocapturetheuncertaintygeneratedbyboththeerrorsinestimatingour
42
basicmodelparametersandthenoisegeneratedbythestochasticprocesses
underlyingthenumberandvarianceatsegregatinglocithatareyettobe
discovered.TheresultingestimatesandCIaresummarizedinTableS2.
Contributionto
varianceperstrongly
selectedlocus(inunits
ofthetotalphenotypic
variance)
Expectedstudysize
requiredtodescribe
50%ofthestrongly
selectedvariance
𝑣” /𝑉æ 𝑚%n% ≈
43𝑉æ /𝑣” Height
BMI
1.8[1.5,2.3]´10-4
1[0.6,1.7]´10-4
230[190,290]K
420[250,770]K
Numberofnewly
arisingstrongly
selectedmutationsper
generationinthe
population
2𝑁𝑈” 2300[1800,
3000]
600[300,1900]
Mutationaltargetsize
forstronglyselected
mutations
𝐿” 4.6[3.6,6.0]Mbp
1.3[0.6,3.8]Mbp
%contributionto
phenotypicvariance
fromstronglyselected
loci
𝜎”" /𝑉æ 42[39,45]%
7[5,10]%
Proportionof
heritabilityfrom
stronglyselectedloci
𝜎”" /𝑉&
= 𝜎”" /ℎ" 𝑉æ 53[49,57]%
13[10,21]%
TableS2.ParameterestimatesandtheirconfidenceintervalsforheightandBMI
basedonGWASresults;theheritabilitywasassumedtobe0.8forheightand0.5for
BMI(8,10).
43
7.5.
Testinggoodnessoffit
WeusetheKolmogorov-SmirnovD-statistic(29,30)totestthegoodnessoffitof
ourmodelswithoutpleiotropyandinthehighpleiotropylimit.Sinceourparameter
estimatesareinferredfromthedatathatwearetestingagainst,wecannotrelyon
thestandardtablesforthep-values.Wethereforegeneratenulldistributionsforthe
D-statisticusingparametricbootstrapbasedonourmodels.Specifically:i)we
generate10% samplesofKsignificantlocibasedonthemodelunderconsideration,
withthecorrespondingestimateof𝑣” ,ii)weinfer𝑣” basedoneachsample,andiii)
calculatetheKolmogorov-SmirnovD-statisticbetweenthedistributionofvariance
fortheKlociineachsampleandthecorrespondingtheoreticaldistributionbased
onthe𝑣” inferredfromthatsample.TheresultingdistributionofD-statistics
correspondstoournullhypothesis,i.e.,thatthelocidetectedinGWASarose
accordingtoourmodel,andspecificallytothewaywecalculatetheD-statistic
betweentheobserveddistributionofvariancefortheKdetectedlociandthe
theoreticaldistributionthatweinferredbasedontheseobservations.Wethen
calculatetheD-statistic,𝐷6 ,basedontherealdataandcorrespondingtheoretical
distribution,andestimatetheone-sidedp-valueby
𝑝ù8y =
#*+Ièé,Ùz--,Ù,*zÙ*.+Ù/¹¡¹9 #*+Ièé,Ùz--,Ù,*zÙ*
.
(S103)
Notethatunlikethecommoncase,heretheinabilitytorejectthenullindicatesthat
thedataisconsistentwithourmodel.
44
Withoutpleiotropy
No pleiotropy
Contributiontovariance
(inunitsof10-4VP)
Height
15
p̂K−S < 10 −5
p̂K−S =0.14
10
10
5
5
Contributiontovariance
(inunitsof10-4VP)
0
0
15
BMI
Highpleiotropy
High pleiotropy
15
5
10
No pleiotropy
15
0
0
15
10
10
5
5
0
5
15
p̂K−S =0.54
p̂K−S =0.05
0
5
10
High pleiotropy
10
Contributiontovariance
(inunitsof10-4×VP)
15
0
0
5
10
Contributiontovariance
(inunitsof10-4×VP)
15
FigureS11.Q-Qplotscomparingthedistributionofvarianceamongsignificantloci
takenfromtheGWASofheight(10)andBMI(8)withthetheoreticaldistributions
inferredfromthesedata,basedonthemodelswithoutpleiotropy(a)andinthehigh
pleiotropylimit(b).Theseplotsshowthatthemodelassuminghighpleiotropy
cannotberejectedforeithertraitandfitsthesedatamuchbetterthanthemodel
assumingno-pleiotropy.
45
8.
Glossaryofnotation
𝑟
W(𝑟)
Scaleofselection
𝑛
Numberoftraits(dimensions)
𝑎
n-dimensionaleffectsize
𝑎0 Effectsizeonfocaltrait
𝑈
Haploidmutationratepergeneration
𝜎 "
Phenotypicvarianceinatrait
𝐾
Numberofsegregatingsites
η(𝑎0 |𝑆)
Distributionoffocaltraiteffectsizesconditionalonoveralleffectsize
Distributionoffocaltraiteffectsizesconditionalonthescaledselection
coefficient
𝑁a"
2w "
Scaledselectioncoefficient
𝑞
Derivedallelefrequency
𝑝
Ancestralallelefrequency,𝑝 = 1 − 𝑞
𝑆=
τ(𝑞|𝑆)
Fitness
𝑤
φŽ (𝑎0 |𝑎)
Phenotype
Thesojourntimeforamutationwithscaledselectioncoefficient𝑆
0
𝑣
Contributiontovariancefromasite 𝑣 = 𝑎0" 𝑞 1 − 𝑞 𝑣” Expectedcontributionofastronglyselectedsitetovariance(𝑣” =
"
"A 5
Ov
).
E(𝑉|𝑆)
Expectedcontributiontogeneticvariancefromsitesunderselection𝑆
E(𝐾|𝑆)
Expectedcontributiontothenumberofsegregatingsitesfromsites
underselection𝑆
𝜌(𝑣)
Densityofsegregatingsitescontributingvariance𝑣
G(𝑣 ∗ )
Fractionofvarianceexplainedbysitescontributingmorethan𝑣 ∗ tothe
variance
𝑚
GWASstudysize
H
PowertoidentifyalocusinGWAS
46
9.
Bibliography
1.
LandeR(1975)Themaintenanceofgeneticvariabilitybymutationinapolygenic
characterwithlinkedloci.GenetRes26(3):221-235.
PoonA&OttoSP(2000)Compensatingforourloadofmutations:Freezingthe
meltdownofsmallpopulations.Evolution54(5):1467-1479.
EwensWJ(2004)Mathematicalpopulationgenetics1:I.Theoreticalintroduction
(SpringerScience&BusinessMedia).
KeightleyPD&HillWG(1988)Quantitativegeneticvariabilitymaintainedby
mutation-stabilizingselectionbalanceinfinitepopulations.GenetRes52(01):33-43.
KongA,etal.(2012)Rateofdenovomutationsandtheimportanceoffather'sageto
diseaserisk.Nature488(7412):471-475.
WardLD&KellisM(2012)Evidenceofabundantpurifyingselectioninhumansfor
recentlyacquiredregulatoryfunctions.Science337(6102):1675-1678.
PerryJRB,etal.(2014)Parent-of-origin-specificallelicassociationsamong106
genomiclociforageatmenarche.Nature514(7520):92-97.
LockeAE,etal.(2015)Geneticstudiesofbodymassindexyieldnewinsightsfor
obesitybiology.Nature518:197-206.
ScottRA,etal.(2012)Large-scaleassociationanalysesidentifynewlociinfluencing
glycemictraitsandprovideinsightintotheunderlyingbiologicalpathways.Nat
Genet44(9):991-1005.
WoodAR,etal.(2014)Definingtheroleofcommonvariationinthegenomicand
biologicalarchitectureofadulthumanheight.NatGenet46(11):1173–1186.
SchorkNJ,MurraySS,FrazerKA,&TopolEJ(2009)Commonvs.rareallele
hypothesesforcomplexdiseases.CurrOpinGenetDev19(3):212-219.
HuntKA,etal.(2013)Negligibleimpactofrareautoimmune-locuscoding-region
variantsonmissingheritability.Nature498(7453):232-235.
DoR,etal.(2015)Exomesequencingidentifiesrareldlrandapoa5alleles
conferringriskformyocardialinfarction.Nature518(7537):102-106.
PurcellSM,etal.(2014)Apolygenicburdenofraredisruptivemutationsin
schizophrenia.Nature506(7487):185-190.
ChenJA,PeñagarikanoO,BelgardTG,SwarupV,&GeschwindDH(2015)The
emergingpictureofautismspectrumdisorder:Geneticsandpathology.AnnuRev
PatholMechDis10:111-144.
UhlenbeckGE&OrnsteinLS(1930)Onthetheoryofthebrownianmotion.PhysRev
36(5):823.
LandeR(1976)Naturalselectionandrandomgeneticdriftinphenotypicevolution.
Evolution30(2):314-334.
CharlesworthB(2013)Stabilizingselection,purifyingselection,andmutationalbias
infinitepopulations.Genetics194(4):955-971.
LefflerEM,etal.(2013)Multipleinstancesofancientbalancingselectionshared
betweenhumansandchimpanzees.Science339(6127):1578-1582.
DennyJC,etal.(2013)Systematiccomparisonofphenome-wideassociationstudyof
electronicmedicalrecorddataandgenome-wideassociationstudydata.NatBiotech
31(12):1102-1111.
MartinG&LenormandT(2006)Ageneralmultivariateextensionoffisher's
geometricalmodelandthedistributionofmutationfitnesseffectsacrossspecies.
Evolution60(5):893-907.
ShamPC&PurcellSM(2014)Statisticalpowerandsignificancetestinginlargescalegeneticstudies.NatRevGenet15(5):335-346.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
47
23.
24.
25.
26.
27.
28.
29.
30.
DicksonSP,WangK,KrantzI,HakonarsonH,&GoldsteinDB(2010)Rarevariants
createsyntheticgenome-wideassociations.PLoSBiol8(1):e1000294.
EvangelouE&IoannidisJPA(2013)Meta-analysismethodsforgenome-wide
associationstudiesandbeyond.NatRevGenet14(6):379-389.
VisscherPM,BrownMA,McCarthyMI,&YangJ(2012)Fiveyearsofgwasdiscovery.
AmJHumGenet90:7-24.
PritchardJK&PrzeworskiM(2001)Linkagedisequilibriuminhumans:Modelsand
data.AmJHumGenet69(1):1-14.
The1000GenomesProjectConsortium(2015)Aglobalreferenceforhumangenetic
variation.Nature526(7571):68-74.
LiuX,etal.(2015)Signaturesofnaturalselectionatthefto(fatmassandobesity
associated)locusinhumanpopulations.PLoSOne10(2):e0117093.
GenestC&RemillardB(2008)Validityoftheparametricbootstrapforgoodness-offittestinginsemiparametricmodels.AnnInstHPoincareProbabStatist44(6):10961127.
StuteW,ManteigaWG,&QuindimilMP(1993)Bootstrapbasedgoodness-of-fittests.Metrika40(1):243-256.
48
10.
Additionalfigures
Density of variance
8
4
2
0
0.0
Strongselection(S=100)
Weakselection(S=10)
Effectivelyneutral(S=0.5)
6
0.1
0.2
0.3
0.4
Minor allele frequency
0.5
FigureS12.Thedistributionofvarianceamongsitesasafunctionofminorallele
frequency.Thisdensityisindependentoftheextentofpleiotropybecause,the
expectedvariancecorrespondingtoagivenselectioncoefficientandfrequency
dependsonlyontheexpectedsquaredeffectsize,whichisnotaffectedbythe
degreeofpleiotropy.
49
400
withoutpleiotropy
(a)
300
Number of loci > v
Number of loci > v
500
Strongselection(S=100)
Weakselection(S=10)
Effectivelyneutral(S=0.5)
200
100
0
0
Contribution to genetic variation HvL
1
2
3
4
5
X
vs
500
400
withpleiotropy
(b)
300
Strongselection(S=100)
Weakselection(S=10)
Effectivelyneutral(S=0.5)
200
100
0
0
Contribution to genetic variation HvL
1
2
3
4
5
X
vs
FigureS13.ThenumberoflociperMbcontributingmorethan𝑣tothevariance,asa
functionof𝑣.Calculatedforaone-dimensionalmodelwithoutpleiotropy(a)andfor
thehighpleiotropylimit(b).ThisfigurecorrespondstoFig.2butshowsthenumber
oflocidiscoveredperMbinsteadoftheproportionoftheheritablevariance
explained.Calculatedforaconstantpopulationsizeof20,000,withamutationrate
of1.2×108• (5).
50
Effectivelyneutral(S=0.5)
60
40
20
0
Study size Hin units of VP êvs L
10
20
50 100 200
500 1000
% heritability explained
80
(a)Strongselection(S=100)
Intermediateselection(S=10)
30
(b)
Re-sequencing
20
g
pin
oty
Gen
% heritability explained
100
10
0
1
Scaled selection coefficient HSL
3
10
30
100
300
FigureS14.TheoreticalpredictionsforGWASofcontinuoustraits,inthecase
withoutpleiotropy(𝑛 = 1).ThisfigureisequivalenttoFig.3fromthemaintext,
whichdescribesthecasewithpleiotropy(𝑛 ≫ 1).(a)Proportionofheritability
explainedasafunctionofstudysize.(b)Proportionofheritabilityexplainedasa
functionofscaledselectioncoefficientforstudiesbasedonre-sequencingand
genotyping.Calculatedforastudysizeof~43𝑉æ /𝑣” ,correspondingtohaving25%
ofstronglyselected(S>>1)varianceexplainedwithre-sequencing.Note,thatthis
studysizeis~2.5timeslargerthantheonerequiredtoexplain25%ofstrongly
selectedvariancewithpleiotropy(seeFig.3).
51
Number of loci
200
Strongselection(S=100)
Weakselection(S=10)
Effectivelyneutral(S=0.5)
150
100
50
0
1
Study size Hin units of VPêvs L
2
5
10 20
50 100 200
FigureS15.PredictionofthenumberperMboflociassociatedwithacontinuous
traitinGWASasafunctionofstudysize,inthelimitofhighpleiotropy.Calculated
foraconstantpopulationsizeof20,000,withamutationrateof1.2×108• (5).
52
Number of variants
104
Height
103
10
BMI
2
Data
101
10
Model
+95%CI
0
0
200
400
600
800
Study size (in thousands)
1000
FigureS16.ModelpredictionsforthenumberofdiscoveredvariantsinfutureGWAS
forheightandBMI.Thecorrespondingpredictionsforthe%heritabilityexplained
appearinFig.4.
53