The Future of Genetic Studies of Complex Human Diseases

The Future of Genetic Studies of Complex Human Diseases
Author(s): Neil Risch and Kathleen Merikangas
Reviewed work(s):
Source: Science, New Series, Vol. 273, No. 5281 (Sep. 13, 1996), pp. 1516-1517
Published by: American Association for the Advancement of Science
Stable URL: http://www.jstor.org/stable/2891043 .
Accessed: 20/09/2012 06:08
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp
.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact [email protected].
.
American Association for the Advancement of Science is collaborating with JSTOR to digitize, preserve and
extend access to Science.
http://www.jstor.org
The
Future
Complex
of
Genetic
Human
Studies
Diseases
of
Neil Risch and KathleenMerikangas
Geneticistshavemadesubstantial
in
progress
identifyingthe geneticbasisof manyhuman
diseases,at leastthosewithconspicuous
determinants.ThesesuccessesincludeHuntington's
disease,Alzheimer's
disease,andsomeformsof
breastcancer.However,the detectionof genetic factorsfor complex diseases-such as
anddiabetesschizophrenia,
bipolardisorder,
has been far more complicated.There have
been numerousreportsof genes or loci that
mightunderliethesedisorders,
butfewof these
findingshavebeenreplicated.
The modestnatureof the geneeffectsforthesedisorders
likely
explainsthe contradictoryand inconclusive
claimsabouttheiridentification.Despitethe
smalleffectsof such genes,the magnitudeof
theirattributable
risk(theproportion
ofpeople
affecteddueto them)maybelargebecausethey
arequitefrequentin the population,making
themof publichealthsignificance.
Hasthe geneticstudyof complexdisorders
reached its limits? The persistentlack of
replicabilityof these reportsof linkagebetween various loci and complex diseases
mightimplythat it has.We arguebelowthat
the methodthat has been used successfully
(linkageanalysis)to findmajorgeneshaslimited powerto detect genes of modesteffect,
but that a differentapproach(association
studies)that utilizescandidategeneshas far
greaterpower,even if one needsto test every
gene in the genome.Thus, the futureof the
geneticsofcomplexdiseasesislikelyto require
large-scaletestingby associationanalysis.
How largedoesa gene effectneed to be in
orderto be detectableby linkage analysis?
We considerthe followingmodel:Supposea
diseasesusceptibilitylocushas two allelesA
and a, with populationfrequenciesp andq =
1 - p, respectively.There are three genotypes:AA, Aa, and aa.We definegenotypic
relative risks (GRR, the increasedchance
that an individualwith a particulargenotype
has the disease) as follows:Let the risk for
individualsof genotypeAa be ytimes greater
than the riskfor individualswith genotype
aa, a GRR of y. We assumea multiplicative
relationfor two A alleles, so that the GRR
for genotypeAA is y2.The methodof link-
age analysiswe have chosen for this argument is a popularcurrentparadigmin which
pairsof siblings,both with the disease,are
examinedfor sharingof alleles at multiple
sitesin the genomedefinedbygeneticmarkers. The more often the affected siblings
sharethe sameallele at a particularsite, the
more likely the site is close to the disease
gene. Usingthe formulasin (1), we calculate
the expectedproportionYof allelessharedby
a pairof affectedsiblingsforthe bestpossible
case-that is, a closely linked markerlocus
(recombinationfraction0 = 0) that is fully
= 1) (2)-as
informative(heterozygosity
y
1 +W wherew= pq(y-1)2
2+w
(p7+q)2
If there is no linkage of a markerat a
particularsite to the disease, the siblings
wouldbe expectedto sharealleles50%of the
time;that is, Y wouldequal0.5. Valuesof Y
for variousvaluesof p and y aregiven in the
third column of the table. For an allele of
moderatefrequency(p is 0.1 to 0.5) thatconfersa GRR(y) of fourfoldorgreater,thereis a
detectabledeviationofYfromthenullvalueof
0.5.On the otherhand,foranalleleconferring
aGRRof2 orless,theexpectedmarker-sharing
only marginallyexceeds50%, for any allele
frequency(p). Thus,it is clearthat the useof
pq(,y+ 1)2
2 (Py + q)2+ pq(y- 1)2
Sib pairs
Singletons
Genotypic Frequency Probability No. of
Probabilityof
Proportionof
risk ratio of disease
of allele families
transmitting heterozygous
allele A
sharing required disease allele A
parents
(Y)
(N)
P(tr-A)
(Het)
(N)
(7)
(P)
4260
1098
4.0
0.01
0.520
0.800
0.048
150
185
0.10
0.597
0.800
0.346
297
103
0.50
0.576
0.800
0.500
2013
222
0.80
0.529
0.800
0.235
(Het) (N)
0.112 235
0.537 48
0.424 61
0.163 161
2.0
0.01
0.10
0.50
0.80
0.502
0.518
0.526
0.512
296,710
5382
2498
11,917
0.667
0.667
0.667
0.667
0.029
0.245
0.500
0.267
5823
695
340
640
0.043 1970
0.323 264
0.474 180
0.217 394
1.5
0.01
0.10
0.50
0.80
0.501
0.505
0.510
0.505
4,620,807
67,816
17,997
67,816
0.600
0.600
0.600
0.600
0.025
0.197
0.500
0.286
19,320
2218
949
1663
0.031 7776
0.253 941
0.490 484
0.253 941
N. Risch is in the Departmentof Genetics, Stanford
Schoolof Medicine,Stanford,CA94305-5120,
University
USA.E-mail:[email protected].
K.Merikangas
is in the Departmentsof Epidemiologyand Psychiatry,
Unit,Yale UniversitySchool of Medicine,New Haven, Comparison of linkage and association
CT 06510, USA. E-mail:[email protected]
disease gene.
1516
linkageanalysisfor loci conferringGRR of
about2 orlesswill neverallowidentification
because the number of families required
(morethan -2500) is not practicallyachievable.
Althoughtestsof linkageforgenesof modest effect are of low power,as shownby the
aboveexample,directtestsof associationwith
a diseaselocusitselfcan still be quitestrong.
To illustratethis point, we use the transmission/disequilibrium
testof Spielmanetal. (3).
In this test, transmission
of a particularallele
at a locusfromheterozygous
parentsto their
affectedoffspringisexamined.UnderMendelianinheritance,allallelesshouldhavea 50%
chanceof beingtransmittedto the next generation.In contrast,if one of the alleles is
associatedwith diseaserisk,it will be transmittedmoreoften than 50%of the time.
Forthis approach,we do not needfamilies
with multipleaffectedsiblings,but can focus
just on single affectedindividualsand their
parents.Forthe samemodelgiven above,we
can calculatethe proportionof heterozygous
parentsas pq(y + 1)/(py + q)(4). Similarly,
the probabilityfor a heterozygoteparentto
transmitthe highriskA alleleisjusty/(1 + y).
Associationtests can also be performedfor
pairsof affectedsiblings.When the locus is
associated
withdisease,thetransmission
excess
over50%isthe sameasforsingleoffspring,
but
the probability
of parentalheterozygosity
is increasedat lowvaluesofp;forhighervaluesofp,
ofparentalheterozygosity
theprobability
isdecreased.The formulaforparentalheterozygosity foran affectedpairof siblingsforthe same
geneticmodelasusedin the firstexampleis
studies. Number of families needed for identification of a
SCIENCE * VOL. 273 * 13 SEPTEMBER 1996
On the rightside of the table,we present over 2000), precludingtheir identification
the proportionof heterozygous
parents(Het) by this approach.In contrast,the required
and the probabilityof transmissionof the A samplesizefor the associationtest, even alallele from a heterozygousparentto an af- lowing for the smallersignificancelevel, is
fected child [P(tr-A)]for the samevaluesof vastlyless than for linkage,especiallyfor afGRRas consideredabovefor the exampleof fectedsiblingpairfamilieswhen the valueof
linkageanalysis.The deviationfromthe null p is small.Evenfora GRRof 1.5, the sample
hypothesisof 50% transmissionfrom het- sizesaregenerallylessthan 1000,wellwithin
erozygousparents is substantiallygreater reason.
thanthe excessallelesharingthatisfoundby
Thus, the primarylimitationof genomelinkageanalysisin siblingpairs.This dispar- wide associationtests is not a statisticalone
ity betweenthe methodsis particularlytrue but a technologicalone. A largenumberof
forlowervaluesof y(that is, with lowerrela- genes (up to 100,000) and polymorphisms
tive risk). For example,for y = 1.5, allele (preferentially
ones that createalterationsin
sharingis at most 51%,while the A allele is derivedproteinsortheirexpression)mustfirst
transmitted60%of the time fromheterozy- be identified,andan extremelylargenumber
gousparents.
of suchpolymorphisms
willneed to be tested.
In this respectthen, associationstudies Althoughtestingsucha largenumberof polyseem to be of greaterpower than linkage morphismson several hundred,or even a
studies.But of course,the limitationof as- thousandfamilies,mightcurrentlyseem imsociation studiesis that the actualgene or plausiblein scope,moreefficientmethodsof
genes involvedin the diseasemustbe tenta- screeninga largenumberof polymorphisms
tively identifiedbeforethe test can be per- (for example,samplepooling) may be posformed. In fact, the actual polymorphism sible. Furthermore,
the numberof tests we
withinthe gene(orat leasta polymorphism
in have used as the basisfor our calculations
strong disequilibrium)must be available. (1,000,000)is likelyto be farlargerthannecHowever,we show that this requirementis essaryif one allowsforlinkagedisequilibrium,
onlydauntingbecauseof limitationsimposed whichcouldsubstantially
reducethe required
by currenttechnologicalcapabilities,not be- numberof markersand familiesneeded for
causesufficientfamilieswith the diseaseare initialscreening.
not availableor the statisticalpoweris inadSome of the importantloci for complex
equate (5). For example,imaginethe time diseaseswill undoubtedlybe found by linkwhen all humangenes (say 100,000in total) age analysis.However,the limitationsto dehave been found and that simple, diallelic tectingmanyof the remaininggenesby linkpolymorphismsin these genes have been age studiescan be overcome;numerousgeidentified.Assume that five such diallelic netic effectstoo weakto identifyby linkage
have been identifiedwithin can be detectedbygenomicassociationstudpolymorphisms
each gene, so that a total of 10 x 105 = 106 ies. Fortunately,the samplescurrentlycolallelesneedto be tested.The statisticalprob- lected for linkagestudies (for example,aflemis thatthe largenumberof teststhatneed fectedpairsof siblingsandtheirparents)can
to be madeleadsto an inflationof the type 1 also be used for such association studies.
errorprobability.Fora linkagetest withpairs Thus, investigatorsshould preserve their
of affectedsiblings,we use a lod score(loga- samplesforfuturelarge-scaletesting.
rithmof the oddsratiofor linkage)criterion
The human genome project can have
of 3.0, whichasymptotically
correspondsto a more than one reward.In addition to setype 1 errorprobabilityoxof about 104. In a quencingthe entire human genome, it can
linkage genome screen with 500 markers, lead to identificationof polymorphismsfor
this significance level gives a probability all the genes in the humangenomeand the
greaterthan 95%of no false positives.The diseases to which they contribute.It is a
equivalentfalse positive rate for 1,000,000 chargeto the moleculartechnologiststo deindependent association tests can be ob- velop the tools to meet this challenge and
tainedwith a significancelevel oc= 5 x 10-8. providethe informationnecessaryto identify
We illustratethe powerof linkageversus the geneticbasisof complexhumandiseases.
associationtestsat differentsignificancelevels by determiningthe samplesizeN (num- References and Notes
ber of families) necessaryto obtain 80%
1. N. Risch, Am. J. Hum.Genet. 40, 1 (1987); ibid.
power(the probabilityof rejectingthe null
46, 229 (1990).
2. Fromthe formulasin (1), we have ko = 1 + O.5VA/
hypothesiswhen it is false) (6) (see table).
K2and Xs = 1 + (0.5VA+ 0.25VD)IK2,
where K=
With a linkageapproachand a diseasegene
2,2 + 2pqy + q2= (py + q)2,VA = 2pq(y- 1)2 (py
with a GRR of 4 or greater,the numberof
+ q)2, and VD = p2q2(y- 1)2. Hence, k0 = 1 + w
affectedsiblingpairsnecessaryto detectlinkand XS= (1 + 0.5w)2,where w= pq(y- 1)2. The
proportionof alleles shared is given by Y = 1 age is realistic (185 or 297), providedthe
0.5z, - zo, wherez1 and zo are the probabilitiesof
allelefrequencyp is between5 and 75%.For
the
1 and 0
a gene with a GRR of 2 or less, however, the
sample sizes are generally beyond reach (well
sib pairsharing
disease alleles ibd, respectively.From(1), zo = 0.25/XSand z1 = 0.5ko/
Xs. Thus,aftersome algebra, Y= 1 - 0.25(ko + 1)/
SCIENCE * VOL. 273 0 13 SEPTEMBER 1996
XS= (1 = w)I(2 + w).
3. R. Spielman,R. E. McGinnis,W.J. Ewens,Am.J.
Hum.Genet.52, 506 (1993).
4. By Bayes theorem,the probability
of a parentof an
affected child being heterozygousis given by
P(HetlAff
child)= P(Het)P(Aff
ChildlHet)/P2Aff
Child)
= 2pq[0.5p(-y2 + y)]+ 0.5q(-y+ 1)I(py + q) = pq(y +
1)/(py+ q).
5. E. S. Landerand N. J. Schork,Science 265, 2037
(1994).
6. Considera set of M independent,identicallydistributedrandomvariablesBiof discretevalue.Under the nullhypothesisHo,assume E(B1)= 0 and
= 1. Underthe alternativehypothesisH1,
Var(B1)
let E(Bi) = ,t and Var(B1)= ?2. For a sample of size
M,let T= FBi\/I7 ThenunderHo, Talso has mean
0 and variance1, whileunderH1,it has mean Ni
,t and varianceG2.We assume that Tis approximatelynormallydistributedbothunderHoand H1.
Thenthe samplesize Mrequiredto obtaina power
of 1 - X fora significancelevel a. is given by
M= (Za - ?Z1 _ p)2/M2(1)
Foreach affectedsib pair,we score the numberof
alleles shared ibd fromeach of 2N parents.Define
Bi=1 if an allele is sharedfromthe ith parentand
B = -1 if unshared.Underthe nullhypothesisof
no linkage, P(B1= 1) = P(B1= -1) = 0.5, so E(B1)=
= 1. Forthe genetic modeldescribed
0 and Var(B1)
above withgenotypicrelativerisksof y2, y,and 1,
allele sharingby affected sibs is independentfor
the two parents;thus, we can considersharingof
alleles one parentat a time.Thus,foraffectedsib
pairsassuming0 = 0 and no linkagedisequilibrium,
the formulais
(Za-Z1
)2
4
2N
where
g=2Y-1
2
a = 4Y(1-Y)
y1+w
2+ w
w=
pq (y- 1)2
2
(py+ q)
Zax=
3.72 (correspondingto a = 10-4), and Z1-
= -0.84 (corresponding to 1 - a = 0.80).. For an
association test using the transmission/disequilibriumtest, withthe disease locus or a nearbylocus in completedisequilibrium,
the number(N) of
familieswithaffected singletonsrequiredfor80%
power is also calculatedfromformula1. Forthis
of allele
case, we score the numberof transmissions
A fromheterozygousparents.Leth be the probabilitya parentis heterozygousunderthe alternative
hypothesis,namely,h = pq(y+ 1)/(py+ q).Thendefine Bi= hZ05if the parentis heterozygousand allele A is transmitted;
Bi= 0 ifthe parentis homozygous;and Bi= -hO05if the parentis heterozygous
and transmitsallele a. Underthe nullhypothesis,
= 1. Underthe alternativehyE(Bi)= 0 and Var(B1)
pothesis, A = E(B) = h(y - 1)/(y + 1) and g2 Var(Bi)= 1 - h('y- 1)2/(y + 1)2. Inthis case, there are
two parentsperfamilyand they act independently,
so the requirednumber(N) of familiesis givenby
halfof formula1 where,uand ?2 are given above.
to a.= 5 x 10-8).For
Here,Za,= 5.33 (corresponding
the same test butwithaffectedsib pairsinsteadof
singletons,the numberof familiesrequiredis given
fromtwoparents
by halfof formula1 (transmissions
to twochildren)withthe same formulasfor,t and ?2
as forsingletonfamiliesbutnowusingthe heterozygote frequencyforparentsof affectedsib pairs.Using the above formulas,we can calculatesample
sizes forthe threestudydesigns.
27 October1995;accepted 6 June 1996.
1517