The Future of Genetic Studies of Complex Human Diseases Author(s): Neil Risch and Kathleen Merikangas Reviewed work(s): Source: Science, New Series, Vol. 273, No. 5281 (Sep. 13, 1996), pp. 1516-1517 Published by: American Association for the Advancement of Science Stable URL: http://www.jstor.org/stable/2891043 . Accessed: 20/09/2012 06:08 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . American Association for the Advancement of Science is collaborating with JSTOR to digitize, preserve and extend access to Science. http://www.jstor.org The Future Complex of Genetic Human Studies Diseases of Neil Risch and KathleenMerikangas Geneticistshavemadesubstantial in progress identifyingthe geneticbasisof manyhuman diseases,at leastthosewithconspicuous determinants.ThesesuccessesincludeHuntington's disease,Alzheimer's disease,andsomeformsof breastcancer.However,the detectionof genetic factorsfor complex diseases-such as anddiabetesschizophrenia, bipolardisorder, has been far more complicated.There have been numerousreportsof genes or loci that mightunderliethesedisorders, butfewof these findingshavebeenreplicated. The modestnatureof the geneeffectsforthesedisorders likely explainsthe contradictoryand inconclusive claimsabouttheiridentification.Despitethe smalleffectsof such genes,the magnitudeof theirattributable risk(theproportion ofpeople affecteddueto them)maybelargebecausethey arequitefrequentin the population,making themof publichealthsignificance. Hasthe geneticstudyof complexdisorders reached its limits? The persistentlack of replicabilityof these reportsof linkagebetween various loci and complex diseases mightimplythat it has.We arguebelowthat the methodthat has been used successfully (linkageanalysis)to findmajorgeneshaslimited powerto detect genes of modesteffect, but that a differentapproach(association studies)that utilizescandidategeneshas far greaterpower,even if one needsto test every gene in the genome.Thus, the futureof the geneticsofcomplexdiseasesislikelyto require large-scaletestingby associationanalysis. How largedoesa gene effectneed to be in orderto be detectableby linkage analysis? We considerthe followingmodel:Supposea diseasesusceptibilitylocushas two allelesA and a, with populationfrequenciesp andq = 1 - p, respectively.There are three genotypes:AA, Aa, and aa.We definegenotypic relative risks (GRR, the increasedchance that an individualwith a particulargenotype has the disease) as follows:Let the risk for individualsof genotypeAa be ytimes greater than the riskfor individualswith genotype aa, a GRR of y. We assumea multiplicative relationfor two A alleles, so that the GRR for genotypeAA is y2.The methodof link- age analysiswe have chosen for this argument is a popularcurrentparadigmin which pairsof siblings,both with the disease,are examinedfor sharingof alleles at multiple sitesin the genomedefinedbygeneticmarkers. The more often the affected siblings sharethe sameallele at a particularsite, the more likely the site is close to the disease gene. Usingthe formulasin (1), we calculate the expectedproportionYof allelessharedby a pairof affectedsiblingsforthe bestpossible case-that is, a closely linked markerlocus (recombinationfraction0 = 0) that is fully = 1) (2)-as informative(heterozygosity y 1 +W wherew= pq(y-1)2 2+w (p7+q)2 If there is no linkage of a markerat a particularsite to the disease, the siblings wouldbe expectedto sharealleles50%of the time;that is, Y wouldequal0.5. Valuesof Y for variousvaluesof p and y aregiven in the third column of the table. For an allele of moderatefrequency(p is 0.1 to 0.5) thatconfersa GRR(y) of fourfoldorgreater,thereis a detectabledeviationofYfromthenullvalueof 0.5.On the otherhand,foranalleleconferring aGRRof2 orless,theexpectedmarker-sharing only marginallyexceeds50%, for any allele frequency(p). Thus,it is clearthat the useof pq(,y+ 1)2 2 (Py + q)2+ pq(y- 1)2 Sib pairs Singletons Genotypic Frequency Probability No. of Probabilityof Proportionof risk ratio of disease of allele families transmitting heterozygous allele A sharing required disease allele A parents (Y) (N) P(tr-A) (Het) (N) (7) (P) 4260 1098 4.0 0.01 0.520 0.800 0.048 150 185 0.10 0.597 0.800 0.346 297 103 0.50 0.576 0.800 0.500 2013 222 0.80 0.529 0.800 0.235 (Het) (N) 0.112 235 0.537 48 0.424 61 0.163 161 2.0 0.01 0.10 0.50 0.80 0.502 0.518 0.526 0.512 296,710 5382 2498 11,917 0.667 0.667 0.667 0.667 0.029 0.245 0.500 0.267 5823 695 340 640 0.043 1970 0.323 264 0.474 180 0.217 394 1.5 0.01 0.10 0.50 0.80 0.501 0.505 0.510 0.505 4,620,807 67,816 17,997 67,816 0.600 0.600 0.600 0.600 0.025 0.197 0.500 0.286 19,320 2218 949 1663 0.031 7776 0.253 941 0.490 484 0.253 941 N. Risch is in the Departmentof Genetics, Stanford Schoolof Medicine,Stanford,CA94305-5120, University USA.E-mail:[email protected]. K.Merikangas is in the Departmentsof Epidemiologyand Psychiatry, Unit,Yale UniversitySchool of Medicine,New Haven, Comparison of linkage and association CT 06510, USA. E-mail:[email protected] disease gene. 1516 linkageanalysisfor loci conferringGRR of about2 orlesswill neverallowidentification because the number of families required (morethan -2500) is not practicallyachievable. Althoughtestsof linkageforgenesof modest effect are of low power,as shownby the aboveexample,directtestsof associationwith a diseaselocusitselfcan still be quitestrong. To illustratethis point, we use the transmission/disequilibrium testof Spielmanetal. (3). In this test, transmission of a particularallele at a locusfromheterozygous parentsto their affectedoffspringisexamined.UnderMendelianinheritance,allallelesshouldhavea 50% chanceof beingtransmittedto the next generation.In contrast,if one of the alleles is associatedwith diseaserisk,it will be transmittedmoreoften than 50%of the time. Forthis approach,we do not needfamilies with multipleaffectedsiblings,but can focus just on single affectedindividualsand their parents.Forthe samemodelgiven above,we can calculatethe proportionof heterozygous parentsas pq(y + 1)/(py + q)(4). Similarly, the probabilityfor a heterozygoteparentto transmitthe highriskA alleleisjusty/(1 + y). Associationtests can also be performedfor pairsof affectedsiblings.When the locus is associated withdisease,thetransmission excess over50%isthe sameasforsingleoffspring, but the probability of parentalheterozygosity is increasedat lowvaluesofp;forhighervaluesofp, ofparentalheterozygosity theprobability isdecreased.The formulaforparentalheterozygosity foran affectedpairof siblingsforthe same geneticmodelasusedin the firstexampleis studies. Number of families needed for identification of a SCIENCE * VOL. 273 * 13 SEPTEMBER 1996 On the rightside of the table,we present over 2000), precludingtheir identification the proportionof heterozygous parents(Het) by this approach.In contrast,the required and the probabilityof transmissionof the A samplesizefor the associationtest, even alallele from a heterozygousparentto an af- lowing for the smallersignificancelevel, is fected child [P(tr-A)]for the samevaluesof vastlyless than for linkage,especiallyfor afGRRas consideredabovefor the exampleof fectedsiblingpairfamilieswhen the valueof linkageanalysis.The deviationfromthe null p is small.Evenfora GRRof 1.5, the sample hypothesisof 50% transmissionfrom het- sizesaregenerallylessthan 1000,wellwithin erozygousparents is substantiallygreater reason. thanthe excessallelesharingthatisfoundby Thus, the primarylimitationof genomelinkageanalysisin siblingpairs.This dispar- wide associationtests is not a statisticalone ity betweenthe methodsis particularlytrue but a technologicalone. A largenumberof forlowervaluesof y(that is, with lowerrela- genes (up to 100,000) and polymorphisms tive risk). For example,for y = 1.5, allele (preferentially ones that createalterationsin sharingis at most 51%,while the A allele is derivedproteinsortheirexpression)mustfirst transmitted60%of the time fromheterozy- be identified,andan extremelylargenumber gousparents. of suchpolymorphisms willneed to be tested. In this respectthen, associationstudies Althoughtestingsucha largenumberof polyseem to be of greaterpower than linkage morphismson several hundred,or even a studies.But of course,the limitationof as- thousandfamilies,mightcurrentlyseem imsociation studiesis that the actualgene or plausiblein scope,moreefficientmethodsof genes involvedin the diseasemustbe tenta- screeninga largenumberof polymorphisms tively identifiedbeforethe test can be per- (for example,samplepooling) may be posformed. In fact, the actual polymorphism sible. Furthermore, the numberof tests we withinthe gene(orat leasta polymorphism in have used as the basisfor our calculations strong disequilibrium)must be available. (1,000,000)is likelyto be farlargerthannecHowever,we show that this requirementis essaryif one allowsforlinkagedisequilibrium, onlydauntingbecauseof limitationsimposed whichcouldsubstantially reducethe required by currenttechnologicalcapabilities,not be- numberof markersand familiesneeded for causesufficientfamilieswith the diseaseare initialscreening. not availableor the statisticalpoweris inadSome of the importantloci for complex equate (5). For example,imaginethe time diseaseswill undoubtedlybe found by linkwhen all humangenes (say 100,000in total) age analysis.However,the limitationsto dehave been found and that simple, diallelic tectingmanyof the remaininggenesby linkpolymorphismsin these genes have been age studiescan be overcome;numerousgeidentified.Assume that five such diallelic netic effectstoo weakto identifyby linkage have been identifiedwithin can be detectedbygenomicassociationstudpolymorphisms each gene, so that a total of 10 x 105 = 106 ies. Fortunately,the samplescurrentlycolallelesneedto be tested.The statisticalprob- lected for linkagestudies (for example,aflemis thatthe largenumberof teststhatneed fectedpairsof siblingsandtheirparents)can to be madeleadsto an inflationof the type 1 also be used for such association studies. errorprobability.Fora linkagetest withpairs Thus, investigatorsshould preserve their of affectedsiblings,we use a lod score(loga- samplesforfuturelarge-scaletesting. rithmof the oddsratiofor linkage)criterion The human genome project can have of 3.0, whichasymptotically correspondsto a more than one reward.In addition to setype 1 errorprobabilityoxof about 104. In a quencingthe entire human genome, it can linkage genome screen with 500 markers, lead to identificationof polymorphismsfor this significance level gives a probability all the genes in the humangenomeand the greaterthan 95%of no false positives.The diseases to which they contribute.It is a equivalentfalse positive rate for 1,000,000 chargeto the moleculartechnologiststo deindependent association tests can be ob- velop the tools to meet this challenge and tainedwith a significancelevel oc= 5 x 10-8. providethe informationnecessaryto identify We illustratethe powerof linkageversus the geneticbasisof complexhumandiseases. associationtestsat differentsignificancelevels by determiningthe samplesizeN (num- References and Notes ber of families) necessaryto obtain 80% 1. N. Risch, Am. J. Hum.Genet. 40, 1 (1987); ibid. power(the probabilityof rejectingthe null 46, 229 (1990). 2. Fromthe formulasin (1), we have ko = 1 + O.5VA/ hypothesiswhen it is false) (6) (see table). K2and Xs = 1 + (0.5VA+ 0.25VD)IK2, where K= With a linkageapproachand a diseasegene 2,2 + 2pqy + q2= (py + q)2,VA = 2pq(y- 1)2 (py with a GRR of 4 or greater,the numberof + q)2, and VD = p2q2(y- 1)2. Hence, k0 = 1 + w affectedsiblingpairsnecessaryto detectlinkand XS= (1 + 0.5w)2,where w= pq(y- 1)2. The proportionof alleles shared is given by Y = 1 age is realistic (185 or 297), providedthe 0.5z, - zo, wherez1 and zo are the probabilitiesof allelefrequencyp is between5 and 75%.For the 1 and 0 a gene with a GRR of 2 or less, however, the sample sizes are generally beyond reach (well sib pairsharing disease alleles ibd, respectively.From(1), zo = 0.25/XSand z1 = 0.5ko/ Xs. Thus,aftersome algebra, Y= 1 - 0.25(ko + 1)/ SCIENCE * VOL. 273 0 13 SEPTEMBER 1996 XS= (1 = w)I(2 + w). 3. R. Spielman,R. E. McGinnis,W.J. Ewens,Am.J. Hum.Genet.52, 506 (1993). 4. By Bayes theorem,the probability of a parentof an affected child being heterozygousis given by P(HetlAff child)= P(Het)P(Aff ChildlHet)/P2Aff Child) = 2pq[0.5p(-y2 + y)]+ 0.5q(-y+ 1)I(py + q) = pq(y + 1)/(py+ q). 5. E. S. Landerand N. J. Schork,Science 265, 2037 (1994). 6. Considera set of M independent,identicallydistributedrandomvariablesBiof discretevalue.Under the nullhypothesisHo,assume E(B1)= 0 and = 1. Underthe alternativehypothesisH1, Var(B1) let E(Bi) = ,t and Var(B1)= ?2. For a sample of size M,let T= FBi\/I7 ThenunderHo, Talso has mean 0 and variance1, whileunderH1,it has mean Ni ,t and varianceG2.We assume that Tis approximatelynormallydistributedbothunderHoand H1. Thenthe samplesize Mrequiredto obtaina power of 1 - X fora significancelevel a. is given by M= (Za - ?Z1 _ p)2/M2(1) Foreach affectedsib pair,we score the numberof alleles shared ibd fromeach of 2N parents.Define Bi=1 if an allele is sharedfromthe ith parentand B = -1 if unshared.Underthe nullhypothesisof no linkage, P(B1= 1) = P(B1= -1) = 0.5, so E(B1)= = 1. Forthe genetic modeldescribed 0 and Var(B1) above withgenotypicrelativerisksof y2, y,and 1, allele sharingby affected sibs is independentfor the two parents;thus, we can considersharingof alleles one parentat a time.Thus,foraffectedsib pairsassuming0 = 0 and no linkagedisequilibrium, the formulais (Za-Z1 )2 4 2N where g=2Y-1 2 a = 4Y(1-Y) y1+w 2+ w w= pq (y- 1)2 2 (py+ q) Zax= 3.72 (correspondingto a = 10-4), and Z1- = -0.84 (corresponding to 1 - a = 0.80).. For an association test using the transmission/disequilibriumtest, withthe disease locus or a nearbylocus in completedisequilibrium, the number(N) of familieswithaffected singletonsrequiredfor80% power is also calculatedfromformula1. Forthis of allele case, we score the numberof transmissions A fromheterozygousparents.Leth be the probabilitya parentis heterozygousunderthe alternative hypothesis,namely,h = pq(y+ 1)/(py+ q).Thendefine Bi= hZ05if the parentis heterozygousand allele A is transmitted; Bi= 0 ifthe parentis homozygous;and Bi= -hO05if the parentis heterozygous and transmitsallele a. Underthe nullhypothesis, = 1. Underthe alternativehyE(Bi)= 0 and Var(B1) pothesis, A = E(B) = h(y - 1)/(y + 1) and g2 Var(Bi)= 1 - h('y- 1)2/(y + 1)2. Inthis case, there are two parentsperfamilyand they act independently, so the requirednumber(N) of familiesis givenby halfof formula1 where,uand ?2 are given above. to a.= 5 x 10-8).For Here,Za,= 5.33 (corresponding the same test butwithaffectedsib pairsinsteadof singletons,the numberof familiesrequiredis given fromtwoparents by halfof formula1 (transmissions to twochildren)withthe same formulasfor,t and ?2 as forsingletonfamiliesbutnowusingthe heterozygote frequencyforparentsof affectedsib pairs.Using the above formulas,we can calculatesample sizes forthe threestudydesigns. 27 October1995;accepted 6 June 1996. 1517
© Copyright 2026 Paperzz