Rare Variants and the Return of Family Studies
Laura Almasy
Department of Genetics
Texas Biomedical Research Institute

Epidemiology
The potential risk factors are effectively infinite and cannot be searched exhaustively.

Genetics
The genome is finite and can be searched exhaustively!
The key is finding the right phenotypes

Endophenotype
• Heritable
• Associated with the illness
• Independent of clinical state
• Co-segregate with illness within a family
• Found in some unaffected relatives
Gould & Gottesman, Genes Brain Behav, 2006

Optimal Endophenotypes
Joint genetic determination of endophenotype and disease risk is fundamental to the endophenotype concept
Criteria of Optimal Endophenotype:
• Must be heritable
• Must be genetically correlated with the illness (pleiotropy)
Glahn et al, Biol Psychiatry, 2011 Oct 7

Example: Endophenotype Discovery in Major Depression
Search for neuroimaging-derived endophenotypes genetically correlated with risk of major depressive disorder.

Endophenotypic Ranking Value
The ERV measures the potential utility of an endophenotype for a given illness.
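As a rough illustration of how the ERV can be used to rank many candidate traits, the sketch below computes the absolute value of the product of the square roots of the two heritabilities and the genetic correlation, then sorts candidates by that score. The function names, the trait names, and every numeric input are hypothetical and chosen only for illustration; they are not estimates from the study.

```python
import math


def erv(h2_illness: float, h2_endo: float, rho_g: float) -> float:
    """Endophenotypic ranking value: |sqrt(h2_i) * sqrt(h2_e) * rho_g|."""
    for name, h2 in (("h2_illness", h2_illness), ("h2_endo", h2_endo)):
        if not 0.0 <= h2 <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {h2}")
    if not -1.0 <= rho_g <= 1.0:
        raise ValueError(f"rho_g must lie in [-1, 1], got {rho_g}")
    return abs(math.sqrt(h2_illness) * math.sqrt(h2_endo) * rho_g)


def rank_endophenotypes(h2_illness, candidates):
    """Rank candidates (name -> (h2_endo, rho_g)) by ERV, highest first."""
    scored = [
        (name, erv(h2_illness, h2_endo, rho_g))
        for name, (h2_endo, rho_g) in candidates.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical inputs: (heritability of trait, genetic correlation with illness).
    candidates = {
        "trait_A": (0.70, -0.40),
        "trait_B": (0.50, 0.55),
        "trait_C": (0.30, 0.10),
    }
    for name, value in rank_endophenotypes(0.46, candidates):
        print(f"{name}: ERV = {value:.3f}")
```

Because the score uses the absolute value of the genetic correlation, a strongly negative correlation ranks as highly as a strongly positive one, and the result always falls between 0 and 1.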
$\mathrm{ERV}_{ie} = \left| \sqrt{h_i^2} \, \sqrt{h_e^2} \, \rho_g \right|$
$h_i^2$ = heritability of the illness
$h_e^2$ = heritability of the endophenotype
$\rho_g$ = genetic correlation
Glahn et al, Biol Psychiatry, 2011 Oct 7

Properties of the ERV
• Values vary between 0 and 1
• Higher values indicate stronger shared genetic influence
• Very large numbers of endophenotypes can be efficiently assessed
• Applicable to any heritable disease and any set of potentially relevant traits

Genetics of Brain Structure and Function Study
Extension of San Antonio Family Heart Study
1,500 Mexican-American individuals from ~50 randomly ascertained extended families
1,122 individuals examined to date
Genotyping:
• 1M SNPs
• 500 whole exome sequence
• 1000 whole genome sequence
Transcriptional profiling of lymphocytes
Phenotypes: structural & functional brain imaging, neurocognitive assessments, psychiatric diagnoses
Glahn et al, Biol Psychiatry, 2011 Oct 7

Recurrent Major Depression
Consensus diagnosis based on interviews
215 individuals met criteria for lifetime rMDD (19% of the sample)
86 individuals clinically depressed at the time of the assessment.
h² = 0.463, p = 4.0×10^-6
Household effects were non-significant
Glahn et al, Biol Psychiatry, in press

10 Top Ranked Neuroimaging Endophenotypes
Endophenotype                        ERV     P-value     Genetic Correlation (ρg)
Ventral Diencephalon Volume          0.240   3.9×10^-3   -0.425
Parietal Hyperintensity Volume       0.282   7.8×10^-3    0.569
Hippocampus Volume                   0.204   1.2×10^-2   -0.347
Pallidum Volume                      0.203   1.3×10^-2   -0.396
Cerebellar White Matter Volume       0.218   1.3×10^-2   -0.443
Frontal Hyperintensity Volume        0.255   1.3×10^-2    0.483
CorticoSpinal Tract (FA)             0.208   2.1×10^-2   -0.900
Subcortical Hyperintensity Volume    0.213   4.1×10^-2    0.473
Superior Parietal Gyrus Thickness    0.178   4.5×10^-2    0.363
Thalamus Proper Volume               0.172   4.8×10^-2   -0.294
Glahn et al, Biol Psychiatry, 2011 Oct 7

Subcortical Brain Regions & rMDD
Glahn et al, Biol Psychiatry, 2011 Oct 7

The Goals: Genetic Analysis of Complex Phenotypes
QTL Localization
Where in the genome is the QTL located?
QTL Identification
What is (are) the gene(s) involved?
QTL Allelic Architecture
What are the specific QTNs? How many QTNs? What are their frequencies and effect sizes?

The Search for “Functional” Variants
Molecular Functionality
Phenotype-Specific Functionality

Localization of Human QTLs
Linkage mapping
  Genome-wide linkage analysis in families
Disequilibrium mapping
  Genome-wide association analysis in families or unrelateds
Joint linkage/association mapping
  Simultaneous use of linkage and association information in families

Localization of Human QTLs: Current Status
Accumulated large numbers of QTL localizations but VERY FEW gene identifications.
QTL region size from linkage: ~10-15 Mb
QTL region size from association: ~500 kb
Identification of the underlying genes will require deep comprehensive sequencing of these regions to find functional variants.

QTL Effect Sizes
QTL Type                      Heritability due to Variant   α (sdu), displacement between genotypic means
Rare monogenic                1.0                           >3.5
GWA-derived common variant    <0.01                         <0.15

GWA-Derived eQTL Effect Sizes: Best Case for Common Variants?
From the San Antonio Family Study
N = 1,240 Mexican Americans
PBMC-derived transcriptional profiles

QTL Effect Sizes
QTL Type                      Heritability due to Variant   α (sdu), displacement between genotypic means
Rare monogenic                1.0                           >3.5
Rare partially penetrant      >0.10 in a given pedigree     0.50 < α < 3.5
GWA-derived common variant    <0.01                         <0.15

Growing Realization from GWA Studies
Associated common variants account for only a small proportion of the observed heritability of a given trait (disease).
Inference: GWA fails to detect the biological signal of most functional variants.
What accounts for this “missing heritability”?

Growing Realization from Deep Sequencing Studies
Vast amount of rare (even private) sequence variation in human populations.
Ever increasing evidence that rare functional variants contribute to disease risk.

Most Important DNA Variation is Rare!
We need to find these!
We now find these.
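To make the link between effect size and heritability due to a variant concrete, the sketch below uses the standard additive result for a biallelic locus: the fraction of phenotypic variance explained is 2p(1-p)α², with p the allele frequency and α the per-allele effect in standard deviation units. Reading the deck's α as a per-allele displacement is an assumption made here, as is Hardy-Weinberg equilibrium (which holds only approximately within a single pedigree). All frequencies below are hypothetical.

```python
def variance_explained(allele_freq: float, alpha_sd: float) -> float:
    """Fraction of phenotypic variance due to an additive biallelic locus.

    Assumes Hardy-Weinberg proportions and a per-allele effect alpha_sd
    expressed in phenotypic standard deviation units.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    return 2.0 * allele_freq * (1.0 - allele_freq) * alpha_sd ** 2


if __name__ == "__main__":
    # Hypothetical scenarios chosen only to illustrate the orders of magnitude.
    scenarios = {
        "common variant, small effect (p=0.30, alpha=0.15)": (0.30, 0.15),
        "rare variant in the population (p=0.001, alpha=2.0)": (0.001, 2.0),
        "same kind of variant enriched in one pedigree (p=0.05, alpha=1.5)": (0.05, 1.5),
    }
    for label, (p, alpha) in scenarios.items():
        print(f"{label}: {variance_explained(p, alpha):.4f}")
```

Under these assumptions, a rare variant with a large effect explains little variance in the population as a whole but can explain a substantial share within a pedigree in which it segregates at an appreciable frequency.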
Rare Variant Hypothesis
Human quantitative trait variation has a substantial component due to the effects of “rare” sequence variants in multiple genes.
Larger effects of rare variants will make disease-related gene discovery easier.
It is easier to perform required functional experiments on variants with larger mean displacements.

What is “Rare”?
Classical Population Genetic Definition: MAF < 0.05 or MAF < 0.01
Private variants are the ultimate “rare” variant. They are pedigree-specific.

How Will We Detect Rare Functional Variants?
Major Study Designs of Relevance:
• Cases/Controls sampled from the tails of the distributions to enrich for rare variants.
• Pedigree-based studies that naturally exploit the increased probability of seeing additional copies of rare variants amongst relatives.

How Can We Study Rare Variants: Return of the Family Study
Very rare functional variants are best detected (and their genes identified) using a large pedigree-based design.
Pedigrees allow observation of multiple copies of a private variant.
Efficient! Allows exact imputation = free sequence!

San Antonio Family Study
Origin: San Antonio Family Heart Study (SAFHS) designed in 1991 to investigate the genetics of CVD in Mexican Americans
Expanded to > 3,000 individuals from 80 families
3 recalls since 1991
Rich phenotypic data: anthropometry, blood pressure, lipids, obesity, diabetes, inflammation, oxidative stress, hormones, osteoporosis, brain structure and function
High density SNPs (1M) currently on all individuals
Genome-wide transcriptional profiles from lymphocytes from Visits 1 and 4
miRNA profiling
WGS (~1040) November 2011, Exome sequencing for all remainders in progress.
QTL Localization Analysis Can Reduce the Causal Search Space
Fewer tests = Greater power

We’ve Been Missing the Rare Variant Signals: Pedigree-Specific and Lineage-Specific Analysis Recovers Them
[Figure: power (0.0 to 1.0) versus -lg p-value cut-off (1 to 8) for four analyses: linkage analysis of all pedigrees; linkage analysis of pedigree carrying private variant; measured genotype analysis of IBD status (relative to private variant carrying founder); measured genotype analysis of private variant. Assumes α = 2.]

Pedigree-Specific Analysis to Detect Rare Variants
Analysis of spatial working memory in SAFS.
Localizes one QTL at chr12:143cM with LOD = 3.64
Pedigree-specific analysis reveals three additional QTLs:
• chr8:67cM, LOD = 4.09
• chr4:22cM, LOD = 3.85
• chr13:12cM, LOD = 3.09
Classical QTL linkage analysis misses rare variant signals due to signal attenuation induced by assumption of single QTL-specific heritability.

Whole Genome Sequencing in the SAFS
T2D-GENES Consortium
NIH U01: DK085524, DK085584, DK085501, DK085526, DK085545
Principal Investigators:
DK085524: Duggirala, Blangero, Lehman
DK085584: Boehnke, Abecasis
DK085501: Hanis, Cox, Bell
DK085526: Altshuler, Meigs, Wilson
DK085545: McCarthy, Seielstad

T2D-GENES Project 2: Whole Genome Sequencing in the SAFS
Main task: Detect rare (even private) functional variants influencing diabetes risk and diabetes-related phenotypes
Detection of private functional variants requires multiple copies, leading to need for large pedigrees
Assessed available pedigrees for potential to generate large number of copies of private variants, sequencing efficiency, diabetes prevalence
Sequencing performed at Complete Genomics. ~600 samples at 40x coverage. ~500 completed to date.
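One common way to assess a pedigree's potential to generate copies of a private variant is gene dropping: place the variant on one haplotype of a founder, transmit alleles down the pedigree by Mendelian segregation, and repeat many times. The sketch below estimates the probability of observing at least k copies among the other pedigree members. It is not necessarily the procedure used for the figures in this deck; the function names and the toy pedigree are hypothetical, and the pedigree dictionary is assumed to list parents before their children.

```python
import random


def drop_private_variant(pedigree, founder, rng):
    """Simulate one Mendelian transmission of a founder's private variant.

    pedigree maps person -> (father, mother); founders map to (None, None).
    Parents must appear before their children. Returns the number of variant
    copies carried by everyone except the founder.
    """
    alleles = {}
    for person, (father, mother) in pedigree.items():
        if father is None and mother is None:
            # The chosen founder is heterozygous; other founders carry none.
            alleles[person] = (person == founder, False)
        else:
            # Each parent passes on one of their two alleles at random.
            alleles[person] = (
                rng.choice(alleles[father]),
                rng.choice(alleles[mother]),
            )
    return sum(a + b for person, (a, b) in alleles.items() if person != founder)


def prob_at_least(pedigree, founder, k, n_sims=20000, seed=1):
    """Monte Carlo estimate of Pr(at least k copies outside the founder)."""
    rng = random.Random(seed)
    hits = sum(
        drop_private_variant(pedigree, founder, rng) >= k for _ in range(n_sims)
    )
    return hits / n_sims


# Hypothetical three-generation pedigree used only for illustration.
TOY_PEDIGREE = {
    "F1": (None, None), "F2": (None, None),
    "S1": (None, None), "S2": (None, None),
    "C1": ("F1", "F2"), "C2": ("F1", "F2"), "C3": ("F1", "F2"),
    "G1": ("C1", "S1"), "G2": ("C1", "S1"),
    "G3": ("S2", "C2"), "G4": ("S2", "C2"),
}

if __name__ == "__main__":
    for k in (1, 2, 4):
        print(f"Pr(>= {k} copies) ~ {prob_at_least(TOY_PEDIGREE, 'F1', k):.3f}")
```

In a real application the count would usually be restricted to individuals with DNA available, and the calculation repeated for each founder to compare pedigrees.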
San Antonio Family Study: Pedigree 1
Requires 81 direct WGS to obtain 129 total WGS
171 subjects, 129 measured

SAFS Pedigree 1: Founder 1 Lineage
105 possible copies of founder private variant

Distribution of Private Variant Copies
Founders in Pedigree 1
Founder 1: Prob(≥10 copies) = 0.43

WGS in Mexican American Pedigrees
PedID   DNA    SEQ   Founders   Max copies   Pr(≥10)
6       64     43    28         44           0.762
5       69     40    28         58           0.943
8       68     31    21         32           0.82
3       78     39    25         53           0.876
17      42     25    17         39           0.63
2       87     48    30         49           0.919
10      64     42    26         43           0.735
4       64     41    29         48           0.821
27      35     18    12         30           0.373
20      36     21    15         18           0.17
47      22     12    7          19           0.164
21      35     20    15         21           0.237
7       62     32    25         53           0.782
9       60     29    26         44           0.884
11      62     31    17         54           0.824
16      48     30    17         39           0.676
14      40     24    19         23           0.436
15      42     29    20         38           0.712
23      32     18    14         21           0.237
25      33     21    12         30           0.650
Total   1043   594   403        37.8         1

Power to Detect a Private Variant by Localization Hypothesis
[Figure: power under four localization hypotheses: Gene, GWA, Linkage, WGS. Assumes α = 1.]

Power to Detect a Private Variant by Effect Size
[Figure: power curves for α = 1.5, α = 1, α = 0.5.]

SNP Summary: 483 WGS
QC Class   # SNPs       % in dbSNP   Ts/Tv
Passed     25,868,029   29.2         2.11
Failed     943,387      33.3         1.57

# Copies   # SNPs
1          7,494,983
2          3,496,336
3          1,765,907
≥4         13,110,803

Sequence Variation in Candidate Genes for Psychiatric Diseases Identified in WGS (n=483)
Gene     Size (bp)   Number of Variants   Fraction Novel   Number of NS Variants
NRG1     1125291     41634                0.52             41
DISC1    512618      97826                0.56             376
DTNBP1   140231      4302                 0.60             4
FGF14    680920      8046                 0.50             2
COMT     28234       1889                 0.49             39

Subcortical Brain Regions & rMDD
Glahn et al, Biol Psychiatry, 2011 Oct 7

Example: Pallidum Volume
Subcortical structure associated with modulation of motor actions and cognitive functions.
h² = 0.718, p = 1.2×10^-22, N = 820

GWA: Pallidum Volume
Chromosome 14q13.3
rs3850281
p = 3×10^-8
MAF = 0.174
β = 0.306 sdu

Pleiotropic Effects of rs3850281 on Subcortical Regions
Subcortical Region      P-value    Effect Direction
Accumbens               0.0044     ↑
Amygdala                0.00062    ↑
Caudate                 0.0035     ↑
Hippocampus             0.0131     ↑
Putamen                 0.000074   ↑
Thalamus                0.00096    ↑
Ventral Diencephalon    0.00060    ↑

Chromosome 14 QTL: Pallidum
NKX2-1
NKX2-1 is a transcription factor that is a major player in forebrain development and in the pallidum.
Haploinsufficiency leads to disturbances of motor abilities and delayed speech.

Sequence Variants in NKX2-1: 191 individuals
Previously unknown rare variant

Conclusions
• Family studies will become important (again!) since they are an optimal design for studying the effect of rare functional variants on quantitative variation.
• Family-based designs are efficient for whole genome sequencing due to the ability to impute sequence of non-founders.
• The causal state space of the genome is finite. WGS allows us to comprehensively assess it.
• WGS will greatly speed causal variant/gene identification, but we’ll have to intelligently sort through vast amounts of data to achieve success.

Acknowledgements
Texas Biomed John Blangero Claire Bellis Eugene Drigalenko Matthew Johnson Harald Göring Thomas Dyer Melanie Carless Juan Peralta Complete Genomics Steve Lincoln Jason Laramie Rick Tearle Jack Kent Jr. Joanne Curran Michael Mahaney Eric Moses Anthony Comuzzie Vince Diego Marcio Almeida Yale David Glahn Anderson Winkler UTHSCSA Rene Olvera Peter Fox T2D-GENES: WGS Project Team Goo Jun, Tanya Teslovich, Andy Wood, Tim Frayling, Christian Fuchsberger Dan Nicolae, Jason Grunstad, Bob Grossman + OTHERS T2D-GENES Consortium Mike Boehnke Gonçalo Abecasis David Altshuler Nancy Cox Mark McCarthy Craig Hanis Jose Florez Graham Bell Mark Seielstad Donna Lehman