Rare Variants and the Return of Family Studies

Laura Almasy
Department of Genetics
Texas Biomedical Research Institute
Epidemiology
The potential risk factors are effectively infinite and
cannot be searched exhaustively.
Genetics
The genome is finite and can be searched exhaustively!
The key is finding the right
phenotypes
Endophenotype
• Heritable
• Associated with the illness
• Independent of clinical state
• Co-segregates with illness within a family
• Found in some unaffected relatives
Gould & Gottesman, Genes Brain Behav, 2006
Optimal Endophenotypes
Joint genetic determination of endophenotype
and disease risk is fundamental to the
endophenotype concept
Criteria of Optimal Endophenotype:
Must be heritable
Must be genetically correlated with the illness
(pleiotropy)
Glahn et al, Biol Psychiatry, 2011 Oct 7
Example: Endophenotype
Discovery in Major Depression
Search for neuroimaging-derived
endophenotypes genetically correlated
with risk of major depressive disorder.
Endophenotypic Ranking Value
The ERV measures the potential utility
of an endophenotype for a given illness.
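$\mathrm{ERV}_{ie} = \left| \sqrt{h_i^2}\,\sqrt{h_e^2}\,\rho_g \right|$
$h_i^2$ = heritability of the illness
$h_e^2$ = heritability of the endophenotype
$\rho_g$ = genetic correlation between illness and endophenotype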
Glahn et al, Biol Psychiatry, 2011 Oct 7
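The ERV definition above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name and the input values (two heritabilities of 0.50 and a genetic correlation of -0.40) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def endophenotype_ranking_value(h2_illness, h2_endo, rho_g):
    """Return |sqrt(h2_i) * sqrt(h2_e) * rho_g| for one illness/endophenotype pair."""
    for name, h2 in (("h2_illness", h2_illness), ("h2_endo", h2_endo)):
        if not 0.0 <= h2 <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {h2}")
    if not -1.0 <= rho_g <= 1.0:
        raise ValueError(f"rho_g must lie in [-1, 1], got {rho_g}")
    return abs(math.sqrt(h2_illness) * math.sqrt(h2_endo) * rho_g)


# Hypothetical inputs (not taken from the study).
erv = endophenotype_ranking_value(h2_illness=0.50, h2_endo=0.50, rho_g=-0.40)
print(f"ERV = {erv:.3f}")
```

Because the absolute value is taken, the sign of the genetic correlation does not affect the ranking; only the strength of shared genetic influence does.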
Properties of the ERV
Values vary between 0 and 1
Higher values indicate stronger shared genetic
influence
Very large numbers of endophenotypes can be
efficiently assessed
Applicable to any heritable disease and any set of
potentially relevant traits
Genetics of Brain Structure and
Function Study
Extension of San Antonio Family Heart Study
1,500 Mexican-American individuals from ~50 randomly
ascertained extended families
1,122 individuals examined to date
Genotyping: 1M SNPs
500 whole exome sequence
1000 whole genome sequence
Transcriptional profiling of lymphocytes
Phenotypes: structural & functional brain imaging,
neurocognitive assessments, psychiatric diagnoses
Glahn et al, Biol Psychiatry, 2011 Oct 7
Recurrent Major Depression
Consensus diagnosis based on interviews
215 individuals met criteria for lifetime rMDD (19% of
the sample)
86 individuals clinically depressed at the time of the
assessment.
h² = 0.463, p = 4.0×10⁻⁶
Household effects were non-significant
Glahn et al, Biol Psychiatry, in press
10 Top Ranked Neuroimaging Endophenotypes

Endophenotype                        ERV     P-value     Genetic Correlation (ρg)
Ventral Diencephalon Volume          0.240   3.9×10⁻³    -0.425
Parietal Hyperintensity Volume       0.282   7.8×10⁻³     0.569
Hippocampus Volume                   0.204   1.2×10⁻²    -0.347
Pallidum Volume                      0.203   1.3×10⁻²    -0.396
Cerebellar White Matter Volume       0.218   1.3×10⁻²    -0.443
Frontal Hyperintensity Volume        0.255   1.3×10⁻²     0.483
CorticoSpinal Tract (FA)             0.208   2.1×10⁻²    -0.900
Subcortical Hyperintensity Volume    0.213   4.1×10⁻²     0.473
Superior Parietal Gyrus Thickness    0.178   4.5×10⁻²     0.363
Thalamus Proper Volume               0.172   4.8×10⁻²    -0.294
Glahn et al, Biol Psychiatry, 2011 Oct 7
Subcortical Brain Regions & rMDD
Glahn et al, Biol Psychiatry, 2011 Oct 7
The Goals: Genetic Analysis of
Complex Phenotypes
QTL Localization
Where in the genome is the QTL located?
QTL Identification
What is (are) the gene(s) involved?
QTL Allelic Architecture
What are the specific QTNs? How many QTNs?
What are their frequencies and effect sizes?
The Search for “Functional” Variants
Molecular Functionality
Phenotype-Specific Functionality
Localization of Human QTLs
Linkage mapping
Genome-wide linkage analysis in families
Disequilibrium mapping
Genome-wide association analysis in families
or unrelateds
Joint linkage/association mapping
Simultaneous use of linkage and association
information in families
Localization of Human QTLs:
Current Status
Accumulated large numbers of QTL localizations
but VERY FEW gene identifications.
QTL region size from linkage: ~10-15 Mb
QTL region size from association: ~500 kb
Identification of the underlying genes will require
deep comprehensive sequencing of these
regions to find functional variants.
QTL Effect Sizes

QTL Type                      Heritability due to Variant   α (sdu): displacement between genotypic means
Rare monogenic                1.0                           >3.5
GWA-derived common variant    <0.01                         <0.15
GWA-Derived eQTL Effect Sizes:
Best Case for Common Variants?
From the San Antonio Family Study
N = 1,240 Mexican Americans
PBMC-derived transcriptional profiles
QTL Effect Sizes

QTL Type                      Heritability due to Variant   α (sdu): displacement between genotypic means
Rare monogenic                1.0                           >3.5
Rare partially penetrant      >0.10 in a given pedigree     0.50 < α < 3.5
GWA-derived common variant    <0.01                         <0.15
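The link between the two columns of the effect-size table can be made concrete with the standard additive-model result that a biallelic variant with allele frequency p and per-allele displacement α (in phenotypic SD units) contributes roughly 2p(1-p)α² to the trait variance. The sketch below assumes Hardy-Weinberg proportions, purely additive effects, and a total variance of about 1; the frequencies and displacements are hypothetical, and treating a pedigree as a small population with its own allele frequency is only a rough approximation.

```python
def variance_explained(allele_freq, alpha):
    """Approximate share of trait variance due to one additive biallelic variant.

    allele_freq: frequency of the variant allele (0 to 1).
    alpha: displacement between adjacent genotypic means, in SD units.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must lie in [0, 1]")
    return 2.0 * allele_freq * (1.0 - allele_freq) * alpha ** 2


# Hypothetical scenarios, for illustration only.
common_gwa = variance_explained(allele_freq=0.5, alpha=0.15)     # common, small effect
rare_population = variance_explained(allele_freq=0.001, alpha=2.0)  # rare, population-wide
rare_in_pedigree = variance_explained(allele_freq=0.05, alpha=1.5)  # same kind of variant, enriched in one pedigree

for label, value in [
    ("common GWA-type variant", common_gwa),
    ("rare variant, whole population", rare_population),
    ("rare variant, within carrier pedigree", rare_in_pedigree),
]:
    print(f"{label}: {value:.4f}")
```

Under these assumptions a large-effect variant that is nearly invisible at the population level can account for a sizeable share of variance inside the pedigree that carries it, which is the contrast the table draws.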
Growing Realization from
GWA Studies
Associated common variants account for only a
small proportion of the observed heritability of a
given trait (disease).
Inference: GWA fails to detect the biological signal
of most functional variants.
What accounts for this “missing heritability”?
Growing Realization from Deep
Sequencing Studies
Vast amount of rare (even private) sequence
variation in human populations.
Ever increasing evidence that rare functional
variants contribute to disease risk.
Most Important DNA Variation is Rare!
We need to find these!
We now find these.
Rare Variant Hypothesis
Human quantitative trait variation has a substantial
component due to the effects of “rare” sequence
variants in multiple genes.
Larger effects of rare variants will make disease-related gene discovery easier. It is easier to
perform required functional experiments on
variants with larger mean displacements.
What is “Rare”?
Classical Population Genetic Definition:
MAF < 0.05 or MAF < 0.01
Private variants are the ultimate “rare” variant.
They are pedigree-specific.
How Will We Detect Rare
Functional Variants?
Major Study Designs of Relevance:
Cases/Controls sampled from the tails of the
distributions to enrich for rare variants.
Pedigree-based studies that naturally exploit the
increased probability of seeing additional
copies of rare variants amongst relatives.
How Can We Study Rare Variants:
Return of the Family Study
Very rare functional variants are best detected (and
their genes identified) using a large pedigree-based design.
Pedigrees allow observation of multiple copies of a
private variant.
Efficient! Allows exact imputation = free sequence!
San Antonio Family Study
Origin: San Antonio Family Heart Study (SAFHS) designed in 1991 to
investigate the genetics of CVD in Mexican Americans
Expanded to > 3,000 individuals from 80 families
3 recalls since 1991
Rich phenotypic data
anthropometry, blood pressure, lipids, obesity, diabetes,
inflammation, oxidative stress, hormones, osteoporosis, brain
structure and function
High density SNPs (1M) currently on all individuals
Genome-wide transcriptional profiles from lymphocytes from Visits 1 and 4
miRNA profiling
WGS (~1040) November 2011, Exome sequencing for all remainders in
progress.
QTL Localization Analysis Can Reduce
the Causal Search Space
Fewer tests = Greater power
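The "fewer tests = greater power" point can be illustrated with a Bonferroni-corrected two-sided z-test: for a fixed signal strength, shrinking the number of tests relaxes the per-test threshold and raises power. This is a sketch under stated assumptions; the noncentrality value and the three search-space sizes are hypothetical and are not taken from the study.

```python
from statistics import NormalDist


def power_at_bonferroni(ncp, n_tests, family_alpha=0.05):
    """Power of a two-sided z-test with mean shift `ncp` after Bonferroni correction."""
    std_normal = NormalDist()
    per_test_alpha = family_alpha / n_tests
    z_crit = std_normal.inv_cdf(1.0 - per_test_alpha / 2.0)
    # Probability that the shifted statistic falls in either rejection tail.
    return (1.0 - std_normal.cdf(z_crit - ncp)) + std_normal.cdf(-z_crit - ncp)


# Hypothetical numbers of variants tested under each localization scenario.
search_spaces = {
    "single candidate gene": 100,
    "linkage region": 10_000,
    "whole genome": 1_000_000,
}
powers = {name: power_at_bonferroni(ncp=4.0, n_tests=m) for name, m in search_spaces.items()}

for name, value in powers.items():
    print(f"{name}: power = {value:.3f}")
```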
We’ve Been Missing the Rare Variant Signals:
Pedigree-Specific and Lineage-Specific
Analysis Recovers Them
[Figure: power (y-axis, 0.0 to 1.0) versus -lg p-value cut-off (x-axis, 1 to 8) for four analyses: linkage analysis of all pedigrees; linkage analysis of pedigree carrying private variant; measured genotype analysis of IBD status (relative to private variant carrying founder); measured genotype analysis of private variant. Assumes α = 2.]
Pedigree-Specific Analysis to Detect
Rare Variants
Analysis of spatial working memory in
SAFS. Localizes one QTL at
chr12:143cM with LOD = 3.64
Pedigree-specific analysis reveals three
additional QTLs:
Chr 8:67cM LOD = 4.09
Chr 4:22cM LOD = 3.85
Chr 13:12cM LOD = 3.09
Classical QTL linkage analysis misses rare variant signals due to signal
attenuation induced by assumption of single QTL-specific
heritability.
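To put the LOD scores above on a p-value scale, a common conversion multiplies the LOD by 2 ln(10) to get a likelihood-ratio statistic and then refers it to a 50:50 mixture of a point mass at zero and a 1-df chi-square. That mixture is the usual assumption for variance-component linkage tests; the slides do not state which null distribution was used, so treat the sketch below as an approximation.

```python
import math


def lod_to_pointwise_p(lod):
    """Approximate pointwise p-value for a LOD score (50:50 mixture null assumed)."""
    if lod <= 0.0:
        return 1.0
    chi2_stat = 2.0 * math.log(10.0) * lod
    # Upper tail of a 1-df chi-square equals erfc(sqrt(x / 2)); halve it for the mixture.
    return 0.5 * math.erfc(math.sqrt(chi2_stat / 2.0))


# LOD scores quoted on the slide above.
for locus, lod in [("chr12:143cM", 3.64), ("chr8:67cM", 4.09),
                   ("chr4:22cM", 3.85), ("chr13:12cM", 3.09)]:
    print(f"{locus}: LOD = {lod:.2f}, pointwise p ~ {lod_to_pointwise_p(lod):.1e}")
```

These are pointwise values; genome-wide significance additionally requires correcting for the scan across the genome.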
Whole Genome Sequencing in the SAFS
T2D-GENES Consortium
NIH U01: DK085524, DK085584, DK085501,
DK085526, DK085545
Principal Investigators:
DK085524: Duggirala, Blangero, Lehman
DK085584: Boehnke, Abecasis
DK085501: Hanis, Cox, Bell
DK085526: Altshuler, Meigs, Wilson
DK085545: McCarthy, Seielstad
T2D-GENES Project 2: Whole Genome
Sequencing in the SAFS
Main task: Detect rare (even private) functional variants
influencing diabetes risk and diabetes-related phenotypes
Detection of private functional variants requires multiple copies
leading to need for large pedigrees
Assessed available pedigrees for potential to generate large
number of copies of private variants, sequencing efficiency,
diabetes prevalence
Sequencing performed at Complete Genomics. ~600 samples at
40x coverage. ~500 completed to date.
San Antonio Family Study: Pedigree 1
Requires 81 direct WGS to obtain 129 total WGS
171 subjects, 129 measured
SAFS Pedigree 1: Founder 1 Lineage
105 possible copies of founder private variant
Distribution of Private Variant Copies
Founders in Pedigree 1
Founder 1: Prob(≥10 copies) = 0.43
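A probability such as Prob(≥10 copies) can be estimated by gene dropping: place one copy of a private variant in a founder and simulate Mendelian transmission down the pedigree many times. The sketch below is a minimal Monte Carlo version on a made-up three-generation pedigree. The pedigree, the function name, and the threshold used are all hypothetical; it does not reproduce the SAFS structure or the 0.43 figure, and it counts carriers rather than distinguishing homozygotes.

```python
import random


def prob_at_least_k_carriers(pedigree, founder, k, n_sims=10_000, seed=0):
    """Estimate P(at least k carriers) of a variant introduced once in `founder`.

    pedigree: dict mapping child -> (parent_a, parent_b), with parents listed
    before their children. Individuals absent from the dict (married-in
    spouses) are treated as non-carriers.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        carrier = {founder: True}
        for child, parents in pedigree.items():
            # Each carrier parent transmits the variant with probability 1/2.
            carrier[child] = any(
                carrier.get(parent, False) and rng.random() < 0.5
                for parent in parents
            )
        if sum(carrier.values()) >= k:
            hits += 1
    return hits / n_sims


# Hypothetical pedigree: founder F1 has 4 children, each of whom has 3 children.
toy_pedigree = {}
for i in range(4):
    child = f"C{i}"
    toy_pedigree[child] = ("F1", "spouse_of_F1")
    for j in range(3):
        toy_pedigree[f"G{i}{j}"] = (child, f"spouse_of_{child}")

p_five_or_more = prob_at_least_k_carriers(toy_pedigree, "F1", k=5)
print(f"Estimated P(>= 5 carriers) in toy pedigree: {p_five_or_more:.3f}")
```

Larger and deeper pedigrees push more probability mass toward high copy counts, which is why pedigree size and structure drive the choice of families to sequence.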
WGS in Mexican American Pedigrees
PedID    DNA     SEQ    Founders   Max copies   Pr(≥10)
6        64      43     28         44           0.762
5        69      40     28         58           0.943
8        68      31     21         32           0.82
3        78      39     25         53           0.876
17       42      25     17         39           0.63
2        87      48     30         49           0.919
10       64      42     26         43           0.735
4        64      41     29         48           0.821
27       35      18     12         30           0.373
20       36      21     15         18           0.17
47       22      12     7          19           0.164
21       35      20     15         21           0.237
7        62      32     25         53           0.782
9        60      29     26         44           0.884
11       62      31     17         54           0.824
16       48      30     17         39           0.676
14       40      24     19         23           0.436
15       42      29     20         38           0.712
23       32      18     14         21           0.237
25       33      21     12         30           0.650
Total    1043    594    403        37.8         1
Power to Detect a Private Variant
by Localization Hypothesis
[Figure: power to detect a private variant under four localization hypotheses: Gene, GWA, Linkage, WGS. Assumes α = 1.]
Power to Detect a Private Variant
by Effect Size
[Figure: power to detect a private variant for α = 1.5, α = 1, and α = 0.5.]
SNP Summary: 483 WGS
QC Class    # SNPs        % in dbSNP   Ts/Tv
Passed      25,868,029    29.2         2.11
Failed      943,387       33.3         1.57

# Copies    # SNPs
1           7,494,983
2           3,496,336
3           1,765,907
≥4          13,110,803
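The Ts/Tv column in the QC summary is the ratio of transitions (A<->G, C<->T) to transversions (all other single-base changes). A ratio near 2 is commonly expected for human whole-genome SNP calls, and a noticeably lower value is often read as a sign of false positives. The sketch below shows how the ratio is computed from (reference, alternate) allele pairs; the function name and the example calls are hypothetical.

```python
BASES = set("ACGT")
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def ts_tv_ratio(snps):
    """Return transitions / transversions for an iterable of (ref, alt) base pairs."""
    n_ts = n_tv = 0
    for ref, alt in snps:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a single-base substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            n_ts += 1
        else:
            n_tv += 1
    if n_tv == 0:
        raise ZeroDivisionError("no transversions observed; ratio is undefined")
    return n_ts / n_tv


# Hypothetical calls: four transitions and two transversions.
example_calls = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
example_ratio = ts_tv_ratio(example_calls)
print(f"Ts/Tv = {example_ratio:.2f}")
```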
Sequence Variation in Candidate Genes for
Psychiatric Diseases Identified in WGS (n=483)
Gene      Size (bp)   Number of Variants   Fraction Novel   Number of NS Variants
NRG1      1125291     41634                0.52             41
DISC1     512618      97826                0.56             376
DTNBP1    140231      4302                 0.60             4
FGF14     680920      8046                 0.50             2
COMT      28234       1889                 0.49             39
Example: Pallidum Volume
Subcortical structure associated
with modulation of motor actions
and cognitive functions.
h² = 0.718
p = 1.2×10⁻²²
N = 820
GWA: Pallidum Volume
Chromosome 14q13.3
rs3850281
p = 3×10⁻⁸
MAF = 0.174
β = 0.306 sdu
Pleiotropic Effects of rs3850281 on
Subcortical Regions
Subcortical Region      P-value     Effect Direction
Accumbens               0.0044      ↑
Amygdala                0.00062     ↑
Caudate                 0.0035      ↑
Hippocampus             0.0131      ↑
Putamen                 0.000074    ↑
Thalamus                0.00096     ↑
Ventral Diencephalon    0.00060     ↑
Chromosome 14 QTL: Pallidum
NKX2-1
NKX2-1 is a transcription factor that plays a major role in forebrain
development and in the pallidum.
Haploinsufficiency leads to disturbances of motor abilities and delayed
speech.
Sequence Variants in NKX2-1:
191 individuals
Previously unknown rare variant
Conclusions
Family studies will become important (again!) since
they are an optimal design for studying the effect of
rare functional variants on quantitative variation.
Family-based designs are efficient for whole genome
sequencing due to ability to impute sequence of nonfounders.
The causal state space of the genome is finite. WGS
allows us to comprehensively assess it.
WGS will greatly speed causal variant/gene
identification but we’ll have to intelligently sort
through vast amounts of data to achieve success.
Acknowledgements
Texas Biomed
John Blangero
Claire Bellis
Eugene Drigalenko
Matthew Johnson
Harald Göring
Thomas Dyer
Melanie Carless
Juan Peralta
Complete Genomics
Steve Lincoln
Jason Laramie
Rick Tearle
Jack Kent Jr.
Joanne Curran
Michael Mahaney
Eric Moses
Anthony Comuzzie
Vince Diego
Marcio Almeida
Yale
David Glahn
Anderson Winkler
UTHSCSA
Rene Olvera
Peter Fox
T2D-GENES: WGS Project Team
Goo Jun, Tanya Teslovich,
Andy Wood, Tim Frayling,
Christian Fuchsberger
Dan Nicolae, Jason Grunstad,
Bob Grossman + OTHERS
T2D-GENES Consortium
Mike Boehnke
Gonçalo Abecasis
David Altshuler
Nancy Cox
Mark McCarthy
Craig Hanis
Jose Florez
Graham Bell
Mark Seielstad
Donna Lehman