Genome- and Phenome-Wide Analysis of Cardiac Conduction

DOI: 10.1161/CIRCULATIONAHA.112.000604
Genome- and Phenome-Wide Analysis of Cardiac Conduction Identifies
Markers of Arrhythmia Risk
Running title: Ritchie et al.; QRS GWAS and PheWAS in electronic records
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Marylyn D. Ritchie, PhD1*; Joshua C. Denny, MD, MS2,3*; Rebecca L. Zuvich, PhD4*;
Dana C. Crawford, PhD4,5; Jonathan S. Schildcrout, PhD6; Lisa Bastarache, MS2; Andrea H.
Ramirez, MD3; Jonathan D. Mosley, MD, PhD3; Jill M. Pulley, MBA7; Melissa A. Basford,
MBA7; Yuki Bradford, MS5; Luke V. Rasmussen8; Jyotishman Pathak, PhD9; Christopher G.
Chute, MD, DrPH9; Iftikhar J. Kullo, MD10; Catherine A. McCarty, PhD11; Rex L. Chisholm,
PhD12; Abel N. Kho, MD, MS13; Christopher S. Carlson, PhD14; Eric B. Larson, MD, MPH15;
Gail P. Jarvik, MD, PhD16,19; Nona Sotoodehnia, MD, MPH17,18 on behalf of the CHARGE QRS
Group; Teri A. Manolio, MD, PhD21; Rongling Li, PhD21; Daniel R. Masys, MD20;
Jonathan L. Haines, PhD4,5; Dan M. Roden, MD3,22
1
Dept of Biochemistry and Molecular Biology, The Pennsylvania State University, Uni
University
nive
vers
ve
rsit
ity
it
y Pa
Park
Park,
rk,, PA
rk
PA;;
2
Dept of Biomedical Informatics, 3Dept of Medicine; 4Dept of Molecular Physiology and Biophysics;
5
Center for Human Genetics Research; 6Dept of Biostatistics, 7Office of Research; 22Dept of
Pharmacology,
Vanderbilt
Medicine,
Phar
arma
maco
ma
colo
co
l gy
lo
gy, Va
and
n erbilt University School of Me
edi
dici
c ne, Nashville, TN; 8De
ci
Dept
p of Preventive Medicine;
122
13
Dept
D
De
eppt
pt ooff Ce
Cell
ell and
ndd M
Molecular
olecular Biology; Dept
p of Me
Medicine,
edi
d cine, Northweste
Northwestern
ern University
Uniive
versity
y School of Medicine,
9
10
Chicago
C
Chic
hiccag
a o IL; Di
Divs
D
ivs
vs ooff Bi
Biom
Biomedical
omed
om
edic
ed
i al IInformatics
ic
nfor
nf
orma
or
mati
ma
tics
ti
cs aand
nd S
Statistics;
tattisstics;
ta
s;; Di
Div
v off C
Cardiology,
ardi
ar
d ol
di
olog
ogy, M
og
Mayo
ayo
ay
o Cl
C
Clinic,
inic
in
ic, Ro
ic
Roch
Rochester
ches
ch
e ter
es
MN;
MN; 11Esse
Essentia
ent
ntiia IInstitute
nsstiitu
tutte
te ooff Ru
Rura
Rural
rall Hea
H
Health,
ealth
h, Du
Duluth,
ulu
uth, MN
M
MN;; 1144Fred
Fred
dH
Hutchinson
utcchin
ut
chinson
n Canc
C
Cancer
anccer R
Research
esseaarc
rch
h Cent
C
Center,
enteer,,
Seattle,
WA;
S attle, W
Se
A 1155G
A;
Group
rouup He
Health
ealth
hR
Research
e eaarc
es
rchh Institute,
Instiitu
ute, S
Seattle,
eaatt
ttle
lee, WA
WA; 16D
Div
iv of M
Medical
ed
dic
icaal G
Genetics;
enetticcs; 1177D
Div
iv of
of
18
19
Cardiology;
Card
Ca
rdio
rd
i lo
io
ogy
gy;; Ca
Cardiovascular
Card
r io
rd
ovaasc
scul
u ar H
Health
ealt
ea
lth
lt
h Rese
R
Research
eseear
arcch
ch U
Unit,
niit, D
Dept
ep
pt of M
Medicine;
edic
ed
iccin
ne; De
Dept
p ooff Ge
pt
G
Genome
nom
no
me S
Sciences;
ciien
encces;
s;;
20
21
Dept
De
pt off Bi
Biom
Biomedical
med
ediccal aand
nd Health
Hea
e lt
lth
h Informatics,
Info
In
formatic
ics,
s, U
University
nive
ni
vers
r ity
y of W
Washington,
ashi
as
hing
n to
on, S
Seattle,
eattlee, WA
ea
WA;; 21
National
Na
ati
tion
onal
al H
Human
uman
um
a
Genome
Research
Health,
Bethesda,
Geno
Ge
noome R
essea
e rc
rch
h Institute,
Inst
In
s it
st
itut
utte,
e National
Nat
atio
io
ona
nall Institutes
Inst
In
s it
st
itut
uttes ooff He
eal
alth
th
h, Be
eth
thes
esda
es
d , MD
da
*Contributed
*Con
*C
ontr
trib
ibut
uted
ed eequally
qual
qu
allly
Address for Correspondence:
Dan M. Roden, MD
Professor of Medicine and Pharmacology
Director, Oates Institute for Experimental Therapeutics
Assistant Vice-Chancellor for Personalized Medicine
Vanderbilt University School of Medicine
1285 Medical Research Building IV
Nashville, TN 37232-0575
Tel: 615-322-0067
Fax: 615-343-4522
Email: [email protected]
Journal Subject Codes: Basic science research:[132] Arrhythmias - basic studies, Diagnostic
testing:[106] Electrophysiology, Etiology:[5] Arrhythmias, clinical electrophysiology, drugs
1
DOI: 10.1161/CIRCULATIONAHA.112.000604
Abstract:
Background—Electrocardiographic QRS duration, a measure of cardiac intraventricular
conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote
reentrant arrhythmias.
Methods and Results—We performed a genome-wide association study (GWAS) to identify
genomic markers of QRS duration in 5,272 individuals without cardiac disease selected from
electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and
Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE
consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms in 5
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were
g
in the chromosome 3 SCN5A and SCN10A loci,, where the most significant
SNPs were rs1805126
n SCN5A with p=1.2x10-8 (eMERGE) and p=2.5x10-20 (CHARGE) and rs6795970
rs679597
9770 in SCN10A
SCN1
SC
N10A
N1
0A
in
with p=6x10-6 (eMERGE) and p=5x10-27 (CHARGE). The other loci were in NFIA, near
CDKN
CDKN1A,
N1A, and near C6orf204. We then performed phenome-wide
p enome-wide association studies (PheWAS)
ph
these
European
Americans
search
diagnoses
on
n vvariants
ariantts in the
aria
hese
se ffive
iv
ve lo
loci
c inn 13
113,859
,8
859
9 Eur
u opea
e nA
meriica
cans
n to se
earch
h ffor
or dia
agn
gnos
o es associated
asssocciated
w
with
ith
h these markers.
mar
arke
kerss. PheWAS
ke
PheW
Ph
eW
WAS iidentified
dennti
de
ntifie
ifiedd atri
atrial
ial fibr
fibrillation
rilllatiion aand
nd ccardiac
arrdiac aarrhythmias
rrhy
rr
hyth
thm
th
miass aass th
mias
the most
mos
co
common
omm
mon ass
associated
sssoc
o ia
iate
tedd di
diag
diagnoses
gnose
sees wi
w
with
th
h SC
SCN
SCN10A
N10A
N10A
A and
and SC
SCN5A
SCN5
CN5
N5A
A va
variants.
ariian
nts. SCN
SCN10A
N10A va
N10A
variants
ari
rian
ants
ts we
were
ere al
ere
also
lso
so
associated
original
asso
oci
ciat
ated
ed with
witth subsequent
subs
bseq
equent
nt development
dev
evel
elop
o ment
nt of
of atrial
atri
at
rial
al fibrillation
fibri
rill
llat
atio
ionn and
an arrhythmia
arrh
ar
rhyt
y hm
mia iin
n th
the or
orig
i in
nal 55,272
,2272
7
“heart-healthy”
study
population.
“heart-healt
thy
hy”” st
stud
udyy po
ud
popu
pu
ula
lattion
n.
Conclusions—We conclude that DNA biobanks coupled to EMRs provide a platform not only
for GWAS but may also allow broad interrogation of the longitudinal incidence of disease
associated with genetic variants. The PheWAS approach implicated sodium channel variants
modulating QRS duration in subjects without cardiac disease as predictors of subsequent
arrhythmias.
Key words: cardiac conduction, QRS duration, atrial fibrillation, genome-wide association
study, phenome-wide association study, electronic medical records
2
DOI: 10.1161/CIRCULATIONAHA.112.000604
Introduction
Electrocardiographic (ECG) parameters of cardiac conduction and repolarization, PR, QRS, and
QT intervals, are widely used in clinical medicine and display substantial variability when
measured across large populations. QRS duration represents activation time in the cardiac
ventricle, and prolongation of this interval – representing global or regional slow conduction –
has been associated with adverse outcomes such as sudden cardiac death.1 This variability
reflects modulators such as abnormal electrolytes, underlying heart disease, or concomitant drug
therapy, as well as heritable components; published estimates suggest that up to 40% of
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
variability in the QRS interval is heritable.2–4 Using automated methods described further below,
we have shown that 99% of QRS durations in over 30,000 normal subjects not receiving
reece
ceiv
i in
iv
ingg
confounding medications fall between 65 to 108 msec.5
DNA
DN
A repo
repositories
possito
po
sit ries linked to Electronic Me
Medi
Medical
ical Record (EM
(EMR)
EM
MR) ssystems
ys
ystems
have been
proposed
prop
posed as on
one
ne po
potential
oteentia
iaal sour
ssource
our
urce
ce ooff su
subj
subjects
bjeectss ffor
or analyzing
ana
nalyzzin
zing
ng the
the relationship
rela
re
lati
tion
onsh
s ip
ip bbetween
ettwe
weeen
en ggenetic
en
nettic
ic
–10
–10
variation
vari
va
riat
ri
atio
at
ionn and
io
an
nd a range
rang
ngee of
of human
huma
hu
mann traits.
ma
trai
tr
aits
ai
ts.66–10
The
The advantages
adv
dvan
nta
tage
gess of
ge
of this
thi
h s approach
ap
ppr
p oach
ch may
may include
inc
nclu
ludee rapid
lu
rap
a id
d
generation ooff patient
pati
pa
tien
entt sets
en
s ts
se
t for
for study
stu
udy (since
(si
s nc
ncee electronic
elec
el
ectr
ec
trron
nic
i ddata
atta ar
aree al
alre
read
re
ad
dy in pplace),
lac
ace)
e),, an
andd th
thee ability to
already
study large numbers of subjects accrued without bias with respect to factors such as disease or
age. The National Human Genome Research Institute’s electronic MEdical Records and
GEnomics (eMERGE) Network11 has, as one of its primary goals, the evaluation of the utility of
EMR systems coupled to DNA repositories as a tool for genome science. Initial studies from
eMERGE sites support the potential utility of EMR systems for discovery and validation of
genotype-phenotype associations.12–16
We report here a GWAS of QRS duration in European-descent subjects whose first ECG
in an EMR system was normal, and who at the time of the ECG lacked evidence of heart disease,
3
DOI: 10.1161/CIRCULATIONAHA.112.000604
potentially confounding medications, and electrolyte abnormalities. We then used a phenomewide association study (PheWAS)16,17 to demonstrate that polymorphisms associated with QRS
variability in subjects without cardiac disease were also associated with subsequent diagnoses of
cardiac arrhythmias. This coupling of GWAS and PheWAS further validates the relationship
between abnormal conduction and arrhythmias, and points to the development of genome-based
predictors of arrhythmia susceptibility.
Methods
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Normal QRS algorithm
We developed and deployed an algorithm to identify individuals with normal ECG
CGss an
CG
nd wi
with
thou
th
o t
ECGs
and
without
any cardiac disease, abnormal electrolyte values, or QRS-active medications – across the five
eM
MER
ERGE
GE-1
GE
-1 ssites
ites
ess iidentified
d ntified 5,272 Caucasian pati
de
tien
ti
e ts (2,488 male
en
es an
nd 2,78
22,784
,7 4 females; Table 1).
eMERGE-1
patients
males
and
T
hee algorithm
a gorithm
al
m was
wass developed
d ve
de
velo
lope
lo
pedd and
pe
and validated
va idaatedd inn thee S
vali
ynnth
ntheti
heticc De
D
rivvati
tive
vee ((SD),
SD
D), a dde-identified
e-id
iden
id
en
ntiifi
fied
ed
d
The
Synthetic
Derivative
mag
agee of tthe
he V
an
nderb
derbiiltt EM
EMR
R th
that
at ccurrently
urrrent
ur
rent
ntly
ly co
ont
n ainns oover
verr 12
ve
1200 m
ill
lllio
ionn do
ocu
ume
ment
ntss on
on aabout
boout 2
image
Vanderbilt
contains
million
documents
million patients.
patien
en
nts
ts.7 T
The
h S
he
SD
D is rrefreshed
e re
ef
resh
s ed
sh
e rregularly
egul
eg
ular
ul
arly
ar
ly tto
o ad
addd ne
new
w cl
cclinical
inic
in
icall iinformation
ic
nfor
nf
orma
or
mati
ma
tion
on ffrom
rom
ro
m th
tthee EMR aass
it is accrued.
The study population consisted of subjects with a normal ECG without evidence of
cardiac disease any time before or within one month following the ECG, concurrent use of
medications that interfere with ventricular conduction, and who did not have abnormal
potassium, calcium, or magnesium lab values at the time of the ECG. The algorithm has been
described in detail previously.13 The algorithm used natural language processing (NLP)18,19 to
analyze narrative text, billing code queries, and lab queries to exclude any subjects with evidence
of arrhythmia, heart failure, cardiomyopathy, myocardial ischemia/infarct, or cardiac conduction
4
DOI: 10.1161/CIRCULATIONAHA.112.000604
defect. The algorithm considered all physician-generated clinical documentation, including
clinical notes and cardiologist-generated ECG impressions. Patients with family histories of
cardiac disease were allowed by the NLP algorithm. In addition, ECGs had normal Bazett’s
corrected QT intervals (<450ms), heart rates (between 50-100 bpm), and QRS (60-120 ms). The
algorithm was reviewed by two physicians not involved in algorithm development, and achieved
a PPV of 97% to identify patients with normal ECGs who did not have known exclusions on a
random selection of 100 subjects.13 Analysis of clinical covariates in ~30,000 records with
algorithm-defined normal ECGs identified gender and ancestry as modulators of QRS duration.5
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Complete details of the algorithm are available from PheKB (http://phekb.org/).
The final algorithm was applied in BioVU, the Vanderbilt DNA databank
nkk that
tha
h t li
link
links
nkss DN
nk
D
DNA
NA
extracted from discarded blood samples to the SD.7 Patients at Vanderbilt were genotyped
peccif
ific
ical
ally
lly ffor
or tthe
he ppurpose
urpose of studying normal QRS
QRS duration. Th
he algo
goori
rith
t m was then deployed
specifically
The
algorithm
across
ac
cro
oss
s the DNA
NA
A rrepositories
epos
osit
i orie
orie
iess at the
thee oth
oother
ther four
fouur eMERGE-I
eME
ERGEE-II sites
Esiitees (Marshfield
(M
Mar
arsh
shhfi
f eldd Clinic,
Clin
Cl
inic
icc, Northwestern
Norrthw
No
rthw
hwes
essteern
University,
Un
niv
iver
ersi
er
sity
ty
y, Ma
M
Mayo
yo C
yo
Clinic,
linnic,
li
c, G
Group
ro
oup H
Health
ealt
ea
lthh Re
Rese
Research
sear
arch
ar
c In
ch
Institute)
ns itu
nsti
ute
te)) to
o iidentify
deent
ntif
ifyy su
if
subj
subjects
bjec
ecctss w
with
ithh ex
it
extant
xtaant
n
eMERGE-based
eMERGE-bas
assed ggenotyping
enot
en
ottyp
y in
ng da
ddata
ta ((based
baseed oon
ba
n ot
othe
other
h r ph
he
phen
phenotypes;
e ot
en
otyp
ypes
yp
es;; as sho
es
shown
hown
ho
wn iin
n Table
Tabl
Ta
blee 11)) who
bl
w o met
wh
algorithm-defined criteria for normal QRS. The eMERGE cohorts are described in more detail in
McCarty et al. 201111 and at the Phenotype Knowledge Base (PheKB.org). Thus, all eMERGE
individuals used in the analysis underwent the same algorithm to select those with normal ECGs
and without prior heart disease, interfering medications, and abnormal electrolytes. To assess the
performance of the algorithm when applied within external EMR systems, trained chart
abstracters at Northwestern and Marshfield reviewed randomly-selected subsets of 100 subjects
at Marshfield and 45 subjects at Northwestern to determine the algorithm’s accuracy at external
sites. Northwestern’s evaluation also included an independent review by a board-certified
5
DOI: 10.1161/CIRCULATIONAHA.112.000604
internal medicine physician, with discrepancies resolved by consensus. This study included only
subjects designated as “non-Hispanic white” European American in the EMR from each site.
We have previously shown the EMR ancestry performs similar to self-report.20
This study was approved by each site’s Institutional Review Board. Because BioVU is
de-identified and accrues individuals through left-over blood remaining after routine clinical
testing, it operates as non-human subjects research according to the provisions of 45 CFR 46, as
described previously.7 Individuals at other eMERGE sites were consented as part of each site’s
DNA biobank.11
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Genotyping and data analysis
Genotyping was performed at the Center for Genotyping and Analysis at the Bro
Broad
oad In
Inst
Institute
sttittut
utee an
and
the
he Center for Inherited Disease Research (CIDR) at Johns Hopkins University. Samples of
European
Eu
uro
rope
pean
pe
an ancestry
anc
ncestr
trry or
or unknown ancestry were analyzed
ana
naly
na
yzed using thee Illumina
I lumi
Il
minna
mi
na Human660WQuadv1_A
Q
uaadv1_A
ad
genotyping
gen
enot
otyypin
ingg platform,
plat
pl
atfo
at
forrm,
fo
rm, consisting
cons
co
n istting off 561,490
ns
561,,4900 SNPs
SNPs and
and 95,876
95,
5 87
8 6 in
intensity-only
nte
tens
nsit
ityy-oonly
only pprobes.
robe
ro
bes.
be
s
Data
were
cleaned
using
(QC)
pipeline
Genomics
Da
ata
t w
eree cl
er
clea
eane
need us
usi
ing th
ing
the qu
qquality
ali
lity
ty ccontrol
ontr
ontr
trol
ol (QC
QC)) pi
QC
ipeeli
line
ne ddeveloped
eve
velo
lo
ope
pedd bby
y tthe
hee eeMERGE
MERG
ME
RGE
RG
E Geno
G
eno
omics
mic
Working Grou
Group.
ouup.
p 21 T
This
hiis pr
pprocess
oces
oc
e s in
es
incl
includes
cllud
u es eevaluation
valu
va
luat
lu
a io
at
ionn of
o sa
samp
sample
mple
mp
le aand
n m
nd
marker
arke
ar
kerr ca
ke
ccall
lll rrate,
ate,
at
e, ggender
e der
en
mismatch and anomalies, duplicate and HapMap concordance, batch effects, Hardy-Weinberg
equilibrium (HWE), sample relatedness, and population stratification. After QC, 528,508 SNPs
were used for analysis based on the following QC criteria: SNP call rate >99%, sample call rate
>99%, minor allele frequency > 0.0001, unrelated samples only (removing all parent-offspring,
full and half siblings), and individuals of European-descent only (based on STRUCTURE22
analysis of >90% probability of being in the CEU cluster).
Each eMERGE site used the QC pipeline to clean their initial datasets prior to merging
all the samples. QC procedures were then performed on the merged eMERGE dataset in which
6
DOI: 10.1161/CIRCULATIONAHA.112.000604
data from all five sites were combined, and no significant differences across sites or genotyping
center were identified. As well, all sites had comparable QC results including similar SNP and
sample call rates, HWE p-values overall, and minor allele frequencies. The detailed QC report
on the merged dataset will be deposited in dbGaP along with the merged dataset.
Single-locus tests of association were performed using linear regression assuming an
additive genetic model for all 528,508 SNPs in a total of 5,272 individuals with a normal QRS
duration. Our studies of ECG intervals in 32,949 normal individuals identified sex as a major
modulator of normal QRS duration, with minor effects of age and ancestry.5 All analyses were
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
performed unadjusted and then adjusted for age, sex, BMI, and the first principal component
from Eigenstrat23 to adjust for potential population stratification, without signifi
significantly
icant
cant
ntly
ly
y changing
cha
hang
ngin
ng
ingg
the
he key results. Since only sex is significantly associated with QRS duration via the literature, we
report
epo
port
rt tthat
hatt here.
ha
herre. Analyses
he
Anal
An
a yses were also performed aadjusting
djusting for heigh
height
gh
ht and
and/or
nd
d/o
/or BMI, but these did not
no
change
ch
hange
an
n the res
results.
sullts.
Associations
Asso
As
soci
so
ciat
atiionns
ns with
witth the
t e lowest
th
lo
owe
west
st P values
val
alue
ues (p<10
ue
(p<
(p
<100-4
) were
were then
the
henn submitted
suubm
bmittted
ted to
t the
the recent
recen
en
nt QRS
QR
Cohorts for Heart
Hear
He
a t and
ar
and Aging
A in
Ag
ng Research
Reese
sear
arrch in
in Ge
Geno
nomi
no
m c Epidemiology
mi
E id
Ep
dem
emio
iolo
io
logy
lo
g ((CHARGE)
gy
CHAR
CH
ARGE
AR
G ) meta-analysis
GE
meta
me
ta-a
ta
-analysis
Genomic
group24 and a table of P-values from that analysis was generated. The CHARGE meta-analysis
of QRS duration has been described in detail previously24; briefly, it involved 40,407 individuals
selected from 15 sites restricted to those of European ancestry. Individuals with prior myocardial
infarction, heart failure, arrhythmias, pacemakers, antiarrhythmic medication use, or whose QRS
durations >120 ms were excluded.
To calculate the variance explained by all SNPs in the dataset, we analyzed the data using
GCTA.25 Only those SNPs with minor allele frequency > 0.01, genotyping efficiency > 99.9 and
HWE > 0.001 were included in the analysis (n=505,502 SNPs). The genetic relationship matrix
7
DOI: 10.1161/CIRCULATIONAHA.112.000604
(GRM) was computed for all 5272 subjects and all SNPs using GCTA. In order to eliminate
possible cryptic relationships, subjects with GRM>0.25 were pruned from the analysis, which
removed 310 subjects. The proportion of variance explained by either all SNPs or all SNPs
excluding the subset of 23 SNPs significant in the CHARGE GWAS was computed on the
remaining subjects for QRS duration. We compared this to a linear regression analysis using the
five SNPs in Table 2 to estimate the proportion of variance explained by these loci. All analyses
were adjusted for age, gender and the first principal component (previously computed).
Phenome-wide association study of QRS-associated SNPs
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
We selected the most significant SNP associations for analysis by PheWAS.16,17 For this
analysis, we combined the entire eMERGE cohort of European American individ
individuals
id
duaals ((n=13,859)
n=13
n=
13,8
13
,8559)
identified
dentified across the five eMERGE sites. These individuals represent a superset of the 5,272
individuals
ndiivi
vidu
dual
du
alss with
al
w th
wi
h nnormal
or
ormal
ECGs and without hear
heart
rt disease
dissease usedd forr the
di
the GWAS.
GWAS. To define
diseases,
Classification
ddise
i eas
a es, we qqueried
ueri
ue
r ed aall
lll IInternational
nter
nt
errna
nati
tion
onaal Cla
on
C
lassiificcation
on off Disease
Diseeasse (ICD),
Dise
(ICD),
CD) 9thh eedition,
diti
di
tion
ti
on, codes
co
ode
d s from
from
m the
the
respective
esp
spec
ecti
ec
t ve EMRs
ti
EMR
MRss of
of the
the five
fiv
ive eMERGE
eMER
eM
ERGE
ER
GE sites.
sit
ites
es..
es
The PheWAS
PhheW
eWAS
AS software
sof
o tw
war
aree uses
uses
e occurrences
occcur
urre
renc
re
nces
nc
e of
es
of ICD
ICD codes
code
co
dess to classify
de
cla
lass
la
ssif
ss
iffy each
each
h person
per
erso
sonn as having
so
one or more of 778 possible clinical phenotypes (typically diseases). For each disease, the
PheWAS algorithm constructs a control population by selecting all patients that do not have the
case disease or closely related diseases (e.g., a patient with a bundle branch block cannot serve as
a control for complete heart block). The PheWAS methodology has previously been validated
through rediscovery of known associations.16,17 Analysis of each phenotype then proceeds using
a pairwise analysis of all case and control groups for each tested SNP (n=23). We have observed
that positive predictive values increase when individual codes are present more than once in the
EMR, and here we required each case to have at least four instances of the same ICD code in a
8
DOI: 10.1161/CIRCULATIONAHA.112.000604
PheWAS case group. In addition, we did not analyze phenotypes occurring in less than 50
patients (a prevalence of 0.36% in the dataset). Association analyses were performed with
PLINK using logistic regression adjusted for age, gender, and the first three principal component
analyses as calculated by Eigenstrat, since on this larger population, the third principal
component was statistically significant.23 Analysis adjusted with and without principal
components did not substantively change the results. After identification of PheWAS case and
control groups using the PheWAS software, the association analyses were performed using
PLINK.26
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Survival analysis of QRS population
Following PheWAS analysis, we analyzed the original set of 5272 patients that m
met
ett ou
our
ur
algorithm definition for normal cardiac conduction/normal heart for subsequent development of
atrial
at
tri
rial
al ffibrillation
ibri
ib
rill
ri
llat
ll
atio
ti n an
andd cardiac arrhythmias with the
he SSCN5A
CN5A rs1805126
rs18051
51
126 and
and SCN10A rs6795970
SNPs.
S
SNP
NPs. Phenotype
Phenottyp
ypee definitions
defi
defi
fini
niti
tion
ti
onss were
on
weree drawn
dra
rawn
wn from
fro
om thee P
PheWAS
heW
eWA
eW
AS aanalysis
naaly
lysi
siss uusing
siing bi
bill
billing
llin
ingg code
ccodes.
ode
d s..
Kaplan-Meier
proportional
models
starting
Ka
apl
plan
an-M
an
-Mei
eieer
ei
er aanalysis
naly
lysi
ly
siss and
and Cox
Cox pr
prop
opor
orrtiion
onaal hhazard
a ardd mode
az
m
ode
dells
ls were
werre ccalculated,
a cu
al
ulaate
ted,
d, uusing
sing
si
ng tthe
hee st
tartiing
tart
i ng
time
ime as the initial
in
nit
itia
iaal normal
norm
no
rm
mal
a ECG
ECG with
wit
i h a time-to-event
time
ti
me-t
me
-ttoo eveent
n analysis.
anaaly
lysi
sis.
si
s. C
Cox
ox pproportional
ropo
ro
porrti
po
tion
o al hhazard
on
azar
az
ardd models
ar
were adjusted for age, sex, principal components as calculated above, and QRS duration.
Results
Population identification
We identified 5,272 Caucasian patients (2,488 males and 2,784 females; Table 1) across the five
eMERGE-I sites. The positive predictive value (PPV) of the automated phenotype algorithm to
find cases with normal ECGs and without exclusions at the development site, Vanderbilt, to
identify study subjects was 97% (95% confidence interval [CI] 91-99%).13 The PPV at
9
DOI: 10.1161/CIRCULATIONAHA.112.000604
Northwestern University and Marshfield Clinic were 97% (95% CI 83%-100%) and 100% (95%
CI 96%-100%), respectively. Combining all reviewed samples across the three sites, the PPV
would be 98% (95% CI 96%-100%). The mean QRS duration was 87.9 msec (standard deviation
9.5 msec; median 88.0 msec; Figure 1A).
GWAS results
A total of 528,508 SNPs passed quality control of eMERGE-supported Illumina 660Quad
genotyping data in these subjects. Figure 1B shows the genome-wide association analysis for
QRS duration adjusted for sex; the findings were near-identical for the unadjusted analysis.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
There was a single association between QRS duration and a SNP (rs1805126) in SCN5A,
msec
encoding the cardiac sodium channel gene, that survived Bonferroni correction (beta=1.002
(beta
beta
t =1
=1.0
.0002 m
sec
per copy of the T allele, p=1.45 x 10-8).
Th
et ta
aken
ken forward to the CHARGE QR
QRS
S meta-analysi
is co
ons
nsoortium
or
included 108
Thee se
set
taken
meta-analysis
consortium
SNPs
S
SNP
NPs with P-va
P-values
valu
l es <10
lu
<100-44. The
The retrieved
retr
re
t iev
tr
ieved P-values
P-vval
valuess for
for this
thiis set
sett divided
divvidded
di
ded into
i to two
in
two distinct
dissti
t nct
nct gr
groups:
roup
oups:: 23
SNPs
SN
NPs with
wit
ithh P-values
P-va
Pvallues
es in
in the
th
he CHARGE
CH
HAR
ARGE
GE set
sett from
fro
rom
m 10
1 --88 ttoo 10-27, and
an
nd 85
8 with
wit
ithh P-values
P-vaalu
P-va
ues > 0.003.
0.00033. These
Th
hese
23 associations
association
onns (Supplementary
(Su
Supp
pple
pp
leme
le
meent
ntar
arry Ta
ablle 1)
1) are
are located
l ca
lo
cate
teed inn the
the five
fiv
ivee loci
lo
oci with
witth the
t e lowest
th
lo
owe
west
st P values
Table
reported by the CHARGE consortium: 18/23 are in the chromosome 3 locus that includes SCN5A
and SCN10A, as well as other genes (e.g. EXOG and XYLB1). The other three loci are near
SLC35F1 and C6orf204 (chromosome 6), near CDKN1A (chromosome 6) and in NFIA
(chromosome 1). The most significant SNP for each locus is presented in Table 2. The locus
zoom plot (Supplementary Figure 1) shows little linkage disequilibrium (LD) in the
chromosome 3 region in HapMap Phase III (CEU), consistent with the suggestion that the
SCN5A-10A finding may actually indicate multiple independent associations.24 Specifically, the
most significant variants in SCN5A (rs1805126) and SCN10A (rs6795970) are not in LD
10
DOI: 10.1161/CIRCULATIONAHA.112.000604
(r2<0.20).
Using the GTCA25 approach, we estimated heritability for QRS at 31.1% (standard error
[SE] 6.9%, p=5.7 x 10-7) using all SNPs in the dataset. Conducting the analysis without the 23
SNPs significant in CHARGE decreased the estimated heritability to 30.3%, a decrease of 0.8%.
This was somewhat conservative compared to a linear regression model, which estimated an
adjusted r-square value of 1.6% for the five loci in Table 2.
PheWAS analysis
The PheWAS dataset consisted of 13,859 European-American subjects in the entire genotyped
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
eMERGE cohort. The analysis focused on the most significant SNPs in each of the five loci
associated with QRS (Table 3). While no associations survived a strict Bonferro
oni
ni ccorrection
orrre
rect
ctio
ct
ionn ffor
io
or
Bonferroni
significance
ignificance (p=0.05/778/5=1.3x10-5), the most significant associations were particularly relevan
relevant
car
ardi
diac
di
ac disease
dise
isease
see and
and
n demonstrated significantly
significantlly different
different patterns
n off associations
ns
asssociations for the five
as
too cardiac
QRS-associated
Q
QRS
RS-associated
ed
d lo
loci.
oci.
i T
The
he sstrongest
tron
tr
onge
on
gest
s as
st
asso
associations
socciattio
ionns ffor
or thee S
SNPs
NPs in both
bot
othh SCN5A
SCN5
SC
N5A
A an
andd SC
SCN
SCN10A
N10A we
N10A
were
r
wi
with
ith tthe
h ddiagnoses
he
iaagn
gnos
oses
ess ooff ca
cardiac
ardi
diacc arr
arrhythmias
rrhy
rr
hyth
hy
thm
mias
as ((p=7.21x10
p=7.
p=7.
7 21
1x10
x10--44 ffor
or SC
SCN10A
CN10A
N10A aand
nd p=1.1x10
p=1
=1.1
.1
1x1
x100-33 fo
for
or SCN5
SSCN5A)
CN5
N5A
A)
SCN1
N110A
0A,, at
atri
r al
a fib
bri
r ll
llat
attio
ionn (p
(p=8
=8.5
=8
.5
5x1
x100-44, Fi
Figure
Figu
gure
gu
re 22).
). Ta
Table
Tabl
ble 3 lists
bl
lis
ists
ts associations
ass
s occia
iati
tion
ti
onss for the
on
and, for SCN10A,
atrial
fibrillation
(p=8.5x10
most significant SCN5A (rs1805126)and SCN10A (rs6795970) SNPs, as well as those at the
other QRS-associated loci, chromosome 1 (rs2207790) and the two chromosome 6 loci
(rs6906287 and rs1321313), also graphed in Supplementary Figures 2-4. The CDKN1 and
C6orf204 loci were not associated with cardiac arrhythmias (p>0.3, with 80% power to detect
and OR>1.12 at p=0.05), and NFIA was weakly associated with cardiac arrhythmias (OR=0.91,
p=0.02). Supplementary Table 2 presents PheWAS association data for all 23 SNPs significant
in CHARGE. While most SNPs in a given gene displayed similar patterns of PheWAS
associations, rs11129801, the strongest SCN10A SNP in our adjusted analysis but a lesser
11
DOI: 10.1161/CIRCULATIONAHA.112.000604
association in CHARGE, had a very different PheWAS pattern, with the strongest associations
being epilepsy, uterine cancer, and migraines; atrial fibrillation was not associated (OR=1.07,
p=0.18). In agreement with these data, rs11129801 was only in weak linkage disequilibrium to
the other SCN10A SNPs, such as rs6800541 (r2=0.20). Likewise, the EXOG locus had a very
different PheWAS pattern of associations (prostatic hyperplasia, sexual and gender identity
disorders, liver disease, kidney disease, cerebral degenerations, diarrhea) despite being nearby
the SCN5A-10A region.
Analysis of QRS population for arrhythmias
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
After PheWAS analysis, we analyzed the original set of 5,272 patients that met our algorithm
definition for normal cardiac conduction/normal heart
r for subsequent developme
ent
nt ooff at
atri
rial
ri
al
development
atrial
fibrillation and cardiac arrhythmias with the SCN5A rs1805126 and SCN10A rs6795970. In this
po
opu
pula
lati
la
tion
ti
onn, 17
1173
3 (3
(3%
%) developed atrial fibrillation
on oorr atrial flutter aatt so
ome point at least one month
population,
(3%)
some
ffollowing
olllowing the normal
norm
no
rm
mall ECG,
ECG
CG,, and
and 605
60
05 (11%)
(1
11%
%) were
we coded
coded
d aass havi
hhaving
avi
v ng aany
ny aarrhythmia.
rrrhy
yth
thm
mia. Q
mia.
RS dduration
urat
ur
a io
at
on
QRS
tseelf w
as aassociated
ssoc
ss
ocia
iaateed w
itth
th ffuture
uttur
u e de
dev
velo
velo
opm
pmen
entt of
en
o aatrial
trria
i l fibrillation
fiibr
briilla
il ati
t on (p=0.015)
(p=0
(p
=0
0.0015
15)) by llogistic
ogiisti
og
is ic
itself
was
with
development
egression. Ass with
witth the
the eMERGE
eM
MER
ERGE
GE PheWAS,
Phe
h WA
WAS,
S, SCN10A
SCN1
SC
N10A
N1
0A
A rs6
s679
s6
7959
79
5970
59
70 was
was
a associated
ass
ssoc
ocia
oc
i teed with
ia
with both
regression.
rs6795970
arrhythmias (hazard ratio [HR]=0.81 per copy of the A allele, p=0.002) and atrial
fibrillation/flutter (HR=0.67 per copy of the A allele, p=0.001). Moreover, this association was
essentially unchanged when QRS was also included in the model (HR=0.68), indicating that the
association between rs6795970 and atrial fibrillation is independent of the association between
QRS and rs6795970. Similarly, the association between rs6795970 and cardiac arrhythmias was
independent of QRS (HR=0.80 without QRS). Our analysis did not demonstrate an association
between SCN5A rs1805126 and either atrial fibrillation (HR=1.2, p=0.14) or cardiac arrhythmias
(1.03, p=0.66) in the normal QRS population. Figure 3 presents a Kaplan-Meier plot for the
12
DOI: 10.1161/CIRCULATIONAHA.112.000604
relationship between rs6795970 and development of atrial fibrillation.
Discussion
The current study demonstrates common variants in the SCN5A-SCN10A locus are associated
with QRS duration in subjects without clinical evidence of prior heart disease. These patients
were derived from clinical practice settings, adding to the growing body of evidence of
supporting the utility of EMR-based genomic analysis.12–16,27 The data replicate findings from a
large meta-analysis24 drawn from multiple community populations, where information on
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
potential QRS modulators such as heart disease status and medications was not as precisely
controlled or excluded from all included studies. The major new finding here is th
that
hatt usi
using
sing
si
ng tthe
he
PheWAS study paradigm, we were able to examine the longitudinal associations of these
genomic
geno
ge
nomi
no
micc variants
mi
vari
va
riaants
tss on
on disease in a hypothesis-freee manner.
manner. This analysis
ana
naalysiis revealed
re
that SNPs in
SSCN10A
SCN
CN10A (rs6795970)
(rs67795
959770)) aand
nd SCN5A
SCN5
SC
N A (rs1805126)
N5
(rs1
(r
s180
80
051226)
26) str
strongly
ronngly
y aassociated
ssoc
ss
occia
iatted wi
with
ith Q
QRS
RS dduration,
uraatio
ur
ion,
n, aare
re aalso
re
lsoo
ls
associated
with
SCN10A
as
sso
soci
ciat
ci
a ed w
at
ith subsequent
ith
subseq
sub
bseq
equuent cardiac
cardi
diac
ac aarrhythmias.
rrhy
rr
hyth
hy
thm
th
mias
as. SC
as
SCN
CN10A
N10A specifically
speeci
c fica
call
ca
l y iss associated
ll
asssoc
ocia
iate
teed with
wiith
h atrial
atr
triiall
fibrillation. Th
These
hes
e e as
asso
associations
sooci
c attio
ions
n be
betw
between
twee
tw
e n rs
rs67
rs6795970
67795
9 97
9700 an
aand
d at
aatrial
rial
ri
al ffibrillation
ibrrilla
ib
lati
la
tiion aand
nd ccardiac
ardi
ar
diac
di
ac aarrhythmias
rrhythmiass
were also seen specifically in the original “heart healthy” study population, and were
independent of the SNP’s association with QRS duration. The latter finding suggests that while
variants at the SCN5A-SCN10A locus determine QRS and subsequent arrhythmia susceptibility,
they may do so by divergent (pleiotropic) pathways, or that conduction slowing occurs not only
in the ventricle but also in the atrium where it contributes to susceptibility to atrial fibrillation.
Importantly, the selection logic for the case selection algorithm in our GWAS required the
absence of cardiovascular disease at the time of the ECG. Therefore, these associations represent
subsequent development of cardiac arrhythmias in subjects with these variants. This result
13
DOI: 10.1161/CIRCULATIONAHA.112.000604
highlights the potential of the EMR, with multiple diagnoses and longitudinal follow-up, to
identify not only variants associated with disease susceptibility or trait variability, but also
subsequent outcomes associated with these variants.
Drugs that block SCN5A-encoded sodium channels slow ventricular conduction, prolong
QRS duration,28 and increased mortality following myocardial infarction in the Cardiac
Arrhythmia Suppression Trial (CAST).29,30 Available evidence supports the view that slow
conduction, particularly in the setting of scarred or ischemic myocardium, promotes reentrant
excitation that leads to fatal arrhythmias,31–33 and in CAST, longer QRS durations also predicted
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
increased mortality among patients treated with placebo.1 In addition, a genetic disease caused
by SCN5A loss-of-function mutations (Brugada Syndrome) is characterized by slowed
slo
owe
w d
ventricular conduction and an increased risk for fatal arrhythmias.34 Interestingly, previous
an
nal
alys
ysees
ys
es ooff va
variab
ablle
ab
le PR duration have also identified
identtif
ifieed strong assoc
cia
i tion
on
ns with
w th variants in
wi
analyses
variable
associations
13,35–37
35–37
37
7
SSCN10A,
SCN
CN10A,13,35–
and
and multiple
mullti
tipl
plee mechanisms
pl
mech
mech
hanis
an sms are
arre currently
currren
rently
y being
bein
ng examined
exam
ex
min
ined
e to
ed
to explain
explai
exp
plainn this
this effect:
eff
ffec
ff
ecct:
t
expression
ex
xpr
pres
essi
es
s on ooff SC
si
SCN10A-encoded
CN10A
N10A
A-enc
n od
oded
ed cchannels
hann
ha
nnnells in ca
card
cardiomyocytes
rdio
rd
iomy
myoccyt
my
ytes
es aand/or
nd
d/o
or ca
cardiac
ard
dia
iacc ne
neur
neurons,
uron
onss,
on
s, oorr
38–40
40
regulation
egulation off SC
SCN5
SCN5A-10A
N5AN5
A 100A ex
Aexpr
expression.
prres
essi
sion
o .38
on
Our PheWAS analysis suggests that SCN10A, SCN5A, and EXOG variants are associated
with a cardiac arrhythmia billing codes (entered either by physicians or professional coders); this
includes atrial fibrillation and flutter, supraventricular and ventricular tachycardia, cardiac arrest,
and other unspecified arrhythmias. SCN10A rs6795970 was specifically associated with atrial
fibrillation and flutter, which was noted by Pfeufer et al.36 but not Holm et al.35 Chambers et al.37
previously demonstrated associations between SCN10A rs6795970 and heart block and
ventricular fibrillation; we also noted an association with first-degree atrioventricular block
(p=0.009, Supplementary Table 2). Mouse studies have demonstrated expression of Nav1.8 in
14
DOI: 10.1161/CIRCULATIONAHA.112.000604
vagal and spinal afferents in gastrointestinal mucosa and myenteric plexes,41 and interestingly
SCN10A was also associated with cholecystitis in the PheWAS. Variants in CDKN1A and
C6orf204, however, were not associated with cardiac arrhythmias, although we cannot exclude
that weak associations may exist. In contrast, the C6orf204 locus seems most associated with
neoplastic disorders (colorectal cancer, prostatic hypertrophy, and melanoma); its strongest
cardiovascular disease association was atherosclerosis (odds ratio 1.094, p=0.03). Thus,
PheWAS suggests that although all these regions may be associated with QRS interval, only
those SNPs in the SCN5A-10A region seem significantly associated with subsequent
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
development of arrhythmias.
Recent growth in large GWAS meta-analysis has shown the power of large
larrge numbers
num
mbe
bers
rs tto
o
find genomic influences of given traits and disease. In this study, the development of the
ph
hen
enot
otyype
ot
ype algorithm
alggori
al
riith
thm
m at one site was followed byy its
itss use at four other
oth
th
her ssites
ite
tess with different EMR
phenotype
ysttem
e s. Algo
orith
riith
t m performance
perf
pe
rffor
orma
mannce
ma
nce was
was similar
siimila
laar to find
finnd patients
pattie
ienntss meeting
meeetin
ingg in
incl
cluusio
usio
on an
andd ex
xcllusio
usio
ionn
systems.
Algorithm
inclusion
exclusion
crrit
iter
eria
er
ia at
at th
thee th
thr
reee si
sit
tes w
ho eevaluated
valu
va
luat
lu
a ed
at
d tthe
he aalgorithm,
lg
gor
o itthm
m, pr
prov
oviidiing
ov
n fur
urth
ur
t er vvalidation
th
alid
al
idat
id
atio
at
ionn ooff tthe
he
criteria
three
sites
who
providing
further
16,42
42
ransportabili
lity
ty ooff EM
EMR
R ph
hen
enot
otyp
ypee al
yp
aalgorithms.
gori
go
riith
thms
ms..16
ms
Estimates
E
Es
tiima
mate
tees of her
heritability
erit
er
ittab
abil
ilit
il
ityy of Q
it
QRS
RS uusing
sing all
transportability
phenotype
SNPs are consistent with other reports; the heritability associated with the very limited subset of
23 SNPs implicated here appears to explain a surprisingly large proportion of this heritability
estimate. This analysis also indicates that, as with other traits, there is extensive “missing
heritability” when QRS duration is analyzed as a function of common genomic variation.
This report highlights both limitations and real and potential advantages of EMR-based
genomic research. The present study involved analysis of subjects accrued in the initial stages of
the eMERGE network, and as such a major limitation was the relatively small size of the study
set. Large consortia such as CHARGE have used meta-analysis to aggregate individual datasets
15
DOI: 10.1161/CIRCULATIONAHA.112.000604
across many sites and have demonstrated the power of the large numbers to generate highly
significant results by this approach. One of the key lessons in the eMERGE experience to date
has been that algorithms to identify cases and controls for genomic or other study can be
successfully deployed across multiple EMR systems.16,42,43 Thus, as the number of subjects with
dense genomic information across multiple EMR systems grows, this and other EMR-based
studies highlight the potential for accrual of increasingly large sample sets to identify genomic
predictors of variability in phenotypes such as physiologic traits or disease susceptibility.
Further, EMRs hold the promise, as suggested here, of examining longitudinal healthcare
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
outcomes, such as disease complications or response to drug therapies. Identifying cases for
uch studies requires especially large resources, since subsets of subsets (e.g. dru
ug rresponse
ug
espo
es
pons
po
nsee X in
ns
such
drug
Y are required. Current efforts demonstrate the feasibility of accrual
r of DNA collections
disease Y)
co
oup
uple
ledd to EMRs
le
EMR
MRss large
la
u
statisti
tica
ti
callly valid analyses
analys
yses of rare
ys
rar
a e variants and/or rare
coupled
enough to support
statistically
cllin
nic
ical events.
s In
I the
he current
cuurre
urre
rennt
nt eMERGE
eME
ERG
RGE
E network,
nettwork,, there
there
ree are
are 9 sites
sit
ites
es with
wit
ithh EMR-based
EMREM
R-ba
Rbase
seed biobanks
biob
bi
obaank
ob
a nk s
clinical
hat iinclude
nclu
nc
lude
dee >
2500,0000
25
000 su
ubj
bjeccts
t , an
andd >5
50,
0,00
0000 ha
00
have
vee bbeen
eenn ge
ee
gen
noty
noty
type
peed on a ggenome-wide
enom
en
omeom
e wi
ewidde pplatform.
laatf
tfor
orrm.
that
>250,000
subjects,
>50,000
genotyped
resourrcees that
th
hat should
sho
houldd expand
expa
ex
pand
pa
n the
nd
thee reach
rea
each
ch of
of EMR-based
EMREM
R ba
base
sedd genomic
se
geno
ge
n mi
no
micc research
rese
re
sear
se
arch
ar
ch
h include
inc
nclu
lude
lu
d the Kaiser
de
Kaiseer
Other resources
Northern California biobank (>100,000 individuals),44 the Million Veteran’s Project (currently
>100,000 individuals),45 and biobanks that will be coupled to national healthcare systems (e.g.
the UK Biobank46).
Studies in the EMR environment enable PheWAS analyses: a PheWAS experiment
cannot be executed in the absence of diverse diagnoses across many diagnostic classes in study
subjects. The PheWAS-based associations reported here did not achieve significance using a
strict Bonferroni correction; however, SCN10A rs6795970 was strongly associated with both
atrial fibrillation and cardiac arrhythmias in the normal QRS population as well. The PheWAS
16
DOI: 10.1161/CIRCULATIONAHA.112.000604
approach is still in development and it is likely that the strategy of using only standard diagnostic
codes is a limitation. The refinement of disease classifications and abstraction methods will
enable more granular phenotypic subsetting, particularly when applied to increasingly large
datasets described above.
A fundamental limitation of the PheWAS technique is that every diagnosis is not
explicitly included or excluded in each record. Another potential limitation of EMR-based
phenotyping is the accuracy of the information contained in the record. Extracting phenotypes
from the EMR can result in errors if the data in the source EMR are incorrect (e.g., the EMR
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
specifies a disease the patient does not have). Gender and ancestry testing suggest these
demographic features are rarely incorrect in an EMR.7,20 Similarly, the electronic
icc phe
phenotyping
h no
he
noty
typi
ty
ping
pi
n
ng
experience in eMERGE indicates that recurrent mentions of specific diagnoses or combinations
off diverse
div
iver
erse
er
see data
dat
ata types
ty
ypes
pes (such as medications plus free
free text plus diagnostic
diagn
gnosti
gn
tiic ccodes)
odes) greatly improve
ddiagnostic
diag
i gno
n stic acc
accuracy
cur
urac
acy of
ac
o eelectronic
lect
le
ctro
ct
ronnic
ro
nic ph
phen
phenotyping
en
nottypin
in
ng alg
algorithms
gorith
gor
hms
ms w
when
hen as
hen
asse
assessed
esssed aass po
posi
positive
siti
tiive ppredictive
redi
dict
di
ctiv
i e
iv
value
va
valu
luee us
lu
uusing
ingg ha
in
hhand
and
nd
d ccuration
uraatio
ur
on ass tthe
he ggold
oldd stan
ol
sstandard.
tan
anda
dard
da
d.47 E
EMRs
MRs co
MR
con
contain
ntai
ainn da
ai
data
ata oon
n subj
ssubjects
ubj
bjec
eccts eexposed
xpossed
xp
d to
to a
healthcare system
sys
yste
ys
tem
te
m an
andd as
a such
suc
uchh EMR-based
EMREM
R baase
Rsedd studies
s ud
st
udie
iees may
m y not
ma
not be generalizable
gen
e er
eral
aliz
al
izab
iz
able
ab
le tto
o a br
broa
oadd
oa
broad
population. The conduct of EMR-based studies across a variety of geographical locations and
practice settings (as in eMERGE) potentially mitigates this issue.
In summary, a genome-wide association study conducted across multiple EMR systems
replicated known associations for a readily available index of cardiac conduction, the QRS
duration. The algorithm deployed allowed us to analyze subjects with normal
electrocardiograms, and no evidence of heart disease or confounding drugs or electrolyte
abnormalities, and the phenome-wide association established genomic variants predicting slower
conduction in this population also associated with subsequent development of arrhythmias.
17
DOI: 10.1161/CIRCULATIONAHA.112.000604
Thus, the present findings are consonant with a view that individual susceptibility to serious
arrhythmias is determined in part by genetically-determined variability in cardiac
electrophysiologic behaviors. Furthermore, this study highlights the advantages of a genotyped
EMR population to explore the subsequent emergence of clinically-important phenotypes not
ascertained in the original study design.
Funding Sources: This work was supported by the electronic MEdical Records and GEnomics
(eMERGE) Network, initiated and funded by the National Human Genome Research Institute,
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
with additional funding from the National Institute of General Medical Sciences, through the
following
g ggrants: U01-HG-004610 ((Group
p Health Cooperative);
p
); U01-HG-004608 ((Marshfield
Clinic); U01-HG-04599 (Mayo Clinic); U01-HG-004609 (Northwestern University);
Univerrsiity
y); U01-HGU01
01-H
-HG
-H
G04603 (Vanderbilt University, also serving as the Administrative Coordinating Center). BioVU
also receives support
supp
pport through the National Center for Research Resources UL1 RR024975,
wh
hic
ichh is
i nnow
ow
w at th
thee Na
Nati
t on
onal
a C
en
nterr fo
or A
dvan
nci
cinng
ng Trans
nsla
lational
all S
cien
ence
ces, 2 U
L TR0
L1
R000
0445.
which
National
Center
for
Advancing
Translational
Sciences,
UL1
TR000445.
D
r. So
S
toodehni
niaa w
ass ssupported
upp
ppor
pp
orte
or
tedd byy R
te
01-HL0
08845
56. The
he ccontent
ontent
tent iss so
sole
lely
le
y tthe
he rresponsibility
espo
es
ponnsib
po
nsib
ibillit
ityy of the
Dr.
Sotoodehnia
was
R01-HL088456.
solely
au
uth
thoors and do
does
es nnot
ot nnecessarily
eccessaariily rrepresent
epre
ep
rese
sent
nt tthe
he oofficial
he
fffic
icia
iall vi
view
ew
ws of tthe
hee NIH
H.
authors
views
NIH.
Conflict of Interest
Inte
In
tere
te
rest
re
st Disclosures:
Disscl
clos
osu
os
ures
es:: No
N
ne.
ne
None.
References:
1. Capone RJ, Pawitan Y, el-Sherif N, Geraci TS, Handshaw K, Morganroth J, Schlant RC,
Waldo AL. Events in the cardiac arrhythmia suppression trial: baseline predictors of mortality in
placebo-treated patients. J. Am. Coll. Cardiol. 1991;18:1434-1438.
2. Hanson B, Tuna N, Bouchard T, Heston L, Eckert E, Lykken D, Segal N, Rich S. Genetic
factors in the electrocardiogram and heart rate of twins reared apart and together. Am J Cardiol.
1989;63:606-609.
3. Busjahn A, Knoblauch H, Faulhaber HD, Boeckel T, Rosenthal M, Uhlmann R, Hoehe M,
Schuster H, Luft FC. QT interval is linked to 2 long-QT syndrome loci in normal subjects.
Circulation. 1999;99:3161-3164.
4. Russell MW, Law I, Sholinsky P, Fabsitz RR. Heritability of ECG measurements in adult
18
DOI: 10.1161/CIRCULATIONAHA.112.000604
male twins. J Electrocardiol. 1998;30 Suppl:64-68.
5. Ramirez AH, Schildcrout JS, Blakemore DL, Masys DR, Pulley JM, Basford MA, Roden DM,
Denny JC. Modulators of normal electrocardiographic intervals identified in a large electronic
medical record. Heart Rhythm. 2011;8:271-277.
6. McCarty CA, Wilke RA, Giampietro PF, Wesbrook SD, Caldwell MD. Marshfield Clinic
Personalized Medicine Research Project (PMRP): design, methods and recruitment for a large
population-based biobank. Personalized Medicine. 2005;2:49-79.
7. Roden DM, Pulley JM, Basford MA, Bernard GR, Clayton EW, Balser JR, Masys DR.
Development of a large-scale de-identified DNA biobank to enable personalized medicine. Clin
Pharmacol Ther. 2008;84:362-369.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
8. McCarty CA, Chapman-Stone D, Derfus T, Giampietro PF, Fost N. Community consultation
and communication for a population-based DNA biobank: the Marshfield clinic personalized
medicine research project. Am J Med Genet A. 2008;146A:3026-3033.
9. Ginsburg GS, Burke TW, Febbo P. Centralizedd biorepositories for genetic and
ndd genomic
gen
e om
omic
ic
research.
esearch. JAMA. 2008;299:1359-1361.
10. Lemke AA, Wolf
Wo WA, Hebert-Beirne J, Smith ME. Public and biobank participant attitudes
toward
Health
Genomics.
owa
ward
rd ggenetic
enet
en
etiic rresearch
et
eseearch
es
ea
participation and data sharing.
sha
hariing. Public Hea
ha
ea
alth Ge
Geno
n mics. 2010;13:368377.
3777..
11.
11. McCarty
McCartyy CA,
CA
A, Chisholm
Chissholm
m RL,
RL, Chute
Chu
huutee CG,
CG, Kullo
Kullo
o IJ,
IJ, Jarvik
Jarv
Ja
rvik
rv
ik GP,
GP,, Larson
Larsoon EB,
EB Li
L R, Masys
Masys
Mas
sys DR,
DR
R,
Ritchie
Network:
consortium
Riitc
tchi
hiee MD,
hi
MD
D, Roden
Rode
Ro
denn DM,
de
DM, Struewing
S ru
St
uew
win
ingg JP,
JP
P, Wolf
Wollf WA.
Wo
WA The
Th
he eMERGE
eMER
eM
ER
RGE
EN
etwo
et
work
wo
rk:: A co
cons
nsoorti
ns
tiium
m ooff
biorepositories
electronic
medical
conducting
studies.
bior
orep
eposit
itor
oriess li
llinked
nkked to el
elec
ctr
tron
onic
ic med
dic
ical
al rrecords
ecor
ordds ddata
ataa fo
at
forr co
cond
duc
ucti
ting
ng ggenomic
enom
en
omic
ic stu
udi
diess. BM
BMC
C
Med Genomi
Genomics.
2011;4:13.
ics
cs.. 20
011
1 ;4
;4:1
: 3..
12. Kullo IJ, Ding K, Shameer K, McCarty CA, Jarvik GP, Denny JC, Ritchie MD, Ye Z,
Crosslin DR, Chisholm RL, Manolio TA, Chute CG. Complement receptor 1 gene variants are
associated with erythrocyte sedimentation rate. Am J Hum Genet. 2011;89:131-138.
13. Denny JC, Ritchie MD, Crawford DC, Schildcrout JS, Ramirez AH, Pulley JM, Basford MA,
Masys DR, Haines JL, Roden DM. Identification of genomic predictors of atrioventricular
conduction: using electronic medical records as a tool for genome science. Circulation.
2010;122:2016-2021.
14. Kullo IJ, Ding K, Jouni H, Smith CY, Chute CG. A genome-wide association study of red
blood cell traits using the electronic medical record. PLoS ONE. 2010;5.
15. Ritchie MD, Denny JC, Crawford DC, Ramirez AH, Weiner JB, Pulley JM, Basford MA,
Brown-Gentry K, Balser JR, Masys DR, Haines JL, Roden DM. Robust replication of genotypephenotype associations across multiple diseases in an electronic medical record. Am J Hum
Genet. 2010;86:560-572.
19
DOI: 10.1161/CIRCULATIONAHA.112.000604
16. Denny JC, Crawford DC, Ritchie MD, Bielinski SJ, Basford MA, Bradford Y, Chai HS,
Bastarache L, Zuvich R, Peissig P, Carrell D, Ramirez AH, Pathak J, Wilke RA, Rasmussen L,
Wang X, Pacheco JA, Kho AN, Hayes MG, Weston N, Matsumoto M, Kopp PA, Newton KM,
Jarvik GP, Li R, Manolio TA, Kullo IJ, Chute CG, Chisholm RL, Larson EB, McCarty CA,
Masys DR, Roden DM, De Andrade M. Variants Near FOXE1 Are Associated with
Hypothyroidism and Other Thyroid Conditions: Using Electronic Medical Records for Genomeand Phenome-wide Studies. Am J Hum Genet. 2011;89:529-542.
17. Denny JC, Ritchie MD, Basford MA, Pulley JM, Bastarache L, Brown-Gentry K, Wang D,
Masys DR, Roden DM, Crawford DC. PheWAS: demonstrating the feasibility of a phenomewide scan to discover gene-disease associations. Bioinformatics. 2010;26:1205-1210.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
18. Denny JC, Spickard A, Johnson KB, Peterson NB, Peterson JF, Miller RA. Evaluation of a
method to identify and categorize section headers in clinical documents. J Am Med Inform Assoc.
2009;16:806-815.
prolongation
19. Denny JC, Miller RA, Waitman LR, Arrieta MA, Peterson JF. Identifying QT
T pr
prol
o on
ol
onga
g ti
ga
tion
on
Inform.
from ECG impressions using a general-purpose Natural Language Processor. Intt J Me
Medd In
Info
forrm.
fo
rm
2009;78 Suppl 1:S34-42.
20. Dumitrescu L,, Ritchie MD, Brown-Gentry K, Pulley JM, Basford M,, Denny JC, Oksenberg
JR,
Roden
Haines
Assessing
JR
R, Ro
Rode
denn DM
de
DM, Ha
Hain
i es JL, Crawford DC. Assess
ssiing the accuracyy of oobserver-reported
ss
bseerver-reported ancestry
bs
inn a biorepository
Genet
Med.
2010;12:648-650.
bio
i repo
osi
sito
ory linked
lin
inke
keed to electronic
ele
lect
ctro
ct
roni
ro
nicc medical
medi
me
dica
call records.
reccor
re
cords. Ge
Gene
n t Me
ne
Med
d. 20
2010
10
0;1
; 2:
2:64
64
488 65
650.
0.
Turner
S,, Armstrong
LL,
Bradford
Y,, C
Carlson
AT,
Dee An
Andrade
221.
1. T
urner S
Arrmstroong LL
L, B
rad
a fo
ord
dY
arrlsonn CS,
CS,, Crawford
Crawf
rawf
wfordd DC,
DC, Crenshaw
Cren
Cr
ensh
shaw
aw A
T, D
Andr
draade
dr
Doheny
KF,, Ha
Haines
Hayes
G,, Jarv
G,, Ji
Jiang
Kullo
Lii R, L
Ling
Manolio
TA,
M, D
ohen
oh
enyy KF
K
ain
inees JJL,
L H
L,
ayes
ay
es G
JJarvik
arv
rvik
ik G
Jian
angg L,
an
L, K
ullo
ul
o IIJ,
J, L
ingg H, M
in
anollio T
an
A,
Matsumoto
McCarty
CA,
McDavid
AN,
Mirel
DB,
Paschall
EW,
Rasmussen
LV,
Ma
ats
tsum
u ot
otoo M, M
cCar
cC
arty C
A, M
cDavid A
cD
N M
N,
irel
ir
e D
B P
B,
asch
as
chal
a l JE
JE, Pu
Pugh
hE
W R
W,
asmu
mussen
en L
V
V,
Zuvich
RL,
Ritchie
MD.
Quality
Wilke RA, Zu
uvi
vich
ch
hR
L R
L,
ittch
chie
i M
D. Q
u li
ua
lity
ty
y ccontrol
o trool pr
on
pprocedures
oced
oc
edur
ed
u es ffor
ur
orr ggenome-wide
enom
en
omeom
e wi
ew de aassociation
sssoc
o iation
studies.
Protoc
Genet.
2011;Chapter
tud
udie
iess Cu
Curr
rr P
roto
ro
tocc Hu
Hum
m Ge
Gene
nett 20
2011
11;C
;Cha
hapt
pter
er 1:Unit1.19.
1:U
:Uni
nit1
t1 19
22. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus
genotype data. Genetics. 2000;155:945-959.
23. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal
components analysis corrects for stratification in genome-wide association studies. Nat Genet.
2006;38:904-909.
24. Sotoodehnia N, Isaacs A, De Bakker PIW, Dörr M, Newton-Cheh C, Nolte IM, Van der
Harst P, Müller M, Eijgelsheim M, Alonso A, Hicks AA, Padmanabhan S, Hayward C, Smith
AV, Polasek O, Giovannone S, Fu J, Magnani JW, Marciante KD, Pfeufer A, Gharib SA,
Teumer A, Li M, Bis JC, Rivadeneira F, Aspelund T, Köttgen A, Johnson T, Rice K, Sie MPS,
Wang YA, Klopp N, Fuchsberger C, Wild SH, Mateo Leach I, Estrada K, Völker U, Wright AF,
Asselbergs FW, Qu J, Chakravarti A, Sinner MF, Kors JA, Petersmann A, Harris TB, Soliman
EZ, Munroe PB, Psaty BM, Oostra BA, Cupples LA, Perz S, De Boer RA, Uitterlinden AG,
Völzke H, Spector TD, Liu F-Y, Boerwinkle E, Dominiczak AF, Rotter JI, Van Herpen G, Levy
20
DOI: 10.1161/CIRCULATIONAHA.112.000604
D, Wichmann H-E, Van Gilst WH, Witteman JCM, Kroemer HK, Kao WHL, Heckbert SR,
Meitinger T, Hofman A, Campbell H, Folsom AR, Van Veldhuisen DJ, Schwienbacher C,
O’Donnell CJ, Volpato CB, Caulfield MJ, Connell JM, Launer L, Lu X, Franke L, Fehrmann
RSN, Te Meerman G, Groen HJM, Weersma RK, Van den Berg LH, Wijmenga C, Ophoff RA,
Navis G, Rudan I, Snieder H, Wilson JF, Pramstaller PP, Siscovick DS, Wang TJ, Gudnason V,
Van Duijn CM, Felix SB, Fishman GI, et al. Common variants in 22 loci are associated with
QRS duration and cardiac ventricular conduction. Nat Genet. 2010;42:1068-1076.
25. Yang J, Lee SH, Goddard ME, Visscher PM. GCTA: A Tool for Genome-wide Complex
Trait Analysis. Am J Hum Genet. 2011;88:76-82.
26. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, Maller J, Sklar P,
De Bakker PIW, Daly MJ, Sham PC. PLINK: a tool set for whole-genome association and
population-based linkage analyses. Am J Hum Genet. 2007;81:559-575.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
27. Kurreeman F, Liao K, Chibnik L, Hickey B, Stahl E, Gainer V, Li G, Bry L, Mahan S, Ardlie
K, Thomson B, Szolovits P, Churchill S, Murphy SN, Cai T, Raychaudhuri S, Kohane I, Karlson
E, Plenge RM. Genetic basis of autoantibody positive and negative rheumatoidd arthritis
art
rthr
hrit
hr
i is risk
it
ris
iskk in a
2011;88:57-69.
multi-ethnic cohort derived from electronic health records. Am J Hum Genet. 201
0111;;88
88:5
:5
577 69
69.
28. Johnson EA, McKinnon MG. The differential effect of quinidine and pyrilamine on the
myocardial action potential
p tential at various rates of stimulation. J Pharmacol Exp Ther.
po
1957;120:460-468.
19
957
57;1
;120
;1
20:4
20
:460
:4
60-4
60
468
68.
effect
mortality
trial
229.
9. Preliminary
Preliminarry rreport:
epport
portt: ef
effe
fect
fe
ct ooff encainide
en
ncain
ca nidde aand
nd flecainide
fleecaainiidee oon
n mor
m
ortaalit
alityy in a rrandomized
ando
an
dom
mized
miz
zed tr
ria
iall of
arrhythmia
ar
rrh
hyt
y hmia suppression
sup
u prressioon after
afteer myocardial
myo
yocaard
r ia
iall infarction.
inffar
farctioon. The
Th
he Cardiac
Card
Ca
diacc Arrhythmia
Arrhyythhmia
hmia Suppression
Sup
u preess
ession Trial
Trriaal
(CAST)
Engl
Med.
CAS
AST)
T) IInvestigators.
nveesti
nv
estiggattors
tors.. N En
E
gl J M
ed.. 11989;321:406-412.
ed
989;
98
9;32
9;
321:
32
1 406-4
1:
6-412.
412.
2
30. Epstein AE,
AE, Hallstrom
Hall
Ha
l sttro
ll
om AP,
AP Rogers
Roge
Ro
gerss WJ,
ge
WJ, Liebson
Lieb
e so
sonn PR,
PR
R, Seals
Seal
Se
a s AA,
al
A , Anderson
AA
Ande
An
ders
de
rson
rs
o JL,
on
JL, Cohen
Coh
o en JD,
Capone
RJ,
Wyse
DG. Mo
Mortality
following
ventricular
arrhythmia
encainide,
Capo
Ca
pone
ne R
J W
ysee DG
ys
Mort
rtal
alit
ityy fo
foll
llow
owin
ingg ve
vent
ntri
ricu
cula
larr ar
arrh
rhyt
ythm
hmia
ia ssuppression
uppr
up
pres
essi
sion
on bby
y en
enca
cain
inid
idee
flecainide, and moricizine after myocardial infarction. The original design concept of the Cardiac
Arrhythmia Suppression Trial (CAST). JAMA. 1993;270:2451-2455.
31. Akiyama T, Pawitan Y, Greenberg H, Kuo CS, Reynolds-Haertle RA. Increased risk of death
and cardiac arrest from encainide and flecainide in patients after non-Q-wave acute myocardial
infarction in the Cardiac Arrhythmia Suppression Trial. CAST Investigators. Am J Cardiol.
1991;68:1551-1555.
32. Coromilas J, Saltman AE, Waldecker B, Dillon SM, Wit AL. Electrophysiological effects of
flecainide on anisotropic conduction and reentry in infarcted canine hearts. Circulation.
1995;91:2245-2263.
33. Greenberg HM, Dwyer EM Jr, Hochman JS, Steinberg JS, Echt DS, Peters RW. Interaction
of ischaemia and encainide/flecainide treatment: a proposed mechanism for the increased
mortality in CAST I. Br Heart J. 1995;74:631-635.
21
DOI: 10.1161/CIRCULATIONAHA.112.000604
34. Antzelevitch C, Brugada P, Borggrefe M, Brugada J, Brugada R, Corrado D, Gussak I,
LeMarec H, Nademanee K, Perez Riera AR, Shimizu W, Schulze-Bahr E, Tan H, Wilde A.
Brugada syndrome: report of the second consensus conference: endorsed by the Heart Rhythm
Society and the European Heart Rhythm Association. Circulation. 2005;111:659-670.
35. Holm H, Gudbjartsson DF, Arnar DO, Thorleifsson G, Thorgeirsson G, Stefansdottir H,
Gudjonsson SA, Jonasdottir A, Mathiesen EB, Njølstad I, Nyrnes A, Wilsgaard T, Hald EM,
Hveem K, Stoltenberg C, Løchen M-L, Kong A, Thorsteinsdottir U, Stefansson K. Several
common variants modulate heart rate, PR interval and QRS duration. Nat Genet. 2010;42:117122.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
36. Pfeufer A, Van Noord C, Marciante KD, Arking DE, Larson MG, Smith AV, Tarasov KV,
Müller M, Sotoodehnia N, Sinner MF, Verwoert GC, Li M, Kao WHL, Köttgen A, Coresh J, Bis
JC, Psaty BM, Rice K, Rotter JI, Rivadeneira F, Hofman A, Kors JA, Stricker BHC, Uitterlinden
AG, Van Duijn CM, Beckmann BM, Sauter W, Gieger C, Lubitz SA, Newton-Cheh C, Wang TJ,
Magnani JW, Schnabel RB, Chung MK, Barnard J, Smith JD, Van Wagoner DR, Vasan RS,
Aspelund T, Eiriksdottir G, Harris TB, Launer LJ, Najjar SS, Lakatta E, Schlessinger D, Uda M,
Abecasis GR, Müller-Myhsok B, Ehret GB, Boerwinkle E, Chakravarti A, Soliman
an
n EZ,
EZ,, Lunetta
Lun
unet
etta
KL, Perz S, Wichmann H-E, Meitinger T, Levy D, Gudnason V, Ellinor PT, Sanna
San
nna
na S,
S, Kääb
Kääb S,
S,
Witteman JCM, Alonso A, Benjamin EJ, Heckbertt SR. Genome-wide association
on study
stu
tudy
dy of
of PR
interval.
nterval. Nat Genet. 2010;42:153-159.
37.
W,, Ka
Kaba
M,
37
7. Chambers
Cham
Ch
ambe
am
berss JC,
be
C Zhao
C,
Zhao J, Terracciano CMN, Bezzina
Beezz
z ina CR, Zhangg W
Kab
ba R, Navaratnarajah M
Lotlikar
A,, Se
Sehmi
Kooner
MK,
Deng
Siedlecka
U,, Pa
Parasramka
S,, El
El-Hamamsy
Lo
otllikar
ik A
Sehm
hm
mi JS
JS,, Ko
Koon
oner
on
er M
K, D
engg G,, S
en
ieddleckaa U
ie
P
raasr
sram
am
mka S
l-H
Ham
amam
amsy
syy II,, Wa
Wass
ss
LRC,
JSSG,
Sternberg
MJE,
McKenna
Severs
NJ,
Silva
R,, Wi
Wilde
MN,, Dekker L
MN
R , De JJong
RC
ongg JS
on
JSSG
SG
G, St
S
ernbergg M
JE
E, Mc
cKen
Kenna W,
W, S
eveers NJ
ev
J, De
De S
illva
v R
Wild
de
AAM,
A
AAM
AM, Anand
nd P, Yacoub
Yaco
oubb M,
M, Scott
Scott J,, Elliott
Scot
Ell
lliiott P,
P, Wood
Wo
ood JN,
JN, Kooner
Kooonerr JS.
JS. Genetic
Gennet
netic variation
variaatio
va
on inn
SCN10A
cardiac
Genet.
2010;42:149-152.
SC
CN1
N10A
0 iinfluences
0A
nflu
nf
luen
en
ncees ca
card
rd
dia
iac cconduction.
ond
nduc
ucti
uc
tioon. Na
Natt Ge
Gen
net. 201
0 0;
01
0;42
42:1
:149
:1
499-1
152
5 .
B.. Wh
Whither
Art
Thou,
SCN10A,
What
Artt Th
Doing?
38. London B
Whit
ithe
it
herr Ar
he
rt Th
T
ou,, SC
ou
SCN
N10
10A,
A, aand
n W
nd
h t Ar
ha
A
Thou
ou D
o ng
oi
ng?? Ci
Circ
r Res.
rc
Res
es..
2012;111:268-270.
2012
20
12;1
;111
11:2
:268
68-270
270
39. Yang T, Atack TC, Stroud DM, Zhang W, Hall L, Roden DM. Blocking Scn10a channels in
heart reduces late sodium current and is antiarrhythmic. Circ Res. 2012;111:322-332.
40. Verkerk AO, Remme CA, Schumacher CA, Scicluna BP, Wolswinkel R, De Jonge B,
Bezzina CR, Veldkamp MW. Functional Nav1.8 channels in intracardiac neurons: the link
between SCN10A and cardiac electrophysiology. Circ Res. 2012;111:333-343.
41. Gautron L, Sakata I, Udit S, Zigman JM, Wood JN, Elmquist JK. Genetic tracing of Nav1.8expressing vagal afferents in the mouse. J Comp Neurol. 2011;519:3085-3101.
42. Carroll RJ, Thompson WK, Eyler AE, Mandelin AM, Cai T, Zink RM, Pacheco JA,
Boomershine CS, Lasko TA, Xu H, Karlson EW, Perez RG, Gainer VS, Murphy SN, Ruderman
EM, Pope RM, Plenge RM, Kho AN, Liao KP, Denny JC. Portability of an algorithm to identify
rheumatoid arthritis in electronic health records. J Am Med Inform Assoc. 2012;19:e162-e169.
22
DOI: 10.1161/CIRCULATIONAHA.112.000604
43. Kho AN, Hayes MG, Rasmussen-Torvik L, Pacheco JA, Thompson WK, Armstrong LL,
Denny JC, Peissig PL, Miller AW, Wei W-Q, Bielinski SJ, Chute CG, Leibson CL, Jarvik GP,
Crosslin DR, Carlson CS, Newton KM, Wolf WA, Chisholm RL, Lowe WL. Use of diverse
electronic medical record systems to identify genetic risk for type 2 diabetes within a genomewide association study. J Am Med Inform Assoc. 2012;19:212-218.
44. Kaiser Permanente, UCSF Scientists Complete NIH-Funded Genomics Project Involving
100,000 People [Internet]. [cited 2011 Sep 13];Available from:
http://www.dor.kaiser.org/external/news/press_releases/Kaiser_Permanente,_UCSF_Scientists_
Complete_NIH-Funded_Genomics_Project_Involving_100,000_People/.
45. Million Veteran Program (MVP) [Internet]. [cited 2012 Jun 20];Available from:
http://www.research.va.gov/mvp/.
46. Collins R. What makes UK Biobank special? Lancet. 2012;379:1173–1174.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
47. Kho AN, Pacheco JA, Peissig PL, Rasmussen L, Newton KM, Weston N, Crane PK, Pathak
Electronic
J, Chute CG, Bielinski SJ, Kullo IJ, Li R, Manolio TA, Chisholm RL, Denny JC. E
lect
le
ctro
roni
nicc
Sci
Transl
Med.
Medical Records for Genetic Research: Results of the eMERGE Consortium. Sc
ci Tr
ran
nsl
s M
ed.
ed
2011; 3:79re1.
Table
eMERGE
All
were
European
ancestry.
T
ab
bl 1. eMER
ble
ERGE
GE ssites
ites
it
ess contributing
con
ontr
triibut
tr
ibuttin
ingg sam
ssamples.
amplles. A
ll ppatients
attie
tients
entss w
eree of
er
of E
uroopea
opeaan an
anc
cest
stry
st
ry
y.
Site
Si
te
Group Health
Marshfield
Mayo Clinic
Northwestern
Vanderbilt
Total
Genotyped
Primary
Pri
rima
mary
ma
r site
sitee Mean
M an Age
Me
Age (SD)
(SD
SD)) G
Geno
eno
noty
type
ty
ped
pe
d samples
saamp
mplees meeting
meet
me
etin
et
ingg normal
in
norm
no
rmal
rm
al QRS
QRS
RS
algorithm
definition
p enot
ph
o yp
ypee
algo
al
g ri
go
rith
t m de
defi
fini
nition
on
phenotype
Males
Male
Ma
less
le
Fema
Fe
Females
male
ma
lees
Total
Dementia
72.58 (7.23)
187
351
538
Cataract
56.73 (9.73)
69
149
218
Peripheral
58.81 (8.52)
1056
730
1786
artery disease
Type 2 diabetes 52.49 (10.54)
99
118
217
Normal QRS
49.95 (14.6)
1077
1436
2513
2488
2784
5272
23
DOI: 10.1161/CIRCULATIONAHA.112.000604
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Table 2. Replicated SNPs in the QRS GWAS analysis. Reported betas and p-values for eMERGE analysis are adjusted for sex, and
all betas for CHARGE and eMERGE analyses are for the coded allele. Each SNP represents the strongest association in the region,
and was the target of a PheWAS (as shown in Table 3).
CHR
3
3
6
6
1
eMERGE QRS
Coded Coded Allele
Allele
Frequency
rs1805126 38567410
T*
0.665
rs6795970 38741679
A
0.401
C
rs6906287 119069433
0.454
G*
0.760
rs1321313 36726799
rs2207790 61670555
T*
0.482
SNP
Location
BETA
(msec)
-1.002
0.765
0.717
-0.793
-0.622
p-value
1.45E-08
6.60E
6.60E-06
06
2.26E-05
6.13E-05
2.07E-04
Coded
Allele
A
A
C
C
A
CHARGE QRS
Coded Allele
BETA
Frequency
(msec)
0.655
-0.6568
0.396
0.74
0.7476
7 76
00.5383
.538
538
3 3
0.451
0.742
-0.8
-0
0.8
. 12
129
9
-0.8129
0.461
-0.5956
p-value
2.52E-20
55.08E-27
5.
08E
08
E 27
55.56E-16
5.5
.5
56E
6E-1
-16
-1
6
44.60E-25
4.6
.60E
60E
0E-2
-25
25
6.31E-18
Nearest
Gene
SCN5A
SCN10A
SC
C6
C6orf204
CD
CDKN1A
N
NFIA
*This SNP was coded on the opposite strand from that in the CHARGE study. Therefore, while the allele is not identical, the direction of effect is the same
24
DOI: 10.1161/CIRCULATIONAHA.112.000604
Table 3. Most significant associations between SNPs predicting normal QRS duration and
diagnostic codes. For each SNP displayed, all PheWAS associations p<0.01 are displayed.
Associated Phenotype
SCN5A (Chr 3, rs1805126)
Cardiac arrhythmias
Other diseases of upper respiratory tract
Benign prostatic hypertrophy
Chronic kidney disease
Melanoma
Dermatophytosis
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Case count
Odds ratio
P
3075
509
1615
1212
151
1229
0.877
0.8131
0.8527
0.8772
0.7033
0.8791
1.10E-03
3.83E-03
5.54E-03
6.02E-03
8.18E-03
8.77E-03
SCN10A (Chr 3, rs6795970)
Cardiac arrhythmias
Atrial fibrillation and flutter
Arterial embolism and thrombosis
Cholecystitis
Convulsions
Aphasia
3075
1758
150
121
311
69
0.8781
0.8519
0.6928
0.6627
1.249
0.5999
7.21E-04
8.45E-04
3.20E-03
3.44E-03
7.01E-03
7.01
0 E 03
0
7.25E-03
7.25
7.
25E25
E-03
E03
C6orf204 (Chr 6, rs6906287)
Colorectal cancer
Benign
Beni
nign
gn prostatic
pro
rost
stat
st
atic
at
i hypertrophy
hyp
ypeertrophy
Penicillin
Pe
enici
nici
cill
lliin aallergy
lller
ergy
y
Melanoma
M
ellano
n ma
Disorders
D
isoorders of pancreatic
panccreatic secretion
secrettio
se
ion
n
Nasal
Nasa
Na
al polyps
p ly
po
yps
p
Prostatitis
Pros
Pr
osta
tati
ta
titi
ti
tiss
ti
316
1615
65
151
51
1
844
127
1227
150
150
0.7592
0.8576
0.857
57
76
1.68
1.
.68
11.383
1.
383
38
3
0.6421
0.642
21
1.413
1.4113
0.7166
0 71
0.
7166
66
9.90E-04
4.19E-03
4.01E-03
4.01E-0
03
5.68E-03
5.68
5.
6 E68
E-03
03
6.22E-03
6.22
6.
22E22
E-03
03
3
7.15E-03
7..15
5EE-03
03
8.07E-03
8.0
8.
07E07EE-03
03
NFIA (Chr 1,, rs2207790)
rs2
s220
2077
20
7790
77
90))
90
Postinflammatory
pulmonary
fibrosis
P
i fl
l
fib i
Multiple myeloma
Neutropenia and leukopenia
Facial nerve disorders
Cardiomyopathy
Pneumothorax
Disorders of function of stomach
Prostatitis
Toxic diffuse goiter
173
54
238
67
420
80
355
150
76
0.7048
0 7048
0.5425
0.7545
1.677
0.8033
0.6391
1.23
0.7239
1.541
11.76E-03
76E 03
2.80E-03
2.81E-03
3.70E-03
4.40E-03
6.59E-03
7.73E-03
8.85E-03
9.47E-03
CDKN1A (Chr 6, rs1321313)
Anorexia
Bacteremia
Anal fissure and fistula
Abnormal loss of weight and underweight
Dementias
Overweight, obesity and other hyperalimentation
Spondylosis and allied disorders
53
118
108
398
841
2501
1244
0.3649
0.5684
1.552
0.7642
0.833
1.105
1.156
1.57E-03
2.01E-03
3.04E-03
3.39E-03
5.69E-03
7.39E-03
9.39E-03
25
DOI: 10.1161/CIRCULATIONAHA.112.000604
Figure Legends:
Figure 1. Panel A. Distribution of QRS durations in 5,272 normal ECGs. Panel B. Genomewide association analysis of QRS duration, using sex-adjusted linear regression. The red line
indicates genome-wide significance (p=5x10-8). The points in green are those SNPs that were
also identified at genome-wide significance in the CHARGE consortium QRS meta-analysis as
described in the text.
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Figure 2. Phenome-wide association study plots of most significant SNPs associated with QRS
blu
ue li
ine
ness in
ndi
dica
c te
ca
duration. Panel A. SCN5A (rs1805126). Panel B. SCN10A (rs6795970). The blue
lines
indicate
p<0.05.
Figure
Figu
gure
u 3. Kapl
Kaplan-Meier
plan
an
n-M
Mei
e err eestimate
stim
st
im
mat
atee of ddevelopment
evelop
eve
pmentt A
Atrial
triaal Fi
Fibr
Fibrillation
b il
br
illa
lattioon
on pper
er SCN10A
SCN1
SC
N10A
N1
0A rs
rrs6795970
6795
67
9 97
95
9700
gennoty
ge
noty
type
pe.. Al
pe
Alll pa
atieents
entss w
ere ju
er
judg
dged
dg
ed ffree
reee of ccardiac
re
arrdi
d acc dise
ddisease
i easse (inc
((including
inc
nclu
lu
udi
ding
ng aarrhythmias)
rrhhyt
rr
hythm
hmia
ias)
s) aatt th
he ti
tim
me ooff
genotype.
patients
were
judged
the
time
he initial no
orm
mal
a E
CG..
CG
the
normal
ECG.
26
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Figure 1
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Figure 2
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Figure 3
Downloaded from http://circ.ahajournals.org/ by guest on June 16, 2017
Genome- and Phenome-Wide Analysis of Cardiac Conduction Identifies Markers of
Arrhythmia Risk
Marylyn D. Ritchie, Joshua C. Denny, Rebecca L. Zuvich, Dana C. Crawford, Jonathan S.
Schildcrout, Lisa Bastarache, Andrea H. Ramirez, Jonathan D. Mosely, Jill M. Pulley, Melissa A.
Basford, Yuki Bradford, Luke V. Rasmussen, Jyotishman Pathak, Christopher G. Chute, Iftikhar J.
Kullo, Catherine A. McCarty, Rex L. Chisholm, Abel N. Kho, Christopher S. Carlson, Eric B. Larson,
Gail P. Jarvik, Nona Sotoodehnia, Teri A. Manolio, Rongling Li, Daniel R. Masys, Jonathan L.
Haines and Dan M. Roden
Circulation. published online March 5, 2013;
Circulation is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231
Copyright © 2013 American Heart Association, Inc. All rights reserved.
Print ISSN: 0009-7322. Online ISSN: 1524-4539
The online version of this article, along with updated information and services, is located on the
World Wide Web at:
http://circ.ahajournals.org/content/early/2013/03/05/CIRCULATIONAHA.112.000604
Data Supplement (unedited) at:
http://circ.ahajournals.org/content/suppl/2013/03/05/CIRCULATIONAHA.112.000604.DC1
Permissions: Requests for permissions to reproduce figures, tables, or portions of articles originally published in
Circulation can be obtained via RightsLink, a service of the Copyright Clearance Center, not the Editorial Office.
Once the online version of the published article for which permission is being requested is located, click Request
Permissions in the middle column of the Web page under Services. Further information about this process is
available in the Permissions and Rights Question and Answer document.
Reprints: Information about reprints can be found online at:
http://www.lww.com/reprints
Subscriptions: Information about subscribing to Circulation is online at:
http://circ.ahajournals.org//subscriptions/
Supplemental Material
Supplemental Table 1: Replicated SNPs in the QRS GWAS analysis. Reported betas and pvalues for eMERGE analysis are adjusted for sex, and all betas for CHARGE and eMERGE
analyses are for the coded allele.
eMERGE QRS
CHR
SNP
Location
Coded
Allele
BETA
(msec)
p-value
Coded
Allele
T**
Coded
Allele
Frequency
0.665
3
rs1805126*
38567410
-1.002
1.45E-08
3
rs9832895
38636537
C
0.518
0.895
3
rs6771881
38553268
A
0.756
3
rs12053903
38568397
C
0.329
3
rs7374138
38580736
C
3
rs11129801
38725379
A
3
rs6800541
38749836
3
rs12636358
3
BETA
(msec)
CHARGE QRS
p-value
A
Coded
Allele
Frequency
0.655
-0.6568
2.52E-20
SCN5A*
1.18E-07
C
0.497
0.6576
8.68E-18
SCN5A
-1.003
2.23E-07
A
0.745
-0.7659
2.54E-22
EXOG
0.881
7.54E-07
C
0.330
0.646
1.87E-19
SCN5A
0.737
0.912
1.49E-06
C
0.718
0.5041
5.59E-10
SCN5A
0.294
-0.877
1.82E-06
A
0.301
-0.6296
2.60E-17
SCN10A
C
0.409
0.803
1.97E-06
C
0.398
0.7394
1.88E-27
SCN10A
38430905
A
0.047
1.867
2.30E-06
A
0.070
1.185
2.46E-12
XYLBI
rs7430439
38778643
A
0.597
-0.794
4.03E-06
A
0.588
-0.5516
8.32E-16
SCN10A
3
rs6798015
38773840
C
0.374
0.793
4.25E-06
C
0.370
0.7187
5.88E-24
SCN10A
3
rs4130467
38586708
G**
0.279
0.846
5.88E-06
C
0.289
0.5274
5.98E-12
SCN5A
3
rs6795970*
38741679
A
0.401
0.765
6.60E-06
A
0.396
0.7476
5.08E-27
SCN10A*
3
rs7374540
38609146
A
0.609
-0.758
1.04E-05
A
0.598
-0.5246
7.67E-14
SCN5A
3
rs6797133
38631037
A
0.394
-0.750
1.14E-05
A
0.395
-0.667
2.61E-21
SCN5A
3
rs9861242
38584338
A
0.250
0.848
1.20E-05
A
0.262
0.5273
7.43E-11
SCN5A
3
rs7373102
38655632
C
0.496
0.680
1.65E-05
C
0.494
0.4128
1.10E-08
SCN5A
6
rs6906287*
119069433
C
0.454
0.717
2.26E-05
C
0.451
0.5383
5.56E-16
C6orf204*
6
rs11970286
118787067
C
0.549
-0.715
2.33E-05
C
0.534
-0.5502
4.89E-16
SLC35F1
6
rs4307206
118920013
A
0.451
0.712
2.59E-05
A
0.450
0.5399
4.56E-16
C6orf204
6
rs1321313*
36726799
G**
0.760
-0.793
6.13E-05
C
0.742
-0.8129
4.60E-25
CDKN1A*
3
rs7430477
38740494
C
0.530
0.661
8.18E-05
C
0.524
0.6061
2.72E-19
SCN10A
3
rs6599257
38779592
C
0.313
0.705
9.58E-05
C
0.306
0.5508
2.23E-14
SCN10A
1
rs2207790*
61670555
T**
0.482
-0.622
2.07E-04
A
0.461
-0.5956
6.31E-18
NFIA*
*The phenome-wide association analysis results for these variants (best or near-best association
at each locus/gene in replication analysis) are shown in Table 3.
**This SNP was coded on the opposite strand from that in the CHARGE study. Therefore, while
the allele is not identical, the direction of effect is the same.
Nearest
Gene
Supplemental Figure 1: LocusZoom plot of the region in chromosome 3 in which most of the
replicated associations (18/23) were identified. All variants with p<10-4 in the eMERGE data set
were replicated at p<5x10-8 in the CHARGE data set.
Supplemental Table 2: PheWAS associations for replicated SNPs (N=23). Data for all
associations with P<0.01 are shown. ICD9 codes for given associations can be downloaded from
http://knowledgemap.mc.vanderbilt.edu/research/content/phewas.
SNP
rs11129801
rs11129801
rs11129801
rs6599257
rs6599257
rs6599257
rs6599257
rs6599257
rs6599257
rs6795970
rs6795970
rs6795970
rs6795970
rs6795970
rs6795970
rs6798015
rs6798015
rs6798015
rs6798015
rs6800541
rs6800541
rs6800541
rs6800541
rs6800541
rs6800541
rs7430439
rs7430439
rs7430439
rs7430439
rs7430439
rs7430477
rs7430477
rs7430477
rs7430477
rs7430477
rs7430477
rs7430477
Chr
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
3
rs11970286
rs11970286
rs11970286
rs11970286
6
6
6
6
Nearest gene
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
SCN10A
Disease association
Epilepsy and recurrent seizures
Malignant neoplasm of uterus
Migraine
Atrial fibrillation and flutter
Drug dependence
Cardiac arrhythmias
Lupus erythematosus
Cholecystitis
Migraine
Cardiac arrhythmias
Atrial fibrillation and flutter
Arterial embolism and thrombosis
Cholecystitis
Convulsions
Aphasia
Cholecystitis
Arterial embolism and thrombosis
Atrial fibrillation and flutter
Genital prolapse
Cardiac arrhythmias
Atrial fibrillation and flutter
Cholecystitis
Arterial embolism and thrombosis
Aphasia
Gingival and periodontal diseases
Intestinal malabsorption
Allergic rhinitis
Disturbance of skin sensation
First degree AV block
Cholecystitis
Cholecystitis
Epilepsy and recurrent seizures
Tinnitus
Gingival and periodontal diseases
Convulsions
Migraine
Inflammatory disease of cervix,
vagina, and vulva
SLC35F1
SLC35F1
SLC35F1
SLC35F1
Colorectal cancer
Nasal polyps
Malignant neoplasm of colon
Personal history of malignant
cases odds ratio
181
0.71
103
1.49
420
0.81
1758
0.84
71
1.68
3075
0.89
93
0.59
121
0.65
420
1.23
3075
0.88
1758
0.85
150
0.69
121
0.66
311
1.25
69
0.60
121
0.56
150
0.67
1758
0.86
511
0.82
3075
0.89
1758
0.86
121
0.66
150
0.69
69
0.61
229
0.77
156
1.42
865
1.16
664
0.85
181
1.34
121
0.70
121
1.71
181
0.72
159
1.39
229
1.30
311
0.80
420
0.83
266
0.79
316
127
226
130
0.77
1.48
0.75
0.69
P-value
6.23E-03
6.31E-03
9.81E-03
6.53E-04
2.88E-03
3.03E-03
4.31E-03
4.77E-03
5.78E-03
7.21E-04
8.45E-04
3.20E-03
3.44E-03
7.01E-03
7.25E-03
1.32E-04
1.73E-03
2.39E-03
8.87E-03
1.90E-03
2.15E-03
2.64E-03
3.05E-03
8.15E-03
9.66E-03
2.34E-03
4.83E-03
7.85E-03
9.09E-03
9.44E-03
5.85E-05
2.90E-03
4.46E-03
6.12E-03
7.36E-03
7.87E-03
8.21E-03
1.37E-03
2.32E-03
3.23E-03
3.80E-03
rs11970286
6
SLC35F1
rs11970286
6
SLC35F1
rs12053903
rs12053903
rs12053903
rs12053903
rs12053903
3
3
3
3
3
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
rs12053903
3
SCN5A
rs12053903
rs12053903
3
3
SCN5A
SCN5A
rs12053903
3
SCN5A
rs1805126
rs1805126
rs1805126
rs1805126
rs1805126
3
3
3
3
3
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
rs1805126
rs4130467
3
3
SCN5A
SCN5A
rs4130467
rs4130467
rs7373102
3
3
3
SCN5A
SCN5A
SCN5A
rs7373102
rs7373102
rs7373102
rs7373102
rs7373102
rs7373102
3
3
3
3
3
3
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
rs7373102
3
SCN5A
rs7374138
3
SCN5A
rs7374138
3
SCN5A
rs7374540
rs7374540
rs7374540
rs7374540
rs7374540
3
3
3
3
3
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
neoplasm of colon
Personal history of malignant
melanoma of skin
Hyperplasia of prostate
Chronic kidney disease
Atherosclerosis
Cardiac arrhythmias
Other peripheral vascular disease
Disorders resulting from impaired
renal function
Osteomyelitis, periostitis, and other
infections involving bone
Other paralytic syndromes
Thyrotoxicosis with or without
goiter
Fracture of other and unspecified
parts of femur
Cardiac arrhythmias
Diseases of upper respiratory tract
Hyperplasia of prostate
Chronic kidney disease
Personal history of malignant
melanoma of skin
Dermatophytosis
Late effects of musculoskeletal and
connective tissue injuries
Hyperplasia of prostate
Systemic lupus erythematosus
Postinflammatory pulmonary
fibrosis
Irritable Bowel Syndrome
Diseases of white blood cells
Abnormal glucose
Functional digestive disorders
Elevated blood pressure
Abnormal cardiovascular function
test
Anxiety, phobic and dissociative
disorders
Malignant neoplasm of female
breast
Chronic pharyngitis and
nasopharyngitis
Dermatophytosis
Spondylosis and allied disorders
Fracture of patella
Deviated nasal septum
Emphysema
151
1.37
7.37E-03
1615
0.87
9.66E-03
1212
1511
3075
1228
173
0.85
0.87
0.89
0.88
0.71
9.16E-04
1.60E-03
2.80E-03
5.84E-03
5.92E-03
192
0.73
7.56E-03
65
189
1.60
0.74
8.78E-03
9.64E-03
85
0.62
9.96E-03
3075
509
1615
1212
151
0.88
0.81
0.85
0.88
0.70
1.10E-03
3.83E-03
5.54E-03
6.02E-03
8.18E-03
1229
81
0.88
1.67
8.77E-03
1.55E-03
1615
81
173
0.85
1.56
1.50
6.97E-03
7.07E-03
1.31E-04
498
219
837
553
829
838
0.81
1.31
0.87
0.84
0.88
0.88
1.54E-03
3.98E-03
4.46E-03
4.95E-03
7.67E-03
7.87E-03
412
0.84
9.09E-03
645
1.22
2.10E-03
462
1.24
3.14E-03
1229
1244
65
275
138
0.87
0.86
0.56
0.77
0.69
2.05E-03
2.54E-03
3.58E-03
3.86E-03
5.01E-03
rs7374540
rs7374540
rs7374540
rs7374540
3
3
3
3
SCN5A
SCN5A
SCN5A
SCN5A
0.89
0.75
0.91
1.54
5.77E-03
6.29E-03
6.34E-03
6.47E-03
0.89
0.75
1.56
1.20
1.25
1.69
8.31E-03
9.38E-03
2.14E-03
5.79E-03
6.76E-03
7.48E-03
1.20
1.66
9.29E-03
2.18E-03
1.27
9.71E-03
SCN5A
SCN5A
Osteopenia
2161
Candidiasis
201
Sprains and strains
2928
Late effects of musculoskeletal and
81
connective tissue injuries
Urinary tract infection
2094
Polymyalgia Rheumatica
214
Acute reaction to stress
99
Mitral valve disorders
551
Debility
309
Disorders of tooth development and 55
eruption
Peptic ulcers
427
Late effects of musculoskeletal and
81
connective tissue injuries
Deficiency of B-complex
284
components
Sleep disorders
52
Other diseases of appendix
59
rs7374540
rs7374540
rs9832895
rs9832895
rs9832895
rs9832895
3
3
3
3
3
3
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
SCN5A
rs9832895
rs9861242
3
3
SCN5A
SCN5A
rs9861242
3
SCN5A
rs6797133
rs6797133
3
3
0.50
0.59
2.48E-03
9.88E-03
rs12636358
rs12636358
3
3
XYLB
XYLB
Sleep disorders
Diseases of hard tissues of teeth
52
209
2.46
1.65
5.26E-03
9.73E-03
rs1321313
rs1321313
rs1321313
rs1321313
rs1321313
rs1321313
6
6
6
6
6
6
CDKN1A
CDKN1A
CDKN1A
CDKN1A
CDKN1A
CDKN1A
53
118
108
398
841
2501
0.36
0.57
1.55
0.76
0.83
1.11
1.57E-03
2.01E-03
3.04E-03
3.39E-03
5.69E-03
7.39E-03
rs1321313
6
CDKN1A
Anorexia
Bacteremia
Anal fissure and fistula
Abnormal loss of weight
Dementias
Overweight, obesity and other
hyperalimentation
Spondylosis and allied disorders
1244
1.16
9.39E-03
rs2207790
1
NFIA
173
0.70
1.76E-03
rs2207790
1
NFIA
205
1.37
1.97E-03
rs2207790
1
NFIA
54
0.54
2.80E-03
rs2207790
rs2207790
rs2207790
rs2207790
rs2207790
rs2207790
rs2207790
1
1
1
1
1
1
1
NFIA
NFIA
NFIA
NFIA
NFIA
NFIA
NFIA
Postinflammatory pulmonary
fibrosis
Abnormal serum enzyme levels
(acid phosphatase, alkaline
phosphatase, amylase, lipase)
Multiple myeloma and
immunoproliferative neoplasms
Neutropenia and leukopenia
Facial nerve disorders
Cardiomyopathy
Pneumothorax
Disorders of function of stomach
Prostatitis
Toxic diffuse goiter
238
67
420
80
355
150
76
0.75
1.68
0.80
0.64
1.23
0.72
1.54
2.81E-03
3.70E-03
4.40E-03
6.59E-03
7.73E-03
8.85E-03
9.47E-03
rs4307206
rs4307206
6
6
C6orf204
C6orf204
Colorectal cancer
Malignant neoplasm of colon
316
226
0.76
0.74
1.01E-03
2.76E-03
rs4307206
rs4307206
6
6
C6orf204
C6orf204
rs4307206
rs4307206
6
6
C6orf204
C6orf204
rs4307206
rs4307206
6
6
C6orf204
C6orf204
rs6906287
rs6906287
rs6906287
rs6906287
rs6906287
6
6
6
6
6
C6orf204
C6orf204
C6orf204
C6orf204
C6orf204
rs6906287
6
C6orf204
rs6906287
6
C6orf204
rs6906287
rs6906287
6
6
C6orf204
C6orf204
Hyperplasia of prostate
Personal history of malignant
neoplasm of colon
Penicillin allergy
Disorders of pancreatic internal
secretion
Nasal polyps
Personal history of malignant
melanoma of skin
Colorectal cancer
Malignant neoplasm of colon
Penicillin allergy
Hyperplasia of prostate
Personal history of malignant
neoplasm of colon
Personal history of malignant
melanoma of skin
Disorders of pancreatic internal
secretion
Nasal polyps
Prostatitis
1615
130
0.86
0.69
3.92E-03
4.14E-03
65
84
1.64
0.65
6.07E-03
7.40E-03
127
151
1.40
1.36
8.87E-03
9.30E-03
316
226
65
1615
130
0.76
0.75
1.68
0.86
0.70
9.90E-04
3.03E-03
4.01E-03
4.19E-03
5.04E-03
151
1.38
5.68E-03
84
0.64
6.22E-03
127
150
1.41
0.72
7.15E-03
8.07E-03
rs6771881
rs6771881
rs6771881
rs6771881
rs6771881
rs6771881
rs6771881
3
3
3
3
3
3
3
EXOG
EXOG
EXOG
EXOG
EXOG
EXOG
EXOG
Hyperplasia of prostate
1615
Chronic liver disease
254
Cardiac arrhythmias
3075
Sexual and gender identity disorders 76
Chronic kidney disease
1212
Cerebral degenerations
137
Diarrhea
880
0.81
1.33
0.88
0.51
0.86
0.63
0.85
8.10E-04
3.55E-03
3.86E-03
4.20E-03
5.07E-03
5.29E-03
9.06E-03
Supplemental Figure 2: Phenome-wide association study plot for CDKN1A (rs1321313).
Supplemental Figure 3: Phenome-wide association study plot for C6orf204 (rs6906287).
Supplemental Figure 4: Phenome-wide association study plot for NFIA (rs2207790).