American Journal of Epidemiology
© The Author 2015. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of
Public Health. All rights reserved. For permissions, please e-mail: [email protected].
Vol. 182, No. 2
DOI: 10.1093/aje/kwv034
Advance Access publication:
May 14, 2015
Special Article
Mapping Epidemiology’s Past to Inform Its Future: Metaknowledge Analysis of
Epidemiologic Topics in Leading Journals, 1974–2013
Ludovic Trinquart* and Sandro Galea
* Correspondence to Dr. Ludovic Trinquart, Department of Epidemiology, Mailman School of Public Health, Columbia University,
722 West 168th Street, New York, NY 10032 (e-mail: [email protected]).
Initially submitted July 29, 2014; accepted for publication January 28, 2015.
An empiric perspective on what epidemiology has studied over time might inform discussions about future directions for the discipline. We aimed to identify the main areas of epidemiologic inquiry and determine how they evolved
over time in 5 high-impact epidemiologic journals. We analyzed the titles and abstracts of 20,895 articles that were
published between 1974 and 2013. In 5 time periods that reflected approximately equal numbers of articles, we identified the main topics by clustering terms based on co-occurrence. Infectious disease and cardiovascular disease epidemiology were the prevailing topics over the 5 periods. Cancer epidemiology was a major topic from 1974 to 2001
but disappeared thereafter. Nutritional epidemiology gained relative importance from 1974 to 2013. Environmental
epidemiology appeared during 1996–2001 and continued to be important, whereas 2 clusters related to methodology
and meta-analysis in genetics appeared during 2002–2007 and 2008–2013, respectively. Several areas of epidemiology, including injury or psychiatric epidemiology, did not make an appearance as major topics at any time. In an ancillary analysis of 6 high-impact general medicine journals, we found patterns of epidemiologic articles that were overall consistent with the
findings in epidemiologic journals. This metaknowledge investigation allowed identification of the dominant topics
in and conversely those that were absent from 5 major epidemiologic journals. We discuss implications for the field.
bibliometrics; knowledge; periodicals as topic; terminology as topic
The definition of epidemiology has not changed significantly since it originated. A review of 70 epidemiology textbooks published between 1931 and 2014 shows that epidemiology has consistently been defined as the science of understanding the distribution and determinants of population health to be able to intervene to control or prevent disease (Web Table 1 and Web Figure 1, available at http://aje.oxfordjournals.org/). However, the scope of epidemiologic study and practice has expanded substantially over the past few decades. Between 1974 and 2013, there was a nearly 6-fold increase in the use of the term epidemiology in papers indexed by MEDLINE.
Motivated in part by changes in funding opportunities and in the scale of population-based studies, several recent comments have been concerned with potential future directions for epidemiology as a discipline (1–7). However, much of this soul-searching has been informed principally by expert opinion, with little evidence to guide our thinking. An empiric perspective on the field’s evolution may be useful to help guide our collective thinking about future research directions for the field (8, 9). A large-scale content analysis can track how areas of epidemiologic research evolve, with various areas gaining or losing importance. Underrepresented research paths might be due to a lack of attention to important areas that need to be looked at in the future.
Metaknowledge investigations can complement the ongoing self-reflection in the field (10). Because they involve analyzing large quantities of texts, metaknowledge investigations have the potential to allow the investigation of the distribution and relative influence of topics over time (11). Considering that what we, as epidemiologists, write should reflect our vision of the discipline, such analysis may help us shape the discipline. Therefore, we aimed here to provide an empirical perspective on the field of epidemiology by identifying the main topics in 5 major epidemiology journals and assessing how they had evolved over the past 40 years.
METHODS
Selection of articles
We considered 5 high-impact epidemiology journals: the American Journal of Epidemiology, the International Journal of Epidemiology, the Annals of Epidemiology, Epidemiology, and the European Journal of Epidemiology. Their impact factors are among the highest for the category public, environmental, and occupational health of the Journal Citation Reports, and these journals are widely considered the journals of record (12).
Am J Epidemiol. 2015;182(2):93–104
We retrieved from MEDLINE via PubMed the records
of all indexed articles published in these 5 journals up to
2013, without any restriction on article type but including
only articles for which an abstract was available. The selected articles were categorized into 5 time periods, and
we aimed to have approximately equal numbers of articles
in each time period. Data were analyzed separately for each
period of time.
Linguistic processing
For each article, we extracted the title and abstract and then
combined them into a single string. We discarded the words
used to denote the structure of the abstract. Grammatical tagging allowed us to identify the part of speech (e.g., noun, pronoun, adjective, or verb) and assign the lemma (its
canonical form) of each word of a string. For instance, “genes”
and “gene” would be assigned the lemma “gene.” Moreover,
we developed a thesaurus that allowed for merging of different spellings of the same word (“ischaemic” and “ischemic”)
and for merging an abbreviation with the word or phrase itself
(PTSD and “posttraumatic stress disorder”). Each string was
then reduced to a set of noun phrases, that is, single nouns or
sequences of adjectives plus nouns or nouns that belong together (e.g., “cardiovascular disease” or “relative risk”). In
the remainder of this article, noun phrases are referred to as terms.
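The string-level steps of this paragraph (combining title and abstract, discarding structure labels, applying a thesaurus) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the label pattern, the function name, and the dictionary are hypothetical, and only the two thesaurus pairs come from the text. Part-of-speech tagging, lemmatization, and noun-phrase extraction are not reproduced.

```python
import re

# Hypothetical thesaurus; the two entries are the examples given in the text.
THESAURUS = {
    "ischaemic": "ischemic",                  # spelling variant
    "ptsd": "posttraumatic stress disorder",  # abbreviation
}

# Assumed form of structured-abstract labels (e.g., "METHODS:").
STRUCTURE_LABEL = re.compile(
    r"\b(?:BACKGROUND|OBJECTIVES?|METHODS|RESULTS|CONCLUSIONS?):"
)


def normalize(title, abstract):
    """Combine title and abstract into one string, drop structure labels,
    lowercase, and replace thesaurus variants with their canonical form."""
    text = STRUCTURE_LABEL.sub(" ", f"{title} {abstract}").lower()
    tokens = re.findall(r"[a-z0-9]+", text)
    return " ".join(THESAURUS.get(token, token) for token in tokens)
```

For example, `normalize("PTSD after ischaemic stroke", "METHODS: We followed patients.")` yields a single lowercase string in which "ptsd" and "ischaemic" have been replaced by their canonical forms.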
Terms that occurred multiple times within a string were
counted only once, and we discarded the terms that occurred
in fewer than 10 articles. The relevance of each term was estimated as the degree to which the occurrences of the term were
oriented towards 1 or more topics underlying the articles. For a
given term, it was measured as the Kullback-Leibler distance
between the distribution of (second-order) co-occurrences between that term and all other terms and the overall distribution
of co-occurrences over all terms. We selected the top 60% of
the terms with the highest relevance (13).
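The counting and relevance steps above can be illustrated with a short sketch. It makes a simplifying assumption that should be stated loudly: it uses first-order co-occurrences, whereas the text refers to second-order co-occurrences, so it shows the Kullback-Leibler idea rather than the exact score used. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def document_frequency(articles):
    """Number of articles containing each term (a term counts once per article)."""
    df = Counter()
    for terms in articles:
        df.update(set(terms))
    return df


def kl_relevance(articles, min_articles=10):
    """Kullback-Leibler divergence between each term's co-occurrence profile
    and the overall co-occurrence profile (first-order simplification)."""
    df = document_frequency(articles)
    vocabulary = {term for term, n in df.items() if n >= min_articles}
    rows = {term: Counter() for term in vocabulary}
    for terms in articles:
        present = sorted(set(terms) & vocabulary)
        for a, b in combinations(present, 2):
            rows[a][b] += 1
            rows[b][a] += 1
    overall = Counter()
    for row in rows.values():
        overall.update(row)
    total = sum(overall.values())
    scores = {}
    for term, row in rows.items():
        row_total = sum(row.values())
        scores[term] = sum(
            (n / row_total) * math.log((n / row_total) / (overall[other] / total))
            for other, n in row.items()
        ) if row_total else 0.0
    return scores


def select_top(scores, share=0.6):
    """Keep the given share of terms with the highest relevance."""
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return ranked[:math.ceil(share * len(ranked))]
```

A term whose co-occurrences are spread like those of the corpus as a whole scores near 0; a term concentrated on a few companions scores higher and is more likely to be retained.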
Mapping and clustering of terms
The selected terms were positioned on a 2-dimensional
co-occurrence plot and were grouped into clusters based on
the co-appearance of terms. A normalized co-occurrence frequency was derived for each pair of terms. The locations of
terms on the plot were determined by minimizing a weighted
sum of the squared distances between all pairs of terms. Minimization was achieved through stress majorization (14).
Consequently, terms with high co-occurrence tend to be close
to each other, whereas terms that are far away from each other
do not or rarely occur together in the same article (15). The
terms were also assigned to clusters using a weighted variant
of modularity-based clustering (16, 17). We characterized
each cluster by providing a heading based on the terms in
the cluster. We assessed the relative importance of clusters
according to their share of terms relative to the total number
of terms. Clusters of terms are interpreted as major epidemiology topics, and clusters located close to each other in the
map indicate related topics.
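Two quantities from this paragraph lend themselves to a small sketch: the normalized co-occurrence frequency of a pair of terms and the share of terms per cluster. The text does not give the normalization formula; the sketch assumes the association strength c_ij / (c_i × c_j), where c_ij is the number of articles in which both terms occur and c_i is the total number of co-occurrences of term i. Function names and the example cluster labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def association_strength(articles):
    """Return {(term_i, term_j): c_ij / (c_i * c_j)} for co-occurring pairs.

    Assumed normalization; the article only says that a normalized
    co-occurrence frequency was derived for each pair of terms.
    """
    pair_counts = Counter()
    for terms in articles:
        pair_counts.update(combinations(sorted(set(terms)), 2))
    totals = Counter()
    for (a, b), n in pair_counts.items():
        totals[a] += n
        totals[b] += n
    return {
        (a, b): n / (totals[a] * totals[b])
        for (a, b), n in pair_counts.items()
    }


def cluster_shares(assignment):
    """Percentage of terms in each cluster, given {term: cluster label}."""
    sizes = Counter(assignment.values())
    total = sum(sizes.values())
    return {cluster: 100 * n / total for cluster, n in sizes.items()}
```

The second function corresponds to the relative importance of a cluster as defined above: its number of terms divided by the total number of terms on the map.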
For each time period, the resulting maps show terms as labeled nodes in the co-occurrence network. Node size is proportional to the term frequency of occurrence, so that the
larger the node, the more articles include the term. The clustering of the terms is displayed on top of the map by coloring
nodes based on the cluster to which they belong. Analysis involved the use of the VOSviewer software, version 1.5.7
(Centre for Science and Technology Studies, Leiden University, The Netherlands) (18).
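The layout criterion described in this section (a weighted sum of squared distances between all pairs of terms) can be written in symbols as follows. The notation is introduced here for illustration: s_ij is the normalized co-occurrence frequency of terms i and j, x_i is the 2-dimensional position of term i, and n is the number of terms. The constraint that fixes the average distance to 1 follows the usual description of the VOS mapping technique and is an assumption, since it is not stated in this article.

```latex
\min_{x_1,\dots,x_n} \; V(x_1,\dots,x_n) = \sum_{i<j} s_{ij}\,\lVert x_i - x_j \rVert^{2}
\quad \text{subject to} \quad
\frac{2}{n(n-1)} \sum_{i<j} \lVert x_i - x_j \rVert = 1
```

Without such a constraint, the minimum would be reached trivially by placing all terms at the same point.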
Identification of bursts
To identify topics that attracted attention in epidemiology
research but eventually faded away, we used Kleinberg’s
burst detection algorithm to identify words that experienced
sudden increases in use (19, 20). The algorithm assesses
states of the document stream, with different frequencies of
individual words, and identifies state transitions, that is,
years around which the frequency of a word’s usage changes
significantly. The analysis generates a list of burst words, together with the intervals of time during which each burst occurred and the intensity of the burst. We visualized the top
100 burst words graphically on a horizontal bar chart, with
publication year on the x-axis, burst words on the y-axis,
and a bar from the start to the end of the burst. The bar
width is proportional to the intensity of the burst. Bars
were color-coded according to the major epidemiology topics, as previously. Some words did not belong to any particular cluster, and the corresponding bars were left uncolored.
Analysis involved the use of the Science of Science Tool,
version 1.1 β (Cyberinfrastructure for Network Science Center, Indiana University, Bloomington, Indiana, http://sci2.cns.iu.edu).
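The idea behind burst detection can be illustrated with a simplified 2-state version applied to yearly counts. This is a sketch under stated assumptions, not the implementation in the software named above: it uses only a baseline state and a single burst state, and the parameters `s` (ratio of burst rate to baseline rate) and `gamma` (cost of entering the burst state) are given hypothetical default values.

```python
import math


def detect_bursts(relevant, totals, s=2.0, gamma=1.0):
    """Return one state per year (0 = baseline, 1 = burst).

    relevant[t]: articles using the word in year t; totals[t]: all articles.
    The state sequence minimizes fit cost plus transition cost (Viterbi).
    """
    n = len(relevant)
    p0 = sum(relevant) / sum(totals)   # baseline rate of the word
    p1 = s * p0                        # elevated rate in the burst state
    if not 0.0 < p0 < p1 < 1.0:
        raise ValueError("need 0 < baseline rate < burst rate < 1")
    rates = (p0, p1)
    up_cost = gamma * math.log(n)      # moving down again is free

    def fit_cost(state, r, d):
        # Negative log of the binomial probability of r hits among d articles.
        p = rates[state]
        log_binom = math.lgamma(d + 1) - math.lgamma(r + 1) - math.lgamma(d - r + 1)
        return -(log_binom + r * math.log(p) + (d - r) * math.log(1.0 - p))

    prev = [0.0, math.inf]             # the sequence starts in the baseline state
    back = []
    for r, d in zip(relevant, totals):
        cur, ptr = [], []
        for j in (0, 1):
            options = [prev[i] + (up_cost if j > i else 0.0) for i in (0, 1)]
            i_best = 0 if options[0] <= options[1] else 1
            cur.append(options[i_best] + fit_cost(j, r, d))
            ptr.append(i_best)
        back.append(ptr)
        prev = cur
    state = 0 if prev[0] <= prev[1] else 1
    states = [state]
    for ptr in reversed(back[1:]):     # walk the stored pointers backwards
        state = ptr[state]
        states.append(state)
    states.reverse()
    return states


def burst_intervals(states, years):
    """Convert a 0/1 state sequence into (start_year, end_year) intervals."""
    intervals, start, previous = [], None, None
    for year, state in zip(years, states):
        if state == 1 and start is None:
            start = year
        elif state == 0 and start is not None:
            intervals.append((start, previous))
            start = None
        previous = year
    if start is not None:
        intervals.append((start, previous))
    return intervals
```

Each interval returned by `burst_intervals` corresponds to one bar in a chart of the kind described above; the burst intensity used for the bar width is not computed in this sketch.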
Ancillary analysis of high-impact general medicine
journals
Because many epidemiologic articles are published outside of epidemiology journals, we performed the following
ancillary analysis. We considered 6 high-impact general
medicine journals: The New England Journal of Medicine,
The Lancet, The Journal of the American Medical Association, The BMJ, Annals of Internal Medicine, and PLoS Medicine. To identify articles most likely of relevance to the field
of epidemiology, we analyzed how the articles published in
the 5 epidemiology journals were indexed with MeSH terms
in MEDLINE and we derived the following sensitivity-maximizing search filter: “epidemiology” (Subheading) OR
Epidemiologic Factors (MeSH) OR Epidemiologic Methods
(MeSH) OR epidemiologic studies (MeSH). Using this filter,
we retrieved from MEDLINE the records of articles with abstracts that were published in these 6 general medicine journals during the same time period as the articles in the 5
epidemiology journals. The selected articles were categorized into the same 5 time periods. We applied the same linguistic processing to the titles and abstracts, and we mapped
and clustered terms into major topics, as previously.
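As an illustration, the search filter above could be assembled into a single query string along the following lines. The field tags (`[Subheading]`, `[MeSH Terms]`, `[Journal]`, `[dp]`) and the `hasabstract` keyword are one plausible PubMed rendering and are assumptions; the article gives only the filter terms. The function name is hypothetical, and the journal names may need to be replaced by the titles or abbreviations that PubMed recognizes.

```python
# The 4 filter terms come from the text; the field-tag syntax is assumed.
EPIDEMIOLOGY_FILTER = (
    '"epidemiology"[Subheading] OR "Epidemiologic Factors"[MeSH Terms] '
    'OR "Epidemiologic Methods"[MeSH Terms] OR "Epidemiologic Studies"[MeSH Terms]'
)


def build_query(journals, first_year, last_year):
    """Combine journal names, the epidemiology filter, an abstract
    requirement, and a publication-year range into one query string."""
    journal_clause = " OR ".join(f'"{name}"[Journal]' for name in journals)
    return (
        f"({journal_clause}) AND ({EPIDEMIOLOGY_FILTER}) "
        f"AND hasabstract AND {first_year}:{last_year}[dp]"
    )
```

Calling `build_query(["The Lancet", "The BMJ"], 1974, 2013)` returns a string that restricts the filter to the named journals, to records with abstracts, and to the study period.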
RESULTS
Characteristics of selected articles
We selected 20,895 articles. Overall, 42.7% were published in the American Journal of Epidemiology, 21.7% in the International Journal of Epidemiology, 10.2% in the Annals of Epidemiology, 10.9% in Epidemiology, and 14.5% in the European Journal of Epidemiology. Figure 1 shows the evolution over time of the yearly number of articles across the 5 journals. In all, 3,725 (17.8%) articles were published between 1974 and 1989; 3,948 (18.9%) were published between 1990 and 1995; 4,492 (21.5%) were published between 1996 and 2001; 4,180 (20.0%) were published between 2002 and 2007; and 4,550 (21.8%) were published between 2008 and 2013.
Mapping and clustering of terms
Figure 2 shows the mapping and clustering of terms over time. Table 1 shows a summary of the clusters of terms and the evolution of major epidemiology topics. The map for 1974–1989 contained 5 main clusters of co-occurring terms, which corresponded to infectious diseases epidemiology, cardiovascular diseases epidemiology, cancer epidemiology, reproductive and perinatal epidemiology, and nutrition epidemiology. These 5 major topics were identified consistently over the 1990–1995 and 1996–2001 periods. In addition, a cluster corresponding to female cancer epidemiology was identified for 1990–1995 but disappeared thereafter, whereas a cluster related to environmental epidemiology appeared during 1996–2001 and persisted in the subsequent periods.
For the period of 2002–2007, infectious and cardiovascular diseases epidemiology remained among the top major topics. However, the cluster related to cancer epidemiology disappeared, and the nutritional epidemiology cluster gained a larger share of the term map. Moreover, a cluster related to methodology appeared during 2002–2007, ahead of reproductive and perinatal epidemiology and environmental epidemiology, and persisted in the subsequent period.
Finally, 2008–2013 saw the appearance of another cluster related to meta-analysis in genetics, and the period included a total of 7 clusters. Cardiovascular diseases, nutrition, and infectious disease epidemiology remained the top major topics. Reproductive and perinatal epidemiology and environmental epidemiology completed the picture.
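The article counts and percentages reported under Characteristics of selected articles can be cross-checked with a few lines of arithmetic. The figures are those given in the text; the variable names are illustrative.

```python
# Number of selected articles per time period, as reported in the text.
period_counts = {
    "1974–1989": 3725,
    "1990–1995": 3948,
    "1996–2001": 4492,
    "2002–2007": 4180,
    "2008–2013": 4550,
}

total = sum(period_counts.values())

# Share of each period, rounded to 1 decimal as in the text.
period_shares = {
    period: round(100 * count / total, 1)
    for period, count in period_counts.items()
}

# Reported percentage of articles per journal, in the order listed in the text.
journal_shares = [42.7, 21.7, 10.2, 10.9, 14.5]

print(total)
print(period_shares)
print(round(sum(journal_shares), 1))
```

The period counts add up to the 20,895 selected articles, the recomputed shares match the reported 17.8%, 18.9%, 21.5%, 20.0%, and 21.8%, and the journal percentages sum to 100.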
Identification of bursts
The analysis of the 100 top burst words showed similar
patterns (Web Figure 2). From 1974 to 1999, all of the bursts
Figure 1. Number of articles published per year from 1974 to 2013 by journal. [Chart not reproduced. X-axis: Publication Year (ticks 1975–2010); y-axis: No. of Articles (ticks 0–800); one series per journal: American Journal of Epidemiology, International Journal of Epidemiology, Annals of Epidemiology, Epidemiology, and European Journal of Epidemiology.]
96 Trinquart and Galea
A)
control
oll woman
invasive cerrvica
v l cancer
o rian
ova
n ccanc
an er
can
anccer
cer risk
ce
ris
ri
brea
br
ea
ast
s can
ance
nc r
co
col
olon
n
orall con
ont
ntracept
rac
ac
cept
eptive
ve
ve
postmenop
op
paus
au
a
us
usal
all wom
woma
an
equa
qua
al n
nu
number
u be
be
beer
bee
ee cup
c
cu
allco
coho
ho
hol
o
beve
ve
erage
dri
rinke
ri
inker
nke
ke
caff
ca
ffeiine
ff
die
d
ie
i t
sstr
st
ttrratific
fica
fic
ation
n
f t
fat
cur
urren
ur
r t smoker
re
cco
ons
nsumpt
um
mpt
p io
on
pac
pack
ac
myo
ocar
ardia
ardia
dial iinfarcttion
dial
on
miid
iddle
iddle
d -ag
aged
ag
e man
uric
c acid
Nutritional
gain
gai
n
cod
o e
od
a cide
ident
ide
ent
n hhomi
biasse
bi
bia
se acc
ho
omi
micide
c
ci
blloo
o d pr
p es
ssu
ure
re
Mexican
nA
Amer
merica
mer
me
i n
inffa
ant
nt
iss and
an
a
n
nd
d
se
sen
ensit
en
ensit
sitiv
iv
vity
vi
ty isl
falll
fal
year
ar old
g l
gir
C.tracho
chomatis
cho
ho
hou
o seh
hold
d
sch
sc
c ool
ol
wiin
inte
er
sch
hoolc
oolchild
co
ough older pe
oug
p rson
B sur
surfac
f e antigen
fac
nti
t ge
g
anti
an
tib
bodyy
body
bo
nos
nosoc
nos
soc
oco
cco
omia
miall iinfe
mia
nfe
fec
ct on
cti
o sta
sta
st
tay
su
surv
urv
rvei
eilllllan
ei
ance
e
hepatitis Be an
ntig
tigen
ige
ins
nstiituttion
n hep
hepati
a tis
at
ati
t
do
dog
Ta
aiwan
aiwan
w
hos
ostt
mou e
mous
Bra
B
Br
razil
zii
may
may
infection
IInd
nd
dia
ia
cul
cu
ultur
ul
ure
e
h gh rate
hig
a
at
Ind
nd
ndia
dia
an
Texas
Texa
ast
stthm
ma
HIV
H
V
ani
an
a
nima
ni
mal
syndro
syn
dro
drom
ome
aug
u us
ust
st
sep
septem
ptem
mber
berr
r pirato
res
ory
y disease
pre
ess
sssu
ure
Am J Epidemiol. 2015;182(2):93–104
hig
h
i h-d
dens
ensititity
y liipop
poprot
ro
otein
ein ch
chole
o sterol
emp
em
e
ploy
ploy
yee
ee
prin
nciple
nciple
nci
ple
hea
heal
ea
alth
alth
al
h care
Cau
ucasian
uc
AID
AIDS
AID
down
ow
Me
exico
exico
co
measur
me
meas
asurre
em
men
nt
trra
raiitt
rait
effic
ciency
cie
es ma
esti
mato
or
reg
re
egres
re
e sion
sion mode
el
predi
pre
edic
dictor
tor
tor
to
partn
par
n er
bab
aby
ab
pro
otec
tection
te
US
USA
ssump
ss
ump
ption
mo
m
ort
rtal
alititity
y ra
rate
te ass
case fat
fa alit
fa
litty
lit
hyype
errttten
en
nsion
ssiion
o
de
efec
feccctt
Los Ange
An les
Chin
in
nese
e
bir
irth
ir
th
h wei
eiight
e
gh
ht
pe fo
per
orm
or
ma
ance
sub
bty
typ
y e
Upstate
e New York
che
emic
mi al
de
elive
live
very
ery
birth
bir
th
h coh
co
ohort
oh
orrt
o
isc
iis
chaemic
chae
hae
ha
ae
a
em
mic
ic
ic h
he
e
eart
art di
art
d sease
e
man
Reproductive and Perinatal
con
oncep
on
ce
cep
eption
titive
e ap
morrtalityaaltlternrnaative
ppr
p oa
oach
ach
smok
sm
ok
kin
ing
ing
co
oro
onar
narryy hea
na
e rt
rt disseas
ea
asse
e
Infectious Disease
leu
eukem
eu
kem
mia
a
ca
ase
se con
ntrro
oll stu
udy
dy
rre
ela
atiive
e ris
isk
isk
liver ciirrho
r sis
rr
cofffe
cof
coffe
ee
lympho
phoma
pho
tum
umor
umor
o
pa ity
par
pa
ty
y can
ancce
an
ce in
cer
nci
c enc
cid
nc
ce
more
ore yyear
ear
arr
a
pro
p
ro
otec
te
ecctive
tive effe
tiv
f ct
Cancer
Cardiovascular Disease
Washin
hiingto
hingto
hin
g n Stat
tat
ae
illln
lnes
esss
in
nflu
nf
l nza
lue
pres
schoo
chool
h
child
type
pe B
orga
org
rrgani
an sm
anism
seroco
co
onve
ers
rsi
sion
on
viru
vi
rus
ru
s
se
erologi
og
ogi
gic test
st
eme
emer
mergenc
mer
genc
genc
en e
co leme
lement
ement fixation
o tbre
ou
tb
bre
r ak
a comp
diarrhea
di
al di
ds
se
seas
sea
e e ill p
pe
pers
e on
aden
nov
o us
ovir
re
einfect
fec
e ion
ec
H3N
H3N2
N
H1N1
H1N
N
respirato
ory
y ill
illness
ll
boy
Tecu
cu
umseh
respiratory sync
sy ytial virus
syn
total ch
hole
ho
esterrol
Figure 2 continues
Metaknowledge Analysis of Epidemiologic Topics 97
B)
maternal
al exposure
al
Cancer
first trrim
rimes
ri
me ter
me
Cardiovascular Disease
fetal gr
g ow
owth
owt
w
Female Cancer
m for
mal
orm
or
ma
ation
cancer
err control
Infectious Disease
ges
ge
ges
e ta
tation
tat
attion
on
o
nal
al ag
age
ge
stillb
stil
illb
lbir
lbi
bii th
b
panc
n rea
nc
a
bla
la
ad
dde
der
ovvaria
ova
o
rian
n canc
ncer
nc
e
rectall cca
ancer
rad
adia
a
iattion
iat
on
n
emp
mployymen
mploy
me
e t
cof
co
o fee
ee
ee
ca
c
anc
ncer
e
veg
geta
ta
able
e
fiber
fib
iber
er
caro
caro
otene
te
te
non
no
n-ssm
nmok
oker
ke
err
co
ons
nsum
mpt
ptio
ion
m k
mil
fat
att
oho
hol
ol in
nta
t ke
energy
gy in
in ke allco
intak
vva
alilid
idi
dity
y
we
w
e
eig
igh
htt
art
ar
a
rticl
clle
lung
ung
un
u
n functio
ffun
fu
un
unctio
cttio
cti
ct
io
ion
min
in
nu
ute
te
e
ani
a
an
niima
mal
m
al fe
ffev
ever
e
e mea
mea
easle
sle
fut
futu
fu
uttu
ut
utu
ur
ure
re
grration
atio
a
tion
ttio
llowe
lo
owe
er ra
rate
tte
e migr
Ital
It
aly
dea
ath ra
ra
rat
ate
urin
r e
rat
ra
rati
ation
onal
o
n
nale
potass
as ium
ass
i m
spo
spor
p t
bo
ody
dy masss in
nde
d x
beta
be
bet
et
eta
e
diiab
a ettes
blloo
b
ood
od p
prre
es
ssure
su
urre
e
tra
rans
ransm
ra
nsm
smitt
ttte
ed
d dis
se
ea
ase
a
imm
munogl
og
ogl
g obul
obu in G
a titibo
an
bo
ody
dy
AID
AI
DS se
DS
ser
e opr
opreva
op
eva
v len
lence
mark
ma
ker
er
cond
ond
ndom
om use
se
HIV
HI
V
dru
ug use
se
Hisspa
p ic
pan
high-d
den
e si
sityy lip
sity
ipop
opro
pro
rote
ein
in cho
h le
est
s erol
serrol
serolo
olo
o
l gy
gy
Q fe
eve
v r
rub
rube
bella
b
infection
ssttrok
stro
rokke
ro
e ser
serum
se
um higher pr
ow rra
rate
a
p ev
evalen
ence
en
ce loow
Tex
exa
ex
as
ssttra
rain
in
in
attack
att
ack
ac
ck
k ra
rate tic
ick
ck
te
estin
es
estin
ng
sout
sou
outh
ou
h
Ma
Mar
aryla
ar
y nd
yl
nd
ch
hol
olesste
t ro
ol
serum total
all cch
cho
h lesterrol
o
whit
whit
wh
ite
e
sig
gn
Kap
apos
ap
pos
os
osi
sexua
ua
ual
a acctiv
t vity
y
past
ast
stt year
s ular
sec
ula
la
ar tren
rend
re
d
ccul
cu
u tur
urre
u
tech
te
chni
ch
niqu
ni
q e
qu
eur
eu
u
urrope
ope
op
pe
cellll e
cell
ce
goa
go
oal
o
bo
boy
oy
corrrrel
co
corr
rel
e at
ation
io
on
ph
hysic
icall acttiv
ivit
vit
ity
y
sto
ool
surv
su
vei
eillllllan
ance
an
ce
m rttal
mo
alilit
ity rat
ra
ate
te
prro
pro
rote
ote
tei
ein
mala
al ria
dia
arrh
hea
a
ap
ppr
proa
oa
ach
ch
Flor
Flor
Flo
lo iida
d
instrru
rument b
rum
bro
bron
br
r nch
chitiis
chit
is
chil
hi d m
hil
mortality
new
wb
bor
bo
on
off
o
ffice
ic
ce
e
co
con
once
on
cep
cep
pt spec
spe
sp
ec
cifific
iiccity
itity
ccas
ca
asse pa
a
atittiient
ent
ciiga
gare
rett
tte sm
mok
oki
kiin
ng
die
d
eta
ttar
arry in
nta
ta
take
ke c
dietary chol
ch esteroll
cho
pre
egnant
gna
antt wo
wo an
wom
n
err
e
er
rro
orr pa
pare
ent
n
canc
ca
ancer
nc
cer
er mortalility
lility
ty
alco
al
oh
ho
ol
n rie
nu
nut
entt
bab
baby
ab
by
plan
pla
l nt
m th
ther
err
e
fa
athe
at
ther
he
er mo
misscla
mis
classi
ass
sssi
sific
f ati
ation
on
n
rel
rre
elati
attiive
a
e
carrcin
c om
ci
oma
fru
fr
fru
uiitt
vitam
amin E
am
maccronut
o rient
onu
on
passsive
pas
sivve smo
siv
mok
okkin
o
iing
n
Reproductive and Perinatal
lo
lo
ow
w birth
biirth
b
h weight
exxxpe
expe
pe
p
ert
rt
pari
rity
ty
ty
c nc
ca
ncerr riissk
Nutritional
con
c
co
o gen
on
g ita
it l malformation
it
pregna
pr
gna
nan
ncy
nc
intraveno
eno
ous
sd
dru
dr
r g user
HIIIV
V ty
type
lower prrev
e ence
eval
h atiitis
hep
is C virus
non-Hisp
sp
pa
anic
an
n white
card
dio
i va
asccul
ular
ar ris
ar
i k factor
C)
early prreg
egna
g ncy
Cancer
neural tube
ube defects
Cardiovascular Disease
neural tu
ube
be defect
ec
ect
ct
dietary
ary
ry fiber
recctum
ctum
u
la
ant
n
eleva
el
ele
atted
a
ted
d ris
isk
sk pla
liittle
itittle
le
e ev
evid
dence
ce
e
po
pos
ostme
tme
meno
nop
op
paus
aus
au
usal
al wom
wo
om
man
an lit
cons
co
cons
nsum
ump
mption
ion
io
be
b
ee
e
er
nu
utrient
utrie
rie
ent
wiine
win
vitamin
vit
vitami
amin C
b-ca
aro
rot
otene
ene
ne
l ng
lu
ng can
ance
ce
cer
er
egg
gg
g
cor
o rre
rec
ec
e
cttitio
ion
ion
waist
wais
wai
obes
ob
esititity
ty
insuli
uliliin lleve
eve
el
in
ins
nsuli
ns
ulin
lin
Arrgentina
Arge
A
reco
ec gnit
eco
gn ion
on
enz
en
enzyme
n yym
yme
me
sm
moke
mok
ke
pla
aceb
ce o
coe
co
coe
oeff
fficient
ffic
ffi
nt
ap
ppr
p oa
ach
h
card
ca
car
rdiov
rdi
iova
io
vasc
scul
ular
arr dis
isea
sea
ase
e
cro
ccr
ro
osss
high
h-d
den
ns
siity lip
ipop
ipop
opro
rotte
ein
n cho
ole
l st
ster
errol
erol
ol
homi
ho
omi
o
m cide
e
se
ea
e
aso
aso
on
on
as
a
sth
hma
ma
hig
gh-d
h-dens
en
nsity
ity
y lilipop
ipop
po rotein
adult pop
op
pula
ulatio
latio
ion
n
Thai
Tha
aiilan
land
and
fev
e er
ev
er
stra
st
rate
ra
tegy
te
g
gy
B.ga
garini
ga
r i
tickk
tic
s ra
st
rain
in
n
Q fev
fever
infection
int
introd
ntrod
ro
o uc
uct
cttiion
o
imp
mplem
m
le
ement
e
ntati
nt
ati
at
ttion
n
a iss
adm
ssion
siion
on
airr pollu
air
polllu
lu
utio
tion
tion
lung
g ffun
unc
un
u
n tion standard
rd
d method
u lit
ut
uti
lity
ty
Wa
Wal
W
ales
es
rres
es
sp
spir
pir
irrato
att ry dise
a
diisea
dis
eas
e
ase
detect
det
ecttion
ec
io
on
cat
appl
ap
p ic
pl
i atio
attio
ion
n dia
i rrhea
rrh
he
ea
a
vali
va
alilidi
dity
di
ty
y
erro
erro
rorr
Egyp
gy t
agen
ag
ent
bio
i psy
iopsy
y
respir
res
es
spir
pirra
pi
ato
to
tory
oryy
gen
g
ge
e e bia
b sse
e
weste
west
ern
er
ern
r popu
opu
pulla
pula
lati
a on
n
fat dist
sttrib
riib
bution
n
Japanese
e Am
Ame
m rican ma
man
man
mal
allign
ig a
anc
nccy
cy
war
war
a
ex
xces
xc
ss wa
excess
es mo
ess
morta
rta
alilit
li y
ins
nsstr
trum
rume
men
m
ent b
en
bllac
ack
k
cor
c
orrel
relate
late
ate
at
ly
lym
ympho
ym
p ma
standa
sta
tanda
dard
da
ard
rdi
diize
d
zzed
e
ed
d mo
ort
rt lit
rta
itty rat
ity
atitio
His
sp
pa
an
a
nic
a coholl co
al
cons
nsum
mpt
ptio
tiio
on
bodyy mass index
lu g
lun
ne
nec
neck
ne
eck
ec
cckk
alco
al
oho
hol
ol
mel
me
e anoma
el
anoma
ano
ma
nitr
nit
trrate
ade
de
eno
noc
ocarc
arcino
inoma
ino
ma
m
a
nons
nsmo
nsm
moke
moke
ker
diet
di
et
et
hig
hig
ghes
he
h
e
es
st tert
e ile
energ
gy in
gy
nta
tak
ke
oil
oil
leu
ukem
kemia
mi
biirt
b
rth
h
ttum
tu
um
u
m
mor
or
or
brrea
east
st can
st
ance
nce
cer
er
item
m food freque
uency
ue
nc ques
ncy
qu
uesstion
ue
tionnair
nair
nai
aiire
irre
re
ab
bo
b
orti
orti
rtion
i
nigh
ht
bre
reast
re
east
ast ca
c nc
cer risk
cer
ce
skk
hor
o mo
one
Reproductive and Perinatal
preg
pr
reg
e na
ancy
ncy
nc
rectall ca
rectal
rec
cance
cerr
ce
t
tea
Nutritional
mat
ater
at
ernal
ern
a ag
age
e
multivvitamin
ita
a n
co
c
offee
of
fe
ee
Infectious Disease
cong
con
ongenital malformation
fetal gro
g wth
bev
vera
er ge
Environmental
sponta
spo
nt ne
nta
neo
eous
us abo
aborti
r on
folicc a
acid
rrura
ural po
popu
pu tion
pula
pu
tran
ransfu
ran
sfusion
sfu
sfus
ion
ion
ser
eropr
op
p eva
valen
le
len
en
nce
Tur
Turkey
urkey
k seropr
int
in
ntravven
n
e us
eno
us drug user
u
HIV
HI
V
AIDS
AID
S HIV
HIV sero
HI
ero
rosst
stat
sta
t us
s
CD4
CD4
4 count
hepatit
hepa
titis C
tit
HIV epidemic
weath
weat
her
her
pa ticula
par
ula
late
te
e matter
CAR
ARDIA
AR
CARD
DIIA st
study
Figure 2 continues
Am J Epidemiol. 2015;182(2):93–104
98 Trinquart and Galea
D)
cle
left
le
eft lip
Cardiovascular Disease
Environmental
Infectious Disease
geneticc variation
geneticc variant
v
hap
plotyp
lot
otyp
ype
HuGE
E revi
e ew
neural tu
ube
e defect
Methodology
Nutritional
Reproductive and Perinatal
dietary
ry fola
f te
p ly
po
ymorp
phi
hism
sm
m
P chloro
oro
rophenyl
ro
fo
olatte
ol
metabo
met
abolillit
ab
abo
lite
ma
ate
at
ern
r al
al age
ag
ge
ge
par
p
arrit
ity
tty
y
iro
iron
ro
Haw
aw
a
awaii
waii
a
ai
foo
od fre
equ
uen
ency
ency
y que
uesst
stionn
io
o nai
a re
veg
egeta
eta
etable
able
le
caro
otte
ote
enoid
pla
asm
ma
m
a
nond
non
dri
d
ri
r nker
nk
nke
ker
Sea
S
Se
e
eat
at
att
ttltle
tl
pub
ub
blic
liccat
a on
ati
o
treatmen
entt effect
en
asssu
sump
mp
ptitio
tion
l era
lit
era
attur
tu
u
ure
v us
vi
vir
s
in
nfe
fect
ect
ctio
on
de
d
eb
e
bate
prrob
p
ble
em
youn
ou g w
oun
woman
ma
an
n
ca
ca
can
an
nce
ncer
cer mo
cer
orrta
tta
alilitity
alit
Lond
on
nd
n
don
on
inffluen
lue
lu
uenza
ue
we
we
wea
eath
ther
par
p
pa
ar
a tic
tic
icula
ula
ul
late
e ma
ma terr
mat
coro
coro
co
ona
ary he
ea
artt dis
isea
ease
ase
e
short te
erm
m effect
ine
ine
in
equa
qua
ality
al
syst
sy
yst
stol
olic
ic blo
ic
lood
od pre
od
ress
ssur
ss
urre poor
oo
o h
he
ealth
e dem
ep
epi
demic
ic
c d mo
chil
ortality
peak time
peak
pea
e series
morrttal
mo
alit
litity
ty ra
ate
de
dea
d
ea
ath
th rrat
ate
ate
carrd
ca
dio
iova
ova
vas
ascul
ular
arr dis
isea
ease
se
se
mea
me
m
easle
s
sprea
sp
spr
ead
rru
rur
ural
ur
ral
a ar
are
a
r a
tra
ra
raff
affffiic yout
yo
ou h
sum
um
m eth
ethnic
thnic
nic di
n
difffe
diff
feren
nce
ce
po
pol
p
ollut
ol
lut
uta
an
ant
an
HIV
HI
HIV
HIV
H
HI
IIV in
infe
fe
fec
ectio
on
cul
ccu
ultur
ue
ure
ur
coun
co
cou
nt
nt
urb
u
rb
r an
a a
arre
are
rea
imm
mmun
m
unity
sittu
sit
ua
uat
ation
at
on
o
n ste
stte
st
ep
he
hea
h
ealtltth
ea
h ef
effec
f ect
ec
ct
assth
asth
thm
ma
a
weig
weig
we
gh
htt
jou
urn
nal
epid
epid
ep
dem
emio
mio
iolo
lo
logy
ogy
gy
valida
ida
dat
da
atio
tiion
tion
ssttta
sta
and
nda
ard error
rest
res
r
es
e
st
s
cor
co
o rellat
ati
a
tion
on coe
co
co
oefffif ciie
ient
nt t
body mass index
normal
al wei
ei t
eigh
eig
CARD
ARD
R IA
RD
A
sim
sim
mula
ulatio
tit n study
tio
bia
bia
iase
se
e
birt
bi
rrtth we
rth
weight
iig
ght
h
boyy
gr wtth
gro
Cox propo
Cox
opo
ort
rti
tiona
onal ha
aza
az
zarrd
ds
d
sm
mo
odel
ode
el
gir
g
irl
EPIC
EP
E
PIC
P
adu
du
dul
d
ulltt life
growth
h ffacto
act
acto
a
ctor
ct
coup
co
ou
o
uple
e
va
alilidi
diity
ty
fe
ffet
eta
et
all
lowe
lo
ower
wer rriisk
we
sk
body
dy
y mass
MED
DLINE
me
m
eta
a-a
-ana
ana
aly
ysis
oc
occ
o
cc
ccup
upa
u
p
pa
atio
ion
na
nal
al ex
expo
xpo
po
pos
os
sur
ure
re
e
cy
ccyc
yyc
clle
cle
e
Cal
alififo
forrni
rn
niia
n
rrad
ra
adiation
ad
iattion
n
RR
R
R
cons
co
nsum
sum
mpt
ption
iio
on IRIIRR
fatt
point es
e tima
t
te
DN
DNA
iin
nfa
fantt
fant
tum
tu
um
u
mo
orr
can
ca
ancer riisk
k
red
ed
educe
duc
uc d risk
uce
sk
diet
di
et
sskkin
pre
pr
preg
eg
gnanc
nanccy
na
dau
dau
aug
aug
ughter
hte
hter
e
supp
plemen
ple
mentt
fattyy aci
a d
en
e
enz
nzyym
me
leu
ukem
kemia
em
m
prete
pre
tter
e m deli
delive
liivve
e
erry cigarett
ery
ciga
ciga
i rett
rett
ret
ette
tttte smoke
sm
smoke
oke
ok
e
nutrien
nt int
nt
i ta
ak
ak
ake
gen
ene en
ene
nvi
vir
vi
ironmen
me t interaction
gene
ge
e
offffsspr
of
priiin
ng
increase
ed mortal
rta
aliity
al
hK
Korea
cor
oro
or
ona
nary event South
choles
sterrol
ol le
le
lev
evel
abdomina
nal obesity
na
y
E)
geneti
tiic
tic
ic effect
neural tube
ube
be defect
chromoso
mo me
fetu
ffe
tu
use
se
et s
etus
etu
Norwegia
ia
an mother fetu
child co
ohort
ho st
sstu
t dy
dy
gene
gen
ene-environ
nme
ment interaction
m
me
mend
elia
an random
rran
ization
va
var
va
arria
iian
an
a
nt
allele
llel
ele
e
samp
sam
ple siize
ple
misc
carri
arriage
age
a
ag
g
ge
e
miscla
asssi
sssiifi
ifiicatiion
on
ova
ovar
varrian
n canc
can
anc
an
n err
firs
irs
rsst bi
birth
h
tu
umo
um
orr fola
la
ate
te
del
d
eliive
ve
eryy
ery
prre
p
eg
gna
anccyde
supp
su
upp
pp
ple
pl
lem
le
men
en
nt
cig
iga
are
re
ett
ttte
t ssm
te
mokin
ng
g
w
wine
bo
b
ody
dy mass
ss in
nde
dex
loww--dens
density
ens
ns tyy lipo
lipoprot
poprot
protein
pro
ein
e
n
cal
allcium
cium
ciu
au
aut
a
utitism
u
sm
iinte
n
nte
tellig
ge
ence
nce
ars
a
rse
senic
se
ca
ccan
an
a
anccer
cerr m
mo
orta
orta
alit
iitty
bloo
bloo
bl
od pr
pre
esssu
sure
r
acci
acc
acci
cc
cc de
de
den
dent
en
ent
n
nt
ca
ccali
a b
brat
attio
on
o
n
in qua
in
ine
qu
ua
alit
ity
ty
suiiccid
sui
id
de
syyssto
sys
tol
o ic blo
blood
od p
pre
ressu
su
sure
ure
e
cardio
car
d vas
di
dio
vascul
culla
cula
arr mor
mo ta
ta
allity
ityy
AR
ARI
ARIC
carrd
ca
car
diio
dio
ovas
vasscu
cul
ullar eve
u
ev nt
aiir
air
heartt fa
ailurre
ankle brachi
ach
ch al index
win
wint
inte
er
mper
pera
pe
ature
pollu
po
utio
on tem
part
par
rtic
rt
iccle
ca
case
as -crossov
as
sover
sov
e design
partic
cula
u te
e mat
m ter
micr
crrog m
rec
rece
ece
ceipt
iipt
ip
p
pt
ef ortt
eff
s
se
seas
on ity
ona
onal
pois
ois
isson
s
acyclicc graph
distinct
di
i tinc
ncc ion
ader
ad
con
on
once
nce
ep
e
p
ptt read
scciien
sci
sc
enc
enc
ce
ide
idea
de
d
dea
e
ea
a
disc
isc
scipli
sc
ip
ipli
ipl
pline
pl
tthe
th
he
h
eor
ory id
wayy
succ
suc
uc
ccess
ccess
sss
cov
cov
vera
era
age
evol
evo
vo
olutio
uti
tiion
tio
act
act
HIV
V
rres
es
sou
sour
our
urce
e
va
vac
va
accin
c ne
H test
HIV
te ing
tes
ng
g
HIV/
HIV
IV/
V AIDS
DS
S
virus
vir
s
po
po
pol
olicy
icy
cyy
rural a
ru
area
dea
d
ea
e
ath
th ra
rate rura
se
easo
easo
on
o
n
ca
au
us
se mo
orttal
alititity
ty
iin
nfe
fect
fect
c io
ion
n
cohor
orrt
rt p
prrofi
ofiile
men
m
en
e
nttal
ta
a
all healt
he
h
ealtltth
sspe
sp
pe
p
ecif
cific
ci
ic mo
ic
mor
orrtal
o
alityy
communi
nititie
ni
itie
es stu
study
st
per
pe
e fo
ffor
o
orrman
ma
man
ma
ance
e
i
in
inse
ctic
cticide
t cid
ide
de
d
e
mean
an
ns
llififfe cour
urrse
se
pape
pa
per
m th
me
ho
od
d
viol
ollat
ation
ass
as
ssump
ssump
ss
ptio
tion
sc
s
ce
en
nar
arrio
io
errror
or
sett
t in
ing
g l se
re
esse
ear
arcch
hgoa
urin
rin
ne
pa
pare
arental
are
e t l ed
edu
du
d
ucatitition
on
n
s ula
sim
latio
t n
select
sel
se
ecction
io
on
n
med
m
e
ed
dia
iat
atttio
a
iio
on
ag
ge
g
en
ntt
grrrowth
g
gro
wtth
th
girrl
c ilildh
ch
dhoo
oo
od gir
he
h
eal
ea
alth stu
tudy
y
bias
bi
ass
study
st
stu
stud
t y re
ressult
ultt
form
rm
mula
u
alle
alle
erg
gy
lun
ung can
un
ung
an
ncce
err
t l ef
tota
to
effect
f ld
fifie
d alt
a te
ern
er
rrn
native
p valu
lu
ue
pes
p
pest
es icid
est
icid
ccide
e
cons
co
ons
nsum
umpt
ptio
pt
io
on
fru
fr
fru
ruitiitt
mo
m
oth
ther
e
b eas
bre
ast
as
a
s ca
cancer
nce
nc
cer
estrog
es
est
rog
og
gen
multivariatte rel
r ativ
a e risk
eattic
c ccance
a
r risk
sk
k
pancre
NIH-AARP
NI
NIH
AA
ARP die
diett
m ation
me
medi
na
analysis
large
e sstudy
ud
gene
ge
gene
e
pollym
ym
ymo
morp
rrph
ph
p
hism
sm
m
preeclam
cllampsia
clam
llam
ampsi
psia
p
ssi po
brea
rea
ea
e
asst cance
ancer ca
ancer
anc
case
se
e
genetic asso
ocia
ia
iat
a ion
on study
ep
epi
p dem
micc
outtbre
break
ak
k
infl
nflluenz
nfluenz
enz
za vvirus
r
H
H1N1
H1
H1N
Cardiovascular Disease
Environmental
Infectious Disease
Meta-analysis
Methodology
Nutritional
Reproductive and Perinatal
Figure 2. Mapping and clustering of terms in 5 high-impact epidemiology journals for A) 1974–1989, B) 1990–1995, C) 1996–2001, D) 2002–
2007, and E) 2008–2013. The maps show terms as labeled nodes. Some terms appear to be misspelled or truncated because of the tasks of linguistic processing that were performed before the mapping and clustering of terms, as described in the Methods section. Node size is proportional to
the term frequency of occurrence (i.e., the larger the node, the more articles include the term). Terms that are far away from each other do not or
rarely occur together in the same article, whereas terms with high co-occurrence are close to each other. The clustering of the terms is displayed on
top of the map by coloring nodes based on the cluster to which they belong. Clusters of terms are interpreted as major epidemiology topics, and
clusters located close to each other in the map indicate related topics.
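The quantities behind such a map can be illustrated with a minimal sketch, assuming each article has already been reduced to a set of terms. The toy corpus and function names below are hypothetical, and the normalization shown (co-occurrence divided by the product of the two term frequencies) is one common choice rather than a confirmed description of how Figure 2 was produced.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(articles):
    """Count, across articles, how often each term and each term pair occurs.

    `articles` is an iterable of sets of terms (one set per article).
    """
    term_freq = Counter()
    pair_freq = Counter()
    for terms in articles:
        unique = sorted(set(terms))  # sort so each pair has one canonical order
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return term_freq, pair_freq


def association_strength(term_freq, pair_freq, a, b):
    """Co-occurrence of a and b divided by the product of their frequencies."""
    key = tuple(sorted((a, b)))
    return pair_freq[key] / (term_freq[a] * term_freq[b])


# Hypothetical toy corpus: 4 articles, each reduced to a set of terms.
articles = [
    {"infection", "antibody", "outbreak"},
    {"infection", "antibody"},
    {"blood pressure", "smoking"},
    {"smoking", "infection"},
]
term_freq, pair_freq = cooccurrence(articles)
strength = association_strength(term_freq, pair_freq, "infection", "antibody")
```

In this sketch, `term_freq` plays the role of node size, and a higher `strength` corresponds to two terms being placed closer together on the map.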
Am J Epidemiol. 2015;182(2):93–104
Table 1. Evolution of Major Topics in 5 High-Impact Epidemiologic
Journals and in a Subset of Articles Published in 6 High-Impact
General Medicine Journals, 1974–2013
1974–1989
1974–1989
Topica
Epidemiology
Journals 3,725
Articles, 951
Terms, %
General
Medicine
Journalsb 8,602
Articles, 823
Terms, %
Infectious disease
epidemiology
36
24
Infection
14
13
Antibody
8
6
Topica
Epidemiology
Journals 3,725
Articles, 951
Terms, %
General
Medicine
Journalsb 8,602
Articles, 823
Terms, %
Dose
7
Efficacy
6
Improvement
4
Health care quality
16
Physician
6
7
Care
4
Outbreak
6
Survey
4
Illness
6
6
Cost
3
5
Service
Virus
Prevalence
United States
3
1990–1995
4
Cardiovascular disease
epidemiology
24
Man
15
Smoking
7
Blood pressure
6
Cigarette smoking
5
Adjust
5
16
3
Risk factor
6
Relative risk
3
Diabetes
3
Cohort
2
Cancer epidemiology
15
Mortality
12
Cancer
11
Epidemiology
Journals 3,948
Articles, 1,116
Terms, %
19
Infectious disease
epidemiology
32
15
Infection
12
14
Antibody
5
5
HIV
5
7
Italy
4
Sensitivity
4
Detection
4
HIV infection
Cardiovascular disease
epidemiology
3
21
Mortality rate
4
Body mass index
Death rate
3
Cigarette smoking
6
Lung cancer
2
Blood pressure
5
Coronary heart disease
5
5
Therapy
14
General
Medicine
Journalsb 6,434
Articles, 903
Terms, %
18
8
Survival
5
Diabetes
Cell
4
Case control study
5
Chemotherapy
3
Baseline
5
Recipient
3
Adjust
5
5
Smoking
4
Hypertension
4
Reproductive and
perinatal epidemiology
12
Case control study
12
Relative risk
7
Confidence interval
6
Pregnancy
4
4
Infant
4
5
Mother
3
Birth
2
Delivery
2
Cancer epidemiology
18
Cancer
10
Approach
5
Mortality rate
4
Validity
3
Example
3
Reproductive and
perinatal epidemiology
14
8
4
Clinical trials
20
Pregnancy
6
Trial
10
Mother
5
3
Birth
5
4
Placebo
8
1990–1995
Topica
Epidemiology
Journals 3,948
Articles, 1,116
Terms, %
Infant
5
Smoker
4
1996–2001
General
Medicine
Journalsb 6,434
Articles, 903
Terms, %
5
Screening
5
Epidemiology
Journals 4,492
Articles, 1,346
Terms, %
Topica
Strategy
4
Survival
8
Cell
4
Nutritional epidemiology
8
Sensitivity
Consumption
8
21
Diet
5
Cardiovascular disease
epidemiology
Alcohol
4
Body mass index
10
Correlation
4
Food
3
Female cancer
epidemiology
4
Baseline
8
General
Medicine
Journalsb 6,891
Articles, 1,009
Terms, %
29
5
Physical activity
5
Hypertension
4
Alcohol consumption
4
4
Breast cancer
4
Diabetes
Parity
2
Case control study
5
Family history
2
Infant
4
Oral contraceptive
2
Birth
Menopause
1
Cancer epidemiology
Health care quality
27
5
4
15
Lung cancer
3
Care
9
Cigarette
3
Survey
8
Nonsmoker
3
Practice
6
Cancer registry
2
Questionnaire
6
Cancer risk
Physician
6
Reproductive and
perinatal epidemiology
Clinical trials
21
Therapy
14
Trial
14
Placebo
9
Efficacy
9
Dose
8
Cardiovascular disease
epidemiology
12
2
13
Pregnancy
7
Birth
7
Mother
5
Infant
5
Breast cancer
4
Nutritional epidemiology
10
Consumption
7
Sensitivity
5
Diet
4
Myocardial infarction
4
Validity
3
Stroke
3
Error
3
Specificity
3
Agreement
2
2
Environmental
epidemiology
6
Acute myocardial infarction
1996–2001
Epidemiology
Journals 4,492
Articles, 1,346
Terms, %
Infectious disease
epidemiology
36
Infection
11
Approach
5
HIV
4
Antibody
4
General
Medicine
Journalsb 6,891
Articles, 1,009
Terms, %
25
Asthma
2
Air pollution
2
Season
2
Respiratory symptom
1
Respiratory disease
1
Health care quality
13
6
9
Quality
7
Survey
7
Practice
25
Care
7
1996–2001
Topica
Epidemiology
Journals 4,492
Articles, 1,346
Terms, %
2002–2007
General
Medicine
Journalsb 6,891
Articles, 1,009
Terms, %
Physician
7
Clinical trials
22
Trial
21
Placebo
11
Efficacy
10
Dose
7
Double blind
6
2002–2007
Epidemiology
Journals 4,180
Articles, 1,236
Terms, %
Nutritional epidemiology
24
Consumption
9
Diet
3
Inverse association
3
Incident case
2
Lower risk
2
Cardiovascular disease
epidemiology
General
Medicine
Journalsb 6,283
Articles, 978
Terms, %
22
Topica
Reproductive and
perinatal epidemiology
Pregnancy
Epidemiology
Journals 4,180
Articles, 1,236
Terms, %
General
Medicine
Journalsb 6,283
Articles, 978
Terms, %
10
7
Mother
6
Infant
4
Birth weight
4
Offspring
3
Environmental
epidemiology
7
Asthma
2
Air pollution
2
Nonsmoker
2
Susceptibility
2
Season
2
Health care quality
20
28
Practice
7
Health
7
Survey
6
Research
6
Problem
5
10
4
Clinical trials
Cardiovascular disease
6
5
Placebo
Weight
5
Controlled trial
12
Coronary heart
disease
5
Hazard ratio
10
Efficacy
10
Height
5
Body mass index
Dose
23
12
8
Stroke
5
Meta-analysis
13
Hypertension
5
Quality
10
Case control study
4
Review
6
Medline
6
Systematic review
6
Meta analysis
6
Infectious disease
epidemiology
20
Infection
8
Mortality rate
3
Transmission
3
HIV
3
Setting
2
4
Gene
4
Cell
4
Progression
3
Vaccine
3
Methodology
Cost-benefit analysis
2
Cost
5
Dollar
2
Cost effectiveness
2
Life year
2
Life expectancy
1
2008–2013
Epidemiology
Journals 4,550
Articles, 1,354
Terms, %
16
Approach
7
Epidemiology
5
Bias
5
Paper
5
Problem
4
Cardiovascular disease
epidemiology
23
Body mass index
12
General
Medicine
Journalsb 6,159
Articles, 1,040
Terms, %
18
5
2008–2013
Topica
Epidemiology
Journals 4,550
Articles, 1,354
Terms, %
2008–2013
General
Medicine
Journalsb 6,159
Articles, 1,040
Terms, %
Epidemiology
Journals 4,550
Articles, 1,354
Terms, %
Topica
General
Medicine
Journalsb 6,159
Articles, 1,040
Terms, %
Height
5
Season
2
Childhood
3
Temperature
1
Cause mortality
3
Hospital admission
1
Blood pressure
3
Meta-analysis
3
13
9
Cardiovascular disease
5
Meta analysis
5
Prospective cohort study
4
Gene
4
Smoking
4
Systematic review
3
3
Gene
8
Genotype
3
Nutritional
epidemiology
20
Polymorphism
2
8
Hazard ratio
10
Randomized controlled
trial
7
Consumption
5
Medline
Diet
3
Database
Health study
3
Global health
38
3
Country
11
Smoker
Infectious disease
epidemiology
19
Research
8
Epidemiology
6
Infection
6
Issue
4
Article
4
Methodology
15
Method
13
Approach
10
7
Prevalence
9
Trend
6
Survey
6
Cost
6
Clinical trials
20
Hazard ratio
15
Therapy
15
Week
13
Placebo
12
Clinical trial
11
Bias
7
5
Cardiovascular disease
epidemiology
13
Problem
5
Hazard ratio
15
Design
Reproductive and
perinatal epidemiology
11
Pregnancy
8
Mother
5
Childhood
4
Birth weight
3
Infant
3
Environmental
epidemiology
7
Air pollution
3
Particulate matter
2
Stroke
7
Significant difference
6
Myocardial infarction
6
Cause mortality
5
Abbreviation: HIV, human immunodeficiency virus.
a Data are clusters of terms interpreted as major topics (with the percentage of terms within each cluster) and the top 5 terms within each cluster (with the percentage of articles including each term). The major topics are ordered by their importance in the epidemiologic journals and then in the general medicine journals.
b For the 6 high-impact medical journals, we retrieved articles most likely of relevance to the field of epidemiology by using a custom search filter.
were related to infectious diseases except the words “systolic” and “diastolic,” which were related to the epidemiology of cardiovascular diseases. The word “seropositive” showed the most intense burst. After 2000, there was no clear time pattern of burst. Some terms belonged to a single topic. For instance, the words “ambient” and “particulate” were exclusively related to environmental epidemiology. Eight terms were related to genetics and meta-analysis. Two of the terms with the most intense bursts (“gestational” and “preterm”) were related to reproductive and perinatal epidemiology. A majority of burst
words were related to methodology (e.g., “pathway,” “Bayesian,”
“causal,” “mediation,” and “simulation”). Finally, approximately one-third of burst words did not belong to any particular cluster, but most of them were related to methodology
(e.g., “multilevel,” “P for trend,” “Cox,” “heterogeneity,”
“modeling,” and “confounder”).
Ancillary analysis of high-impact general medicine journals
Our search retrieved 32,760 articles most likely of relevance to the field of epidemiology that were published in
the 6 general medicine journals. Web Figure 3 shows the
mapping and clustering of terms over time. Table 1 shows a
summary of the clusters of terms and allows comparison with
articles from epidemiologic journals.
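The two percentages reported in Table 1 (share of terms within each cluster, and share of articles including each top term) can be sketched as follows. This is an illustration under stated assumptions: the term-to-cluster assignment and the toy corpus are hypothetical, and the function name `cluster_summary` is not from the original analysis.

```python
from collections import Counter, defaultdict


def cluster_summary(term_cluster, articles, top_n=5):
    """Summarize clusters in the style of Table 1.

    `term_cluster` maps each mapped term to its cluster label.
    `articles` is a list of sets of terms (one set per article).
    Returns, per cluster, the percentage of all mapped terms it contains
    and its top terms with the percentage of articles including each.
    """
    n_terms = len(term_cluster)
    n_articles = len(articles)

    # Number of articles in which each mapped term appears at least once.
    article_count = Counter()
    for terms in articles:
        article_count.update(set(terms).intersection(term_cluster))

    by_cluster = defaultdict(list)
    for term, cluster in term_cluster.items():
        by_cluster[cluster].append(term)

    summary = {}
    for cluster, terms in by_cluster.items():
        # Rank by article count (descending), then alphabetically for ties.
        ranked = sorted(terms, key=lambda t: (-article_count[t], t))[:top_n]
        summary[cluster] = {
            "pct_terms": round(100 * len(terms) / n_terms),
            "top_terms": [
                (t, round(100 * article_count[t] / n_articles)) for t in ranked
            ],
        }
    return summary


# Hypothetical cluster assignment and toy corpus.
term_cluster = {
    "infection": "Infectious disease",
    "antibody": "Infectious disease",
    "smoking": "Cardiovascular disease",
    "blood pressure": "Cardiovascular disease",
}
articles = [
    {"infection", "antibody", "outbreak"},
    {"infection", "antibody"},
    {"blood pressure", "smoking"},
    {"smoking", "infection"},
]
summary = cluster_summary(term_cluster, articles)
```

Terms that were not assigned to any cluster (here, "outbreak") are ignored, which is a simplifying choice of this sketch.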
Some topics were similar to those identified in epidemiology journals and showed similar evolution. Cardiovascular
diseases and infectious diseases were among the main topics
over the 5 time periods, except the last time period for infectious diseases. We identified a cancer cluster in the 1974–1989
period and a reproductive and perinatal health cluster
over the 1974–1989 and 1990–1995 periods. A cluster related to meta-analysis appeared during 2002–2007 and persisted in the subsequent period.
Other topics differed from those identified previously. One
cluster corresponding to clinical trial terms (in multiple clinical specialties) was identified over the 5 periods. Another
cluster related to health care quality was identified over the
periods from 1974 to 2007. Lastly, a cost-benefit analysis
cluster was identified in 2002–2007, and a global health cluster was identified in 2008–2013.
DISCUSSION
We analyzed 20,895 articles published in 5 epidemiology
journals over 4 decades using a production-oriented approach
to investigate the epistemic core of epidemiology. We found a
clear pattern of leading areas of epidemiologic inquiry during
this period and patterns in the evolution of these areas. We
found, first, that infectious disease epidemiology and cardiovascular disease epidemiology have consistently been the main topics of interest
in these 5 journals. Second, cancer epidemiology was a
major topic, with a peak in knowledge production in 1990–1995, when 2 clusters related to cancer and female cancer
were identified, but it stopped being a leading focus of papers in
the 5 epidemiologic journals after 2001. Third, nutritional epidemiology gained importance over time. Fourth, 3 topics were
among the leading areas of inquiry for time-delimited periods:
environmental epidemiology since 1996, and methodology and meta-analysis in genetics in 2008–2013.
Because we focused our inquiry on these 5 leading epidemiology journals, we can interpret our findings as representing
knowledge produced and regulated through peer review principally by epidemiologists and shaped by editorial processes in
line with the leading epidemiologic organizations. The American Journal of Epidemiology is published in association with
the Society for Epidemiologic Research, the International
Journal of Epidemiology is published on behalf of the International Epidemiological Association, Epidemiology is the official
journal of The International Society for Environmental
Epidemiology, and the Annals of Epidemiology is the official
publication of the American College of Epidemiology. By design, this analysis excludes papers that fall within the field’s remit, whether written by epidemiologists or by nonepidemiologists, but that were
published in nonepidemiologic journals. There is little question that such papers thrive, particularly in clinical journals.
So, for example, the decrease in focus on cancer in these 5
journals over the past decade most likely represents a shift in
where these papers are being published, away from epidemiology journals and toward cancer journals. We would argue, however,
that it matters whether the relevant papers are published
in the leading journals in the discipline. Epidemiologists
are, in many respects, the keepers of the methodological flame
in population health sciences. If cancer epidemiology is evolving in nonepidemiology journals, it represents a tremendous
lost opportunity for the field to make the contribution it can
and should make to one of the leading global causes of death.
Therefore, although our observations are in some ways heartening, reinforcing that we are focusing on cardiovascular disease commensurately with the contribution of cardiovascular
disease to the burden of mortality, they also suggest that the discipline is playing a far smaller role in other areas that are also
important. For example, although several new areas have gained prominence in the field over the past decades, including social epidemiology, these are clearly not represented among the
key areas in these 5 leading epidemiology journals over the
time period of interest (21–23). Moreover, areas such as injury,
psychiatric, or neurological epidemiology are clearly not among
the main topics identified; this is at odds with the importance
that these areas have for the global burden of disease (24–26).
Within a consequentialist epidemiology framework, it would
certainly stand the field in good stead if we engaged actively
around inquiry concerning the major causes of morbidity and
mortality worldwide, with an eye to how we may prevent disease and improve health (27–29).
The increasing role that methodological papers play in
publication in epidemiology journals over the past decade
presents both opportunities and challenges. In some respects,
this trend represents an evolution in the field, wherein
methods for epidemiology are being developed principally
by epidemiologists. This reflects a maturity in the field, moving well beyond its origins where methods in the discipline
emerged from other areas (8). However, it also suggests that
the field takes upon itself greater responsibility, both to keep
developing methods that are adequate to the evolving population health challenges we face and to remain open to
incorporating methods that arise in other areas and
may be fruitful for epidemiology to adopt.
These observations also have implications for our educational programs and how we train the next generation of epidemiologists. If the leading epidemiology journals focus
insufficiently on significant areas of population health, we
as a discipline may fall short of our self-definition and our
promise as a field. This stands both to change the composition of those who are attracted to the discipline and potentially to influence the structural factors (such as promotion
expectations and criteria) that stand to reinforce our areas
of focus and growth in the field going forward.
Our analysis has limitations. First, our results and interpretation depended on our selection of epidemiology journals. A
different list of journals could be considered. For instance, the
Public Health/Health Administration Section of the Medical
Library Association considered 10 journals as “essential for a
collection that supports a program with subject specialization
in this area” (30, p. 572). Moreover, many epidemiologic articles are published in nonepidemiologic journals. However,
in our ancillary analysis of 6 high-impact general medicine
journals, we found patterns of epidemiologic papers that
were consistent overall with the findings in epidemiologic
journals, which suggests that the trends observed here hold
across the discipline. Second, our results correspond to a
macroscopic rather than microscopic mapping of the discipline in the sense that we may have missed subtle regularities
in the objects of research. However, our aim was to
identify the main topics, so this approach
suited our purpose. We may also have missed the exact dynamics
of appearance or disappearance of these main topics because
our categorization of articles aimed for an approximately
equal number of articles in each time period.
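To make this limitation concrete, the following sketch shows one way of cutting a range of publication years into consecutive periods with roughly equal article counts. It is a hypothetical greedy rule with invented yearly counts, not the procedure the authors used; it illustrates why period boundaries (and hence the apparent timing of a topic's appearance) depend on how the counts accumulate.

```python
def equal_count_periods(counts_by_year, n_periods):
    """Split years into consecutive periods with roughly equal article counts.

    `counts_by_year` maps publication year -> number of articles.
    A period is closed once the cumulative count reaches the next
    multiple of total / n_periods; the final period takes what remains.
    Returns a list of (first_year, last_year) tuples.
    """
    years = sorted(counts_by_year)
    total = sum(counts_by_year.values())
    periods = []
    start = years[0]
    cumulative = 0
    for i, year in enumerate(years):
        cumulative += counts_by_year[year]
        closed = len(periods)
        is_last_year = i == len(years) - 1
        reached_target = (
            closed < n_periods - 1
            and cumulative >= total * (closed + 1) / n_periods
        )
        if is_last_year or reached_target:
            periods.append((start, year))
            if not is_last_year:
                start = years[i + 1]
    return periods


# Hypothetical yearly article counts (not data from the study).
counts = {2000: 10, 2001: 30, 2002: 20, 2003: 20, 2004: 25, 2005: 15}
periods = equal_count_periods(counts, 3)
```

Because periods must begin and end on whole years, the resulting counts are only approximately equal in general, and a topic emerging mid-period would be attributed to the whole period.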
In sum, we identified the major topics in 5 high-impact journals of epidemiology, and we analyzed the trends of these
main topics. This allowed for an empiric perspective on the
discipline’s past, with an eye to its future. Our metaknowledge
investigation, which relied on freely accessible data sources
and free software, is replicable. Monitoring the evolution of
the science of epidemiology may help inform our efforts to
consider appropriate recalibration of the field’s scope.
ACKNOWLEDGMENTS
Author affiliations: Department of Epidemiology, Mailman
School of Public Health, Columbia University, New York,
New York (Ludovic Trinquart); and School of Public Health,
Boston University, Boston, Massachusetts (Sandro Galea).
Conflict of interest: none declared.
REFERENCES
1. Lilienfeld DE. The general epidemiologist: Is there a place in
today’s epidemiology? Am J Epidemiol. 2007;166(1):1–4.
2. Armenian HK. Epidemiology: a problem-solving journey. Am J
Epidemiol. 2009;169(2):127–131.
3. Ness RB, Andrews EB, Gaudino JA Jr, et al. The future of
epidemiology. Acad Med. 2009;84(11):1631–1637.
4. Bhopal R, Macfarlane GJ, Smith WC, et al. What is the future of
epidemiology? Lancet. 2011;378(9790):464–465.
5. Pearce N. Epidemiology in a changing world: variation,
causation and ubiquitous risk factors. Int J Epidemiol. 2011;
40(2):503–512.
6. McKeown RE. Is epidemiology correcting its vision problem?
A perspective on our perspective: 2012 presidential address for
American College of Epidemiology. Ann Epidemiol. 2013;
23(10):603–607.
7. Khoury MJ, Lam TK, Ioannidis JP, et al. Transforming
epidemiology for 21st century medicine and public health.
Cancer Epidemiol Biomarkers Prev. 2013;22(4):508–516.
8. Morabia A. A History of Epidemiologic Methods and Concepts.
Basel, Switzerland: Birkhäuser Verlag; 2004.
9. Buck C, Llopis A, Najera E, et al. The Challenge of
Epidemiology. Issues and Selected Readings. Washington, DC:
Pan American Health Organization; 1988.
10. Evans JA, Foster JG. Metaknowledge. Science. 2011;
331(6018):721–725.
11. Griffiths TL, Steyvers M. Finding scientific topics. Proc Natl
Acad Sci U S A. 2004;101(suppl 1):5228–5235.
12. ISI Web of Knowledge. 2012 Journal Citation Reports Science
Edition. http://admin-apps.webofknowledge.com/JCR/JCR.
Thomson Reuters; 2014.
13. van Eck N, Waltman L. Text Mining and Visualization Using
VOSviewer. Leiden, The Netherlands: Centre for Science and
Technology Studies, Leiden University; 2012.
14. Borg I, Groenen P. Modern Multidimensional Scaling. 2nd ed.
New York, NY: Springer; 2005.
15. Van Eck NJ, Waltman L. Bibliometric mapping of the
computational intelligence field. Int J Unc Fuzz Knowl Based
Syst. 2007;15(5):625–645.
16. Newman MEJ, Girvan M. Finding and evaluating
community structure in networks. Phys Rev E. 2004;69(2):
026113.
17. Waltman L, van Eck NJ, Noyons ECM. A unified approach to
mapping and clustering of bibliometric networks. J
Informetrics. 2010;4(4):629–635.
18. van Eck NJ, Waltman L. Software survey: VOSviewer, a
computer program for bibliometric mapping. Scientometrics.
2010;84(2):523–538.
19. Kleinberg J. Bursty and hierarchical structure in streams. Data
Min Knowl Discov. 2003;7(4):373–397.
20. Mane KK, Börner K. Mapping topics and topic bursts in
PNAS. Proc Natl Acad Sci U S A. 2004;101(suppl 1):
5287–5290.
21. Berkman L, Kawachi I. Social Epidemiology. New York, NY:
Oxford University Press; 2000.
22. Cwikel J. Social Epidemiology: Strategies for Public
Health Activism. New York, NY: Columbia University Press;
2006.
23. O’Campo P, Dunn J. Rethinking Social Epidemiology: Towards
a Science of Change. New York, NY: Springer; 2011.
24. Vos T, Flaxman AD, Naghavi M, et al. Years lived with
disability (YLDs) for 1160 sequelae of 289 diseases and
injuries 1990–2010: a systematic analysis for the Global
Burden of Disease Study 2010. Lancet. 2012;380(9859):
2163–2196.
25. Murray CJ, Vos T, Lozano R, et al. Disability-adjusted life
years (DALYs) for 291 diseases and injuries in 21
regions, 1990–2010: a systematic analysis for the Global
Burden of Disease Study 2010. Lancet. 2012;380(9859):
2197–2223.
26. Whiteford HA, Degenhardt L, Rehm J, et al. Global burden of
disease attributable to mental and substance use disorders:
findings from the Global Burden of Disease Study 2010.
Lancet. 2013;382(9904):1575–1586.
27. Galea S. An argument for a consequentialist epidemiology. Am
J Epidemiol. 2013;178(8):1185–1191.
28. Cates W Jr. Invited commentary: consequential(ist)
epidemiology: let’s seize the day. Am J Epidemiol. 2013;
178(8):1192–1194.
29. Galea S. Galea Responds to “consequential(ist) epidemiology:
finally”. Am J Epidemiol. 2013;178(8):1195–1196.
30. Ascher M. Journals, epidemiological. In: Boslaugh S, ed.
Encyclopedia of Epidemiology. Thousand Oaks, CA: SAGE
Publications, Inc.; 2008:572–573.