Session 5: Cohort studies

Course: EPIB 679-001 Clinical Epidemiology
Date: May 9 to June 3, 8:35 – 11:40
Dr. J. Brophy

Learning objectives
• To understand the concepts of the different study designs
• To learn the advantages and disadvantages of the different study designs
Epidemiology Definition
Basic Question of Epidemiology
• “The study of the distribution and
determinants of health-related states or
events in specified populations, and the
application of this study to control of health
problems” (Last, 2004)
Epidemiology
Epidemiology is:
(1) health and medically oriented
(2) based on POPULATIONS
(3) statistical in nature (quantitative)
(4) multi-disciplinary
Epidemiology
• Epidemiologists are:
– Scientists
– Population Doctors
– Match-makers (bring together various disciplines
to answer health-related questions)
Choice of study design
Validity & Reproducibility
[Figure: accuracy targets labelled “Valid and reliable”, “Valid but unreliable” and “Biased but reliable”]
Definition: Reliability
• The degree of stability exhibited when a measurement is repeated under identical conditions.
• Synonyms: Repeatability, reproducibility
• Antonym: Uncertainty
OVERVIEW OF STUDY DESIGNS
TYPES OF STUDIES
DESCRIPTIVE
GOAL
• Hypothesis generating
• Resource allocation
• Educational needs
POPULATION LEVEL
• Ecologic or Correlational studies
INDIVIDUAL LEVEL
• Drug utilization studies
• Case reports / series
• Cross-sectional surveys
ANALYTICAL
GOAL
• Hypothesis testing
• Provide evidence to establish causality
OBSERVATIONAL
Cohort studies
• Prospective vs retrospective
• Field vs database studies
Nested case-control studies
• Retrospective
• Field vs database studies
Case-cohort studies
• Retrospective
• Field vs database studies
Case-control studies
• Prospective vs retrospective
• Field vs database studies
Case-crossover studies
• Prospective vs retrospective
• Field vs database studies
EXPERIMENTAL / INTERVENTIONAL
Randomized controlled trials
• Prospective
• Field
Study designs - RCTs
• Randomized controlled trials (RCTs) as the “gold standard”
• Limitations of RCTs
– Highly controlled conditions
– Short follow-up
– Highly selected populations
– Small sample size
• Justification for RCT = clinical equipoise
Randomized controlled trials
• Strongest level of evidence … Caution! Not necessarily bias free
• Distinguishing feature ; exposure is randomly assigned by investigator
• Major strength
– Randomization ; all differences between groups are balanced across treatment arms
– Comparing effect of drug only … Caution!
Descriptive study designs
• Use of drugs & distribution of outcomes
– Person
– Time
– Place
• Goal
– Resource allocation & education
– Hypothesis generating
Descriptive studies
• Types of descriptive studies
– Population level information
– Individual level information
• Strengths
– Relatively quick, easy & inexpensive
– Provide justification for more expensive study
• Limitations
– Cannot establish causality
Analytical studies
• Types of analytical studies
– Interventional / Experimental
– Observational
• Major difference
Strongest
• Randomized controlled trials
• Cohort studies
• Nested case-control studies
• Case-control studies
• Case-crossover
Weakest
Observational studies
• Distinguishing feature ; investigator does not control the exposure
• Types of observational studies
– Cohort studies
– Case-control studies
• Classification
– Prospective vs retrospective study
– Field vs database study
Definition: Cohort
• From the Latin cohors – warriors, the tenth part of a legion.
• Any group of persons (usually sharing some common characteristic) who are followed-up or traced over a period of time.
Populations and General Design
• Overarching population (universe) about which we would like to make inferences
• Target population to which inferences are drawn
• Source population: source of persons without outcome (sampling frame)
• Select: Participants vs Nonparticipants
• Ascertain: Exposure and Outcome
Schematic of a Cohort Study
[Figure: timeline from Past to Present showing an exposed (dynamic) cohort and a non-exposed (dynamic) cohort or reference population, each followed over Time to assessment of outcome, with the outcomes then compared]
Cohort studies
• Group that shares a common experience
• Subjects classified on the basis of exposure
status
• Longitudinal studies ; followed for a
specified period of time until events occur
• Distinguishing feature ; compare rates of
events/outcomes by exposure group
Cohort studies
• If rate of event among exposed > rate of event in unexposed = harmful drug
• If rate of event among exposed < rate of event in unexposed = protective drug
Cohort design
Cohort studies
• Strengths
– Can study rare exposures
– Can study multiple outcomes
– Temporality is assured ; causality criteria
– Unbiased selection of comparator group
– Retrospective studies are relatively quick and
inexpensive … caution re: bias
Cohort studies
Potential problems
• Limitations
– Inefficient for rare events/diseases or outcomes
with long induction periods
– If prospective ; expensive and time consuming
• Sources of bias
– Non-participation (selection in)
– Losses to follow-up (selection out)
– Recall / interviewer bias if retrospective
Definition: Bias
• Bias: Deviation of results or inferences from the truth, or processes leading to such deviation.
1. Systematic variations of measurements from their true values (systematic error; antonym, validity)
2. Variations of statistics from their true values as a result of systematic variation of measurements, other flaws in data collection, or flaws in study design and analysis.
• Antonym: Validity
Biases
• Misclassification
• Selection (channeling)
• Losses to follow-up (correlated to exposure and disease)
• Effect of non-participation
Selection bias
Channeling Effect (or Channeling Bias):
• The tendency of clinicians to prescribe treatment based on a patient’s prognosis. As a result of this behavior, comparisons between treated and untreated patients will yield a biased estimate of treatment effect.
Confounding bias
[Diagram: Intervention and Outcome, with a Confounder related to both]
Confounder:
• Age
• Sex
• Stage of disease
• Previous treatments
• Genetics
• Behaviour
• Others
Effect modification
Key Questions
Comparisons
[Figure panels: All patients; Restricted cohort]
Strengths of cohort studies
• Useful if exposure is rare
• Can examine multiple effects of a single
exposure
• Can elucidate temporal relationship
• If prospective, minimizes ascertainment bias
• Allows direct measurement of disease
incidence in both exposed and non-exposed
groups
Limitations of cohort studies
• Inefficient for rare diseases
• Can be expensive and time consuming if
prospective
• If retrospective, need reliable records
• Validity affected by losses to follow-up
Cholera in London in the mid-1800s: John Snow and the Beginnings of Epidemiology
Miasmata Theory
• Thought that cholera was brought to Europe from India
• Prevailing theory in the 1800s: airborne poison arising from unhealthy and unsanitary conditions (“miasmata”)
– Miasma: noxious exhalations from putrescent organic matter; poisonous effluvia or germs infecting the environment
Hypothesis
• Higher rates in the south because water companies drew water from the polluted Thames River
Snow’s Experimentum Crucis
• Water supply from polluted Thames River: Southwark & Vauxhall Co. and Lambeth Co.
• Natural experiment: In 1852, Lambeth changed its source to a less polluted part of the Thames
[Figure labels: 1849; 1854]
Epidemiology
• Relatively low rates of cholera in London
• 1854 epidemic: Snow determined no. of homes served by each company
• Collected death reports and classified deaths by water company
• Calculated ratios of deaths to no. of homes, by water company
Epidemiology
Unit of observation is mixed:
1. Numerator - the individual: fact, date, cause of death, and water company
– Water company obtained from detailed inquiry or test of water for concentrations of NaCl
2. Denominator - the number of homes (not individuals) served by each company
• Exposures to the “causal agent”: inferred to be related to the water supply
– Thus, the company that supplied the water is a surrogate variable
• Use of “company” is referred to as an “ecological” variable
– Every individual and home so classified is assumed to have the same exposure (homogeneity of exposure)
• This study would probably now be referred to as an “ecological study”
• Statistic: Ratio = Numerator/Denominator (unit: persons/homes)
– not a proportion (unitless)
Deaths from Cholera per 10,000 Homes, by Source of Water Supply, London, 1854

Company                 Number of homes   Deaths from   Deaths/10,000   Ratio   Difference
                        served            cholera       homes
Rest of London          256,423           1,422         55.5            1       0
Lambeth                 26,107            98            37.5            0.7     -18.0
Southwark & Vauxhall    40,046            1,263         315.4           5.7     +259.9

Estimate of average no. of deaths per home
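The rate, ratio and difference columns of Snow's table can be recomputed from the homes and deaths counts. The short Python sketch below is only an illustration: the dictionary layout and function name are assumptions, and the counts are copied from the table. Because it works from unrounded rates, a difference can differ from the slide in the last digit (the slide subtracts rounded rates).

```python
# Counts copied from the 1854 table; the data layout is an illustrative assumption.
water_supply = {
    "Rest of London": {"homes": 256_423, "deaths": 1_422},
    "Lambeth": {"homes": 26_107, "deaths": 98},
    "Southwark & Vauxhall": {"homes": 40_046, "deaths": 1_263},
}


def deaths_per_10k(homes, deaths):
    """Ratio of deaths to homes, scaled to 10,000 homes (not a proportion)."""
    return deaths / homes * 10_000


rates = {name: deaths_per_10k(**row) for name, row in water_supply.items()}
reference = rates["Rest of London"]  # reference category used in the table

for name, rate in rates.items():
    ratio = rate / reference
    difference = rate - reference
    print(f"{name:22s} rate={rate:6.1f}  ratio={ratio:4.1f}  difference={difference:+7.1f}")

# Direct comparison of the two companies drawing from the Thames
sv_vs_lambeth = rates["Southwark & Vauxhall"] / rates["Lambeth"]
print(f"Southwark & Vauxhall vs Lambeth: {sv_vs_lambeth:.1f}")
```

Choosing "Rest of London" as the reference follows the table; any other row could serve as the reference by changing one line.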
Broad Street Pump Episode
• Another detailed cluster investigation by
Snow
• Occurred at the end of August 1854
• Attributed source: polluted well water
contaminated from an adjoining cesspool
which was contaminated with water from a
young girl who apparently had the cholera
Ratio of death rates, Southwark & Vauxhall vs Lambeth (1854): 315.4 ÷ 37.5 = 8.4
Pump Handle Removed
[Figure: Broad Street Pump - Number of Deaths in 1854. Y-axis: No. of deaths (0 to 140); x-axis: Date (from Aug 31, 1854); annotation: pump handle removed (Sept 8); total deaths = 573]
Contingency Table: Mortality from Cholera in the Broad Street, Aug. 31-Sept. 2 (Whitehead’s observations: Shephard, p. 224)

Acknowledged to have:   Drank water   Did not drink water   Total
Cholera                 80            20                    100
No cholera              57            279                   336
Total                   137           299                   436
Relative Risk as a Measure of Association
Risk of dying from cholera:
drank water: 80 ÷ 137 = 0.584
did not drink water: 20 ÷ 299 = 0.067
Relative risk (RR) = 0.584 ÷ 0.067 = 8.72
Odds Ratio as a Measure of Association
Odds of dying from cholera:
drank water: 80 ÷ 57 = 1.40
did not drink water: 20 ÷ 279 = 0.072
Odds ratio (OR) = 1.40 ÷ 0.072 = 19.6
RRs and ORs
• The OR>>RR because the disease is not rare
– i.e., the risk of dying is 100/436=23%
• For small risks (∼≤0.05), OR~RR
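As a cross-check of the Broad Street arithmetic, the relative risk and odds ratio can be computed directly from the four cell counts. This is a minimal Python sketch; the variable and function names are illustrative assumptions. Working from unrounded risks gives RR ≈ 8.73 rather than the 8.72 obtained from the rounded intermediate values.

```python
def risk(cases, total):
    """Proportion of a group that experienced the outcome."""
    return cases / total


def odds(cases, non_cases):
    """Ratio of those with the outcome to those without it."""
    return cases / non_cases


# 2x2 table cells: rows are exposure (drank pump water or not)
a, b = 80, 57    # drank water: cholera, no cholera
c, d = 20, 279   # did not drink water: cholera, no cholera

relative_risk = risk(a, a + b) / risk(c, c + d)
odds_ratio = odds(a, b) / odds(c, d)          # equals (a*d)/(b*c)
overall_risk = (a + c) / (a + b + c + d)      # how common the outcome is

print(f"RR = {relative_risk:.2f}, OR = {odds_ratio:.1f}, overall risk = {overall_risk:.0%}")
```

With an overall risk of about 23%, the outcome is far from rare, which is why the OR is so much larger than the RR here.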
Classic cohort studies
• British Doctors Cohort
• Framingham
• Harvard Nurses’ Health Study
British Doctors Cohort
Design
Example: British Doctors Cohort Study
[Figure: timeline of questionnaires in 1951, 1957, 1966 and 1972, with more questionnaires later, and follow-up for mortality over time]
• Questionnaires on smoking habits sent to 59,600 male & female physicians
• 34,440 responded to the 1st questionnaire (response ~69%)
British Doctors Cohort
• Overarching population (universe): entire population
• Target population: Men and women, age >20, in 1951
• Source population: British MDs, age >20, in 1951
– Sampling frame: Medical register of MDs
• Exposure: Smoking information from subjects based on a short postal questionnaire
– Current smokers
• Age started smoking
• Amount consumed currently
• Method of smoking
– Past smokers
• Same as above
• Date stopped smoking
– Never smoked regularly (<1 cigarette/day for one year)
British Doctors Cohort
• Outcome:
– Mortality ascertained by looking up death certificates
– Cause of death is filled in by a physician or the coroner
• Analysis:
– Compare rates of death according to level of self-reported smoking
Typical Questions about Smoking
• Type of smoking (cigarettes, cigars, pipes)
• Have you ever smoked regularly?
• How old were you when you started to smoke?
• How many cigarettes per day do you smoke now?
• If you stopped completely, how long ago was this?
Metrics of Exposure to Tobacco Smoke
• The following indices can be estimated:
– Type of smoking (cigarettes, cigars, pipes)
– Duration (time since starting)
– Time since quitting
– Average intensity (e.g., no. of cigarettes/day)
– Frequency (e.g., percent time smoked in a week)
– Current smoking status
• Cumulative exposure: frequency of smoking x intensity x duration
– E.g., 1 pack per day x 20 cigarettes/pack x 365 days/year x 30 years = 219,000 cigarettes = 30 pack-years
• Lagged cumulative exposure (e.g., excluding last 10 years of smoking)
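The cumulative-exposure arithmetic can be sketched in a few lines of Python. The function names are illustrative, and the lagged version makes two simplifying assumptions that are not in the slides: smoking intensity is constant over time, and the subject is still smoking at the reference date.

```python
CIGARETTES_PER_PACK = 20
DAYS_PER_YEAR = 365


def pack_years(packs_per_day, years_smoked):
    """Cumulative exposure in pack-years (intensity x duration)."""
    return packs_per_day * years_smoked


def total_cigarettes(packs_per_day, years_smoked):
    """Same cumulative exposure expressed as a count of cigarettes."""
    return packs_per_day * CIGARETTES_PER_PACK * DAYS_PER_YEAR * years_smoked


def lagged_pack_years(packs_per_day, years_smoked, lag_years):
    """Pack-years ignoring the most recent `lag_years` of smoking.

    Assumes constant intensity and a current smoker (illustrative only).
    """
    return packs_per_day * max(years_smoked - lag_years, 0)


print(total_cigarettes(1, 30))        # cigarettes over 30 years at 1 pack/day
print(pack_years(1, 30))              # pack-years
print(lagged_pack_years(1, 30, 10))   # pack-years with a 10-year lag
```

Lagging discards recent exposure on the view that it is too recent to have contributed to the outcome; the choice of lag length is an analytic decision.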
Definitions: Exposure and Dose
• Exposure: The presence of a substance in the environment external to the subject (external/environmental)
• Dose: The amount of a substance that reaches susceptible targets in the body (internal)
British Doctors
• Amount smoked at time of administration of first questionnaire:
– Non-smokers
– Current: 1-14 cigs/day
– 15-24
– ≥ 25
• These groups represent sub-cohorts defined by exposure at time of entry into the study
• However, information obtained during follow-up can change exposure status, so these sub-cohorts would not be fixed
British Doctors Cohort: Men

Survey period              1st Quest      2nd Quest        3rd Quest        4th Quest
Known to have died         N/A            3,122            7,301            10,634
Presumably alive           N/A            31,318           27,139           23,806
Replied                    40,637 (69%)   30,810 (98.4%)   26,163 (96.4%)   23,299 (97.9%)
Reasons for nonresponse    18,963         508              1,156            507
  Too ill                  N/A            31               65               21
  Refused                  N/A            36               63               102
  Not found                N/A            72               403              22
  Other                    N/A            369              445              362

British Doctors Study: Lung Cancer in Men among Current Smokers from Data Obtained at Last Questionnaire

                                  Age-standardized         Mortality Rate Ratio
                                  death rate (x 10^-5)
Non-smokers                       10                       1
Cigarettes only                   140                      14 (=140/10)
Pipe &/or cigars                  58                       5.8 (=58/10)
Mixed                             82                       8.2
Cigarettes only (No. per day)
  1-14                            78                       7.8
  15-24                           127                      12.7
  ≥ 25                            251                      25.1
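Each mortality rate ratio in the lung cancer table is a group's age-standardized death rate divided by the rate in non-smokers. A minimal Python sketch, with the rates copied from the slide and dictionary keys chosen only for illustration:

```python
# Age-standardized lung cancer death rates (x 10^-5) copied from the slide;
# the key names are illustrative labels, not the study's own coding.
death_rate = {
    "Non-smokers": 10,
    "Cigarettes only": 140,
    "Pipe &/or cigars": 58,
    "Mixed": 82,
    "Cigarettes 1-14/day": 78,
    "Cigarettes 15-24/day": 127,
    "Cigarettes 25+/day": 251,
}

reference = death_rate["Non-smokers"]  # unexposed sub-cohort

rate_ratio = {group: rate / reference for group, rate in death_rate.items()}

for group, ratio in rate_ratio.items():
    print(f"{group:22s} {ratio:5.1f}")
```

The steady rise of the ratio across the three cigarette-per-day categories is the dose-response pattern the slide is pointing to.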
Nested Case-Control Studies
• Sub-study that is based on an explicit cohort
• Motivation:
– Computational ease for large datasets
– Require additional information not already
collected
• To reduce costs, a sample of subjects from the original
cohort is taken
Synonyms
• Case-control-within-cohort studies
• Incidence density sampling studies
• Synthetic case-control studies
• Case-control studies are also referred to as
case-referent studies
Incidence Density Sampling
[Figure: follow-up lines for subjects No. 1 to 8 over Time 0 to 16, marking the time of the 1st and 2nd failures and the risk set for each failure]
Incidence Density Sampling
1. For each failure time (T) of each case, define all subjects who at that time are still at risk of developing the outcome
– The complete set of such subjects is called the risk set for the case
– Will exclude all subjects who before T were:
• Censored
• Failed
Incidence Density Sampling
2. Randomly select without replacement a sample of “controls” from the risk set
• These subjects are therefore “matched” to the case by time of event
• Other matching variables can be used so that the sampling is stratified; e.g., select only a random sample of women
• If a potential control eventually becomes a case, he or she is still at risk at the time of the index case’s event, and so remains eligible as a control
• A fixed number of controls can be selected; that number can vary from risk set to risk set
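Steps 1 and 2 (building the risk set at each failure time, then sampling controls from it) can be sketched in Python as follows. The eight-subject cohort, the field names and the function names are hypothetical and serve only to illustrate the procedure; they are not the data behind the figure.

```python
import random


def risk_set(cohort, case_id):
    """Subjects still at risk at the case's failure time T.

    Excludes anyone censored or failed before T; a subject who fails
    later than T is still eligible as a control.
    """
    t = cohort[case_id]["time"]
    return [sid for sid, s in cohort.items()
            if sid != case_id and s["time"] >= t]


def sample_controls(cohort, n_controls, seed=0):
    """For every case, draw controls without replacement from its risk set."""
    rng = random.Random(seed)
    matched_sets = []
    for case_id, subject in sorted(cohort.items(), key=lambda kv: kv[1]["time"]):
        if not subject["failed"]:
            continue  # censored subjects never define a risk set
        candidates = risk_set(cohort, case_id)
        k = min(n_controls, len(candidates))
        matched_sets.append({
            "case": case_id,
            "time": subject["time"],
            "controls": rng.sample(candidates, k),
        })
    return matched_sets


# Hypothetical cohort: exit time and whether exit was a failure (event)
cohort = {
    1: {"time": 16, "failed": False},
    2: {"time": 5, "failed": True},
    3: {"time": 9, "failed": False},
    4: {"time": 12, "failed": True},
    5: {"time": 3, "failed": False},
    6: {"time": 16, "failed": False},
    7: {"time": 14, "failed": False},
    8: {"time": 16, "failed": False},
}

matched_sets = sample_controls(cohort, n_controls=2)
for s in matched_sets:
    print(s)
```

In this toy cohort, subject 4 belongs to the risk set of the first case (failure at time 5) even though subject 4 later fails at time 12. Stratified sampling on another matching variable would amount to filtering `candidates` before drawing.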
Incidence Density Sampling
3. The analysis of these data is similar to the
stratified analysis used in the M-H
procedure for rates
4. The strata are now defined as each
selected risk set.
Incidence Density Sampling
5. The measure of association is the odds ratio. With this sampling strategy and a matched analysis, it provides an unbiased estimate of the rate ratio.
• A matched analysis is one that accounts explicitly for the matching during the fieldwork
Incidence Density Sampling
6. The estimated OR will have more variability than the full M-H cohort analysis because fewer subjects are included
7. There is no need to calculate person-years in this analysis. It is subsumed automatically in the sampling.
Incidence Density Sampling
8. Odds ratios in each risk set are not calculated; rather a summary estimate across all risk sets is obtained.
• This assumes that the rate ratio does not vary by time (proportional hazards assumption). Equivalently, the ORs across strata (matched subjects) are ~ equal (homogeneous).
9. Only risk sets that are discordant on exposure contribute information
Examples
Background
• Stenting common Rx for CAD symptoms
• Statin therapy improves survival in secondary
prevention in conservatively treated patients
• Is the same benefit present following
stenting?
Methods
• 4,520 patients < 80
• Examined 1 year mortality
• 3,585 with statins on discharge
• 935 no statins on discharge
Results
• Mortality 2.6% statins, 5.6% no statins
• Unadjusted OR 0.46 (95% CI 0.33 – 0.65)
• Adjusted OR 0.51 (95% CI 0.36 – 0.71)
• Methods included propensity analysis for statin prescription and Cox PH model with a substantial number of clinical covariates
So, what’s the problem?
• 51% reduction in mortality observed in 12 months
• Typical RCT: 24% reduction in mortality observed in 72 months (NEJM 1998;339:1349-57)
Red Flag
• If it looks too good to be true, it probably is
too good to be true
Potential Biases
• Channeling (selection bias in pharmacoepi
studies)
• Misclassification (exposure is not time
independent)
Person-time
[Figure: follow-up of patients 1 to 8 (Statin Group and No Statin Group) over Time (Months) 1 to 12; X marks a death]
RR = (1/4) / (2/4) = 0.5
RR = (1 / 42 person-months) / (2 / 42 person-months) = 0.5
A different approach
[Figure: follow-up chart repeated for the Statin Group and No Statin Group over Time (Months) 1 to 12]
• D/C @ 1 month - 11 months non-statin exposure
• Start statin @ 1 month - 6 months statin exposure
Statin = 2 / 37, Non statin = 1 / 47 (person-months)
RR = (2/37) / (1/47) = 2.5
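The effect of reclassifying exposure over time can be checked with a small rate-ratio calculation. The Python sketch below uses only the aggregate deaths and person-months quoted on the slides; the function names are illustrative, and reading the denominators 37 and 47 as person-months is an assumption based on their summing to the same 84 person-months as the fixed-exposure analysis.

```python
def rate(events, person_time):
    """Incidence rate: events per unit of person-time."""
    return events / person_time


def rate_ratio(events_exposed, pt_exposed, events_unexposed, pt_unexposed):
    """Rate in the exposed divided by the rate in the unexposed."""
    return rate(events_exposed, pt_exposed) / rate(events_unexposed, pt_unexposed)


# Exposure fixed at discharge: 1 death in 42 statin person-months
# versus 2 deaths in 42 non-statin person-months
fixed_exposure = rate_ratio(1, 42, 2, 42)

# Exposure treated as time-dependent: deaths and person-months are
# reassigned to the exposure actually in effect at the time
time_dependent = rate_ratio(2, 37, 1, 47)

print(f"fixed at discharge: {fixed_exposure:.2f}")
print(f"time-dependent:     {time_dependent:.2f}")
```

The same eight patients give an apparently protective ratio when exposure is fixed at discharge and a ratio above 1 once person-time and events follow the actual exposure, which is the misclassification problem flagged under Potential Biases.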
Message
• Vital to consider the time dependency of drug
exposure
• Another relatively easy method is to perform
a nested case control study that matches on
cohort entry
• Ensure equal follow-up time
Results: Decrease in mortality of 34% (95% CI 4-55%) after 36 months