case-control study

Spring 2008
Case-Control Studies
and Odds Ratio
STAT 6395
Filardo and Ng
Types of Epidemiologic studies
Case-Control studies
A study in which a group of persons with a disease
(cases) and a comparison group of persons without
the disease (controls) are compared with respect to
the history of past exposures to factors of interest
Case-Control studies (study schema)
[Figure: study schema drawn on a time axis running from Past to Present]
Cohort studies
A study in which a group of persons exposed to a
factor of interest and a group of persons not
exposed are followed and compared with respect to
the incidence rate of the disease or other
condition of interest
Comparison is fundamental in determining the
relationship between an exposure and a
disease
• Cohort study

Incidence rate in exposed group is not enough
(Needs nonexposed comparison group)
• Case-control study

Past exposures of a group of cases is not enough
(Needs a control group for comparison)
Fundamental difference between a case-control study and a cohort study
• A case-control study starts with people:
a) with; and b) without the disease of interest and compares their past
exposures
• A cohort study starts with people:
a) with; and b) without the exposure of interest and compares their
future disease
The fundamental difference between a case-control study
and a cohort study is not the calendar time period during
which exposures took place
…for example, in both a retrospective cohort study
and a case-control study, the calendar time period
during which exposures took place is in the past
How do we measure past exposures in a
case-control study?
• Interview
• Medical records/charts
• Assays of biological specimens (e.g. nested case-control studies)
Goal is to measure exposures that occurred before the
onset of disease
Selection of cases: incident vs. prevalent
• Incident (newly diagnosed) cases
  - Risk factors might contribute to the development of the disease
• Prevalent (existing) cases
  - Cannot distinguish between risk factors for the development of the disease and risk
    factors for cure or survival
  - More difficult to know which came first, the exposure or the disease
  - Exposure measurements problematic
Prevalence = Number of existing cases of a given disease
at a particular time point or during a particular time
period, divided by the number of persons in the population
(this is a proportion and not a rate)
Selection of cases
• Definition of source population -- the population that
gives rise to the cases
• Case definition -- need to have definite medical
criteria for who is a case of the disease
• Case identification -- need to put a system in place
for finding all cases who meet the case definition and
are members of the source population
Selection of controls
• Selection of appropriate controls is the major
methodological challenge in case-control studies
• In a case-control study, we want to determine
whether exposures of interest differ between the case
group and the source population
• Controls should be selected from the source
population that gave rise to the cases.
Selection of controls (continued)
• The controls should be representative of the source
population with respect to the exposures of interest.
• Ideally, controls should be a random sample of the
source population.
Prevalent cases of the disease should not be eligible to be controls
Study I: Controls selected such that they have a higher level of exposure
than the source population, producing an artifactual result that the
exposure is negatively associated with the disease
[Bar chart: Percent Exposed for Cases, Controls, and Source Population]
Study II: Controls selected such that they have a lower level of exposure
than the source population, producing an artifactual result that the exposure
is positively associated with the disease
[Bar chart: Percent Exposed for Cases, Controls, and Source Population]
Study III: Controls selected such that they have the same level of exposure
as the source population, producing the unbiased (true) result that the
exposure is not associated with the disease
[Bar chart: Percent Exposed for Cases, Controls, and Source Population]
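The three scenarios in Studies I to III can be sketched numerically. The exposure proportions below are hypothetical (the charts' actual values are not reproduced), and `exposure_odds_ratio` is an illustrative helper, not part of the course material. Exposure is set equal in the cases and the source population, so any odds ratio other than 1 comes only from how the controls were selected.

```python
def exposure_odds_ratio(p_cases, p_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = p_cases / (1 - p_cases)
    odds_controls = p_controls / (1 - p_controls)
    return odds_cases / odds_controls

# Hypothetical exposure proportions (assumed for illustration only).
P_CASES = 0.20
P_SOURCE = 0.20  # same as the cases, so the true result is "no association"

control_exposure = {
    "Study I": 0.35,        # controls more exposed than the source population
    "Study II": 0.10,       # controls less exposed than the source population
    "Study III": P_SOURCE,  # controls representative of the source population
}

odds_ratios = {
    study: exposure_odds_ratio(P_CASES, p_controls)
    for study, p_controls in control_exposure.items()
}

for study, value in odds_ratios.items():
    print(f"{study}: OR = {value:.2f}")
```

Under these assumed numbers, Study I gives an odds ratio below 1, Study II an odds ratio above 1, and only Study III recovers the true value of 1.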
Case-control studies classified by type of
source population (population that gives rise
to the cases)
• Population-based case-control studies
• Hospital (or clinic)-based case-control studies
• Nested case-control studies
Population-based case-control studies
• Source population: all residents of a defined
geographic area who do not have Disease X
• Cases: all new cases of Disease X that occur among
residents of a defined geographic area over a
specified period of time
• Controls: sample (ideally random) of the source
population over the same period of time
Population-based: Parkinson’s disease
• Source population: residents of Texas who do not
have Parkinson’s disease
• Cases: all new cases of Parkinson’s disease among
Texas residents identified over a 3 year period
through a rapid-reporting system
• Controls: sample (ideally random) of residents of
Texas who do not develop Parkinson’s disease over
the same 3 year period
Selection of controls in population-based case-control studies
• Random sample from a population registry
• Neighborhood controls -- sample of persons
who reside in the same neighborhoods as the
cases

Often done by matching
Selection of controls in population-based case-control studies (continued)
• Random selection of telephone numbers
(random digit dialing)
Members of the source population do not all have the
same probability of being contacted (not a random
sample):
  - persons without a landline have zero probability of being selected
  - households vary in the amount of time someone is home and in the number of telephones
  - many people screen their telephone calls
Hospital-based case-control studies
• Source population: all people without Disease X who
would attend Hospital A if they had Disease X
• Cases: all new cases of Disease X identified in
Hospital A over a specified period of time
• Controls (most commonly): sample of patients in
Hospital A with diagnoses other than Disease X over
the same period of time
Hospital-based: Parkinson’s disease
• Source population: all persons who would attend
Baylor University Medical Center (BUMC) if they had
Parkinson’s disease

Note that this source population is, in practice, impossible to
identify
• Cases: all new cases of Parkinson’s disease seen at
BUMC over a 3 year period
• Controls: sample of patients at BUMC with diagnoses
other than Parkinson’s disease over the same 3 year
period
Selection of controls in hospital-based case-control studies
• Source population: all persons who would
attend the hospital if they developed the
disease of interest

Source population in a hospital-based case-control study is usually not identifiable

A random sample of the general population will not necessarily correspond to a
random sample of the source population because it does not take into account the
referral patterns of the hospital

Furthermore, referral pattern depends on the disease
Selection of controls in hospital-based case-control studies (continued)
• Hospital-based controls are patients without
the disease from the same hospital

Hospital-based controls are a nonrandom sample of the source population, most of
whom are healthy

Nonrandom sampling of the source population introduces the possibility that the
distribution of the exposure of interest among the controls is not the same as it is
in the source population
Hospital-based controls may not reflect the
exposure distribution in the source population
• Exposures of interest may cause or prevent the diseases
for which patients in the control group were hospitalized
• Persons with an exposure of interest may be more or
less likely than persons without the exposure to be
hospitalized for their disease if they develop it (this could
also be an issue for cases)
Hospital-based controls unrepresentative of the
exposure distribution in the source population:
Parkinson’s disease
• Controls: random sample of persons
hospitalized for other diseases, many of
whom were hospitalized for heart disease

Low folic acid intake is a risk factor for heart disease

This control group would have a lower proportion of persons with high folic acid
intake than the source population
Controls selected such that they have a lower level of exposure than the
source population, producing an artifactual result that the exposure is
positively associated with the disease
[Bar chart: Percent Exposed for Cases, Controls, and Source Population]
Selection of controls in hospital-based case-control studies (continued)
• Limit the controls to those hospitalized for
diseases for which there is no suspicion of a
relationship with the exposures of interest
• Include a variety of diseases in the control
group, so as to dilute the biasing effects of
including a disease that might be related to the
exposure, unbeknownst to the investigator
• Excluded diseases should only apply to the
diagnosis at the current hospitalization
Excluded diseases should only apply to the
diagnosis at the current hospitalization:
Parkinson’s disease
• Controls: persons hospitalized due to
traumatic injury, who are believed to be
representative of the source population with
respect to folic acid intake

Persons with a history of heart disease should not be excluded from this traumatic
injury control group. Excluding them would cause the control group to have an
overrepresentation of persons with high folic acid intake
Controls selected such that they have a higher level of exposure than the
source population, producing an artifactual result that the exposure is
negatively associated with the disease
[Bar chart: Percent Exposed for Cases, Controls, and Source Population]
Nested case-control studies (nested within a
concurrent cohort study)
• Source population: the subjects in an
ongoing concurrent cohort study who did not
have Disease X at baseline
• Cases: all new cases of Disease X that
occurred in the cohort over a defined period
of follow-up
• Controls: random sample of subjects in the
cohort who did not develop Disease X over
the defined period of follow-up
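The source population, case, and control definitions above can be sketched as a sampling routine. This is a minimal illustration under stated assumptions: the record fields (`disease_at_baseline`, `developed_disease`), the function name, and the toy cohort are all hypothetical.

```python
import random

def nested_case_control_sample(cohort, n_controls, seed=0):
    """Return (cases, controls) drawn from a cohort.

    `cohort` is a list of dicts with hypothetical keys
    'id', 'disease_at_baseline', and 'developed_disease'.
    """
    # Source population: cohort members free of Disease X at baseline.
    source = [s for s in cohort if not s["disease_at_baseline"]]
    # Cases: all new cases that occurred during follow-up.
    cases = [s for s in source if s["developed_disease"]]
    # Controls: random sample of those who did not develop the disease.
    non_cases = [s for s in source if not s["developed_disease"]]
    rng = random.Random(seed)
    controls = rng.sample(non_cases, n_controls)
    return cases, controls

# Toy cohort of 1,000 subjects (made-up data for illustration).
cohort = [
    {"id": i, "disease_at_baseline": i % 50 == 0, "developed_disease": i % 10 == 3}
    for i in range(1000)
]
cases, controls = nested_case_control_sample(cohort, n_controls=100)
print(len(cases), len(controls))
```

A fixed seed is used only so that the sketch is reproducible; it has no bearing on the study design.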
Nested case-control studies (nested within a
concurrent cohort study)
• Exposures measured by assay of stored
biologic specimens collected from the
subjects at baseline
• Collection of additional exposure information
not collected at baseline requires labor-intensive
data collection activities, such as
abstraction from records
Nested case-control study has advantage of cohort studies:
exposure measured at baseline before development of disease
Nested case-control studies (nested within a
concurrent cohort study)
cases
and
identified
controls
specimens used to assess
exposure and compare it
among study groups
biologic specimens
collected from the
subjects at baseline
Time
Case-Control studies  Population-based, hospital-based, nested
Nested case-control: Parkinson’s disease
• Source population: the members of the Nurses’ Health Study cohort
who donated blood samples in 1989-1990 and had no history of
Parkinson’s disease
• Cases: all new cases of Parkinson’s disease that developed in this
source population from 1991 to 2000
• Controls: random sample of the Nurses’ Health Study cohort who did
not develop Parkinson’s disease from 1991 to 2000
• Measurement of exposure: serum folic acid level at baseline
Advantages of the nested case-control studies
over the concurrent cohort study itself
• Cost: suppose there were 32,000 women in the
source population, 200 cases, and 200 controls.
Only the 400 specimens from the cases and controls
need to be assayed, rather than all 32,000
• Causality: the exposure occurred before the disease
• Further research: preservation of precious biologic
specimens; remaining specimens are available for other
studies
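The cost point can be made concrete with a few lines of arithmetic. The cohort size and the numbers of cases and controls come from the slide; the cost per assay is a made-up figure used only to show the proportion.

```python
COHORT_SIZE = 32_000   # women in the source population (from the slide)
N_CASES = 200          # from the slide
N_CONTROLS = 200       # from the slide
COST_PER_ASSAY = 25.0  # hypothetical unit cost, for illustration only

assays_full_cohort = COHORT_SIZE
assays_nested = N_CASES + N_CONTROLS

cost_full_cohort = assays_full_cohort * COST_PER_ASSAY
cost_nested = assays_nested * COST_PER_ASSAY

# Fraction of the full-cohort assay cost needed by the nested design.
fraction = assays_nested / assays_full_cohort

print(f"Assays: {assays_nested} vs. {assays_full_cohort}")
print(f"Nested design uses {fraction:.2%} of the full-cohort assay cost")
```

Whatever the unit cost, the nested design needs 400 assays instead of 32,000.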
Accounting for confounders in selecting controls
Matching -- selection of controls such that
they are similar to cases with respect to
factors other than the exposures of interest
Matching
• Common matching factors: age, sex, race,
socioeconomic status
Matching
• Frequency matching: selection of controls
such that the distributions of the matching
factors (e.g., age, sex) are similar in the case
and control groups
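One way to picture frequency matching is to sample controls stratum by stratum so that each stratum of the matching factors holds as many controls as cases. The sketch below assumes hypothetical field names (`age_group`, `sex`) and a hypothetical helper `frequency_match`; it is an illustration, not a prescribed procedure.

```python
import random
from collections import Counter

def stratum(person):
    """Matching factors that define a stratum (hypothetical fields)."""
    return (person["age_group"], person["sex"])

def frequency_match(cases, candidates, ratio=1, seed=0):
    """Sample controls so each stratum holds `ratio` controls per case."""
    rng = random.Random(seed)
    needed = Counter(stratum(c) for c in cases)
    pools = {}
    for person in candidates:
        pools.setdefault(stratum(person), []).append(person)
    controls = []
    for key, n_cases in needed.items():
        pool = pools.get(key, [])
        # Take as many as requested, or the whole pool if it is too small.
        k = min(len(pool), ratio * n_cases)
        controls.extend(rng.sample(pool, k))
    return controls

# Toy data (made up): 6 cases in one stratum, 4 in another.
cases = [{"age_group": "60-69", "sex": "F"}] * 6 + [{"age_group": "70-79", "sex": "M"}] * 4
strata = [("60-69", "F"), ("60-69", "M"), ("70-79", "F"), ("70-79", "M")]
candidates = [
    {"id": i, "age_group": strata[i % 4][0], "sex": strata[i % 4][1]}
    for i in range(80)
]
controls = frequency_match(cases, candidates)
print(Counter(stratum(c) for c in controls))
```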
Matching
• Individual matching: each control is
individually matched to a case with respect to
specific factors, resulting in matched case-control pairs

For example, for each case, select a control of the same
race, sex, age (within 3 years), neighborhood (within 3
blocks)
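The example rule above can be sketched as a simple greedy pairing. The record fields and the function `individually_match` are hypothetical, and the neighborhood criterion is left out because it would need location data; this is an illustration of the idea rather than a recommended algorithm.

```python
def individually_match(cases, candidates, max_age_gap=3):
    """Pair each case with the first unused candidate of the same sex and
    race whose age is within `max_age_gap` years (greedy, no replacement)."""
    available = list(candidates)
    pairs = []
    for case in cases:
        for control in available:
            if (control["sex"] == case["sex"]
                    and control["race"] == case["race"]
                    and abs(control["age"] - case["age"]) <= max_age_gap):
                pairs.append((case, control))
                available.remove(control)  # each control is used once
                break
    return pairs

# Toy records (made up for illustration).
cases = [
    {"id": "case-1", "sex": "F", "race": "A", "age": 64},
    {"id": "case-2", "sex": "M", "race": "B", "age": 71},
]
candidates = [
    {"id": "ctrl-1", "sex": "F", "race": "A", "age": 70},  # age gap too large
    {"id": "ctrl-2", "sex": "F", "race": "A", "age": 66},  # eligible for case-1
    {"id": "ctrl-3", "sex": "M", "race": "B", "age": 69},  # eligible for case-2
    {"id": "ctrl-4", "sex": "M", "race": "A", "age": 71},  # race differs
]
pairs = individually_match(cases, candidates)
for case, control in pairs:
    print(case["id"], "matched to", control["id"])
```

A case with no eligible candidate is simply left unmatched in this sketch.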
Matching is intuitively appealing, but its
implications are complicated …
• In a case-control study, the association
between matching factors and disease
cannot be studied
• Overmatching can occur if a matching factor
is associated with the exposure of interest,
thus making the controls artifactually like the
cases with respect to that exposure
• Matching must be taken into account in the
analysis through special analytic techniques

We will cover some of these techniques in this course
Conventional data layout for case-control
study (2x2 table)
First select cases and controls, then measure past exposure.

               Cases   Controls   Total
Exposed          a        b        m1
Not exposed      c        d        m2
Total            n1       n2       N
Estimating relative risk in a case-control study
• In a case-control study, we cannot measure
incidence rates in the exposed and nonexposed
groups, and therefore cannot calculate the
relative risk directly
• In a case-control study, the odds ratio is a good
approximation of the relative risk in some
circumstances …
Odds
• Probability that cases were exposed =
a/(a+c)
• Probability that cases were not exposed =
c/(a+c)
• Odds of a case having been exposed =
[a/(a+c)]/[c/(a+c)] = a/c

Similarly, odds of a control having been exposed = b/d
Odds Ratio (OR)
The ratio of the odds that the cases were
exposed to the odds that the controls were
exposed = (a/c)/(b/d) =
ad/bc

The odds ratio is the cross-product ratio in the 2x2 table
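As a quick check that the ratio of the two exposure odds equals the cross-product ratio, here is a small sketch using the cell labels a, b, c, d from the 2x2 layout. The counts are hypothetical and the function name is illustrative.

```python
def odds_ratio_from_odds(a, b, c, d):
    """Odds of exposure in cases (a/c) divided by odds in controls (b/d)."""
    odds_cases = a / c
    odds_controls = b / d
    return odds_cases / odds_controls

# Hypothetical counts: a, c = exposed and unexposed cases;
# b, d = exposed and unexposed controls.
a, b, c, d = 30, 10, 70, 90

or_from_odds = odds_ratio_from_odds(a, b, c, d)
or_cross_product = (a * d) / (b * c)

print(f"(a/c)/(b/d) = {or_from_odds:.3f}")
print(f"ad/bc       = {or_cross_product:.3f}")
```

Both expressions give the same value, up to floating-point rounding.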
Interpretation of the Odds Ratio
• The odds ratio is a good approximation of the
relative risk when the disease being studied
occurs infrequently (which is the situation in
most circumstances in which case-control studies
are conducted)

ONLY in this case is the interpretation of the odds ratio in case-control
studies the same as the interpretation of the relative risk in cohort
studies
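The rare-disease argument can be written out in a few lines. The sketch assumes the cell counts a, b, c, d refer to a cohort or to the whole source population (not to a sampled case-control table), so that a/(a+b) and c/(c+d) are the risks in the exposed and nonexposed groups.

```latex
\begin{align*}
RR &= \frac{a/(a+b)}{c/(c+d)} \\
   &\approx \frac{a/b}{c/d}
      \qquad \text{(rare disease: } a \ll b,\ c \ll d\text{)} \\
   &= \frac{ad}{bc} = OR
\end{align*}
```

When the disease is rare, a+b is close to b and c+d is close to d, which is what makes the second line a reasonable approximation.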
Interpretation of the Odds Ratio
• OR = 1
  - Risk in exposed = risk in nonexposed
  - No association
• OR > 1
  - Risk in exposed > risk in nonexposed
  - Positive association
  - The larger the OR, the stronger the association
  - May or may not be causal
• OR < 1
  - Risk in exposed < risk in nonexposed
  - Negative association
  - The smaller the OR, the stronger the negative association
  - May or may not be causal
  - If causal, indicates a protective effect
Interpretation of the Odds Ratio: Example
OR = (59 x 44) / (33 x 17) = 4.63
The odds of dependent feeding are 4.63 times as high among
patients who eat ¾ or less of the food served as among
patients who eat more than ¾ of the food served
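The arithmetic in this example can be checked directly. The assignment of the four counts to cells a, b, c, d is an assumption inferred from the cross-product as written on the slide; the underlying table is not shown.

```python
# Assumed cell assignment, inferred from (59 x 44) / (33 x 17).
a, b, c, d = 59, 33, 17, 44

odds_ratio = (a * d) / (b * c)
print(round(odds_ratio, 2))
```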
Interpretation of the Odds Ratio
A further example of the calculation and
interpretation of the odds ratio is given by
Bland & Altman (Bland J.M. & Altman D.G.
(2000) The odds ratio. British Medical Journal
320, 1468.)
Interpretation of the Odds Ratio
• The odds ratio may be a misleading
approximation to relative risk if the event rate
is high (Deeks (1996) and Davies et al.
(1998))
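A small numeric sketch shows how the odds ratio drifts away from the relative risk as the event rate rises. The relative risk of 2 and the baseline risks are hypothetical values chosen for illustration.

```python
def odds_ratio_from_risks(risk_exposed, risk_unexposed):
    """Odds ratio implied by two risks (probabilities of the event)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed

RELATIVE_RISK = 2.0  # held fixed across all scenarios (assumed value)

comparison = {}
for risk_unexposed in (0.01, 0.10, 0.30):
    risk_exposed = RELATIVE_RISK * risk_unexposed
    comparison[risk_unexposed] = odds_ratio_from_risks(risk_exposed, risk_unexposed)

for risk_unexposed, odds_ratio in comparison.items():
    print(f"risk in unexposed = {risk_unexposed:.2f}: RR = 2.00, OR = {odds_ratio:.2f}")
```

With a 1% baseline risk the odds ratio stays close to 2, while with a 30% baseline risk it reaches 3.5, which overstates the relative risk considerably.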
Interpretation of the Odds Ratio
Since the odds ratio is difficult to interpret,
why is it so widely used?
Odds ratios can be calculated for case-control
studies, whereas relative risks are not
available for such studies.
Attributable risk percent (exposed) using
odds ratio
[(OR - 1)/OR] x 100
Tells us what percent of the disease among the exposed is due to
the exposure
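The formula is simple enough to sketch as a one-line function. The odds ratio of 4.0 is a hypothetical input, and the function name is illustrative.

```python
def attributable_risk_percent_exposed(odds_ratio):
    """[(OR - 1) / OR] x 100, using the OR in place of the relative risk."""
    return (odds_ratio - 1) / odds_ratio * 100

# Hypothetical odds ratio for illustration.
ar_exposed = attributable_risk_percent_exposed(4.0)
print(f"{ar_exposed:.1f}% of disease among the exposed is attributed to the exposure")
```

As with the odds ratio itself, this reading assumes the odds ratio is a reasonable approximation of the relative risk.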
Attributable risk percent (population) using
odds ratio
{[P x (OR - 1)] / [P x (OR - 1) + 1]} x 100
where P is the population prevalence of the exposure
• P can be estimated by the prevalence of the
exposure in the controls
• Tells us what percent of the disease in the
total population is due to the exposure
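A minimal sketch of the population version, estimating P from the controls as suggested above. The control counts and the odds ratio are hypothetical, with b and d standing for the exposed and nonexposed controls in the 2x2 layout.

```python
def population_attributable_risk_percent(p_exposed, odds_ratio):
    """{[P x (OR - 1)] / [P x (OR - 1) + 1]} x 100."""
    excess = p_exposed * (odds_ratio - 1)
    return excess / (excess + 1) * 100

# Hypothetical control counts: b exposed, d not exposed.
b, d = 25, 75
p_exposed = b / (b + d)  # prevalence of exposure estimated from the controls

par_percent = population_attributable_risk_percent(p_exposed, odds_ratio=4.0)
print(f"P = {p_exposed:.2f}, population attributable risk = {par_percent:.1f}%")
```

With these assumed inputs (P = 0.25, OR = 4), about 43% of the disease in the total population would be attributed to the exposure.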