EPB PHC 6000 EPIDEMIOLOGY FALL, 1997

Unit 5:
Evaluating Associations
Unit 5 Learning Objectives:
1. Understand the concepts of “valid” and
“causal” statistical associations.
2. Understand the possible explanations for
statistical associations:
--- Chance
--- Bias
--- Confounding
3. Distinguish between the major types of
bias in epidemiologic studies.
4. Distinguish between “internal validity”
and “external validity.”
Unit 5 Learning Objectives (cont.):
5. Understand the concept of confounding
and methods to assess its presence.
6. Understand the concept of causality,
including necessary and sufficient causes.
7. Understand pros and cons of guidelines
used to evaluate causal associations in
epidemiology.
Assigned Readings:
Textbook (Gordis):
Chapter 14, pages 206-222 (from
association to causation)
Chapter 15, pages 224-230 (bias
and confounding)
Evaluating Associations
If we observe an exposure/disease
association, we must consider:
1. Is the association valid?
   (Do the study findings reflect the true relationship between the exposure and disease?)

2. Is the association causal?
   (Is there sufficient evidence to infer that a causal association exists between the exposure and the disease?)
Evaluating Associations
EVALUATING THE VALIDITY OF AN ASSOCIATION:
In any epidemiologic study, there are at least 3
possible explanations for the observed results:
1. CHANCE
2. BIAS
3. CONFOUNDING
These explanations are not mutually exclusive; more than one can be present in the same study
CHANCE
The “luck of the draw”
1. Rarely can we study an entire population, so inference is attempted from a sample of the population

2. There will always be random variation from sample to sample

3. In general, smaller samples have less precision, reliability, and statistical power (more sampling variability)
CHANCE
4. A test of “statistical significance” is performed to assess the degree to which the data are compatible with the null hypothesis of no association

5. The “p-value” reflects the probability that the test statistic (e.g. t-statistic or chi-square statistic) produced from the data would be greater than or equal to the observed value under the null hypothesis of no association
CHANCE
6. By convention, if p < 0.05, then the association between the exposure and disease is considered to be “statistically significant”
   (i.e. we reject the null hypothesis (H0) and accept the alternative hypothesis (H1))
What does p < 0.05 mean?
Indirectly, it means that we suspect that the
magnitude of effect observed (e.g. risk
ratio) is not due to chance alone (in the
absence of biased data collection or
analysis)
CHANCE
Example: Possible biased coin
Coin Toss - 10 Times:    Observed    Expected
Heads                        7           5
Tails                        3           5

Odds H:T = 7:3 = 2.33
Excess Heads = O - E = 7 - 5 = 2
p-value: > 0.05 (computation not shown)

The observed excess of heads to tails is not much greater than that which might be expected by chance
CHANCE
Example: Possible biased coin
Coin Toss - 1,000 Times:    Observed    Expected
Heads                          700         500
Tails                          300         500

Odds H:T = 700:300 = 2.33
Excess Heads = O - E = 700 - 500 = 200
p-value: < 0.05 (computation not shown)

The observed excess of heads to tails is much greater than that which might be expected by chance
CHANCE
BEWARE:
The p-value reflects both the magnitude of the difference between the study groups AND the sample size.

So a related but more informative measure, known as the confidence interval (CI), is also calculated.
CI = a range of values within which the true
population value falls, with a certain degree of
assurance (probability).
CHANCE
HYPOTHETICAL EXAMPLE OF 95% CONFIDENCE INTERVAL

Exposure:    Caffeine intake (high versus low)
Outcome:     Incidence of breast cancer
Risk Ratio:  1.32 (point estimate)
p-value:     0.14 (not statistically significant)
95% C.I.:    0.87 - 1.98

[Figure: the 95% confidence interval (0.87 - 1.98) plotted on a scale from 0.0 to 2.0; the null value is 1.0]
CHANCE
INTERPRETATION:
Our best estimate is that women with high caffeine intake are 1.32 times as likely (i.e. 32% more likely) to develop breast cancer compared to women with low caffeine intake. However, we are 95% confident that the true population value (risk ratio) lies between 0.87 and 1.98 (assuming an unbiased study).
[Figure repeated: the 95% confidence interval (0.87 - 1.98) on a scale from 0.0 to 2.0; the null value is 1.0]
CHANCE
So, if the 95% confidence interval does NOT
include the null value of 1.0 (p < 0.05), then we
declare a “statistically significant” association.
Question:
How do we declare a “valid” statistical
association?
Answer:
By ruling out, to the extent possible,
bias or confounding
CHANCE
Note: Although we have initially reviewed the
role of chance, the conventional sequence
would be to first assess (rule out) the presence
of systematic bias.
In other words, it makes no sense to evaluate
chance as a possible explanation when a study
is biased from the start (i.e. the “experiment” we
have set up is flawed, and hence, we should
discard it).
BIAS
BIAS: Systematic error in the design,
conduct, or analysis of a study that
results in a mistaken estimate of an
exposure/disease relationship
1. SELECTION BIAS
2. INFORMATION BIAS
   * Interviewer Bias
   * Recall Bias
   * Reporting Bias
   * Surveillance Bias
BIAS
SELECTION BIAS: Any systematic error that arises in the process (mechanism) of identifying the study populations (i.e. the two study groups to be compared)

• Occurs when selection of study subjects (whether by exposure or disease status) is based on different criteria related to exposure or disease

• Results in the study groups being non-comparable, unless some type of statistical adjustment can be made
SELECTION BIAS
EXAMPLE: Case-Control Study

Outcome:  Hemorrhagic stroke
Exposure: Appetite suppressant products that contain Phenylpropanolamine (PPA)
Cases:    Persons who experienced a stroke
Controls: Persons in the community without stroke

Bias: Control subjects were recruited by random-digit dialing from 9:00 AM to 5:00 PM. This resulted in over-representation of unemployed persons who may not represent the study base in terms of use of appetite suppressant products.
SELECTION BIAS
EXAMPLE: Retrospective Cohort Study

Outcome:     COPD
Exposure:    Employment in tire manufacturing
Exposed:     Plant assembly line workers
Non-exposed: Plant administrative personnel

Bias: The exposed were contacted (selected) at a local pub while watching Monday night football; the non-exposed were identified through review of plant personnel files. Exposed persons may have been more likely to be smokers (related to COPD).
SELECTION BIAS
EXAMPLE: Non-Response
• If refusal or non-response is related to
exposure, the estimate of effect
(exposure/disease) may be biased. For
example, if controls are selected by use of
a household survey, non-response may be
related to demographic and lifestyle factors
associated with employment.
• Responders often differ systematically
from persons who do not respond.
SELECTION BIAS
NOTE:
• Restrictive sampling alone, so long as
different criteria are not used between
study groups, does not confer selection
bias
• It merely means that the study results may
not generalize to the larger population
(external validity)
INFORMATION BIAS
Definition: Systematic differences in the
way in which data on exposure and
outcome are obtained from the various
study groups.
Some Types/Sources of Information Bias:
• Bias in abstracting records
• Bias in interviewing
• Bias from surrogate interviews
• Surveillance bias
• Reporting and recall bias
INTERVIEWER BIAS
DEFINITION: Systematic difference in the
soliciting, recording, or interpretation of
information from study participants
• Can affect every type of epidemiologic
study
• May occur when interviewers are not
“blinded” to exposure or outcome status of
participants.
INTERVIEWER BIAS
• Interviewer’s knowledge of subjects’
disease status may result in differential
probing of exposure history
• Similarly, interviewer’s knowledge of
subjects’ exposure history may result in
differential probing and recording of the
outcome under examination
• Placebo control is one method used to
maintain observer blindness in randomized
trials.
RECALL BIAS
DEFINITION: Study group participants
systematically differ in the way data on
exposure or outcome are recalled
• Particularly problematic in case-control
studies
• Individuals who have experienced a disease
or adverse health outcome may tend to
think about possible “causes” of the
outcome. This can lead to differential recall
RECALL BIAS
EXAMPLE: Case-Control Study

Outcome:  Cleft palate
Exposure: Systemic infection during pregnancy
Cases:    Mothers giving birth to children with cleft palate
Controls: Mothers giving birth to children free of cleft palate

Bias: Mothers who have given birth to a child with cleft palate may recall more thoroughly colds and other infections experienced during pregnancy
REPORTING BIAS
DEFINITION: Selective suppression or
revealing of information such as past
history of sexually transmitted disease.
• Often occurs because of subject reluctance to report an exposure due to attitudes, beliefs, and perceptions
• “Wish bias” may occur among subjects
who have developed a disease and seek
to show that the disease “is not their
fault.”
SURVEILLANCE BIAS
• If a population is monitored over a period
of time, disease ascertainment may be
better in the monitored population than in
the general population (“surveillance
bias”).
• May lead to biased estimate of
exposure/disease relationship.
MISCLASSIFICATION
DEFINITION: Erroneous classification of the
exposure or disease status of an individual
into a category to which it should not be
assigned
Examples:
-- Cases incorrectly classified as controls
-- Controls incorrectly classified as cases
-- Exposed incorrectly classified as non-exposed
-- Non-exposed incorrectly classified as exposed
MISCLASSIFICATION
Non-differential misclassification:
The proportion of subjects misclassified on
exposure does not depend on disease
status
OR
The proportion of subjects misclassified on
disease does not depend on exposure
status
MISCLASSIFICATION
Non-differential misclassification:
• Tends to make the exposure or disease groups more similar than they really are
• Some non-differential misclassification is inevitable
• Almost always results in bias towards the null (see the sketch below)
• In interpretation, the researcher must consider what real effect might have been obscured
MISCLASSIFICATION
Differential misclassification:
Classification error of exposure status occurs
more frequently among the diseased or
non-diseased
OR
Classification error of disease status occurs
more frequently among the exposed or nonexposed
MISCLASSIFICATION
Differential misclassification:
• Results in relatively unpredictable effects
• Can exaggerate or underestimate the true exposure/disease relationship (see the sketch below)
• By chance (infrequently), can also result in an estimate that is the same as the true exposure/disease relationship