Internal Validity

Research Design for Evaluations
Tim Bruckner, PhD, MPH
Assistant Professor
Public Health & Planning, Policy, and Design
University of California, Irvine
Training Programme on Health Workforce Analysis and Planning
Rio de Janeiro
August 8-13, 2011
© 2007 Thomson, a part of the Thomson Corporation. Thomson, the Star logo, and Atomic Dog are trademarks used herein under license. All rights reserved.
Learning Objectives
• Define the goal of an evaluation
• Understand internal and external validity
• Identify threats to internal validity
• Be able to describe the following designs: single-group case, randomized two-group, non-equivalent two-group, and qualitative study
• Describe the strengths and limitations of 3 example study designs
Definition of Evaluation*

The systematic and objective
assessment of an ongoing or completed
initiative, its design, implementation and
results
– Assessments (measurements) are
comparable across place & time
– Others that repeat the assessment could
achieve the same results (repeatable)
* Source: Handbook on monitoring and evaluation of human resources for
health: with special applications for low- and middle-income countries. Edited
by Mario R Dal Poz, Neeru Gupta, Estelle Quain and Agnes LB Soucat. WHO
2009.
Definition of Evaluation*

Especially in HRH evaluations, we must
ensure
– systematic measurements
– measuring of the right variables
» Joanne will talk about useful survey designs
* Source: Handbook on monitoring and evaluation of human resources for
health: with special applications for low- and middle-income countries. Edited
by Mario R Dal Poz, Neeru Gupta, Estelle Quain and Agnes LB Soucat. WHO
2009.
Research Design and Evaluations

The goal of the research is often to determine
whether there exists a causal relation between two
variables
– Ex: did coordinated clinical care cause better health
outcomes?

I will describe research designs often used for
human resource evaluations
– only a small subset of all designs
http://www.youtube.com/watch?v=R3zGJdFPBFY&feature=related
An Impact Evaluation of Coaches
2007 Copa América (coach: Dunga): Brazil beats Argentina 3-0 in the final
2011 Copa América (coach: Menezes): Paraguay beats Brazil in the quarterfinals
With data only from the Copa Américas:
Is Dunga a better coach?
What about the team members?
Was the 2007 team just better?
Even if the 2007 and 2011 teams had the same roster, would aging matter?
Were CBF resources for the program similar?
What about the quality of other teams? Have they improved?
Were the assistant coaches the real difference makers?
So many other potential explanations!
Key Goal of Research Design
• Minimize the threat of plausible rival hypotheses
• Use various designs to reduce the plausibility of other explanations
• Often, a design goal is to use a comparison group that approximates the test group in every way EXCEPT for the variable we want to evaluate
Ideal Research Design
• Use a comparison group that approximates the test group in every way EXCEPT for the variable we want to evaluate
• In an “ideal world,” to evaluate Dunga vs. Menezes, take the 2007 team, make it the 2011 team, and change nothing but the coach
• This is impossible, but research design attempts to approximate the “ideal world”
Establishing Cause and Effect
• Temporal precedence: cause, then effect, in time
• Covariation of cause and effect: if X, then Y; if not X, then not Y
• No alternative explanations: does the program cause the outcome, or does an alternative cause?
Internal Validity
Ability to rule out alternative explanations for the findings
(stronger validity = fewer alternative explanations)
Threats to internal validity
• History, Maturation, Testing, Instrumentation, Mortality, Regression to the mean
• Differences between the treatment and comparison (control) groups before the study begins (selection)
External Validity
• Extent to which results generalize to a larger population
• “Generalizability”
• Ex: Pará region as representative of Brazil?
• Ex: survey administered via cell phone roster
• We will focus primarily on internal validity
• However, Health Ministers view external validity as crucially important when considering a “scale-up”
The Single Group Case
Measure baseline (O), administer program (X), measure outcomes (O)
O         X         O
Alternative explanations may also influence the measured outcomes
Example


Increase productivity (visits per hour) – incentives
Increase knowledge of quality procedures – incentives

Pre-post single group design

Measures (O) are visits per hour
– Period 0 (beginning)
– Period 1 (end)
History Threat to Internal Validity
Pretest   Program   Posttest
O         X         O
Any other event that occurs between pretest
and posttest
New wide-spectrum drug available (reduces
visit time)
– Ex: antidepressants in Sweden
Maturation Threat
Pretest   Program   Posttest
O         X         O
Normal growth between pretest and posttest.
Ex: a high % of new MDs, who get more productive over time anyway.
Testing Threat
Increase knowledge of
quality procedures (incentives)

Pretest   Program   Posttest
O         X         O
The effect on the posttest of taking the
pretest
MDs may have learned from the test,
not the program
Instrumentation Threat
Pretest   Program   Posttest
O         X         O
Any change in the test between pretest and posttest
Ex: change due to different forms of the test, not to the program (problem: measurement is not systematic)
“Mortality” Threat
Pretest   Program   Posttest
O         X         O
Nonrandom dropout between pretest
and posttest
Ex: worst MDs or clinics drop out of
program
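
A small simulation can make the “mortality” threat concrete. The sketch below is illustrative only: the clinic counts, the 20% dropout share, and the visits-per-hour distribution are all assumed values, not data from any study. The program is given no effect at all, yet the posttest mean of the clinics that remain looks higher than the pretest mean of all clinics.

```python
import random
import statistics

random.seed(2)  # fixed seed so the illustration is repeatable

N_CLINICS = 5_000
DROPOUT_SHARE = 0.20  # assumed: the worst 20% leave before the posttest

# Hypothetical visits per hour at pretest; the program has NO effect here,
# so the posttest is just the pretest plus a little measurement noise.
pretest = [random.gauss(3.0, 0.5) for _ in range(N_CLINICS)]
posttest = [p + random.gauss(0.0, 0.1) for p in pretest]

# Nonrandom dropout: clinics below the pretest cutoff are never re-measured.
cutoff = sorted(pretest)[int(N_CLINICS * DROPOUT_SHARE)]
stayers = [i for i in range(N_CLINICS) if pretest[i] >= cutoff]

pre_all = statistics.mean(pretest)
post_stayers = statistics.mean(posttest[i] for i in stayers)
post_all = statistics.mean(posttest)  # not observable when clinics drop out

print(f"Pretest mean, all clinics:        {pre_all:.2f}")
print(f"Posttest mean, remaining clinics: {post_stayers:.2f}")
print(f"Posttest mean, if nobody left:    {post_all:.2f}")
```

The apparent gain comes entirely from who was left to measure, which is why dropout should be reported and compared across pretest scores.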
Regression Threat
Pretest   Program   Posttest
O         X         O
Group is a nonrandom subgroup of
population (all poor performing clinics)
For example, clinics will appear to improve
because of regression to the mean
– Previous poor performance was an extreme
Why Does It Happen?
• Regression artifacts occur whenever
you sample asymmetrically from a
distribution.
• Ex: pull the top 10%, or the bottom 10%
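
The regression artifact described above can be reproduced with a short simulation. This is a sketch under assumed values (10,000 hypothetical clinics, a stable “true” productivity plus independent noise at each measurement) and no program is administered. Selecting the bottom 10% on the pretest is enough to make that subgroup appear to improve.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is repeatable

def simulate_regression_to_mean(n_clinics=10_000, bottom_share=0.10):
    """Return (pretest mean, posttest mean) for the bottom pretest subgroup."""
    # Assumed model: stable true level + independent noise on each occasion.
    true_level = [random.gauss(3.0, 0.5) for _ in range(n_clinics)]
    pretest = [t + random.gauss(0.0, 0.5) for t in true_level]
    posttest = [t + random.gauss(0.0, 0.5) for t in true_level]

    # Asymmetric sampling: keep only the poorest performers on the pretest.
    cutoff = sorted(pretest)[int(n_clinics * bottom_share)]
    selected = [i for i in range(n_clinics) if pretest[i] < cutoff]

    pre_mean = statistics.mean(pretest[i] for i in selected)
    post_mean = statistics.mean(posttest[i] for i in selected)
    return pre_mean, post_mean

pre_mean, post_mean = simulate_regression_to_mean()
print(f"Bottom 10% at pretest:  {pre_mean:.2f} visits per hour")
print(f"Same clinics, posttest: {post_mean:.2f} visits per hour")
```

Part of each extreme pretest score is noise, and the noise does not repeat, so the subgroup drifts back toward the population mean without any intervention.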
Multiple-Group Studies: Threats to Internal Validity
The Central Issue



When you move from single to multiple group research, the big concern is whether the groups are comparable.
Usually this has to do with how you assign units (for example, persons) to the groups (or select them into groups).
We call this issue selection.
The Multiple Group Case
Program group:     measure baseline (O), administer program (X), measure outcomes (O)
Comparison group:  measure baseline (O), do not administer program, measure outcomes (O)
O         X         O
O                   O
Alternative explanations may also influence the measured outcomes of either group
Example

Increase productivity (visits per hour) – incentives
Increase knowledge of quality procedures

Pre-post comparison group design

Measures (O) are visits per hour

– Period 0 (beginning)
– Period 1 (end)
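
One common way to summarize a pre-post comparison group design is a difference-in-differences calculation: the change in the program group minus the change in the comparison group. The slides do not name this estimator, so treat the sketch below as one possible analysis; the function name and the visits-per-hour figures are hypothetical.

```python
def difference_in_differences(treat_pre, treat_post, control_pre, control_post):
    """Change in the program group minus change in the comparison group."""
    treat_change = treat_post - treat_pre
    control_change = control_post - control_pre
    return treat_change - control_change

# Hypothetical mean visits per hour in Period 0 and Period 1.
estimate = difference_in_differences(
    treat_pre=3.0, treat_post=3.8,      # clinics with incentives
    control_pre=3.1, control_post=3.4,  # clinics without incentives
)
print(f"Estimated program effect: {estimate:.2f} visits per hour")
```

Subtracting the comparison group's change removes history or maturation that affects both groups equally. It does not remove threats that act differently on the two groups, which is the selection problem.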
Selection Threats
O         X         O
O                   O
Any factor other than the program that
leads to posttest differences between
groups.
Threats to Internal Validity
Selection
Selection-history
Selection-maturation
Selection-testing
Selection-instrumentation
Selection-mortality
Selection-regression
Social Threats
“Social” Threats to Validity for Two-Group?
• All are related to social pressures in the
research context, which can lead to
posttest differences that are not directly
caused by the treatment itself.
• Most of these can be minimized by
isolating the two groups from each
other, but this leads to other problems
• Ex: hard to randomly assign and then
isolate
Diffusion or Imitation of Treatment
• Control providers (i.e., NOT assigned to
treatment) might learn about methods to
improve productivity from treated people (for
example, physician friend in treatment clinic)
Compensatory Equalization of Treatment
• Administrators give a compensating
treatment to controls.
• Often an attempt to appease upset controls
• Ex: hints to improve “quality” of procedures
Compensatory Rivalry
• Controls compete to keep up with
treatment group.
• May increase productivity as a result
Resentful Demoralization
• Controls "give up" or get discouraged
• Out of frustration, they try even less
Types of Designs
Random assignment of treatment?
– Yes: randomized or true experiment
– No: control (comparison) group or multiple measures?
» Yes: quasi-experiment
» No: nonexperiment
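
The two questions above form a small decision rule, sketched here as a function. The function and parameter names are made up for illustration.

```python
def classify_design(random_assignment: bool,
                    control_or_multiple_measures: bool) -> str:
    """Classify a study design using the two questions on this slide."""
    if random_assignment:
        return "randomized or true experiment"
    if control_or_multiple_measures:
        return "quasi-experiment"
    return "nonexperiment"

# Random assignment of patients to NP or GP.
print(classify_design(True, True))
# Nonrandom groups with pre- and post-measures.
print(classify_design(False, True))
# One group, one measurement, no comparison.
print(classify_design(False, False))
```

Note that the second question is only asked when assignment is not random.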
The Two-Group Randomized Experiment
The Basic Design
R         X         O
R                   O
Proper random assignment assures that we have
probabilistic equivalence between groups with
respect to all plausible alternative explanations
Differences between groups on posttest indicate a
treatment effect
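
“Probabilistic equivalence” can be illustrated with a simulation. The sketch below assumes 20,000 hypothetical providers, a coin-flip assignment (the R in the design), and an assumed true program effect of 0.5 visits per hour. None of these numbers come from a real study.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

TRUE_EFFECT = 0.5  # assumed program effect, for illustration only
N = 20_000

# Baseline productivity is never used for assignment; a coin flip decides.
baseline = [random.gauss(3.0, 1.0) for _ in range(N)]
treated = [random.random() < 0.5 for _ in range(N)]
posttest = [
    b + random.gauss(0.0, 0.5) + (TRUE_EFFECT if t else 0.0)
    for b, t in zip(baseline, treated)
]

def group_mean(values, flags, want):
    """Mean of the values whose assignment flag equals `want`."""
    return statistics.mean(v for v, f in zip(values, flags) if f == want)

baseline_gap = (group_mean(baseline, treated, True)
                - group_mean(baseline, treated, False))
estimated_effect = (group_mean(posttest, treated, True)
                    - group_mean(posttest, treated, False))

print(f"Baseline difference between groups: {baseline_gap:+.3f}")
print(f"Posttest difference between groups: {estimated_effect:+.3f}")
```

With random assignment the groups start out nearly identical on average, so the posttest difference lands close to the assumed effect. The equivalence is only probabilistic: with small samples, chance imbalances can be sizable.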
Randomized Experimental Design
Kinnersley, P. et al (2000). “Randomised
controlled trial of nurse practitioner versus
general practitioner care for patients
requesting “same day” consultations in
primary care.” British Medical Journal
320:1043–8.
Randomized Experimental Design
Kinnersley et al. (2000)



• Setting: Ten general practices in south Wales and England (near Bristol), 1999
• Research question: what are the differences between care from nurse practitioners and general practitioners for patients seeking “same day” consultations in primary care?
• Patients seeking same-day consultations were randomly assigned to a nurse practitioner (NP) or general practitioner (MD).
Kinnersley et al. (2000)

Types of randomization:
– Block day or within day (clinic’s choice)

Data: 1,368 patients (10 practices)

Outcome measures
– Patient satisfaction 2 weeks after initial visit
– Resolution of symptoms
– Length and quality of consultation
Potential threats to Internal validity
– What factors may influence whether a person visits a GP versus NP, and are those factors related to outcome measures?
» Patient preferences
» Health / co-morbidity of patient
» Age of patient
» Education of patient
» Familiarity with practice
Other issues

External validity
– 27 NPs invited to participate
» 12 participated
» 7 non-respondents and 8 declined

Practical Implementation issue
– Randomization was either done by day (4 general
practices) or within a day (6 general practices),
» By day: disrupts work for general practice; easier to
administer
» Within day: controls for temporal events related to a day;
harder to administer
Findings from Kinnersley et al. (2000)

• No difference in reported symptoms 2 weeks after visit (NP vs GP)
• NPs spent more time per consultation
• On average, increased patient satisfaction with NPs (vs GPs) without a significant change in practice or outcomes
• However, patients still preferred to see a GP next time!
• Results support increased use of NPs for “same day” care
The Nonequivalent Groups Design
The Basic Design
N    O    X    O
N    O         O
Key Feature: Nonequivalent assignment
What Does Nonequivalent Mean?




Assignment is nonrandom.
Researcher did not control assignment.
Groups may be different.
Group differences may affect outcomes.
Barber et al. (2007)

“The Contribution Of Human Resources For Health To The Quality of Care In Indonesia.” Health Affairs 26(3): w367-w379.
Non-equivalent two group design
Barber et al. (Health Affairs, 2007)

Setting: Indonesia, 1993-1997

Research questions:
(1) does eliminating the incentive to practice in rural areas reduce the number of rural MDs?
(2) how does the presence and quantity of physicians affect primary health care quality, especially in remote rural areas?

Design: pre- and post-measures
– The “X” treatment: beginning in 1992, government froze hiring of civil servants due to declining oil revenues
» MDs in rural areas decreased between 1994 and 1998
Two Groups: “Treatment” and control

                        ’93       ’97
Rural Java Bali    N    O    X    O    (gets the “X”: change in incentive)
Urban Java Bali    N    O         O    (does not)
Non-equivalent two group design
Barber et al. (2007)



Data: Indonesian Family Life Surveys, 1993 and
1997 (n=7,000 households). Public facility data
(n=992 in 1993; n=915 in 1997)
Analytical methods: regression analyses with
community-level fixed effects
Measures:
– concentration of MDs in facility
– Quality of care survey using several case scenarios
» Prenatal care, adult care
Internal validity
Barber et al. (2007)

Internal validity
– What factors other than the reduced incentive may
affect where MDs locate?
– Are locations related to ability to deliver high-quality care?
• Between 1993 and 1997, MoH developed contracting mechanisms to fill positions
• Lower technological and facility investment in rural areas may have led MDs to stay in urban areas & reduce quality of care
• Declining social and economic infrastructure of rural communities during the ’90s may reduce retention
External validity
Barber et al. (2007)

External validity: what population does the sample represent?
Perhaps the study pertains only to the situation in Indonesia, for this particular time period, in which elimination of the incentive occurred
Barber et al. Findings
Health care quality declined everywhere, but large MD
reductions in rural outer Java-Bali
Large reduction in MDs, controlling for rivals
More MDs related to Increase in Quality of
Prenatal Care
What about exploratory studies?

Goal: Hypothesis-generating rather than hypothesis-testing

Qualitative research can
– help to understand and categorize the complex environment in which the health worker functions
– assist in unravelling the motivation behind certain behaviours
Methods typically involve in-depth interviews or observation with key informants
Examples of qualitative inquiry


Why do too few health workers opt for a
career in the public sector, favouring
private employment instead?
Why do trained nurses choose to
remain unemployed?
* Source: Handbook on monitoring and evaluation of human resources for
health: with special applications for low- and middle-income countries. Edited
by Mario R Dal Poz, Neeru Gupta, Estelle Quain and Agnes LB Soucat. WHO
2009.
Summary




• Most study designs attempt to establish a cause-and-effect relation
• Strong designs minimize threats to internal validity
• Systematic and objective data collection methods are a core foundation for a research design of HRH evaluation
• Qualitative approaches, when rigorously performed, generate hypotheses regarding more complex workforce issues
And . . .

I will happily serve as a consultant to evaluate
Brazil’s coaching staff in return for World Cup
2014 tickets