
Research Design:
The Forgotten Stepchild of Quantitative Research Methods
Stephen G. West
Arizona State University
Preamble:
Motivating Quotes for An Overlooked Perspective
“You can’t fix by analysis what you bungled by design.”
Light, Singer & Willett (1990) “By Design”
“When it comes to causal inference from quasi-experiments,
design rules, not statistics”
Shadish & Cook (1999)
Two Surveys of Quantitative Research Methods
Aiken, West, Sechrest, & Reno (1990); Aiken, West, & Millsap (2008), American Psychologist
Population: PhD programs in APA’s “Graduate Training in Psychology”
All Programs N = 234 (achieved n = 201, 86% response rate)
Subsample: 25 Elite Programs according to NRC (achieved n = 23, 88%)
Surveyed Three Broad Areas
1. Statistics and Statistical Modeling
2. Measurement/Assessment
3. Research Design/Causal Inference
Selected Results
From 1990 to 2008 surveys, gains in exposure of PhD students to new
techniques in statistics (e.g., SEM, MLM); also minor gains in judged
competence in statistics
Some gains in measurement (still not strong)
NO improvement in research design
Judged competence of graduates: high in laboratory experiments
All other designs (field experiments, longitudinal, quasi)
judged competence weak to nonexistent
Instruction: Research Methods
57% of departments had some form of graduate departmental research
methods segment or course.
About 30% had no course (either department or program-based)
Elite and Non-elite programs did not differ
Example: Statistical power—only 36% of grad students could use it in their own
research (i.e., do a power analysis),
not necessarily use design features to strengthen power
Respondents: free listing of topics to be added to the curriculum or to be
covered by new faculty hires—research methods were almost never listed or
perceived to be needed
Some History:
Psychology’s Current Perspective on Research Design
Canon of Psychology’s Research Design
Developed largely around Donald Campbell
1950s, 1960s, 1970s at Northwestern
Group of scholars—Robert Boruch, Thomas Cook, Albert Erlebacher,
Lee Sechrest, Benton Underwood; graduate students (e.g., David
Kenny; Charles Reichardt) and postdocs (David Rindskopf, Will Shadish)
Key Feature: Represented multiple substantive areas within
psychology and education
Developed the perspective:
KEY SOURCES: Campbell (1957); Campbell & Stanley (1963); Cook & Campbell (1979);
Shadish, Cook, & Campbell (2002)
Psychology’s Perspective:
Campbell’s Perspective on Research Validity
Four Types of Validity
A. Statistical Conclusion Validity – Validity of inferences about
association
B. Internal Validity – Validity of inferences about whether association
reflects a causal relationship
C. Construct Validity—validity of inferences about higher order
constructs (independent, dependent variables)
D. External Validity – Generalization across variation in persons,
settings, treatments, and measurements
Cook & Campbell (1979); Shadish, Cook, & Campbell (2002)
Campbell’s Perspective
Campbell and colleagues developed an exhaustive (!?) list of threats, primarily
from research in psychology and education.
Published lists of threats for each type of validity and provided examples.
Provided lists of potential remedies to address each threat.
Strategy for the researcher:
1. Identify and minimize the threats that exist in the particular
research setting.
2. Add design elements to address remaining threats.
3. Compare the pattern of results predicted from the threat(s) to
validity and the pattern of obtained results (pattern matching)
Example List 1: Threats to Statistical Conclusion Validity
1. Low Statistical Power
2. Violated Assumptions
3. Fishing and the Error Rate Problem
4. Unreliability of Measures
5. Restriction of Range
6. Unreliability of Treatment Implementation
7. Extraneous Variance in the Experimental Setting
8. Heterogeneity of Units
9. Inaccurate Effect Size Estimation
Source: Shadish, Cook, and Campbell (2002), p. 45
Methods to Increase Statistical Power
1. Use matching, stratifying, blocking
2. Measure and correct for covariates
3. Use larger sample sizes
4. Use equal cell sample sizes
5. Improve measurement
6. Increase the strength of treatment
7. Increase the variability of treatment
8. Use a within-participants design
9. Use homogeneous participants
10. Reduce random setting irrelevancies
11. Use powerful statistical tests; ensure their assumptions are met.
Source: Shadish, Cook, and Campbell (2002), p. 47
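The checklist above can be made concrete with a quick sample-size calculation. The sketch below is not from the source; it uses the standard normal approximation for a two-group comparison of means with standardized effect size d, so results run slightly below an exact t-test calculation.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sample comparison of means
    with standardized effect size d (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)           # leaves 1 - power in the tail
    return ceil(2 * ((z_alpha + z_beta) / d) ** 2)

# A "moderate" effect (d = 0.5) at the conventional .80 benchmark:
print(n_per_group(0.5))               # about 63 per group (exact t: 64)
# Settling for power of .50 roughly halves the required sample:
print(n_per_group(0.5, power=0.50))
```

Design features from the list (blocking, covariates, better measurement) effectively raise d, which shrinks the required n quadratically.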
Comment
Most reviews of journal articles in Psychology show:
for a moderate effect size, inadequate statistical power of .50 to .60
vs. the desired .80 benchmark
Rossi (2013)
Health Psychology area one exception
Maddock & Rossi (2001)
In my opinion, serious consideration of the threats to statistical conclusion
validity and methods of increasing statistical power would have avoided
much of the replication crisis in psychology, particularly if considered in the
context of publication bias from meta-analysis.
Example List 2: Threats to Internal Validity
1. Ambiguous temporal precedence
2. Selection
3. History
4. Maturation
5. Regression
6. Attrition
7. Testing
8. Instrumentation
9. Interactive Effects (e.g., Selection x History; Selection x Maturation)
Source: Shadish, Cook, & Campbell (2002), p. 55.
Parallel lists exist for construct, external validity
Adding Design Elements and Pattern Matching:
An Illustration: Stores Matched on Prior Sales, Zip Code
Reynolds and West (1987)
Pattern Matching
Adding Design Element:
Nonequivalent DVs
Selection x History Threat
Pattern Matching
Adding Design Element
Short Time Series
Selection x Maturation
New Perspective 1: Rubin’s Potential Outcomes Approach
Developed primarily by statisticians and epidemiologists
Original development in 1970s (Rubin, 1974; 1978)
Used in field research in public health, medicine, education
Recent texts by Hernan & Robins (in press), Hong (2015),
Imbens & Rubin (2015), Morgan & Winship (2015)
Some differences among variants, but much agreement
Largely complements traditional Campbell perspective
Some New Emphases
Potential and Observed Outcomes
Promotes focus on the selection/missing-data mechanism
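The potential-outcomes idea can be illustrated with a toy simulation (all numbers hypothetical): each unit has two potential outcomes, Y(1) and Y(0), only one of which is observed. When assignment to treatment depends on the potential outcomes (selection), the naive observed-group difference diverges from the true average causal effect.

```python
import random

random.seed(1)

# Hypothetical units: (Y1, Y0) potential outcomes; unit effect = Y1 - Y0 = 2.
units = [(y0 + 2.0, y0) for y0 in (random.gauss(0, 1) for _ in range(10000))]
true_ate = sum(y1 - y0 for y1, y0 in units) / len(units)  # 2.0 by construction

# Selection mechanism: units with high Y0 take the treatment.
data = [(1, y1) if y0 > 0 else (0, y0) for y1, y0 in units]
treated = [y for t, y in data if t == 1]
control = [y for t, y in data if t == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print(round(true_ate, 2))  # the estimand: 2.0
print(round(naive, 2))     # inflated well above 2.0 because of selection
```

Randomization makes assignment independent of (Y(1), Y(0)), which is exactly what removes the gap between the naive contrast and the estimand.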
Some Gains Over Traditional Approaches
Methods of Treating “Broken” (imperfect) Randomized Experiments
Attrition from Measurement
Treatment Noncompliance (binary and dose-related)
Variation in Treatment Conditions
Contamination of Treatments (Non-independence)
Methods of Treating Observational Studies, e.g., propensity scores
Methods of Addressing Mediation
Methods of estimating effects of Sequential Treatments
Sensitivity Analysis and Bounds on Effects to assess potential effects of
violations of assumptions
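One listed gain—bounds on effects—can be sketched with worst-case (Manski-style) bounds on a group mean under attrition. The function and numbers below are illustrative, assuming outcomes known to lie in [0, 1]: the unobserved cases are imputed at the worst and best possible values.

```python
def worst_case_mean_bounds(observed, n_missing, lo=0.0, hi=1.0):
    """Bounds on a group mean when n_missing outcomes are unobserved:
    impute the worst (lo) and best (hi) possible values for missing cases."""
    n = len(observed) + n_missing
    s = sum(observed)
    return (s + lo * n_missing) / n, (s + hi * n_missing) / n

# Hypothetical: 80 observed binary outcomes, 20 dropouts.
observed = [1] * 48 + [0] * 32          # observed success rate = .60
lower, upper = worst_case_mean_bounds(observed, n_missing=20)
print(lower, upper)                     # 0.48 0.68: bounds widen with attrition
```

The width of the interval (here .20, equal to the attrition rate) makes the cost of missing data explicit without any modeling assumptions about the dropouts.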
Slow Diffusion of Potential Outcomes to Psychology and Education
Illustration: Propensity Scores (Rosenbaum & Rubin, 1983; 1984)
Source: Thoemmes and Kim (2011)
New Perspective 2: Pearl
Origins in Bayesian networks and control systems engineering
Pearl asks researchers to write down explicit causal networks
Represents networks using directed acyclic graphs—brings graph theory
and powerful associated mathematics to bear
Well developed for binary and linear models (can be extended to nonlinear)
Some foci:
identifying key nodes in the graph that can block causal paths
identifying back-door paths that can confound causal effects
emphasis on local, not global fit
do(X) operator—node set to a value; prior links into the node blocked
Key sources: Pearl (2000, 2009)
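A minimal sketch of the back-door idea on the textbook confounding graph Z → X, Z → Y, X → Y: the path X ← Z → Y is a back-door path from X to Y, open until one conditions on Z. The helper functions are illustrative (not Pearl's notation), and the blocking check assumes the conditioning set contains no descendants of colliders.

```python
def undirected_paths(edges, start, end):
    """All simple paths from start to end, ignoring edge direction."""
    nbrs = {}
    for a, b in edges:
        nbrs.setdefault(a, set()).add(b)
        nbrs.setdefault(b, set()).add(a)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in nbrs.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths

def is_blocked(path, edges, given):
    """Path-blocking check: chains/forks are blocked by conditioning on the
    middle node; colliders block unless conditioned on (descendants ignored)."""
    E = set(edges)
    for a, b, c in zip(path, path[1:], path[2:]):
        collider = (a, b) in E and (c, b) in E   # a -> b <- c
        if collider and b not in given:
            return True
        if not collider and b in given:
            return True
    return False

edges = [("Z", "X"), ("Z", "Y"), ("X", "Y")]
# Back-door paths from X to Y: paths whose first edge points INTO X.
backdoor = [p for p in undirected_paths(edges, "X", "Y")
            if (p[1], p[0]) in set(edges)]
print(backdoor)                                   # [['X', 'Z', 'Y']]
print(is_blocked(backdoor[0], edges, set()))      # False: confounding open
print(is_blocked(backdoor[0], edges, {"Z"}))      # True: adjust for Z
```

This is the graph-theoretic content behind "identifying back-door paths that can confound causal effects": adjustment sets are exactly those that block every such path.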
Some gains
Elucidates consequences of causal premises and the data
Elucidates confounding
Broadens understanding of missing data (e.g. MAR)
Broadens understanding of mediation
Provides understanding of conditions for generalizability
(termed transportability)
Challenges in Reinvigorating Instruction in Research Methods 1
1. New methods best taught in the substantive context in which they
will be applied (Lovett & Greenhouse, 2000)
Relatively few application examples in many substantive areas
No comprehensive texts written for psychology and education
Only a couple of introductory chapters:
Shadish & Sullivan (2012); West, Cham, & Liu (2014)
2. New methods require relatively high level of mathematical and
statistical knowledge. Require more precision in statement of
treatments, relationships, nature of question. Require understanding
new conceptions, new ways of thinking.
Challenges in Reinvigorating Instruction in Research Methods 2
3. New Methods require careful specification and examination of
assumptions. Also, probing those assumptions: bounds,
sensitivity analyses. Helps disrupt confirmatory biases.
4. (Some) new methods only work well in large samples.
5. New emphases on computer-intensive descriptive/predictive
methods for complex data sets (big data) which “find”
relationships then implicitly treated as causal.
6. Several areas of psychology emphasize randomized laboratory
experiments in contexts in which classic and new approaches to
research methods may have less relevance.
A Challenge: Achieving Relevance
Several areas (e.g., behavioral neuroscience, cognitive, social) primarily
emphasize randomized laboratory experiments.
What do we have to offer? We need to demonstrate relevance.
Develop appropriate level articles/chapters with area specific examples
to illustrate potential gains. Needs input from both substantive area
specialists and methodologists. We have not done much of this. Can be
challenging.
Example: Methods Chapter for Behavioral Neuroscience
Observations
1. Effect size = 1.0 not unreasonable
(brain area destruction; ovariectomy)
2. N = 20 per group is large given $500
per rat
3. Design methods of increasing power
often cannot be implemented
Source: Talboom, West, & Bimonte-Nelson (2015)
A Final Challenge
Foreshadowing Leona Aiken’s talk, we need to interest the next
generation in research methods and identify ways to transmit this
knowledge to graduate students in psychology and education.
According to the Mathematics Genealogy Project, Donald Rubin has 51
PhD students and 186 descendants (many, but not all, work in research
design). The majority are in academia. None appear to have faculty
appointments in psychology or education; a few consult in these areas
(e.g., Jennifer Hill; Liz Stuart).
The number of methods faculty trained during the heyday of the
Northwestern group was small, and they are nearing retirement. They
trained few students of their own in research methods. Few
descendants; a lack of fecundity.
Some Opportunities 1:
Workshop in Quasi-Experimental Designs
Sponsor: The Institute for Education Sciences; No Fees
State-of-the-art quasi-experimental methods for evaluating education
interventions.
Dates: July 31 - August 11, 2017 (Two Weeks)
Location: Northwestern University
Scholarly homes of presenters—Policy, Education, with the exception of one presenter
Thomas D. Cook, Northwestern University, Institute for Policy Research
Peter Steiner, University of Wisconsin, Madison, Educational Psychology
Stephen G. West, Arizona State University, Psychology
Coady Wing, Indiana University, Bloomington, Public and Environmental Affairs
Vivian Wong, University of Virginia, Education
Attendees in previous years —mainly outside Psychology
Most from Education and Public Policy
Some Opportunities 2:
Atlantic Causal Inference Conference
Several Workshops on Design and Approaches to Causal Inference
No additional fee
e.g., 2016 Workshop Topics
Principal Stratification
Causal Inference in Multilevel Observational Studies
Causal Inference with Error-Prone Covariates
Design-Based Inference for Experiments and Observational Studies
Sensitivity Analysis
Most Instructors, Participants from Statistics, Biostatistics, Epidemiology
Our Hope: Young Faculty with Interests in Research Design and
Behavioral Science
Heining Cham, Fordham U., Psychology
Peter Steiner, U. Wisconsin, Education
Felix Thoemmes, Cornell U., Human Dev.
Coady Wing, Indiana U., Public Affairs
Vivian Wong, U. Virginia, Education
Only one has a primary faculty appointment in psychology
Thank You
Arizona State University
Psychology Building
Freie Universität Berlin
Silberlaube
We only think when we are confronted with problems.
--attributed to John Dewey
Researchers prefer to have standard methods (e.g., randomized
experiment) and standard paradigms. To carefully think through a new
methodological approach (i.e., to evaluate the tradeoffs) or to think
through what evidence is required to disconfirm one’s hypothesis
requires much effort.
Alternative Didactic Presentation
Introduce Structure into Lists
Threats to Internal Validity
A. Threats arising from participant’s growth and experience
1. History
2. Maturation
B. Threats arising from measurement process
3. Testing
4. Instrumentation
5. Statistical Regression
Why the loss of interest in Campbell’s Approach?
Some Conjectures
1. No longer a central research group expanding knowledge
2. Perception—few new developments (nothing new to learn)—but
see Cook, Shadish (e.g., four-arm, single-subject designs)
3. Few academic offspring (Campbell vs. Rubin)
4. Usual Didactic Presentation—Lists—are boring—need alternatives
5. Lack of relevance (real and perceived) to some areas of
psychology
[Figure: comparison of research designs labeled (RE), (RD), and (OS). Source: Reichardt, Psychological Methods (2006)]