Michael Lechner, PEF, University of St. Gallen
21.06.2016
Econometrics of Policy Evaluations
Michael Lechner
Reading before class
It will be helpful to read this paper before class.
Imbens, G.W., and J.M. Wooldridge (2009): "Recent Developments in the Econometrics of Program
Evaluation", Journal of Economic Literature, 47 (1), 5-86, relevant sections.
A very up-to-date and comprehensive (and easy to read) overview of the field.
List of references
General
Good surveys
Imbens, G.W., and J.M. Wooldridge (2009): "Recent Developments in the Econometrics of Program
Evaluation", Journal of Economic Literature, 47 (1), 5-86.
A very up-to-date and comprehensive (and easy to read) overview of the field.
Heckman, J.J. (2000): "Causal Parameters and Policy Analysis in Economics: A Twentieth Century
Retrospective", Quarterly Journal of Economics, 115, 45-97.
The grand picture by the Nobel laureate. A very complete account of the foundation of econometric causal analysis –
although some of his thoughts may give rise to serious controversies.
Heckman, J.J., R. LaLonde and J. Smith (1999) [HLS]: "The Economics and Econometrics of Active Labor
Market Programs", in: O. Ashenfelter and D. Card (eds.), Handbook of Labour Economics, Vol. 3, 1865-2097, Amsterdam: North-Holland. Sections 1-3.
This is still the classical survey, but, of course, some new developments are missing.
Blundell, R., and M. Costa Dias (2009): "Alternative Approaches to Evaluation in Empirical Microeconomics",
Journal of Human Resources, 44, 565-640.
Easy to read.
Heckman, J. J., and E. J. Vytlacil (2007): "Econometric Evaluation of Social Programs, Part I: Causal Models,
Structural Models and Econometric Policy Evaluation", Handbook of Econometrics, Volume 6B, Chapter 70,
Elsevier B.V., DOI: 10.1016/S1573-4412(07)06070-9.
Comprehensive summary of the contributions to and the views on the field of causal analysis by a Nobel Laureate.
DiNardo, J., and D. S. Lee (2011): "Program Evaluation and Research Designs", Handbook of Labor Economics,
Volume 4a, ISSN 0169-7218, doi: 10.1016/S0169-7218(11)00411-4, 463-536.
This survey draws conclusions about the ordering of research designs that may be controversial.
A recent debate on how to do econometrics
Leamer, Edward (1983): “Let’s Take the Con Out of Econometrics,” American Economic Review, 73(1), 31–43.
Leamer argues that econometrics is dying because its results are not credible.
Angrist, Joshua D., and Jörn-Steffen Pischke (2010): “The Credibility Revolution in Empirical Economics: How
Better Research Design is Taking the Con out of Econometrics”, Journal of Economic Perspectives, 24 (2),
3–30, doi: 10.1257/jep.24.2.3.
This paper argues that nowadays econometric results are much more credible than one or two decades ago (and maybe
the near-death experience three decades ago was helpful in changing the way empirical econometrics is done today).
Leamer, Edward E. (2010): “Tantalus on the Road to Asymptopia”, Journal of Economic Perspectives, 24 (2),
31–46, doi: 10.1257/jep.24.2.31.
Ed Leamer’s response.
Keane, Michael P. (2010): “A Structural Perspective on the Experimentalist School,” Journal of Economic
Perspectives, 24 (2), 47–58, doi: 10.1257/jep.24.2.47.
Nevo, Aviv, and Michael D. Whinston (2010): “Taking the Dogma out of Econometrics: Structural Modeling
and Credible Inference,” Journal of Economic Perspectives, 24 (2), 69–82, doi: 10.1257/jep.24.2.69.
Stock, James H. (2010): “The Other Transformation in Econometric Practice: Robust Tools for Inference,”
Journal of Economic Perspectives, 24 (2), 83–94, doi: 10.1257/jep.24.2.83.
Angus Deaton (2010): “Instruments, Randomization, and Learning about Development,” Journal of Economic
Literature, 48, 424–455, http://www.aeaweb.org/articles.php?doi=10.1257/jel.48.2.424.
An attack on the new econometrics. Very successful at not winning friends among the 'new' econometricians.
Heckman, James J. (2010): “Building Bridges Between Structural and Program Evaluation Approaches to
Evaluating Policy", Journal of Economic Literature, 48, 356–398, http://www.aeaweb.org/articles.php?doi=10.1257/jel.48.2.356.
An even more powerful rocket directed at the camp of the ‘new’ econometricians.
Imbens, Guido W. (2010): “Better LATE Than Nothing: Some Comments on Deaton (2009) and Heckman and
Urzua (2009)", Journal of Economic Literature, 48, 399–423, http://www.aeaweb.org/articles.php?doi=10.1257/jel.48.2.399.
A LATE, but still powerful defence.
Causality and identification
General
Pearl, J. (2000), Causality - Models, Reasoning, and Inference, Cambridge: Cambridge University Press.
This book provides a fairly general overview of methods used in and outside economics (although at least the
part about DAGs met severe resistance in econometrics and epidemiology). Note that the terminology and presentation
of the problems differ somewhat from the standard used in other papers. (library; epilogue as hardcopy)
Manski, C.F. (1995), Identification Problems in the Social Sciences, Cambridge (MA): Harvard University
Press.
A very nice and readable book directed at a nontechnical audience.
Matzkin, Rosa L. (2007): "Nonparametric Identification", Handbook of Econometrics, Volume 6B, Chapter 73,
Elsevier B.V. DOI: 10.1016/S1573-4412(07)06073-4
This paper gives a fairly comprehensive picture of the identification issue, but at a very high technical level. It is a difficult
but rewarding read.
Economics
Hicks, J. (1979), Causality in Economics, Basil Blackwell: Oxford. (library)
A nice little book with some key ideas on c.p. causality. Focuses on the economic theory popular until the 1970s. He does
not seem to like econometricians, though.
Roy, A.D. (1951): "Some Thoughts on the Distribution of Earnings", Oxford Economic Papers, 3, 135-146.
Some economists think that this paper already outlines the potential outcome model of causality that statisticians usually
attribute to Neyman and Rubin.
Statistics
Overview
Holland, P.W. (1986): "Statistics and Causal Inference", Journal of the American Statistical Association, 81,
945-970, with discussion.
Explains fundamental concepts used in statistics and relates them to some selected concepts of philosophy. Very
interesting discussion (in particular Granger and Rubin).
Rubin, D. B. (2005): "Causal Inference Using Potential Outcomes: Design, Modelling, Decisions", Journal of
the American Statistical Association, 100, 322-331.
A nice historical overview with high practical relevance.
Dawid, A. P. (2000): "Causal Inference Without Counterfactuals", Journal of the American Statistical
Association, 95, 407-448, with discussion.
Do we really need counterfactuals to define causal effects? Or do they even harm? A heated debate with a couple of fairly
clear statements.
Greiner, D. James, and Donald B. Rubin (2011): "Causal Effects of Perceived Immutable Characteristics," The
Review of Economics and Statistics, 93, 775–785.
Is causal analysis possible if the object of interest cannot be changed (e.g. sex or age in the discrimination literature)?
Causal inference may be possible if we are not interested in the effect of something that cannot be changed, but in the
effect of the perception of that trait by some other unit ('decider') who influences the outcomes.
Historical references to the potential outcome approach
Neyman, J. (1923): "On the Application of Probability Theory to Agricultural Experiments. Essay on Principles.
Section 9", translated in Statistical Science (with discussion), 1990, 5, 465-480.
Classical piece introducing potential outcomes and randomized experiments to statistics (application to agriculture),
although most of the paper is concerned with getting the right variance in such a case.
Neyman, J. (1935): "Statistical Problems in Agricultural Experimentation", Supplement to the Journal of the
Royal Statistical Society, 2, 107-180.
Gives a summary of the development of agricultural experiments up to that point – a nice classical piece.
Rubin, D.B. (1974): "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies",
Journal of Educational Psychology, 66, 688-701.
An influential paper because it can be seen as a forceful ad for using potential outcomes explicitly when discussing and
estimating causal effects.
Hurwicz, L. (1950): “Generalization of the Concept of Identification,” in Statistical Inference in Dynamic
Economic Models, Cowles Commission Monograph, 10, New York: Wiley.
Simpson's Paradox
Simpson, E.H. (1951): "The Interpretation of Interaction in Contingency Tables", Journal of the Royal Statistical
Society, Series B, 13 (2), 238-241.
All correlations can be reversed by conditioning on additional variables! Or: How selection into a treatment can reverse the
causal implications.
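The sign reversal described above is easy to reproduce in a toy example (my own invented numbers, not from Simpson's paper): within each of two groups X and Y move together perfectly, yet pooling the groups flips the correlation.

```python
# Numeric illustration of Simpson's paradox with made-up data: within
# each group X and Y are perfectly positively correlated, but pooling
# the groups reverses the sign of the correlation.

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# Group A ('untreated'): low X, high Y; group B ('treated'): high X, low Y.
group_a = ([0, 1, 2], [10, 11, 12])
group_b = ([10, 11, 12], [0, 1, 2])
pooled = (group_a[0] + group_b[0], group_a[1] + group_b[1])

print(corr(*group_a))  # +1.0 within group A
print(corr(*group_b))  # +1.0 within group B
print(corr(*pooled))   # negative once the groups are pooled
```

Here group membership plays the role of the selection variable: conditioning on it (within-group correlations) and ignoring it (pooled correlation) give opposite answers.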
Chen, A., T. Bengtsson, and T. Kam Ho (2009): "A Regression Paradox for Linear Models: Sufficient Conditions
and Relation to Simpson's Paradox", The American Statistician, 63 (3), 218-225, DOI:
10.1198/tast.2009.08220.
The regression formulation of Simpson's paradox.
Cross, P.J., and C.F. Manski (2002): "Regressions, short and long", Econometrica, 70, 357-368.
This paper studies how the information in the 'short regressions' P(Y|X) and P(Z|X) can be used to identify the 'long
regression' E(Y|X,Z), a problem closely related to Simpson (1951).
Pavlides, M.G., and M.D. Perlman (2009): "How Likely Is Simpson's Paradox?", The American Statistician,
63 (3), 226-233, DOI: 10.1198/tast.2009.09007.
A Bayesian-type analysis using binary random variables.
Alin, Aylin (2010): "Simpson's paradox", Wiley Interdisciplinary Reviews: Computational Statistics, 2, 247-250,
doi: 10.1002/wics.72.
Directed acyclic graphs (DAGs)
Freedman, D.A. (2004): "On Specifying Graphical Models for Causation, and the Identification Problem",
Evaluation Review, 26, 267-293.
An introduction to causal graphs and their relation to other causal approaches. The book by Pearl (2000) contains a lot on
DAGs as well.
Econometrics
The very early days: Demand and supply for agricultural goods and the statistical demand curve
Working, E.J. (1927): "What Do Statistical "Demand Curves" Show?", Quarterly Journal of Economics, 41 (2),
212-235.
Perhaps surprisingly, the terms endogeneity and exogeneity, which later became intrinsically related to issues of identification,
do not appear in this seminal work on the identification issue.
The early days: Linear simultaneous equations and the Cowles Commission
Christ, C.F. (1994): "The Cowles Commission's Contribution to Econometrics at Chicago, 1939-1955", Journal
of Economic Literature, 32 (1), 30-59.
Christ (1994) provides an excellent account of the Cowles Commission's contributions to econometrics while it was hosted at
the University of Chicago (1939-1955) (members included, for example, Haavelmo, Koopmans, Marschak and Wald).
Simon, H.A. (1953): "Causal Ordering and Identifiability", in Hood, W. C., and T. C. Koopmans, Studies in
Econometric Method, Chapter 3, 49-74, New York: Wiley.
Haavelmo, T. (1943): "The Statistical Implications of a System of Simultaneous Equations", Econometrica,
11 (1), 1-12.
Making the c.p. condition precise in simultaneous linear equations.
The later days: Causality based on observables (time series econometrics)
Granger, C.W.J. (1969): "Investigating Causal Relations by Econometric Models and Cross-Spectral Methods",
Econometrica, 37, 424-438.
Causality or predictability?
Sims, C.A. (1972): "Money, Income, and Causality", American Economic Review, 62, 540-552.
A different formulation of the same idea. One of the starting points for the VAR approach (for more references on the first
VAR papers, see Heckman, 2000).
Chamberlain, G. (1982): "The General Equivalence of Granger and Sims Causality", Econometrica, 50, 569-581.
The proof that both concepts used in time series econometrics are indeed the same.
Engle, R., D. F. Hendry, and J.-F. Richard (1983): "Exogeneity", Econometrica, 51, 277-304.
A rigorous definition of exogeneity and endogeneity. Both concepts are inherently tied to the desire to draw causal
conclusions from observable data.
Adams, P., M.D. Hurd, D. McFadden, A. Merrill, and T. Ribeiro (2003): "Healthy, wealthy, and wise? Tests for
direct causal paths between health and socioeconomic status", Journal of Econometrics, 112, 3-46. (with most
interesting discussion about concepts of causality)
They apply Granger-Sims type non-causality tests plus a first-order Markov assumption to panel data. They are a bit
informal about what a causal effect is – therefore they come under heavy attack from the Robins-Rubin-Heckman camp in the
discussion of the paper.
Cooley, T. F., and S. F. LeRoy (1985): "Atheoretical Macroeconometrics: A Critique", Journal of Monetary
Economics, 16, 283-308.
This is an attack on using time series methods to detect causality.
Our days: Credible exogeneity conditions
Meyer, Bruce D. (1995): "Natural and Quasi-Experiments in Economics", Journal of Business & Economic
Statistics, 13, 151-161. http://www.jstor.org/stable/1392369.
Angrist, Joshua D., and Jörn-Steffen Pischke (2010): “The Credibility Revolution in Empirical Economics: How
Better Research Design is Taking the Con out of Econometrics”, Journal of Economic Perspectives, 24 (2),
3-30, doi: 10.1257/jep.24.2.3.
Comparing causality in econometrics with causality in statistics (and time series
econometrics)
Holland, P.W. (1986): "Statistics and Causal Inference", Journal of the American Statistical Association, 81,
945-970, with discussion.
Holland and Granger try to explain the differences between the concepts of statistics and time series econometrics - with
limited success.
White, H. (2006): "Time-series estimation of the effects of natural experiments", Journal of Econometrics 135,
527–566.
He extends the analysis by Holland (1986) and addresses the topic of estimating the effects of single interventions with time
series data. He calls these interventions natural experiments and uses a technically highly sophisticated framework …
Lechner, M. (2011): "The Relation of Different Concepts of Causality in Econometrics", Econometric Reviews,
30, 109-127.
This paper extends Holland (1986) and White (2006) by using a general framework allowing for dynamic interventions and
dynamic data. It shows that Granger causality and potential-outcome causality coincide if a dynamic version of selection
on observables is added to the Granger definition of causality.
Angrist, Joshua D., and Guido M. Kuersteiner (2011): "Causal Effects of Monetary Shocks: Semiparametric
Conditional Independence Tests With A Multinomial Propensity Score," The Review of Economics and
Statistics, 93, 725–747.
Application to the analysis of monetary policy (and a lot of methodological developments in a time series context).
Causality and correlation: How things can go wrong …
Cohen-Cole, E., and J.M. Fletcher (2009): "Detecting implausible social network effects in acne, height, and
headaches: longitudinal analysis", BMJ, 2009, DOI: 10.1136/bmj.a2533.
They show that a previous study that found happiness to be contagious would also show acne, height, and headaches to be
contagious. Since this cannot be true, the previous study may have discovered a mere correlation rather than a causal effect.
See also the funny discussion in the Freakonomics blog (in the New York Times, Dec. 8, 2008) by Justin Wolfers. This blog
contains many discussions of whether specific papers may or may not have found causal relationships.
The value of the data may be smaller than expected: Bounds
Balke, A., and J. Pearl (1997): "Bounds on Treatment Effects from Studies with Imperfect Compliance", Journal
of the American Statistical Association, 92, 1171-1176.
Black, D., M. Lechner, and J. Smith (2008): "The Use and Abuse of Matching in Program Evaluation", mimeo,
Section 2 (bounds).
Summarizes the literature and contains many references, including to other fields in which the bounding approach is applied.
Manski, C. F. (1989): "The Anatomy of the Selection Problem", Journal of Human Resources, 24, 343-360.
One of the three early and probably independent papers discovering the bounding approach.
Manski, C. F. (1990): "Nonparametric Bounds on Treatment Effects", American Economic Review, Papers and
Proceedings, 80, 319-323.
Robins, J. M. (1989): "The Analysis of Randomized and Nonrandomized AIDS Treatment Trials Using a New
Approach to Causal Inference in Longitudinal Studies", Sechrest, L., H. Freeman, A. Mulley (eds.), Health
Service Research Methodology: A Focus on Aids, 113-159, Washington, D.C.: Public Health Service,
National Center for Health Services Research.
One of the three early and probably independent papers discovering the bounding approach.
Smith, J. P. and F. Welch (1986): "Closing the Gap: Forty Years of Economic Progress for Blacks," Prepared for
the U.S. Department of Labor, RAND Corporation, Santa Monica, CA, #R-3330-DOL.
One of the three early and probably independent papers discovering the bounding approach.
Hurwicz, L. (1950): “Generalization of the Concept of Identification,” in Statistical Inference in Dynamic
Economic Models, Cowles Commission Monograph, Vol. 10. New York: Wiley.
Contains the standard definition of identification.
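The basic arithmetic of the worst-case bounding approach surveyed above can be sketched in a few lines (a hypothetical illustration with invented numbers, not taken from any of the cited papers): with a bounded outcome, the unobserved counterfactual means are replaced by the endpoints of the outcome's support.

```python
# A minimal sketch of Manski-style worst-case bounds for a binary
# treatment with an outcome known to lie in [y_lo, y_hi]. All numbers
# below are made-up illustration values.

def manski_ate_bounds(p, ey1_obs, ey0_obs, y_lo=0.0, y_hi=1.0):
    """Worst-case bounds on ATE = E[Y(1)] - E[Y(0)].

    p       : share of treated units, P(D=1)
    ey1_obs : E[Y | D=1], observed mean outcome of the treated
    ey0_obs : E[Y | D=0], observed mean outcome of the controls
    The unobserved counterfactual means are set to y_lo / y_hi.
    """
    ey1_lo = p * ey1_obs + (1 - p) * y_lo
    ey1_hi = p * ey1_obs + (1 - p) * y_hi
    ey0_lo = (1 - p) * ey0_obs + p * y_lo
    ey0_hi = (1 - p) * ey0_obs + p * y_hi
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

lo, hi = manski_ate_bounds(p=0.4, ey1_obs=0.7, ey0_obs=0.5)
print(lo, hi)  # -0.42 0.58
```

Note that the width of the bounds is always y_hi - y_lo (here 1), regardless of the data: without further assumptions the data alone can never sign the effect.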
Social experiments
Survey
List, J.A., and I. Rasul (2011): "Field Experiments in Labour Economics", in: Handbook of Labor Economics,
Volume 4a, Ch. 2, 103, DOI: 10.1016/S0169-7218(11)00408-4.
An excellent and very comprehensive survey on all issues concerning social experiments with examples taken from labour
economics.
Newspaper article
Brynjolfsson, E., and M. Schrage (2009): "The New, Faster Face of Innovation: Thanks to technology, change
has never been so easy—or so cheap", Wall Street Journal, August 17, 2009,
http://online.wsj.com/article/SB1000142405297020483030457
Key sentence of that article: "Gary Loveman, the chief executive who brought the experimentation mindset to 70-year-old
Harrah's, quips, 'There are two ways to get fired from Harrah's: stealing from the company, or failing to include a proper
control group in your business experiment.'"
General
Levitt, S. D., and J. A. List (2009): "Field experiments in economics: The past, the present, and the future",
European Economic Review, 53, 1-18.
A good overview of what was and what is going on in the field of (economic) experiments.
Bloom, H. S., L.L. Orr, S.H. Bell, G. Cave, F. Doolittle, W. Lin, and J. M. Bros (1997): "The Benefits and Costs
of JTPA Title II-A Programs", Journal of Human Resources, 32, 549-576.
A very comprehensive description of the large JTPA experiments – institutions, implementation, methods, results.
Card, D. and D. Sullivan (1988): "Measuring the Effect of Subsidized Training Programs on Movements in and
out of Employment", Econometrica, 56, 497-530.
Training seems to have a positive employment effect based on a large scale US social experiment.
Card, D., and P.K. Robins (1999): "Do financial incentives encourage welfare recipients to work? Evidence from
a randomized evaluation of the Self-Sufficiency Project", Research in Labor Economics, 17, 1-56. (NBER
paper as pdf)
An experimental evaluation of an earnings subsidy to welfare recipients in Canada. It increases labor market attachment
and reduces welfare participation. A nice paper explaining the experiment and showing how a good experimental evaluation
can be performed for a welfare experiment.
Heckman, J.J., N. Hohmann, J. Smith, and M. Khoo (2000): “Substitution and Dropout Bias in Social
Experiments: Evidence from an Influential Social Experiment." Quarterly Journal of Economics, 651-694.
Things go wrong even in experimental studies: here they consider problems arising from control groups that receive an
almost similar treatment elsewhere (substitution bias) and from participants who drop out of the programme before it ends
(dropout bias).
Heckman, J., J. Smith, and C. Taber (1998): “Accounting for Dropouts in Evaluation of Social Programs", The
Review of Economics and Statistics, 80-1, 1-14.
They discuss the more complicated case when participants receive only a partial treatment.
Greenberg, D., and M. Shroder (2004), The Digest of Social Experiments, 3rd edition, Washington, DC: The
Urban Institute Press.
A brief description of many, many social experiments.
Levitt, Steven D., and John A. List (2011): “Was There Really a Hawthorne Effect at the Hawthorne Plant? An
Analysis of the Original Illumination Experiments", American Economic Journal: Applied Economics, 3,
224–238, http://www.aeaweb.org/articles.php?doi=10.1257/app.3.1.224
Maybe the Hawthorne effect wasn’t a real effect after all.
Stuart, Elizabeth A., Stephen R. Cole, Catherine P. Bradshaw, and Philip J. Leaf (2010): "The use of propensity
scores to assess the generalizability of results from randomized trials", Journal of the Royal Statistical
Society, Series A, 174, Part 3, forthcoming.
How to obtain external validity of the experimental results when selection into the experimental groups can be considered a
case of 'selection on observables'.
The pro and con's of social experiments
Burtless, G. (1995): "The Case for Randomized Field Trials in Economic and Policy Research", Journal of
Economic Perspectives, 9, 63-84.
Pro.
Heckman, J.J. and J.A. Smith (1995): "Assessing the Case for Social Experiments", Journal of Economic
Perspectives, 9, 85-110.
Con.
Experimental design
Duflo, E., Glennerster, R. and Kremer, M. (2008): “Using Randomization in Development Economics Research:
A Toolkit.” In Handbook of Development Economics, Vol. 4, ed. T. P. Schultz, J. A. Strauss, 3895-3962,
Amsterdam: Elsevier.
A nice survey of issues arising in designing experiments. It also discusses some econometric estimation
strategies.
Hinkelmann, K., and O. Kempthorne (2008), Design and Analysis of Experiments, Volume 1: Introduction to
Experimental Design, Wiley.
Hinkelmann, K., and O. Kempthorne (2008), Design and Analysis of Experiments, Volume 2: Advanced
Experimental Design, Wiley.
Power calculation
Maxwell, S.E., K. Kelly, and J.R. Rausch (2008): "Sample Size Planning for Statistical Power and Accuracy in
Parameter Estimation", Annual Review of Psychology, 59, 537-563, doi:
10.1146/annurev.psych.59.103006.093735.
Survey about how to avoid 'underpowered' experiments and how to compute the 'necessary' sample sizes.
Bloom, H. S. (2005): "Randomizing groups to evaluate place-based programs", in: Learning more from social
experiments, New York: Russell Sage.
Bloom, H. S. (1995): "Minimum detectable effects: A simple way to report the statistical power of experimental
designs", Evaluation Review, 19, 547-556.
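The textbook calculation behind these power references can be sketched as follows (the standard two-arm formula for a z-test of a mean difference; the parameter values are illustrative assumptions, not taken from the cited papers).

```python
# Sample-size sketch for a two-arm experiment: smallest n per arm so
# that a two-sided z-test of a mean difference delta (outcome standard
# deviation sigma) has the desired power.
from math import ceil
from statistics import NormalDist

def n_per_arm(delta, sigma, alpha=0.05, power=0.80):
    """Required observations per arm: n = 2 * (sigma*(z_a + z_b)/delta)^2."""
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z(power)           # quantile delivering the desired power
    return ceil(2 * (sigma * (z_alpha + z_beta) / delta) ** 2)

# Detecting a 0.2-standard-deviation effect at 5% size with 80% power:
print(n_per_arm(delta=0.2, sigma=1.0))  # 393 per arm
```

The quadratic dependence on sigma/delta is what makes small effects so expensive to detect: halving the minimum detectable effect quadruples the required sample.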
General equilibrium effects (SUTVA; relevant for all empirical designs)
Miguel, Edward, and Michael Kremer (2004): "Worms: Identifying Impacts on Education and Health in the
Presence of Treatment Externalities", Econometrica, 72, 159-217.
Use treatment density to identify controls that are more influenced by the treatment than others - randomized experiment.
Coady, David P. and Rebecca L. Harris (2004): Evaluating Transfer Programmes Within A General Equilibrium
Framework, The Economic Journal, 114, 778–799.
Lise, Jeremy, Shannon Seitz, and Jeffrey Smith (2004): "Equilibrium Policy Experiments and the Evaluation of
Social Programs", NBER WP 10283.
Use theoretical equilibrium search model to get to macro effects.
Abbring, J. H., and J. J. Heckman (2007): "Econometric Evaluation Of Social Programs, Part III: Distributional
Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy
Evaluation", Handbook of Econometrics, Volume 6B, Chapter 72, Elsevier B.V., DOI: 10.1016/S1573-4412(07)06072-2.
Janssens, W. (2011): "Externalities In Program Evaluation: The Impact of a Women’s Empowerment Program
On Immunization", Journal of the European Economic Association, forthcoming, DOI: 10.1111/j.1542-4774.2011.01041.x.
Manski, Charles F. (2011): "Identification of Treatment Response With Social Interactions", Department of
Economics and Institute for Policy Research, Northwestern University.
Crépon, Bruno, Esther Duflo, Marc Gurgand, Roland Rathelot, and Philippe Zamora (2013): "Do Labor Market
Policies Have Displacement Effects? Evidence from a Clustered Randomized Experiment", The Quarterly
Journal of Economics, 128, 531-580, doi: 10.1093/qje/qjt001.
Use clever experimental designs to uncover general equilibrium (spillover) effects.
Aldashev, Gani, Georg Kirchsteiger, and Alexander Sebald (2015): "Assignment Procedure Biases in Randomized
Policy Experiments", forthcoming in the Economic Journal, doi: 10.1111/ecoj.12321.
A systematic study, in a game-theoretic model, of the bias resulting from 'reciprocal' subjects.
Matching (selection on observables)
Survey and paper collections
Heckman, J., R. LaLonde and J. Smith (1999): "The Economics and Econometrics of Active Labor Market Programs", in: O. Ashenfelter and D. Card (eds.), Handbook of Labour Economics, Vol. 3, 1865-2097, Amsterdam: North-Holland.
Imbens, G. W. (2004): "Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review",
The Review of Economics and Statistics, 86, 4–29.
Rubin, D.B. (2006), Matched Sampling for Causal Effects, Cambridge: CUP.
Imbens, G. W. (2014): Matching Methods in Practice: Three Examples, NBER Working Paper No. 19959.
Historical papers
Fechner, G.T. (1860): Elemente der Psychophysik, Leipzig: Breitkopf und Härtel.
Fechner verbally proposes something very similar to a matching algorithm. Written in German; a nice read.
Heckman found this source to demonstrate that matching was not invented by statisticians. (hardcopy)
Wilks, S.S. (1932): "On the distribution of statistics in samples from a normal population of two variables with
matched sampling of one variable," Metron, 9, 87-126. (pdf –almost)
This is probably the first matching paper within statistics.
Statistical papers
Rubin, D.B. (1974): "Estimating Causal Effects of Treatments in Randomized and Nonrandomized Studies",
Journal of Educational Psychology, 66, 688-701. (hardcopy available)
An influential paper because it can be seen as a forceful ad for using potential outcomes explicitly when discussing and
estimating causal effects.
Rubin, D. B. (1977): "Assignment to Treatment Group on the Basis of a Covariate", Journal of Educational
Statistics, 2 (1), 1-26.
Not the oldest reference for matching methods, but probably the one that is most frequently quoted.
Dawid, A.P. (1979): "Conditional Independence in Statistical Theory", The Journal of the Royal Statistical
Society, Series B, 41, 1-31.
A clear definition of conditional independence, a concept that had not been well defined before.
Rubin, D.B. (1979): "Using Multivariate Matched Sampling and Regression Adjustment to Control Bias in
Observational Studies", Journal of the American Statistical Association, 74, 318-328.
This paper advocates a combination of regression adjustment and matching that had been suggested by Cochran and Rubin
(1973).
Rosenbaum, P.R., and D.B. Rubin (1983): "The Central Role of the Propensity Score in Observational Studies
for Causal Effects", Biometrika, 70, 41-50.
This paper explains why it is sufficient to search for a comparison group that has the same distribution of the conditional
participation probability. The nice implication is that such a comparison group will nevertheless have the same distribution of
the control variables as the treatment group.
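The balancing role of the propensity score can be illustrated with a bare-bones pair-matching sketch (my own invented data; in practice the score would be estimated, e.g. by a logit or probit, rather than supplied as known).

```python
# Nearest-neighbour matching on a (here: assumed known) propensity
# score, with replacement, to estimate the average treatment effect on
# the treated (ATT). Data are invented for illustration only.

def att_by_score_matching(treated, controls):
    """Each unit is a (propensity_score, outcome) pair.

    For every treated unit, find the control with the closest score and
    average the outcome differences: a simple ATT estimate.
    """
    diffs = []
    for score_t, y_t in treated:
        _, y_c = min(controls, key=lambda c: abs(c[0] - score_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)

treated = [(0.8, 5.0), (0.6, 4.0), (0.5, 3.5)]
controls = [(0.79, 3.0), (0.55, 2.5), (0.3, 1.0), (0.62, 3.0)]
print(att_by_score_matching(treated, controls))  # 1.333...
```

The Rosenbaum-Rubin result is what justifies matching on this single scalar instead of the full covariate vector: units with the same score have, in expectation, the same covariate distribution.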
Rubin, D.B. (1991): "Practical Implications of Modes of Statistical Inference for Causal Effects and the Critical
Role of the Assignment Mechanism", Biometrics, 47, 1213-1234.
Argues that it is much more important to take care of the correct specification of the assignment mechanism than to choose
among different methods of inference (model-based, matching, Bayesian, …).
Is matching on more variables always better? … and other practical issues
Heckman, J.J. (1998): "Detecting Discrimination", Journal of Economic Perspectives, 12, 101-116.
Uses a common-sense example to show that matching may make things worse (if there are so-called colliders).
Imai, K., G. King, and E. A. Stuart (2008): "Misunderstandings among Experimentalists and Observationalists
about Causal Inference", Journal of the Royal Statistical Society, Series A (Statistics in Society), 171, 481-502.
The paper discusses a couple of issues on how to ideally and practically perform empirical studies in experimental and nonexperimental (matching) settings.
Ho, D. E., K. Imai, G. King, and E. A. Stuart (2007): "Matching as Nonparametric Preprocessing for Reducing
Model Dependence in Parametric Causal Inference", Political Analysis, 15, 199-236.
The paper suggests performing matching as a first step even when parametric estimation is performed for whatever reason.
The idea is that efficiency may be increased, and the parametric results will be more robust and less model
dependent.
White, H., and X. Lu (2011): "Causal Diagrams for Treatment Effect Estimation with Application to
Efficient Covariate Selection", The Review of Economics and Statistics, 93 (4), 1453-1459.
Considers the question of whether one should also condition on variables affecting only the outcome or only the treatment,
and concludes that 'A useful practical insight emerging from our examples is that efficiency is attained by conditioning as
much as possible on drivers of the outcome or their proxies and as little as possible on drivers of treatment choice
or their proxies.'
Testing the identifying assumptions
Friedlander, D., and P.K. Robins (1995): "Evaluating Program Evaluations: New Evidence on Commonly Used
Nonexperimental Methods", The American Economic Review, 85 (4), 923-937.
Rosenbaum, P.R. (1984): "From Association to Causation in Observational Studies: The Role of Tests of
Strongly Ignorable Treatment Assignment", Journal of the American Statistical Association, 79, 41-48.
Can we test the CIA? Yes, almost (if we are prepared to make other assumptions, some of which are explained in this
paper).
Heckman, J.J., and V.J. Hotz (1989): "Choosing Among Alternative Nonexperimental Methods for Estimating
the Impact of Social Programs: The Case of Manpower Training", Journal of the American Statistical
Association, 84, 862-880 (includes comments by Holland and Moffitt).
They claim that they can find a good model for treatment effects by appropriate testing (this paper invents the famous pre-programme test) – a claim that is controversial.
Endogeneity of control variables
Rosenbaum, P. R. (1984): "The Consequences of Adjustment for a Concomitant Variable That Has Been Affected by the Treatment", The Journal of the Royal Statistical Society, Series A, 147, 656-666.
When the control variables are influenced by the treatment, things can go very wrong.
Frangakis, C. E., and D. B. Rubin (2002): "Principal Stratification in Causal Inference," Biometrics, 58, 21–29.
The statistical approach to endogeneity in matching studies.
Lechner, M. (2008): “A Note on Endogenous Control Variables in Evaluation Studies,” Statistics and
Probability Letters, 78, 190-195.
The econometric approach to endogeneity in matching studies.
Common support
Crump, R.K., V.J. Hotz, G.W. Imbens, and O.A. Mitnik (2009): "Dealing with limited overlap in estimation of
average treatment effects", Biometrika, 96 (1), 187-199.
Since estimators are more precise when the support is 'thick', they develop an estimator that is computed only over 'thick-support' regions.
Heckman, J., H. Ichimura, and P. Todd (1998): "Matching as an Econometric Evaluation Estimator", Review of
Economic Studies, 65, 261-294.
They use a somewhat different terminology than the statistical papers and use advanced asymptotic theory to derive the
asymptotic distribution of matching estimators.
Lechner, M. (2008): "A Note on the Common Support Problem in Applied Evaluation Studies", Annales
d'Économie et de Statistique, 91, 219-236.
Suggests bounding the effects instead of changing the estimand.
Estimators
Hahn, J. (1998): “On the Role of the Propensity Score in Efficient Semiparametric Estimation of Average
Treatment Effects.” Econometrica 66, 315-331.
Dehejia, R.H. and S. Wahba (1999): "Causal Effects in Non-experimental Studies: Reevaluating the Evaluation
of Training Programmes", Journal of the American Statistical Association, 94, 1053-1062.
This paper made matching very popular.
Heckman, J.J., and J.A. Smith (1999): "The Pre-programme earnings dip and the Determinants of Participation
in a Social Programme. Implications for Simple Programme Evaluation Strategies", The Economic Journal,
109.
Frequently quoted when it comes to 'Ashenfelter's dip' and the variables needed to avoid a misspecification of the selection
process.
Joffe, M. M., T. R. Ten Have, H. I. Feldman, and S. Kimmel (2004): "Model Selection, Confounder Control, and
Marginal Structural Models", The American Statistician, 58 (4), 272-279.
Very easy to read paper advocating the combination of parametric and nonparametric estimators (weighting plus
regression) to get the robust properties of the nonparametric methods as well as the additional identification coming from
parametric methods. Uses terminology from statistics and epidemiology (part of the literature on so called double
robustness: estimator is consistent if either of two assumptions is correct).
Abadie, A., and G.W. Imbens (2006): "Large Sample Properties of Matching Estimators for Average Treatment
Effects", Econometrica, 74 (1), 235-267.
They show that a matching estimator with a fixed number of control observations (with increasing sample size) is
asymptotically biased and how that bias can be removed (regression plus matching!). They also present a new estimator for
the variance. Not concerned with propensity score matching.
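The matching-plus-regression bias correction described above can be sketched as follows; this is illustrative numpy code (invented data, my own function names), and the published Abadie-Imbens implementation differs in details such as the distance metric and the variance estimator:

```python
import numpy as np

def bias_corrected_att(y, d, X, M=1):
    """Nearest-neighbour matching on covariates for the ATET, with a
    regression bias adjustment in the spirit of Abadie and Imbens:
    a linear model fitted on the controls predicts, for each treated
    unit, the correction mu0(X_i) - mean mu0(X of its matches)."""
    Xt, Xc = X[d == 1], X[d == 0]
    yt, yc = y[d == 1], y[d == 0]
    Xc1 = np.column_stack([np.ones(len(Xc)), Xc])
    beta, *_ = np.linalg.lstsq(Xc1, yc, rcond=None)
    mu0 = lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ beta
    effects = []
    for i in range(len(Xt)):
        nn = np.argsort(np.linalg.norm(Xc - Xt[i], axis=1))[:M]
        adj = mu0(Xt[i][None, :])[0] - mu0(Xc[nn]).mean()
        effects.append(yt[i] - yc[nn].mean() - adj)
    return np.mean(effects)

# Toy data with a constant treatment effect of 1 and selection on x.
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 1))
d = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x[:, 0]))).astype(int)
y = 1.0 * d + x[:, 0] + 0.5 * rng.normal(size=n)
print(round(bias_corrected_att(y, d, x), 2))  # close to the true effect of 1
```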
Frölich, M. (2004): "Finite-Sample Properties of Propensity Score Matching and Weighting Estimators", Review
of Economics and Statistics, 86, 77-90.
Uses a Monte Carlo study to compare different nonparametric matching estimators.
DiNardo, J., N. M. Fortin, and T. Lemieux (1996): "Labor Market Institutions and the Distribution of Wages,
1973-1992: A Semiparametric Approach," Econometrica, 64, 1001-1044
The classical paper on inverse probability-of-selection weighting in econometrics.
Hirano, K., G. W. Imbens, and G. Ridder (2003): "Efficient Estimation of Average Treatment Effects Using the
Estimated Propensity Score", Econometrica, 2003, 1161-1189.
They show that a weighting estimator based on an estimated propensity score achieves the asymptotic efficiency
bound for the ATE, ATET, and related estimands. The estimator based on an estimated p-score can be more
efficient than one based on the true score. The results apply to p-scores estimated nonparametrically by sieve
estimators.
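As an illustration of the weighting idea discussed in this entry, here is a minimal numpy sketch of a normalised (Hajek) IPW estimator of the ATE, with the propensity score estimated by a plain Newton-Raphson logit; the data and function names are invented, and the sketch makes no claim about the paper's sieve-based efficiency results:

```python
import numpy as np

def fit_logit_pscore(X, d, iters=25):
    """Propensity score from a logistic regression of d on X,
    fitted by Newton-Raphson (a deliberately minimal logit)."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        W = p * (1.0 - p)
        # Newton step: Hessian = X1' W X1, gradient = X1'(d - p).
        beta += np.linalg.solve(X1.T @ (W[:, None] * X1), X1.T @ (d - p))
    return 1.0 / (1.0 + np.exp(-X1 @ beta))

def ipw_ate(y, d, pscore):
    """Normalised inverse probability weighting estimator of the ATE."""
    w1, w0 = d / pscore, (1.0 - d) / (1.0 - pscore)
    return np.sum(w1 * y) / np.sum(w1) - np.sum(w0 * y) / np.sum(w0)

# Toy data: true ATE = 1, selection into treatment driven by x.
rng = np.random.default_rng(42)
n = 5000
x = rng.normal(size=n)
d = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
y = 1.0 * d + x + rng.normal(size=n)
ate_hat = ipw_ate(y, d, fit_logit_pscore(x[:, None], d))
print(round(ate_hat, 2))  # close to the true ATE of 1
```

The naive difference in means would be biased here because x shifts both treatment take-up and the outcome; the weights undo that selection.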
Graham, B. S., C. C. de Xavier Pinto, and D. Egel (2010): "Inverse probability tilting for moment condition
models with missing data," mimeo.
They introduce a clever version of inverse probability weighting by estimating the p-score differently. Their
method should do better than IPW with a propensity score estimated by maximum likelihood.
Abadie, A., and G.W. Imbens (2008): "On the Failure of the Bootstrap for Matching Estimators," Econometrica,
76, 1537-1557.
Bootstrap is not a valid way to obtain inference for simple matching estimators.
Abadie, A., and G.W. Imbens (2009): "Matching on the estimated propensity score", NBER Working Paper
15301.
Inference for matching estimators using an estimated propensity score based on a parametric model. This is the
most relevant case in practice. Surprisingly, before this article there was no thorough analysis of the asymptotic
distribution of this estimator.
Heckman, J.J., H. Ichimura, and P. Todd (1998): "Matching as an Econometric Evaluation Estimator", Review of
Economic Studies, 65, 261-294.
They use a somewhat different terminology than the statistical papers and use advanced asymptotic theory to derive the
asymptotic distribution of matching estimators.
Black, D., M. Lechner, and J. Smith (2008): "The Use and Abuse of Matching in Program Evaluation", mimeo,
Section 4.2.1 (efficient estimation).
Lechner, M. (2010): "A Note on the Relation of Weighting and Matching Estimators", forthcoming in
Communications in Statistics: Theory and Methods.
This paper shows how matching and reweighting estimators are directly related.
Abadie, A., and G. W. Imbens (2011): "Bias-Corrected Matching Estimators for Average Treatment Effects",
Journal of Business & Economic Statistics, 29 (1), DOI: 10.1198/jbes.2009.07333.
This paper shows how the correction can be implemented.
Huber, M., M. Lechner, and C. Wunsch (2013): “The performance of estimators based on the propensity score”,
Journal of Econometrics, 175 (1), 1-21, doi: 10.1016/j.jeconom.2012.11.006.
Compares many matching estimators in a realistic simulation scenario and finds that radius matching with bias adjustment
performs best.
Busso, Matias, John DiNardo, and Justin McCrary (2014): New Evidence on the Finite Sample Properties of
Propensity Score Reweighting and Matching Estimators, The Review of Economics and Statistics, 96 (5),
885-897.
Based on a large scale Monte Carlo study they conclude that inverse probability reweighting is competitive with the most
effective matching estimators when overlap is good, but that matching may be more effective when overlap is sufficiently
poor.
Nonparametric estimation
Racine, Jeffrey S. (2008): "Nonparametric Econometrics: A Primer," Foundations and Trends in Econometrics,
3 (1), 1–88, DOI: 10.1561/0800000009.
Ichimura, H., P. E. Todd (2007): "Implementing Nonparametric and Semiparametric Estimators," Handbook of
Econometrics, Volume 6B, Chapter 74, Elsevier B.V. DOI: 10.1016/S1573-4412(07)06074-6.
Multiple treatments and continuous treatment variables
Imbens, G.W. (2000): "The Role of the Propensity Score in Estimating Dose-Response Functions", Biometrika,
87, 706-710.
One of two papers, developed in parallel, that discuss identification in the case of more than two discrete
treatments, which may or may not be ordered.
Lechner, M. (2001): "Identification and Estimation of Causal Effects of Multiple Treatments under the Conditional Independence Assumption", in: M. Lechner and F. Pfeiffer (eds.), Econometric Evaluation of Active
Labour Market Policies, 43-58, Heidelberg: Physica.
One of two papers, developed in parallel, that discuss identification in the case of more than two discrete
treatments, which may or may not be ordered.
Lechner, M. (2002): "Programme Heterogeneity and Propensity Score Matching: An Application to the Evaluation of Active Labour Market Policies", Review of Economics and Statistics, 84, 205-220.
An application of the methods suggested by Lechner (2001) to the evaluation of Swiss labour market programmes.
Black, D., M. Lechner, and J. Smith (2008): "The Use and Abuse of Matching in Program Evaluation", mimeo,
Section 5 (multiple treatments).
Black, D., M. Lechner, and J. Smith (2008): "The Use and Abuse of Matching in Program Evaluation", mimeo,
Section 6 (continuous treatment variables).
Kluve, Jochen, Hilmar Schneider, Arne Uhlendorff and Zhong Zhao (2012): Evaluating continuous training
programmes by using the generalized propensity score, Journal of the Royal Statistical Society, Series A,
175, Part 2.
Hirano, K., and G. Imbens (2004): "The Propensity Score with Continuous Treatments", in Applied Bayesian
Modeling and Causal Inference from Incomplete-Data Perspectives, ed. A. Gelman and X.-L. Meng, New
York: Wiley.
Becker, Sascha O., Peter H. Egger, Maximilian von Ehrlich (2012): Too much of a good thing? On the growth
effects of the EU’s regional policy, European Economic Review 56 (2012) 648–668,
http://dx.doi.org/10.1016/j.euroecorev.2012.03.001.
An application of the continuous treatment model.
Egger, Peter H., Maximilian von Ehrlich (2013): Generalized propensity scores for multiple continuous
treatment variables, Economics Letters, 119, 32-34.
Multiple and continuous treatments.
Galvao, Antonio F., and Liang Wang (2014): Uniformly Semiparametric Efficient Estimation of Treatment
Effects with a Continuous Treatment, Journal of the American Statistical Association, DOI:
10.1080/01621459.2014.978005.
A rigorous treatment of the continuous treatment case.
Covariate selection and efficiency
Hahn, J. (2004): Functional restriction and efficiency in causal inference, Review of Economics and Statistics, 86, 73-76.
White, H. and X. Lu (2012): Causal Diagrams For Treatment Effect Estimation With Application To Efficient
Covariate Selection, The Review of Economics and Statistics, 93 (4), 1453–1459.
Both papers emphasise that, from the point of view of efficiency, variables influencing only the outcome increase
precision, while variables influencing only the treatment reduce precision.
Lu, Xun (2014): A Covariate Selection Criterion for Estimation of Treatment Effects, Journal of Business &
Economic Statistics, DOI: 10.1080/07350015.2014.982755.
He essentially argues that we cannot reliably know whether a variable influences only the treatment or only the
outcome. Therefore, we should use all (asymptotically) valid combinations of covariates and average the
estimated effects across the different covariate sets.
Timing of events or endogenous start dates
Using duration models in evaluation studies
Abbring, J. H., and G. van den Berg (2003): "The Nonparametric Identification of Treatment Effects in Duration
Models", Econometrica, 71, 1491-1517.
They show how specific assumptions on the timing of training can be used to identify the training effect. They
identify the hazard rate without treatment globally from the non-participants and from the participants prior to
treatment and project it to the time after treatment (any pre-treatment difference in hazard rates between treated
and controls conditional on X is thus due to time-constant unobservables). The post-treatment differences
between the hazards of treated and controls conditional on X can therefore be separated into a selection effect
(estimated prior to treatment) and a treatment effect.
The problem of different start dates of treatments
Li, Y., K.J. Propert, and P. Rosenbaum (2001): "Balanced Risk Set Matching", Journal of the American
Statistical Association, 96, 870-882.
Match persons treated at time tm with persons still at risk of being treated at tm. Do that for all possible starting
dates and estimate the effect of 'delay'. There is also a somewhat unusual matching algorithm in the form of
integer programming.
Sianesi, B. (2004): "An evaluation of the Swedish system of active labor market programs in the 1990s", Review
of Economics and Statistics, 86, 133-155.
Applies an idea very similar to that of Li et al. (2001) to Swedish labour market data.
Lechner, M. (1999): "Earnings and Employment Effects of Continuous Off-the-Job Training in East Germany
after Unification," Journal of Business and Economic Statistics, 17, 74-90.
Three different ways to tackle the problem of different start dates within a matching approach.
Fredriksson, P., and P. Johansson (2008): "Dynamic Treatment Assignment: The Consequences for Evaluations
Using Observational Data," Journal of Business & Economic Statistics, October 2008, Vol. 26, No. 4, DOI
10.1198/073500108000000033.
They show that the approach by Lechner (1999) can be problematic at times and suggest a different method.
Crépon, B., M. Ferracci, G. Jolivet, and G.J. van den Berg (2008), Active Labor Market Policy Effects in a
Dynamic Setting, IZA DP No. 3848.
Dynamic treatments
Robins, J.M. (1986): "A New Approach to Causal Inference in Mortality Studies with Sustained Exposure
Periods - Application to Control of the Healthy Worker Survivor Effect", Mathematical Modelling, 7,
1393-1512; with 1987 Errata, Computers and Mathematics with Applications, 14, 917-921; 1987 Addendum,
Computers and Mathematics with Applications, 14, 923-945; and 1987 Errata to the Addendum, Computers
and Mathematics with Applications, 18, 477.
This seems to be the first paper that systematically considers dynamic treatments. Unfortunately, coming from an
epidemiology background, it is not very accessible for researchers trained in econometrics.
Lechner, M., and R. Miquel (2010): "Identification of the Effects of Dynamic Treatments by Sequential
Conditional Independence Assumptions", Empirical Economics, DOI: 10.1007/s00181-009-0297-3.
The paper develops a way to tackle the identification of dynamic treatments with dynamic selection problems in a
framework that is more akin to econometrics than to statistics.
Lechner, M. (2009): "Sequential Causal Models for the Evaluation of Labor Market Programs", Journal of
Business & Economic Statistics, 27, 71-83, DOI 10.1198/jbes.2009.0006.
The estimators for the Lechner, Miquel model.
Lechner, M. (2008): "Matching estimation of dynamic treatment models: Some practical issues," in D. Millimet,
J. Smith, and E. Vytlacil (eds.), Advances in Econometrics, Volume 21, Modelling and Evaluating Treatment
Effects in Econometrics, 289-333.
Illustrates the potential as well as some of the problems of the dynamic treatment approach.
Black, D., M. Lechner, and J. Smith (2008): "The Use and Abuse of Matching in Program Evaluation", mimeo,
Section 7 (dynamic treatments).
Lechner, M., and S. Wiehler (2011): "Does the order and timing of active labor market programs matter?,"
forthcoming in the Oxford Bulletin of Economics and Statistics.
Abbring, J. H., and J. J. Heckman (2007): "Econometric Evaluation Of Social Programs, Part III: Distributional
Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy
Evaluation", Handbook of Econometrics, Volume 6B, Chapter 72, Elsevier B.V., DOI: 10.1016/S1573-4412(07)06072-2.
Practical implementation of (static) matching estimators (and sensitivity
analysis)
Bhattacharya, J. and W.B. Vogt (2007): "Do Instrumental Variables Belong in Propensity Scores?" NBER
Technical Working Paper No. 343
Investigates the effect of including an instrument in the propensity score when the CIA does not hold. They find
that the bias is larger the stronger the instrument.
Caliendo, M., and S. Kopeinig (2008): Some Practical Guidance for the Implementation of Propensity Score
Matching, Journal of Economic Surveys, 22 (1), 31–72
A very nice, accurate, and fairly complete summary of the matching literature as it concerns applications.
Lechner, Michael (2002): "Some practical issues in the evaluation of heterogeneous labour market programmes
by matching methods", Journal of the Royal Statistical Society, Series A, 165, 59-82.
Discusses a couple of issues that are important when applying matching methods in practice.
Robustness to misspecification of the propensity score
Drake, Christiana (1993): "Effects of Misspecification of the Propensity Score on Estimators of Treatment
Effect", Biometrics, 49 (4), 1231-1236.
As long as no confounder is omitted, the effect estimates do not seem to be sensitive to a misspecified propensity score.
Zhao, Z. (2008): "Sensitivity of Propensity Score Methods to the Specifications", Economics Letters, 98, 309-319.
Monte Carlo results, including theoretical motivations of the findings, suggest robustness to misspecification of the score.
Shaikh, A. M., M. Simonsen, E. J. Vytlacil, and N. Yildiz (2009): "A specification test for the propensity score
using its distribution conditional on participation", Journal of Econometrics, 151, 33-46.
Using the balancing score property to construct a test for the misspecification of the propensity score.
Sensitivity analysis
Altonji, J.G., T.E. Elder, and C.R. Taber (2005): Selection on Observed and Unobserved Variables: Assessing
the Effectiveness of Catholic Schools, Journal of Political Economy, 113 (1), 151-184.
Idea: Selection on unobservables may be as large as selection on observables.
De Luna, X., and P. Johansson (2014): Testing for the Unconfoundedness Assumption Using an Instrumental
Assumption, Journal of Causal Inference, 2(2), 187–199
Idea: if the CIA is valid, an instrument has no (reduced-form) effect conditional on the treatment. Estimate the
effect of the instrument in the subsample of treated and/or controls and test for a zero effect.
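The idea can be sketched with a simple within-subsample regression check; function name and data are hypothetical, and this is only the spirit of the test, not the authors' exact procedure:

```python
import numpy as np

def instrument_tstat(y, z, x):
    """OLS of y on (1, x, z) and the t-statistic on the instrument z.
    Under the CIA, z should have no reduced-form effect within a
    treatment subsample, so a large |t| signals a problem."""
    X = np.column_stack([np.ones(len(y)), x, z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[-1, -1])
    return beta[-1] / se

# Within the 'treated' subsample: y depends on x only, z is valid.
rng = np.random.default_rng(7)
n = 3000
x = rng.normal(size=n)
z = rng.integers(0, 2, n).astype(float)
y = x + rng.normal(size=n)
print(round(instrument_tstat(y, z, x), 2))  # should be small in absolute value
```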
Imbens, G. W. (2003): "Sensitivity to Exogeneity Assumptions in Program Evaluation", American Economic
Review, Papers & Proceedings, 93, 126-132.
Specifies how an observed factor is correlated with the selection process to derive bounds on treatment effects for all
plausible values of these correlations.
Ichino, A., F. Mealli, and T. Nannicini (2008): From Temporary Help Jobs To Permanent Employment: What
Can We Learn From Matching Estimators And Their Sensitivity? Journal of Applied Econometrics, 23: 305–
327.
They simulate an artificial confounder and analyse its potential impact on the bias of the estimates. Their procedure is
essentially non-parametric, in contrast to the other papers.
Rosenbaum, P. R., and D. B. Rubin (1983): Assessing sensitivity to an unobserved confounder in an
observational study with a binary outcome, Journal of the Royal Statistical Society, Series B, 45, 212-218.
Rosenbaum, Paul R. (2002): Observational Studies, New York: Springer.
Confounders measured with error
Battistin, Erich, Andrew Chesher (2014): Treatment effect estimation with covariate measurement error, Journal
of Econometrics, 178, 707–715.
They provide approximations for (small) measurement errors that could also be used for a sensitivity
analysis. Contrary to the linear model, classical measurement error does not always lead to an attenuation
bias (and it is a priori very difficult to sign the bias).
Uncovering effect heterogeneity
Abrevaya, Jason, Yu-Chin Hsu & Robert P. Lieli (2014): Estimating Conditional Average Treatment Effects,
Journal of Business & Economic Statistics, doi: 10.1080/07350015.2014.981980.
Systematic approach to uncover effect heterogeneity w.r.t. continuous covariates.
Only one treated observation
Synthetic control method
This method is for the case of one (or very few) treated units, for which standard inference cannot be used. If all
confounders are observed, one can construct an artificial control observation as a weighted mean of several
possible control observations, such that the mean of the confounders of the artificial control resembles the
values of the confounders of the treated. Randomization 'inference' is available to assess whether the 'effect'
is too large compared with the effects that would occur in many placebo experiments, in which each control
takes the role of a 'pseudo-treated'. Examples of its application are:
Abadie, Alberto, and Javier Gardeazabal (2003): “The Economic Costs of Conflict: A Case Study of the Basque
Country", American Economic Review, 93 (1), 112-132.
Abadie, Alberto, Alexis Diamond and Jens Hainmueller (2010) Synthetic Control Methods for Comparative
Case Studies: Estimating the Effect of California’s Tobacco Control Program, Journal of the American
Statistical Association, 105 (490), 493-505, DOI: 10.1198/jasa.2009.ap08746.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller (2014): Comparative Politics and the Synthetic Control
Method, American Journal of Political Science. DOI: 10.2139/ssrn.1950298, forthcoming.
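The weight construction described above can be sketched as follows; this is illustrative numpy code with invented data, and the exponentiated-gradient solver is my own simplification (the papers use constrained quadratic programming):

```python
import numpy as np

def sc_weights(x_treated, X_controls, iters=3000, eta=0.01):
    """Weights on the simplex (non-negative, summing to one) chosen so
    that the weighted controls resemble the treated unit's pre-treatment
    predictors; solved here by exponentiated-gradient descent, keeping
    the best iterate found."""
    J = X_controls.shape[0]
    w = np.full(J, 1.0 / J)
    best_w, best_f = w, np.sum((x_treated - X_controls.T @ w) ** 2)
    for _ in range(iters):
        grad = -2.0 * X_controls @ (x_treated - X_controls.T @ w)
        w = w * np.exp(-eta * grad)   # multiplicative update keeps w > 0
        w = w / w.sum()               # renormalise onto the simplex
        f = np.sum((x_treated - X_controls.T @ w) ** 2)
        if f < best_f:
            best_w, best_f = w, f
    return best_w

# Treated unit is an exact convex combination of controls 0 and 1.
rng = np.random.default_rng(5)
X_controls = rng.normal(size=(20, 5))   # 20 controls, 5 predictors each
x_treated = 0.5 * X_controls[0] + 0.5 * X_controls[1]
w = sc_weights(x_treated, X_controls)
print(round(float(np.sum((x_treated - X_controls.T @ w) ** 2)), 3))
```

Randomization inference then repeats the fit for every control as a 'pseudo-treated' unit and compares the resulting placebo gaps with the actual post-treatment gap.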
Applications (some examples)
Labour
Frölich, M. (2007): "Propensity score matching without conditional independence assumption—with an
application to the gender wage gap in the United Kingdom", Econometrics Journal, 10, 359–407 .
Propensity score matching is frequently used for estimating average treatment effects. Its applicability, however, is not
confined to treatment evaluation. In this paper, it is shown that propensity score matching does not hinge on a
selection-on-observables assumption and can be used to estimate not only adjusted means but also their distributions,
even with non-i.i.d. sampling. Propensity score matching is used to analyze the gender wage gap among graduates in
the UK. It is found that the subject of degree contributes substantially to explaining the gender wage gap, particularly
at higher quantiles of the wage distribution.
Lechner, M. (2001): "The Empirical Analysis of East German Fertility after Unification: An Update", European
Journal of Population, 17, 61-74.
An application of matching to analyse difference in the fertility rates of East and West Germany after unification.
Lechner, M. (2009): "Long-run labour market and health effects of individual sports activities", The Journal of
Health Economics, 28, 839–854, doi: 10.1016/j.jhealeco.2009.05.003.
Does your individual sports activity increase your earnings in the long run?
Lechner, M., and C. Wunsch (2009): Are Training Programmes More Effective When Unemployment is High?,
Journal of Labor Economics, 27, 653-692, doi: 10.1086/644976.
Using matching methods to analyse the dependence of the effectiveness of active labour market programmes on the
business cycles.
Sianesi, B. (2004): "An Evaluation of the Swedish System of Active Labor Market Programs in the 1990s", The
Review of Economics and Statistics, 86, 133–155.
An evaluation of the Swedish active labour market policy.
Macroeconomics
Angrist, Joshua D., and Guido M. Kuersteiner (2011): "Causal Effects Of Monetary Shocks: Semiparametric
Conditional Independence Tests With A Multinomial Propensity Score," The Review of Economics and
Statistics, 93, 725–747.
We develop semiparametric tests for conditional independence in time series models of causal effects. Our approach is
motivated by empirical studies of monetary policy effects and is semiparametric in the sense that we model the process
determining the distribution of treatment—the policy propensity score—but leave the model for outcomes unspecified. A
conceptual innovation is that we adapt the cross-sectional potential outcomes framework to a time series setting. We also
develop root-T consistent distribution-free inference methods for full conditional independence testing, appropriate for
dependent data and allowing for first-step estimation of the (multinomial) propensity score.
Esaka, Taro (2011): "Do hard pegs avoid currency crises? An evaluation using matching estimators," Economics
Letters 113, 35–38.
Using the bias-corrected matching estimators of Abadie and Imbens (2006) as a control for the self-selection
problem of regime adoption, we estimate the average treatment effect of hard pegs on the occurrence of currency
crises. We find evidence that hard pegs significantly decrease the likelihood of currency crises compared with
other regimes.
Persson, T. (2001): “Currency Unions and Trade: How Large is the Treatment Effect?” Economic Policy, Oct.,
434-448.
An application of matching methods to analyse the effects of currency unions on trade.
Public Health
Galiani, Sebastian, Paul Gertler, and Ernesto Schargrodsky, “Water for Life: The Impact of the Privatization of
Water Services on Child Mortality,” Journal of Political Economy 113:1 (2005), 83–120.
Statistics
Rubin, Donald B (2006): Matched Sampling for Causal Effects (Cambridge: Cambridge University Press).
Rosenbaum, Paul R, Observational Studies, 2nd ed. (New York: Springer-Verlag, 2002).
Medicine
Christakis, Nicholas A., and Theodore I. Iwashyna (2003): “The Health Impact of Health Care on Families: A
Matched Cohort Study of Hospice Use by Decedents and Mortality Outcomes in Surviving, Widowed
Spouses,” Social Science and Medicine 57:3 (2003), 465–475.
Rubin, Donald B., “Estimating Causal Effects from Large Data Sets Using Propensity Scores,” Annals of
Internal Medicine 127:8S (1997), 757–763.
Vaughan, Adam S, Colleen F Kelley, Nicole Luisi, Carlos del Rio, Patrick S Sullivan and Eli S Rosenberg
(2015): An application of propensity score weighting to quantify the causal effect of rectal sexually
transmitted infections on incident HIV among men who have sex with men, BMC Medical Research
Methodology, 15-25, DOI: 10.1186/s12874-015-0017-y.
Political science
Gordon, Sandy, and Greg Huber, “The Effect of Electoral Competitiveness on Incumbent Behavior,” Quarterly
Journal of Political Science 2:2 (2007), 107–138.
Herron, Michael C., and Jonathan Wand (2007): “Assessing Partisan Bias in Voting Technology: The Case of
the 2004 New Hampshire Recount,” Electoral Studies 26:2, 247–261.
Imai, Kosuke (2005): “Do Get-Out-the-Vote Calls Reduce Turnout? The Importance of Statistical Methods for
Field Experiments,” American Political Science Review 99:2, 283–300.
Sekhon, Jasjeet S. (2004): “Quality Meets Quantity: Case Studies, Conditional Probability and Counterfactuals,”
Perspectives on Politics 2:2, 281–293.
Sociology
Diprete, Thomas A., and Henriette Engelhardt (2004): “Estimating Causal Effects with Matching Methods in the
Presence and Absence of Bias Cancellation,” Sociological Methods and Research 32:4, 501–528.
Morgan, Stephen L., and David J. Harding (2006): “Matching Estimators of Causal Effects: Prospects and
Pitfalls in Theory and Practice,” Sociological Methods and Research 35:1, 3–60.
Winship, Christopher, and Stephen Morgan (1999): “The Estimation of Causal Effects from Observational
Data,” Annual Review of Sociology 25, 659–707.
Law
Epstein, Lee, Daniel E. Ho, Gary King, and Jeffrey A. Segal, “The Supreme Court during Crisis: How War
Affects Only Non-War Cases,” New York University Law Review 80:1 (2005), 1–116.
Rubin, Donald B., “Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco
Litigation,” Health Services and Outcomes Research Methodology 2:1 (2001), 169–188.
Instrumental variable estimation
Imbens, Guido W. (2014): Instrumental Variables: An Econometrician's Perspective, NBER Working Paper No.
19983.
Very nice survey about the developments over the last 90 years.
New IV
Heckman, J.J. (1997): "Instrumental Variables", Journal of Human Resources, 32, 441-462.
The Nobel laureate explains why he is very sceptical about instrumental variable estimation.
Imbens, G.W. and J.D. Angrist (1994): "Identification and Estimation of Local Average Treatment Effects",
Econometrica, 62, 446-475.
The seminal (technical) paper on IV with heterogeneous effects.
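With a binary instrument, the LATE identified in this paper reduces to the Wald ratio of the reduced form to the first stage. A small simulation sketch (the compliance shares and effect sizes are invented for illustration):

```python
import numpy as np

def wald_late(y, d, z):
    """Wald estimator: with a binary instrument z, the IV estimand is the
    local average treatment effect for compliers (Imbens-Angrist)."""
    itt_y = y[z == 1].mean() - y[z == 0].mean()   # reduced form
    itt_d = d[z == 1].mean() - d[z == 0].mean()   # first stage
    return itt_y / itt_d

# Toy data: compliers (effect 2), never-takers, and always-takers.
rng = np.random.default_rng(3)
n = 20000
z = rng.integers(0, 2, n)
types = rng.choice(["complier", "never", "always"], n, p=[0.5, 0.3, 0.2])
d = np.where(types == "always", 1, np.where(types == "never", 0, z))
y0 = rng.normal(size=n) + 1.0 * (types == "always")   # selection on type
y = y0 + 2.0 * d * (types == "complier") + 1.0 * d * (types == "always")
print(round(wald_late(y, d, z), 2))  # close to 2, the compliers' effect
```

The always-takers' larger effect of 1 never shows up in the estimand: their treatment status does not respond to z, so their contribution cancels in both the reduced form and the first stage.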
Imbens, G.W., and D.B. Rubin (1997): "Estimating Outcome Distributions for Compliers in Instrumental
Variable Models", The Review of Economic Studies, 64, 555-574.
They show that not only complier mean effects, but also the distributions of the outcome variables, are identified
for the compliers.
Angrist, Joshua D., and Guido W. Imbens (1995): Two-Stage Least Squares Estimation of Average Causal
Effects in Models With Variable Treatment Intensity, Journal of the American Statistical Association, Vol.
90, No. 430, 431-442.
Paper shows what 2SLS estimates when the instrument is binary, treatments are multivalued, and treatment effects are
heterogeneous.
Frölich, M. (2007): "Nonparametric IV estimation of local average treatment effects with covariates", Journal of
Econometrics, 139, 35-75.
Extends the LATE framework to instruments that are only valid after conditioning on covariates.
Hong, H., and D. Nekipelov (2010): "Semiparametric efficiency in nonlinear LATE models", Quantitative
Economics 1, 279–304, DOI: 10.3982/QE43
More general analysis of semiparametric efficiency bounds than in Frölich (2007), incl. quantile effects etc.
Kitagawa, T. (2008): "A Bootstrap Test for the Instrument Validity in Heterogeneous Treatment Effect Model",
mimeo.
The paper uses the fact that IV identifies the cdf of the potential outcomes for the compliers. However, if the
instruments are not valid, identification fails and the estimated cdf may, for example, be negative. This insight is
used to form a test statistic for instrument validity.
Manning, A. (2004): "Instrumental Variables for Binary Treatments with Heterogeneous Treatment Effects: A
Simple Exposition", Contributions to Economic Analysis & Policy (one of the BE Journal in Economic
Analysis & Policy), 2004-3, Article 9.
A nice and easy paper showing that 2SLS estimates an instrument-dependent causal mean effect because there is a
nonlinearity in the relation between the expectation of the treatment conditional on the instrument and the
outcome. Furthermore, if there is information on the never-takers, then the ATET can be nonparametrically
identified by IV; if there is information on never-takers and always-takers, then the ATE is also identified. A very
nice introduction connecting 'new IV' to 'old IV'.
Newspaper
The Economist (2009): "Cause and defect, Instrumental variables help to isolate causal relationships. But they
can be taken too far," From The Economist print edition Aug 13th 2009.
Control functions
Heckman, J. J. (1979): "Sample Selection Bias as a Specification Error", Econometrica, 47, 153-161.
The classical paper on control functions.
Vytlacil, E. (2002): "Independence, Monotonicity and Latent Index Models: An Equivalence Result",
Econometrica, 70, 331-341.
Control function estimators are IV estimators in a nonparametric sense.
Aakvik, A., J. J. Heckman, and E. J. Vytlacil (2005): "Estimating treatment effects for discrete outcomes when
responses to treatment vary: an application to Norwegian vocational rehabilitation programs", Journal of
Econometrics, 125, 15-51.
They develop selection models (on unobservables) for discrete outcomes and discrete treatments and define the
usual treatment parameters (plus marginal treatment effects and distributional treatment effects) based on these
models. The models rely on exclusion restrictions and a factor structure for the joint dependence of the error
terms; errors are jointly normal.
IV and control functions for continuous instruments: marginal treatment
effects
Heckman, J.J., and E. Vytlacil (2005): "Causal Parameters, Structural Equations, Treatment Effects and
Randomized Evaluation of Social Programs", Econometrica, 73, 669-738.
This paper uses the marginal treatment effect (MTE) to unify the nonparametric literature on treatment effects with the
econometric literature on structural estimation using a nonparametric analogue of a policy invariant parameter; to generate
a variety of treatment effects from a common semiparametric functional form; to organize the literature on alternative
estimators; and to explore what policy questions commonly used estimators in the treatment effect literature answer.
Heckman, J. J. , E. J. Vytlacil (2007): "Econometric Evaluation Of Social Programs, Part II: Using the Marginal
Treatment Effect to Organize Alternative Econometric Estimators to Evaluate Social Programs, and to
Forecast Their Effects in New Environments," Handbook of Econometrics, Volume 6B, Chapter 71, Elsevier
B.V., DOI: 10.1016/S1573-4412(07)06071-0.
All about the marginal treatment effect. Similar to the HV (2005) Econometrica paper, but more readable and more
comprehensive.
Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011): Estimating Marginal Returns to Education, American
Economic Review, 101, 2754-2781, http://www.aeaweb.org/articles.php?doi=10.1257/aer.101.6.2754.
Using the policy-relevant treatment effect (PRTE) framework to define a policy-relevant parameter and estimate it
in a semiparametric framework (the methods used in the application are more flexible than those in Aakvik,
Heckman, and Vytlacil, 2005).
Heckman, J. J. (2010): "Building Bridges Between Structural and Program Evaluation Approaches to Evaluating
Policy", NBER working paper 16110.
This is a more readable version (and with some extensions) of the concepts developed in Heckman and Vytlacil (2005).
Zamarro, G. (2005): "Accounting for Heterogeneous Returns in Sequential Schooling Decisions", mimeo.
One of the very few applications of the marginal treatment effects approach.
Carneiro, P., and S. Lee (2009): "Estimating distributions of potential outcomes using local instrumental
variables with an application to changes in college enrollment and wage inequality," Journal of
Econometrics, 149, 191-208.
So we get the distributions as well with LIV and MTE's.
Quantile treatment effects
Abadie, A., J. Angrist, and G.W. Imbens (2002): "Instrumental Variable Estimates of the Effects of Subsidized
Training on the Quantiles of Trainee Earnings", Econometrica, 70, 91-117.
The paper shows how to use IV to identify and estimate treatment effects on quantiles (instead of the mean) of marginal
distributions.
Abbring, J. H., and J. J. Heckman (2007): "Econometric Evaluation Of Social Programs, Part III: Distributional
Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choice, and General Equilibrium Policy
Evaluation," Handbook of Econometrics, Volume 6B, Chapter 72, Elsevier B.V., DOI: 10.1016/S1573-4412(07)06072-2.
Weak instruments
Bound, J., Jaeger, D.A., and R.B. Baker (1995): "Problems With Instrumental Variable Estimation When the
Correlation Between the Instruments and the Endogenous Explanatory Variable is Weak", Journal of the
American Statistical Association, 90, 443-450.
Even if the instrument is valid, it may not be helpful if it does not have enough power to shift the treatment (endogenous)
variable.
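The first-stage strength this paper warns about is easy to check in practice. Below is a minimal sketch (simulated data, numpy only; all names and numbers are illustrative, and the rule-of-thumb threshold of F > 10 comes from the later weak-instrument literature, not from this paper):

```python
import numpy as np

def first_stage_F(d, z):
    """F-statistic of the excluded instrument z in the first-stage
    regression d = a + b*z + v (single instrument, no covariates)."""
    n = len(d)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, d, rcond=None)
    resid = d - Z @ coef
    s2 = resid @ resid / (n - 2)               # residual variance
    var_b = s2 * np.linalg.inv(Z.T @ Z)[1, 1]  # variance of the slope
    return coef[1] ** 2 / var_b                # squared t-stat = F (1 instrument)

rng = np.random.default_rng(0)
n = 5_000
z = rng.normal(size=n)
d = 0.02 * z + rng.normal(size=n)  # instrument barely shifts the treatment
print(first_stage_F(d, z))
```

With a slope this small the F-statistic is typically far below the conventional comfort zone, signalling exactly the problem Bound, Jaeger, and Baker describe.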
Continuous treatments and discrete treatments with many values
Florens, J. P., J. J. Heckman, C. Meghir, and E. Vytlacil (2008): "Identification of Treatment Effects Using
Control Functions in Models With Continuous, Endogenous Treatment and Heterogeneous Effects",
Econometrica, Vol. 76, No. 5 (September, 2008), 1191–1206.
The control function approach with continuous instruments for full identification of the treatment response function.
Kasy, Maximilian (2014): Instrumental Variables with Unrestricted Heterogeneity and Continuous Treatment,
Review of Economic Studies 81, 1614–1636.
Again, continuous instruments used for nonparametric identification.
Heckman, James J., Sergio Urzua, and Edward Vytlacil (2008): "Instrumental Variables in Models with Multiple
Outcomes: The General Unordered Case", Les Annales d’Economie et de Statistique, 91-92, pp. 151-174.
One needs as many instruments as there are categories, less one. Exclusion restrictions become stricter.
Angrist, Joshua D., and Guido W. Imbens (1995): Two-Stage Least Squares Estimation of Average Causal
Effects in Models With Variable Treatment Intensity, Journal of the American Statistical Association, Vol.
90, No. 430, 431-442.
Without covariates, 2SLS nonparametrically consistently estimates a weighted sum of different complier effects.
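The 2SLS estimand discussed here can be computed without covariates in a few lines. A minimal sketch (simulated data, numpy only; with a single binary instrument and an intercept, 2SLS reduces to the Wald ratio of covariances, and the simulation uses a homogeneous effect so the estimand equals the true effect):

```python
import numpy as np

def tsls(y, d, z):
    """2SLS slope with a single instrument z and an intercept:
    beta = cov(z, y) / cov(z, d) (the Wald ratio)."""
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]

rng = np.random.default_rng(1)
n = 100_000
z = rng.integers(0, 2, size=n).astype(float)  # binary instrument
u = rng.normal(size=n)                        # unobserved confounder
d = (0.2 + 0.5 * z + u > 0).astype(float)     # endogenous treatment
y = 2.0 * d + u + rng.normal(size=n)          # true effect = 2

print(tsls(y, d, z))                          # close to 2
print(y[d == 1].mean() - y[d == 0].mean())    # naive contrast: biased upward
```

The naive treated-control contrast picks up the confounder u, while the Wald/2SLS ratio does not; with heterogeneous effects the same ratio would instead recover the complier-weighted average the paper characterizes.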
Frölich, M. (2007): "Nonparametric IV estimation of local average treatment effects with covariates", Journal of
Econometrics, 139, 35-75.
The case of Angrist and Imbens (1995) with covariates.
Swanson, S. A., J. M. Robins, M. Miller, and M. A. Hernán (2015): Selecting on Treatment: A Pervasive Form
of Bias in Instrumental Variable Analyses, American Journal of Epidemiology, Advance Access published
January 21, 2015.
Two-sample IV
Angrist, J. D., and A. B. Krueger (1992): "The effect of age at school entry on educational attainment: an
application of instrumental variables with moments from two samples,” Journal of the American Statistical
Association, 87, 328 – 336.
Probably the first application of two-sample IV (GMM).
Inoue, A., and G. Solon (2010): Two-Sample Instrumental Variables Estimators, The Review of Economics and
Statistics, 92, 557–561.
They compare 2SLS and IV in the 2-sample case and find 2SLS to be more efficient. They also discuss how to compute
standard errors.
Arellano, M., and C. Meghir (1992): Female Labour Supply and On-the-Job Search: An Empirical Model
Estimated Using Complementary Data Sets, The Review of Economic Studies, 59, 537-559.
Another early application of similar ideas, although not exactly in a two-sample IV setting.
Ridder, G., and R. Moffitt (2007): The Econometrics of Data Combination, in J.J. Heckman, and E. Leamer
(eds.), Handbook of Econometrics, vol. 6B, ch. 75, 5469-5547. DOI: 10.1016/S1573-4412(07)06075-8 .
Overview of data combination methods.
Angrist, J. D., and A. B. Krueger (1995): Split-Sample Instrumental Variables Estimates of the Return to
Schooling, Journal of Business & Economic Statistics, 13, 225-235.
Artificially splitting the sample into two independent parts improves the small-sample performance of the IV estimator.
Some empirical studies based on interesting instruments
Angrist, J. D., and W. N. Evans (1998): "Children and Their Parents' Labor Supply: Evidence from Exogenous
Variation in Family Size", The American Economic Review, 88, 450-477.
An influential IV study.
Angrist, J.D. (1990): "Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security
Administrative Records", American Economic Review, 80, 313-335.
Another influential IV study.
Angrist, J.D., and A.B. Krueger (1991): "Does Compulsory School Attendance Affect Schooling and Earnings?",
The Quarterly Journal of Economics, 106, 979-1014.
Yet another influential IV study.
Frölich, M., and M. Lechner (2010): "Exploiting Regional Treatment Intensity for the Evaluation of Labour
Market Policies," Journal of the American Statistical Association, 105, 1014-1029, doi:
10.1198/jasa.2010.ap08148 .
Another evaluation of Swiss active labour market policy based on a somewhat different use of instruments.
Regression discontinuity designs (RDD)
Surveys
Imbens, G. W. and T. Lemieux (2008): "Regression discontinuity designs: A guide to practice", Journal of
Econometrics, 142, 615–635.
A sophisticated user's guide on how to draw causal inferences from an RDD.
Lee, D. S., and T. Lemieux (2009): “Regression Discontinuity Designs in Economics,” NBER Working Paper
14723.
A well-done user's guide on how to draw causal inferences from an RDD.
Van der Klaauw, W. (2008): "Regression-Discontinuity Analysis: A Survey of Recent Developments in
Economics," Labour, 22, 219-245.
A nice survey about RDD from one of the experts in the field.
History and early important papers
Cook, T.D. (2008): ‘‘Waiting for Life to Arrive’’: A history of the regression-discontinuity design in
Psychology, Statistics and Economics,” Journal of Econometrics 142, 636–654.
A historical perspective on the origins and developments of RDD.
Thistlethwaite, D., and D. Campbell (1960): "Regression-discontinuity Analysis: An alternative to the ex post
facto experiment", Journal of Educational Psychology, 51, 309-317.
Probably the classical paper on the regression discontinuity design.
Campbell, D. and J. Stanley (1963): „Experimental and quasi-experimental designs for research on teaching“, in:
N. Gage, ed., Handbook of research on teaching (Rand McNally, Chicago), 171-246.
Another classical paper on the regression discontinuity design.
Trochim, W. (1984), Research Design for Program Evaluation: the Regression Discontinuity Approach, Beverly
Hills: Sage Publications.
A classical book on the regression discontinuity design.
Selected important papers
Hahn, J., P. Todd, and W. van der Klaauw (2001): "Identification and Estimation of Treatment Effects Using a
Regression Discontinuity Design", Econometrica, 69, 201-209.
Discussion of nonparametric assumptions necessary for identification in RDD.
Frölich, M. (2007): “Regression Discontinuity Design with Covariates,” Discussion Paper No. 3024.
The natural extension of the Hahn, Todd, and van der Klaauw paper, incorporating covariates that may be necessary for RDD
to be plausible.
Frölich, M., and B. Melly (2010): “Quantile Treatment Effects in the Regression Discontinuity Design: Process
Results and Gini Coefficient,” IZA DP 4993.
The extension to quantile treatment and similar effects.
A test about manipulating the cut-off
McCrary, J. (2008): „Manipulation of the running variable in the regression discontinuity design: A density test“,
Journal of Econometrics, 142, 698–714.
RDD assumptions are violated if agents are able to manipulate the running variable. This paper develops a test of
manipulation related to continuity of the running variable density function.
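The logic of the test can be illustrated with a deliberately crude binned version: compare the number of observations just left and just right of the cut-off. This is only a sketch of the idea on simulated data; McCrary's actual test is based on smoothed local-linear density estimates on each side of the cut-off:

```python
import numpy as np

def density_jump_check(r, cutoff=0.0, h=0.1):
    """Crude manipulation check: compare counts of the running variable r
    in [cutoff-h, cutoff) and [cutoff, cutoff+h].  Under a continuous
    density, each of these observations falls left or right with
    probability ~1/2, giving the normal-approximation z-statistic below."""
    left = np.sum((r >= cutoff - h) & (r < cutoff))
    right = np.sum((r >= cutoff) & (r <= cutoff + h))
    z_stat = (right - left) / np.sqrt(left + right)
    return left, right, z_stat

rng = np.random.default_rng(2)
r = rng.uniform(-1, 1, size=10_000)
# simulate manipulation: units just below the cut-off push themselves above it
bunch = (r > -0.05) & (r < 0)
r[bunch] = -r[bunch]
print(density_jump_check(r))  # large positive z: bunching above the cut-off
```

A large z-statistic flags exactly the kind of bunching at the cut-off that invalidates the RDD continuity assumption.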
Bandwidth choice in non-parametric RDD estimation
Imbens, Guido, and Karthik Kalyanaraman (2012): Optimal Bandwidth Choice for the Regression Discontinuity
Estimator, Review of Economic Studies, 79, 933–959. doi:10.1093/restud/rdr043.
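With the bandwidth taken as given, a sharp-RDD point estimate is just two local linear regressions. A minimal sketch with a rectangular kernel and a hand-picked bandwidth on simulated data (choosing the bandwidth optimally is exactly what Imbens and Kalyanaraman address; nothing below implements their selector):

```python
import numpy as np

def sharp_rdd(y, r, cutoff=0.0, h=0.25):
    """Sharp RDD: regress y on an intercept and (r - cutoff) separately on
    each side within bandwidth h (rectangular kernel); the effect is the
    jump between the two fitted intercepts at the cut-off."""
    def intercept(side):
        m = side & (np.abs(r - cutoff) <= h)
        X = np.column_stack([np.ones(m.sum()), r[m] - cutoff])
        coef, *_ = np.linalg.lstsq(X, y[m], rcond=None)
        return coef[0]
    return intercept(r >= cutoff) - intercept(r < cutoff)

rng = np.random.default_rng(3)
n = 20_000
r = rng.uniform(-1, 1, size=n)              # running variable
d = (r >= 0).astype(float)                  # sharp assignment at the cut-off
y = 1.0 + 0.5 * r + 1.5 * d + rng.normal(scale=0.5, size=n)
print(sharp_rdd(y, r))                      # close to the true jump of 1.5
```

Widening h lowers variance but risks bias from curvature away from the cut-off; the bandwidth selectors in this subsection formalize that trade-off.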
Inference
Calonico, Sebastian, Matias D. Cattaneo, Rocio Titiunik (2014): Robust Nonparametric Confidence Intervals for
Regression-Discontinuity Designs, Econometrica, forthcoming.
Using kinks instead of jumps
The following papers exploit a discontinuity in the first derivative instead of the regression itself.
Card, D., D. S. Lee, Z. Pei, and A. Weber (2014): "Inference on Causal Effects in a Generalized Regression
Kink Design," Working paper, UC-Berkeley.
Dong, Y. (2014): "Jumpy or Kinky? Regression Discontinuity Without The Discontinuity," working paper, UC-Irvine.
RDD and LATE
Frölich, M., and M. Lechner (2010): "Exploiting Regional Treatment Intensity for the Evaluation of Labour
Market Policies," Journal of the American Statistical Association, 105, 1014-1029, doi:
10.1198/jasa.2010.ap08148 .
Although framed as a local IV paper, this paper can also be viewed as exploiting a regional discontinuity.
Increasing the external validity of RDD
Angrist, Joshua, and Miikka Rokkanen (2012): Wanna Get Away? RD Identification Away from the Cutoff,
NBER Working Paper No. 18662.
They develop a procedure to get some idea about effect sizes further away from the cut-off. Of course there is a price to
pay: they need further identifying assumptions.
Some important applications
Broockman, D. E. (2009): “Do Congressional Candidates Have Reverse Coattails? Evidence from a Regression
Discontinuity Design,” Political Analysis, 17(4):418-434; doi:10.1093/pan/mpp013.
He uses voting districts with very narrow margins (such that the outcome of the election can be seen as random) to analyse
whether the winning party of the congressional election has an advantage in the presidential election.
Black, Sandra E. (1999): "Do Better Schools Matter? Parental Valuation of Elementary Education", The
Quarterly Journal of Economics, 114, 577-599.
This paper uses local neighbourhoods around the boundaries of school districts to analyse how parents value
school quality through house prices.
Angrist, J. D., and V. Lavy (1999): „Using Maimonides‘ Rule to Estimate the Effects of Class Size on Scholastic
Achievement“, The Quarterly Journal of Economics, 114, 533-575.
This paper estimates the effects of class size on student outcomes by exploiting a rule used in Israel for splitting classes.
Before-after
Senn, Stephen (2011): "Francis Galton and regression to the mean," Significance, September 2011, 124-146.
Galton founded many concepts in statistics, among them correlation, quartile, and percentile. Here Stephen Senn examines
one of Galton’s most important statistical legacies – one that is at once so trivial that it is blindingly obvious, and so deep
that many scientists spend their whole career being fooled by it.
Very, very easy read and not only related to before-after comparisons.
Difference-in-difference
General
Heckman, J.J., and V.J. Hotz (1989): "Choosing Among Alternative Nonexperimental Methods for Estimating
the Impact of Social Programs: The Case of Manpower Training", Journal of the American Statistical
Association, 84, 862-880 (includes comments by Holland and Moffitt).
They claim that one can find a good model for treatment effects by appropriate testing (this paper invents the famous
pre-programme test). The claim is controversial.
Heckman, J.J., H. Ichimura, J. Smith, and P. Todd (1998): "Characterizing Selection Bias Using Experimental
Data", Econometrica, 66, 1017-1098.
They suggest a nonparametric (semiparametric?) difference-in-difference estimator.
Bertrand, M., E. Duflo, and S. Mullainathan (2004): "How Much Should We Trust Differences-in-Differences
Estimates?", Quarterly Journal of Economics, 119, 249-275.
They are concerned about the underestimation of standard errors in standard DiD applications and provide solutions.
Lechner, M. (2010): "Difference-in-Difference Estimation," Foundations and Trends in Econometrics, 4, 165–
224.
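The canonical two-group, two-period DiD discussed in these references is a double difference of cell means (equivalently, the interaction coefficient in a linear regression with group and time dummies). A minimal sketch on simulated data (numpy only; all numbers illustrative):

```python
import numpy as np

def did(y, treated, post):
    """Two-group / two-period difference-in-differences of cell means."""
    cell = lambda t, p: y[(treated == t) & (post == p)].mean()
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

rng = np.random.default_rng(4)
n = 40_000
treated = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
# group fixed effect (1.0), common time trend (0.5), true effect (2.0)
y = 1.0 * treated + 0.5 * post + 2.0 * treated * post + rng.normal(size=n)
print(did(y, treated, post))  # close to 2.0
```

The double difference removes both the permanent group difference and the common time trend, which is exactly the common-trend assumption the surveys above scrutinize.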
Generalizing the linear DiD regression model
Abadie, A. (2005): "Semiparametric difference-in-difference estimation", Review of Economic Studies.
Using inverse-probability-weighting (IPW) to adjust for differences due to covariates.
Athey Susan and Guido W. Imbens (2006): "Identification and Inference in Nonlinear Difference-In-Differences
Models," Econometrica, Vol. 74, No. 2 (March, 2006), 431–497.
They relax the units-of-measurement dependence of the DiD approach.
Bonhomme, S., and U. Sauder (2011): "Recovering Distributions in Difference-In-Differences Models: A
Comparison of Selective and Comprehensive Schooling," The Review of Economics and Statistics, 93, 479–
494.
They derive a formulation of DiD in the form of differences in the logs of characteristic functions. The model is somewhat
flexible with respect to the functional forms of time-constant and time-varying error terms and covariates, but still
measurement dependent. It neither nests nor is nested in the Athey-Imbens (2006) approach.
Puhani, Patrick A. (2012): The treatment effect, the cross difference, and the interaction term in nonlinear
‘‘difference-in-differences’’ models, Economics Letters, 115, 85–87.
An interesting note on non-linear DiD’s.
Very early applications
Snow, J. (1855), On the Mode of Communication of Cholera, 2nd ed., London: John Churchill.
Probably the first paper using DiD (from epidemiology); it is about the spread of cholera and water quality in London.
Rose, A.M. (1952): "Needed Research on the Mediation of Labour Disputes", Personnel Psychology, 5, 187-200.
Very early application (psychology). The effect of mandatory mediation on work disputes in the USA.
Lester, R.A. (1946): "Shortcomings of marginal analysis for the wage-employment problems", American
Economic Review, 36, 63-82.
Probably the first application in economics, analysing the effect of minimum wages on employment.
Simon, J.L. (1966): "The Price Elasticity of Liquor in the U.S. and a Simple Method of Determination",
Econometrica, 34(1), 193-205.
A very early application.
Ashenfelter, O. (1978): "Estimating the Effect of Training Programs on Earnings", The Review of Economics
and Statistics, 60(1), 47-57.
The source of the famous Ashenfelter's dip.
Cook, P.J., and G. Tauchen (1982): "The Effect of Liquor Taxes on Heavy Drinking", The Bell Journal of
Economics, 13(2), 379-390.
Abney, F. Glenn, and Larry B. Hill (1966): "Natural Disasters as a Political Variable: The Effect of a
Hurricane on an Urban Election", The American Political Science Review, 60 (4), 974-981,
http://www.jstor.org/stable/1953770.
Political science.
Important applications
Ashenfelter, O., and D. Card (1985): "Using the Longitudinal Structure of Earnings to Estimate the Effect of
Training Programs", The Review of Economics and Statistics, 67 (4), 648-660.
Card, D. (1990): "The Impact of the Mariel Boatlift on the Miami Labor Market", Industrial and Labor
Relations Review, 43/2, 245-257.
Card, D. and A. B. Krueger (1994): Minimum Wages and Employment: A Case Study of the Fast-Food Industry
in New Jersey and Pennsylvania, The American Economic Review, 84 (4), 772-793.
Meyer, B. D., W. K. Viscusi, and D. L. Durbin (1995): "Workers' Compensation and Injury Duration: Evidence
from a Natural Experiment", The American Economic Review, 85 (3), 322-340.
Waldfogel, J. (1998): "The Family Gap for Young Women in the United States and Britain: Can Maternity
Leave Make a Difference", Journal of Labor Economics, 16, 505-545.
Blundell, R., A. Duncan, and C. Meghir (1998): "Estimating Labor Supply Responses Using Tax Reforms",
Econometrica, 66/4, 827-861.
Acemoglu, D., and J.D. Angrist (2001): "Consequences of Employment Protection? The Case of the Americans
with Disabilities Act", Journal of Political Economy, 109, 915-957.
Besley, T., and R. Burgess (2004): "Can Labor Regulation Hinder Economic Performance? Evidence From
India," The Quarterly Journal of Economics, 119, 91-134.
Blundell, R., C. Meghir, M. Costa Dias, and J. van Reenen (2004): "Evaluating the Employment Impact of a
Mandatory Job Search Program", Journal of the European Economic Association, 2, 569-606.
DiD and matching on lagged outcomes
Chabé-Ferret, Sylvain (2015): Analysis of the bias of Matching and Difference-in-Difference under alternative
earnings and selection processes, Journal of Econometrics 185, 110–123, doi:
10.1016/j.jeconom.2014.09.013.
Using an example from the US job training literature, he compares matching using lagged outcomes to difference-in-difference
estimation.
The view from statistics
Rosenbaum, P. (2001): "Stability in the Absence of Treatment", Journal of the American Statistical Association,
96, 210-219.
Using experiments to validate non-experimental estimators: The LaLonde et al. debate
LaLonde, R.J. (1986): "Evaluating the Econometric Evaluations of Training Programs with Experimental Data",
American Economic Review, 76, 604-620.
A paper that raised serious doubts about the relevance of microeconometric studies and started a race to find more robust
estimators (and use them more thoughtfully).
Heckman, J.J., and V.J. Hotz (1989): "Choosing Among Alternative Nonexperimental Methods for Estimating
the Impact of Social Programs: The Case of Manpower Training", Journal of the American Statistical
Association, 84, 862-880 (includes comments by Holland and Moffitt).
Heckman, J.J. , Hidehiko Ichimura, and Petra E. Todd (1997): "Matching as an Econometric Evaluation
Estimator: Evidence from Evaluating a Job Training Program", The Review of Economic Studies, 64, 605-654.
Dehejia, R.H. and Sadek Wahba (1999): "Causal Effects in Non-experimental Studies: Reevaluating the
Evaluation of Training Programmes", Journal of the American Statistical Association, 94, 1053-1062.
Dehejia, R.H. and Sadek Wahba (2002): "Propensity score-matching methods for nonexperimental causal
studies", Review of Economics and Statistics, 84, 151-161.
Smith, J. and P. Todd (2005): “Does Matching Overcome LaLonde’s Critique of Nonexperimental Estimators?”,
Journal of Econometrics, 125, 305-353.
Dehejia, R.H. (2005): "Practical propensity score estimation: a reply to Smith and Todd", Journal of
Econometrics, 125, 355-364.
Dehejia, R.H. (2005): "Practical propensity score estimation: a reply to Smith and Todd – A Postscript", mimeo.
Wilde, E., and R. Hollister (2007): "How Close is Close Enough? Evaluating Propensity Score Matching Using
Data from a Class Size Reduction Experiment", Journal of Policy Analysis and Management, 26, 455-477.
Using Tennessee's Teacher Achievement Ratio Project, they compare matching results to the experimental results and
find a poor fit.
Peikes, D. N., L. Moreno, and S.M. Orzol (2008): "Propensity Score Matching: A Note of Caution for
Evaluators of Social Programs," The American Statistician, August 2008, Vol. 62, No. 3, 222-231, DOI:
10.1198/000313008X332016
Another study showing p-score matching may be far away from the experimental estimates.
Shadish, W.R., M.H. Clark, and P.M. Steiner (2008): "Can Nonrandomized Experiments Yield Accurate
Answers? A Randomized Experiment Comparing Random and Nonrandom Assignments", Journal of the
American Statistical Association, Applications and Case Studies, December 2008, Vol. 103, No. 484,
1334-1356, DOI 10.1198/016214508000000733, including extensive and very informative discussion.
Probably the cleanest comparison, as the authors compare the results from an observational arm and an
experimental arm of the same type of treatment.
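Much of this debate concerns how sensitive matching estimates are in practice. As a reference point, here is a minimal sketch of nearest-neighbour matching on a single observed covariate (simulated data where selection is on this observable, so matching is consistent by construction; real propensity-score matching replaces the covariate by an estimated score, and the debate above is about what happens when these ideal conditions fail):

```python
import numpy as np

def nn_match_att(y, d, x):
    """Average effect on the treated by 1-nearest-neighbour matching
    (with replacement) on the scalar covariate x."""
    yt, xt = y[d == 1], x[d == 1]
    yc, xc = y[d == 0], x[d == 0]
    idx = np.abs(xt[:, None] - xc[None, :]).argmin(axis=1)  # closest control
    return (yt - yc[idx]).mean()

rng = np.random.default_rng(5)
n = 4_000
x = rng.normal(size=n)                        # observed confounder
d = (x + rng.normal(size=n) > 0).astype(int)  # selection on x only
y = 1.0 + 2.0 * x + 1.0 * d + rng.normal(scale=0.5, size=n)  # true effect = 1
print(nn_match_att(y, d, x))  # close to 1
```

A naive treated-control mean contrast in this design would be badly biased because treated units have higher x; matching on x removes that imbalance.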
Mediation analysis
Early key papers
Baron, Reuben M., and David A. Kenny (1986): The Moderator-Mediator Variable Distinction in Social
Psychological Research: Conceptual, Strategic, and Statistical Considerations, Journal of Personality and
Social Psychology, 51, No. 6, 1173-1182.
Probably the paper that popularized the mediator approach.
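Baron and Kenny's procedure decomposes the total effect into a direct part and an indirect (mediated) part via linear regressions; the indirect effect is the product of the treatment-to-mediator and mediator-to-outcome coefficients. A minimal sketch on simulated data (numpy only; the decomposition is valid here only because the simulated mediator is unconfounded, which is the assumption the later literature scrutinizes):

```python
import numpy as np

def ols(y, regressors):
    """OLS coefficients with an intercept prepended."""
    X = np.column_stack([np.ones(len(y))] + list(regressors))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(6)
n = 50_000
t = rng.integers(0, 2, size=n).astype(float)  # randomized treatment
m = 0.8 * t + rng.normal(size=n)              # mediator
y = 0.5 * t + 1.5 * m + rng.normal(size=n)    # outcome

a = ols(m, [t])[1]           # treatment -> mediator   (true 0.8)
b = ols(y, [t, m])[2]        # mediator  -> outcome    (true 1.5)
direct = ols(y, [t, m])[1]   # direct effect           (true 0.5)
print(a * b, direct)         # indirect ~ 1.2, direct ~ 0.5
```

The papers below show when this product-of-coefficients logic carries over to a nonparametric potential-outcome framework and when it does not.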
Robins, James, and Sander Greenland (1992): Identifiability and Exchangeability of Direct and Indirect Effects,
Epidemiology, 3, 143-155.
Note: The G-computation algorithm, well known from his dynamic papers, plays a major role here as well.
Pearl, Judea (2001): Direct and Indirect Effects, In Proceedings of the Seventeenth Conference on Uncertainty in
Artificial Intelligence, San Francisco, CA: Morgan Kaufmann, 411-420.
This paper uses a language not rooted in the potential outcome framework but based on graphical methods and their
analytical counterparts. To this day it is unclear which of the two approaches to causality is more fruitful.
Nice overviews of the non-parametric literature on a non-technical level
Imai, Kosuke, Luke Keele, Dustin Tingley, Teppei Yamamoto (2011): Unpacking the Black Box of Causality:
Learning about Causal Mechanisms from Experimental and Observational Studies, American Political
Science Review, 105, 765-789, doi:10.1017/S0003055411000414.
Sequential ignorability
Imai, Kosuke, Luke Keele, Dustin Tingley (2010): A General Approach to Causal Mediation Analysis,
Psychological Methods, 15, No. 4, 309–334, DOI: 10.1037/a0020761.
Nice overview.
Imai, Kosuke, Teppei Yamamoto (2012): Identification and Sensitivity Analysis for Multiple Causal
Mechanisms: Revisiting Evidence from Framing Experiments, mimeo.
They show that when there are multiple mediators, ignoring them in estimation requires their independence from the
mediator considered. They develop a (semi-)parametric framework based on a random-coefficients linear model in which
they can allow dependent mediators and still identify their effects.
Huber, Martin (2012): Identifying causal mechanisms in experiments (primarily) based on inverse probability
weighting, mimeo.
This paper uses somewhat different identification conditions for the case of endogenous mediator-outcome confounders and
proposes weighting methods for estimation.
Designing experiments to be able to identify mediation effects
Imai, Kosuke, Dustin Tingley, and Teppei Yamamoto (2013): Experimental designs for identifying causal
mechanisms, Journal of the Royal Statistical Society A, 176, 1-47.
These experiments are complex and may only be possible in specific situations; some of them are, however, very relevant for
empirical economics.
Instrumental variables
Sobel, M. E. 2008. “Identification of Causal Parameters in Randomized Studies with Mediating Variables.”
Journal of Educational and Behavioral Statistics 33 (2): 230–51.
Jo, B. 2008. “Causal Inference in Randomized Experiments with Mediational Processes.” Psychological
Methods 13 (4): 314–36.
See also the discussions and proposals in Imai, Keele, and Tingley (2010).
Useful background reading for econometric methods
Wooldridge, J. (2010): Econometric Analysis of Cross Section and Panel Data, 2nd edition, Cambridge: MIT
Press.
Racine, Jeffrey S. (2008): "Nonparametric Econometrics: A Primer," Foundations and Trends in Econometrics,
3 (1), 1–88, DOI: 10.1561/0800000009.