900 Advanced Methods for Political Science

Smidt
PLS 900
PLS 900: Statistical Models for Political Science Data
Fall 2015
M/W: 12:40-2PM
Room: 217 Bessey Hall
Instructor: Prof. Corwin D. Smidt
Office: 320 S. Kedzie Hall
Email: [email protected] – I do not reply to messages over weekends.
Phone: 353-3292
Website: d2l.msu.edu
Office Hours: M/W 3-4PM, or by appointment
Course Description
The goal of this course is for you to gain sufficient working knowledge of statistical models
that extend beyond basic least squares regression and that are common or have special
relevance to contemporary political science research. Our focus is on familiarizing you with
the skills and techniques commonly deemed essential for publishing quantitative research in
today’s journals.
We will start by developing an understanding of likelihood as a mode of statistical inference and then apply popular maximum likelihood models within political science research.
We will then proceed to discuss other methodological issues/practices common to political
science and how we address/utilize them. We will mostly focus on generalizations of the
linear model to other types of measures, but will potentially discuss: panel data/multilevel
modeling, Bayesian estimation, semi/non-parametric estimation, matching, prediction, missing data, and beyond.
Those who successfully complete this course should have a sound working knowledge of
maximum likelihood estimation and be able to successfully utilize all practices covered within
this class in their own applied research. Students are also expected to gain an awareness of
other advanced modeling techniques and topics, especially those that may be more specific
to their research interests and which may be featured within advanced methods courses in
or outside this department.
Course Prerequisites
Students should have already taken a course on probability and linear regression analysis
(i.e., PLS 801 and 802).
Course Materials
Required Book
Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables.
Thousand Oaks, CA: SAGE Publications.
Recommended Books
You do not have to have these books to take this course.
Comprehensive Econometric Books: Cover course topics and much more. Each has
its own strengths and weaknesses. Search the Internet, go to the library, or come to my
office and read them if you are interested in knowing which one to buy:
Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications. New York: Cambridge University Press.
Greene, William H. 2011. Econometric Analysis 7th Edition (6th is good too). Upper
Saddle River, NJ: Prentice Hall.
Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data 2nd
Edition. Cambridge, MA: MIT Press.
Software-Specific Books: Most often you have the choice to complete class assignments
in R or in Stata. While the code to estimate such models in both programs will be covered
in class, you are certain to come across problems and questions with each program during
your own estimation. In such cases I suggest you consider one or both of the following:
Hardin, James W., and Joseph M. Hilbe. 2012. Generalized Linear Models and Extensions, Third Edition. College Station, TX: Stata Press.
Long, J. Scott and Jeremy Freese. 2006. Regression Models for Categorical Dependent
Variables Using Stata. College Station, TX: Stata Press.
Maindonald, John and John Braun. 2007. Data Analysis and Graphics Using R: An Example Based Approach. New York, NY: Cambridge University Press.
In-depth Treatments of Specific Course Topics:
Agresti, Alan. 2002. Categorical Data Analysis. 2nd Edition. Hoboken, NJ: John Wiley & Sons.
Agresti, Alan. 2010. Analysis of Ordinal Categorical Data. 2nd Edition. Hoboken, NJ:
John Wiley & Sons.
Box-Steffensmeier, Janet M. and Bradford S. Jones. 2004. Event History Modeling: A
Guide for Social Scientists. New York: Cambridge University Press.
Cameron, A. Colin, and Pravin K. Trivedi. 1998. Regression Analysis of Count Data.
New York: Cambridge University Press.
Gelman, Andrew and Jennifer Hill. 2007. Data Analysis using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press.
Greene, William H., and David A. Hensher. 2010. Modeling Ordered Choices: A Primer. New York: Cambridge
University Press.
Jackman, Simon. 2011. Bayesian Analysis for the Social Sciences. Hoboken, NJ: John
Wiley & Sons.
Keele, Luke. 2008. Semiparametric Regression for the Social Sciences. Hoboken, NJ: John
Wiley & Sons.
King, Gary. 1989. Unifying Political Methodology. New York: Cambridge University Press.
Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference.
New York: Cambridge University Press.
Skrondal, Anders and Sophia Rabe-Hesketh. 2004. Generalized Latent Variable Modeling:
Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: CRC Press.
Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. New York: Cambridge
University Press.
Computing Resources
Most class assignments will allow you to use either R or Stata for completion; however,
others might require you to use a method only available within one program. If you are
unfamiliar with either program I would recommend you familiarize yourself with it in this
class. It is your responsibility to make sure you have adequate access to a computer for class
assignments.
Requirements and Grading
Your grade will comprise the following components:
1. Assignments (40%): After a couple of weeks, assignments will be given on an approximately
bi-weekly basis. They will often require you to estimate various models, interpret
your findings, present your results, and perform diagnostics using a statistical software
program. Along with answers, students need to include an annotated copy of the code
they used to develop their answers.
2. Application/Replication Paper (40%): You need either a data set with which to test and write up a research question of your own, or an article and corresponding data
set, preferably within your substantive area of interest, with which to perform a replication and/or
modification testing its research question. Either way, your paper needs to use techniques learned in this class. You are required to write up a description and presentation
of your analysis in a paper, which will be due at the end of the semester. The paper
should showcase your ability to use course methods to address substantive questions
and your ability to discuss and present your findings in an accurate and persuasive
manner. For data or papers to replicate, go to the ICPSR website (http://icpsr.org/)
or Harvard’s Dataverse website (http://dvn.iq.harvard.edu/dvn/).
3. Data Analysis Presentations (20%): Once we get into dichotomous dependent variables,
each student will give an in-class presentation of his or her efforts to answer a research
question using one of the models and presentation methods discussed in class. The goal
of discussions is to evaluate how well the statistical method and presentation of results
address your substantive research question or test your hypothesis. This presentation
will often be made in coordination with your final paper work.
Grading in this class follows typical graduate school conventions. A 4.0 represents very good
work, a 3.5 represents adequate completion of the course, and a 3.0 or lower generally indicates less than adequate or worse performance.
Note: For your benefit, I do not favor giving out incompletes. I also do not accept late
assignments.
Schedule
Most of the journal articles can be found on JSTOR or alternative library databases. For those
readings not available there, electronic copies are on the class website. We will see how
far we get into this.
Core Content
1. Introduction, Notation, Software, and Other Basics
2. Understanding Likelihood Theory and Inference
• Long Ch. 1-2
• Pawitan p. 1-48 (on d2l)
• King Ch. 4 (on d2l)
3. Dichotomous Dependent Variables: Logit/Probit, Extensions (Rare Events, Scobit,
Heteroskedastic Models, Bivariate Probit . . . )
• Long Ch. 3
• Nagler, Jonathan. 1994. “Scobit: An Alternative Estimator to Logit and Probit.”
American Journal of Political Science 38: 230-255.
• King, Gary and Langche Zeng. 2001. “Explaining Rare Events in International
Relations.” International Organization 55: 693-715.
• Alvarez, R. Michael and John Brehm. 1995. “American Ambivalence Towards
Abortion Policy: Development of a Heteroskedastic Probit Model of Competing
Values.” American Journal of Political Science 39: 1055-1082.
4. Getting the Substance from MLE: Interpretation, Predicted Probabilities, Marginal
Effects, and Model Fit
• Long Ch. 4
• King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most
of Statistical Analyses: Improving Interpretation and Presentation.” American
Journal of Political Science 44: 347-361.
• King, Gary, and Langche Zeng. 2006. “The Dangers of Extreme Counterfactuals”
Political Analysis 14: 131-159.
• Hanmer, Michael J., and Kerem Ozan Kalkan. 2013. “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal
Effects from Limited Dependent Variable Models.” American Journal of Political
Science. 57: 263-77.
• Greenhill, Brian, Michael D. Ward, and Audrey Sacks. 2011. “The Separation Plot: A
New Visual Method for Evaluating the Fit of Binary Models.” American Journal
of Political Science 55: 991-1002.
5. Interactions, Variance Correction, and Bootstrapping in ML Models
• Freedman, David A. 2006. “On the So-Called ‘Huber Sandwich Estimator’ and
‘Robust Standard Errors.”’ The American Statistician 60: 299-302.
• Greene, William. 2010. “Testing hypotheses about interaction terms in nonlinear
models.” Economics Letters 107: 291-296.
• Berry, William D., Jacqueline H.R. DeMeritt, and Justin Esarey. 2010. “Testing
for Interactions in Binary Logit and Probit Models: Is a Product Term Essential?”
American Journal of Political Science 54: 248-266.
6. Panel/Longitudinal/Multilevel Data Models: Between and Within Effects, Fixed Effects, Random Effects, Variance Correction Approaches
• Gelman, Andrew and Jennifer Hill. 2007. Data Analysis using Regression and
Multilevel/Hierarchical Models (selected chapters)
• Steenbergen, Marco and Bradford Jones. 2002. “Modeling Multilevel Data Structures.” American Journal of Political Science 46: 218-237.
• Beck, Nathaniel and Jonathan Katz. 1995. “What to Do (and Not to Do) with
Time-Series Cross-Section Data.” American Political Science Review 89: 634
• Bartels, Brandon. nd. “Beyond Fixed vs. Random Effects: A Framework
for Improving Substantive and Statistical Analysis of Panel, Time-Series Cross-Sectional, and Multilevel Data.” (on d2l)
7. Ordinal Dependent Variables: Ordered Logit and Probit and Extensions (Generalized
Ordered Logit)
• Long Ch. 5
• King, Gary, Christopher J.L. Murray, Joshua A. Salomon, and Ajay Tandon.
2004. “Enhancing the Validity and Cross-Cultural Comparability of Measurement
in Survey Research.” American Political Science Review 98: 191-207.
8. Unordered or Semi-Ordered Dependent Variables: Multinomial Logit, Multinomial
Probit, Mixed/Random Parameters Logit, Ranked Logit, Stereotype Logit
• Long Ch. 6
• Glasgow, Garrett. 2001. “Mixed Logit Models for Multiparty Elections.” Political
Analysis 9: 116-36.
• Malhotra, Neil, and Alexander G. Kuo. 2008. “Attributing Blame: The Public’s
Response to Hurricane Katrina.” Journal of Politics 70: 120-135.
9. Limited Dependent Variable Models: Censoring, Tobit, Truncated regression, Heckman
Models, Mixture/Split-Population Models
• Long Ch. 7
• Dubin, Jeffrey A. and Douglas Rivers. 1989. “Selection Bias in Linear Regression, Logit and Probit Models.” Sociological Methods and Research 18: 360-390.
• Sartori, Anne E. 2003. “An Estimator for Some Binary-Outcome Selection Models
Without Exclusion Restrictions.” Political Analysis 11: 111-138.
• Bagozzi, Benjamin E., and Bumba Mukherjee. 2012. “A Mixture Model for
Middle Category Inflation in Ordered Survey Responses.” Political Analysis 20:
369-386.
10. Counting Distributions: Poisson, Negative Binomial, Hurdle, and Zero-Inflated Models
• Long Ch. 8
11. Duration Distributions (Event History): Parametric, Semi-Parametric (Cox), Discrete
Time
• Box-Steffensmeier, Janet M. and Bradford S. Jones. 2004. Event History Modeling: A Guide for Social Scientists (selected chapters)
• Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. “Taking Time
Seriously: Time-Series-Cross-Section Analysis with a Binary Dependent Variable”
American Journal of Political Science 42: 1260-1288. (also note Erratum for
Figure 1).
• Carter, David B., and Curtis S. Signorino. 2010. “Back to the Future: Modeling
Time Dependence in Binary Data.” Political Analysis 18: 271-292
• Beck, Nathaniel. 2010. “Time is Not a Theoretical Variable.” Political Analysis.
18: 293-294.
• Svolik, Milan. 2008. “Authoritarian Reversals and Democratic Consolidation.”
American Political Science Review 102: 153-168.
Additional Content: Depending on student interest
1. Model dependencies, model comparisons (i.e., Bayes Factor), and model-independent
inference (Matching, Regression Discontinuity)
• Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal
Inference (selected chapters)
• Arceneaux, Kevin, Alan S. Gerber, and Donald P. Green. 2006. “Comparing
Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment.” Political Analysis 14:37-62.
• Eggers, Andrew C., and Jens Hainmueller. 2009. “MPs for Sale? Returns to Office
in Postwar British Politics.” American Political Science Review 103(4): 513-33.
• Iacus, Stefano M., Gary King, and Giuseppe Porro. 2011. “Causal Inference
Without Balance Checking: Coarsened Exact Matching.” Political Analysis
• Imai, Kosuke, and Dustin Tingley. 2012. “A Statistical Method for Empirical
Testing of Competing Theories.” American Journal of Political Science 56(1):
218-36.
• Hainmueller, Jens. 2012. “Entropy Balancing for Causal Effects: A Multivariate
Reweighting Method to Produce Balanced Samples in Observational Studies.”
Political Analysis 20(1): 25-46.
• Montgomery, Jacob, and Brendan Nyhan. 2010. “Bayesian Model Averaging:
Theoretical Developments and Practical Applications.” Political Analysis 18(2):
245-270.
2. Bayesian methods
• Western, Bruce and Simon Jackman. 1994. “Bayesian Inference for Comparative
Research.” American Political Science Review 88:412-23.
• Jackman, Simon. 2000. “Estimation and Inference via Bayesian Simulation:
an Introduction to Markov Chain Monte Carlo.” American Journal of Political
Science 44: 375-404.
• Jackman, Simon. nd. “Statistical Inference: Classical vs. Bayesian.” (on d2l)
3. Nonlinear structural equation and endogenous regressor models (Instrumental Variables, 2SLS, Control Function, Special Regressors, Causal Mediation).
• Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods
and Applications (selected chapters)
• Skrondal, Anders and Sophia Rabe-Hesketh. 2004. Generalized Latent Variable
Modeling: Multilevel, Longitudinal, and Structural Equation Models. (selected
chapters)
• Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. “A General Approach to
Causal Mediation Analysis” Psychological Methods 15(4): 309-334
• Lewbel, Arthur, Yingying Dong, and Thomas Tao Yang. 2012. “Comparing Features of Convenient Estimators for Binary Choice Models with Endogenous Regressors.” Canadian Journal of Economics. Available at: www2.bc.edu/~lewbel/simple-comparisons5.pdf
4. Missing Data problems (Multiple Imputation, Ecological Inference): Probably just
Gary King’s Stuff
5. Prediction/Variable Selection/Data-Mining (Random Forest; Neural Networks; Lasso;
I-score)
A Couple Last Things
Group Work and Academic Misconduct
I recognize that working in groups is essential to scientific progress. You are allowed to collaborate with others to complete the assignment portion of this class. You can work together
but your answers should be submitted individually and reflect your personal understanding of
the topic. If you did work with others, please also indicate what other students you worked
with on the assignment. Working in groups is not allowed for the paper component of this
class.
Academic misconduct will not be tolerated. Cheating or plagiarism is an insult to me,
your peers, and yourself. Instances of cheating will be handled
according to the school’s policy on integrity of scholarship and grades.
Electronic Submissions
As a general rule, students should always submit their work in paper form. If, under special
circumstances, you are submitting a document electronically, then you need to submit it in
an archival format. This means no modifiable Word/Text documents (.doc, .txt, .rtf) and
instead formats where content is fixed (.pdf, .ps).