Smidt PLS 900 PLS 900: Statistical Models for Political Science Data Fall 2015 M/W: 12:40-2PM Room: 217 Bessey Hall Instructor: Prof. Corwin D. Smidt Office: 320 S. Kedzie Hall Email: [email protected] – I do not reply to messages over weekends. Phone: 353-3292 Website: d2l.msu.edu Office Hours: M/W 3-4PM, or by appointment Course Description The goal of this course is for you to gain sufficient working knowledge of statistical models that extend beyond basic least squares regression and that are common or have special relevance to contemporary political science research. Our focus is on familiarizing you with the skills and techniques commonly deemed essential for publishing quantitative research in today’s journals. We will start by developing an understanding of likelihood as a mode of statistical inference and then apply popular maximum likelihood models within political science research. We will then proceed to discuss other methodological issues/practices common to political science and how we address/utilize them. We will mostly focus on generalizations of the linear model to other types of measures, but will potentially discuss: panel data/multilevel modeling, Bayesian estimation, semi/non-parametric estimation, matching, prediction, missing data, and beyond. Those who successfully complete this course should have a sound working knowledge of maximum likelihood estimation and be able to successfully utilize all practices covered within this class in their own applied research. Students are also expected to gain an awareness of other advanced modeling techniques and topics, especially those that may be more specific to their research interests and which may be featured within advanced methods courses in or outside this department. Course Prerequisites Students should have already taken a course on probability and linear regression analysis (i.e., PLS 801 and 802). Smidt PLS 900 Course Materials Required Book Long, J. Scott. 1997. Regression Models for Categorical and Limited Dependent Variables Thousand Oaks, CA: SAGE Publications. Recommended Books You do not have to have these books to take this course. Comprehensive Econometric Books: Cover course topics and much more. Each has its own strengths and weaknesses. Search the Internet, go to the library, or come to my office and read them if you are interested in knowing which one to buy: Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications New York: Cambridge University Press. Greene, William H. 2011. Econometric Analysis 7th Edition (6th is good too). Upper Saddle River, NJ: Prentice Hall. Wooldridge, Jeffrey M. 2010. Econometric Analysis of Cross Section and Panel Data 2nd Edition. Cambridge, MA: MIT Press. Software-Specific Books: Most often you have the choice to complete class assignments in R or in Stata. While the code to estimate such models in both programs will be covered in class, you are certain to come across problems and questions with each program during your own estimation. In such cases I suggest you consider one or both of the following: Hardin, James W., and Joseph M. Hilbe. 2012. Generalized Linear Models and Extensions, Third Edition College Station, TX: Stata Press. Long, J. Scott and Jeremy Freese. 2006. Regression Models for Categorical Dependent Variables Using Stata College Station, TX: Stata Press. Maindonald, John and John Braun. 2007. Data Analysis and Graphics Using R: An Example Based Approach. New York, NY: Cambridge University Press. In-depth Treatments of Specific Course Topics: Agresti, Alan. 2002. Categorical Data Analysis. 2nd Edition. Hoboken, NJ: John Wiley & Sons. Agresti, Alan. 2010. Analysis of Ordered Categorical Data. 2nd Edition. Hoboken, NJ: John Wiley & Sons. Smidt PLS 900 Box-Steffensmeier, Janet M. and Bradford S. Jones. 2004. Event History Modeling: A Guide for Social Scientists. New York: Cambridge University Press. Cameron, A. Colin, and Pravin K. Trivedi. 1998. Regression Analysis of Count Data. New York: Cambridge University Press. Gelman, Andrew and Jennifer Hill. 2007. Data Analysis using Regression and Multilevel/Hierarchical Models. New York: Cambridge University Press. Greene, William. 2010. Analysis of Ordered Categorical Data. New York: Cambridge University Press. Jackman, Simon. 2011. Bayesian Analysis for the Social Sciences. Hoboken, NJ: John Wiley & Sons. Keele, Luke. 2008. Semiparametric Regression for the Social Sciences. Hoboken, NJ: John Wiley & Sons. King, Gary. 1989. Unifying Political Methodology. New York: Cambridge University Press. Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference. New York: Cambridge University Press. Skrondal, Anders and Sophia Rabe-Hesketh. 2004. Generalize Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. Boca Raton, FL: CRC Press. Train, Kenneth E. 2009. Discrete Choice Methods with Simulation. New York: Cambridge University Press. Computing Resources Most class assignments will allow you to use either R or Stata for completion, however others might require you to use a method only available within one program. If you are unfamiliar with either program I would recommend you familiarize yourself with it in this class. It is your responsibility to make sure you have adequate access to a computer for class assignments. Requirements and Grading Your grade will be comprised of the following components: 1. Assignments (40%): After a couple weeks, assignments will given on an approximate bi-weekly basis. They will often require you to estimate various models, interpret your findings, present your results, and perform diagnostics using a statistical software Smidt PLS 900 program. Along with answers, students need to include annotated copy of the code they used to develop their answers. 2. Application/Replication Paper (40%): You need a data set to test and write up a research question of your own or you will need to find an article and corresponding data set, preferably within your substantive area of interest, to perform a replication and/or modification testing its research question. Either way, your paper needs to use techniques learned in this class. You are required to write up a description and presentation of your analysis in a paper which will be due at the end of the semester. The paper should showcase your ability to use course methods to address substantive questions and your ability to discuss and present your findings in an accurate and persuasive manner. For data of papers to replicate go to the ICPSR website (http://icpsr.org/) or Harvard’s Dataverse website (http://dvn.iq.harvard.edu/dvn/). 3. Data Analysis Presentations (20%): Once we get into dichotomous dependent variables, each student will give an in-class presentation of his or her efforts to answer a research question using one of the models and presentation methods discussed in class. The goal of discussions is to evaluate how well the statistical method and presentation of results address your substantive research question or test your hypothesis. This presentation will often be made in coordination with your final paper work. Grading in this class follows typical graduate school conventions. A 4.0 represents very good work, a 3.5 represents adequate completion of the course, and a 3.0 or lower generally indicates less than adequate or worse performance. Note: For your benefit, I do not favor giving out incompletes. I also do not accept late assignments. Schedule Most of the journal articles are found on Jstor or alternative library databases. For those readings not available, electronic copies are available on the class website. We will see how far we get into this. Core Content 1. Introduction, Notation, Software, and Other Basics 2. Understanding Likelihood Theory and Inference • Long Ch. 1-2 • Pawitan p. 1-48 (on d2l) • King Ch. 4 (on d2l) 3. Dichotomous Dependent Variables: Logit/Probit, Extensions (Rare Events, Scobit, Heteroskedastic Models, Bivariate Probit . . . ) Smidt PLS 900 • Long Ch. 3 • Nagler, Jonathan. 1994. “Scobit: An Alternative Estimator to Logit and Probit.” American Journal of Political Science 38: 230-255. • King, Gary and Langche Zeng. 2001. “Explaining Rare Events in International Relations.” International Organization 55: 693-715. • Alvarez, R. Michael and John Brehm. 1995. “American Ambivalence Towards Abortion Policy: Development of a Heteroskedastic Probit Model of Competing Values.” American Journal of Political Science 39: 1055-1082. 4. Getting the Substance from MLE: Interpretation, Predicted Probabilities, Marginal Effects, and Model Fit • Long Ch. 4 • King, Gary, Michael Tomz, and Jason Wittenberg. 2001. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” American Journal of Political Science 44: 347-361. • King, Gary, and Langche Zeng. 2006. “The Dangers of Extreme Counterfactuals” Political Analysis 14: 131-159. • Hanmer, Michael J., and Kerem Ozan Kalkan. 2013. “Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models.” American Journal of Political Science. 57: 263-77. • Greenhill, Michael D. Ward, and Audrey Sacks. 2011. “The Separation Plot: A New Visual Method for Evaluating the Fit of Binary Models.” American Journal of Political Science 55: 991-1002. 5. Interactions, Variance Correction, and Bootstrapping in ML Models • Freedman, David A. 2006. “On the So-Called ‘Huber Sandwich Estimator’ and ‘Robust Standard Errors.”’ The American Statistician 60: 299-302. • Greene, William. 2010 “Testing hypotheses about interaction terms in nonlinear models.” Economics Letters 107: 291-296. • Berry, William D., Jackqueline H.R. DeMeritt, and Justin Esarey. 2010. “Testing for Interactions in Binary Logit and Probit Models: Is a Product Term Essential?” American Journal of Political Science 54: 248-266. 6. Panel/Longitudinal/Multilevel Data Models: Between and Within Effects, Fixed Effects, Random Effects, Variance Correction Approaches • Gelman, Andrew and Jennifer Hill. 2007. Data Analysis using Regression and Multilevel/Hierarchical Models (selected chapters) • Steenbergen, Marco and Bradford Jones. 2002. “Modeling Multilevel Data Structures” American Journal of Political Science 46:218-237 Smidt PLS 900 • Beck, Nathaniel and Jonathan Katz. 1995. “What to Do (and Not to Do) with Time-Series Cross-Section Data.” American Political Science Review 89: 634 • Bartels, Brandon. nd. “Beyond Fixed vs. Random Effects: A Framework for Improving Substantive and Statistical Analysis of Panel, Time-Series CrossSectional, and Multilevel Data.” (on d2l) 7. Ordinal Dependent Variables: Ordered Logit and Probit and Extensions (Generalized Ordered Logit) • Long Ch. 5 • King, Gary, Christopher J.L. Murray, Joshua A. Salomon, and Ajay Tandon. 2004. “Enhancing the Validity and Cross-Cultural Comparability of Measurement in Survey Research.” American Political Science Review 98: 191-207. 8. Unordered or Semi-Ordered Dependent Variables: Multinomial Logit, Multinomial Probit, Mixed/Random Parameters Logit, Ranked Logit, Stereotype Logit • Long Ch. 6 • Glasgow, Garrett. 2001. “Mixed Logit Models for Multiparty Elections.” Political Analysis 9: 116-36. • Malhotra, Neil, and Alexander G. Kuo. 2008. “Attributing Blame: The Public’s Response to Hurricane Katrina.” Journal of Politics 70: 120-135. 9. Limited Dependent Variable Models: Censoring, Tobit, Truncated regression, Heckman Models, Mixture/Split-Population Models Models • Long Ch. 7 • Durbin, Jeffrey A. and Douglas Rivers. 1989. “Selection Bias in Linear Regression, Logit and Probit Models.” Sociological Methods and Research 18: 360-390 • Sartori, Anne E. 2003. “An Estimator for Some Binary-Outcome Selection Models Without Exclusion Restrictions.” Political Analysis 11: 111-138. • Bagozzi, Benjamin E., and Bumba Mukherjee. 2012. “A Mixture Model for Middle Category Inflation in Ordered Survey Responses.” Political Analysis 20: 369-386. 10. Counting Distributions: Poissons, Negative Binomial, Hurdle, and Zero-Inflated Models • Long Ch. 8 11. Duration Distributions (Event History): Parametric, Semi-Parametric (Cox), Discrete Time • Box-Steffensmeier, Janet M. and Bradford S. Jones. 2004. Event History Modeling: A Guide for Social Scientists (selected chapters) Smidt PLS 900 • Beck, Nathaniel, Jonathan N. Katz, and Richard Tucker. 1998. “Taking Time Seriously: Time-Series-Cross-Section Analysis with a Binary Dependent Variable” American Journal of Political Science 43: 1260-1288. (also note Erratum for Figure 1). • Carter, David B., and Curtis S. Signorino. 2010. “Back to the Future: Modeling Time Dependence in Binary Data.” Political Analysis 18: 271-292 • Beck, Nathaniel. 2010. “Time is Not a Theoretical Variable.” Political Analysis. 18: 293-294. • Svolik, Milan. 2008. “Authoritarian Reversals and Democratic Consolidation.” American Political Science Review 102: 153-168. Additional Content: Depending on student interest 1. Model dependencies, model comparisons (i.e., Bayes Factor), and model-independent inference (Matching, Regression Discontinuity) • Morgan, Stephen L., and Christopher Winship. 2007. Counterfactuals and Causal Inference (selected chapters) • Arceneaux, Kevin, Alan S. Gerber, and Donald P. Green. 2006. “Comparing Experimental and Matching Methods Using a Large-Scale Voter Mobilization Experiment.” Political Analysis 14:37-62. • Eggers, Andrew C., and Jens Hainmuller. 2009. “MPs for Sale? Returns to Office in Postwar British Politics” American Political Science Review 103(4): 513-33. • Iacus, Stefano M., Gary King, and Giuseppe Porro. 2011. “Causal Inference Without Balance Checking: Coarsened Exact Matching.” Political Analysis • Imai, Kosuke, and Dustin Tingley. 2012. “A Statistical Method for Empirical Testing of Competing Theories.” American Journal of Political Science 56(1): 218-36. • Hainmuller, Jens. 2012. “Entropy Balancing for Causal Effects: A Multivariate Reweighting Method to Produce Balanced Samples in Observational Studies.” Political Analysis 20(1): 25-46. • Montgomery, Jacob, and Brendan Nyhan. 2010. “Bayesian Model Averaging: Theoretical Developments and Practical Applications.” Political Analysis 18(2): 245-270. 2. Bayesian methods • Western, Bruce and Simon Jackman. 1995. “Bayesian Inference for Comparative Research.” American Political Science Review 88:412-23. • Jackman, Simon. 2000. “Estimation and Inference via Bayesian Simulation: an Introduction to Markov Chain Monte Carlo.” American Journal of Political Science 44: 375-404. • Jackman, Simon. nd. “Statistical Inference: Classical vs. Bayesian.” (on d2l) Smidt PLS 900 3. Nonlinear structural equation and endogenous regressor models (Instrumental Variables, 2SLS, Control Function, Special Regressors, Causal Mediation). • Cameron, A. Colin, and Pravin K. Trivedi. 2005. Microeconometrics: Methods and Applications (selected chapters) • Skrondal, Anders and Sophia Rabe-Hesketh. 2004. Generalize Latent Variable Modeling: Multilevel, Longitudinal, and Structural Equation Models. (selected chapters) • Imai, Kosuke, Luke Keele, and Dustin Tingley. 2010. “A General Approach to Causal Mediation Analysis” Psychological Methods 15(4): 309-334 • Lewbel, Arthur, Yingying Dong, and Thomas Tao Yang. 2012. “Comparing Features of Convenient Estimators for Binary Choice Models with Endogenous Regressors.” Canadian Journal of Economics available at: www2.bc.edu/~lewbel/ simple-comparisons5.pdf 4. Missing Data problems (Multiple Imputation, Ecological Inference): Probably just Gary King’s Stuff 5. Prediction/Variable Selection/Data-Mining (Random Forest; Neural Networks; Lasso; I-score) A Couple Last Things Group Work and Academic Misconduct I recognize that working in groups is essential to scientific progress. You are allowed to collaborate with others to complete the assignment portion of this class. You can work together but your answers should be submited individually and reflect your personal understanding of the topic. If you did work with others, please also indicate what other students you worked with on the assignment. Working in groups is not allowed for the paper component of this class. Academic misconduct will not be tolerated. Cheating or plagiarism is an insult to me, your peers, and yourself; it is not to be tolerated. Instances of cheating will be handled according the school’s policy on integrity of scholarship and grades. Electronic Submissions As a general rule, students should always submit their work in paper form. If, under special circumstances, you are submitting a document electronically, then you need to submit it in an archival format. This means no modifiable Word/Text documents (.doc, .txt, .rtf) and instead formats where content is fixed (.pdf, .ps).
© Copyright 2026 Paperzz