RESCH-GE.2011 Advanced Topics in Quantitative Methods:
Classification and Clustering
Marc Scott
Spring 2013

Lecture: Fridays 9:30 am-12:00 pm
Location: TBD
Office Hours: By appointment
Office: 801W Kimball Hall
Phone: 212-992-9407
Email: [email protected]
Text: There is no required text. Selected chapters from several sources will be made available.
Software: SPSS, SAS, R.

This course will use Blackboard.

COURSE DESCRIPTION:
Classification and clustering are important statistical techniques commonly applied in many social and behavioral science research problems. Both seek to understand social phenomena through the identification of naturally occurring homogeneous groupings within a population. Classification techniques are used to sort new observations into pre-existing or known groupings, while clustering techniques sort the population under study into groupings based on their observed characteristics. Both help to reveal hidden structure that may be used in further analyses. This course will compare and contrast these techniques, including many of their variations, with an emphasis on applications.

COURSE REQUIREMENTS:
Participation: 10%
You are expected to attend class and participate in class discussions.

Homework problems: 20%
There will be several assigned problems intended to give you practical experience with the methods discussed.

Data Analysis Projects: 70%
There will be two data analysis projects (worth 35% each). One will be a cluster analysis and the other a problem in classification. Data may come from thesis research, a pilot study, or public datasets.

COURSE READINGS:
Handouts will be available on Blackboard by the Monday preceding class. It is the student's responsibility to print out and review the notes before coming to class.

Late assignment policy:
Assignments are to be handed in on time.

2011 SCHEDULE (subject to change in 2013)

Date     Topic
Jan 28   Introduction to classification and clustering; what is a cluster; visualization techniques, including principal components
Feb 4    Hierarchical clustering; linkage choices; distance measures; the dendrogram
Feb 11   Optimization techniques (k-means); choosing the number of groups; evaluating clusters; HW 1 DUE
Feb 18   Model-based clustering (including model selection); Nagin clusters (intro); HW 2 DUE
Feb 25   Nagin clusters (details)
Mar 4    Classification; discriminant function analysis; adding complexity; interpretation; PROJECT 1 DUE
Mar 11   Multidimensional scaling; use of dissimilarity matrices and the partitioning around medoids approach; HW 3 DUE
Mar 25   PROJECT 2 DUE (this is the Friday after spring break)

Readings

January 28:
Everitt et al., Cluster Analysis (4th Ed.), chapters 1 & 2.

February 4:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 6, sections 6.1, 6.2.
Bryan F. J. Manly, Multivariate Statistical Methods: A Primer (2nd Ed.), chapter 9.

February 11:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 6, sections 6.3, 6.4 (this section is helpful for Feb. 18).
Peter J. Rousseeuw (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20, 53-65.

February 18:
Banfield & Raftery (1993). Model-based Gaussian and non-Gaussian clustering. Biometrics, 49(3): 803-821.
Fraley & Raftery (1998). How Many Clusters? Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer Journal, 41(8): 578-588.
Bobby L. Jones, Daniel S. Nagin, and Kathryn Roeder (2001). A SAS Procedure Based on Mixture Models for Estimating Developmental Trajectories. Sociological Methods & Research (29): 374-393.

February 25:
Shaun McDermott and Daniel S. Nagin (2001). Same or Different?: Comparing Offender Groups and Covariates Over Time. Sociological Methods & Research (29): 282-318.

March 4:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 11.
Bryan F. J. Manly, Multivariate Statistical Methods: A Primer (2nd Ed.), chapter 8.
Tabachnick & Fidell, Using Multivariate Statistics (4th Ed.), chapter 11.

March 11:
Bryan F. J. Manly, Multivariate Statistical Methods: A Primer (2nd Ed.), chapter 11 (skip 11.11).
Sharon L. Weinberg (1991). Introduction to Multidimensional Scaling. Measurement and Evaluation in Counseling and Development, 24(1): 12-36.
Kaufman & Rousseeuw (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Chapter 2.