Advanced Topics in Quantitative Methods: Classification and Clustering

RESCH-GE.2011 Advanced Topics in Quantitative Methods: Classification and Clustering
Spring 2013
Marc Scott
Office: 801W Kimball Hall
Phone: 212-992-9407
email: [email protected]
Lecture: Fridays 9:30am-12:00 pm
Location: TBD
Office Hours: By appointment
Text: There is no required text. Selected chapters from several sources will be made available.
Software: SPSS, SAS, R. This course will use Blackboard.
COURSE DESCRIPTION:
Classification and clustering are important statistical techniques commonly applied in many social and
behavioral science research problems. Both seek to understand social phenomena through the
identification of naturally occurring homogeneous groupings within a population. Classification
techniques are used to sort new observations into pre-existing or known groupings, while clustering
techniques sort the population under study into groupings based on their observed characteristics. Both
help to reveal hidden structure that may be used in further analyses. This course will compare and contrast
these techniques, including many of their variations, with an emphasis on applications.
COURSE REQUIREMENTS:
Participation: 10%. You are expected to attend class and participate in class discussions.
Homework problems: 20%. There will be several assigned problems intended to give you practical
experience with the methods discussed.
Data Analysis Projects: 70%. There will be two data analysis projects (worth 35% each). One will be a
cluster analysis and the other a problem in classification. Data may come from thesis research, a pilot
study, or public datasets.
COURSE READINGS: Handouts will be available on Blackboard by the Monday preceding class. It is
the student’s responsibility to print out and review the notes before coming to class.
Late assignment policy:
Assignments are to be handed in on time.
2011 SCHEDULE (subject to change in 2013)
Date    Topic
Jan 28  Introduction to classification and clustering; what is a cluster; visualization techniques, including principal components
Feb 4   Hierarchical clustering; linkage choices; distance measures; the dendrogram
Feb 11  Optimization techniques (k-means); choosing the number of groups; evaluating clusters; HW 1 DUE
Feb 18  Model-based clustering (including model selection); Nagin clusters (intro); HW 2 DUE
Feb 25  Nagin clusters (details)
Mar 4   Classification; discriminant function analysis; adding complexity; interpretation; PROJECT 1 DUE
Mar 11  Multidimensional scaling; use of dissimilarity matrices and the partitioning around medoids approach; HW 3 DUE
Mar 25  PROJECT 2 DUE (this is the Friday after spring break)
Readings
January 28:
Everitt et al., Cluster Analysis (4th Ed.), chapters 1 & 2.
February 4:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 6, sections 6.1, 6.2. Brian
F. Manly, Multivariate Statistical Methods, A Primer (2nd Ed.), chapter 9.
February 11:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 6, sections 6.3, 6.4 (this
section is helpful for Feb. 18). Peter J. Rousseeuw (1987). Silhouettes: a graphical aid to
the interpretation and validation of cluster analysis. Journal of Computational and
Applied Mathematics, 20, 53-65.
February 18:
Banfield & Raftery (1993). Model-based Gaussian and non-Gaussian clustering.
Biometrics, Vol. 49, no. 3, 803-821. Fraley and Raftery (1998). How Many Clusters?
Which Clustering Method? Answers Via Model-Based Cluster Analysis. The Computer
Journal, 41(8):578-588.
Bobby L. Jones, Daniel S. Nagin, and Kathryn Roeder (2001). A SAS Procedure Based
on Mixture Models for Estimating Developmental Trajectories. Sociological Methods &
Research (29): 374-393.
February 25:
Shaun McDermott and Daniel S. Nagin (2001). Same or Different?: Comparing Offender
Groups and Covariates Over Time. Sociological Methods & Research (29): 282-318.
March 4:
Everitt & Dunn, Applied Multivariate Data Analysis, chapter 11. Brian F. Manly,
Multivariate Statistical Methods, A Primer (2nd Ed.), chapter 8. Tabachnick & Fidell,
Using Multivariate Statistics (4th Ed.), chapter 11.
March 11:
Brian F. Manly, Multivariate Statistical Methods, A Primer (2nd Ed.), chapter 11 (skip
11.11). Sharon L. Weinberg (1991). Introduction to Multidimensional Scaling.
Measurement and Evaluation in Counseling and Development, 24(1): 12-36. Kaufman &
Rousseeuw (1990). Finding Groups in Data: an introduction to cluster analysis. Chapter
2.