
Big-Data Methods
Measurement
Statistical Innovations for Health and Educational Research
Steven Andrew Culpepper
Department of Statistics
University of Illinois at Urbana-Champaign
Educational Testing Service
May 6, 2016
Methodological Areas for Innovation
Statistical inference for large-scale surveys involving latent variables.
Collaboration with Trevor Park.
AERA Grant #2014-03610-00
Fine-grained measurement of cognitive skills and attributes.
Collaboration with Jeff Douglas, Yuguo Chen, and Yinghan Chen.
NAEP Overview
NAEP reports what America’s students know and can do in various subject areas.
To reduce testing time, large-scale assessments administer to each student only a random sample of the items in a given content area.
Because each student answers relatively few items, individual score estimates are highly variable, so NAEP measures achievement for groups rather than individuals.
A multiple imputation (MI) procedure (e.g., Rubin, 1987; Thomas, 1997) uses student, teacher, and school administrator survey responses to predict group proficiency.
Because these survey responses comprise hundreds of variables, researchers should instead use statistical methods designed for high-dimensional inference.
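Estimates computed from each set of plausible values are typically pooled with Rubin's combining rules: the pooled estimate is the mean of the M estimates, and the total variance is W + (1 + 1/M)B, where W is the average sampling variance and B is the variance of the estimates across imputations. A minimal Python sketch with hypothetical numbers (the function name `combine_rubin` is ours, not NAEP's):

```python
import statistics

def combine_rubin(estimates, variances):
    """Pool M completed-data analyses with Rubin's (1987) combining rules.

    estimates: point estimates, one per plausible value (imputation)
    variances: squared standard errors of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need >= 2 imputations and one variance per estimate")
    q_bar = statistics.fmean(estimates)     # pooled point estimate
    w = statistics.fmean(variances)         # average within-imputation variance
    b = statistics.variance(estimates)      # between-imputation variance (n - 1 divisor)
    total = w + (1 + 1 / m) * b             # total variance
    return q_bar, total

# Hypothetical group means from five plausible values, each with SE = 0.02.
est, var = combine_rubin([0.50, 0.52, 0.48, 0.51, 0.49], [0.02 ** 2] * 5)
print(f"estimate = {est:.3f}, SE = {var ** 0.5:.4f}")
```

The between-imputation term B is what carries the extra uncertainty from predicting proficiency rather than observing it directly.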
Why Model Selection in Large-Scale Testing?
Large-scale surveys include hundreds of variables.
With so many candidate predictors, identifying the most relevant predictors of achievement is challenging.
Applied researchers need high-dimensional regression procedures for testing hypotheses.
The resulting model could support the redesign of background questionnaires.
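Model selection with Laplace-type priors is closely related to the lasso: with Gaussian errors and independent Laplace priors on the coefficients, the posterior mode is a lasso estimate. The NumPy sketch below is a generic coordinate-descent lasso on simulated data, offered only to show how such a penalty sets irrelevant coefficients exactly to zero; it is not the GAL model, and `lasso_cd`, `soft_threshold`, and the simulation settings are hypothetical.

```python
import numpy as np

def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; returns exactly 0 when |z| <= gamma."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/(2n))||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    Assumes y and the columns of X are centered, so no intercept is fit.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n    # (1/n) x_j'x_j for each predictor
    resid = y.astype(float).copy()       # residual y - Xb with b = 0
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]                  # add back predictor j's contribution
            rho = X[:, j] @ resid / n                # (1/n) x_j' (partial residual)
            b[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * b[j]                  # remove the updated contribution
    return b

# Simulated data (hypothetical sizes): 3 true predictors among 50 candidates.
rng = np.random.default_rng(0)
n, p = 500, 50
X = rng.standard_normal((n, p))
X -= X.mean(axis=0)
beta = np.zeros(p)
beta[:3] = [1.0, -0.8, 0.5]
y = X @ beta + rng.standard_normal(n)
y -= y.mean()

b_hat = lasso_cd(X, y, lam=0.15)
print("nonzero coefficients:", np.flatnonzero(b_hat))
```

Larger values of `lam` correspond to a more concentrated prior and a sparser, more parsimonious model.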
Bivariate Normal and Generalized Laplace Priors
NAEP Application
The GAL, MVN, and AM software were applied to the 2011 NAEP mathematics data.
NAEP administered $J = 155$ items to $N = 175{,}200$ 8th-grade students to assess mathematics achievement in $K = 5$ content areas:
algebra ($J_1 = 49$);
data analysis, statistics, and probability ($J_2 = 23$);
geometry ($J_3 = 30$);
measurement ($J_4 = 26$); and
number properties and operations ($J_5 = 27$).
The model included $G = 148$ groups comprising a total of $V = 262$ variables.
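As a quick consistency check on the counts above, the content-area item counts sum to the reported total J (the dictionary layout is just our encoding of the slide's list):

```python
# Items per content area in the 2011 NAEP mathematics assessment (from the slide).
items_per_area = {
    "algebra": 49,
    "data analysis, statistics, and probability": 23,
    "geometry": 30,
    "measurement": 26,
    "number properties and operations": 27,
}
K = len(items_per_area)               # number of content areas
J = sum(items_per_area.values())      # total number of items
print(K, J)                           # 5 155
```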
Comparison of Methods for 2011 NAEP Mathematics
Comparison of GAL and AM Software
Table: Race-Based Achievement Gaps Using the GAL Model and Plausible Values (PV)
Race: African American
Content Area   GAL EST   GAL SE   PV EST   PV SE
1              -0.498    0.027    -0.440   0.031
2              -0.515    0.034    -0.383   0.030
3              -0.559    0.031    -0.471   0.025
4              -0.595    0.030    -0.478   0.035
5              -0.573    0.029    -0.489   0.025
Note. Results are unweighted. EST = estimate; SE = standard error. Content areas: 1 = algebra; 2 = data analysis, statistics, and probability; 3 = geometry; 4 = measurement; 5 = number properties and operations.
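To make the table easier to compare, the sketch below computes, for each content area, the GAL-minus-PV difference and normal-approximation 95% intervals (estimate ± 1.96 SE) from the tabled values. Because both sets of estimates come from the same students, comparing the intervals is descriptive only, not a formal test of the difference; the helper `interval` is ours.

```python
# Unweighted African American achievement-gap estimates from the table;
# list positions 0-4 correspond to content areas 1-5.
gal_est = [-0.498, -0.515, -0.559, -0.595, -0.573]
gal_se = [0.027, 0.034, 0.031, 0.030, 0.029]
pv_est = [-0.440, -0.383, -0.471, -0.478, -0.489]
pv_se = [0.031, 0.030, 0.025, 0.035, 0.025]

Z = 1.96  # normal critical value for a 95% interval

def interval(est, se):
    """Normal-approximation confidence interval."""
    return est - Z * se, est + Z * se

diffs = [g - p for g, p in zip(gal_est, pv_est)]  # GAL minus PV
for area in range(5):
    lo_g, hi_g = interval(gal_est[area], gal_se[area])
    lo_p, hi_p = interval(pv_est[area], pv_se[area])
    print(f"area {area + 1}: GAL - PV = {diffs[area]:+.3f}; "
          f"GAL CI [{lo_g:.3f}, {hi_g:.3f}]; PV CI [{lo_p:.3f}, {hi_p:.3f}]")
```

A negative difference means the GAL estimate of the gap is larger in magnitude than the plausible-value estimate.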
Implications for Test Developers
83 of the 148 groups of variables were statistically related to achievement.
The GAL prior could be used to decide which background questions to retain, modify, or delete in subsequent data collections.
Such efforts could reduce the time students, teachers, and school administrators spend completing surveys.
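The result above counts whole groups of variables rather than individual variables. We assume, for illustration only, that a group collects the several variables coding one background question. One standard way to make a penalty keep or drop a group as a unit is the group-lasso proximal step (block soft-thresholding), sketched below in NumPy. This is a swapped-in technique, not the GAL prior; the coefficients, groups, and threshold are hypothetical.

```python
import numpy as np

def group_soft_threshold(coefs, groups, lam):
    """Proximal operator of lam * sum_g ||b_g||_2 (the group-lasso penalty).

    coefs:  1-D array of coefficients for all variables
    groups: list of index arrays, one per group of related variables
    Each group is shrunk as a block, or set exactly to zero if its norm <= lam.
    """
    out = np.zeros_like(coefs, dtype=float)
    for idx in groups:
        norm = np.linalg.norm(coefs[idx])
        if norm > lam:
            out[idx] = (1 - lam / norm) * coefs[idx]
    return out

# Hypothetical example: 6 variables in 3 groups of 2.
b = np.array([0.9, -0.4, 0.05, -0.03, 0.2, 0.6])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
shrunk = group_soft_threshold(b, groups, lam=0.3)
kept = [g for g, idx in enumerate(groups) if np.any(shrunk[idx] != 0)]
print("retained groups:", kept)  # retained groups: [0, 2]
```

Zeroing a whole block at once is what lets a selection method recommend deleting an entire question rather than one of its response categories.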
Implications for Researchers
The GAL prior and the AM software produced different estimates of the achievement gap: the GAL estimates were larger in magnitude in all five content areas.
The GAL prior also yielded a more parsimonious model in both the Monte Carlo studies and the application.
We would like to disseminate the methodology as an R package.
School in the Year 2000
- Postcard from the 1900 World Exhibition in Paris
Latent Variable Models
Latent variable models assume that a collection of unobserved traits or attributes underlies observed test or survey responses.
Most studies consider broadly defined, continuous latent variables.
Broadly defined continuous latent variables are useful for correlational research and for ranking individuals on traits.
Cognitive diagnosis models (CDMs) instead consider a set of discrete, binary attributes or skills.
Cognitive Diagnosis Models
CDMs provide more detailed diagnostic information about student skills and attributes than the broadly defined continuous traits of item response models.
CDMs have been applied in several areas, including education, pathological gambling, and anxiety disorders.
The application of CDMs depends on the availability of cognitive theory that specifies the skills and/or attributes necessary for success on a collection of tasks.
Purdue Spatial Visualization Test – Rotation (PSVT-R): Item #1
PSVT-R: Item #2
Continuous vs. Discrete Latent Variables
A continuous IRT model would assume that test-takers’ broadly defined spatial abilities can be mapped to a continuous random variable $\theta_i$.
A cognitive diagnosis model (CDM) would instead classify students into attribute classes, $\boldsymbol{\alpha}_i = (\alpha_{i1}, \ldots, \alpha_{iK})$, where
$$\alpha_{ik} = \begin{cases} 0 & \text{if student } i \text{ does not have attribute } k, \\ 1 & \text{if student } i \text{ has attribute } k. \end{cases}$$
We could classify individuals on four rotation attributes: $\alpha_{i1}$ = 90° x-axis, $\alpha_{i2}$ = 90° y-axis, $\alpha_{i3}$ = 180° x-axis, and $\alpha_{i4}$ = 180° y-axis.
Skills could form a hierarchy in which $\alpha_{i1}$ must be mastered before $\alpha_{i3}$, and $\alpha_{i2}$ before $\alpha_{i4}$.
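A hierarchy restricts which attribute profiles are possible. With $K = 4$ binary attributes there are $2^4 = 16$ profiles, but if $\alpha_{i1}$ must precede $\alpha_{i3}$ and $\alpha_{i2}$ must precede $\alpha_{i4}$, each prerequisite pair admits only three patterns, leaving $3 \times 3 = 9$ permissible profiles. The Python sketch below enumerates them; the 0-based indexing and the `prereqs` list are our own encoding of the hierarchy.

```python
from itertools import product

K = 4  # attributes: 90° x-axis, 90° y-axis, 180° x-axis, 180° y-axis
# Prerequisites as (before, after) pairs, 0-based: attribute 0 before 2, 1 before 3.
prereqs = [(0, 2), (1, 3)]

all_profiles = list(product([0, 1], repeat=K))
permissible = [
    a for a in all_profiles
    if all(a[pre] >= a[post] for pre, post in prereqs)  # "after" requires "before"
]
print(len(all_profiles), len(permissible))  # 16 9
```

Fewer permissible classes means fewer latent classes to estimate, which is one practical payoff of a well-specified hierarchy.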
Challenges
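CDMs require robust cognitive theory that clearly specifies the underlying attributes or skills.
Cognitive theory is catalogued in the Q matrix, a $J \times K$ binary matrix with $q_{jk} = 1$ if item $j$ requires attribute $k$ and $q_{jk} = 0$ otherwise.
The unavailability of cognitive theory for specifying Q limits the widespread application of CDMs.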
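To make the Q matrix concrete, the sketch below pairs a hypothetical Q matrix for the four rotation attributes with one widely used CDM, the DINA (deterministic inputs, noisy "and" gate) model. The slides do not say which CDM is used; the item-attribute entries, slip and guess values, and `dina_prob` are invented for illustration.

```python
import numpy as np

# Hypothetical Q matrix: 4 items x K = 4 rotation attributes.
# Q[j, k] = 1 if item j requires attribute k.
Q = np.array([
    [1, 0, 0, 0],   # 90° x-axis only
    [0, 1, 0, 0],   # 90° y-axis only
    [1, 0, 1, 0],   # 90° and 180° x-axis
    [0, 1, 0, 1],   # 90° and 180° y-axis
])
slip = np.full(4, 0.1)    # hypothetical P(incorrect | has all required attributes)
guess = np.full(4, 0.2)   # hypothetical P(correct | lacks a required attribute)

def dina_prob(alpha, Q, slip, guess):
    """P(correct response) for each item under the DINA model."""
    # eta_j = 1 only if the student has every attribute that item j requires;
    # alpha broadcasts against each row of Q.
    eta = np.all(alpha >= Q, axis=1)
    return np.where(eta, 1 - slip, guess)

alpha = np.array([1, 1, 0, 0])  # masters both 90° rotations only
print(dina_prob(alpha, Q, slip, guess))
```

Because every response probability runs through Q, a misspecified Q matrix distorts classification directly, which is why estimating Q matters.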
Statistical Advances
Developed statistical methodology to estimate Q. The new procedure can be used to assess existing cognitive theory or to develop new theory.
CDMs can then be applied accurately to understand learning progressions.