Dynamic Bayesian Network Modeling of
Game Based Diagnostic Assessments
Roy Levy
Measurement and Statistical Analysis
T. Denny Sanford School of Social and Family Dynamics
Arizona State University
CRESST Conference
Warp Speed, Mr. Sulu: Integrating Games, Technology, and
Assessment to Accelerate Learning in the 21st Century
Redondo Beach, CA
April 2014
Thanks!
This research was supported by the Center for Advanced
Technology in Schools (CATS), PR/Award Number
R305C080015, as administered by the Institute of Education
Sciences, U.S. Department of Education. The opinions expressed
are those of the author and do not necessarily reflect the positions
or policies of the Center for Advanced Technology in Schools
(CATS), the National Center for Education Research (NCER),
the Institute of Education Sciences (IES), or the U.S. Department
of Education.
Levy, R. (2014). Dynamic Bayesian network modeling of game based diagnostic assessments (CRESST Report 837). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Background & Context
Save Patch
(Chung et al., 2010; Kerr & Chung, 2012a, 2012b)
• Game targeting rational number equivalence, addition
• Students engage in levels
• Goal is to get Patch from the start position to target destination
given the layout, ropes
• Lay out ropes, set Patch in motion, observe what happens
– Success on a level leads to more complicated levels
– Advanced levels involve more complicated math (fractions),
more complicated layouts, and gaming features (picking up keys,
coins)
– Unsuccessful? Try again (and again, and again, …)
• Evidence of utility as a learning environment, assessment
(Delacruz, Chung, & Baker, 2010; Kerr & Chung, 2012b)
Targeted Aspects of Proficiency (Skills)
• Whole Numbers: Levels 1-3
• Unit Fractions: Levels 4-8
• Whole Numbers and Unit Fractions: Levels 9-12
• Crossing the Unit Bar: Levels 13-15
• Adding Unit Fractions: Levels 16-19
• Adding Improper Fractions: Levels 20-23
Characterization of Behaviors
Associated with Strategies and Misconceptions
• Cluster analyses of student behaviors captured in log files led to
identification of (Kerr & Chung, 2012a; Kerr, Chung, & Iseli, 2011):
– Several strategies: some solution strategies, others
misconceptions, others related to gameplay (not considered)
– Behaviors associated with each
• Distinguish different types of solutions, which may
differentially constitute evidence of proficiency, efficiency,
strategies, etc.
• Distinguish different types of errors, which may differentially
constitute evidence of lack of proficiency or misconceptions
• Evidence identification rules for student performances on
levels, yielding observable variables
Possible Solution Strategies
(Present on Most Levels)
• Standard Solution
• Fractional Solution
• Alternate Solution
• Incomplete Solution
Possible Errors & Corresponding Misconceptions
(Total; Not All Present On All Levels)
Observable Value: Corresponding Misconception(s)
• Wrong Numerator: Iterating Error
• Saw As Mixed Number: Converting to Wholes Error
• Counted Hash Marks: Partitioning Error
• Counted Hash Marks and Posts: Partitioning Error
• Saw As One Unit: Unitizing Error
• Saw As Wholes: Unitizing Error
• Saw As One Unit and Counted Hash Marks: Partitioning Error, Unitizing Error
• Saw As One Unit and Counted Hash Marks and Posts: Partitioning Error, Unitizing Error
• Everything In Order: Avoiding Math
• Unknown Error
Key Features for Psychometric Modeling
• 6 aspects of proficiency across 23 modeled levels (1 each)
• 5 misconceptions linked to behaviors, some of which may be
possible on a particular level
• Feedback (student knows whether a level was completed correctly)
• Possibility (really, hope) of student learning during gameplay
• Performances not conditionally independent even given
proficiency (you know what you did, and how it turned out,
for the most part)
Dynamic Bayesian Networks for Game Based
Assessment
Bayesian Networks for Game Based Assessment
• Games are appealing assessment environments
– Shared structure
– Digitally based games afford opportunities for complex
assessment arguments
– Monitoring learning and change over time
• Bayesian networks (BNs) a promising approach for
psychometric modeling of games and related environments
• Latent variables for aspects of proficiency (Master/Nonmaster)
and misconceptions (Possess/Do not Possess)
• Observable variables as summaries of performance (type of
solution, error)
Bayesian Network Psychometric Models

[Diagram: latent node θ (proficiency & misconceptions, multivariate) with an arrow to observable node X (performance). Performance depends on proficiency and misconceptions:
- Masters of skills should do better than nonmasters
- Those possessing certain misconceptions are more likely to exhibit certain behaviors]
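As a minimal sketch of the inference this diagram encodes, the following updates belief about a binary proficiency from one polytomous observable via Bayes rule. All probability values here are invented for illustration; they are not the calibrated Save Patch values.

```python
# Sketch: Bayes-rule update for a binary latent proficiency (theta) given
# one polytomous observable (X). All probabilities are illustrative.

prior = {"master": 0.5, "nonmaster": 0.5}

# P(X | theta): masters favor the standard solution, nonmasters the errors
cpt = {
    "master":    {"standard": 0.90, "incomplete": 0.05, "error": 0.05},
    "nonmaster": {"standard": 0.30, "incomplete": 0.30, "error": 0.40},
}

def posterior(observed):
    """P(theta | X = observed) via Bayes rule with normalization."""
    joint = {s: prior[s] * cpt[s][observed] for s in prior}
    z = sum(joint.values())
    return {s: p / z for s, p in joint.items()}

# A standard solution raises the probability of mastery from .50 to .75
print(posterior("standard"))
```

Under these numbers, observing a standard solution moves P(master) from .50 to .75; the same machinery extends to multivariate θ and polytomous X.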
DBNs for Game Based Assessment
• Dynamic BNs (DBNs) for modeling longitudinal data
– Latent Markov models, latent transition models, growth models
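The latent Markov machinery behind a DBN can be sketched as discrete-state filtering: predict the latent state forward through a transition matrix, then condition on each new observable. The transition and emission values below are illustrative (the "once a master, always a master" constraint is from the talk; the other numbers are not).

```python
# Sketch of DBN filtering across time slices: predict with a transition
# matrix, then update on each observable. Numbers are illustrative.

transition = {  # P(state at t+1 | state at t); absorbing mastery state
    "master":    {"master": 1.0, "nonmaster": 0.0},
    "nonmaster": {"master": 0.2, "nonmaster": 0.8},
}
emission = {  # P(observable | state)
    "master":    {"correct": 0.9, "incorrect": 0.1},
    "nonmaster": {"correct": 0.4, "incorrect": 0.6},
}

def filter_states(prior, observations):
    belief = dict(prior)
    for obs in observations:
        # Predict: push the current belief through the transition model
        predicted = {
            s2: sum(belief[s1] * transition[s1][s2] for s1 in belief)
            for s2 in belief
        }
        # Update: condition on the observed performance at this slice
        joint = {s: predicted[s] * emission[s][obs] for s in predicted}
        z = sum(joint.values())
        belief = {s: p / z for s, p in joint.items()}
    return belief

belief = filter_states({"master": 0.3, "nonmaster": 0.7}, ["correct", "correct"])
```

Two correct performances in a row push the mastery probability well above its starting value, illustrating how beliefs accumulate across attempts.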
Dynamic Bayesian Network
Psychometric Models

[Diagram: two time slices. Latent node θt (proficiencies, misconceptions at time t) points to observable Xt (performance at time t); likewise θt+1 points to Xt+1.]
Dynamic Bayesian Network
Psychometric Models

[Diagram: as above, with an added arrow from θt to θt+1. Current proficiency depends on past proficiency:
- Once a master, always a master
- Transition away from (to?) misconceptions]
Dynamic Bayesian Network
Psychometric Models

[Diagram: as above, with an added arrow from Xt to θt+1. Current proficiency depends on past performance, due to feedback on performance.]
Dynamic Bayesian Network
Psychometric Models

[Diagram: the full model, with arrows θt → Xt, θt+1 → Xt+1, θt → θt+1, and Xt → θt+1, repeated across time slices.]
DBNs for Game Based Assessment
• Model structure & conditional probabilities typically specified
in advance by subject matter experts
• Desire to calibrate the models, estimate conditional
probabilities and associated parameters
– Synthesize a priori theory, data
• Challenges
– Typically applied to models with dichotomous latent and
observable variables
– Polytomous observables from characterization of performance in a
fairly open workspace
– Longitudinal dependence structures
Model Specification:
Observables Dependent on Latent Variables
• Item response type models for structuring conditional
probabilities of observables given latent variables, similar to:
– General Diagnostic Model (GDM, von Davier, 2005) and
Scaling Individuals and Classifying Misconceptions (SICM,
Bradshaw & Templin, in press) model in diagnostic modeling
– multidimensional item response theory (MIRT, Reckase, 2009)
• Fully Bayesian approach, prior distributions
– reflect substantive theory (e.g., Standard solution most reflective
of mastery)
– resolve problems of label switching
– otherwise diffuse
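An item-response-type structuring of the conditional probabilities, in the spirit of the GDM/SICM-style models cited above, can be sketched as a softmax over observable categories with main effects of mastery and misconception status. The category names and all parameter values below are illustrative, not the estimated parameters.

```python
# Sketch: item-response-type parameterization of a CPT. Each category k
# gets an intercept plus main effects of mastery and misconception
# possession; probabilities come from a softmax. Values are illustrative.
import math

categories = ["standard", "alternate", "incomplete", "wrong_numerator"]
intercept = {"standard": 0.0, "alternate": -2.0, "incomplete": -1.5, "wrong_numerator": -1.0}
mastery_effect = {"standard": 2.0, "alternate": 0.0, "incomplete": -1.0, "wrong_numerator": -2.0}
misconception_effect = {"standard": -1.0, "alternate": 0.0, "incomplete": 0.0, "wrong_numerator": 2.0}

def category_probs(master, possesses_misconception):
    """P(X = k | mastery, misconception) via a softmax over linear logits."""
    logits = {
        k: intercept[k]
           + mastery_effect[k] * master
           + misconception_effect[k] * possesses_misconception
        for k in categories
    }
    z = sum(math.exp(v) for v in logits.values())
    return {k: math.exp(v) / z for k, v in logits.items()}

probs = category_probs(master=1, possesses_misconception=0)
```

A prior constraint such as "the standard solution is most reflective of mastery" amounts to a prior favoring a large positive mastery effect on that category, which also resolves label switching.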
Model Specification:
Transition of Latent Variables
• Probability of transitioning to mastery, away from
misconceptions (to misconceptions?)
• Data sparseness an issue
– Infrequent occurrences of certain combinations of variables
– Logical difficulties associated with transition to mastery given
success on last level that targets particular skill
• Fully Bayesian approach mitigates these
– Prior specifications based on theory
– Hierarchical prior specification addresses logical problem by
borrowing strength from other, similar situations
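As a simplified stand-in for the hierarchical prior described above, the sketch below pools transition counts across similar levels through a shared Beta prior; a level with sparse data is shrunk toward the pooled mean instead of an extreme estimate. The counts and hyperparameters are invented for illustration (a full hierarchical model would also estimate the hyperparameters).

```python
# Sketch: Beta-Binomial partial pooling for transition-to-mastery
# probabilities. Sparse levels borrow strength from the shared prior.
# All counts and hyperparameters below are invented for illustration.

# (transitions to mastery, opportunities) observed on similar levels
counts = {"level_A": (12, 30), "level_B": (9, 25), "level_C": (0, 2)}

# Shared Beta(a, b) prior pooled across levels (assumed hyperparameters)
a, b = 4.0, 6.0

def posterior_mean(successes, trials):
    """Posterior mean of the Beta-Binomial model: (a + s) / (a + b + n)."""
    return (a + successes) / (a + b + trials)

for level, (s, n) in counts.items():
    print(level, round(posterior_mean(s, n), 3))
```

The sparse level with 0 observed transitions in 2 opportunities gets a posterior mean of 4/12 ≈ .33 rather than 0, which is the "borrowing strength" behavior that addresses the logical problem on final levels.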
Model Fitting
• 851 6th-8th graders playing Save Patch
• Markov chain Monte Carlo estimation via OpenBUGS
– Complicated by different number of attempts at each level
– Fit groups of levels, as software could manage
– Hierarchical prior specification aids in this
Representative Results:
Conditional Probability of Performance, Level 19

Latent Variables            Observable for Level 19
Add. Unit    Iterating      Stand.   Alt.     Inc.     Wrong    Unknown
Fractions    Error          Solut.   Solut.   Solut.   Numer.   Error
Master       ~Possess       0.95     0.00     0.01     0.03     0.01
Nonmaster    ~Possess       0.58     0.02     0.01     0.25     0.13
Master       Possess        0.77     0.00     0.01     0.21     0.01
Nonmaster    Possess        0.33     0.01     0.01     0.58     0.07
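Using the Level 19 conditional probabilities above, a single Bayes-rule update shows how an observed performance shifts belief over the four latent-state combinations. The uniform prior over states is an assumption for illustration only.

```python
# Worked sketch: posterior over the Level 19 latent-state combinations
# using the conditional probabilities reported on this slide, under an
# assumed (illustrative) uniform prior over the four combinations.

states = [
    ("Master", "~Possess"), ("Nonmaster", "~Possess"),
    ("Master", "Possess"), ("Nonmaster", "Possess"),
]
# P(observable | state); columns as in the table above
cpt = {
    ("Master", "~Possess"):    {"standard": 0.95, "alt": 0.00, "inc": 0.01, "wrong_num": 0.03, "unknown": 0.01},
    ("Nonmaster", "~Possess"): {"standard": 0.58, "alt": 0.02, "inc": 0.01, "wrong_num": 0.25, "unknown": 0.13},
    ("Master", "Possess"):     {"standard": 0.77, "alt": 0.00, "inc": 0.01, "wrong_num": 0.21, "unknown": 0.01},
    ("Nonmaster", "Possess"):  {"standard": 0.33, "alt": 0.01, "inc": 0.01, "wrong_num": 0.58, "unknown": 0.07},
}

def posterior(observed, prior=None):
    """Normalize prior-weighted likelihoods over the latent states."""
    prior = prior or {s: 0.25 for s in states}
    joint = {s: prior[s] * cpt[s][observed] for s in states}
    z = sum(joint.values())
    return {s: p / z for s, p in joint.items()}

post = posterior("wrong_num")
```

Under the uniform prior, observing a Wrong Numerator makes Nonmaster with the Iterating Error misconception the most probable state, as the table's pattern suggests.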
Representative Results: Transition Probability
for Adding Unit Fractions, Level 19

Add. Unit Fractions     Observable for Level 19
at Time t               Stand.   Alt.     Inc.     Wrong    Unknown
                        Solut.   Solut.   Solut.   Numer.   Error
Master                  1        1        1        1        1
Nonmaster               .38      .17      .19      .20      .09
Representative Results: Transition Probability
for Iterating Error, Level 19

Iterating Error         Observable for Level 19
at Time t               Stand.   Alt.     Inc.     Wrong    Unknown
                        Solut.   Solut.   Solut.   Numer.   Error
Possess                 .51      .51      .51      .29      .46
Not Possess             0        0        0        .15      .14
Model-Based Reasoning
About Students During Gameplay

[Figure: P(Mastery/Possession) on the vertical axis (0.2 to 1.0) plotted across a student's attempts (1 through 22), with separate trajectories for Whole Numbers, Unit Fractions, Whole Numbers and Unit Fractions, Crossing Unit Bar, Add Unit Fractions, and Add Improper Fractions.]
Concluding Thoughts
Discussion
• GBA argumentation: the openness of the workspace and of data
collection offers enormous potential
• The appeal is also the challenge: what to pay attention to, how to
reason to desired inferences, claims
• Challenges to traditional psychometrics
– Feedback, learning during gameplay, lack of conditional
independence
• DBNs are flexible, accommodating complexities of GBA not
encountered in traditional assessment
– Hard and soft constraints reflecting theory
– Bayesian approach offers theoretical, practical advantages
• Psychometric models as imperfect representation of assessor’s
imperfect thinking
Thank You!
[email protected]
References
Bradshaw, L., & Templin, J. (in press). Combining item response theory and diagnostic classification models: A
psychometric model for scaling ability and diagnosing misconceptions. Psychometrika.
Chung, G. K. W. K., Baker, E. L., Vendlinski, T. P., Buschang, R. E., Delacruz, G. C., Michiuye, J. K., & Bittick,
S. J. (2010, April). Testing instructional design variations in a prototype math game. In R. Atkinson (Chair),
Current perspectives from three national R&D centers focused on game-based learning: Issues in learning,
instruction, assessment, and game design. Structured poster session at the annual meeting of the American
Educational Research Association, Denver, CO.
Delacruz, G. C., Chung, G. K. W. K., & Baker, E. L. (2010). Validity evidence for games as assessment
environments (CRESST Research Report No. 773). Los Angeles: National Center for Research on
Evaluation, Standards, and Student Testing (CRESST), Center for Studies in Education, UCLA.
Kerr, D., & Chung, G. K. W. K. (2012a). Identifying key features of student performance in educational video
games and simulations through cluster analysis. Journal of Educational Data Mining, 4, 144-182.
Kerr, D., & Chung, G. K. W. K. (2012b). The mediation effect of in-game performance between prior knowledge
and posttest score (CRESST Research Report No. 819). Los Angeles: National Center for Research on
Evaluation, Standards, and Student Testing (CRESST), Center for Studies in Education, UCLA.
Kerr, D., Chung, G. K. W. K., & Iseli, M. R. (2011). The feasibility of using cluster analysis to examine log data
from educational video games (CRESST Research Report No. 790). Los Angeles: National Center for
Research on Evaluation, Standards, and Student Testing (CRESST), Center for Studies in Education, UCLA.
Levy, R. (2014). Dynamic Bayesian network modeling of game based diagnostic assessments (CRESST Report
837). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards,
and Student Testing (CRESST).
Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.
von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16). Princeton, NJ:
Educational Testing Service.