Dynamic Bayesian Network Modeling of Game-Based Diagnostic Assessments

Roy Levy
Measurement and Statistical Analysis
T. Denny Sanford School of Social and Family Dynamics
Arizona State University

CRESST Conference
Warp Speed, Mr. Sulu: Integrating Games, Technology, and Assessment to Accelerate Learning in the 21st Century
Redondo Beach, CA
April 2014

Thanks!
This research was supported by the Center for Advanced Technology in Schools (CATS), PR/Award Number R305C080015, as administered by the Institute of Education Sciences, U.S. Department of Education. The opinions expressed are those of the author and do not necessarily reflect the positions or policies of the Center for Advanced Technology in Schools (CATS), the National Center for Education Research (NCER), the Institute of Education Sciences (IES), or the U.S. Department of Education.

Levy, R. (2014). Dynamic Bayesian network modeling of game-based diagnostic assessments (CRESST Report 837). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Background & Context

Save Patch (Chung et al., 2010; Kerr & Chung, 2012a, 2012b)
• Game targeting rational number equivalence and addition
• Students engage in levels
• Goal is to get Patch from the start position to the target destination, given the layout and ropes
• Lay out ropes, set Patch in motion, observe what happens
  – Success on a level leads to more complicated levels
  – Advanced levels involve more complicated math (fractions), more complicated layouts, and gaming features (picking up keys, coins)
  – Unsuccessful? Try again (and again, and again, …)
• Evidence of utility as a learning environment and assessment (Delacruz, Chung, & Baker, 2010; Kerr & Chung, 2012b)

Targeted Aspects of Proficiency (Skills)

  Skill                               Levels
  Whole Numbers                       1–3
  Unit Fractions                      4–8
  Whole Numbers and Unit Fractions    9–12
  Crossing the Unit Bar               13–15
  Adding Unit Fractions               16–19
  Adding Improper Fractions           20–23

Characterization of Behaviors Associated with Strategies and Misconceptions
• Cluster analyses of student behaviors captured in log files led to the identification of (Kerr & Chung, 2012a; Kerr, Chung, & Iseli, 2011)
  – several strategies: some solution strategies, others misconceptions, others related to gameplay (not considered)
  – behaviors associated with each
• Distinguish different types of solutions, which may differentially constitute evidence of proficiency, efficiency, strategies, etc.
• Distinguish different types of errors, which may differentially constitute evidence of lack of proficiency or of misconceptions
• Evidence identification rules applied to student performances on levels yield observable variables

Possible Solution Strategies (Present on Most Levels)
• Standard Solution
• Fractional Solution
• Alternate Solution
• Incomplete Solution

Possible Errors & Corresponding Misconceptions (Total; Not All Present on All Levels)

  Observable Value                                    Corresponding Misconception(s)
  Wrong Numerator                                     Iterating Error
  Saw As Mixed Number                                 Converting to Wholes Error
  Counted Hash Marks                                  Partitioning Error
  Counted Hash Marks and Posts                        Partitioning Error
  Saw As One Unit                                     Unitizing Error
  Saw As Wholes                                       Unitizing Error
  Saw As One Unit and Counted Hash Marks              Partitioning Error, Unitizing Error
  Saw As One Unit and Counted Hash Marks and Posts    Partitioning Error, Unitizing Error
  Everything In Order                                 Avoiding Math
  Unknown Error                                       (none)

Key Features for Psychometric Modeling
• 6 aspects of proficiency across 23 modeled levels (1 per level)
• 5 misconceptions linked to behaviors, some of which may be possible on a particular level
•
Feedback (student knows whether a level was completed correctly or incorrectly)
• Possibility (really, hope) of student learning during gameplay
• Performances not conditionally independent, even given proficiency (you know what you did and, for the most part, how it turned out)

Dynamic Bayesian Networks for Game-Based Assessment

Bayesian Networks for Game-Based Assessment
• Games are appealing assessment environments
  – Shared structure
  – Digitally based games afford opportunities for complex assessment arguments
  – Monitoring learning and change over time
• Bayesian networks (BNs) are a promising approach for psychometric modeling of games and related environments
• Latent variables for aspects of proficiency (master/nonmaster) and misconceptions (possess/do not possess)
• Observable variables as summaries of performance (type of solution, error)

Bayesian Network Psychometric Models
[Diagram: proficiency & misconceptions (multivariate) θ → performance X]
• Performance depends on proficiency and misconceptions
  – Masters of skills should do better than nonmasters
  – Students possessing certain misconceptions are more likely to exhibit certain behaviors

DBNs for Game-Based Assessment
• Dynamic BNs (DBNs) for modeling longitudinal data
  – Latent Markov models, latent transition models, growth models

Dynamic Bayesian Network Psychometric Models
[Diagram: time slices t and t+1, with latent proficiencies and misconceptions θt → θt+1, performances θt → Xt and θt+1 → Xt+1, and a feedback arrow Xt → θt+1]
• Current proficiency depends on past proficiency
  – Once a master, always a master
  – Transition away from (to?) misconceptions
• Current proficiency also depends on past performance, due to feedback on performance

DBNs for Game-Based Assessment
• Model structure & conditional probabilities typically specified in advance by subject matter experts
• Desire to calibrate the models and estimate conditional probabilities and associated parameters
  – Synthesize a priori theory and data
• Challenges
  – BNs typically applied to models with dichotomous latent and observable variables
  – Polytomous observables arise from characterizing performance in a fairly open workspace
  – Longitudinal dependence structures

Model Specification: Observables Dependent on Latent Variables
• Item response–type models for structuring conditional probabilities of observables given latent variables, similar to
  – the General Diagnostic Model (GDM; von Davier, 2005) and the Scaling Individuals and Classifying Misconceptions (SICM; Bradshaw & Templin, in press) model in diagnostic modeling
  – multidimensional item response theory (MIRT; Reckase, 2009)
• Fully Bayesian approach; prior distributions
  – reflect substantive theory (e.g., the Standard solution is most reflective of mastery)
  – resolve problems of label switching
  – are otherwise diffuse

Model Specification: Transition of Latent Variables
• Probability of transitioning to mastery and away from (to?) misconceptions
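The emission-plus-transition structure just described can be sketched as a single forward-filtering step. This is a minimal illustration, not the fitted model: it tracks only one skill ("Adding Unit Fractions") and one misconception ("Iterating Error"), treats the two latent variables as a priori independent, and plugs in the representative Level 19 probabilities reported later in the deck; the function name `filter_step` is the author's own illustrative device.

```python
# Observable categories, in a fixed order:
# 0 Standard Solution, 1 Alternate Solution, 2 Incomplete Solution,
# 3 Wrong Numerator, 4 Unknown Error
EMISSION = {  # P(observable | mastery, misconception), Level 19 estimates
    (1, 0): [0.95, 0.00, 0.01, 0.03, 0.01],  # master,    does not possess
    (0, 0): [0.58, 0.02, 0.01, 0.25, 0.13],  # nonmaster, does not possess
    (1, 1): [0.77, 0.00, 0.01, 0.21, 0.01],  # master,    possesses
    (0, 1): [0.33, 0.01, 0.01, 0.58, 0.07],  # nonmaster, possesses
}

# P(master at t+1 | nonmaster at t, observable); masters stay masters.
P_LEARN = [0.38, 0.17, 0.19, 0.20, 0.09]
# P(possess the error at t+1 | state at t, observable).
P_KEEP_ERROR = [0.51, 0.51, 0.51, 0.29, 0.46]
P_GAIN_ERROR = [0.00, 0.00, 0.00, 0.15, 0.14]


def filter_step(p_master, p_error, obs):
    """Condition current beliefs on the observable for this attempt,
    then push the posterior through the transition model."""
    # Joint posterior over (mastery, error) given the observable,
    # assuming (for illustration) independent marginal priors.
    joint = {
        (m, e): EMISSION[m, e][obs]
        * (p_master if m else 1.0 - p_master)
        * (p_error if e else 1.0 - p_error)
        for m in (0, 1)
        for e in (0, 1)
    }
    z = sum(joint.values())
    post_master = (joint[1, 0] + joint[1, 1]) / z
    post_error = (joint[0, 1] + joint[1, 1]) / z
    # Transition to the next slice: once a master, always a master;
    # nonmasters may learn; the error may persist, fade, or appear.
    next_master = post_master + (1.0 - post_master) * P_LEARN[obs]
    next_error = (post_error * P_KEEP_ERROR[obs]
                  + (1.0 - post_error) * P_GAIN_ERROR[obs])
    return post_master, post_error, next_master, next_error


if __name__ == "__main__":
    # From 50/50 priors, a Standard Solution (obs 0) raises the mastery
    # posterior and lowers the misconception posterior.
    print(filter_step(0.5, 0.5, obs=0))
```

Chaining `filter_step` across a student's attempts yields the kind of attempt-by-attempt posterior trajectory shown on the model-based reasoning slide; the full model additionally handles all six skills, five misconceptions, and varying numbers of attempts per level.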
• Data sparseness is an issue
  – Infrequent occurrences of certain combinations of variables
  – Logical difficulties associated with the transition to mastery given success on the last level that targets a particular skill
• Fully Bayesian approach mitigates these
  – Prior specifications based on theory
  – Hierarchical prior specification addresses the logical problem by borrowing strength from other, similar situations

Model Fitting
• 851 6th–8th graders playing Save Patch
• Markov chain Monte Carlo estimation via OpenBUGS
  – Complicated by different numbers of attempts at each level
  – Fit groups of levels, as the software could manage
  – Hierarchical prior specification aids in this

Representative Results: Conditional Probability of Performance, Level 19

  Latent Variables                      Observable for Level 19
  Add. Unit    Iterating    Standard   Alternate   Incomplete   Wrong       Unknown
  Fractions    Error        Solution   Solution    Solution     Numerator   Error
  Master       ~Possess     0.95       0.00        0.01         0.03        0.01
  Nonmaster    ~Possess     0.58       0.02        0.01         0.25        0.13
  Master       Possess      0.77       0.00        0.01         0.21        0.01
  Nonmaster    Possess      0.33       0.01        0.01         0.58        0.07

Representative Results: Transition Probability for Adding Unit Fractions, Level 19
(Cell entries: P(Master at time t+1), by observable for Level 19)

  Add. Unit Fractions   Standard   Alternate   Incomplete   Wrong       Unknown
  at Time t             Solution   Solution    Solution     Numerator   Error
  Master                1          1           1            1           1
  Nonmaster             .38        .17         .19          .20         .09

Representative Results: Transition Probability for Iterating Error, Level 19
(Cell entries: P(Possess at time t+1), by observable for Level 19)

  Iterating Error   Standard   Alternate   Incomplete   Wrong       Unknown
  at Time t         Solution   Solution    Solution     Numerator   Error
  Possess           .51        .51         .51          .29         .46
  Not Possess       0          0           0            .15         .14

Model-Based Reasoning About Students During Gameplay
[Figure: P(mastery/possession) traced over attempts 1–22 for each skill: Whole Numbers; Unit Fractions; Whole Numbers and Unit Fractions; Crossing Unit Bar; Add Unit Fractions; Add Improper Fractions]

Concluding Thoughts

Discussion
• GBA argumentation: openness of the workspace and data collection offers enormous potential
• The appeal is also the challenge: what to pay attention to, how to reason to desired inferences and claims
• Challenges to traditional psychometrics
  – Feedback, learning during gameplay, lack of conditional independence
• DBNs are flexible, accommodating complexities of GBA not encountered in traditional assessment
  – Hard and soft constraints reflecting theory
  – Bayesian approach offers theoretical and practical advantages
• Psychometric models as an imperfect representation of the assessor's imperfect thinking

Thank You!
[email protected]

References

Levy, R. (2014). Dynamic Bayesian network modeling of game-based diagnostic assessments (CRESST Report 837). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Bradshaw, L., & Templin, J. (in press). Combining item response theory and diagnostic classification models: A psychometric model for scaling ability and diagnosing misconceptions. Psychometrika.

Chung, G. K. W. K., Baker, E. L., Vendlinski, T. P., Buschang, R. E., Delacruz, G. C., Michiuye, J. K., & Bittick, S. J. (2010, April). Testing instructional design variations in a prototype math game. In R. Atkinson (Chair), Current perspectives from three national R&D centers focused on game-based learning: Issues in learning, instruction, assessment, and game design. Structured poster session at the annual meeting of the American Educational Research Association, Denver, CO.

Delacruz, G. C., Chung, G. K. W. K., & Baker, E. L. (2010).
Validity evidence for games as assessment environments (CRESST Research Report No. 773). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Kerr, D., & Chung, G. K. W. K. (2012a). Identifying key features of student performance in educational video games and simulations through cluster analysis. Journal of Educational Data Mining, 4, 144–182.

Kerr, D., & Chung, G. K. W. K. (2012b). The mediation effect of in-game performance between prior knowledge and posttest score (CRESST Research Report No. 819). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Kerr, D., Chung, G. K. W. K., & Iseli, M. R. (2011). The feasibility of using cluster analysis to examine log data from educational video games (CRESST Research Report No. 790). Los Angeles, CA: University of California, National Center for Research on Evaluation, Standards, and Student Testing (CRESST).

Reckase, M. D. (2009). Multidimensional item response theory. New York, NY: Springer.

von Davier, M. (2005). A general diagnostic model applied to language testing data (RR-05-16). Princeton, NJ: Educational Testing Service.