Feature Selection for Automated Speech Scoring

Anastassia Loukina, Klaus Zechner, Lei Chen, Michael Heilman*
Educational Testing Service
*CIVIS Analytics

Copyright © 2015 by Educational Testing Service.

Overview
• Motivation
• Data
• Scoring models
• Results
• Conclusion

Context and motivation
• Scoring of constructed responses: speech
• Features computed with NLP and speech technology, from speech recognition and signal processing outputs
• Scores predicted using supervised machine learning
• Educational measurement requires managing a trade-off:
  - Maximize empirical performance
  - Maximize model interpretability

Ideal Properties of Scoring Models
• High empirical performance
• Contains features that evaluate all relevant aspects of the test construct
• Relative contribution of each feature should be obvious
• Inter-correlations between features not too high
• Polarity of feature weights corresponds to their meaning
• Smaller and simpler is better (interpretability)

Linear Regression Scoring Models Built by Human Experts
• Straightforward and well known in all disciplines
• Can address most requirements of ideal scoring models
• Disadvantage: cumbersome development due to manual selection of features and checking of all constraints

Proposed Model
• Explore alternative regression models, e.g., shrinkage methods
• Can select features automatically while still addressing ideal model constraints

Data
• Spoken English proficiency test
• Spontaneous speech, ~1 minute per response
• Score scale: 1 – 4

Data Set   Speakers   Responses   H-H Correlation
Train      9,312      9,956       0.63
Eval       8,101      47,642      0.62
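Two of the properties listed under "Ideal Properties of Scoring Models" (inter-correlations that are not too high, and weight polarity that matches feature meaning) lend themselves to automatic checks. The sketch below is illustrative only and is not taken from the slides: the function names, the feature names, and the expected-sign vector are hypothetical, and the slides do not state what level of inter-correlation counts as too high.

```python
import numpy as np


def max_intercorrelation(X):
    """Largest absolute Pearson correlation between any two distinct feature columns."""
    corr = np.corrcoef(X, rowvar=False)   # features are columns
    np.fill_diagonal(corr, 0.0)           # ignore each feature's correlation with itself
    return float(np.abs(corr).max())


def polarity_violations(coefs, expected_signs, names):
    """Names of features whose nonzero weight has the opposite sign to the expected one."""
    return [
        name
        for coef, sign, name in zip(coefs, expected_signs, names)
        if coef != 0 and np.sign(coef) != sign
    ]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))          # synthetic stand-in for a feature matrix
    print(max_intercorrelation(X))
    # Hypothetical feature names; +1 means "higher value should raise the score".
    print(polarity_violations([0.4, -0.1, 0.0], [1, 1, -1],
                              ["speaking_rate", "vocab_range", "pause_count"]))
```

A model would pass the polarity check when the returned list is empty; the inter-correlation value would be compared against whatever limit the model builder adopts.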
Features
• 75 features extracted for each response via SpeechRater
• Construct dimensions:
  – fluency
  – pronunciation accuracy
  – prosody
  – grammar
  – vocabulary
• Dimensions not covered: content, discourse

Scoring Models
1. Baseline: human expert (12 features)
2. All features using OLS regression
3. Hybrid stepwise regression
4. Non-negative least-squares regression
5. Non-negative LASSO regression (LASSO*; lambda optimized to obtain a feature set size of about 25)

LASSO
• Shrinkage model: dimensionality reduction
• Penalty for larger coefficients
• Sets a subset of coefficients to zero
• Lambda parameter: zero yields the OLS model; infinity yields a model with no features
• Optimal lambda determined empirically (target: the number of features at which performance flattens out)

Cross-validation Results

Model             Features   Negative Coeffs   Correlation
Expert baseline   12         No                0.606
All OLS           75         Yes               0.667
Hybrid stepwise   ~40        Yes               0.667
Non-negative LS   ~35        No                0.655
LASSO*            ~25        No                0.649

Results on Evaluation Set

Model             Features   Item Corr   Speaker Corr
Expert baseline   12         0.61        0.78
All OLS           75         0.67        0.86
LASSO*            25         0.65        0.84

Construct Coverage Comparison
• Computed by adding the relative standardized beta weights of the features in each construct

Construct                Expert   LASSO*
Fluency                  0.580    0.527
Pronunciation accuracy   0.098    0.151
Prosody                  0.080    0.035
Total for Delivery       0.759    0.712
Grammar                  0.155    0.103
Vocabulary               0.086    0.183
Total for Language Use   0.241    0.286
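The non-negative LASSO and the construct coverage computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses a plain coordinate-descent solver on standardized features and a centered score, picks lambda as the smallest grid value that leaves at most a target number of features (a simplification of the empirical procedure on the slides), and runs on synthetic data. All function names and the feature-to-construct mapping are hypothetical.

```python
import numpy as np


def nonneg_lasso(X, y, lam, n_iter=200):
    """Coordinate descent for
        min_b (1/2n) * ||y - X b||^2 + lam * sum(b)   subject to b >= 0.
    Assumes the columns of X are standardized and y is centered.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n       # equals 1 for standardized columns
    resid = np.array(y, dtype=float)        # residual y - X b, with b = 0 at the start
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue                    # constant column: weight stays at zero
            resid += X[:, j] * b[j]         # take feature j out of the fit
            rho = X[:, j] @ resid / n       # covariance of feature j with the partial residual
            b[j] = max(0.0, rho - lam) / col_sq[j]   # shrink by lam, clip at zero
            resid -= X[:, j] * b[j]         # put feature j back with its new weight
    return b


def select_lambda(X, y, target_size, lambdas):
    """Smallest lambda on the grid whose model keeps at most target_size features."""
    for lam in sorted(lambdas):
        b = nonneg_lasso(X, y, lam)
        if np.count_nonzero(b) <= target_size:
            return lam, b
    raise ValueError("no lambda on the grid reaches the target size")


def construct_coverage(b, constructs):
    """Sum relative weights (each weight divided by the total) per construct."""
    total = b.sum()
    if total == 0:
        raise ValueError("all weights are zero")
    shares = {}
    for weight, construct in zip(b / total, constructs):
        shares[construct] = shares.get(construct, 0.0) + float(weight)
    return shares


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))           # synthetic stand-in for the feature matrix
    y = X @ np.array([1.0, 0.5, 0.3, 0.0, 0.0, 0.0]) + rng.normal(scale=0.5, size=300)
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    y = y - y.mean()
    lam, b = select_lambda(X, y, target_size=3, lambdas=np.linspace(0.0, 2.0, 41))
    # Hypothetical mapping of the six synthetic features to constructs.
    labels = ["fluency", "fluency", "grammar", "grammar", "vocabulary", "vocabulary"]
    print(lam, construct_coverage(b, labels))
```

Because the features are standardized before fitting, the fitted weights play the role of standardized beta weights, so dividing by their sum gives the relative weights that are added per construct. In practice a library solver would likely be preferable to hand-written coordinate descent; scikit-learn's `Lasso(positive=True)`, for example, optimizes the same objective with `alpha` in place of lambda.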
Summary
• Building scoring models for constructed responses in line with best practices in educational measurement is a complex constraint-satisfaction task
• Therefore, this task has typically been performed by human experts
• Our study demonstrates the viability of automated feature selection methods that can satisfy multiple requirements of ideal scoring models
• The LASSO* model is more accurate than the expert baseline, has very similar construct coverage, and is highly interpretable