Poster

Identifying Students’ Gradual Understanding
of Physics Concepts
Using TagHelper Tools
Nava L. Livne
[email protected]
Oren E. Livne
[email protected]
University of Utah
PSLC Summer School, June 21, 2007
Driving Research Question
Can a machine identify students’ gradual understanding of physics concepts?
Hypothesis - IBAT learning model: Students learn in four stages.
[Figure: student conceptual learning over time, a staircase rising through four stages: ignoring irrelevant data → basic notions → advanced principles → transfer of principles to complex scenarios.]
Outline
• Data Collection
• Students’ constructed responses to physics questions
• Human teacher response classification = the reference for analysis
• Data Analysis
• TagHelper Tools
• Discriminatory classifiers: Naïve Bayes, SMO
• User-defined features
• Results
• Discussion
• How well do TagHelper Tools delineate the four stages of students’
conceptual understanding?
• Lessons Learned from the Summer School & TagHelper Tools
Data Collection
• Data unit = a student’s constructed response to an open-ended physics question:
“Acceleration is defined as the final amount subtracted from the initial
amount divided by the time.”
• 840 student responses were collected.
• Development set: 420 randomly selected responses.
• Validation set: the remaining 420 responses.
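A minimal sketch of such a random 50/50 split (plain Python; the function and variable names are illustrative, not TagHelper’s):

```python
import random

def split_half(responses, seed=0):
    """Randomly split responses into equal development and validation sets."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(responses)     # copy; leave the original order intact
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# with 840 responses: 420 development, 420 validation
dev_set, val_set = split_half(["response %d" % i for i in range(840)])
assert len(dev_set) == len(val_set) == 420
```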
• Responses were classified by human teachers into 55 concepts,
aggregated into four main categories.
• Irrelevant
• Basic notions: e.g. no gravity in vacuum, definition of force
• Advanced principles: e.g. zero net force [implies body at rest]
• Complex scenarios: e.g. man drops keys in an elevator
Data Analysis: Rationale
• TagHelper Tools can analyze any text response; which algorithm and option
set is best for this type of data set?
• Objective: detect four ordered stages ⇒ use a discriminatory classifier
• Naïve Bayes: uses cumulative evidence to distinguish among records
• Support Vector Machines (trained via SMO): finds maximally separated groups in the data
• Models must exhibit reasonable predictions for both the training and
validation sets to ensure reliability
• User features should mainly delineate among scenarios, e.g. (evaluated as sketched below):
• ANY ( EGG , CLOWN )
• ALL ( PUMPKIN , ANY ( PERSON , MAN ) )
• ANY ( KEYS , ELEVATOR )
• Shooting for reliability index κ ~ 0.6-0.7
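A minimal sketch of how such Boolean keyword features might be evaluated on a tokenized response (standalone Python; TagHelper computes these internally, so the ANY/ALL helpers below are assumptions for illustration):

```python
def ANY(*keywords):
    """Fires if any keyword appears among the response tokens."""
    return lambda tokens: any(k in tokens for k in keywords)

def ALL(*parts):
    """Fires only if every part fires; bare strings act as single keywords."""
    feats = [p if callable(p) else ANY(p) for p in parts]
    return lambda tokens: all(f(tokens) for f in feats)

# the three scenario features from this slide
features = {
    "egg_clown":     ANY("EGG", "CLOWN"),
    "pumpkin_man":   ALL("PUMPKIN", ANY("PERSON", "MAN")),
    "keys_elevator": ANY("KEYS", "ELEVATOR"),
}

tokens = set("THE MAN DROPS HIS KEYS IN THE ELEVATOR".split())
print({name: f(tokens) for name, f in features.items()})
# {'egg_clown': False, 'pumpkin_man': False, 'keys_elevator': True}
```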
Data Analysis: Models
• Best models
• Model A: Naïve Bayes, no part-of-speech (POS) features, no user-defined features
• Model B: Naïve Bayes, no POS, with user-defined features
• Model C: SMO, no POS, polynomial exponent = 2.0, no user-defined features
• Model D: SMO, no POS, polynomial exponent = 2.0, with user-defined features
• Procedure
• Models were trained on the development set using cross-validation
• Evaluation measures: κ (> 0.5), % Correctly Classified Instances (> 60%)
• If both thresholds were met, the model was further tested on the validation set
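A minimal sketch of this evaluate-then-promote procedure (assuming scikit-learn, with a bag-of-words Naïve Bayes pipeline standing in for TagHelper’s internals):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import accuracy_score, cohen_kappa_score

def evaluate_on_dev(texts, labels):
    """10-fold cross-validation on the development set; returns (accuracy, kappa)."""
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    predicted = cross_val_predict(model, texts, labels, cv=10)
    return accuracy_score(labels, predicted), cohen_kappa_score(labels, predicted)

# usage (dev_texts, dev_labels are your development-set responses and categories):
# acc, kappa = evaluate_on_dev(dev_texts, dev_labels)
# if acc > 0.60 and kappa > 0.5:  # thresholds from the procedure above
#     ...test the model on the validation set
```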
Results on Development Set*
Model                     Correctly Classified   Kappa (κ)
                          Instances              reliability index
A (NB)                    71%                    0.544
B (NB + user features)    72%                    0.570
C (SMO)                   73%                    0.598
D (SMO + user features)   76%                    0.636
* The model was trained on the development set by dividing it into 10
chunks and running cross-validation among the chunks.
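For reference, the κ reported here is Cohen’s kappa, which corrects raw agreement for chance:

κ = (p_o − p_e) / (1 − p_e),

where p_o is the observed agreement with the human classification and p_e is the agreement expected by chance from the category frequencies. For example, model D has p_o = 0.76 and κ = 0.636, which implies a chance agreement of p_e ≈ 0.34.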
Results: Development vs. Validation Set
Model                     Correctly Classified          Correctly Classified
                          Instances – Development Set   Instances – Validation Set
A (NB)                    71%                           67%
B (NB + user features)    72%                           50%
C (SMO)                   73%                           48%
D (SMO + user features)   76%                           35%
Discussion #1
• The best model was Naïve Bayes with no user-defined features: it had the lowest κ on the development set, but the highest accuracy on the validation set and the most uniform overall performance.
⇒ Watch out for, and optimize, the development/validation tradeoff
• Why didn’t the models generalize well? Likely because of the large skew of the data, which causes large variability even between the development and validation sets. The skew is evident when optimizing the SMO exponent (for non-skewed data the optimal exponent is 1; here it is 2). This may also be why SMO was not superior to NB.
⇒ Check data skew (indicated by an optimal SMO exponent ≠ 1)
• Analysis on the non-aggregated 55 concepts yielded a higher κ = 0.61; however, the confusion matrix is much larger, making the errors difficult to interpret.
⇒ Strive for a small number of distinct categories
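A minimal sketch of the exponent check described above, using scikit-learn’s polynomial-kernel SVM as a stand-in for Weka’s SMO (the grid and names are assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def best_exponent(texts, labels, exponents=(1, 2, 3)):
    """Return the polynomial-kernel exponent with the best cross-validated accuracy."""
    scores = {}
    for d in exponents:
        model = make_pipeline(CountVectorizer(), SVC(kernel="poly", degree=d))
        scores[d] = cross_val_score(model, texts, labels, cv=10).mean()
    return max(scores, key=scores.get)

# usage: best_exponent(dev_texts, dev_labels) != 1 hints at skewed data
```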
Discussion #2: Error Analysis
Error analysis provides a fine-grained perspective on the data and sheds light on the characteristic error patterns made by TagHelper.
• Identify large entries in the confusion matrix (a sketch follows the matrix below)
• Look at response examples that represent the dominant error types
• Design user features to eliminate those errors
Notation: I = Irrelevant responses, B = Basic notions,
A = Advanced principles, T = Transfer to complex scenarios

Confusion matrix:

        I    B    A    T
   I   33    2    5   15
   B    2   12    0    7
   A   12    0  153   20
   T   28    8   26   95
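A minimal sketch of the first step, ranking the dominant confusions in the matrix above (plain Python; the counts are copied from the matrix):

```python
labels = ["I", "B", "A", "T"]
confusion = [
    [33,  2,   5, 15],
    [ 2, 12,   0,  7],
    [12,  0, 153, 20],
    [28,  8,  26, 95],
]

# rank the off-diagonal entries, i.e. the dominant error types
errors = sorted(
    ((confusion[i][j], labels[i], labels[j])
     for i in range(4) for j in range(4) if i != j),
    reverse=True,
)
for count, row, col in errors[:3]:
    print(f"row {row}, column {col}: {count} responses")
# row T, column I: 28 responses
# row T, column A: 26 responses
# row A, column T: 20 responses
```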
Summary
• In short, the answer to the driving research question is: YES, A MACHINE CAN IDENTIFY STUDENTS’ GRADUAL LEARNING IN PHYSICS.
• Students develop their conceptual understanding in physics in four stages that correspond to the four categories found in the data (see the Driving Research Question slide):
1. Learning to ignore irrelevant data and to focus on the relevant knowledge components.
2. Getting familiar with basic notions.
3. Learning advanced principles that use the basic notions.
4. Transferring the principles to complex real-life scenarios; each scenario is likely to involve multiple principles.
Lessons Learned
• TagHelper Tools can distinguish between different data categories that
represent different knowledge components.
• There is a trade-off between fit to the training set and performance on the validation set. We chose the model that optimized this trade-off.
• The quality of the conclusions is limited by the quality of the data. In our case the model validation was reasonable because the responses were drawn from multiple students; however, individual students were not identified in the data.
• TagHelper Tools is a state-of-the-art machine learning framework, but its analysis is limited to identifying structured patterns within its feature space. The default feature space includes only simple patterns; adding creative user features is the key to making TagHelper Tools even more powerful.
• Future directions may generalize TagHelper Tools to more flexible types of structural text patterns and to incorporating data imported from other parsers (e.g. mathematical expression parsers).