
Determining the Significance of Item
Order In Randomized Problem Sets
Zachary A. Pardos, Neil T. Heffernan
Worcester Polytechnic Institute
Department of Computer Science
1
The Problem
• In which order should we present tutor content to students?
• Many problem sets in intelligent tutoring systems (ITS) present items in a random order
• Item order is mostly randomized when there is no obvious
ordering of items that would benefit learning
• Can we data mine user responses to infer orderings that are
reliably more beneficial to learning than others?
Pardos, Z. A., Heffernan, N. T. In Press (2009) Detecting the Learning Value of Items In a Randomized Problem Set. In
Proceedings of the 14th International Conference on Artificial Intelligence in Education. Brighton, UK. IOS Press.
2
Solution Approach
• Possible approach: evaluate each sequence for learning value
    seq1: probability of learning 0.13
    seq2: probability of learning 0.19
• In this paper we evaluate the learning rates of ordered item pairs:
should Q1 go before Q2, or should Q2 go before Q1?
    Item pair (1,2): probability of learning 0.09
    Item pair (2,1): probability of learning 0.14
• If multiple reliable orderings are found, a full sequence could be
determined to be best for learning.
3
Solution Application Example
[Figure: item orderings compared by learning rate. Ordered pairs: 0.14 > 0.09 and 0.15 < 0.17. A three-item sequence with learning rate 0.17 is ranked above the alternative. Item images not recoverable.]
4
Model
• Modeling or measuring learning requires modeling knowledge
• Knowledge Tracing used to model learning
• Parameters can be learned with the EM algorithm
Parameters
    P(Skill: 0 → 1) (probability of learning, between consecutive questions)
    P(correct | Skill = 0) (guess)
    P(incorrect | Skill = 1) (slip)
Latent (skill knowledge, dichotomous): S → S → S
Observables (question answers): incorrect, correct, correct
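As a rough illustration of how the guess, slip and learning parameters interact, the sketch below applies one knowledge-tracing step per observed answer: it conditions the skill estimate on the answer, then applies the learning transition. The function name `kt_update` and all numeric values are hypothetical, and the sketch assumes no forgetting (only the 0 → 1 transition shown on the slide). It does not perform the EM parameter fitting itself.

```python
def kt_update(p_know, correct, guess, slip, learn):
    """One knowledge-tracing step: condition on the answer, then apply learning."""
    if correct:
        num = p_know * (1 - slip)                # knew the skill and did not slip
        den = num + (1 - p_know) * guess         # ... or did not know and guessed
    else:
        num = p_know * slip                      # knew the skill but slipped
        den = num + (1 - p_know) * (1 - guess)   # ... or did not know, no lucky guess
    posterior = num / den
    # Learning transition (assumes no forgetting).
    return posterior + (1 - posterior) * learn


# Hypothetical parameter values, for illustration only.
p = 0.30
for answer in (False, True, True):  # incorrect, correct, correct
    p = kt_update(p, answer, guess=0.20, slip=0.10, learn=0.10)
```

After the loop, `p` holds the estimated probability that the skill is known following the three answers shown in the diagram.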
5
Model
The six sequence permutations modeled with shared Bayesian parameters
Also known as equivalence classes of CPTs (conditional probability tables)
Novel contribution of paper: Harnessing the power of randomization to help
estimate accurate parameters using all response data
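The parameter sharing can be pictured with a small enumeration. This is a sketch under the assumption that each transition between adjacent items is governed by a learning parameter tied to that ordered pair; the variable names are illustrative and not taken from the paper.

```python
from collections import Counter
from itertools import permutations

items = (1, 2, 3)
sequences = list(permutations(items))  # the six presentation orders

# Count how many sequences contain each adjacent ordered pair.
pair_use = Counter()
for seq in sequences:
    for pair in zip(seq, seq[1:]):
        pair_use[pair] += 1

for pair in sorted(pair_use):
    print(pair, "appears in", pair_use[pair], "sequences")
```

Under this assumption, the six sequences contain twelve transitions but only six distinct ordered pairs, so every pair parameter is informed by data from more than one sequence.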
6
Reliability measure
• Data for a problem set randomly split by student into 10 equal-size
bins
• Each bin was evaluated separately by the model
• Binomial test used to estimate the probability of the null
hypothesis that each ordering is equally likely to have the highest
learning rate
• i.e., binopdf(best_choice_mode,20,0.25)
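MATLAB's binopdf(x, n, p) returns the binomial probability mass, that is, the probability of exactly x successes in n trials with success probability p. A minimal Python equivalent is sketched below. The function name `binom_pmf` is made up for this example, and the pairwise use (a 0.5 chance per bin that one order of a pair beats the other) is an assumption for illustration, not the exact test configuration from the slide.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Illustrative assumption: under the null, either order of a pair is
# equally likely to have the higher learning rate in a given bin.
p_all_bins_agree = binom_pmf(10, 10, 0.5)
print(p_all_bins_agree)
```

Here `p_all_bins_agree` is the chance, under that assumed null, that the same order wins in all 10 bins.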
Ordered pair learning rates, by split (intermediate splits elided in the source)
Split 1:   (1,3) 0.642   (2,3) 0.0379   (1,2) 0.0701   (3,1) 0.0837   (2,1) 0.0267   (3,2) 0.0732
...
Split 10:  0.0849   0.0512   0.0550   0.0710   0.0768   0.0824   (pair labels not recoverable)
7
Dataset
[Figure: example item, with labels "Main problem" and "hint"]
• Student main problem responses
(correct/incorrect) to 5 problem
sets of 3 questions each
• Questions within a problem set
relate to the same skill
• 295-674 students completed each
problem set (data from the 2006-2007
school year)
• Questions in the problem sets
were presented in a randomized
order (required for this analysis)
8
Confound
[Figure: example item, with labels "Main problem" and "hint"]
• Since only main question responses
are being analyzed, the learning
from the main question is
confounded with the learning from
the scaffolding and hints of the
problem.
• In an item pair, learning could be
attributed to
• The immediate feedback to the
main problem of question 1
• The scaffolding of question 1
• Applying concepts from question
1 on question 2’s main problem
9
Results
• Of the 5 problem sets evaluated, two returned statistically reliable orderings
Learning probabilities of item pairs
Problem Set   Users   (3,2)    (2,1)    (3,1)    (1,2)    (2,3)    (1,3)    Reliable rule
24            403     0.1620   0.0948   0.0793   0.0850   0.0754   0.0896   (3,2) > (2,3)
36            419     0.1507   0.1679   0.0685   0.1179   0.1274   0.1371   (1,3) > (3,1)
• Other item relationships could be tested
• In Problem Set 36: (2,1) > (3,1) in 10 out of 10 of the bins
10
Results
Guess and Slip values per question
Problem Set 24
Question # Guess
Slip
Problem Set 36
Guess
Slip
1
2
3
0.33
0.31
0.20
0.17
0.31
0.23
0.18
0.08
0.17
0.13
0.10
0.08
• Values are within a reasonable range (< 0.50)
• Same problem sets run with the AIED model and the sequence model
• Same guess and slip values were returned
• Indicates high stability in parameter estimation among methods
11
Simulation Validation
• Since the ground truth of learning rates in the real world is impossible to
know, a simulation study was run
• The simulation set a variety of values for the parameters of prior,
guess/slip and learning rates and then simulated user responses
• These responses could then be analyzed by the method using the same
technique as was used on real data
• 160 simulations run using different combinations of parameters
• Parameters for the simulation drawn from a distribution fit to a previous
year’s analysis of ASSISTment data.
Parameter type   Mean    Std     Beta dist α   Beta dist β
Learning rate    0.086   0.063   0.0652        0.6738
Guess            0.144   0.383   0.0170        0.5909
Slip             0.090   0.031   0.0170        0.6499
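A minimal sketch of how user responses might be simulated from a set of parameters is shown below. It is not the authors' simulation code: the function `simulate_student`, the prior of 0.3, and the choice to reuse the table means as fixed values for every item and pair are all assumptions for illustration. Learning between two adjacent items is assumed to be tied to that ordered pair.

```python
import random


def simulate_student(order, prior, guess, slip, pair_learn, rng):
    """Simulate correct/incorrect answers for items presented in `order`."""
    knows = rng.random() < prior  # initial latent skill state
    answers = []
    for pos, item in enumerate(order):
        p_correct = 1 - slip[item] if knows else guess[item]
        answers.append(rng.random() < p_correct)
        # Chance to learn before the next item, tied to the ordered pair.
        if not knows and pos + 1 < len(order):
            nxt = order[pos + 1]
            knows = rng.random() < pair_learn[(item, nxt)]
    return answers


rng = random.Random(0)
items = [1, 2, 3]
# Illustrative values: table means reused for every item and pair.
guess = {i: 0.144 for i in items}
slip = {i: 0.090 for i in items}
pair_learn = {(a, b): 0.086 for a in items for b in items if a != b}

order = items[:]
rng.shuffle(order)  # randomized presentation order
responses = simulate_student(order, 0.3, guess, slip, pair_learn, rng)
```

Repeating this for many simulated students, with known pair learning rates, gives response data on which recovered rules can be checked against the values that generated them.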
12
Simulation Results
• More data leads to more reliable rules found
• The rate of false positives remains low, independent of number of users
• Average false positive rate is 6.3%, very close to the 5% p-value cutoff of our reliability estimator
• Simulation suggests that the results are trustworthy
13
Limitations
• Only problem sets of five questions or fewer can be
reasonably evaluated
– Larger problem sets become intractable to compute due to
the exponential increase in nodes and permutations as
question count increases
• for a four question set (4+4)*24 = 192 nodes
• for a five question set (5+5) *120 = 1,200 nodes
– Possible optimization is to only model the sequences for
which there is data
• Randomization of question order must be present to
control for factors including problem difficulty and
allow for detecting learning rates of all item pairs in the
problem set
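The node counts quoted above follow the pattern (n + n) * n!. The sketch below reproduces that arithmetic; reading the two terms as n latent skill nodes plus n observed answer nodes per sequence is an assumption, and `node_count` is a name invented for this example.

```python
from math import factorial


def node_count(n_questions):
    """Nodes needed to model every permutation of an n-question problem set."""
    # Assumed reading: n latent + n observed nodes per sequence, n! sequences.
    nodes_per_sequence = n_questions + n_questions
    return nodes_per_sequence * factorial(n_questions)


for n in range(3, 8):
    print(n, "questions:", node_count(n), "nodes")
```

The factorial term dominates quickly, which is why modeling only the sequences that actually occur in the data is suggested as an optimization.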
14
Conclusions & Future Work
• We think that this method, and ones built off of it, will
facilitate better tutoring systems
• Randomization gives many of the properties of a randomized
controlled experiment (RCE). This method can perform a similar
function but in the form of data mining.
• Best orderings might exist for a variety of reasons. Applying this
method to investigate those reasons could inform content authors
and scientists on best practices in much the same way as randomized
controlled experiments do, but through data mining, a far more
economical means of investigation.
15