
Kodaganallur, V., Weitz, R., Heffernan, N. T., & Rosenthal, D. (Submitted). Approaches to Model-Tracing in Cognitive Tutors. In Proceedings of the 13th Conference on Artificial Intelligence in Education. IOS Press.
Approaches to Model-Tracing in Cognitive Tutors
Viswanathan KODAGANALLUR¹, Rob WEITZ¹, Neil HEFFERNAN², David ROSENTHAL¹
¹School of Business, Seton Hall University, South Orange, NJ 07079
²Computer Science Department, Worcester Polytechnic Institute, Worcester, MA 01609
Abstract. Cognitive (or Model-Tracing) Tutors, a type of intelligent tutor, have
demonstrated effectiveness in helping students learn. Model tracing algorithms are
central to cognitive tutors, but have not been the focus of much published research. In
this paper we briefly review the existing approaches and suggest an alternative
approach that is very simple but applies only when tracing each student action requires
just a single rule. In this approach the process of rule execution alone suffices for
model-tracing; this eliminates the need for a costly tree/graph search over the search
space of the problem’s state-transitions. We also address the issue of goal structure and
suggest a way of writing the production rules based on top-down goal decomposition.
Contrary to common practice, this approach views the goal structure of a problem as a
hierarchical rather than a linear structure and hence serves to provide the student with a
richer goal model.
1. Introduction
Cognitive Tutors [1, 2] have been successfully deployed in a wide range of domains including
college-level physics, high school algebra, geometry, and computer programming. (See [3] for
an overview.) The underlying paradigm of cognitive tutors has its origins in the ACT-R theory
[4]. According to ACT-R, “acquiring cognitive knowledge involves the formulation of
thousands of rules relating task goals and task states to actions and consequences.” Generally,
the claim for cognitive tutors is that their use results in as much as a one standard deviation
improvement in student performance beyond standard classroom instruction.
Intelligent tutoring is generally set in a problem-solving context – the student is
presented with a problem and the tutor provides feedback as the student works. Cognitive
tutors are also able to provide planning advice when the student is unsure about how to
proceed from a given situation. At the heart of a cognitive tutor is a knowledge base
consisting of production rules. Cognitive tutors use a model-tracing algorithm to identify the
rules that the student appears to be using and are also called model-tracing tutors for this
reason.
Although model-tracing is a very important part of cognitive tutors, algorithms for
model-tracing have not been the focus of much published research. One such algorithm is
described in [5] and [6]. In this paper we review the model-tracing approaches that have been
employed in fielded tutors and suggest a new approach that is simpler, but is applicable only
under special conditions. Specifically, the proposed approach works only when each student
action can be traced by the application of a single rule. It therefore is suitable in situations
where the problem has been broken down into elementary steps and the student is expected to
proceed step by step. This holds, for example, for the canonical addition problem ([4]). We
have also used it to build a tutor for statistical hypothesis testing [3]. This approach allows for
very targeted remediation.
As part of the approach, we also suggest a way of writing rules that emphasizes top-down goal decomposition such that the goal hierarchy of the problem is clearly enshrined in
the rules. This enables the tutor to naturally give strong procedural remediation and to
communicate the goal hierarchy explicitly to the student.
Section 2 briefly reviews model-tracing and the approaches that have been employed
thus far. Section 3 describes the proposed approach and section 4 describes our approach to
top-down goal decomposition.
2. Current Approaches to Model-Tracing
Cognitive tutors aim to provide remediation by inferring the production rules that a student
used in order to arrive at a given problem state. They do this by modeling the problem-solving process (both valid and invalid) with a set of production rules. Some of these rules are expert
rules or rules that a competent problem solver in the domain might adopt. Others are buggy
rules, or rules that model faulty reasoning that is known to occur in real problem solving
contexts. When the tutor is given a student’s solution, it traces the solution by identifying a set
of production rules that could have generated it. If such a set of rules is found, and it uses only
expert rules, then the student has not committed any mistakes. On the other hand if the trace
employs one or more buggy rules, then the tutor knows exactly what conceptual errors the
student committed and can provide appropriate remediation. While it is possible to design
cognitive tutors that evaluate a student’s actions only when explicitly requested, it is common
for cognitive tutors to provide immediate feedback upon each student action.
Abstractly, the set of production rules can be seen as forming a directed acyclic graph
with the problem states as nodes and state-transitions as arcs (Table 1). The graph models all
anticipated states and state-transitions for a particular problem or set of similar problems. A
production rule corresponding to the arc between nodes n1 and n3 would be:
IF
the current state is n1
THEN
move to state n3
Although there could conceivably be a large number of possible states that could follow n1, it is necessary to provide productions only for those states that the tutor author expects some student(s) to reach.
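For concreteness, the traditional form of such a state-transition rule might be sketched as follows in JESS, the rule language used for the examples later in this paper; the state template and the node names are purely illustrative and not taken from any fielded tutor.
; Hypothetical template holding the current problem state
(deftemplate state (slot name))
; Traditional (non-augmented) production for the arc from n1 to n3
(defrule n1-to-n3
  ?s <- (state (name n1))
  =>
  (modify ?s (name n3)))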
Suppose a student who was in state s has moved to state t. Viewed in the context of Table 1, the role of a model-tracing algorithm is to identify a path from s to t in the graph. It is quite possible that there are several such paths, because several combinations of rule firings could lead from s to t. In that case, the tutor would have to employ some mechanism to choose one of the paths for remediation if needed.
Table 1: Problem state-space viewed as directed acyclic graph
The first approach to model-tracing was based on the work of Pelletier [5] who created
Tertl, a goal-driven production system. Pelletier developed a forward chaining production
system and incorporated a model-tracing algorithm into it. The unit of computation in the
Tertl system is a cycle. In each cycle the system first generates a conflict tree which enables it
to find various combinations of production rules that could lead to the goal (or state reached
by the student). It then goes through a resolution phase in which it chooses one among the
possibly many combinations of rule firings. Finally it commits the chosen solution by firing
the rules in the chosen combination. This approach to model-tracing was used in the Tutor Development Kit (TDK) [6], which was used for many years at the Human Computer Interaction Institute (HCI) at Carnegie Mellon University to build cognitive tutors.
More recently the use of TDK has been discontinued and researchers at HCI have
shifted to using CTAT, the Cognitive Tutor Authoring Tools ([7]). CTAT is based on JESS,
the Java Expert System Shell [8], a forward chaining production system. Unlike Tertl, JESS
does not have a built-in model-tracing element. In CTAT, a domain-independent model-tracing algorithm works in concert with a set of domain-specific production rules written in
the JESS language. These production rules specify, for each significant problem state that the
student can reach, the possible successor states [9]. In terms of Table 1, CTAT searches for a
path from s to t by depth-first iterative-deepening (DFID) [10]. It first finds the list of rules
that can be fired from state s and checks if firing any of these leads to the state t (the state is
reset to s before each new rule in the list is tried). If it does, then the input has been traced and
the associated rule is known. If a single rule firing is insufficient to trace the input, then it
considers all possible two-rule firings from state s to get t (again, the state is reset to s before
each two-rule sequence is tried). This process is continued, while progressively increasing the
number of rule firings (search depth), until either a set of rule firings is found, or it is clear that
t cannot be reached from s. The whole DFID search is a separate process that repeatedly
invokes JESS through its Java API. The DFID process can be costly when either the rule depth or the number of productions is large. It is also known that DFID can perform duplicate node expansions when the search space is a graph rather than a tree; this, too, can be a performance disadvantage.
The TDK and the CTAT approaches are both dynamic approaches in the sense that they
generate the nodes in the relevant portion of the graph on each invocation. An approach that
trades off space for time would be one that pre-generates all possible nodes of the state-space
and stores them in some indexed form for easy retrieval along with their associated rule
combinations. If this is done, then the process of model-tracing reduces to one of indexed
table lookup. Given that memory has become extremely cheap, this approach to speeding up run time might be feasible even for problems with very large state-spaces.
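As a rough sketch of this idea (the paper does not prescribe a particular data structure), the pre-generated transitions could be stored in any indexed form; expressed in the rule language used elsewhere in this paper, with all template, rule, and node names hypothetical, the lookup might look like the following.
; Hypothetical lookup facts: each pre-generated transition records the rule behind it
(deftemplate transition (slot from) (slot to) (slot via-rule))
(deftemplate student-move (slot from) (slot to))
(deffacts precomputed-transitions
  (transition (from n1) (to n3) (via-rule expert-step-a))
  (transition (from n1) (to n2) (via-rule buggy-step-b)))
; Tracing then reduces to matching the observed move against the stored transitions
(defrule lookup-trace
  (student-move (from ?s) (to ?t))
  (transition (from ?s) (to ?t) (via-rule ?r))
  =>
  (printout t "Student move traced via rule " ?r crlf))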
3. New Approach to Model-Tracing
Although the task of model-tracing, in the general case, involves identifying several rule invocations to account for a single student action, there are many situations where this might
not be necessary. In light of Anderson’s [4] finding that immediate feedback is more effective,
it might be pedagogically useful in certain situations to break up the problem into several
small steps and make the student follow these steps while providing remediation along the
way. This might be especially true in tutors intended for beginning learners of a discipline.
When such an approach is adopted, it often turns out that the task of model-tracing becomes a
lot simpler as most student actions can be traced to just a single rule application. Based on this
reasoning, we present an approach to model-tracing wherein model-tracing is accomplished
simply by the process of a rule-engine executing rules – no additional algorithm is needed. In
this sense it is simpler.
The productions in the new approach are similar to those used in CTAT, but the
antecedents of some of the productions are augmented with a check to see if the state
anticipated by the production is the state reached by the student. If the states match, then the
rule employed by the student has been identified and the student’s input can be considered to
have been traced. Of course this approach works only if each student action can be traced by
the application of exactly one rule. An example production (the second condition in the antecedent below is the additional check) is:
IF
the current state is n1
and the student has moved to state n3
THEN
provide appropriate feedback
In the traditional way of writing the rule, the additional antecedent would be absent and the rule consequent would be "move to state n3". With the productions
augmented in this manner, the comparison of the student’s action with the actions anticipated
by the rule is done within the rule engine itself and a costly external search is not needed. The
normal process of rule execution alone suffices for model-tracing as well. The advantages are
that the system is simplified, and the power of the rule engine’s optimizations to check which
rules can be fired is brought to bear on the model-tracing process. An example of a rule
written in this way for the domain of statistical hypothesis testing is given in Table 2.
Table 2: Example of augmented production rules written in JESS
; Rule 1
(defrule decision-1pm-leq-1
  (ready-to-decide)
  (problem (problemType "1PM<=") (zAlphaRight ?cutoff&~nil) (zvalue ?z&~nil)
           (decision ?d&~nil&:(eq ?d "Reject null")))
  (test (> ?z ?cutoff))
  =>
  (addResult (fetch results) nil CORRECT (create$ "decision")))
Rule 1 in Table 2 deals with the situation when the student has calculated the cutoff and
z values for the problem and is therefore in a position to arrive at the final decision (either
“Reject the null hypothesis” or “Not reject the null hypothesis”). This rule applies when the
(sample) z value is greater than the cutoff (critical z) value, and the student has chosen the
option to “Reject the null hypothesis”. Given this type of problem (as tested in the rule antecedent) and the relative values of the cutoff and z values, this decision is correct. That is, under the given conditions, the expert action is to “Reject the null hypothesis”. Instead of putting this as an action in the rule consequent, we have included it in the rule antecedent (the decision clause). If this rule fires, then it implies that the student has indeed applied the
corresponding expert rule and hence the student’s action has been traced.
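A buggy counterpart to Rule 1 is written in exactly the same style. The following sketch, which is ours and not part of the tutor's published rule set, traces the case where the student fails to reject the null hypothesis even though the z value exceeds the cutoff; the decision string and the feedback message are assumed.
; Hypothetical bug rule: student does not reject although z exceeds the cutoff
(defrule decision-1pm-leq-bug-1
  (ready-to-decide)
  (problem (problemType "1PM<=") (zAlphaRight ?cutoff&~nil) (zvalue ?z&~nil)
           (decision ?d&~nil&:(eq ?d "Do not reject null")))
  (test (> ?z ?cutoff))
  =>
  (addResult (fetch results) "you should reject the null hypothesis" WRONG
             (create$ "decision")))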
4. Top-Down Goal Decomposition
We found that (at least in some problem domains) it might be an advantage to model the
productions based on a top-down goal decomposition structure. McKendree [11] has shown the
important role played by goal structure in tutoring effectiveness. As an example of goal
structure, consider the expert goal structure of the statistical hypothesis testing problem ([12])
shown in Table 3. Each node in Table 3 represents a goal, and the nodes emanating from a
node represent its subgoals. The numbers attached to the subgoals indicate the sequence in
which they need to be satisfied, with identical sequence numbers at a given level indicating that their relative order does not matter. In some instances, the ordering simply reflects pedagogically desirable
sequencing. (These ideas hold for any problem that requires a series of steps in the solution
and it is not required that the steps be followed in a single, sequential order. Another example
of such a problem domain is physics mechanics problems.) This approach stands in contrast to
a bottom-up scheme that just specifies the sequencing of the goals without any indication of
the hierarchical goal structure.
Table 3: Expert goal structure for hypothesis testing
Table 4 shows some of the production rules for the goal structure shown in Table 3. We
use a template called problem whose slots represent the state variables of the problem
(variables for which the student supplies values, and other variables that represent problem
data). We use the backward chaining capabilities of JESS to induce subgoals from a main goal
through JESS’ “backward chaining reactive” templates. Briefly, Rule 1 says that if the student
has successfully met all the subgoals needed for making a decision, and the decision is to
reject the null hypothesis, and the z value is greater than the cutoff value (both of which would
have been calculated while satisfying the subgoals), then mark the decision as correct. In the
conventional way of writing the productions, the antecedent would not have the clause dealing
with the decision slot; instead the consequent would state that the decision should be to reject
the null hypothesis.
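The declarations that the rules in Table 4 rely on are not shown in the paper; a minimal sketch of what they might look like in JESS follows. The slot names are those referenced in the rules; everything else, and any additional state variables, is assumed.
; Hypothetical problem template; only the slots referenced in the rules are shown
(deftemplate problem
  (slot problemType) (slot muZero) (slot studentMuZero) (slot nullHyp)
  (slot nullHypSign) (slot zAlphaRight) (slot zvalue) (slot decision))
; Slot-less templates standing in for the simple goal facts
(deftemplate ready-to-decide)
(deftemplate hypotheses-established)
(deftemplate mu-zero-computed)
; ... and similarly for the remaining goal facts
; Declaring a template backward chaining reactive: when a rule needs a
; (ready-to-decide) fact that is absent, JESS asserts (need-ready-to-decide),
; which is what triggers decomposition rules such as Rule 2
(do-backward-chaining ready-to-decide)
(do-backward-chaining hypotheses-established)
(do-backward-chaining mu-zero-computed)
; ... and similarly for the remaining goal facts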
At the start, when the student has done nothing, the ready-to-decide fact is unavailable,
but since it is declared as backward chaining reactive, JESS will try to fire a rule that can
assert it. It does this by automatically asserting a need-ready-to-decide fact, which is the first
antecedent of Rule 2. The next three antecedents also involve backward chaining reactive templates, and this causes further rule firings that try to get those facts asserted. Rule 2 shows
how the ready-to-decide fact is asserted once its subgoals are satisfied. Rule 3 demonstrates
how the backward chaining goes one step further into asserting the hypotheses-established
fact. In this manner JESS backward chaining is able to unfurl the goal structure of Table 3.
Based on Table 3, the very first act that the student can legally perform is to calculate a
value for MuZero. Although not all the pertinent rules are shown, if the rules are structured as
shown, Rule 4 (expert) or Rule 5 (bug) dealing with MuZero will eventually fire depending on
the value supplied by the student. Another possible bug rule in this context is that the student
first supplies the null hypothesis, before supplying a value for MuZero. Rule 6 shows this bug
rule.
Table 4: Sample production rules for statistical hypothesis testing illustrating top-down goal decomposition
; Rule 1
(defrule decision-1pm-leq-1
  (ready-to-decide)
  (problem (problemType "1PM<=") (zAlphaRight ?cutoff&~nil) (zvalue ?z&~nil)
           (decision ?d&~nil&:(eq ?d "Reject null")))
  (test (> ?z ?cutoff))
  =>
  (addResult (fetch results) nil CORRECT (create$ "decision")))

; Rule 2
(defrule decompose-ready-to-decide
  (need-ready-to-decide)
  (hypotheses-established)
  (critical-value-computed)
  (statistic-value-computed)
  =>
  (assert (ready-to-decide)))

; Rule 3
(defrule decompose-hypotheses-established
  (need-hypotheses-established)
  (null-hypothesis-established)
  (alternate-hypothesis-established)
  =>
  (assert (hypotheses-established)))

; Similar rules to decompose the null and alternate hypotheses are not shown.
; Those rules will backward chain to Rule 4 below.

; Rule 4
(defrule mu-zero-correct
  (need-mu-zero-computed)
  (problem (muZero ?muZero) (studentMuZero ?muZero&~nil))
  =>
  (addResult (fetch results) nil CORRECT (create$ "studentMuZero"))
  (assert (mu-zero-computed)))

; Rule 5
(defrule mu-zero-wrong
  (need-mu-zero-computed)
  (problem (muZero ?muZero) (studentMuZero ?smz&~nil&:(not-eq ?muZero ?smz)))
  =>
  (addResult (fetch results) "wrong muzero" WRONG (create$ "studentMuZero")))

; Rule 6
(defrule null-hyp-instead-of-mu-zero
  (need-mu-zero-computed)
  (problem (muZero nil) (nullHyp ?nh&~nil))
  =>
  (addResult (fetch results) "wrong muzero" WRONG (create$ "null-hyp-for-muzero")))
The top-down goal decomposition shown in Table 4 enables the tutor to provide
planning advice via a set of guidance rules (example shown in Table 5). At some stage,
suppose that one possible next step for the student is to specify the null hypothesis, and the
student asks for guidance at this stage. Based on the subgoals already satisfied, the tutor
knows exactly where the student stands. Based on the goal decomposition implied by the rules that have already fired, it can provide guidance along the lines of “In order to arrive at a decision,
you need to establish the hypotheses, calculate the critical value and calculate the statistic
value. In order to establish the hypotheses, you need to calculate MuZero, which you have
already done. You now need to establish the null hypothesis.” Rather than just telling the
student what the next step is, the tutor is able to do so in the context of the overall problem
goals. The rule’s consequent assembles the guidance message for the user interface based on
messages associated with each goal/subgoal.
Table 5: Example of a guidance rule
(defrule null-hypothesis-guidance
  (need-null-hypothesis-established)
  (mu-zero-computed)
  (problem (nullHypSign nil))
  =>
  (store-guidance))
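The store-guidance function invoked in the consequent is a user-defined function whose body is not given in the paper. A hypothetical sketch, assuming the message text above and JESS's store mechanism as the hand-off to the user interface, might be:
; Hypothetical sketch of the guidance function; the GUIDANCE key and the
; message text are assumptions for illustration only
(deffunction store-guidance ()
  (store GUIDANCE
    (str-cat "In order to arrive at a decision, you need to establish the "
             "hypotheses, calculate the critical value and calculate the "
             "statistic value. In order to establish the hypotheses, you need "
             "to calculate MuZero, which you have already done. You now need "
             "to establish the null hypothesis.")))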
5. Conclusions
Model tracing algorithms are central to cognitive tutors, but have not been a focus of much
published research. We have reviewed the approaches that have been taken thus far to the
problem. While the general task of model-tracing is complex, we have found that in many
cases it involves only a single rule. In such cases a simpler approach is appropriate and we
have presented such an approach, in which the process of rule execution alone suffices for model-tracing. This eliminates the need for a costly tree/graph search over the search
space of the problem’s state-transitions. Furthermore, we have suggested a way of writing the
production rules based on top-down goal decomposition. This approach allows planning
advice to be automatically anchored around the problem’s goal structure, thereby providing
the student with a rationale for the next step, rather than just suggesting it.
Acknowledgment
The authors wish to gratefully acknowledge Ken Koedinger for his valuable feedback.
References
[1] Koedinger, K. R. (2001). Cognitive tutors as modeling tool and instructional model. In Forbus, K. D. & Feltovich, P. J. (Eds.), Smart Machines in Education: The Coming Revolution in Educational Technology (pp. 145-168). Menlo Park, CA: AAAI/MIT Press.
[2] Koedinger, K. & Anderson, J. (1997). Intelligent Tutoring Goes to School in the Big City. International Journal of Artificial Intelligence in Education, 8, 30-43.
[3] Kodaganallur, V., Weitz, R., & Rosenthal, D. (2005). Comparison of Model-Tracing and Constraint-Based Intelligent Tutoring Paradigms. International Journal of Artificial Intelligence in Education, 15(2).
[4] Anderson, J. R. (1993). Rules of the Mind. Hillsdale, NJ: Erlbaum.
[5] Pelletier, R. (1993). The TDK Production Rule System. Master's Thesis, Carnegie Mellon University.
[6] Anderson, J. R. & Pelletier, R. (1991). A Development System for Model-Tracing Tutors. In Proceedings of the International Conference of the Learning Sciences (pp. 1-8). Evanston, IL.
[7] Aleven, V., McLaren, B. M., Sewall, J., & Koedinger, K. (2006). The Cognitive Tutor Authoring Tools (CTAT): Preliminary evaluation of efficiency gains. In M. Ikeda, K. D. Ashley, & T. W. Chan (Eds.), Proceedings of the 8th International Conference on Intelligent Tutoring Systems (ITS 2006) (pp. 61-70). Berlin: Springer Verlag.
[8] Friedman-Hill, E. (2003). Jess in Action: Rule-Based Systems in Java. Greenwich, CT: Manning Publications. (See also the JESS homepage at: http://herzberg.ca.sandia.gov/jess/index.shtml.)
[9] Koedinger, K. R., Aleven, V., McLaren, B., & VanLehn, K. (2005). Lecture notes, 1st Annual PSLC LearnLab Summer School, June 27 - July 1, 2005, Carnegie Mellon University, Pittsburgh, PA.
[10] Korf, R. E. (1985). Depth-First Iterative-Deepening: An Optimal Admissible Tree Search. Artificial Intelligence, 27, 97-107.
[11] McKendree, J. E. (1990). Effective feedback content for tutoring complex skills. Human-Computer Interaction, 5, 381-414.
[12] Levine, D. M., Stephan, D., Krehbiel, T. C., & Berenson, M. L. (2001). Statistics for Managers Using Microsoft Excel (3rd ed.). Upper Saddle River, NJ: Prentice Hall.