Chapter 2 Case-Based Reasoning Techniques

Chapter 2 Basic Case-Based Reasoning Techniques
2.1 Case Representation
A case is “a contextualized piece of knowledge representing an experience that
teaches a lesson fundamental to archiving the goal of the reasoner” [Kolodner, 1993].
Typically, there are three major parts of a case:

Problem description: the state of the world while the case is happening and what
problem needed solving at the time

Solution: the stated or derived solution to the problem

Outcome: the resulting state of the world after the case occurred.
Table 2-1 lists the content of the above major parts of a case.
Table 2-1 The Content of the Major Parts of a Case [Kolodner, 1993]
Major Parts
Problem Description
Solution
Outcome
Contents
1. Goals to be achieved
2. Constraints on the goals
3. Features of the problem situation and relationship
between its parts
1. Solutions
2. Reasoning steps
3. The set justifications for decisions
1. The outcome it self
2. Explanation of the expectation violation and/or
failure
3. Repair strategy
4. Pointer to next attempt at solution
Another way to describe case presentations is to visualize the structure in terms of
the problem space and the solution space [Watson, 1997 and Doyle, 1998]. Figure 2
illustrates the structure. According to this structure, the description of a problem resides
10
in the problem space. The retrieval process identifies the features of the case with the
most similar problem. When the best matching is found, the system uses similarity
metrics to find the best matching case. In those processes, the solution of a case with the
most similar problem may have to be adapted to solve the new problem.
Problem Space
R
= description of new problem to solve
= description of solved problems
= stored solutions
= new solution created by adaptation
A
A = Adaptation
R = Retrieval
Solution Space
Figure 2-1 Problem and solution spaces [Watson, 1997 and Doyle, 1998]
2.2 Case Indexing
An index is a computational data structure that can be stored in memory and
searched quickly. Case indexing involves assigning indexes to cases to facilitate their
retrieval [Watson, 1995]. The CBR community proposed several guidelines on indexing
[Hammond, 89 and Kolodner, 93]:
11
1. Indexes should be predictive.
2. Predictions that can be made should be useful ones, that is, they should
address the purposes the case will be used for.
3. Indexes should be abstract enough to make a case useful for future cases.
4. Indexes should be concrete enough to be recognized in the future.
Methodologies for choosing indexing includes manual and automated methods. In
some systems, cases are indexed by hand. For example, when the cases are complex and
the knowledge needed to understand cases well enough to choose indexes accurately is
not concretely available, hand indexing is needed [Kolodner, 1993]. On the other hand, if
problem solving and understanding are already automated, it is advantageous to use
automated indexing methods.
2.3 Case Retrieval
Case retrieval is a process that a retrieval algorithm retrieves the most similar
cases to the current problem. Case retrieval requires a combination of search and
matching. In general, two retrieval techniques are used by the major CBR applications:
nearest neighbor retrieval algorithm and inductive retrieval algorithm.
2.3.1
Nearest-Neighbor Retrieval
Nearest-neighbor retrieval is a simple approach that computes the similarity
between stored cases and new input case based on weight features. A typical evaluation
function is used to compute nearest-neighbor matching [Kolodner, 1993] as shown in
Figure 2-2:
12
n
 w  sim( f
i
similarity (CaseI , CaseR ) 
i 1
I
i
, f iR )
n
w
i
i 1
Figure 2-2 A nearest-neighbor evaluation function
Where wi is the importance weight of a feature, sim is the similarity function of features,
and fiI and fiR are the values for feature i in the input and retrieved cases respectively.
Figure 2-2 displays a simple scheme for nearest-neighbor matching. In this 2-dimensional
space, case3 is selected as the nearest neighbor because similarity(NC, case3)>
similarity(NC, case1) and similarity(NC, case3)> similarity(NC, case2).
feature2
NC - New Case
case1
similarity(NC, case1)
NC
similarity(NC, case3)
case3
case2
similarity(NC, case2)
feature1
Figure 2-3 How to find the nearest neighbor of the new case NC.
2.3.2
Inductive Retrieval
Inductive retrieval algorithm is a technique that determines which features do the
best job in discriminating cases and generates a decision tree type structure to organize
the cases in memory [Watson, 1997]. This approach is very useful when a single case
feature is required as a solution, and when that case feature is dependent upon others.
Here is a completed decision tree (see Figure 2-4) generated from the data in Table 2-2
[Watson, 1997]. The task is to predict the status of a loan from features of the loan
applicant (income, job status and repayment).
13
Table 2-2 Four loan cases [Watson, 1997]
Case No.
Loan Status
Monthly Income
Job Status
Repayment
Case 1
Good
$2000
Salaried
$200
Case 2
Very bad
$4000
Salaried
$600
Case 3
Very good
$3000
Waged
$300
Case 4
Bad
$1500
Salaried
$400
Repayment < $400
Yes
No
Job status
Income > $1500
Salaried
W aged
Yes
Case 1
Case 3
Case 2
No
Case 4
Figure 2-4 The completed decision tree
Table 2-3 A Target Case
Case No.
Loan Status
Monthly Income
Job Status
Repayment
Case X
?
$1000
Salaried
$600
14
If a target case were presented as shown in Table 2-3, to determine the loan status
of the target case, the algorithm would traverse the decision tree and search for the best
matching case in the case-base. For the given the loan repayment, the algorithm first
selects the left branch. After this, the algorithm traverses to the node (Income>$1500)
and selects the left branching according the monthly income. We can therefore predict
that the best matching case is Case 4. This suggests that the loan prospect is bad because
Case 4’s outcome is bad.
2.3.3
Nearest-Neighbor Retrieval vs. Inductive Retrieval
Nearest-neighbor retrieval and inductive retrieval are widely applied in CBR
applications and tools. Table 2-4 shows strengths and weakness of two techniques. The
choice between nearest-neighbor retrieval and inductive retrieval in CBR applications
requires experience and experimentation. Usually, it is a good choice using nearestneighbor retrieval without any preindexing [Watson, 1997]. If retrieval time becomes an
important issue, inductive retrieval is preferable.
Table 2-4 Comparison between nearest-neighbor retrieval and inductive retrieval
Retrieval Techniques
Nearest Neighbor Retrieval
Inductive Retrieval
Strength
Weakness
Slow retrieval speed when the case base
is large
Simple
Fast retrieval speed
1. Depends on pre-indexing which is a
time-consuming process
2. Impossible to retrieval a case while
case data is missing or unknown
In some CBR tools, both techniques are used: inductive indexing is used to
retrieve a set of matching cases, then nearest-neighbor is used to rank the cases in the set
according to the similarity to the target case.
15