Complete Action Map or Best Action Map in Accuracy-based Reinforcement Learning Classifier Systems

Masaya Nakata (The University of Electro-Communications, Tokyo, Japan)
Pier Luca Lanzi (Politecnico di Milano, Milano, Italy)
Tim Kovacs (University of Bristol, Bristol, UK)
Keiki Takadama (The University of Electro-Communications, Tokyo, Japan)

ABSTRACT
We study two existing Learning Classifier Systems (LCSs): XCS, which has a complete map (which covers all actions in each state), and XCSAM, which has a best action map (which covers only the highest-return action in each state). This allows XCSAM to learn with a smaller population size limit (but larger population size) and to learn faster than XCS on well-behaved tasks. However, many tasks have difficulties like noise and class imbalances. XCS and XCSAM have not been compared on such problems before. This paper aims to discover which kind of map is more robust to these difficulties. We apply them to a classification problem (the multiplexer problem) with class imbalance, Gaussian noise or alternating noise (where we return the reward for a different action). We also compare them on real-world data from the UCI repository without adding noise. We analyze how XCSAM focuses on the best action map and introduce a novel deletion mechanism that helps to evolve classifiers towards a best action map. Results show the best action map is more robust (has higher accuracy and sometimes learns faster) in all cases except small amounts of alternating noise.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning—knowledge acquisition, concept learning

General Terms
Algorithms, Experimentation, Performance

Keywords
Learning Classifier System, XCS, XCSAM, complete action map, best action map, classification

1. INTRODUCTION
Learning Classifier Systems (LCSs) [7] are rule-based learning systems that evolve general rules (called classifiers) using Genetic Algorithms [6] to represent human-readable solutions. Accuracy-based LCSs [21, 1] are the most popular classifier system models so far and have been successfully applied to a wide range of problem domains [2] such as classification [8], regression [5], and sequential decision making problems [4]. Accuracy-based LCSs can be classified into two types according to the learning mechanism they employ: Accuracy-based Reinforcement Learning Classifier Systems (ARLCS), which employ reinforcement learning (RL) [18], e.g., the XCS classifier system [21], and Accuracy-based Supervised Learning Classifier Systems (ASLCS), which employ supervised learning instead of RL, e.g., the UCS classifier system [1]. While ASLCS is a useful and robust model for supervised learning tasks [14], it is not designed to be suitable for RL tasks.

When we design an ARLCS for a given problem, we have to choose one of two learning strategies: a complete action map, which predicts the rewards of all actions in each state, and a best action map, which in contrast covers only the highest-return action in each state. Most real world applications have difficulties like noise and class imbalances in their state spaces and reward functions. Accordingly, a key to designing LCSs is to understand which strategy (complete or best action map) is more robust to which kinds of difficulty. Orriols-Puig and Bernadó-Mansilla show that UCS (with a best action map) can derive better performance than XCS (with a complete map) on a class-imbalanced classification problem [14]. Stone and Bull show that the ZCS classifier system [20] (which is a strength-based LCS and employs a best action map) exceeds the performance of XCS on noisy problems [17].
Also, Tzima and Mitkas show that a revised ZCS for data mining can derive higher classification accuracy than XCS on a classification problem with real world data [19]. These results may suggest that the best action map is a more robust strategy than the complete action map. However, ZCS and UCS are not ARLCSs. Specifically, as noted previously, UCS can successfully identify a highest-return action (i.e., a correct class) using supervised data. ZCS, like ARLCS, employs reinforcement learning, but it may not correctly generalize classifiers because of its strength-based fitness. Strength-based fitness directly estimates reward (i.e., high fitness means high estimated reward). This is useful to identify the best action map but is less useful to find accurate classifiers. In contrast, the accuracy-based fitness of ARLCS is calculated from the error in the estimated prediction; it is useful to accurately generalize classifiers but is very sensitive to noise (especially noise added to the reward). This means accuracy-based fitness can mistakenly lead to incorrect generalization in noisy problems. Accordingly, best action maps may not be robust in class-imbalanced, noisy and real-world classification.

In summary, previous studies have found that ZCS and UCS were more robust than XCS on class-imbalanced and noisy tasks, but it is not clear if this is due to their best action maps or due to their definition of fitness. This paper compares complete and best action maps in systems with the same definition of fitness; specifically, we compare two ARLCSs: XCS and XCSAM [12, 11]. XCS evolves a complete action map, while XCSAM is a variation of XCS which evolves best action maps. We apply them to versions of a classification problem (the multiplexer) with class imbalance, Gaussian noise or alternating noise. We also compare them on real world data from the UCI repository [16]. From the experimental results we analyze how XCSAM focuses on the best action map. We also introduce a novel deletion mechanism [9] that helps to evolve classifiers towards the best action map.
The remainder of this paper is organized as follows. In Sections 2 and 3 we describe the mechanisms of XCS and XCSAM. In Section 4 we show experimental results of XCS and XCSAM on problems with noise and class-imbalance, and on real world data. In Section 5 we analyze the results of XCSAM and introduce the deletion mechanism for XCSAM. In Section 6 we show experimental results for XCSAM with the deletion mechanism. Finally, in Section 7, we conclude this paper by summarizing the contributions.

2. XCS CLASSIFIER SYSTEM
The XCS classifier system maintains a population of rules ("classifiers") which represent the solution to a reinforcement learning problem [10]. The following subsections explain the classifiers and mechanisms of XCS.

2.1 Classifier
A classifier is a condition-action rule; a condition C is coded by C ∈ {0, 1, #}^L, where L is the length of the condition and the symbol '#' is the don't care symbol, which matches all input values (i.e., 0 or 1). Classifiers consist of a condition, an action, and four main parameters [21, 3]: (i) the prediction p, which estimates the relative payoff that the system expects when the classifier is used; (ii) the prediction error ε, which estimates the error of the prediction p; (iii) the fitness F, which estimates the accuracy of the payoff prediction given by p; and (iv) the numerosity num, which indicates how many copies of classifiers with the same condition and the same action are present in the population.

2.2 Mechanism
At time t, XCS builds a match set [M] containing the classifiers in the population [P] whose condition matches the current sensory input s_t; if [M] does not contain all the feasible actions, covering takes place and creates a set of classifiers that match s_t and cover all the missing actions.¹ This process ensures that XCS can evolve a complete mapping, so that in any state it can predict the effect of every possible action in terms of expected returns. For each possible action a in [M], XCS computes the system prediction P(s_t, a), which estimates the payoff that XCS expects if action a is performed in s_t. The system prediction P(s_t, a) is computed as the fitness-weighted average of the predictions of the classifiers in [M] which advocate action a:

    P(s_t, a) = \frac{\sum_{cl_k \in [M](a)} p_k \cdot F_k}{\sum_{cl_i \in [M](a)} F_i}    (1)

where [M](a) represents the subset of classifiers of [M] with action a, p_k identifies the prediction of classifier cl_k, and F_k identifies the fitness of classifier cl_k. Then XCS selects an action to perform; the classifiers in [M] which advocate the selected action form the current action set [A]. The selected action a_t is performed, and a scalar reward r_{t+1} is returned to XCS together with a new input s_{t+1}. When the reward r_{t+1} is received, the estimated payoff P(t) is computed as follows:

    P(t) = r_{t+1} + \gamma \max_{a \in [M]} P(s_{t+1}, a)    (2)

where γ is the discount factor [18], which is 0 for single-step tasks such as those in this work. Next, the parameters of the classifiers in [A] are updated in the following order [3]: prediction, prediction error, and finally fitness. The prediction p is updated with learning rate β (0 ≤ β ≤ 1):

    p_k \leftarrow p_k + \beta (P(t) - p_k)    (3)

Then, the prediction error ε and the classifier fitness F are updated using the absolute accuracy κ:

    \epsilon_k \leftarrow \epsilon_k + \beta (|P(t) - p_k| - \epsilon_k)    (4)

    \kappa_k = \begin{cases} 1 & \text{if } \epsilon_k \leq \epsilon_0, \\ \alpha (\epsilon_k / \epsilon_0)^{-\nu} & \text{otherwise} \end{cases}    (5)

    F_k \leftarrow F_k + \beta \left( \frac{\kappa_k \cdot num_k}{\sum_{cl_j \in [A]} \kappa_j \cdot num_j} - F_k \right)    (6)

The GA is applied to the classifiers in [A] [21]. It selects two classifiers, copies them, and with probability χ performs crossover on the copies; then it mutates each element of their conditions with probability μ.

¹In the algorithmic description [3], covering is activated when the match set contains fewer than θmna actions; however, θmna is always set to the number of available actions so that the match set covers all the actions.
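To make the above computations concrete, the following Python sketch re-implements the system prediction (Equation 1) and the prediction, error and fitness updates (Equations 3-6). It is an illustrative reconstruction written for this description, not the authors' implementation: the `Classifier` dataclass, its default values and the constants are our assumptions (the constants mirror the parameter values used later in Section 4).

```python
from dataclasses import dataclass

# Assumed constants: learning rate and accuracy parameters (values as in Section 4).
BETA, ALPHA, EPS0, NU = 0.2, 0.1, 1.0, 5

@dataclass
class Classifier:
    condition: str        # e.g. "01#1#0#1###" for the 11-multiplexer
    action: int
    p: float = 10.0       # prediction (initial value is an assumption)
    eps: float = 0.0      # prediction error
    F: float = 0.01       # fitness
    num: int = 1          # numerosity

def system_prediction(match_set, action):
    """Fitness-weighted average of the predictions advocating `action` (Equation 1)."""
    advocates = [c for c in match_set if c.action == action]
    total_fitness = sum(c.F for c in advocates)
    return sum(c.p * c.F for c in advocates) / total_fitness if total_fitness > 0 else 0.0

def update_action_set(action_set, P):
    """Update prediction, error and fitness of the classifiers in [A] (Equations 3-6),
    in the order stated in the text: prediction first, then error, then fitness."""
    for c in action_set:
        c.p += BETA * (P - c.p)                  # Equation 3
        c.eps += BETA * (abs(P - c.p) - c.eps)   # Equation 4
    # Absolute accuracy (Equation 5) and relative-accuracy fitness update (Equation 6).
    kappa = [1.0 if c.eps <= EPS0 else ALPHA * (c.eps / EPS0) ** (-NU) for c in action_set]
    denom = sum(k * c.num for k, c in zip(kappa, action_set))
    for k, c in zip(kappa, action_set):
        c.F += BETA * (k * c.num / denom - c.F)
```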
3. XCSAM CLASSIFIER SYSTEM
XCS [21] evolves classifiers that represent complete action maps, which are composed of all available actions in every possible situation, but the complete action map increases the population size required to evolve an optimal solution. UCS [1] deals with the best action map and is trained using supervised learning, unlike XCS, which uses reinforcement learning. Therefore, UCS is limited in that it cannot be applied to reinforcement learning applications. XCS with Adaptive Action Mapping (XCSAM) provides a trade-off between XCS and UCS. Like XCS, it can solve both reinforcement learning (multi-step) and supervised classification (one-step) problems; like UCS, XCSAM does not evolve a complete action map but tries to focus on the best actions only while learning proceeds. XCSAM extends the original XCS by (i) including a mechanism to adaptively distinguish the redundant actions from those that should be included in a best action map and (ii) getting rid of redundant actions while still exploring the space of viable optimal solutions.

3.1 Identification of the Best Action Map
First, XCSAM identifies the best action when the selected action is performed. In a typical reinforcement learning problem involving delayed rewards, the expected future reward at the current state, max_a P(s_t, a) (Equation 2), tends to be higher than the expected future reward at the previous state, max_a P(s_{t-1}, a) (because of the discount factor γ). Accordingly, since the action corresponding to a higher reward also tends to correspond to a shorter state sequence, the best actions will tend to have a value of max_a P(s_t, a) larger than max_a P(s_{t-1}, a). More precisely, max_a P(s_{t-1}, a) converges to γ · max_a P(s_t, a), while max_a P(s_t, a) converges to max_a P(s_{t-1}, a)/γ. Thus, when an executed action is a best action, in XCS the predictions of the accurate classifiers in [A] tend to converge to max_a P(s_{t-1}, a)/γ. For this reason, XCSAM can identify the actions that are likely to be part of the best action map by comparing max_a P(s_t, a) against ζ × max_a P(s_{t-1}, a)/γ (where ζ is a learning rate added to guarantee convergence). If max_a P(s_t, a) is greater than the threshold ζ × max_a P(s_{t-1}, a)/γ, then a is a good candidate for the best action map and should be maintained.

After having identified good action candidates for the best action map, XCSAM needs to adaptively identify classifiers that may be good candidates for the final best action map. Accordingly, XCSAM adds a parameter eam ("effect of adaptive mapping") to classifiers, which for classifier cl_i is updated according to Equation 7, where nma represents the number of available actions:

    eam_i \leftarrow \begin{cases} eam_i + \beta (1 - eam_i) & \text{if } \max_a P(s_t, a) \geq \zeta \cdot \max_a P(s_{t-1}, a) / \gamma, \\ eam_i + \beta (nma - eam_i) & \text{otherwise} \end{cases}    (7)

The value of eam of classifiers advocating the selected action converges to 1 if the classifier is a good candidate for the final best action map; otherwise, eam converges to nma. Therefore, classifiers with an eam close to one are good candidates to represent the final best action map, while classifiers with an eam close to nma are less likely to be maintained, as they are probably advocating actions with lower expected return.
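A minimal sketch of the eam update (Equation 7) follows. It assumes a delayed-reward setting with γ > 0, so that the threshold test can be written exactly as above; for the single-step problems of this paper γ is 0 and XCSAM's actual handling may differ, so the illustrative γ value and the function names below are our assumptions.

```python
BETA = 0.2    # learning rate (Section 4 setting)
ZETA = 0.99   # convergence coefficient for the best-action test (Section 4 setting)
GAMMA = 0.71  # discount factor; illustrative value for a multi-step task

def is_best_action_candidate(max_p_now, max_p_prev):
    """Threshold test of Section 3.1: max P(s_t, a) >= zeta * max P(s_{t-1}, a) / gamma."""
    return max_p_now >= ZETA * max_p_prev / GAMMA

def update_eam(eam, nma, candidate):
    """Equation 7: pull eam toward 1 for best-action candidates, toward nma otherwise."""
    target = 1.0 if candidate else float(nma)
    return eam + BETA * (target - eam)
```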
3.2 Evolution on the Best Actions
To focus evolution on the best actions, XCSAM acts on the covering operator to prevent the generation of classifiers that are not likely to be included in the final solution. In particular, XCSAM tunes the covering threshold θmna based on the actions' predicted reward and the eam parameters. Initially, θmna is set to the number of feasible actions (the same value as in XCS). When [M] is generated, XCSAM computes the prediction array before covering is applied (whereas XCS computes it only after covering). Then, XCSAM computes the current θmna as the average eam of the classifiers in [M], weighted by the expected future return max_a P(s_t, a). If the number of different actions in [M] is smaller than the computed θmna, covering is called and the prediction array is recomputed.

If solving a single-step problem, as in the current work, XCSAM computes a selection array S(a) that associates a selection probability to each action as follows:

    S(a_j) = \frac{\sum_{cl_k \in [M]|a_j} p_k \cdot F_k}{\sum_{cl_k \in [M]} F_k}    (8)

The selection probability is calculated from the fitness F and the parameter eam [13]. After action selection is performed, XCSAM generates both the action set [A] (as in XCS) and the not-action set [Ā], consisting of the classifiers in [M] not advocating the selected action. When the executed action is considered a candidate best action, the parents in the GA are selected from [A], to promote the evolution of classifiers that are likely to be in the final best action map, while the classifiers to delete are selected from [Ā], to get rid of classifiers that are not likely to be part of the final solution. Otherwise, if there is not enough information about the executed action, or [Ā] is empty, XCSAM applies deletion in [P] as in XCS. When the executed action is not identified as a candidate best action, the parents are selected from [Ā], to explore the solution space even further, and deletion is applied to the population as in XCS.

4. EXPERIMENT: XCS VS XCSAM
We apply XCS and XCSAM to the multiplexer and to a classification problem with real world data. We add three kinds of problem difficulty to the multiplexer: class-imbalance, Gaussian noise or alternating noise. Class-imbalance can be observed in many real world datasets, such as oil spills in satellite images, failures in manufacturing processes, and rare medical diagnoses [15]. Gaussian noise simulates real-world-like reward functions, which sometimes return a different reward value when the same action is executed in the same state. Alternating noise simulates a dataset having several aliasing states, which have the same input but a different class. All three conditions can occur in real world problems.

4.1 Experiments on Multiplexer Problems
4.1.1 Multiplexer problem and Experimental design
Multiplexer problem. The multiplexer (MP) [21] is defined over a binary string of k + 2^k bits; the first k bits represent an address pointing to the remaining 2^k bits. For instance, the 6-multiplexer function (k = 2) applied to the input string 110001 will return 1 as an answer (i.e., class), while applied to 110110 it will return 0. We compare XCS and XCSAM using the 11MP (k = 3). Following XCS tradition [21] we generate inputs randomly, so the train and test sets are the same.
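For reference, the multiplexer function can be written in a few lines; the sketch below is our illustration and simply follows the definition above.

```python
import random

def multiplexer(bits: str, k: int = 3) -> int:
    """k + 2^k multiplexer: the first k bits address one of the remaining 2^k bits.
    For the 6-multiplexer (k = 2), '110001' -> 1 and '110110' -> 0, as in the text."""
    assert len(bits) == k + 2 ** k
    address = int(bits[:k], 2)
    return int(bits[k + address])

# Example: a random instance of the 11-multiplexer (k = 3) used in the experiments.
instance = ''.join(random.choice('01') for _ in range(11))
label = multiplexer(instance, k=3)
```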
Experimental design. During learning problems, the system selects actions randomly from those represented in the match set. During evaluation problems, the system selects the action with the highest expected return. When the system performs the correct action it receives a reward of 1000, otherwise it receives 0. The genetic algorithm is enabled only during learning problems and is turned off during evaluation problems. The covering operator is always enabled, but operates only if needed. Learning problems and test problems alternate. The performance is reported as the moving average over the last 50000 (class imbalance) or 5000 (Gaussian noise and alternating noise) evaluation problems. All the plots are averages over 20 experiments.

[Figure 1: Performances of XCS and XCSAM on the multiplexer problem. Panels: a) class-imbalance (minority-class performance), b) Gaussian noise, c) alternating noise; for each condition the top panel shows XCS and the bottom panel shows XCSAM, plotting performance against iterations.]

4.1.2 Class-imbalance
In our first set of experiments we use the class-imbalanced multiplexer problem (IMP) [15], which generates a binary string of k + 2^k bits as an instance. If the answer for the instance, computed as in the normal MP, belongs to the minority class, then the instance is accepted as an input with probability 1/2^ir, where ir denotes the imbalance level of the dataset. If the answer belongs to the majority class, the instance is always accepted. Accepted inputs are sent to the LCS. Note that ir = 0 represents the usual balanced multiplexer. We set the minority class to 0 and the majority class to 1. We use the following parameter settings [15]:² N = 800, ε_0 = 1, μ = 0.04, P_# = 0.6, χ = 0.8, β = 0.2, α = 0.1, δ = 0.1, ν = 5, θ_GA = 25, θ_del = 20, θ_sub = 200; GA subsumption is turned on while AS subsumption is turned off; in XCSAM, we set ζ = 0.99. We set the imbalance level ir to 0, 3, 4, 5, 6, and 7. The maximum iteration is set to 5,000,000 as in [15].

²[15] also adapts the XCS parameter settings to optimize XCS for class-imbalanced problems, but we use the unadapted settings from [15] because we want to compare best-action and complete maps without optimisations, and over all three problem conditions (not just class imbalance). Future work could consider parameter optimization for all cases.
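The IMP instance generator described above can be sketched as follows (our illustration; the helper simply repeats the multiplexer definition from the previous sketch and applies the 1/2^ir acceptance rule).

```python
import random

def multiplexer(bits: str, k: int) -> int:
    return int(bits[k + int(bits[:k], 2)])  # same definition as in the earlier sketch

def imbalanced_instance(k: int = 3, ir: int = 5, minority_class: int = 0):
    """Rejection sampling for the class-imbalanced multiplexer (IMP): minority-class
    instances are accepted with probability 1/2**ir, majority-class instances are
    always accepted; ir = 0 reproduces the balanced multiplexer."""
    while True:
        bits = ''.join(random.choice('01') for _ in range(k + 2 ** k))
        label = multiplexer(bits, k)
        if label != minority_class or random.random() < 1.0 / 2 ** ir:
            return bits, label
```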
Figure 1-a) shows the performances of XCS and XCSAM on the class-imbalanced 11-bit multiplexer problem. Each performance curve shows the rate of correct answers for the minority class only (the performance for the majority class reaches optimality for all ir in both systems). XCS reaches optimality with ir = 0, 3 and 4; it reaches near-optimal performance (0.94) with ir = 5, but it fails to reach near-optimal performance with ir = 6 and 7 because in XCS overgeneralized classifiers remain in the population. Specifically, in the class-imbalanced multiplexer problem both systems have a tendency to overgeneralize classifiers for the majority class. Overgeneralized classifiers have relatively high fitness because the inputs where they are wrong (minority-class inputs) occur with very low frequency. Since XCS deletes classifiers based on their fitness, XCS mistakenly keeps the overgeneralized classifiers as good classifiers. XCSAM also reaches optimality with ir = 0, 3 and 4. XCSAM with ir = 5 reaches 0.94 faster than XCS. With ir = 6 and 7, XCSAM clearly has better performance than XCS. While overgeneralized classifiers are also generated in XCSAM, it can detect and delete them, in contrast with XCS. This is because XCSAM deletes classifiers which are not candidates for the best action map when it identifies overgeneralized classifiers (which have incorrect best actions) using the reward signal. These results suggest that the best action map can enhance the performance of ARLCS in class-imbalanced problems, because the evolution of the best action map works to detect and quickly delete overgeneralized classifiers.

4.1.3 Gaussian noise
In our second set of experiments we add Gaussian noise to the reward on the (normal) multiplexer [21]. Gaussian noise with mean zero and standard deviation σ is added to the environment reward. The parameter settings are the same as in the previous experiment. We set σ to different values: 100, 200, 300, 400 and 500. The maximum iteration is 500,000.

Figure 1-b) shows the performances of XCS and XCSAM on the 11-bit multiplexer problem with Gaussian noise. XCS reaches optimality with σ = 100, 200 and 300; however, it abruptly fails to reach optimal performance with higher σ. In contrast, XCSAM reaches optimality faster than XCS with σ = 200 and 300. Even with large noise (σ = 400 and 500), XCSAM reaches about 0.96 and 0.85 respectively. With Gaussian noise, the fitness of a classifier is much lower because the prediction error used to calculate the fitness becomes large. This means most classifiers are inaccurate, hence XCS cannot select a correct action using the inaccurate system predictions (which are calculated from the fitness). In XCSAM most classifiers are also inaccurate, but XCSAM can still identify classifiers which are candidates for the best action map. This is because the reward signal, even with added noise, is still useful to identify best-action rules: a positive reward, which is returned when a best action is executed, is usually larger than a negative reward, which is returned when a not-best action is executed. That is, the reward signal still indicates the best action even with Gaussian noise. Hence, XCSAM can select the best-action classifiers, which remain in the population, by deleting redundant classifiers which are not candidates for the best action map. These results suggest that the best action map contributes to stably deriving high performance by deleting redundant classifiers under Gaussian noise, where the positive reward is larger than the negative reward.
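For concreteness, the reward scheme used in this subsection amounts to adding zero-mean Gaussian noise to the 1000/0 reward; a minimal sketch (our illustration) is shown below.

```python
import random

def gaussian_noisy_reward(correct: bool, sigma: float = 300.0) -> float:
    """Section 4.1.3 reward: 1000 for the correct action, 0 otherwise, plus
    zero-mean Gaussian noise with standard deviation sigma."""
    base = 1000.0 if correct else 0.0
    return base + random.gauss(0.0, sigma)
```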
In the next experiments we test the case of noise where the positive reward is sometimes equal to the negative reward.

4.1.4 Alternating noise
In our third set of experiments we add alternating noise to the reward on the (normal) multiplexer problem. With alternating noise, the reward is swapped with a certain probability Pan. Specifically, the positive reward (1000) is replaced by the negative reward, which means this noise simulates a state space which includes aliasing states. For instance, consider a 6MP instance whose best action is "1"; when a learner executes action "1", a fraction (1 - Pan) of the presentations of that instance return the positive reward 1000, while a fraction Pan return the negative reward 0. The parameter settings are the same as in the previous experiment. We set Pan to different values: 0.10, 0.15, 0.20, 0.25 and 0.30. The maximum iteration is set to 500,000.

Figure 1-c) shows the performance of XCS and XCSAM on the 11-bit multiplexer problem with alternating noise. With Pan = 0.10 and 0.15, XCS reaches 0.98 and 0.95 respectively; it fails to reach optimal performance with Pan = 0.20, 0.25 and 0.30. The performance of XCSAM with Pan = 0.10 and 0.15 is worse than that of XCS: it converges to 0.95 and 0.93 respectively. However, with Pan = 0.20, 0.25 and 0.30, XCSAM clearly improves on the performances of XCS. With alternating noise, XCSAM has a tendency to delete classifiers which are candidates for the best action map, because XCSAM mistakenly identifies them as redundant classifiers when the returned reward is 0 (i.e., a reward of 0 indicates that the executed action is not the best action). In contrast, XCS keeps them even with the added noise (XCS deletes classifiers based on their fitness), because they have relatively high fitness in the case of small noise. Hence, with small noise (Pan = 0.10 and 0.15), the performance of XCSAM is lower than that of XCS. However, with large noise (Pan = 0.20, 0.25 and 0.30), the fitness of classifiers is much lower, hence XCS cannot select actions using the now-inaccurate system prediction (for the same reason as with Gaussian noise).
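The alternating-noise reward can be sketched as follows. This is our reading of the description above, in which only the positive reward is swapped to the negative one with probability Pan; the function name is illustrative.

```python
import random

def alternating_reward(correct: bool, p_an: float = 0.2) -> float:
    """Section 4.1.4 reward: with probability p_an the positive reward (1000)
    returned for a best action is replaced by the negative reward (0)."""
    if correct:
        return 0.0 if random.random() < p_an else 1000.0
    return 0.0
```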
4.2 Experiment on Real World Data
To evaluate the performance of XCS and XCSAM on a real world application, we apply both systems to classification tasks with real world data from the UCI repository [16]. As shown in Table 1, we use 20 datasets. In the table, #len denotes the length of the input in each dataset and #cls denotes the number of classes in each dataset. As in previous work [1], if the attributes are binary, the data is codified in the ternary alphabet. If the attributes are real, the data is codified as an interval range. For simplicity, all real attributes are normalized to the range [0,1). Nominal attributes are translated into numeric values and so considered as real attributes. The datasets are run on a stratified ten-fold cross-validation test. To test the differences between the two systems, we use a paired t-test on the ten-fold cross-validation results. The parameter settings are the same as previously, except that the population size limit is N = 6400. The maximum iteration is 100,000.

Table 1: Datasets and results of XCS and XCSAM (bold indicates p < 0.05). •/◦ indicates a positive significant difference of XCS/XCSAM respectively.

  DataName     #len  #cls  XCS   XCSAM  p value
  Audiology     69    24   0.46  0.59   0.002◦
  A.C.card      14     2   0.84  0.84   0.531
  Balance S.     4     3   0.83  0.80   0.026•
  Bupa           6     2   0.69  0.67   0.530
  Breast w       9     2   0.91  0.89   0.152
  Breast wd     32     2   0.96  0.97   0.088
  Cmc            9     3   0.53  0.51   0.010•
  Glass          9     6   0.74  0.70   0.207
  Heart-c       13     5   0.48  0.51   0.091
  Heart-h       13     5   0.64  0.68   0.154
  Hepatitis     19     2   0.88  0.83   0.035•
  Iris           4     3   0.96  0.94   0.081
  Libras        91    15   0.15  0.17   0.253
  Segment       19     7   0.94  0.95   0.293
  Soybean       35    19   0.21  0.45   6.6E-08◦
  Vehicle       18     4   0.72  0.71   0.407
  Vowel         13    19   0.62  0.63   0.854
  Vote          16     2   0.96  0.96   0.988
  Wine          13     3   0.96  0.96   0.985
  Zoo           17     7   0.95  0.92   0.081
  XCS• - XCSAM◦: 3 - 2

The classification accuracies of XCS and XCSAM are shown in Table 1. We note that XCS is significantly better (p < 0.05) than XCSAM on Balance S., Cmc and Hepatitis, while XCSAM is significantly better on Audiology and Soybean, which have longer inputs and more classes than the other datasets. Overall, we cannot find a clear advantage for either system.

5. ANALYSIS
To understand why XCSAM does not derive higher classification accuracies than XCS, we analyze how XCSAM focuses on the best action map. From the analysis, we introduce a deletion strategy for XCSAM which can help it to evolve the best action map.

5.1 Analysis
Specifically, we analyze the parameters eam (effectiveness of adaptive action mapping) and ε (prediction error) of the classifiers in the population when XCSAM is applied to the real dataset Vowel. The reason for focusing on Vowel is that XCSAM did not derive a better performance than XCS, although Vowel has many classes like Audiology and Soybean, where XCSAM clearly derived higher classification accuracy than XCS. In Vowel, the number of available actions nma is 11, which is the same as the number of classes of Vowel.

[Figure 2: Classifiers in the population when XCSAM is applied to Vowel, at a) the 10,000th, b) the 50,000th and c) the 100,000th iteration. The vertical axis is the effectiveness of adaptive action mapping (eam, 1 to 11); the horizontal axis is the prediction error (0 to 500).]

Figure 2 shows the classifiers in the population when XCSAM is applied to Vowel. The vertical and horizontal axes indicate the classifiers' eam and ε parameters. A classifier has the best action if its eam is close to 1, and a not-best action if its eam is close to 11 (nma). Similarly, a classifier is accurate if its ε is close to 0, and inaccurate if its ε is close to 500. For instance, classifiers placed at the left-top corner are accurate but not candidates for the best action map; classifiers placed at the left-bottom corner are accurate and candidates for the best action map. From Figure 2 a), at the 10,000th iteration, XCSAM generates redundant classifiers which are inaccurate (large ε) or not candidates for the best action map (large eam). At the 50,000th iteration, XCSAM gradually generates classifiers which have large ε but small eam; on the other hand, many redundant classifiers remain. At the 100,000th (final) iteration, XCSAM successfully learns several accurate classifiers (left-bottom corner) which have a best action (eam = 1); however, many inaccurate classifiers and redundant classifiers still remain. This analysis suggests that XCSAM can partially learn the best action map, but it keeps many redundant classifiers which are inaccurate or are not best action map candidates. Hence, XCSAM sometimes fails to efficiently evolve the best action map.
5.2 Deletion mechanism
From the analysis, to efficiently focus on the best action map, XCSAM needs to delete many redundant classifiers. However, using a steady-state GA, XCSAM deletes only two classifiers in each generation, hence many redundant classifiers remain in the population. Here, we introduce a new deletion mechanism that deletes more than two classifiers as necessary. Specifically, in XCSAM, θmna, which is calculated from the parameter eam, is the number of actions which should be in the match set [M]. When θmna = nma (the number of action types), the classifiers in [M] should not be removed, because XCSAM does not detect the best action in this state. In contrast, when θmna = 1 (which means XCSAM correctly identified the best action), classifiers will be redundant if [M] has more than one type of action. More generally, if the number of action types in [M] (nma_[M]) is larger than θmna, we can suspect that there are some redundant classifiers in [M]. For this reason, the new deletion mechanism deletes classifiers if nma_[M] > θmna.

The new deletion mechanism is called when the executed action is identified as the best action. First, the average eam, EAM(a), is computed from the classifiers with action a. Then XCSAM builds a maximum-eam set [E] composed of the classifiers in [M] which advocate the action with the largest EAM(a) (i.e., they are not even close to the best action map), and deletes all classifiers in [E] if nma_[M] > θmna. XCSAM repeats this process while nma_[M] > θmna.
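The new deletion step can be sketched as follows. This is our reconstruction of the procedure just described, under the assumption that classifiers expose `action` and `eam` attributes and that θmna may be fractional; the guard against emptying [M] is our addition and is not specified in the text. The removed classifiers would also be deleted from the population [P].

```python
def prune_match_set(match_set, theta_mna):
    """While the number of distinct actions in [M] exceeds theta_mna, remove every
    classifier advocating the action with the largest average eam, i.e. the action
    least likely to belong to the best action map (the set [E] of Section 5.2)."""
    def actions(ms):
        return {c.action for c in ms}
    while len(actions(match_set)) > theta_mna and len(actions(match_set)) > 1:
        # Average eam per action represented in [M].
        avg_eam = {a: sum(c.eam for c in match_set if c.action == a) /
                      sum(1 for c in match_set if c.action == a)
                   for a in actions(match_set)}
        worst = max(avg_eam, key=avg_eam.get)            # action with the largest EAM(a)
        match_set[:] = [c for c in match_set if c.action != worst]
    return match_set
```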
6. EXPERIMENT: REVISED XCSAM
We apply the revised XCSAM with the new deletion mechanism to the multiplexer and to the real world data.

6.1 Experiments on Multiplexer Problems
The experimental settings are the same as in the previous experiments (Section 4.1).

[Figure 3: Performance of the revised XCSAM on the multiplexer problem: a) class-imbalance, b) Gaussian noise, c) alternating noise.]

Figure 3 shows the performance of the revised XCSAM on the 11-bit multiplexer with class-imbalance, Gaussian noise or alternating noise. With class-imbalance (Figure 3 a)), the revised XCSAM reaches optimality with ir = 0, 3 and 4; with ir = 5 and 6 it reaches 0.99 and 0.96 respectively, which is higher accuracy than the standard XCSAM, which reaches 0.94 and 0.87 respectively (see Figure 1-a)). With Gaussian noise (Figure 3 b)), for σ = 100, 200, 300 and 400 the performance is as good as that of the standard XCSAM. With σ = 500, the revised XCSAM reaches 0.90 while the standard XCSAM reaches 0.85 (see Figure 1-b)). In contrast, with alternating noise (Figure 3 c)), the performance of the revised XCSAM is lower than that of the standard XCSAM. For instance, with Pan = 0.10 and 0.15, the revised XCSAM reaches 0.89 and 0.84 respectively, while the standard XCSAM reaches 0.95 and 0.93. As noted in Section 4.1, with class-imbalance and Gaussian noise XCSAM can identify the best action map, so the new deletion mechanism correctly deletes redundant classifiers. Hence, the revised XCSAM can successfully focus on the best action map and evolve its classifiers toward accurate classifiers. However, with alternating noise, XCSAM sometimes mistakenly identifies redundant classifiers as part of the best action map due to the noise. This causes the new deletion mechanism to mistakenly delete even classifiers which are candidates for the best action map. Hence, the revised XCSAM does not perform better than the standard XCSAM. This suggests that, if an ARLCS can evolve only best action maps and has no redundant classifiers in its population, the best action map is more robust than the complete action map for class-imbalance and Gaussian noise; in contrast, its robustness with alternating noise is worse than that of the complete action map.

6.2 Experiment on Real World Data
We test the revised XCSAM on classifying the real world datasets from Section 4.2, using the same parameter settings. Table 2 shows the classification accuracy of the revised XCSAM and the p values between XCS and the revised XCSAM, and between the standard XCSAM and the revised XCSAM.

Table 2: Results of the revised XCSAM (bold indicates p < 0.05). The p1 value indicates the difference between XCS and the revised XCSAM; •/◦ indicates a positive significant difference of XCS/revised XCSAM respectively. The p2 value indicates the difference between XCSAM and the revised XCSAM; ♦ indicates a positive significant difference of the revised XCSAM.

  DataName     Revised XCSAM  p1 value   p2 value
  Audiology         0.57      0.031◦     0.554
  A.C.card          0.84      0.755      0.726
  Balance S.        0.85      0.003◦     0.001♦
  Bupa              0.67      0.260      0.812
  Breast w          0.89      0.258      0.891
  Breast wd         0.97      0.152      0.811
  Cmc               0.52      0.375      0.423
  Glass             0.69      0.228      0.879
  Heart-c           0.49      0.678      0.304
  Heart-h           0.61      0.377      0.055
  Hepatitis         0.83      0.020◦     0.838
  Iris              0.95      0.343      0.168
  Libras            0.22      0.002◦     0.005♦
  Segment           0.95      0.040◦     0.231
  Soybean           0.40      1.1E-04◦   0.065
  Vehicle           0.70      0.179      0.657
  Vowel             0.85      2.6E-05◦   5.3E-07♦
  Vote              0.95      0.044•     0.227
  Wine              0.96      0.999      0.988
  Zoo               0.95      0.972      0.343
  XCS• - Revised XCSAM◦: 1 - 7
  XCSAM - Revised XCSAM♦: 0 - 3

The revised XCSAM clearly improves on the classification accuracy of XCSAM (see Table 1). We also found seven positive significant differences of the revised XCSAM from XCS. In particular, the revised XCSAM derives high classification accuracy on the datasets Audiology, Libras, Soybean and Vowel, whose input lengths are relatively long and which have many possible classes. Additionally, since three positive significant differences of the revised XCSAM from the standard XCSAM are noted, the revised XCSAM successfully improves on the performance of XCSAM on the real world data. From these results, the best action map is more effective for the classification of data where an input has many elements (i.e., it is high-dimensional) and many classes.

Finally, to understand how the revised XCSAM focuses on the best action map, we analyze the classifiers of the revised XCSAM when it is applied to the dataset Vowel; see Figure 4.

[Figure 4: Classifiers in the population when the revised XCSAM is applied to Vowel, at a) the 10,000th, b) the 50,000th and c) the 100,000th iteration; axes as in Figure 2.]

At the 10,000th iteration, like the standard XCSAM, the revised XCSAM generates redundant classifiers which are inaccurate (large ε) or not candidates for the best action map (large eam). At the 50,000th iteration, the revised XCSAM gradually learns only the classifiers which have small eam parameters. Note that in the standard XCSAM many classifiers with large eam remain, but the revised XCSAM successfully deletes them. At the 100,000th iteration, many accurate classifiers (which have ε = 0 and eam = 1) are generated and the classifiers with large eam do not remain. From these results, the revised XCSAM successfully deletes classifiers which are not best action map candidates, hence the revised XCSAM can derive high performance.
7. SUMMARY AND CONCLUSION
This paper compares XCS [21] with XCSAM [11, 12] on multiplexer problems and on the classification of real world data. First, we test both systems on the multiplexer problem with three problem difficulties: class-imbalance, Gaussian noise and alternating noise. Experimental results show that XCSAM can derive better performance (higher accuracy and sometimes faster learning) than XCS with class-imbalance and Gaussian noise, where XCSAM can identify classifiers which are best action map candidates. Hence, XCSAM can efficiently evolve classifiers toward a best action map. In contrast, XCSAM sometimes derives lower accuracy than XCS when the alternating noise is small. From analysing this we introduce a new deletion mechanism. While the original XCSAM sometimes fails to focus on the best action map on real world data, we show that our new deletion mechanism allows it to focus on the best action map with real world data. The revised XCSAM, which employs the new deletion mechanism, can derive higher accuracy than XCSAM on the class-imbalanced and Gaussian noise problems; however, with alternating noise, its accuracy degrades more than XCSAM's. On classification with real world data, the revised XCSAM can derive higher classification accuracy on data which have a high-dimensional input and many possible classes.

These results suggest that the best action map is more robust to class-imbalance and Gaussian noise, since the evolution of best action maps works to detect and quickly delete overgeneralized classifiers and classifiers which do not advocate a best action. Note that, despite these problem difficulties, the reward signal is still useful to guide the LCS to the best action. Alternating noise causes XCSAM to mistakenly identify a not-best action as a best action. Accordingly, XCSAM deletes accurate classifiers which have a best action by wrongly identifying them as not-best actions. Hence, the best action map is less robust than the complete action map under noise where the best action is mistakenly evaluated as a not-best action, i.e., in aliased states. Our conclusion is that the best action map is a powerful strategy in problems which distinctively return a positive reward for the best action and a negative reward for the not-best action (i.e., the positive reward is always larger than the negative reward). Additionally, if an ARLCS can successfully focus on only the best action map, the best action map contributes to high performance on complex problems which have a high-dimensional input and many possible classes. The complete action map is a better strategy than the best action map when the given problems include aliasing states. Results on RL problems could differ from those on classification problems, but we leave this for future work.
8. REFERENCES
[1] E. Bernadó-Mansilla and J. M. Garrell-Guiu. Accuracy-based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks. Evolutionary Computation, 11:209–238, 2003.
[2] L. Bull, E. Bernadó-Mansilla, and J. H. Holmes, editors. Learning Classifier Systems in Data Mining, volume 125 of Studies in Computational Intelligence. Springer, 2008.
[3] M. V. Butz and S. W. Wilson. An algorithmic description of XCS. Journal of Soft Computing, 6(3–4):144–153, 2002.
[4] M. V. Butz, D. E. Goldberg, and P. L. Lanzi. Gradient descent methods in learning classifier systems: Improving XCS performance in multistep problems. IEEE Transactions on Evolutionary Computation, 9(5):452–473, October 2005.
[5] M. V. Butz, P. L. Lanzi, and S. W. Wilson. Function approximation with XCS: Hyperellipsoidal conditions, recursive least squares, and compaction. IEEE Transactions on Evolutionary Computation, 12(3):355–376, 2008.
[6] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, 1989.
[7] J. H. Holland. Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-based Systems. Machine Learning, 2:593–623, 1986.
[8] F. Kharbat, L. Bull, and M. Odeh. Mining breast cancer data with XCS. In GECCO 2007, pages 2066–2073, 2007.
[9] T. Kovacs. Deletion schemes for classifier systems. In GECCO 1999, pages 329–336, 1999.
[10] P. L. Lanzi. Learning classifier systems from a reinforcement learning perspective. Soft Computing – A Fusion of Foundations, Methodologies and Applications, 6(3):162–170, 2002.
[11] M. Nakata, P. L. Lanzi, and K. Takadama. Enhancing Learning Capabilities by XCS with Best Action Mapping. In PPSN XII, volume 7491 of LNCS, pages 256–265. Springer, 2012.
[12] M. Nakata, P. L. Lanzi, and K. Takadama. XCS with Adaptive Action Mapping. In SEAL 2012, volume 7673 of LNCS, pages 138–147. Springer, 2012.
[13] M. Nakata, P. L. Lanzi, and K. Takadama. Selection strategy for XCS with adaptive action mapping. In GECCO 2013, pages 1085–1092. ACM, 2013.
[14] A. Orriols and E. Bernadó-Mansilla. Class imbalance problem in UCS classifier system: fitness adaptation. In CEC 2005, volume 1, pages 604–611. IEEE, 2005.
[15] A. Orriols-Puig and E. Bernadó-Mansilla. Bounding XCS's parameters for unbalanced datasets. In GECCO 2006, pages 1561–1568. ACM, 2006.
[16] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/.
[17] C. Stone and L. Bull. Comparing XCS and ZCS on noisy continuous-valued environments. Technical Report, UWE, 2005.
[18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[19] F. A. Tzima and P. A. Mitkas. ZCS Revisited: Zeroth-level Classifier Systems for Data Mining. In ICDM Workshops, pages 700–709, 2008.
[20] S. W. Wilson. ZCS: A Zeroth Level Classifier System. Evolutionary Computation, 2(1):1–18, 1994.
[21] S. W. Wilson. Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175, June 1995.