Selection Strategy for XCS with Adaptive Action Mapping

Masaya Nakata
Dept. of Informatics, The University of Electro-Communications
1-5-1 Chofugaoka, Chofu-shi, Tokyo, Japan

Pier Luca Lanzi
Dipartimento di Elettronica, Informazione e Bioinformatica, Politecnico di Milano
I-20133 Milano, Italy

Keiki Takadama
Dept. of Informatics, The University of Electro-Communications
1-5-1 Chofugaoka, Chofu-shi, Tokyo, Japan
ABSTRACT

XCS with Adaptive Action Mapping (XCSAM) evolves solutions focused on classifiers that advocate the best action in every state. Accordingly, XCSAM usually evolves more compact solutions than XCS, which, in contrast, works toward solutions representing complete state-action mappings. Experimental results have, however, shown that, in some problems, XCSAM may produce bigger populations than XCS. In this paper, we extend XCSAM with a novel selection strategy to reduce, even further, the size of the solutions XCSAM produces. The proposed strategy selects the parent classifiers based both on their fitness values (like XCS) and on the effect they have on the adaptive map. We present experimental results showing that XCSAM with the new selection strategy can evolve solutions that are more compact than those of XCS and, at the same time, maximally general and maximally accurate.

Categories and Subject Descriptors

I.2.6 [Artificial Intelligence]: Learning—knowledge acquisition, concept learning

General Terms

Algorithms, Experimentation, Performance

Keywords

Learning Classifier System, XCS classifier system, XCS with Adaptive Action Mapping, Best action mapping, Genetic Algorithm, Selection strategy

1. INTRODUCTION
Learning Classifier Systems [11] combine reinforcement
learning [20] and genetic algorithms [10] to solve classification [2], regression [9], and sequential decision making problems [7]. XCS [21] is probably the most popular classifier system model so far and has been successfully applied to a wide range of problem domains [2]. XCS evolves solutions that provide an evaluation of the expected payoff of all the available actions in every situation, i.e., they represent complete (state-action) mappings of the problem solutions. Because it evolves complete mappings, XCS can learn optimal solutions for very difficult problems and, because of its intrinsic genetic pressure toward generalization, XCS generates solutions that are maximally accurate and maximally general.
Complete mappings are often considered redundant since in most applications (e.g., classification, stock market prediction) only the actions with the highest return really count. Complete mappings are also very expensive, both in terms of memory (since they allocate classifiers to non-optimal actions) and in terms of the time required to achieve optimal performance (since they require a more thorough exploration of the problem space). Accordingly, to avoid complete mappings, Bernadó-Mansilla et al. introduced the UCS classifier system [1] which, like XCS, exploits accuracy-based evolution to produce maximally accurate, maximally general solutions, but employs supervised learning (instead of reinforcement learning) to distribute the incoming feedback to the classifiers accountable for it. Thus, UCS only evolves classifiers that advocate the best action, but it can solve only classification problems (since, unlike XCS, it does not apply reinforcement learning); therefore its approach cannot be generalized to broader machine learning applications.
To overcome the limitations of UCS, we introduced XCS with Adaptive Action Mapping (XCSAM) [19, 18], a classifier system that (i) can evolve mappings mainly focused on the best actions, those with the highest payoffs, like UCS; at the same time, XCSAM (ii) can tackle both classification and sequential decision making problems using reinforcement learning, like XCS. XCSAM [19, 18] works mainly like XCS and selects parent classifiers based on their relative accuracy, encoded in the classifiers' fitness [21]. In XCS, plain accuracy-based fitness results in the evolution of complete mappings, but in XCSAM it produces solutions with more classifiers than XCS, even though XCSAM tries to delete classifiers that do not represent the highest rewarded actions.
To reduce the size of the solutions evolved by XCSAM, in this paper we first analyze the effect of accuracy-based selection in XCSAM; then, we introduce a more effective selection strategy for XCSAM; finally, we compare XCS, the original XCSAM, and the improved XCSAM on the hidden parity and multiplexer problems. Our results show that the improved version of XCSAM can produce solutions that are more compact than those evolved by XCS and by the original XCSAM.
2. THE XCS CLASSIFIER SYSTEM

The XCS classifier system maintains a population of rules (the classifiers) which represents the solution to a reinforcement learning problem [13]. Classifiers consist of a condition, an action, and four main parameters [21, 6]: (i) the prediction p, which estimates the payoff that the system expects when the classifier is used; (ii) the prediction error ε, which estimates the error of the prediction p; (iii) the fitness F, which estimates the accuracy of the payoff prediction given by p; and (iv) the numerosity num, which indicates how many copies of classifiers with the same condition and the same action are present in the population.

At time t, XCS builds a match set [M] containing the classifiers in the population [P] whose condition matches the current sensory input st; if [M] does not contain all the feasible actions, covering takes place and creates a set of classifiers that match st and cover all the missing actions. This process ensures that XCS can evolve a complete mapping, so that in any state it can predict the effect of every possible action in terms of expected return.

For each possible action a in [M], XCS computes the system prediction P(st, a), which estimates the payoff that XCS expects if action a is performed in st. The system prediction P(st, a) is computed as the fitness-weighted average of the predictions of the classifiers in [M] which advocate action a:

    P(st, a) = ( Σ_{clk ∈ [M](a)} pk × Fk ) / ( Σ_{cli ∈ [M](a)} Fi )    (1)

where [M](a) represents the subset of classifiers of [M] with action a, pk identifies the prediction of classifier clk, and Fk identifies the fitness of classifier clk. Then, XCS selects an action to perform; the classifiers in [M] which advocate the selected action form the current action set [A]. The selected action at is performed, and a scalar reward rt+1 is returned to XCS together with a new input st+1. When the reward rt+1 is received, the estimated payoff P(t) is computed as follows:

    P(t) = rt+1 + γ max_{a ∈ [M]} P(st+1, a)    (2)

where γ is the discount factor [20]. Next, the parameters of the classifiers in [A] are updated in the following order [6]: prediction, prediction error, and finally fitness. The prediction pk is updated with learning rate β (0 ≤ β ≤ 1):

    pk ← pk + β (P(t) − pk)    (3)
Then, the prediction error and the classifier fitness are updated as:

    εk ← εk + β (|P(t) − pk| − εk)    (4)

    Fk ← Fk + β (κk − Fk)    (5)

where κk is the raw accuracy of classifier clk, computed from the classifier error εk [21].

On a regular basis (depending on the parameter θga), the genetic algorithm is applied to the classifiers in [A]. It selects two classifiers, copies them, with probability χ performs crossover on the copies, and with probability μ mutates each allele. The resulting offspring classifiers are inserted into the population and two classifiers are deleted to keep the population size constant.
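To make the update cycle concrete, the following Python sketch implements Equations (1)-(5) for a minimal classifier representation. It is only an illustration under simplifying assumptions (the class Classifier and the helper names are ours, and the fitness update uses the raw accuracy of Equation (5), whereas full XCS uses the set-relative accuracy [21]):

from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str        # e.g. "01#1#0", '#' is a don't-care symbol
    action: int
    p: float = 10.0       # prediction
    eps: float = 0.0      # prediction error
    F: float = 0.01       # fitness
    num: int = 1          # numerosity

def prediction_array(match_set, actions):
    """Fitness-weighted system prediction P(s_t, a) for every action (Eq. 1)."""
    P = {}
    for a in actions:
        cls = [cl for cl in match_set if cl.action == a]
        den = sum(cl.F for cl in cls)
        P[a] = sum(cl.p * cl.F for cl in cls) / den if den > 0 else None
    return P

def discounted_payoff(reward, P_next, gamma=0.71):
    """Estimated payoff P(t) = r_{t+1} + gamma * max_a P(s_{t+1}, a) (Eq. 2)."""
    return reward + gamma * max(v for v in P_next.values() if v is not None)

def raw_accuracy(cl, eps0=10.0, alpha=0.1, nu=5):
    """Raw accuracy kappa_k computed from the classifier error (as in [21])."""
    return 1.0 if cl.eps < eps0 else alpha * (cl.eps / eps0) ** (-nu)

def update_action_set(action_set, P_t, beta=0.2):
    """Update prediction, error, and fitness of the classifiers in [A]
    (Eqs. 3-5), in the order stated above."""
    for cl in action_set:
        cl.p += beta * (P_t - cl.p)                   # Eq. 3
        cl.eps += beta * (abs(P_t - cl.p) - cl.eps)   # Eq. 4
        cl.F += beta * (raw_accuracy(cl) - cl.F)      # Eq. 5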
3. XCS WITH ADAPTIVE ACTION MAPPING

The XCS classifier system [21] evolves solutions that map every possible situation (state) and action pair to its corresponding expected return. The maintenance of a complete mapping makes XCS robust to local optima and guarantees that, given an adequate amount of resources [8, 15], XCS can learn an optimal solution made of maximally accurate and maximally general classifiers. A complete mapping is however considered redundant in several domains (for instance, in classification we are not interested in the expected payoff of the wrong classes) and very expensive to maintain, both in terms of memory (since it keeps information on all the available actions) and time (since it requires a thorough exploration of the space). Accordingly, Bernadó-Mansilla and colleagues introduced UCS [1], a classifier system specifically designed for classification tasks that applies supervised learning (instead of reinforcement learning as in XCS) to evolve solutions containing only classifiers advocating the best actions.

XCS with Adaptive Action Mapping (XCSAM) [19, 18] represents a trade-off between XCS and UCS. Like XCS, it can solve both reinforcement learning (multi-step) and supervised classification (one-step) problems; similarly to UCS, XCSAM does not evolve a complete mapping but tries to focus on the best actions while the learning proceeds. To accomplish this, XCSAM extends the original XCS by adding mechanisms (i) to adaptively identify redundant actions and (ii) to get rid of them while still exploring the space of viable optimal solutions.

3.1 Identifying Best Action Mappings

To focus on the classifiers with the highest expected return, XCSAM compares the payoff expected at the current state, maxP(st, a) (Equation 2), against the payoff expected at the previous state, maxP(st−1, a). Note that in general the former tends to be higher than the latter (because of the discount factor γ). Accordingly, since the action corresponding to a higher reward also corresponds to a shorter state sequence, the best actions will tend to have a value of maxP(st, a) larger than maxP(st−1, a). More precisely, maxP(st−1, a) converges to γ maxP(st, a) at the next state, while maxP(st, a) converges to maxP(st−1, a)/γ. Thus, in XCS the prediction of the accurate classifiers in [A] tends to converge to
maxP(st−1, a)/γ. For this reason, XCSAM can identify the actions that are likely to be part of the best mapping by comparing maxP(st, a) against ζ × maxP(st−1, a)/γ (where ζ is a learning rate added to guarantee convergence). If maxP(st, a) is greater than the threshold ζ × maxP(st−1, a)/γ, then a is a good candidate and should be maintained.

After having identified good candidate actions for the best action mapping, XCSAM needs to adaptively identify the classifiers that may be good candidates for the final best mapping. For this purpose, a parameter eam (effect of adaptive mapping) is added to the classifiers of XCSAM and updated as:

    eami ← eami + β (1 − eami)      if maxP(st, a) ≥ ζ × maxP(st−1, a)/γ
    eami ← eami + β (nma − eami)    otherwise

where nma represents the number of available actions. The value of eam of classifiers advocating the selected action converges to 1 if the classifier is a good candidate for the final best action mapping; otherwise, eam converges to nma. Therefore, classifiers with an eam close to one are good candidates to represent the final best action mapping, while classifiers with an eam close to nma are less likely to be maintained, as they are probably advocating actions with lower expected return.
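As an illustration (not part of the original description), the following Python sketch shows how the eam update above could be realized; the function and argument names (update_eam, n_actions, max_P_prev) are our own:

def update_eam(action_set, max_P_t, max_P_prev, n_actions,
               beta=0.2, zeta=0.99, gamma=0.71):
    """Update the effect of adaptive mapping (eam) of the classifiers in [A].

    max_P_t    : max_a P(s_t, a) for the current state
    max_P_prev : max_a P(s_{t-1}, a) for the previous state
    """
    candidate = max_P_t >= zeta * max_P_prev / gamma
    for cl in action_set:
        target = 1.0 if candidate else float(n_actions)   # 1 or nma
        cl.eam += beta * (target - cl.eam)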
3.2 Focusing Evolution on the Best Actions

To focus evolution on the best actions, XCSAM acts on the covering operator to prevent the generation of classifiers that are not likely to be included in the final solution. In particular, XCSAM tunes the activation threshold of covering, θnma, using the actions' predicted reward and the eam parameters. Initially, θnma is set to the number of feasible actions (the same value used in XCS). When [M] is generated, XCSAM computes the prediction array before covering is applied (whereas XCS computes it only after covering). Then, XCSAM computes the current θnma as the average eam of the classifiers in [M], weighted by the expected future return maxP(st, a). If the number of different actions in [M] is smaller than the computed θnma, covering is called and the prediction array is computed again. After action selection is performed, XCSAM generates both the action set [A] (as XCS does) and the not-action set [Ā], consisting of the classifiers in [M] that do not advocate the selected action. When the executed action is considered a candidate best action, during the genetic algorithm (i) the parent classifiers are selected from [A], to promote the evolution of classifiers that are likely to be in the final best action mapping, while (ii) the deleted classifiers are selected from [Ā], to get rid of classifiers that are not likely to be part of the final solution. If there is not enough information about the executed action, or [Ā] is empty, XCSAM applies deletion in [P] as done in XCS. When the executed action is not identified as a candidate best action, the parents are selected from [Ā], to explore the solution space even further, and deletion is applied to the population as in XCS.
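The following Python sketch illustrates one plausible reading of this mechanism under the helper structures introduced earlier (classifiers extended with an eam field); the names adaptive_theta_nma and choose_ga_sets are ours, not the authors' implementation, and the exact weighting used for θnma is our assumption:

def adaptive_theta_nma(match_set, P):
    """Covering threshold: average eam of [M], weighted by maxP(s_t, a) of
    each classifier's action (one possible reading of the weighting)."""
    weights = {cl: (P.get(cl.action) or 0.0) for cl in match_set}
    total = sum(weights.values())
    if total == 0:
        return len(P)            # fall back to the number of feasible actions
    return sum(cl.eam * weights[cl] for cl in match_set) / total

def choose_ga_sets(match_set, selected_action, is_candidate_best):
    """Return (set parents are drawn from, set deletion acts on);
    None means deletion falls back to the whole population [P], as in XCS."""
    A = [cl for cl in match_set if cl.action == selected_action]
    not_A = [cl for cl in match_set if cl.action != selected_action]
    if is_candidate_best and not_A:
        return A, not_A          # breed promising classifiers, delete from [Ā]
    if not is_candidate_best and not_A:
        return not_A, None       # explore further, delete from [P]
    return A, None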
4. ANALYSIS OF SELECTION IN XCSAM

We analyzed the effect of pure accuracy-based selection in XCSAM by studying the characteristics of the selected parent classifiers. For this purpose, we applied XCS and XCSAM to the hidden parity problem HP20,5 and compared the characteristics of the parent classifiers in the two models.

4.1 Hidden Parity Problem

This class of Boolean functions was first used with XCS in [12] to relate the problem difficulty to the number of accurate maximally general classifiers needed by XCS to solve the problem. They are defined over binary strings of length n in which only k bits are relevant; the hidden parity function (HPn,k) returns the value of the parity function applied to the k relevant bits, which are hidden among the n inputs. For instance, given the hidden parity function HP6,4 defined over inputs of six bits (n = 6), in which only the first four bits are relevant (k = 4), we have that HP6,4(110111) = 1 and HP6,4(000111) = 1.

In this analysis, we applied the standard parameter settings for XCS [16]: N = 2000, ε0 = 1, μ = 0.04, P# = 1.0, Pexplr = 1.0, χ = 1.0, β = 0.2, α = 0.1, δ = 0.1, ν = 5, θGA = 25, θdel = 20, θsub = 20; GA subsumption is applied while action set subsumption is not; in addition, we use tournament selection with τ = 0.4. For XCSAM, we applied the same parameters as XCS and, in addition, we set ζ = 0.99.
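For reference, a small Python sketch of the hidden parity function described above (the helper name hidden_parity is ours, and for simplicity it assumes the k relevant bits are the leading ones, as in the HP6,4 example):

def hidden_parity(bits: str, k: int) -> int:
    """HP_{n,k}: returns 1 if the number of ones among the k relevant bits is odd."""
    return sum(int(b) for b in bits[:k]) % 2

# Both example inputs have an odd number of ones in their first four bits.
assert hidden_parity("110111", 4) == 1
assert hidden_parity("000111", 4) == 1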
4.2 Accuracy-Based Selection in XCS

Figure 1 shows the characteristics of the parent classifiers selected by accuracy-based fitness in XCS. In particular, Figure 1a and Figure 1b show, respectively, the prediction and the fitness of the selected parents plotted against the iteration at which each parent was selected. For instance, a parent is plotted at the upper right corner of Figure 1a (with prediction 1000) if it was selected at iteration 150000. Figure 1c shows the relation between the prediction and the fitness of the classifiers in [P] after 150000 iterations, that is, in the final solutions. In XCS, to ensure a complete mapping, selection is designed to focus on maximally accurate classifiers. Therefore, in classification problems with a 1000/0 reward scheme, the classifiers that should be selected as parents have a prediction of 1000 or 0. Figure 1a and Figure 1b show that the genetic algorithm in XCS selects classifiers with a prediction between 100 and 900 until problem 60000; then, XCS gradually focuses on classifiers with a 0 or 1000 prediction. Similarly, the selected classifiers have a low fitness until iteration 70000; then, XCS gradually focuses selection on classifiers with higher fitness values (near 1). This behavior is coherent with what we should expect in XCS; however, the same strategy has a different effect in XCSAM.

4.3 Accuracy-Based Selection in XCSAM

In classification problems with a 1000/0 reward scheme, XCSAM should tend to focus mainly on accurate classifiers with high (near 1000) prediction values, which represent the best action mappings. Thus, XCSAM should behave differently from XCS and should not select accurate classifiers with low (near 0) prediction values, which are redundant classifiers for XCSAM.
Figure 1: Characteristics of parents selected by accuracy-based selection in XCS (a & b) and characteristics of the final solutions (c). (a) Prediction of selected parents and the iteration at which each parent was selected; (b) fitness of selected parents and the iteration at which each parent was selected; (c) prediction and fitness of classifiers in [P] as final solutions.
Figure 2: Characteristics of parents selected by accuracy-based selection in XCSAM (a & b) and characteristics of the final solutions (c). (a) Prediction of selected parents and the iteration at which each parent was selected; (b) fitness of selected parents and the iteration at which each parent was selected; (c) prediction and fitness of classifiers in [P] as final solutions.
Figure 2 shows the characteristics of the parent classifiers selected in XCSAM using plain accuracy-based fitness. Figure 2a and Figure 2b show that XCSAM selects the most promising parent classifiers for a best action mapping (in fact, it tends to select classifiers with high fitness values). However, at the same time, the genetic algorithm still selects inaccurate classifiers until iteration 150000 (Figure 2b). Since XCSAM keeps selecting classifiers with low fitness, the final solutions evolved by XCSAM contain a large number of inaccurate classifiers (Figure 2c). Thus, plain accuracy-based selection in XCSAM selects the proper classifiers but also keeps a large number of inaccurate classifiers, and thus produces solutions with more classifiers than needed (in fact, the system should still get rid of inaccurate classifiers).
5. IMPROVED XCSAM

We propose a new selection strategy for XCSAM to ensure that XCSAM evolves compact solutions containing mainly classifiers advocating the actions with the highest payoff. As shown by the previous analysis, XCSAM with plain accuracy-based selection fails to select good parents that represent the best action mappings. To understand how to design an adequate selection strategy for XCSAM, we analyzed the eam parameter in XCSAM. Figure 3 shows the relation between the prediction and the effect of adaptive mapping (eam) of the classifiers in the final solutions.

Figure 3: Relation between prediction and effect of adaptive mapping (eam) of classifiers in the final solution.

As can be noted, classifiers with a low (near zero) prediction tend to have a high eam (which converges to nma); classifiers with a high (near 1000) prediction have a low eam (which converges to 1). This suggests that XCSAM should select parents with a small value of eam and a high fitness. Accordingly, we introduce a selection vote which is used to choose the parents. In detail, we compare the selection vote of the classifiers in [A] to select the parents. In particular, we modify tournament selection [4] by defining the selection vote as:

    selection vote = (cl.F / cl.num) × 1 / (cl.eam − 1)    (6)

Note that cl.F/cl.num is the original selection pressure of tournament selection [3], i.e., we only introduce the additional factor 1/(cl.eam − 1) into the original selection vote. This means that XCSAM selects parents based not only on their fitness but also on their eam value, so that XCSAM detects accurate parents that should be included in the best action mappings (because they have a low eam). We select the classifier with the maximum selection vote in a tournament of size τ, as done in [4].
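A minimal Python sketch of this modified tournament selection follows; the function names are ours, and the small floor that avoids division by zero when eam has fully converged to 1 is our own addition:

import random

def selection_vote(cl, floor=1e-6):
    """Selection vote of Eq. 6: fitness per numerosity times 1/(eam - 1)."""
    return (cl.F / cl.num) * (1.0 / max(cl.eam - 1.0, floor))

def tournament_select(action_set, tau=0.4):
    """Pick a random fraction tau of [A] (at least one classifier) and return
    the classifier with the highest selection vote, as in [4]."""
    size = max(1, int(tau * len(action_set)))
    pool = random.sample(action_set, size)
    return max(pool, key=selection_vote)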
5.1 Analysis of the New Selection Strategy

To analyze the characteristics of the parent classifiers selected by the proposed selection strategy, we applied XCSAM to the 20/5 hidden parity problem. Figure 4 compares the characteristics of the parents selected using accuracy-based selection and using the new selection strategy. Figure 5 shows the relation between the iteration at which parents are selected and the eam parameter of the selected parents for both selection strategies in XCSAM.
Figure 4: Characteristics of parents selected by the new selection strategy in XCSAM (a & b) and characteristics of the classifiers in the final solution (c). (a) Prediction of selected parents and the iteration at which each parent was selected; (b) fitness of selected parents and the iteration at which each parent was selected; (c) prediction and fitness of classifiers in [P] as final solutions.
Figure 5: Effect of adaptive mapping (eam) of parents selected by the accuracy-based selection strategy and by the new selection strategy in XCSAM. (a) XCSAM with XCS's selection strategy; (b) XCSAM with the proposed selection strategy.
Figure 4a shows that, with the proposed selection strategy, XCSAM selects inaccurate classifiers with prediction between 100 and 900 until iteration 50000; then, it clearly focuses only on good candidates with high accuracy and a high (near 1000) prediction. Similarly, Figure 4b shows that XCSAM still selects maximally accurate classifiers. Therefore, as shown in Figure 4c, the final solutions contain very few inaccurate or redundant classifiers and thus represent the best mapping XCSAM should generate. Additionally, Figure 5 shows that the proposed selection strategy correctly focuses on classifiers with a low eam, whereas pure accuracy-based fitness cannot stably select such classifiers.

Overall, these results suggest that the proposed selection strategy promotes accurate classifiers that mainly advocate the best actions in every situation and prevents the selection of inaccurate classifiers.
6. EXPERIMENTAL RESULTS
We applied XCS, the original XCSAM [19, 18], and XCSAM with the new selection strategy to the hidden parity
problem and to the Boolean multiplexer. We compared the
three models in terms of learning performance, population
size, and evolved mapping.
Design of Experiments. Each experiment consists of a
number of problems that the system must solve. Each problem is either a learning problem or a test problem.
During learning problems, the system selects actions randomly from those represented in the match set. During
test problems, the system selects the action with highest
expected return. When the system performs the correct action, it receives a 1000 reward, otherwise it receives 0. The
genetic algorithm is enabled only during learning problems,
and it is turned off during test problems. The covering operator is always enabled, but operates only if needed. Learning
problems and test problems alternate. The performance is
reported as the moving average over the last 5000 test problems. All the plots are averages over 10 experiments.
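As an informal illustration of this experimental design, the sketch below alternates learning and test problems and tracks the moving average of the test performance; the learner interface (explore, exploit, update) and the problem generator are hypothetical placeholders, not the actual systems compared here:

from collections import deque

def run_experiment(system, next_problem, n_problems=300000, window=5000):
    """Alternate learning and test problems; reward 1000 for a correct action,
    0 otherwise; track the moving average over the last `window` test problems."""
    recent = deque(maxlen=window)
    curve = []
    for i in range(n_problems):
        state, correct_action = next_problem()
        learning = (i % 2 == 0)                      # learning/test alternate
        action = system.explore(state) if learning else system.exploit(state)
        reward = 1000 if action == correct_action else 0
        if learning:
            system.update(reward, apply_ga=True)     # GA enabled only here
        else:
            recent.append(reward == 1000)
            curve.append(sum(recent) / len(recent))  # moving average
    return curve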
Hidden Parity. In the first set of experiments, we applied XCS, XCSAM, and the improved XCSAM with the new selection strategy to HP20,5 (see Section 4) using the standard parameter settings of [17, 14]. Figure 6 compares the performance and population size of XCS, XCSAM, and the improved XCSAM. As can be noted, XCS reaches optimal performance after 110000 problems (Figure 6a) and evolves solutions containing an average of 120 classifiers (Figure 6b). XCSAM learns faster and reaches optimality after 70000 problems (Figure 6a) but produces larger solutions than XCS (an average of 580 classifiers, Figure 6b). In contrast, the improved version of XCSAM, using the new selection strategy, reaches optimality a bit faster than the original XCSAM and needs fewer classifiers than XCS (an average of 79 classifiers).

Figure 6: Performance and population size on the 20/5 hidden parity problem. (a) Performance; (b) population size.
Boolean Multiplexer. In the second set of experiments, we compared the three models on the Boolean multiplexer [21]. These functions are defined over binary strings of k + 2^k bits; the first k bits represent an address pointing to one of the remaining 2^k bits, whose value is returned. For instance, the 6-multiplexer function (k = 2) applied to the input string 110001 returns 1, while applied to 110110 it returns 0.
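A small Python sketch of the multiplexer function (the name boolean_multiplexer is ours):

def boolean_multiplexer(bits: str, k: int) -> int:
    """k + 2^k multiplexer: the first k (address) bits select one of the
    remaining 2^k data bits, whose value is returned."""
    address = int(bits[:k], 2)
    return int(bits[k + address])

# The 6-multiplexer examples from the text (k = 2).
assert boolean_multiplexer("110001", 2) == 1
assert boolean_multiplexer("110110", 2) == 0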
We compared the three XCS models on the 20-multiplexer (k = 4) and the 37-multiplexer (k = 5) using the standard parameter settings [5]: N = 2000 (k = 4) and 5000 (k = 5), ε0 = 10, μ = 0.04, P# = 0.5 (k = 4) and 0.65 (k = 5), Pexplr = 1.0, χ = 0.8, β = 0.2, α = 0.1, δ = 0.1, ν = 5, θGA = 25, θdel = 20, θsub = 20; tournament selection is applied with τ = 0.4; GA subsumption is turned on while action set subsumption is turned off; in XCSAM, we set ζ = 0.99.

Figures 7 and 8 compare the performance and population size of XCS, the original XCSAM, and the improved XCSAM using the new selection strategy on the 20-multiplexer and the 37-multiplexer. As can be noted, in the 20-multiplexer, XCS reaches optimal performance after 40000 problems (Figure 7a) and evolves solutions that on average contain around 340 classifiers (Figure 7b). XCSAM learns a little faster than XCS but produces larger solutions of about 630 classifiers (i.e., almost twice the size of the solutions evolved by XCS). In contrast, the improved XCSAM reaches optimality a little faster than XCS, while producing very compact solutions containing an average of 180 classifiers (Figure 7b), i.e., almost half the size of the solutions evolved by XCS. In the 37-multiplexer, XCS needs to train over 400000 problems before it can reach optimality and evolves solutions containing an average of 1200 classifiers. The original XCSAM [19, 18] learns much faster than XCS and reaches optimal performance after just 200000 problems; however, the final population size is about 2300 classifiers (almost twice what XCS needs).
In contrast, our improved XCSAM is even faster than the original XCSAM: it reaches optimality after 170000 problems and produces solutions that are half the size of what XCS needs (just 570 classifiers).

Overall, our results show that, by focusing only on the actions with the highest reward, the original XCSAM [19, 18] can learn faster than XCS but requires more classifiers than XCS. The improved XCSAM combines the best of the two worlds and ensures fast learning and more compact solutions. The improved XCSAM in fact evolves solutions that are around half the size of what XCS produces (as should be expected).

Figure 7: Performance and population size on the 20-multiplexer. (a) Performance; (b) population size.

Figure 8: Performance and population size on the 37-multiplexer. (a) Performance; (b) population size.

7. CONCLUSIONS

XCSAM [19, 18] is an extension of XCS [21] that generates solutions mainly containing classifiers that advocate the actions with the highest returns. While XCS [21] learns the expected payoff of the available actions in every possible situation, XCSAM concentrates only on the most promising actions and therefore it can learn faster than XCS [19, 18]. However, XCSAM can often produce solutions that are larger than those evolved by XCS. Accordingly, in this paper, we extended XCSAM by introducing a novel selection strategy that reduces the size of the solutions it evolves. The proposed selection strategy enables XCSAM to select the parent classifiers based both on their fitness (as done in XCS) and on the effect they have on the mapping (encoded by the parameter eam). We applied XCS, the original XCSAM, and the improved XCSAM to the 20/5 hidden parity problem [17, 14], to the 20-multiplexer, and to the 37-multiplexer [21]. Our results show that the improved XCSAM can evolve solutions that are around half the size of the solutions produced by XCS while also reaching optimality much faster than XCS. Thus, the improved XCSAM opens up new opportunities to tackle more complex problems in acceptable time.
Acknowledgment
This work was supported by the JSPS Institutional Program
for Young Researcher Overseas Visits.
8. REFERENCES
[1] E. Bernadó-Mansilla and J. M. Garrell-Guiu.
Accuracy-based Learning Classifier Systems: Models,
Analysis and Applications to Classification Tasks.
Evolutionary Computation, 11:209–238, 2003.
[2] L. Bull, E. Bernadó-Mansilla, and J. H. Holmes, editors. Learning Classifier Systems in Data
Mining, volume 125 of Studies in Computational
Intelligence. Springer, 2008.
[3] M. V. Butz. Rule-Based Evolutionary Online Learning
Systems. Springer, 2006.
[4] M. V. Butz, D. E. Goldberg, and K. Tharakunnel.
Analysis and Improvement of Fitness Exploitation in
XCS: Bounding Models, Tournament Selection, and
Bilateral Accuracy. Evolutionary Computation,
11(3):239–277, 2003.
[5] M. V. Butz, T. Kovacs, P. L. Lanzi, and S. W. Wilson.
Toward a Theory of Generalization and Learning in
XCS. IEEE Transactions on Evolutionary
Computation, 8(1):28–46, February 2004.
[6] M. V. Butz and S. W. Wilson. An algorithmic description of XCS. Journal of Soft Computing, 6(3–4):144–153, 2002.
[7] M. V. Butz, D. E. Goldberg, and P. L. Lanzi. Gradient descent methods in learning classifier systems: Improving XCS performance in multistep problems. IEEE Transactions on Evolutionary Computation, 9(5):452–473, October 2005.
[8] M. V. Butz, T. Kovacs, P. L. Lanzi, and S. W. Wilson. Toward a theory of generalization and learning in XCS. IEEE Transactions on Evolutionary Computation, 8(1):28–46, February 2004.
[9] M. V. Butz, P. L. Lanzi, and S. W. Wilson. Function approximation with XCS: Hyperellipsoidal conditions, recursive least squares, and compaction. IEEE Transactions on Evolutionary Computation, 12(3):355–376, 2008.
[10] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, 1989.
[11] J. H. Holland. Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-based Systems. Machine Learning, 2:593–623, 1986.
[12] T. Kovacs and M. Kerber. What makes a problem hard for XCS? In P. L. Lanzi, W. Stolzmann, and S. W. Wilson, editors, IWLCS, volume 1996 of Lecture Notes in Computer Science, pages 80–102. Springer, 2000.
[13] P. L. Lanzi. Learning classifier systems from a reinforcement learning perspective. Soft Computing - A Fusion of Foundations, Methodologies and Applications, 6(3):162–170, 2002.
[14] P. L. Lanzi, D. Loiacono, S. W. Wilson, and D. E. Goldberg. XCS with computed prediction for the learning of Boolean functions. In Proceedings of the IEEE Congress on Evolutionary Computation – CEC-2005, pages 588–595, Edinburgh, UK, September 2005. IEEE.
[15] P. L. Lanzi, L. Nichetti, K. Sastry, D. Voltini, and D. E. Goldberg. Real-coded extended compact genetic algorithm based on mixtures of models. In Y.-P. Chen and M.-H. Lim, editors, Linkage in Evolutionary Computation, volume 157 of Studies in Computational Intelligence, pages 335–358. Springer, 2008.
[16] P. L. Lanzi, D. Loiacono, S. W. Wilson, and D. E. Goldberg. XCS with computed prediction for the learning of Boolean functions. In Proceedings of the 2005 IEEE Congress on Evolutionary Computation, volume 1, pages 588–595, 2005.
[17] M. V. Butz, D. E. Goldberg, and K. Tharakunnel. Analysis and Improvement of Fitness Exploitation in XCS: Bounding Models, Tournament Selection, and Bilateral Accuracy. Evolutionary Computation, 11(4):239–277, 2003.
[18] M. Nakata, P. L. Lanzi, and K. Takadama. Enhancing learning capabilities by XCS with best action mapping. In C. A. Coello Coello, V. Cutello, K. Deb, S. Forrest, G. Nicosia, and M. Pavone, editors, PPSN (1), volume 7491 of Lecture Notes in Computer Science, pages 256–265. Springer, 2012.
[19] M. Nakata, P. L. Lanzi, and K. Takadama. XCS with adaptive action mapping. In L. T. Bui, Y.-S. Ong, N. X. Hoai, H. Ishibuchi, and P. N. Suganthan, editors, SEAL, volume 7673 of Lecture Notes in Computer Science, pages 138–147. Springer, 2012.
[20] R. S. Sutton and A. G. Barto. Reinforcement Learning – An Introduction. MIT Press, 1998.
[21] S. W. Wilson. Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175, June 1995.