Complete Action Map or Best Action Map in Accuracy-based Reinforcement Learning Classifier Systems

Masaya Nakata (The University of Electro-Communications, Tokyo, Japan)
Pier Luca Lanzi (Politecnico di Milano, Milano, Italy)
Tim Kovacs (University of Bristol, Bristol, UK)
Keiki Takadama (The University of Electro-Communications, Tokyo, Japan)

ABSTRACT
We study two existing Learning Classifier Systems (LCSs): XCS, which has a complete map (which covers all actions in each state), and XCSAM, which has a best action map (which covers only the highest-return action in each state). This allows XCSAM to learn with a smaller population size limit (but larger population size) and to learn faster than XCS on well-behaved tasks. However, many tasks have difficulties like noise and class imbalances. XCS and XCSAM have not been compared on such problems before. This paper aims to discover which kind of map is more robust to these difficulties. We apply them to a classification problem (the multiplexer problem) with class imbalance, Gaussian noise or alternating noise (where we return the reward for a different action). We also compare them on real-world data from the UCI repository without adding noise. We analyze how XCSAM focuses on the best action map and introduce a novel deletion mechanism that helps to evolve classifiers towards a best action map. Results show the best action map is more robust (has higher accuracy and sometimes learns faster) in all cases except small amounts of alternating noise.

Categories and Subject Descriptors
I.2.6 [Artificial Intelligence]: Learning—knowledge acquisition, concept learning

General Terms
Algorithms, Experimentation, Performance

Keywords
Learning Classifier System, XCS, XCSAM, complete action map, best action map, classification

1. INTRODUCTION
Learning Classifier Systems (LCSs) [7] are rule-based learning systems that evolve general rules (called classifiers) using Genetic Algorithms [6] to represent human-readable solutions. Accuracy-based LCSs [21, 1] are the most popular classifier system models so far and have been successfully applied to a wide range of problem domains [2] such as classification [8], regression [5], and sequential decision making problems [4]. Accuracy-based LCSs can be classified into two types according to the learning mechanism they employ: Accuracy-based Reinforcement Learning Classifier Systems (ARLCS), which employ reinforcement learning (RL) [18], e.g., the XCS classifier system [21], and Accuracy-based Supervised Learning Classifier Systems (ASLCS), which employ supervised learning instead of RL, e.g., the UCS classifier system [1]. While ASLCS is a useful and robust model for supervised learning tasks [14], it is not designed to be suitable for RL tasks.

When we design an ARLCS for a given problem, we have to choose one of two learning strategies: a complete action map, which predicts the rewards of all actions in each state, and a best action map, which in contrast covers only the highest-return action in each state. Most real world applications have difficulties like noise and class imbalances in their state spaces and reward functions. Accordingly, a key to designing LCSs is to understand which strategy (complete or best action map) is more robust to which kinds of difficulty. Orriols-Puig and Bernadó-Mansilla show that UCS (with a best action map) can derive better performance than XCS (with a complete map) on a class-imbalanced classification problem [14]. Stone and Bull show that the ZCS classifier system [20] (which is a strength-based LCS and employs a best action map) exceeds the performance of XCS on noisy problems [17].
Also, Tzima and Mitkas show that a revised ZCS for data mining can derive higher classification accuracy than XCS on a classification problem with real world data [19]. These results may suggest that the best action map is a more robust strategy than the complete action map. However, ZCS and UCS are not ARLCSs. Specifically, as noted previously, UCS can successfully identify a highest-return action (i.e., a correct class) using supervised data. ZCS, like ARLCS, employs reinforcement learning, but it may not correctly generalize classifiers because of its strength-based fitness. Strength-based fitness directly estimates reward (i.e., high fitness means high estimated reward). This is useful to identify the best action map but is less useful to find accurate classifiers. In contrast, the accuracy-based fitness of ARLCS is calculated from the error in the estimated prediction; it is useful to accurately generalize classifiers but is very sensitive to noise (especially noise added to the reward). This means accuracy-based fitness can mistakenly lead to incorrect generalization in noisy problems. Accordingly, best action maps may not be robust in class-imbalanced, noisy and real-world classification.

In summary, previous studies have found that ZCS and UCS were more robust than XCS on class-imbalanced and noisy tasks, but it is not clear if this is due to their best action maps or due to their definition of fitness. This paper compares complete and best action maps in systems with the same definition of fitness; specifically, we compare two ARLCSs: XCS and XCSAM [12, 11]. XCS evolves a complete action map, while XCSAM is a variation of XCS which evolves best action maps. We apply them to versions of a classification problem (the multiplexer) with class imbalance, Gaussian noise or alternating noise. We also compare them on real world data from the UCI repository [16]. From the experimental results we analyze how XCSAM focuses on the best action map. We also introduce a novel deletion mechanism [9] that helps to evolve classifiers towards the best action map.
The remainder of this paper is organized as follows. In Sections 2 and 3 we describe the mechanisms of XCS and XCSAM. In Section 4 we show experimental results of XCS and XCSAM on problems with noise and class-imbalance, and on real world data. In Section 5 we analyze the results of XCSAM and introduce the deletion mechanism for XCSAM. In Section 6 we show experimental results for XCSAM with the deletion mechanism. Finally, in Section 7, we conclude this paper by summarizing the contributions.

2. XCS CLASSIFIER SYSTEM
The XCS classifier system maintains a population of rules ("classifiers") which represent the solution to a reinforcement learning problem [10]. The following subsections explain the classifiers and mechanisms of XCS.

2.1 Classifier
A classifier is a condition-action rule; a condition C is coded by C ∈ {0, 1, #}^L, where L is the length of the condition and the symbol '#' is the don't care symbol, which matches all input values (i.e., 0 or 1). Classifiers consist of a condition, an action, and four main parameters [21, 3]: (i) the prediction p, which estimates the relative payoff that the system expects when the classifier is used; (ii) the prediction error ε, which estimates the error of the prediction p; (iii) the fitness F, which estimates the accuracy of the payoff prediction given by p; and (iv) the numerosity num, which indicates how many copies of classifiers with the same condition and the same action are present in the population.

2.2 Mechanism
At time t, XCS builds a match set [M] containing the classifiers in the population [P] whose condition matches the current sensory input s_t; if [M] does not contain all the feasible actions, covering takes place and creates a set of classifiers that match s_t and cover all the missing actions.¹ This process ensures that XCS can evolve a complete mapping, so that in any state it can predict the effect of every possible action in terms of expected returns. For each possible action a in [M], XCS computes the system prediction P(s_t, a), which estimates the payoff that XCS expects if action a is performed in s_t. The system prediction P(s_t, a) is computed as the fitness-weighted average of the predictions of the classifiers in [M] which advocate action a:

    P(s_t, a) = \frac{\sum_{cl_k \in [M](a)} p_k \cdot F_k}{\sum_{cl_i \in [M](a)} F_i}    (1)

where [M](a) represents the subset of classifiers of [M] with action a, p_k identifies the prediction of classifier cl_k, and F_k identifies the fitness of classifier cl_k. Then XCS selects an action to perform; the classifiers in [M] which advocate the selected action form the current action set [A]. The selected action a_t is performed, and a scalar reward r_{t+1} is returned to XCS together with a new input s_{t+1}. When the reward r_{t+1} is received, the estimated payoff P(t) is computed as follows:

    P(t) = r_{t+1} + \gamma \max_{a \in [M]} P(s_{t+1}, a)    (2)

where γ is the discount factor [18], which is 0 for single-step tasks such as those in this work. Next, the parameters of the classifiers in [A] are updated in the following order [3]: prediction, prediction error, and finally fitness. The prediction p is updated with learning rate β (0 ≤ β ≤ 1):

    p_k \leftarrow p_k + \beta (P(t) - p_k)    (3)

Then, the prediction error ε and the classifier fitness F are updated using the absolute accuracy κ:

    \epsilon_k \leftarrow \epsilon_k + \beta (|P(t) - p_k| - \epsilon_k)    (4)

    \kappa_k = \begin{cases} 1 & \text{if } \epsilon_k \leq \epsilon_0, \\ \alpha (\epsilon_k / \epsilon_0)^{-\nu} & \text{otherwise} \end{cases}    (5)

    F_k \leftarrow F_k + \beta \left( \frac{\kappa_k \cdot num_k}{\sum_{cl_j \in [A]} \kappa_j \cdot num_j} - F_k \right)    (6)

The GA is applied to the classifiers in [A] [21]. It selects two classifiers, copies them, and with probability χ performs crossover on the copies; then it mutates each element of their conditions with probability μ.

¹In the algorithmic description [3], covering is activated when the match set contains fewer than θmna actions; however, θmna is always set to the number of available actions so that the match set covers all the actions.
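To make the above computations concrete, the following Python sketch re-implements the system prediction (Equation 1) and the prediction, error and fitness updates (Equations 3-6). It is an illustrative reconstruction written for this description, not the authors' implementation: the `Classifier` dataclass, its default values and the constants are our assumptions (the constants mirror the parameter values used later in Section 4).

```python
from dataclasses import dataclass

# Assumed constants: learning rate and accuracy parameters (values as in Section 4).
BETA, ALPHA, EPS0, NU = 0.2, 0.1, 1.0, 5

@dataclass
class Classifier:
    condition: str        # e.g. "01#1#0#1###" for the 11-multiplexer
    action: int
    p: float = 10.0       # prediction (initial value is an assumption)
    eps: float = 0.0      # prediction error
    F: float = 0.01       # fitness
    num: int = 1          # numerosity

def system_prediction(match_set, action):
    """Fitness-weighted average of the predictions advocating `action` (Equation 1)."""
    advocates = [c for c in match_set if c.action == action]
    total_fitness = sum(c.F for c in advocates)
    return sum(c.p * c.F for c in advocates) / total_fitness if total_fitness > 0 else 0.0

def update_action_set(action_set, P):
    """Update prediction, error and fitness of the classifiers in [A] (Equations 3-6),
    in the order stated in the text: prediction first, then error, then fitness."""
    for c in action_set:
        c.p += BETA * (P - c.p)                  # Equation 3
        c.eps += BETA * (abs(P - c.p) - c.eps)   # Equation 4
    # Absolute accuracy (Equation 5) and relative-accuracy fitness update (Equation 6).
    kappa = [1.0 if c.eps <= EPS0 else ALPHA * (c.eps / EPS0) ** (-NU) for c in action_set]
    denom = sum(k * c.num for k, c in zip(kappa, action_set))
    for k, c in zip(kappa, action_set):
        c.F += BETA * (k * c.num / denom - c.F)
```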
3. XCSAM CLASSIFIER SYSTEM
XCS [21] evolves classifiers that represent complete action maps, which are composed of all available actions in every possible situation, but the complete action map increases the population size required to evolve an optimal solution. UCS [1] deals with the best action map and is trained using supervised learning, unlike XCS, which uses reinforcement learning. Therefore, UCS is limited in that it cannot be applied to reinforcement learning applications. XCS with Adaptive Action Mapping (XCSAM) provides a trade-off between XCS and UCS. Like XCS, it can solve both reinforcement learning (multi-step) and supervised classification (one-step) problems; like UCS, XCSAM does not evolve a complete action map but tries to focus on the best actions only while learning proceeds. XCSAM extends the original XCS by (i) including a mechanism to adaptively distinguish the redundant actions from those that should be included in a best action map and (ii) getting rid of redundant actions while still exploring the space of viable optimal solutions.

3.1 Identification of the Best Action Map
First, XCSAM identifies the best action when the selected action is performed. In a typical reinforcement learning problem involving delayed rewards, the expected future reward at the current state, max_a P(s_t, a) (Equation 2), tends to be higher than the expected future reward at the previous state, max_a P(s_{t-1}, a) (because of the discount factor γ). Accordingly, since the action corresponding to a higher reward also tends to correspond to a shorter state sequence, the best actions will tend to have a value of max_a P(s_t, a) larger than max_a P(s_{t-1}, a). More precisely, max_a P(s_{t-1}, a) converges to γ · max_a P(s_t, a), while max_a P(s_t, a) converges to max_a P(s_{t-1}, a)/γ. Thus, when an executed action is a best action, in XCS the predictions of the accurate classifiers in [A] tend to converge to max_a P(s_{t-1}, a)/γ. For this reason, XCSAM can identify the actions that are likely to be part of the best action map by comparing max_a P(s_t, a) against ζ × max_a P(s_{t-1}, a)/γ (where ζ is a learning rate added to guarantee convergence). If max_a P(s_t, a) is greater than the threshold ζ × max_a P(s_{t-1}, a)/γ, then a is a good candidate for the best action map and should be maintained.

After having identified good action candidates for the best action map, XCSAM needs to adaptively identify classifiers that may be good candidates for the final best action map. Accordingly, XCSAM adds a parameter eam ("effect of adaptive mapping") to classifiers, which for classifier cl_i is updated according to Equation 7, where nma represents the number of available actions:

    eam_i \leftarrow \begin{cases} eam_i + \beta (1 - eam_i) & \text{if } \max_a P(s_t, a) \geq \zeta \cdot \max_a P(s_{t-1}, a) / \gamma, \\ eam_i + \beta (nma - eam_i) & \text{otherwise} \end{cases}    (7)

The value of eam of classifiers advocating the selected action converges to 1 if the classifier is a good candidate for the final best action map; otherwise, eam converges to nma. Therefore, classifiers with an eam close to one are good candidates to represent the final best action map, while classifiers with an eam close to nma are less likely to be maintained, as they are probably advocating actions with lower expected return.
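A minimal sketch of the eam update (Equation 7) follows. It assumes a delayed-reward setting with γ > 0, so that the threshold test can be written exactly as above; for the single-step problems of this paper γ is 0 and XCSAM's actual handling may differ, so the illustrative γ value and the function names below are our assumptions.

```python
BETA = 0.2    # learning rate (Section 4 setting)
ZETA = 0.99   # convergence coefficient for the best-action test (Section 4 setting)
GAMMA = 0.71  # discount factor; illustrative value for a multi-step task

def is_best_action_candidate(max_p_now, max_p_prev):
    """Threshold test of Section 3.1: max P(s_t, a) >= zeta * max P(s_{t-1}, a) / gamma."""
    return max_p_now >= ZETA * max_p_prev / GAMMA

def update_eam(eam, nma, candidate):
    """Equation 7: pull eam toward 1 for best-action candidates, toward nma otherwise."""
    target = 1.0 if candidate else float(nma)
    return eam + BETA * (target - eam)
```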
3.2 Evolution on the Best Actions
To focus evolution on the best actions, XCSAM acts on the covering operator to prevent the generation of classifiers that are not likely to be included in the final solution. In particular, XCSAM tunes the covering threshold θmna based on the actions' predicted reward and the eam parameters. Initially, θmna is set to the number of feasible actions (the same value as in XCS). When [M] is generated, XCSAM computes the prediction array before covering is applied (whereas XCS computes it only after covering). Then, XCSAM computes the current θmna as the average eam of the classifiers in [M], weighted by the expected future return max_a P(s_t, a). If the number of different actions in [M] is smaller than the computed θmna, covering is called and the prediction array is recomputed.

If solving a single-step problem, as in the current work, XCSAM computes a selection array S(a) that associates a selection probability to each action as follows:

    S(a_j) = \frac{\sum_{cl_k \in [M]|a_j} p_k \cdot F_k}{\sum_{cl_k \in [M]} F_k}    (8)

The selection probability is calculated from the fitness F and the parameter eam [13]. After action selection is performed, XCSAM generates both the action set [A] (as in XCS) and the not-action set [Ā], consisting of the classifiers in [M] not advocating the selected action. When the executed action is considered a candidate best action, the parents in the GA are selected from [A], to promote the evolution of classifiers that are likely to be in the final best action map, while the classifiers to delete are selected from [Ā], to get rid of classifiers that are not likely to be part of the final solution. Otherwise, if there is not enough information about the executed action, or [Ā] is empty, XCSAM applies deletion in [P] as in XCS. When the executed action is not identified as a candidate best action, the parents are selected from [Ā], to explore the solution space even further, and deletion is applied to the population as in XCS.

4. EXPERIMENT: XCS VS XCSAM
We apply XCS and XCSAM to the multiplexer and to a classification problem with real world data. We add three kinds of problem difficulty to the multiplexer: class-imbalance, Gaussian noise or alternating noise. Class-imbalance can be observed in many real world datasets, such as oil spills in satellite images, failures in manufacturing processes, and rare medical diagnoses [15]. Gaussian noise simulates real-world-like reward functions, which sometimes return a different reward value when the same action is executed in the same state. Alternating noise simulates a dataset having several aliasing states, which have the same input but a different class. All three conditions can occur in real world problems.

4.1 Experiments on Multiplexer Problems
4.1.1 Multiplexer problem and Experimental design
Multiplexer problem. The multiplexer (MP) [21] is defined over a binary string of k + 2^k bits; the first k bits represent an address pointing to the remaining 2^k bits. For instance, the 6-multiplexer function (k = 2) applied to the input string 110001 will return 1 as an answer (i.e., class), while applied to 110110 it will return 0. We compare XCS and XCSAM using the 11MP (k = 3). Following XCS tradition [21] we generate inputs randomly, so the train and test sets are the same.
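For reference, the multiplexer function can be written in a few lines; the sketch below is our illustration and simply follows the definition above.

```python
import random

def multiplexer(bits: str, k: int = 3) -> int:
    """k + 2^k multiplexer: the first k bits address one of the remaining 2^k bits.
    For the 6-multiplexer (k = 2), '110001' -> 1 and '110110' -> 0, as in the text."""
    assert len(bits) == k + 2 ** k
    address = int(bits[:k], 2)
    return int(bits[k + address])

# Example: a random instance of the 11-multiplexer (k = 3) used in the experiments.
instance = ''.join(random.choice('01') for _ in range(11))
label = multiplexer(instance, k=3)
```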
Experimental design. During learning problems, the system selects actions randomly from those represented in the match set. During evaluation problems, the system selects the action with the highest expected return. When the system performs the correct action it receives a reward of 1000, otherwise it receives 0. The genetic algorithm is enabled only during learning problems and is turned off during evaluation problems. The covering operator is always enabled, but operates only if needed. Learning problems and test problems alternate. The performance is reported as the moving average over the last 50000 (class imbalance) or 5000 (Gaussian noise and alternating noise) evaluation problems. All the plots are averages over 20 experiments.

[Figure 1: Performances of XCS and XCSAM on the multiplexer problem. Panels: a) class-imbalance (minority-class performance), b) Gaussian noise, c) alternating noise; for each condition the top panel shows XCS and the bottom panel shows XCSAM, plotting performance against iterations.]

4.1.2 Class-imbalance
In our first set of experiments we use the class-imbalanced multiplexer problem (IMP) [15], which generates a binary string of k + 2^k bits as an instance. If the answer for the instance, computed as in the normal MP, belongs to the minority class, then the instance is accepted as an input with probability 1/2^ir, where ir denotes the imbalance level of the dataset. If the answer belongs to the majority class, the instance is always accepted. Accepted inputs are sent to the LCS. Note that ir = 0 represents the usual balanced multiplexer. We set the minority class to 0 and the majority class to 1. We use the following parameter settings [15]:² N = 800, ε_0 = 1, μ = 0.04, P_# = 0.6, χ = 0.8, β = 0.2, α = 0.1, δ = 0.1, ν = 5, θ_GA = 25, θ_del = 20, θ_sub = 200; GA subsumption is turned on while AS subsumption is turned off; in XCSAM, we set ζ = 0.99. We set the imbalance level ir to 0, 3, 4, 5, 6, and 7. The maximum iteration is set to 5,000,000 as in [15].

²[15] also adapts the XCS parameter settings to optimize XCS for class-imbalanced problems, but we use the unadapted settings from [15] because we want to compare best-action and complete maps without optimisations, and over all three problem conditions (not just class imbalance). Future work could consider parameter optimization for all cases.
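The IMP instance generator described above can be sketched as follows (our illustration; the helper simply repeats the multiplexer definition from the previous sketch and applies the 1/2^ir acceptance rule).

```python
import random

def multiplexer(bits: str, k: int) -> int:
    return int(bits[k + int(bits[:k], 2)])  # same definition as in the earlier sketch

def imbalanced_instance(k: int = 3, ir: int = 5, minority_class: int = 0):
    """Rejection sampling for the class-imbalanced multiplexer (IMP): minority-class
    instances are accepted with probability 1/2**ir, majority-class instances are
    always accepted; ir = 0 reproduces the balanced multiplexer."""
    while True:
        bits = ''.join(random.choice('01') for _ in range(k + 2 ** k))
        label = multiplexer(bits, k)
        if label != minority_class or random.random() < 1.0 / 2 ** ir:
            return bits, label
```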
Figure 1-a) shows the performances of XCS and XCSAM on the class-imbalanced 11-bit multiplexer problem. Each performance curve shows the rate of correct answers for the minority class only (the performance for the majority class reaches optimality for all ir in both systems). XCS reaches optimality with ir = 0, 3 and 4; it reaches near-optimal performance (0.94) with ir = 5, but it fails to reach near-optimal performance with ir = 6 and 7 because in XCS overgeneralized classifiers remain in the population. Specifically, in the class-imbalanced multiplexer problem both systems have a tendency to overgeneralize classifiers for the majority class. Overgeneralized classifiers have relatively high fitness because the inputs where they are wrong (minority-class inputs) occur with very low frequency. Since XCS deletes classifiers based on their fitness, XCS mistakenly keeps the overgeneralized classifiers as good classifiers. XCSAM also reaches optimality with ir = 0, 3 and 4. XCSAM with ir = 5 reaches 0.94 faster than XCS. With ir = 6 and 7, XCSAM clearly has better performance than XCS. While overgeneralized classifiers are also generated in XCSAM, it can detect and delete them, in contrast with XCS. This is because XCSAM deletes classifiers which are not candidates for the best action map when it identifies overgeneralized classifiers (which have incorrect best actions) using the reward signal. These results suggest that the best action map can enhance the performance of ARLCS in class-imbalanced problems, because the evolution of the best action map works to detect and quickly delete overgeneralized classifiers.

4.1.3 Gaussian noise
In our second set of experiments we add Gaussian noise to the reward on the (normal) multiplexer [21]. Gaussian noise with mean zero and standard deviation σ is added to the environment reward. The parameter settings are the same as in the previous experiment. We set σ to different values: 100, 200, 300, 400 and 500. The maximum iteration is 500,000.

Figure 1-b) shows the performances of XCS and XCSAM on the 11-bit multiplexer problem with Gaussian noise. XCS reaches optimality with σ = 100, 200 and 300; however, it abruptly fails to reach optimal performance with higher σ. In contrast, XCSAM reaches optimality faster than XCS with σ = 200 and 300. Even with large noise (σ = 400 and 500), XCSAM reaches about 0.96 and 0.85 respectively. With Gaussian noise, the fitness of a classifier is much lower because the prediction error used to calculate the fitness becomes large. This means most classifiers are inaccurate, hence XCS cannot select a correct action using the inaccurate system predictions (which are calculated from the fitness). In XCSAM most classifiers are also inaccurate, but XCSAM can still identify classifiers which are candidates for the best action map. This is because the reward signal, even with added noise, is still useful to identify best-action rules: a positive reward, which is returned when a best action is executed, is usually larger than a negative reward, which is returned when a not-best action is executed. That is, the reward signal still indicates the best action even with Gaussian noise. Hence, XCSAM can select the best-action classifiers, which remain in the population, by deleting redundant classifiers which are not candidates for the best action map. These results suggest that the best action map contributes to stably deriving high performance by deleting redundant classifiers under Gaussian noise, where the positive reward is larger than the negative reward.
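For concreteness, the reward scheme used in this subsection amounts to adding zero-mean Gaussian noise to the 1000/0 reward; a minimal sketch (our illustration) is shown below.

```python
import random

def gaussian_noisy_reward(correct: bool, sigma: float = 300.0) -> float:
    """Section 4.1.3 reward: 1000 for the correct action, 0 otherwise, plus
    zero-mean Gaussian noise with standard deviation sigma."""
    base = 1000.0 if correct else 0.0
    return base + random.gauss(0.0, sigma)
```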
In the next experiments we test the case of noise where the positive reward is sometimes equal to the negative reward.

4.1.4 Alternating noise
In our third set of experiments we add alternating noise to the reward on the (normal) multiplexer problem. With alternating noise, the reward is swapped with a certain probability Pan. Specifically, the positive reward (1000) is replaced by the negative reward, which means this noise simulates a state space which includes aliasing states. For instance, consider a 6MP instance whose best action is "1"; when a learner executes action "1", a fraction (1 - Pan) of the presentations of that instance return the positive reward 1000, while a fraction Pan return the negative reward 0. The parameter settings are the same as in the previous experiment. We set Pan to different values: 0.10, 0.15, 0.20, 0.25 and 0.30. The maximum iteration is set to 500,000.

Figure 1-c) shows the performance of XCS and XCSAM on the 11-bit multiplexer problem with alternating noise. With Pan = 0.10 and 0.15, XCS reaches 0.98 and 0.95 respectively; it fails to reach optimal performance with Pan = 0.20, 0.25 and 0.30. The performance of XCSAM with Pan = 0.10 and 0.15 is worse than that of XCS: it converges to 0.95 and 0.93 respectively. However, with Pan = 0.20, 0.25 and 0.30, XCSAM clearly improves on the performances of XCS. With alternating noise, XCSAM has a tendency to delete classifiers which are candidates for the best action map, because XCSAM mistakenly identifies them as redundant classifiers when the returned reward is 0 (i.e., a reward of 0 indicates that the executed action is not the best action). In contrast, XCS keeps them even with the added noise (XCS deletes classifiers based on their fitness), because they have relatively high fitness in the case of small noise. Hence, with small noise (Pan = 0.10 and 0.15), the performance of XCSAM is lower than that of XCS. However, with large noise (Pan = 0.20, 0.25 and 0.30), the fitness of classifiers is much lower, hence XCS cannot select actions using the now-inaccurate system prediction (for the same reason as with Gaussian noise).
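The alternating-noise reward can be sketched as follows. This is our reading of the description above, in which only the positive reward is swapped to the negative one with probability Pan; the function name is illustrative.

```python
import random

def alternating_reward(correct: bool, p_an: float = 0.2) -> float:
    """Section 4.1.4 reward: with probability p_an the positive reward (1000)
    returned for a best action is replaced by the negative reward (0)."""
    if correct:
        return 0.0 if random.random() < p_an else 1000.0
    return 0.0
```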
4.2 Experiment on Real World Data
To evaluate the performance of XCS and XCSAM on a real world application, we apply both systems to classification tasks with real world data from the UCI repository [16]. As shown in Table 1, we use 20 datasets. In the table, #len denotes the length of the input in each dataset and #cls denotes the number of classes in each dataset. As in previous work [1], if the attributes are binary, the data is codified in the ternary alphabet. If the attributes are real, the data is codified as an interval range. For simplicity, all real attributes are normalized to the range [0,1). Nominal attributes are translated into numeric values and so considered as real attributes. The datasets are run on a stratified ten-fold cross-validation test. To test the differences between the two systems, we use a paired t-test on the ten-fold cross-validation results. The parameter settings are the same as previously, except that the population size limit is N = 6400. The maximum iteration is 100,000.

Table 1: Datasets and results of XCS and XCSAM (bold indicates p < 0.05). •/◦ indicates a positive significant difference of XCS/XCSAM respectively.

  DataName     #len  #cls  XCS   XCSAM  p value
  Audiology     69    24   0.46  0.59   0.002◦
  A.C.card      14     2   0.84  0.84   0.531
  Balance S.     4     3   0.83  0.80   0.026•
  Bupa           6     2   0.69  0.67   0.530
  Breast w       9     2   0.91  0.89   0.152
  Breast wd     32     2   0.96  0.97   0.088
  Cmc            9     3   0.53  0.51   0.010•
  Glass          9     6   0.74  0.70   0.207
  Heart-c       13     5   0.48  0.51   0.091
  Heart-h       13     5   0.64  0.68   0.154
  Hepatitis     19     2   0.88  0.83   0.035•
  Iris           4     3   0.96  0.94   0.081
  Libras        91    15   0.15  0.17   0.253
  Segment       19     7   0.94  0.95   0.293
  Soybean       35    19   0.21  0.45   6.6E-08◦
  Vehicle       18     4   0.72  0.71   0.407
  Vowel         13    19   0.62  0.63   0.854
  Vote          16     2   0.96  0.96   0.988
  Wine          13     3   0.96  0.96   0.985
  Zoo           17     7   0.95  0.92   0.081
  XCS• - XCSAM◦: 3 - 2

The classification accuracies of XCS and XCSAM are shown in Table 1. We note that XCS is significantly better (p < 0.05) than XCSAM on Balance S., Cmc and Hepatitis, while XCSAM is significantly better on Audiology and Soybean, which have longer inputs and more classes than the other datasets. Overall, we cannot find a clear advantage for either system.

5. ANALYSIS
To understand why XCSAM does not derive higher classification accuracies than XCS, we analyze how XCSAM focuses on the best action map. From the analysis, we introduce a deletion strategy for XCSAM which can help it to evolve the best action map.

5.1 Analysis
Specifically, we analyze the parameters eam (effectiveness of adaptive action mapping) and ε (prediction error) of the classifiers in the population when XCSAM is applied to the real dataset Vowel. The reason for focusing on Vowel is that XCSAM did not derive a better performance than XCS, although Vowel has many classes like Audiology and Soybean, where XCSAM clearly derived higher classification accuracy than XCS. In Vowel, the number of available actions nma is 11, which is the same as the number of classes of Vowel.

[Figure 2: Classifiers in the population when XCSAM is applied to Vowel, at a) the 10,000th, b) the 50,000th and c) the 100,000th iteration. The vertical axis is the effectiveness of adaptive action mapping (eam, 1 to 11); the horizontal axis is the prediction error (0 to 500).]

Figure 2 shows the classifiers in the population when XCSAM is applied to Vowel. The vertical and horizontal axes indicate the classifiers' eam and ε parameters. A classifier has the best action if its eam is close to 1, and a not-best action if its eam is close to 11 (nma). Similarly, a classifier is accurate if its ε is close to 0, and inaccurate if its ε is close to 500. For instance, classifiers placed at the left-top corner are accurate but not candidates for the best action map; classifiers placed at the left-bottom corner are accurate and candidates for the best action map. From Figure 2 a), at the 10,000th iteration, XCSAM generates redundant classifiers which are inaccurate (large ε) or not candidates for the best action map (large eam). At the 50,000th iteration, XCSAM gradually generates classifiers which have large ε but small eam; on the other hand, many redundant classifiers remain. At the 100,000th (final) iteration, XCSAM successfully learns several accurate classifiers (left-bottom corner) which have a best action (eam = 1); however, many inaccurate classifiers and redundant classifiers still remain. This analysis suggests that XCSAM can partially learn the best action map, but it keeps many redundant classifiers which are inaccurate or are not best action map candidates. Hence, XCSAM sometimes fails to efficiently evolve the best action map.
5.2 Deletion mechanism
From the analysis, to efficiently focus on the best action map, XCSAM needs to delete many redundant classifiers. However, using a steady-state GA, XCSAM deletes only two classifiers in each generation, hence many redundant classifiers remain in the population. Here, we introduce a new deletion mechanism that deletes more than two classifiers as necessary. Specifically, in XCSAM, θmna, which is calculated from the parameter eam, is the number of actions which should be in the match set [M]. When θmna = nma (the number of action types), the classifiers in [M] should not be removed, because XCSAM does not detect the best action in this state. In contrast, when θmna = 1 (which means XCSAM correctly identified the best action), classifiers will be redundant if [M] has more than one type of action. More generally, if the number of action types in [M] (nma_[M]) is larger than θmna, we can suspect that there are some redundant classifiers in [M]. For this reason, the new deletion mechanism deletes classifiers if nma_[M] > θmna.

The new deletion mechanism is called when the executed action is identified as the best action. First, the average eam, EAM(a), is computed from the classifiers with action a. Then XCSAM builds a maximum-eam set [E] composed of the classifiers in [M] which advocate the action with the largest EAM(a) (i.e., they are not even close to the best action map), and deletes all classifiers in [E] if nma_[M] > θmna. XCSAM repeats this process while nma_[M] > θmna.
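The new deletion step can be sketched as follows. This is our reconstruction of the procedure just described, under the assumption that classifiers expose `action` and `eam` attributes and that θmna may be fractional; the guard against emptying [M] is our addition and is not specified in the text. The removed classifiers would also be deleted from the population [P].

```python
def prune_match_set(match_set, theta_mna):
    """While the number of distinct actions in [M] exceeds theta_mna, remove every
    classifier advocating the action with the largest average eam, i.e. the action
    least likely to belong to the best action map (the set [E] of Section 5.2)."""
    def actions(ms):
        return {c.action for c in ms}
    while len(actions(match_set)) > theta_mna and len(actions(match_set)) > 1:
        # Average eam per action represented in [M].
        avg_eam = {a: sum(c.eam for c in match_set if c.action == a) /
                      sum(1 for c in match_set if c.action == a)
                   for a in actions(match_set)}
        worst = max(avg_eam, key=avg_eam.get)            # action with the largest EAM(a)
        match_set[:] = [c for c in match_set if c.action != worst]
    return match_set
```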
6. EXPERIMENT: REVISED XCSAM
We apply the revised XCSAM with the new deletion mechanism to the multiplexer and to the real world data.

6.1 Experiments on Multiplexer Problems
The experimental settings are the same as in the previous experiments (Section 4.1).

[Figure 3: Performance of the revised XCSAM on the multiplexer problem: a) class-imbalance, b) Gaussian noise, c) alternating noise.]

Figure 3 shows the performance of the revised XCSAM on the 11-bit multiplexer with class-imbalance, Gaussian noise or alternating noise. With class-imbalance (Figure 3 a)), the revised XCSAM reaches optimality with ir = 0, 3 and 4; with ir = 5 and 6 it reaches 0.99 and 0.96 respectively, which is higher accuracy than the standard XCSAM, which reaches 0.94 and 0.87 respectively (see Figure 1-a)). With Gaussian noise (Figure 3 b)), for σ = 100, 200, 300 and 400 the performance is as good as that of the standard XCSAM. With σ = 500, the revised XCSAM reaches 0.90 while the standard XCSAM reaches 0.85 (see Figure 1-b)). In contrast, with alternating noise (Figure 3 c)), the performance of the revised XCSAM is lower than that of the standard XCSAM. For instance, with Pan = 0.10 and 0.15, the revised XCSAM reaches 0.89 and 0.84 respectively, while the standard XCSAM reaches 0.95 and 0.93. As noted in Section 4.1, with class-imbalance and Gaussian noise XCSAM can identify the best action map, so the new deletion mechanism correctly deletes redundant classifiers. Hence, the revised XCSAM can successfully focus on the best action map and evolve its classifiers toward accurate classifiers. However, with alternating noise, XCSAM sometimes mistakenly identifies redundant classifiers as part of the best action map due to the noise. This causes the new deletion mechanism to mistakenly delete even classifiers which are candidates for the best action map. Hence, the revised XCSAM does not perform better than the standard XCSAM. This suggests that, if an ARLCS can evolve only best action maps and has no redundant classifiers in its population, the best action map is more robust than the complete action map for class-imbalance and Gaussian noise; in contrast, its robustness with alternating noise is worse than that of the complete action map.

6.2 Experiment on Real World Data
We test the revised XCSAM on classifying the real world datasets from Section 4.2, using the same parameter settings. Table 2 shows the classification accuracy of the revised XCSAM and the p values between XCS and the revised XCSAM, and between the standard XCSAM and the revised XCSAM.

Table 2: Results of the revised XCSAM (bold indicates p < 0.05). The p1 value indicates the difference between XCS and the revised XCSAM; •/◦ indicates a positive significant difference of XCS/revised XCSAM respectively. The p2 value indicates the difference between XCSAM and the revised XCSAM; ♦ indicates a positive significant difference of the revised XCSAM.

  DataName     Revised XCSAM  p1 value   p2 value
  Audiology         0.57      0.031◦     0.554
  A.C.card          0.84      0.755      0.726
  Balance S.        0.85      0.003◦     0.001♦
  Bupa              0.67      0.260      0.812
  Breast w          0.89      0.258      0.891
  Breast wd         0.97      0.152      0.811
  Cmc               0.52      0.375      0.423
  Glass             0.69      0.228      0.879
  Heart-c           0.49      0.678      0.304
  Heart-h           0.61      0.377      0.055
  Hepatitis         0.83      0.020◦     0.838
  Iris              0.95      0.343      0.168
  Libras            0.22      0.002◦     0.005♦
  Segment           0.95      0.040◦     0.231
  Soybean           0.40      1.1E-04◦   0.065
  Vehicle           0.70      0.179      0.657
  Vowel             0.85      2.6E-05◦   5.3E-07♦
  Vote              0.95      0.044•     0.227
  Wine              0.96      0.999      0.988
  Zoo               0.95      0.972      0.343
  XCS• - Revised XCSAM◦: 1 - 7
  XCSAM - Revised XCSAM♦: 0 - 3

The revised XCSAM clearly improves on the classification accuracy of XCSAM (see Table 1). We also found seven positive significant differences of the revised XCSAM from XCS. In particular, the revised XCSAM derives high classification accuracy on the datasets Audiology, Libras, Soybean and Vowel, whose input lengths are relatively long and which have many possible classes. Additionally, since three positive significant differences of the revised XCSAM from the standard XCSAM are noted, the revised XCSAM successfully improves on the performance of XCSAM on the real world data. From these results, the best action map is more effective for the classification of data where an input has many elements (i.e., it is high-dimensional) and many classes.

Finally, to understand how the revised XCSAM focuses on the best action map, we analyze the classifiers of the revised XCSAM when it is applied to the dataset Vowel; see Figure 4.

[Figure 4: Classifiers in the population when the revised XCSAM is applied to Vowel, at a) the 10,000th, b) the 50,000th and c) the 100,000th iteration; axes as in Figure 2.]

At the 10,000th iteration, like the standard XCSAM, the revised XCSAM generates redundant classifiers which are inaccurate (large ε) or not candidates for the best action map (large eam). At the 50,000th iteration, the revised XCSAM gradually learns only the classifiers which have small eam parameters. Note that in the standard XCSAM many classifiers with large eam remain, but the revised XCSAM successfully deletes them. At the 100,000th iteration, many accurate classifiers (which have ε = 0 and eam = 1) are generated and the classifiers with large eam do not remain. From these results, the revised XCSAM successfully deletes classifiers which are not best action map candidates, hence the revised XCSAM can derive high performance.
7. SUMMARY AND CONCLUSION
This paper compares XCS [21] with XCSAM [11, 12] on multiplexer problems and on the classification of real world data. First, we test both systems on the multiplexer problem with three problem difficulties: class-imbalance, Gaussian noise and alternating noise. Experimental results show that XCSAM can derive better performance (higher accuracy and sometimes faster learning) than XCS with class-imbalance and Gaussian noise, where XCSAM can identify classifiers which are best action map candidates. Hence, XCSAM can efficiently evolve classifiers toward a best action map. In contrast, XCSAM sometimes derives lower accuracy than XCS when the alternating noise is small. From analysing this we introduce a new deletion mechanism. While the original XCSAM sometimes fails to focus on the best action map on real world data, we show that our new deletion mechanism allows it to focus on the best action map with real world data. The revised XCSAM, which employs the new deletion mechanism, can derive higher accuracy than XCSAM on the class-imbalanced and Gaussian noise problems; however, with alternating noise, its accuracy degrades more than XCSAM's. On classification with real world data, the revised XCSAM can derive higher classification accuracy on data which have a high-dimensional input and many possible classes.

These results suggest that the best action map is more robust to class-imbalance and Gaussian noise, since the evolution of best action maps works to detect and quickly delete overgeneralized classifiers and classifiers which do not advocate a best action. Note that, despite these problem difficulties, the reward signal is still useful to guide the LCS to the best action. Alternating noise causes XCSAM to mistakenly identify a not-best action as a best action. Accordingly, XCSAM deletes accurate classifiers which have a best action by wrongly identifying them as not-best actions. Hence, the best action map is less robust than the complete action map under noise where the best action is mistakenly evaluated as a not-best action, i.e., in aliased states. Our conclusion is that the best action map is a powerful strategy in problems which distinctively return a positive reward for the best action and a negative reward for the not-best action (i.e., the positive reward is always larger than the negative reward). Additionally, if an ARLCS can successfully focus on only the best action map, the best action map contributes to high performance on complex problems which have a high-dimensional input and many possible classes. The complete action map is a better strategy than the best action map when the given problems include aliasing states. Results on RL problems could differ from those on classification problems, but we leave this for future work.
8. REFERENCES
[1] E. Bernadó-Mansilla and J. M. Garrell-Guiu. Accuracy-based Learning Classifier Systems: Models, Analysis and Applications to Classification Tasks. Evolutionary Computation, 11:209–238, 2003.
[2] L. Bull, E. Bernadó-Mansilla, and J. H. Holmes, editors. Learning Classifier Systems in Data Mining, volume 125 of Studies in Computational Intelligence. Springer, 2008.
[3] M. V. Butz and S. W. Wilson. An algorithmic description of XCS. Journal of Soft Computing, 6(3–4):144–153, 2002.
[4] M. V. Butz, D. E. Goldberg, and P. L. Lanzi. Gradient descent methods in learning classifier systems: Improving XCS performance in multistep problems. IEEE Transactions on Evolutionary Computation, 9(5):452–473, October 2005.
[5] M. V. Butz, P. L. Lanzi, and S. W. Wilson. Function approximation with XCS: Hyperellipsoidal conditions, recursive least squares, and compaction. IEEE Transactions on Evolutionary Computation, 12(3):355–376, 2008.
[6] D. E. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison Wesley, 1989.
[7] J. H. Holland. Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-based Systems. Machine Learning, 2:593–623, 1986.
[8] F. Kharbat, L. Bull, and M. Odeh. Mining breast cancer data with XCS. In GECCO 2007, pages 2066–2073, 2007.
[9] T. Kovacs. Deletion schemes for classifier systems. In GECCO 1999, pages 329–336, 1999.
[10] P. L. Lanzi. Learning classifier systems from a reinforcement learning perspective. Soft Computing – A Fusion of Foundations, Methodologies and Applications, 6(3):162–170, 2002.
[11] M. Nakata, P. L. Lanzi, and K. Takadama. Enhancing Learning Capabilities by XCS with Best Action Mapping. In PPSN XII, volume 7491 of LNCS, pages 256–265. Springer, 2012.
[12] M. Nakata, P. L. Lanzi, and K. Takadama. XCS with Adaptive Action Mapping. In SEAL 2012, volume 7673 of LNCS, pages 138–147. Springer, 2012.
[13] M. Nakata, P. L. Lanzi, and K. Takadama. Selection strategy for XCS with adaptive action mapping. In GECCO 2013, pages 1085–1092. ACM, 2013.
[14] A. Orriols and E. Bernadó-Mansilla. Class imbalance problem in UCS classifier system: fitness adaptation. In CEC 2005, volume 1, pages 604–611. IEEE, 2005.
[15] A. Orriols-Puig and E. Bernadó-Mansilla. Bounding XCS's parameters for unbalanced datasets. In GECCO 2006, pages 1561–1568. ACM, 2006.
[16] UCI Machine Learning Repository. http://archive.ics.uci.edu/ml/.
[17] C. Stone and L. Bull. Comparing XCS and ZCS on noisy continuous-valued environments. Technical Report, UWE, 2005.
[18] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[19] F. A. Tzima and P. A. Mitkas. ZCS Revisited: Zeroth-level Classifier Systems for Data Mining. In ICDM Workshops, pages 700–709, 2008.
[20] S. W. Wilson. ZCS: A Zeroth Level Classifier System. Evolutionary Computation, 2(1):1–18, 1994.
[21] S. W. Wilson. Classifier fitness based on accuracy. Evolutionary Computation, 3(2):149–175, June 1995.