XCS: Structure and Function of the XCS Classifier System
Stewart W. Wilson, Prediction Dynamics, Concord, MA
[email protected]

What is it?
• A learning machine (a program).
• Minimal a priori structure.
• Learns "on-line".
• Captures regularities in its environment.

What does it learn?
It learns to get reinforcements ("rewards", "payoffs"): XCS receives inputs from the environment, sends actions back, and the environment returns payoffs.
(This is not "supervised" learning—there is no prescriptive teacher.)

What inputs and outputs?
Inputs: currently binary strings, e.g., 100101110—like thresholded sensor values. Later, continuous vectors, e.g., <43.0 92.1 7.4 ... 0.32>.
Outputs: currently discrete decisions or actions, e.g., 1 or 0 ("yes" or "no"), "forward", "back", "left", "right". Later, continuous actions, e.g., "head 34 degrees left".

What's going on inside?
XCS contains rules (called classifiers), some of which will match the current input. An action is chosen based on the predicted payoffs of the matching rules.
    <condition> : <action> => <prediction>
Example: 01#1## : 1 => 943.2
Note that this rule matches more than one input string:
    010100, 010101, 010110, 010111, 011100, 011101, 011110, 011111.
This adaptive "rule-based" system contrasts with "PDP" systems such as neural networks, in which knowledge is distributed.

How does the performance cycle work?
[Figure: the performance cycle. Detectors encode the environment as a binary string (here 0011). Classifiers in the population [P] that match the input form the match set [M]. For each action advocated in [M], a fitness-weighted system prediction is entered in the prediction array; an action is selected, its advocates form the action set [A], the action goes to the effectors (here "left"), and a reward is returned.]
In the example, [P] contains (condition : action => prediction p, error ε, fitness F):
    #011 : 01 => 43, .01, 99
    11## : 00 => 32, .13, 9
    #0## : 11 => 14, .05, 52
    001# : 01 => 27, .24, 3
    #0#1 : 11 => 18, .02, 92
    1#01 : 10 => 24, .17, 15
    ...etc.
Input 0011 gives match set [M] = {#011:01, #0##:11, 001#:01, #0#1:11}, prediction array (00: nil, 01: 42.5, 10: nil, 11: 16.6), and, with action 01 selected, action set [A] = {#011:01, 001#:01}.
• For each action in [M], classifier predictions p are weighted by fitnesses F to get the system's net prediction in the prediction array.
• Based on the system predictions, an action is chosen and sent to the environment.
• Some reward value is returned.
(A code sketch of this cycle appears below, following the discussion of rule derivation.)

How do rules acquire their predictions?
1. By "updating" the current estimate. For each classifier Cj in the current [A],
    pj ← pj + α(R - pj),
where R is the current reward and α is the learning rate. This makes pj a "recency-weighted" average of previous reward values:
    pj(t) = αR(t) + α(1-α)R(t-1) + α(1-α)²R(t-2) + ... + (1-α)^t pj(0).
2. And by trying different actions, according to an explore/exploit regime. A typical regime chooses a random action with probability 0.5. Exploration (e.g., random choice) is necessary in order to learn anything at all, but exploitation—picking the highest-prediction action—is necessary in order to make best use of what has been learned. There are many possible explore/exploit regimes, including a gradual changeover from mostly explore to mostly exploit.

Where do the rules come from?
• Usually, the "population" [P] is initially empty. (It can also start with random rules, or be seeded.)
• The first few rules come from "covering": if no existing rule matches the input, a rule is created to match it—something like imprinting.
    Input: 11000101    Created rule: 1##0010# : 3 => 10
  (random #'s and action, low initial prediction).
• But primarily, new rules are derived from existing rules.

How are new rules derived?
• Besides its prediction pj, each classifier's error and fitness are regularly updated:
    Error:               εj ← εj + α(|R - pj| - εj)
    Accuracy:            κj ≡ εj^(-n) if εj > ε0, otherwise ε0^(-n)
    Relative accuracy:   κj′ ≡ κj / Σi κi, taken over [A]
    Fitness:             Fj ← Fj + α(κj′ - Fj)
• Periodically, a genetic algorithm (GA) takes place in [A]. Two classifiers Ci and Cj are selected with probability proportional to fitness and copied to form Ci′ and Cj′. With probability χ, Ci′ and Cj′ are crossed to form Ci″ and Cj″, e.g.,
    10##11 : 1    ⇒    10##1# : 1
    #0001# : 1          #00011 : 1
  Ci″ and Cj″ (or Ci′ and Cj′ if no crossover occurred), possibly mutated, are added to [P].
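Below is a minimal Python sketch (not Wilson's implementation) of the performance cycle described above: matching, the fitness-weighted prediction array, a simple explore/exploit action choice, and formation of the action set. The Classifier fields and function names are illustrative assumptions.

```python
import random
from dataclasses import dataclass

@dataclass
class Classifier:
    condition: str     # e.g. "01#1##"; '#' matches either bit
    action: int
    prediction: float  # p
    error: float       # epsilon
    fitness: float     # F

def matches(condition: str, state: str) -> bool:
    """A classifier matches when every non-# position equals the input bit."""
    return all(c == '#' or c == s for c, s in zip(condition, state))

def match_set(population, state):
    """The match set [M]: all classifiers in [P] whose condition matches the input."""
    return [cl for cl in population if matches(cl.condition, state)]

def prediction_array(M, actions):
    """Fitness-weighted mean of the predictions advocating each action (nil if none)."""
    P = {}
    for a in actions:
        advocates = [cl for cl in M if cl.action == a]
        if advocates:
            P[a] = (sum(cl.prediction * cl.fitness for cl in advocates)
                    / sum(cl.fitness for cl in advocates))
        else:
            P[a] = None          # the 'nil' entries of the prediction array
    return P

def select_action(P, explore_prob=0.5):
    """Typical regime: random action with probability 0.5, otherwise the best one.
    Assumes at least one action has a prediction (otherwise covering would fire)."""
    known = {a: p for a, p in P.items() if p is not None}
    if random.random() < explore_prob:
        return random.choice(list(known))
    return max(known, key=known.get)

def action_set(M, action):
    """The action set [A]: the members of [M] advocating the chosen action."""
    return [cl for cl in M if cl.action == action]
```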
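And a companion sketch of the reinforcement updates given above (prediction, error, accuracy, relative accuracy, fitness), applied to an action set once the reward R is received. The parameter values (α, ε0, n) are placeholder assumptions, not values from the slides; the classifier objects are assumed to carry prediction, error, and fitness fields as in the previous sketch.

```python
def update_action_set(A, R, alpha=0.2, eps0=10.0, n=5.0):
    """Apply the slide's update equations to every classifier in the action set [A]."""
    for cl in A:
        abs_err = abs(R - cl.prediction)
        # Widrow-Hoff ("recency-weighted") updates of prediction and error.
        cl.prediction += alpha * (R - cl.prediction)
        cl.error += alpha * (abs_err - cl.error)

    # Accuracy: kappa_j = eps_j^-n if eps_j > eps0, otherwise eps0^-n.
    kappas = [(cl.error ** -n) if cl.error > eps0 else (eps0 ** -n) for cl in A]

    # Relative accuracy kappa'_j = kappa_j / sum over [A], then the fitness update.
    total = sum(kappas)
    for cl, kappa in zip(A, kappas):
        cl.fitness += alpha * (kappa / total - cl.fitness)
```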
Can I see the overall process?
[Figure: the performance cycle of the earlier slide, now with the discovery components added: "covering" inserts new classifiers into [P] when nothing matches, the reward drives the updates of predictions, errors, and fitnesses in [A], and the GA acts within [A].]

What happens to the "parents"?
They remain in [P], in competition with their offspring. But two classifiers are deleted from [P] in order to maintain a constant population size. Deletion is probabilistic, with probability proportional to, e.g.:
• a classifier's average action-set size aj—estimated and updated like the other classifier statistics;
• aj/Fj, if the classifier has been updated enough times, otherwise aj/Fave, where Fave is the mean fitness in [P];
—and other arrangements, all with the aim of balancing the resources (classifiers) devoted to each niche ([A]), while also eliminating low-fitness classifiers rapidly.

What are the results like? — 1
Basic example for illustration: the Boolean 6-multiplexer, e.g., F6(101001) = 0.
    F6 = x0'x1'x2 + x0'x1x3 + x0x1'x4 + x0x1x5
In general the string length is l = k + 2^k, k > 0. For k = 4 (the 20-multiplexer):
    F20 = x0'x1'x2'x3'x4 + x0'x1'x2'x3x5 + x0'x1'x2x3'x6 + x0'x1'x2x3x7
        + x0'x1x2'x3'x8 + x0'x1x2'x3x9 + x0'x1x2x3'x10 + x0'x1x2x3x11
        + x0x1'x2'x3'x12 + x0x1'x2'x3x13 + x0x1'x2x3'x14 + x0x1'x2x3x15
        + x0x1x2'x3'x16 + x0x1x2'x3x17 + x0x1x2x3'x18 + x0x1x2x3x19
    e.g., F20(01100010100100001000) = 0.
(A small multiplexer-environment sketch follows below.)

What are the results like? — 2
[Figure: 6-multiplexer performance curve; not reproduced in this text version.]

What are the results like? — 3
Population at 5,000 problems, in descending order of numerosity. The slide lists the first 40 of 77 classifiers; the first 16 rows—the accurate, maximally general classifiers—are reproduced here, and the remaining listed classifiers all have numerosity 6 or less.
(NUM = numerosity, GEN = fraction of #'s in the condition, ASIZ = estimated action-set size, EXPER = experience, i.e., number of updates, TST = time of the most recent GA in the classifier's action sets.)

    CONDITION : ACT    PRED   ERR   FITN  NUM  GEN  ASIZ  EXPER   TST
    11###0 : 1           0.   .00   884.   30  .50  31.2    287  4999
    001### : 0           0.   .00   819.   24  .50  25.9    286  4991
    01#1## : 1        1000.   .00   856.   22  .50  24.1    348  4984
    01#1## : 0           0.   .00   840.   20  .50  21.8    263  4988
    11###1 : 0           0.   .00   719.   20  .50  22.6    238  4972
    001### : 1        1000.   .00   698.   19  .50  20.9    222  4985
    01#0## : 0        1000.   .00   664.   18  .50  23.9    254  4997
    10##1# : 1        1000.   .00   712.   18  .50  22.4    236  4980
    000### : 0        1000.   .00   674.   17  .50  21.2    155  4992
    10##0# : 0        1000.   .00   706.   17  .50  19.9    227  4990
    11###0 : 0        1000.   .00   539.   17  .50  24.5    243  4978
    10##1# : 0           0.   .00   638.   16  .50  20.0    240  4994
    01#0## : 1           0.   .00   522.   15  .50  23.5    283  4967
    000### : 1           0.   .00   545.   14  .50  20.9    110  4979
    10##0# : 1           0.   .00   425.   12  .50  23.0    141  4968
    11###1 : 1        1000.   .00   458.   11  .50  21.1     76  4983
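As promised above, here is a minimal sketch of the Boolean multiplexer test environment defined on the results slide (l = k + 2^k), assuming the 1000/0 payoff scheme visible in the population listing. The function names are illustrative assumptions.

```python
import random

def multiplexer(bits: str, k: int) -> int:
    """Value of the k-address-bit multiplexer on a binary string of length k + 2^k."""
    address = int(bits[:k], 2)       # the first k bits address one of the 2^k data bits
    return int(bits[k + address])

def random_problem(k: int) -> str:
    """A uniformly random input string for the k-address-bit multiplexer."""
    return ''.join(random.choice('01') for _ in range(k + 2 ** k))

def reward(bits: str, action: int, k: int) -> float:
    """Payoff 1000 for the correct output, 0 otherwise."""
    return 1000.0 if action == multiplexer(bits, k) else 0.0

# The slides' examples: F6(101001) = 0 and F20(01100010100100001000) = 0.
assert multiplexer('101001', 2) == 0                  # address 10 selects x4 = 0
assert multiplexer('01100010100100001000', 4) == 0    # address 0110 selects x10 = 0
```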
Can you show the evolution of a rule?
Action sets [A] for input 101001 and action 0 at several epochs (columns as in the previous table). The accurate, maximally general classifier 10##0# : 0 gradually comes to dominate the action set.

Epoch 247:
    ###### : 0    431.  .440    8.    2  1.00  17.2    76   244
    ##10## : 0    245.  .362  109.    2   .67  10.6    14   236
    ##100# : 0    893.  .146  504.    5   .50  11.2     8   200

Epoch 1135:
    ###0#1 : 0    519.  .419    1.    1   .67  16.5    11  1134
    ###00# : 0    510.  .390   27.    2   .67  16.8    15  1119
    ##1### : 0    125.  .261    0.    1   .83  21.7    18  1132
    #0##0# : 0   1000.  .021    4.    1   .67  17.7     0  1117
    #010## : 0    454.  .433    2.    1   .50  14.8    53  1106
    #0100# : 0    735.  .343   27.    2   .33  14.4    13  1106
    1####1 : 0    169.  .282    2.    1   .67  24.4    12  1119
    1###0# : 0    445.  .418   13.    5   .67  18.6    27  1119
    10#### : 0   1000.  .000  135.    2   .67  24.2     3  1117
    10##0# : 0   1000.  .000  451.    3   .50  23.4    17  1117

Epoch 1333:
    #01#0# : 0    761.  .336    1.    1   .50  10.6    10  1325
    1###0# : 0    652.  .387    5.    1   .67  10.9    11  1325
    1##0#1 : 0    107.  .197    6.    1   .50  22.0     8  1308
    1#100# : 0    829.  .228   26.    2   .33  14.3     9  1325
    10##0# : 0   1000.  .000  490.    4   .50  11.6    26  1325

Epoch 2410:
    1###0# : 0    360.  .394    0.    1   .67  18.1    14  2404
    10##0# : 0   1000.  .000  478.   10   .50  20.1    95  2392

Epoch 2725:
    #0##0# : 0    863.  .237    0.    3   .67  21.1    18  2714
    10##0# : 0   1000.  .000  630.   13   .50  22.6   117  2714
    10#00# : 0   1000.  .000   49.    1   .33  22.4     9  2638
    101#0# : 0   1000.  .000   58.    1   .33  18.4     8  2693

Why accurate, maximally general rules?
Consider two classifiers C1 and C2 having the same action, and let C2 be a generalization of C1. That is, C2 can be obtained from C1 by changing some non-# alleles in the condition to #'s. Suppose that C1 and C2 are equally accurate. They will therefore have the same fitness. However, note that, since it is more general, C2 will occur in more action sets than C1.
What does this mean? Since the GA acts in the action sets, C2 will have more reproductive opportunities than C1. This edge in reproductive opportunities will cause C2 to gradually drive C1 out of the population.
Example:
                              p     ε     F
    C1:  1 0 # 0 0 1 : 0 ⇒  1000  .001  920
    C2:  1 0 # # 0 # : 0 ⇒  1000  .001  920
C2 has equal fitness but more reproductive opportunities than C1; C2 will "drive out" C1.

Does XCS scale up?
[Figure not reproduced in this text version.]

What about complexity?
The 20-multiplexer is about 5x harder than the 11-multiplexer, and the 11-multiplexer about 5x harder than the 6-multiplexer. This fits
    D = c·g^p,
where D = "difficulty" (here, learning time), g = number of maximal generalizations, p = a power, about 2.3, and c = a constant, about 3.2. Thus "D is polynomial in g".
What is D with respect to l, the string length? For the multiplexers, l = k + 2^k, so l → 2^k for large k. But g = 4·2^k, thus g ~ l, so that "D is polynomial in l" (not exponential). (A short numeric check follows below.)
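A short numeric check of the difficulty estimate above, using the slide's fitted constants (c ≈ 3.2, p ≈ 2.3). It reproduces the roughly 5x jumps in difficulty between the 6-, 11-, and 20-multiplexers; the constants and function names are taken from, or assumed consistent with, the slide.

```python
def multiplexer_params(k: int):
    """String length l and number of accurate, maximally general rules g."""
    l = k + 2 ** k
    g = 4 * 2 ** k
    return l, g

def difficulty(g: int, c: float = 3.2, p: float = 2.3) -> float:
    """The slide's fit: difficulty (learning time) D = c * g^p."""
    return c * g ** p

for k in (2, 3, 4):                       # 6-, 11-, and 20-multiplexers
    l, g = multiplexer_params(k)
    print(f"l = {l:2d}   g = {g:3d}   D = {difficulty(g):9.0f}")

# Each step doubles g, so successive difficulties differ by a factor of
# 2^2.3 ~ 4.9 -- the "roughly 5x harder" of the slide -- and since g ~ l,
# D grows only polynomially in the string length l.
```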
What about deferred reward?
Apply ideas from multi-step reinforcement learning. We need the action-value of each action in each state. What is the action-value of a state more than one step from reward?
[Figure: intuitive sketch—a small grid containing food F and a rock O; action-values fall off by a factor of γ per step of distance from the reward (1, γ, γ², γ³, ...).]
    pj ← pj + α[(r_imm + γ max_{a′} P(x′, a′)) - pj],
where pj is the prediction of a classifier in the current action set [A], x′ and a′ are the next state and its possible actions, P(x′, a′) is a system prediction at the next state, and r_imm is the current external reward.

Can I see the overall process?
[Figure: the performance cycle again, extended for multi-step problems: the previous action set [A]-1 is retained, and the previous reward plus the discounted maximum of the current prediction array is used to update it.]
• The previous action set [A]-1 is saved, and the updates are done there, using the current prediction array for the "next state" system predictions.
• On the last step of a problem, updates occur in [A].

What are the results like? — 1
• The animat (*) senses the 8 adjacent cells. Example situation:
    F b b
    O * b
    Q b b
• Coding of each object: F = 110 ("food1"), G = 111 ("food2"), O = 010 ("rock1"), Q = 011 ("rock2"), b = 000 (blank).
• "Sense vector" for the above situation: 000000000000000011010110
• A matching classifier: ####0#00####00001##101## : 7

What are the results like? — 2
[Figure: two generalizations discovered by XCS in Woods1; not reproduced in this text version.]

What about continuous inputs, actions, time?
Inputs:   <<x1 ± ∆x1> ... <xn ± ∆xn>> : <action> ⇒ p
Actions:  <<x1 ± ∆x1> ... <xn ± ∆xn>> : <a ± ∆a> ⇒ p
          —and combine matching rules à la fuzzy logic, perhaps.
Time:     <<x1 ± ∆x1> ... <xn ± ∆xn>> : <a ± ∆a> ⇒ dp/dt
          —action selection based on steepest ascent of p.
(An interval-matching sketch appears at the end of this document.)

What about non-Markov environments?
Example (McCallum's maze): the maze contains aliased states (marked *)—the optimal action is not determinable from the current sensory input alone.
Approaches:
• "History window"—remember previous inputs.
• Search for correlation with past input events. ✔
• Adaptive internal state: the classifier's condition gains an internal part and its action gains an internal part, e.g.,
    ## ## #1 ## 0# ## ## ## # : 1 0 ⇒ 504
  (environmental and internal condition : external action, internal action ⇒ prediction).

What if generalizations are not conjunctive?
Example: "if x > y, for any x and y, and action a is taken, payoff is predicted to be p." This cannot be represented by a single classifier with a traditional conjunctive condition, since it is a relation. However, it can be represented by an "s-classifier":
    (> x y) : <action a> ⇒ p,
i.e., a classifier whose condition is a Lisp s-expression. With appropriate elementary functions, s-classifiers can encode an almost unlimited variety of conditions. They can be evolved using techniques of genetic programming. (A small evaluation sketch appears at the end of this document.)

How is XCS different from other RL systems?
XCS is rule-based, not PDP ("parallel distributed processing"):
• Structure is created as needed.
• Learning may often be faster, because classifiers are inherently non-linear.
• Learning complexity may be less than for most PDP systems.
• Classifiers can keep and use statistics; that is difficult in a network.
• Hierarchy and reasoning may be easier, since knowledge is held in subroutine-like packages.
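As noted on the continuous-inputs slide, here is a minimal sketch of interval-style matching for continuous inputs: each condition is a list of (center, spread) pairs <xi ± ∆xi>, and a classifier matches when every input component falls inside its interval. The list representation and function name are illustrative assumptions.

```python
def interval_matches(condition, inputs):
    """condition: [(x1, dx1), ..., (xn, dxn)]; inputs: [v1, ..., vn].
    True when every v_i lies in [x_i - dx_i, x_i + dx_i]."""
    return all(x - dx <= v <= x + dx for (x, dx), v in zip(condition, inputs))

# Example: a rule with condition <43.0 +/- 5.0> <92.1 +/- 0.5>.
cond = [(43.0, 5.0), (92.1, 0.5)]
print(interval_matches(cond, [41.2, 92.3]))   # True
print(interval_matches(cond, [50.0, 92.3]))   # False (first component outside)
```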
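And, for the s-classifier slide, a minimal sketch of evaluating a Lisp-like condition such as (> x y). Representing the s-expression as nested Python tuples and the small function set are illustrative assumptions; in practice such conditions would be built and varied with genetic-programming operators.

```python
import operator

# A tiny, assumed set of elementary functions for conditions.
FUNCTIONS = {'>': operator.gt, '<': operator.lt,
             'and': lambda a, b: a and b, 'or': lambda a, b: a or b,
             'not': lambda a: not a}

def eval_condition(expr, inputs):
    """Evaluate a condition like ('>', 'x', 'y') against inputs {'x': ..., 'y': ...}."""
    if isinstance(expr, tuple):                 # function application
        fn, *args = expr
        return FUNCTIONS[fn](*(eval_condition(a, inputs) for a in args))
    if isinstance(expr, str):                   # variable reference
        return inputs[expr]
    return expr                                 # literal constant

# The slide's example: the rule (> x y) : <action a> => p matches whenever x > y.
print(eval_condition(('>', 'x', 'y'), {'x': 3.0, 'y': 1.5}))   # True
```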