
Classifier Representation in LCS
James Marshall and Tim Kovacs
Classifier Representations
• We are comparing traditional LCS
representations with alternatives from
different classification algorithms
– E.g. Artificial Immune Systems (AIS)
LCS Classifiers
• Classifier conditions in LCS are
specified in a ternary alphabet and look
like this:
– 00#1011##0
• Classifiers match instances if all their
bits match, apart from wildcards which
match 0 or 1, e.g.:
00#1011##0 (Classifier)
0011011010 (Instance)
LCS Classifiers
• So, classifiers match instances on a d-dimensional hyperplane
– d is number of # in condition
• Classifiers specify an action as well as a
condition
• In classification, this can be a predicted
class for matched instances:
– 00#1011##0:1
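A minimal Python sketch of this matching rule and the condition:action form (the encoding and function names are illustrative, not taken from a specific LCS implementation):

```python
# Ternary-condition matching: '#' is a wildcard matching 0 or 1;
# all other positions must equal the instance bit exactly.
def matches(condition: str, instance: str) -> bool:
    return all(c == '#' or c == b for c, b in zip(condition, instance))

# Classifiers are written 'condition:action'; in classification the
# action is the predicted class for matched instances.
def predict(classifier: str, instance: str):
    condition, action = classifier.split(':')
    return action if matches(condition, instance) else None

print(matches('00#1011##0', '0011011010'))    # True (the slide's example)
print(predict('00#1011##0:1', '0011011010'))  # '1'
```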
AIS Classifiers
• Hyperplanes are not the only shape
• An obvious alternative classifier
representation comes from one AIS
representation
– Classifiers match instances if the Hamming
distance between them is below a
threshold
– i.e. hyperspheres of given radius
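A corresponding sketch of the AIS-style rule, where a classifier is a centre string plus a radius; the strict "distance < radius" convention follows the XCSphere description later in these slides:

```python
# Hamming distance: the number of positions where two strings differ.
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

# A hypersphere classifier matches an instance when the instance lies
# inside the sphere, i.e. its distance from the centre is below the radius.
def sphere_matches(centre: str, radius: int, instance: str) -> bool:
    return hamming(centre, instance) < radius

print(sphere_matches('0011011010', 2, '0011011110'))  # distance 1 < 2: True
```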
Representation Comparison
• Q: Apart from the obvious differences in
calculating matches, how do LCS and
AIS representations differ?
• A: quite a lot
– The number of instances covered by a classifier changes in different ways with size
– Search space size for classifiers is
substantially different
Instance Coverage
• Hyperplane coverage varies with dimension: $2^d$
• Hypersphere coverage varies with problem size and radius: $\sum_{k=1}^{r} \binom{n}{k}$
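A quick sketch evaluating both coverage formulas with Python's math.comb; whether the sphere's centre itself is counted depends on the matching convention, so the sum's lower limit here simply mirrors the slide:

```python
import math

# A hyperplane with d wildcards covers 2^d instances.
def hyperplane_coverage(d: int) -> int:
    return 2 ** d

# A hypersphere of radius r in an n-bit space covers sum_{k=1}^{r} C(n, k)
# instances (per the slide's formula; add 1 if the centre is counted too).
def hypersphere_coverage(n: int, r: int) -> int:
    return sum(math.comb(n, k) for k in range(1, r + 1))

print(hyperplane_coverage(2), hypersphere_coverage(11, 1))  # 4 vs. 11
```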
Instance Coverage
[Figure: instance coverage]
Classifier Search Space
• Number of possible hyperspheres changes with problem size, but is constant for any given radius: $2^n$
• Number of possible hyperplanes changes with dimension and problem size: $\binom{n}{d} 2^{n-d}$
Classifier Search Space
• N.B. as n increases, $(n+1)\,2^n \ll 3^n$
• i.e. hypersphere search space much smaller
than hyperplane search space
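A rough numeric check of this gap, assuming $2^n$ centres and $n+1$ possible radii (0 through n) for hyperspheres versus $3^n$ ternary hyperplane conditions:

```python
# Compare total search-space sizes as the problem length n grows.
for n in (6, 11, 20, 50):
    spheres = (n + 1) * 2 ** n   # assumed: 2^n centres, n + 1 radii
    planes = 3 ** n              # ternary conditions
    print(f"n={n}: hyperspheres={spheres:.3g}, hyperplanes={planes:.3g}")
```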
Comparing Classifier Performance on Multiplexers
Multiplexers
• A longstanding testbed for LCS
• Instances consist of address bits and
data bits
010 00101001
• Instance class given by value of
addressed data bit
• Typical multiplexer sizes used are 6 ($2 + 2^2$) and 11 ($3 + 2^3$)
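A minimal multiplexer implementation; k is the number of address bits (k = 3 for the 11-multiplexer):

```python
# The first k bits address one of the 2^k data bits; the value of the
# addressed data bit is the instance's class.
def multiplexer(bits: str, k: int) -> int:
    address = int(bits[:k], 2)   # address bits, read as a binary number
    return int(bits[k + address])

print(multiplexer('01000101001', 3))  # the slide's example: class 1
```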
Proofs
• It’s easy to prove the following theorems for the multiplexer:
1. 100% accurate hyperplanes always possible
2. 100% accurate hyperspheres never possible
3. Hyperspheres must be paired and have specificity to be
100% accurate
4. Hyperspheres must have variable radius to avoid
ambiguity
• Proposition: more hyperspheres are required than hyperplanes to accurately classify the instance space
Enumeration of Classifiers
• 11-multiplexer is small enough to enumerate
classifiers and look at accuracy distribution
• i.e. measure the percentage of instances covered by a classifier that belong to the same (majority) class
• Let’s do this just for the smallest classifiers of
comparable size that generalise (i.e.
dimension 2, or radius 1)…
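A brute-force sketch of this enumeration (here a radius-1 hypersphere is taken to cover the centre plus its n Hamming neighbours; that inclusive convention is an assumption):

```python
from itertools import combinations, product

n, k = 11, 3  # 11-multiplexer: 3 address bits, 8 data bits

def mux(bits):
    return bits[k + int(''.join(map(str, bits[:k])), 2)]

def accuracy(classes):
    # Fraction of covered instances in the majority class.
    return max(classes.count(0), classes.count(1)) / len(classes)

# Dimension-2 hyperplanes: pick 2 wildcard positions, fix the rest,
# and enumerate the 4 instances each hyperplane covers.
plane_accs = []
for wild in combinations(range(n), 2):
    fixed = [i for i in range(n) if i not in wild]
    for values in product((0, 1), repeat=n - 2):
        covered = []
        for w in product((0, 1), repeat=2):
            bits = [0] * n
            for i, v in zip(fixed, values):
                bits[i] = v
            for i, v in zip(wild, w):
                bits[i] = v
            covered.append(mux(bits))
        plane_accs.append(accuracy(covered))

# Radius-1 hyperspheres: the centre and its n single-bit-flip neighbours.
sphere_accs = []
for centre in product((0, 1), repeat=n):
    covered = [mux(list(centre))]
    for i in range(n):
        neighbour = list(centre)
        neighbour[i] ^= 1
        covered.append(mux(neighbour))
    sphere_accs.append(accuracy(covered))

print(max(plane_accs), max(sphere_accs))  # expect 1.0 vs. < 1.0
```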
Enumeration of Classifiers
• N.B. 100% accurate classifiers are the mode for 2-dimensional hyperplanes, while no 100% accurate hyperspheres exist…
• …as predicted by theorems 1 and 2
Enumeration of Classifiers
• For 4-d hyperplanes, 75% accurate classifiers are the mode
• ~25% of all classifiers are 100% accurate
• Could help explain Tim’s result* on effectiveness of
selection and reinforcement of randomly generated
rules (i.e. no GA rule exploration)?
*Kovacs & Kerber, GECCO 2004, LNCS 3103, pp. 785–796
XCSphere
• Extended an existing XCS
implementation to use hyperspheres
instead of hyperplanes
– Restrict to binary alphabet instead of
ternary
– Hamming distance < radius matching rule
– Generalisation of hyperspheres
• Proper superset condition easy to evaluate
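One way such a superset test could look, using the triangle inequality in Hamming space; a sketch, not the authors' actual implementation:

```python
def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))

def is_more_general(centre_a: str, r_a: int, centre_b: str, r_b: int) -> bool:
    """Sphere A covers everything sphere B covers if every point within
    r_b of B's centre is within r_a of A's centre; by the triangle
    inequality this reduces to one distance check. A strictly larger
    radius then makes A a proper superset (degenerate whole-space
    spheres aside)."""
    return hamming(centre_a, centre_b) + r_b <= r_a
```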
Evaluation
• Results on 11-multiplexer:
[Figure: XCSphere vs. XCS performance on the 11-multiplexer]
Comparing Classifier Performance on Hypersphere Function
Hypersphere Function
• We designed a new function, whose most efficient representation is with hyperspheres
– Given a boolean input of odd length, assign class 0 to all instances closest to the all-0s string, and class 1 to all other instances
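A sketch of this function as described; with odd length there are no Hamming-distance ties between the all-0s and all-1s strings:

```python
# Class 0 iff the instance is closer to the all-0s string than to the
# all-1s string, i.e. it contains fewer 1s than 0s (odd length: no ties).
def hypersphere_function(bits: str) -> int:
    assert len(bits) % 2 == 1, "defined for odd-length inputs"
    return 0 if bits.count('1') * 2 < len(bits) else 1

print(hypersphere_function('00100'))  # one 1: class 0
print(hypersphere_function('01101'))  # three 1s: class 1
```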
Evaluation
• Results on hypersphere function:
[Figure: XCSphere vs. XCS performance on the hypersphere function]
XCSphere: Multiple Representation XCS
Competing Representations
• Competition between overlapping classifiers
is intense in XCS
• We can use this to implement a hybrid XCS
with hyperplane and hypersphere classifiers
• Seed initial population with 50% of each,
similarly during covering
• Sphere and plane classifiers can’t recombine,
hence are like different species
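A sketch of how covering might choose between the two representations, assuming the rest of XCS is unchanged; the wildcard rate and initial radius are illustrative parameters:

```python
import random

# When no classifier matches an instance, cover it with a hyperplane or
# a hypersphere with equal probability (mirroring the 50/50 seeding).
def cover(instance: str, p_wildcard: float = 0.33, radius: int = 2):
    if random.random() < 0.5:
        # Hyperplane: copy the instance, generalising bits to '#' at random.
        cond = ''.join('#' if random.random() < p_wildcard else b
                       for b in instance)
        return ('plane', cond)
    # Hypersphere: centre the sphere on the instance itself.
    return ('sphere', instance, radius)

print(cover('0011011010'))
```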
Evaluation
• Results for XCSphere:
[Figure: XCSphere performance on the hypersphere function and the multiplexer]
XCSphere Results
• XCSphere generally achieved better performance, across all three problems, than the representation-specific XCS versions
• XCSphere is slower to converge on the multiplexer than XCS with hyperplanes
• …but there is weak evidence that XCSphere converges faster on the sphere function than XCS with hyperspheres
Summary
• Hybrid representations in a single classifier system:
– A useful way to mitigate representational
bias?
– Possibility of evolving representations?