Inducing Partially-Defined Instances with Evolutionary Algorithms
Xavier Llorà
[email protected]
Josep M. Garrell
[email protected]
Research Group in Intelligent Systems. Enginyeria i Arquitectura La Salle. Ramon Llull University. Psg.
Bonanova 8, 08022, Barcelona, Catalonia, Spain.
Abstract
This paper addresses the issue of reducing
the storage requirements of Instance-Based
Learning algorithms. Algorithms proposed
by other researchers use heuristics to prune
instances from the training set, or modify the instances themselves, to achieve a reduced set of
instances. Our work presents an alternative
way. We propose to induce a reduced set of
partially-defined instances with Evolutionary
Algorithms. Experiments were performed
with GALE, our fine-grained parallel Evolutionary Algorithm, and other well-known reduction techniques on several datasets. Results suggest that Evolutionary Algorithms
are competitive and robust for inducing sets
of partially-defined instances, achieving better reduction rates in storage requirements
without losses in generalization accuracy.
1. Introduction
Instance-based learning (IBL) (Aha et al., 1991) has
been successfully used on many pattern classification
problems. Its classification capabilities rely on the
Nearest Neighbor Algorithm (NNA) (Cover & Hart,
1967). NNA uses a stored set T of training instances to
classify new incoming patterns. Each stored instance
has an input vector, with one value for each of several
input attributes, and an output class. A new pattern
x is classified by finding the instance y in T that is
closest to x, and using the output class of y.
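To make the classification rule concrete, the following Python sketch implements the plain 1-NNA just described. It assumes numeric attributes and a Euclidean distance; the names (one_nn_classify, the toy T) are illustrative and do not correspond to any of the cited implementations.

import math

def one_nn_classify(x, training_set):
    """Classify input vector x with the 1-nearest-neighbor rule (NNA).

    training_set is a list of (input_vector, output_class) pairs; the
    distance is plain Euclidean distance over numeric attributes."""
    best_class, best_dist = None, math.inf
    for y, y_class in training_set:
        dist = math.sqrt(sum((xa - ya) ** 2 for xa, ya in zip(x, y)))
        if dist < best_dist:
            best_dist, best_class = dist, y_class
    return best_class

# Toy usage: two stored instances, one query pattern.
T = [((1.0, 1.0), "alpha"), ((5.0, 5.0), "beta")]
print(one_nn_classify((1.2, 0.9), T))  # -> "alpha"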
IBL has some strengths and weaknesses. It can easily define non-axis-parallel frontiers in the space of
instances. It is also well suited for handling exceptions. However, IBL can be very vulnerable to noise
and irrelevant attributes (Domingos, 1995), and it stores a large number of instances in the T set. Some algorithms have been proposed to address these weaknesses. For instance, k-NNA reduces the effect of noise by classifying the incoming pattern with the majority class of its k closest instances. It can be combined
with weighting algorithms that look for representative
attributes (Wettschereck & Aha, 1995; Aha, 1998).
The size of T can also be reduced using algorithms for
instance pruning, like IB3 (Aha et al., 1991) or DROP
(Wilson & Martinez, 2000). Nevertheless, empirical
studies show that all these algorithms work well in
some, but not all, domains; this has been termed the
selective superiority problem (Brodley, 1993).
We present an Evolutionary Algorithm (EA) (Holland,
1975; Goldberg, 1989; Koza, 1992; Fogel, 1992) to address some of IBL's weaknesses in a new way. Instead of pruning instances from the T set, or combining instances somehow to reduce the storage requirements,
we propose to induce partially-defined instances that
form a small T set. EAs have proved to be robust algorithms for search, optimization and machine learning. Therefore, we exploit the robustness of EAs to induce a
small T set that stores few instances without a significant degradation in generalization accuracy. This
evolutionary-driven algorithm, named GALE, induces
a T set of partially-defined instances, discarding irrelevant attributes of the contained instances. Thus,
GALE neither prunes nor combines instances;
it induces them using artificial evolution. This algorithm is tested on ten different domains, showing that
it can reduce the size of the stored set of instances
while maintaining or even improving the generalization accuracy.
The rest of the paper is structured as follows. Section
2 presents a brief overview of methods for instance set
reduction. A brief review of some related work done
in the EA community is presented in section 3. In this
section we also present GALE, an evolutionary-driven
algorithm for inducing small instance sets based on
fine-grained parallel EAs. Section 4 describes experiments on ten datasets, comparing the performance of
GALE with well-known reduction techniques like IB
(Aha et al., 1991) and DROP (Wilson & Martinez,
2000), using IB1 (NNA) as a performance baseline.
Finally, section 5 summarizes some conclusions and
further work.
2. Related Work
The reduction of the size of the training set T used
by IBL algorithms has been mainly addressed in two
different ways. The first one looks for heuristics that
guide a pruning process on the original set of instances.
Some instances of T are removed, obtaining a new
training set S, S ⊆ T, that aims to maintain the generalization performance while reducing the storage requirements. Other approaches reduce the
storage requirements and speed up classification by
modifying the instances themselves, instead of just
deciding which ones to keep. Some representative algorithms of the pruning approach are IB (Aha
et al., 1991; Aha, 1992) and DROP (Wilson & Martinez, 1997; Wilson & Martinez, 2000). On the other
hand, RISE 2.0 (Domingos, 1995) and EACH, based
on the nested generalized exemplar theory (Salzberg,
1991; Wettschereck, 1994; Wettschereck & Dietterich,
1995), show how the instances in T can be modified.
IB1 is a simple implementation of NNA, while IB2
is similar to the Condensed Nearest Neighbor (CNN)
rule (Hart, 1968). CNN starts by selecting, at random,
one instance for each class in the training set T and
putting them in S. Then, each instance in T is classified using only the instances in S. If an instance is
misclassified, it is added to S to avoid further misclassifications. IB2 usually retains noisy examples because
they are not classified correctly. IB3 uses statistical
tests to retain only acceptable misclassified instances.
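As an illustration, the CNN condensation step described above can be sketched in a few lines of Python. The sketch assumes numeric attributes, uses a plain Euclidean distance, and performs a single pass over T, as in the description; the helper names are invented for this example and this is not Aha's IB2 code.

import math
import random

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def cnn_condense(T):
    """Condensed Nearest Neighbor: return a reduced set S of (vector, class) pairs."""
    # 1. Seed S with one randomly chosen instance per class.
    classes = {c for _, c in T}
    S = [random.choice([inst for inst in T if inst[1] == c]) for c in classes]
    # 2. Add every instance of T that S misclassifies with the 1-NN rule
    #    (a single pass, as in the description above).
    for x, x_class in T:
        nearest = min(S, key=lambda s: euclidean(x, s[0]))
        if nearest[1] != x_class:
            S.append((x, x_class))
    return S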
In order to handle irrelevant attributes, IB4 and IB5
extend IB3 by computing a set of attribute weights
for each class. Another example of instance pruning is
DROP, which defines its pruning heuristics in terms of
associates. Given an instance y, its associates are those
instances that have y as one of their k nearest neighbors. DROP1 removes an instance from S (S = T
originally) if at least as many of its associates in S
would be classified correctly without it. This heuristic
has some problems with noisy instances, which DROP2
tries to solve by removing an instance from S if at least
as many of its associates in T would be classified correctly without it. DROP3 and DROP4 refine the noisy-instance removal heuristic. Finally, DROP5 smooths
the decision boundary proposed by DROP2.
RISE 2.0 does not attempt to prune instances of T . It
treats each instance in T as a rule in R. For each rule
r ∈ R, the nearest instance y in T of the same class
as r, not covered by r, is found. The rule r is then
minimally generalized to cover y, unless that harms
accuracy. This process is repeated until no rules are
generalized during an entire pass through all the rules
in R. Another way to achieve a reduction in the size of
storage is proposed by EACH, where hyper-rectangles
are used to replace of one or more instances. Using this
idea, S becomes a set formed by instances and hyperrectangles (generalized instances). EACH initializes S
with several instances randomly selected. Then, for
any given pattern x, EACH finds the distance to the
nearest exemplar (i.e an instance or hyper-rectangle),
which is 0 if it is inside a hyper-rectangle. When the
pattern has the same class as its exemplar, the exemplar is generalized, otherwise it is specialized.
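The generalization step of EACH can be illustrated with a small sketch. It assumes an axis-parallel hyper-rectangle stored as per-attribute (lower, upper) bounds; the representation, the outside-rectangle distance and the function names are assumptions made for the example, not the original EACH implementation.

def covers(rect, x):
    """True if pattern x falls inside the axis-parallel hyper-rectangle."""
    return all(lo <= xa <= hi for xa, (lo, hi) in zip(x, rect))

def distance_to_rect(rect, x):
    """Per-attribute distance to the rectangle; 0 when x is inside it."""
    return sum(max(lo - xa, 0.0, xa - hi) ** 2 for xa, (lo, hi) in zip(x, rect)) ** 0.5

def generalize(rect, x):
    """Minimally extend the rectangle bounds so that they cover x."""
    return [(min(lo, xa), max(hi, xa)) for xa, (lo, hi) in zip(x, rect)]

# A 2D exemplar covering [0,1]x[0,1] is grown to cover the pattern (2.0, 0.5).
rect = [(0.0, 1.0), (0.0, 1.0)]
print(distance_to_rect(rect, (2.0, 0.5)))   # 1.0 (outside)
print(generalize(rect, (2.0, 0.5)))         # [(0.0, 2.0), (0.0, 1.0)]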
3. GALE
Genetic and Artificial Life Environment (GALE) explores an alternative path for reducing the storage requirements of IBL algorithms. It neither defines any
heuristic to prune instances of T nor modifies the instances themselves to achieve a reduced S set. GALE
uses the T set to induce an S set formed by partially-defined instances. The induction algorithm explored
by GALE is based on EAs, making it an evolutionary-driven
induction approach. In this section, we briefly review some machine learning efforts of the EA community.
Next, we present some definitions about the knowledge representation used (partially-defined instances).
The section concludes by describing the EA proposed for
inducing sets of partially-defined instances.
3.1 Evolutionary Algorithms and Machine
Learning
Genetic Algorithms (GA) can induce rules using artificial evolution (Holland, 1975). Early approaches used
GAs as learning algorithms in parallel rule-based systems (Holland, 1986), starting an EA discipline known
as Classifier Systems (CS). Currently, XCS (Wilson,
1995; Wilson, 1999; Wilson, 2000) is the state of the art in CS. Its main contribution, among others, is
measuring the performance (accuracy) of the evolved
rules along the evolutionary-driven induction. There
are other ways to induce rules using GAs. GABL, and
its incremental version GABIL (De Jong & Spears,
1991; Spears & De Jong, 1993), sparked an alternative
approach, known as Learning Systems (LS). LS usually induce rule sets using GAs with genetic operators
inspired by machine learning. Some examples of these
algorithms are GIL (Janikow, 1993) and GA-MINER
(Flockhart, 1995), but they tend to be
time-consuming, and some authors have proposed to
exploit the parallel behavior of evolution to reduce the
time required for training (Flockhart, 1995; Freitas,
1999; Araujo et al., 2000).
EAs can also induce decision trees, using both GAs and
Genetic Programming (GP). GAs have been used to
minimize the impurity of the splits defined by hyperplanes in oblique decision trees (Cantú-Paz & Kamath,
2000). On the other hand, GP can easily evolve decision trees (Koza, 1992; Bot, 2000), or function trees for
improving k-NNA classifiers (Ahluwaly & Bull, 1999).

3.2 Partially-Defined Instances

GALE is an EA that induces a set of partially-defined
instances. Throughout, when we speak about “inducing instances” we mean that GALE produces a set
of partially-defined instances, S, using an evolutionary process. Computed instances of S are neither a
subset of T nor a combination of instances in T (i.e.,
hyper-rectangles). In order to avoid repeating lengthy
definitions, some notation is introduced throughout the
rest of this subsection.

Definition 1 (Instance) An instance x is an input
vector, with one value for each of several input attributes (x_ai), and an output class (x_χ).

Definition 2 (Partially-defined instance) A
partially-defined instance p is an input vector, with
at least one known value among several input attributes
(p_ai), and an output class (p_χ).

For instance, given a dataset with two attributes (a1
and a2), the following instances are partially-defined:

p1: ((a1 1.0) (a2 3.3) (χ α))
p2: ((a1 1.0) (χ β))
p3: ((a2 3.3) (χ α))

Definition 3 (Set of used attributes) Given a
partially-defined instance p, the set of used attributes
π(p) is formed by the input attributes that have a
known value. |π(p)| is the number of used attributes.

The sets of used attributes for the previous partially-defined instances are: π(p1) = {a1, a2}, π(p2) = {a1},
and π(p3) = {a2}. Therefore, the number of used attributes for each example is: |π(p1)| = 2, |π(p2)| = 1,
and |π(p3)| = 1.

Definition 4 (Set of partially-defined instances)
Γ is a set of partially-defined instances if it contains,
at least, one partially-defined instance (Γ ≠ ∅).

Some changes are introduced into the nearest neighbor algorithm so that it works with partially-defined
instances. These modifications rely on the distance
function used.

Definition 5 (Partially-defined distance) Given
an input vector v and a partially-defined instance p,
the partially-defined distance function γ is:

γ(v, p) = sqrt( (1 / |π(p)|) · Σ_{a ∈ π(p)} ( dist(v_a, p_a) / max(a) )² )

where dist computes the distance between two values of a
given attribute. If the attribute a is numeric, then dist
is computed as dist(v_a, p_a) = v_a − p_a. On the other
hand, if the attribute a is categorical, dist(v_a, p_a) is
equal to 0 if v_a = p_a, and 1 otherwise. max(a) is the maximal distance for the given attribute if the attribute is
numeric, or 1 if it is categorical.

NNA uses γ to compute distances. Throughout, we
use 1-NNA, leaving the performance of GALE using
k-NNA for further work.
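The following Python sketch illustrates γ and the corresponding 1-NNA rule over partially-defined instances. A partially-defined instance is represented here as a dictionary from attribute name to known value, max(a) is supplied as a precomputed table, and all identifiers are illustrative rather than GALE's actual code.

def gamma(v, p, max_dist, categorical=()):
    """Partially-defined distance between an input vector v (dict attr -> value)
    and a partially-defined instance p (dict attr -> value, class excluded).

    Only the attributes used by p (its set pi(p)) contribute to the distance."""
    used = p.keys()                       # pi(p): attributes with a known value
    total = 0.0
    for a in used:
        if a in categorical:
            d, m = (0.0 if v[a] == p[a] else 1.0), 1.0
        else:
            d, m = v[a] - p[a], max_dist[a]
        total += (d / m) ** 2
    return (total / len(used)) ** 0.5

def classify(v, S, max_dist, categorical=()):
    """1-NNA over a set S of (partially-defined instance, class) pairs."""
    return min(S, key=lambda item: gamma(v, item[0], max_dist, categorical))[1]

# Toy usage: the second instance uses only attribute a1, like p2 in the text.
S = [({"a1": 1.0, "a2": 3.3}, "alpha"), ({"a1": 1.0}, "beta")]
max_dist = {"a1": 10.0, "a2": 10.0}
print(classify({"a1": 1.1, "a2": 9.0}, S, max_dist))   # -> "beta"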
3.3 Induction Algorithm
GALE uses a fine-grained parallel EA to obtain the set
of partially-defined instances S. EAs usually evolve
a population of individuals (set of feasible solutions)
that fit the environment where they are placed (solve
a given problem efficiently). The survival of the fittest
and the recombination of their genetic material model the
artificial evolution of GALE. Detailed descriptions of
EAs can be found in (Holland, 1975; Goldberg, 1989).
Each individual in the population of GALE is a feasible
set of partially-defined instances Γ. The population is
placed on a 2D grid. Every cell of the grid contains
up to one individual that has a fitness measure of its
degree of adaptation (accuracy solving the problem).
The fitness measure chosen is (c/t)²; c is the number of
instances in T that are classified correctly by the individual (Γ), and t is the number of instances in T. This
fitness measure provides a non-linear bias toward correctly classifying instances in T while providing differential reward for imperfect Γ sets (De Jong & Spears,
1991). Each individual codifies Γ into its genotype (the genetic representation of the Γ set that undergoes evolution),
arranging the partially-defined instances into an array,
like GABL (Spears & De Jong, 1993) does.
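The fitness computation can be sketched as follows; the classify argument stands for any 1-NNA procedure over the individual's partially-defined instances, such as the one sketched in section 3.2, and the signature is an assumption made for the example.

def fitness(individual, T, classify):
    """(c/t)^2 fitness: c = training instances classified correctly by the
    individual's set of partially-defined instances, t = |T|.
    `classify` is any callable mapping (input vector, individual) to a class."""
    c = sum(1 for x, x_class in T if classify(x, individual) == x_class)
    return (c / len(T)) ** 2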
GALE spreads the population over a 2D grid in order
to exploit massive parallelism and locality relations.
Every cell in the grid computes in parallel the same
algorithm, which can be summarized as follows:
FOR-EACH cell C in Grid
DO
    initialize the cell C
    evaluate the accuracy of individual in C
    REPEAT
        merge among neighborhood(C)
        split individual in C
        evaluate the accuracy of individual in C
        survival among neighborhood(C)
    UNTIL <end-criterion>
DONE
The initialization phase decides if the cell contains
an individual, or if it remains empty. Empirical experiments show that 70-80% of occupied cells is a
good initialization rate. The individuals of the population, which have at least one partially-defined instance per class, are built at random. Thus, the set
of partially-defined instances Γ contained in an individual has nothing in common with the instances in
T . Next, if the cell contains an individual, C computes
the fitness of the individual using the fitness function
explained before. At this point, the cell is ready to
enter the evolutionary process.
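A minimal sketch of this initialization is shown below. The attribute ranges, the keep probability for each attribute and the helper names are assumptions made for illustration; only the constraints stated above (70-80% occupancy, at least one partially-defined instance per class, at least one known attribute per instance) are taken from the text.

import random

def random_partial_instance(attribute_ranges, a_class, keep_prob=0.5):
    """Build a partially-defined instance: each attribute is kept with some
    probability, but at least one attribute must remain known."""
    attrs = {a: random.uniform(lo, hi)
             for a, (lo, hi) in attribute_ranges.items() if random.random() < keep_prob}
    if not attrs:                                    # pi(p) may not be empty
        a, (lo, hi) = random.choice(list(attribute_ranges.items()))
        attrs[a] = random.uniform(lo, hi)
    return (attrs, a_class)

def random_individual(attribute_ranges, classes):
    """At least one partially-defined instance per class, built at random."""
    return [random_partial_instance(attribute_ranges, c) for c in classes]

def initialize_grid(width, height, attribute_ranges, classes, occupancy=0.8):
    """Each cell is occupied with the given probability (70-80% works well)."""
    return [[random_individual(attribute_ranges, classes)
             if random.random() < occupancy else None
             for _ in range(width)] for _ in range(height)]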
Merge, split, and survival are the phases of the evolutionary cycle that improve the accuracy of individuals. This cycle stops when an individual correctly classifies all the training instances in T, or after
a certain number of evolutionary iterations.
Merge and split modify the sets of partially-defined instances (individuals), whereas survival implements the
survival of the fittest, removing poorly adapted individuals from the grid and pruning unused partially-defined instances.
Merge recombines the individual in C with an individual chosen at random from the neighborhood
of the cell, with a given probability pm. Thus, merge
is a two-step process: (1) choose a mate among the
neighborhood, and (2) recombine the genetic material, obtaining just one offspring. The neighborhood
of cell C is the set of its adjacent cells. Therefore,
neighborhood(C) is the eight cells that surround C.
The neighborhood topology is beyond the scope of
this paper. Next, merge recombines the genetic material using two-point crossover (De Jong & Spears,
1991). Crossover can occur anywhere (i.e., both on
instance boundaries and within partially-defined instances). The only requirement is that the corresponding crossover points on the two parents are valid. That
is, if one parent is cut on an instance boundary, then the other parent must also be cut on an instance
boundary. Similarly, if one parent is cut within
a partially-defined instance, then the other parent must
be cut at a similar spot. For instance, given two
individuals and their valid cut points
ind1 : [((a1 3.9) (a2 9.3) | (χ α)) | ((a1 6.9) (χ λ))]
ind2 : [((a1 1.0) | (χ β)) ((a1 1.0) (a2 3.3) (χ δ)) | ]
crossover produces a new individual recombining at
random the pieces of genetic material provided by its
parents.
indc :[((a1 3.9) (a2 9.3) | (χ β))
((a1 1.0) (a2 3.3) (χ δ)) |
((a1 6.9) (χ λ))]
Finally, the offspring (indc ) obtained after crossover
replaces the individual in C.
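The sketch below illustrates the merge phase under a simplification: cut points are restricted to instance boundaries, whereas GALE also allows matched cuts inside partially-defined instances. Mate selection from the neighborhood is abstracted into the two arguments, and the names are illustrative.

import random

def merge(ind_a, ind_b, p_merge=0.4):
    """Two-point crossover between two individuals (lists of partially-defined
    instances), producing a single offspring. Cuts fall only on instance
    boundaries in this simplified sketch."""
    if random.random() >= p_merge:
        return ind_a                        # no recombination this cycle
    cut1, cut2 = sorted(random.randint(0, len(ind_a)) for _ in range(2))
    cut3, cut4 = sorted(random.randint(0, len(ind_b)) for _ in range(2))
    # Offspring takes the outer segments of one parent and the middle of the other.
    child = ind_a[:cut1] + ind_b[cut3:cut4] + ind_a[cut2:]
    return child if child else ind_a        # guard against an empty offspring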
Split reproduces and mutates the individual in C with
probability ps · fitness(indc). The split rate is thus proportional to the performance
of the individual, so well-adapted individuals spread
their genetic material rapidly. The split individual is placed in the neighboring cell with the highest number of neighbors (occupied cells); if all cells in the neighborhood are occupied, it is placed in the cell that contains
the worst individual. This technique biases the evolution towards the emergence of useful subpopulations or
demes, balancing the survival pressure introduced by
the survival phase. Mutation is done by changing some values of the genotype at random (De Jong
& Spears, 1991). New attributes can be added to, or
dropped from, partially-defined instances, and some
attribute values can be changed (including the class
tag χ). For instance, the individual obtained after
the merge phase can be mutated as follows:
ind'c: [((a1 3.9) (χ β))
((a1 8.1) (a2 3.3) (χ δ))
((a1 6.9) (a2 7.2) (χ α))]
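Mutation can be sketched as follows; the per-gene probability, the attribute ranges and the names are assumptions for illustration, while the three kinds of change (add or drop an attribute, perturb a value, change the class tag) follow the description above.

import random

def mutate(individual, attribute_ranges, classes, p_mut=0.05):
    """Randomly perturb an individual (a list of (attrs dict, class) pairs);
    classes is a list of possible class tags."""
    mutated = []
    for attrs, a_class in individual:
        attrs = dict(attrs)
        for a, (lo, hi) in attribute_ranges.items():
            if random.random() >= p_mut:
                continue
            if a in attrs and len(attrs) > 1 and random.random() < 0.5:
                del attrs[a]                       # drop a used attribute
            else:
                attrs[a] = random.uniform(lo, hi)  # add it, or change its value
        if random.random() < p_mut:
            a_class = random.choice(classes)       # change the class tag chi
        mutated.append((attrs, a_class))
    return mutated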
Survival decides if an individual is kept in the cell
for the next cycle. It also removes from individuals
those partially-defined instances that are never used
when classifying the instances in T. Survival uses the number of occupied neighbors to decide if the individual is
removed from the cell, as described in the following cases and sketched in the code below.
0-1 Neighbors: these configurations show a quasi-isolated subpopulation without genetic diversity;
thus, there are few chances to improve the genetic
material by exchanging it with the neighborhood.
The survival probability of the individual is proportional to
its performance, psr(indc) = fitness(indc), leading to extreme survival pressure. If the individual is removed, the cell becomes empty.
2-6 Neighbors: the subpopulations change rapidly,
exchanging a great deal of genetic material. The
individual is removed, leaving the cell empty, if
fitness(indc) < µnei + ksr × σnei, where µnei is the average fitness value of the occupied neighbor cells
and σnei their standard deviation. The parameter
ksr controls the survival pressure on the current
cell, as well as the convergence of GALE.
7-8 Neighbors: this is a crowded situation with no
space available in the grid, so extreme survival pressure is applied to the individual in C. The individual of the cell is always replaced by its best neighbor, thus psr(indc) = 0. Replacement introduces
a distributed approach to elitism, a selection technique used in many EAs (Goldberg, 1989).
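The survival decision can be summarized in a short sketch parameterized by the number of occupied neighbor cells; the function signature is an assumption made for the example, and the three branches follow the cases above.

import random
import statistics

def survives(ind_fitness, neighbor_fitnesses, k_sr=-0.25):
    """Decide whether the individual stays in its cell for the next cycle,
    based on how many of the eight neighboring cells are occupied."""
    n = len(neighbor_fitnesses)
    if n <= 1:
        # Quasi-isolated cell: survival probability equals the fitness itself.
        return random.random() < ind_fitness
    if n <= 6:
        # Removed if its fitness falls below mean + k_sr * std of the neighbors.
        mu = statistics.mean(neighbor_fitnesses)
        sigma = statistics.pstdev(neighbor_fitnesses)
        return ind_fitness >= mu + k_sr * sigma
    # Crowded cell (7-8 neighbors): always replaced by its best neighbor.
    return False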
GALE is a fine-grained parallel EA that scales linearly with the number of dimensions and instances in T.
Detailed discussions can be found in (Llorà & Garrell,
2001). Theoretical analysis of the speedup equations shows
that further work should study parallel implementations of this EA in depth.
4. Experimental Results
The test suite designed for evaluating the classification
performance of GALE consists of several datasets and
machine learning algorithms. This lets us study the
performance of GALE using statistical tools, like stratified ten-fold cross-validation runs and paired t-tests on
these runs (Witten & Frank, 2000). We also study the
storage size required once the instance pruning techniques have been run. The time performance of GALE as a
parallel processing algorithm is beyond the scope of
this paper and is part of further work.
4.1 Datasets
In order to evaluate the performances of GALE on
different domains, we performed experiments on ten
datasets. The datasets can be grouped into three kinds: artificial, public and private.
We used two artificial datasets to tune GALE, because
we knew their solutions in advance. MX11 is the eleven
input multiplexer, widely used by the CS community (Wilson, 1995). TAO is a dataset obtained from sampling
the TAO figure using a grid. The grid used is 2D ranging from [-6,6] centimeters with 4 instances per centimeter. The class of each instance is the color (black
or white) of the sample in the TAO figure. This grid
has nothing in common with the grid used by GALE;
it is just used for sampling the TAO figure.
Public datasets are obtained from UCI repository
(Merz & Murphy, 1996). We chose six datasets
from this repository:
breast-w, glass, iris,
pima-indians, sonar, and vehicle. These datasets
contain categorical and numeric attributes, as well as
binary and n-ary classification tasks. Finally, we also
used two private datasets from our own repository.
They deal with diagnosis of breast cancer, biopsies
(Martínez & Santamaría, 1996), and mammograms
(Martí et al., 1998). Biopsies is the result of digitally processing biopsy images, whereas mammograms
uses mammographic images.
4.2 Classifier Schemes
We also ran several non-evolutionary classifier schemes
on the previous datasets. We compare the results obtained by GALE to the ones obtained by
IB1, IB2, IB3 and IB4 (Aha et al., 1991), using the code provided on David Aha's home page
[http://www.aic.nrl.navy.mil/~aha]. We chose IB1 as
a performance baseline, while IB2, IB3 and IB4 let
us study storage requirements. We also ran the DROP
algorithms, provided on-line by Wilson and Martinez
[ftp://axon.cs.byu.edu/pub/randy/ml/drop]. Space
does not permit the inclusion of all these results, but
they are briefly discussed in the following subsection.
4.3 Results
GALE was run using the same parameters for all
datasets. The grid was 64×64 with 80% of cells occupied after initialization. Merge and split probabilities were set to pm = 0.4 and ps = 0.01, and
ksr = −0.25. The maximum number of iterations was
150. These values have shown a good balance between
exploration and exploitation along the evolutionary
process (Llorà & Garrell, 2001). The non-evolutionary
classifier schemes used their default configurations, with k = 1 for a fair comparison with GALE's distance
function.
Table 1 shows the percentage of correct classifications,
averaged over stratified ten-fold cross-validation runs,
with their corresponding standard deviations. The
same folds were used for each scheme. This table also
marks the results of non-evolutionary schemes with a ◦
if they show a significant improvement over the corresponding results for GALE, and with a • if they show
a significant degradation. Throughout, we speak of results being “significantly different” if the difference is
statistically significant at the 1% level according to a
paired two-sided t-test, each pair of data points consisting of the estimates obtained in a stratified ten-fold
cross-validation run for the two learning schemes being compared.
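For reference, the significance test used throughout can be reproduced with a paired t-test over the per-fold accuracies of two schemes. The sketch below uses scipy.stats.ttest_rel, and the accuracy values are made up purely to illustrate the procedure (they are not taken from Table 1).

from scipy import stats

# Per-fold accuracies (%) of two schemes on the same stratified ten-fold split.
gale = [75.0, 76.3, 74.1, 77.2, 75.8, 74.9, 76.0, 75.2, 74.4, 76.6]
ib1  = [70.1, 71.4, 69.8, 70.9, 70.5, 69.2, 71.0, 70.3, 69.9, 70.7]

t_stat, p_value = stats.ttest_rel(gale, ib1)   # two-sided paired t-test
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.01:
    print("difference significant at the 1% level")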
Table 1. Experimental results: percentage of correct classifications and standard deviation from stratified ten-fold cross-validation runs. Results are also marked with a ◦ if they show a significant improvement (1% significance level on a paired two-sided t-test) over the corresponding results for GALE, and with a • if they show a significant degradation.

Dataset        IB1            IB2            IB3            IB4            GALE
Biopsies       82.78±3.49     75.74±2.80•    78.48±6.16     76.43±5.97•    83.64±4.71
Breast-w       95.99±1.45     91.85±3.93•    94.44±2.58     94.85±3.18     94.99±2.50
Glass          66.38±10.88    62.62±10.65    65.41±10.48    66.35±9.12     64.95±9.38
Iris           95.30±3.29     93.98±3.78     91.33±6.31     96.66±4.71     95.33±3.22
Mammograms     64.36±14.06    66.20±11.22    60.19±11.80    60.17±9.05     66.66±11.55
MX11           99.85±0.24     87.06±2.62•    81.59±3.47•    81.34±3.76•    100.00±0.00
Pima-indians   70.42±3.75•    64.30±4.28•    66.91±7.10•    70.82±5.14     75.39±4.21
Sonar          83.66±9.60     80.77±12.85    61.53±10.94•   63.47±11.28•   81.73±9.94
TAO            95.99±1.45     91.86±3.93     94.99±2.85     94.85±3.17     95.50±1.03
Vehicle        69.63±4.91     65.49±3.01     63.25±5.25     63.71±3.87     68.79±3.78
Average        82.44          77.99          75.81          76.86          82.67
Table 2. Experimental results: mean storage size (in %) and standard deviation for cross-validation runs.

Dataset        IB1          IB2           IB3           IB4           GALE
Biopsies       100.0±0.0    26.61±0.68    13.62±0.97    12.82±0.73    2.67±0.92
Breast-w       100.0±0.0    8.19±0.29     2.68±0.75     2.65±0.62     3.30±0.83
Glass          100.0±0.0    42.99±1.77    44.34±1.43    39.40±2.31    7.18±1.82
Iris           100.0±0.0    9.85±0.60     11.26±1.25    12.00±1.15    3.84±1.21
Mammograms     100.0±0.0    42.28±2.47    14.30±4.66    21.55±2.60    7.25±2.17
MX11           100.0±0.0    18.99±0.65    15.76±1.06    15.84±0.72    0.80±0.17
Pima-indians   100.0±0.0    36.02±0.97    15.62±0.92    15.02±1.12    2.77±1.10
Sonar          100.0±0.0    27.30±1.37    22.70±2.02    22.92±1.68    10.37±3.16
TAO            100.0±0.0    3.03±0.11     0.99±0.27     0.98±0.23     2.09±0.07
Vehicle        100.0±0.0    39.93±1.29    33.36±2.03    31.66±0.97    2.85±1.02
Average        100.0        25.52         17.46         17.48         4.31
Table 2 lists the mean percentage of stored instances, as well as their standard deviations, for the compared algorithms. Finally, table 3 summarizes the performance of the different methods compared with each other. Numbers indicate how often the method in the row significantly outperforms the method in the column.
GALE maintains, and slightly improves, the classification accuracy of NNA (IB1) while reducing its storage requirements. IB2, IB3 and IB4
also reduce the size of S, but they show a mean drop
of 5.5% in generalization accuracy. The mean storage required by the non-evolutionary techniques (size of
S) is, in the best case, about 17% of the instances of
T. Using partially-defined instances, induced through
artificial evolution, GALE only needs about 4% of the number of instances in T. Finally, table 3 shows that
GALE is not significantly outperformed by any of the non-evolutionary reduction techniques, while GALE even
clearly beats IB1 on the pima-indians dataset. These results
confirm the robustness of EAs across different
domains. The DROP results show that their generalization accuracy
(best: DROP4, 79.98%) does not reach GALE's. The storage
requirements of DROP are also worse, using up to
21.55% of the T set, five times more than GALE.
The partially-defined instances obtained along the
runs do not use all the available attributes. The
mean percentage of used attributes in the evolved
partially-defined instances is about 57.97%. Some relevant results are obtained on the biopsies and vehicle
datasets. On the biopsies dataset, GALE only uses
10% and 17.13% of the available attributes for the benign and malign classes respectively. On the other
hand, the vehicle dataset needs up to 82.33% of the
attributes (Opel=81.41%, Saab=82.64%, bus=79.16%,
and van=86.1%) to achieve a competitive accuracy.
These results suggest that some post-processing technique applied to the evolved partially-defined instances might
be helpful in identifying irrelevant attributes.
Table 3. Results of paired one-sided t-tests: each number indicates how often the method in the row significantly outperforms the method in the column. The table lists the results for p=0.05 and p=0.01.

one-tail paired t-test, p=0.05
         IB1   IB2   IB3   IB4   GALE
IB1       -     7     5     5     0
IB2       0     -     2     3     0
IB3       0     1     -     1     0
IB4       0     2     2     -     0
GALE      2     5     5     7     -

one-tail paired t-test, p=0.01
         IB1   IB2   IB3   IB4   GALE
IB1       -     6     4     5     0
IB2       0     -     2     2     0
IB3       0     1     -     0     0
IB4       0     2     1     -     0
GALE      1     4     4     4     -
5. Conclusions and Further work
This paper presented a new way to reduce the storage requirements of IBL algorithms. Traditionally, instance reduction approaches tried to remove instances
from the training set T, obtaining a smaller subset
S ⊆ T, or to modify the instances themselves to reduce
storage requirements. In contrast, our approach
induces a small set of partially-defined instances, using
a parallel Evolutionary Algorithm.
Experiments were conducted on ten datasets. Results showed that GALE maintains, or even slightly
improves, the generalization accuracy of the NNA
(0.23%), outperforming the other reduction techniques. But the key point is that GALE needed, on
average, 4.32% of the storage size (in number of
instances) of the training set T, instead of the 17%
(best size) retained by IB3. The results obtained with
GALE are encouraging, but further work is needed.
We are interested in further testing the robustness of GALE
across different domains, using other representative
datasets. We are also working on reducing the amount
of time required by the evolutionary process. The serial implementation of GALE requires between 10 and 50
times more execution time than IB and DROP to induce the S set. Thus, exploiting parallel implementations of GALE is an important part of further
work. Finally, we are planning to adapt GALE to
k-NNA, and to introduce improved distance functions
(Wilson & Martinez, 2000). These functions should
be adapted to deal with partially-defined instances.
Acknowledgments
We would like to thank CIRIT and Fondo de Investigación Sanitaria (Instituto Carlos III) for their
support under grant numbers 1999FI-00719 and FIS00/0033-2, as well as Enginyeria i Arquitectura La
Salle for its support of our research group. We
would also like to thank D. Aha, D. R. Wilson and T.
R. Martinez for providing their code on-line. Finally,
we want to thank the anonymous reviewers for their
useful comments that helped us to improve this paper.
References
Aha, D. W. (1992). Tolerating noisy irrelevant and
novel attributes in instance-based learning algorithms. International Journal of Man-Machine
Studies, 36, 267–287.
Aha, D. W. (1998). Feature Extraction, Construction
and Selection: A Data Mining Perspective. H. Liu
& H. Motoda (Eds.), Norwell MA: Kluwer.
Aha, D. W., Kibler, D., & Albert, M. K. (1991).
Instance-based Learning Algorithms.
Machine
Learning, 6, 37–66.
Ahluwaly, M., & Bull, L. (1999). Coevolving Functions
in Genetic Programming: Classification using Knearest-neighbour. Genetic and Evolutionary Computation Conference (pp. 947–952). Morgan Kaufmann.
Araujo, D. L., Lopes, H. S., & Freitas, A. A. (2000).
Rule Discovery with a Parallel Genetic Algorithm.
Workshop on Data Mining with Evolutionary Computation held in GECCO2000 (pp. 89–92).
Bot, M. C. (2000). Improving Induction of Linear Classification Trees with Genetic Programming.
Genetic and Evolutionary Computation Conference
(pp. 403–410). Morgan Kaufmann.
Brodley, C. E. (1993). Addressing the selective superiority problem: Automatic algorithm/model class
selection. Proceedings of the 10th International Conference on Machine Learning (pp. 17–24).
Cantú-Paz, E., & Kamath, C. (2000). Using Evolutionary Algorithms to Induce Oblique Decision
Trees. Genetic and Evolutionary Computation Conference (pp. 1053–1060). Morgan Kaufmann.
Cover, T. M., & Hart, P. E. (1967). Nearest Neighbor Pattern Classification. IEEE Transactions on
Information Theory, 13, 21–27.
De Jong, K. A., & Spears, W. M. (1991). Learning
Concept Classification Rules Using Genetic Algorithms. Proceedings of the 12th International Joint
Conference on Artificial Intelligence (pp. 651–656).
Morgan Kaufmann.
Martínez, E., & Santamaría, E. (1996). Morphological Analysis of Mammary Biopsy Images. 8th
Mediterranean Electrotechnical Conference on Industrial Applications in Power Systems, Computer
Science and Telecommunications. (pp. 1067–1070).
Domingos, P. (1995). Rule Induction and Instancebased Learning: A Unified Approach. Proceedings of
the 14th International Joint Conference on Artificial
Intelligence (pp. 1226–1232). Morgan Kaufmann.
Merz, C. J., & Murphy, P. M. (1996). UCI Repository of Machine Learning Databases
[http://www.ics.uci.edu/~mlearn/MLRepository.html].
Irvine, CA: University of California, Department
of Information and Computer Science.
Flockhart, I. W. (1995). GA-MINER: Parallel Data
Mining with Hierarchical Genetic Algorithms (Final Report) (Technical Report EPCC-AIKMS-GA-MINER-REPORT 1.0). University of Edinburgh.
Salzberg, S. (1991). A Nearest Hyperrectangle Learning Method. Machine Learning, 6, 277–309.
Fogel, D. B. (1992). Evolutionary Computation:
Toward a New Philosophy of Machine Intelligence.
IEEE Press.
Spears, W. M., & De Jong, K. A. (1993). Using Genetic Algorithms For Supervised Concept Learning.
Machine Learning, 13, 161–188.
Freitas, A. A. (1999). A genetic algorithm for generalized rule induction. Advances in Soft Computing
- Engineering Design and Manufacturing, 340–353.
Wettschereck, D. (1994). A hybrid Nearest-Neighbor
and Nearest-Hyperrectangle Algorithm. Proceedings
of the 7th European Conference on Machine Learning, LNAI (pp. 323–335).
Goldberg, D. E. (1989). Genetic Algorithms in Search,
Optimization and Machine Learning. AddisonWesley Publishing Company Inc.
Hart, P. E. (1968). The Condensed Nearest Neighbor
Rule. IEEE Transactions on Information Theory,
14, 515–516.
Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press/Bradford Books edition.
Holland, J. H. (1986). Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms
applied to Parallel Rule-Based Systems. Machine
Learning: An Artificial Intelligence Approach, II,
593–623.
Janikow, C. W. (1993). A knowledge-intensive genetic
algorithm for supervised learning. Machine Learning, 13, 189–228.
Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection
(Complex Adaptive Systems). MIT Press.
Llorà, X., & Garrell, J. M. (2001). Knowledge-Independent Data Mining with Fine-Grained Parallel Evolutionary Algorithms. Genetic and Evolutionary Computation Conference (to appear). Morgan Kaufmann.
Martí, J., Cufí, X., & Regincós, J. (1998). Shape-based feature selection for microcalcification evaluation. Proceedings of the SPIE Medical Imaging
Conference on Image Processing (pp. 1215–1224).
Wettschereck, D., & Aha, D. W. (1995). Weighting
features. First International Conference on CaseBased Reasoning (pp. 347–358). Springer Verlag,
Lisbon-Portugal.
Wettschereck, D., & Dietterich, T. G. (1995). An
Experimental Comparison of the Nearest-Neighbor
and Nearest-Hyperrectangle Algorithms. Machine
Learning, 38, 5–28.
Wilson, R. D., & Martinez, T. R. (1997). Instance
Pruning Techniques. Proceedings of the 14th International Conference on Machine Learning (pp. 403–
411). Morgan Kaufmann.
Wilson, R. D., & Martinez, T. R. (2000). Reduction
Techniques for Instance-based Learning Algorithms.
Machine Learning, 38, 257–286.
Wilson, S. W. (1995). Classifier Fitness Based on Accuracy. Evolutionary Computation, 3, 149–175.
Wilson, S. W. (1999). Get Real! XCS with continuous-valued inputs. Festschrift in Honor of John H. Holland (pp. 111–121). Center for the Study of Complex
Systems, University of Michigan, Ann Arbor, MI.
Wilson, S. W. (2000). Mining Oblique Data with XCS.
IlliGAL Report No. 2000028.
Witten, I. H., & Frank, E. (2000). Data Mining:
practical machine learning tools and techniques with
Java implementations. Morgan Kaufmann.