Inducing Partially-Defined Instances with Evolutionary Algorithms

Xavier Llorà [email protected]
Josep M. Garrell [email protected]
Research Group in Intelligent Systems, Enginyeria i Arquitectura La Salle, Ramon Llull University, Psg. Bonanova 8, 08022, Barcelona, Catalonia, Spain.

Abstract

This paper addresses the issue of reducing the storage requirements of Instance-Based Learning algorithms. Algorithms proposed by other researchers use heuristics to prune instances of the training set, or modify the instances themselves, to achieve a reduced set of instances. Our work presents an alternative: we propose to induce a reduced set of partially-defined instances with Evolutionary Algorithms. Experiments were performed with GALE, our fine-grained parallel Evolutionary Algorithm, and with other well-known reduction techniques on several datasets. Results suggest that Evolutionary Algorithms are competitive and robust for inducing sets of partially-defined instances, achieving better reduction rates in storage requirements without losses in generalization accuracy.

1. Introduction

Instance-based learning (IBL) (Aha et al., 1991) has been successfully used on many pattern classification problems. Its classification capabilities rely on the Nearest Neighbor Algorithm (NNA) (Cover & Hart, 1967). NNA uses a stored set T of training instances to classify new incoming patterns. Each stored instance has an input vector, with one value for each of several input attributes, and an output class. A new pattern x is classified by finding the instance y in T that is closest to x, and using the output class of y.

IBL has some strengths and weaknesses. It can easily define non-axis-parallel frontiers in the space of instances, and it is well suited for handling exceptions. However, IBL can be very vulnerable to noise and irrelevant attributes (Domingos, 1995), and it stores a large number of instances in the T set. Some algorithms have been proposed to address these weaknesses. For instance, k-NNA reduces the effect of noise by classifying the incoming pattern using the majority class of its k closest instances. It can be combined with weighting algorithms that look for representative attributes (Wettschereck & Aha, 1995; Aha, 1998). The size of T can also be reduced using algorithms for instance pruning, like IB3 (Aha et al., 1991) or DROP (Wilson & Martinez, 2000). Nevertheless, empirical studies show that all these algorithms work well in some, but not all, domains; this has been termed the selective superiority problem (Brodley, 1993).

We present an Evolutionary Algorithm (EA) (Holland, 1975; Goldberg, 1989; Koza, 1992; Fogel, 1992) that addresses some of IBL's weaknesses in a new way. Instead of pruning instances from the T set, or combining instances somehow to reduce the storage requirements, we propose to induce partially-defined instances that form a small T set. EAs have proved to be robust algorithms for search, optimization and machine learning. Therefore, we use the robustness of EAs to induce a small T set that stores few instances without a significant degradation in generalization accuracy. This evolutionary-driven algorithm, named GALE, induces a T set of partially-defined instances, discarding irrelevant attributes of the contained instances. Therefore, GALE neither prunes nor combines instances; it induces them using artificial evolution.
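For reference, the NNA classification rule described at the beginning of this section can be rendered as a short sketch, assuming numeric attributes and a plain Euclidean distance; the function and variable names are illustrative, not part of the original algorithms:

import math

def nna_classify(x, T):
    """1-NN rule: return the output class of the stored instance closest to x.
    T is a list of (input_vector, output_class) pairs; x is a list of numeric values."""
    def euclidean(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, output_class = min(T, key=lambda instance: euclidean(x, instance[0]))
    return output_class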
This algorithm is tested on ten different domains, showing that it can reduce the size of the stored set of instances while maintaining or even improving the generalization accuracy. The rest of the paper is structured as follows. Section 2 presents a brief overview of methods for instance set reduction. Section 3 briefly reviews related work done in the EA community and presents GALE, an evolutionary-driven algorithm for inducing small instance sets based on fine-grained parallel EAs. Section 4 describes experiments on ten datasets, comparing the performance of GALE with well-known reduction techniques like IB (Aha et al., 1991) and DROP (Wilson & Martinez, 2000), using IB1 (NNA) as a performance baseline. Finally, section 5 summarizes some conclusions and further work.

2. Related Work

The reduction of the size of the training set T used by IBL algorithms has mainly been addressed in two different ways. The first one looks for heuristics that guide a pruning process on the original set of instances. Some instances of T are removed, obtaining a new training set S, S ⊆ T, that aims to maintain the generalization performance while reducing the storage requirements. Other approaches reduce the storage requirements and speed up classification by modifying the instances themselves, instead of just deciding which ones to keep. Some representative algorithms of the pruning approach are IB (Aha et al., 1991; Aha, 1992) and DROP (Wilson & Martinez, 1997; Wilson & Martinez, 2000). On the other hand, RISE 2.0 (Domingos, 1995) and EACH, based on the nested generalized exemplar theory (Salzberg, 1991; Wettschereck, 1994; Wettschereck & Dietterich, 1995), show how the instances in T can be modified.

IB1 is a simple implementation of NNA, while IB2 is similar to the Condensed Nearest Neighbor (CNN) rule (Hart, 1968). CNN starts by selecting, at random, one instance for each class in the training set T and putting them in S. Then, each instance in T is classified using only the instances in S. If an instance is misclassified, it is added to S to avoid further misclassifications. IB2 usually retains noisy examples because they are not classified correctly. IB3 uses statistical tests to retain only acceptable misclassified instances. In order to handle irrelevant attributes, IB4 and IB5 extend IB3 by computing a set of attribute weights for each class. Another example of instance pruning is DROP, which defines its pruning heuristics in terms of associates. Given an instance y, its associates are those instances that have y as one of their k nearest neighbors. DROP1 removes an instance from S (S = T originally) if at least as many of its associates in S would be classified correctly without it. This heuristic has some problems with noisy instances, which DROP2 tries to solve by removing an instance from S if at least as many of its associates in T would be classified correctly without it. DROP3 and DROP4 refine the noisy-instance removal heuristic. Finally, DROP5 smooths the decision boundary proposed by DROP2.

RISE 2.0 does not attempt to prune instances of T. It treats each instance in T as a rule in R. For each rule r ∈ R, the nearest instance y in T of the same class as r, not covered by r, is found. The rule r is then minimally generalized to cover y, unless that harms accuracy. This process is repeated until no rules are generalized during an entire pass through all the rules in R.
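As a concrete illustration of the pruning approach, the single CNN condensation pass described above (one randomly chosen seed instance per class, followed by a pass that keeps every instance misclassified by the current S) can be sketched as follows. The classify helper and all names are illustrative assumptions, not code from the cited papers:

import random

def cnn_condense(T, classes, classify):
    """T is a list of (input, output_class) pairs; classes is the set of class labels;
    classify(x, S) returns the nearest-neighbor class of x using only the instances in S."""
    S = [random.choice([inst for inst in T if inst[1] == c]) for c in classes]  # one seed per class
    for x, c in T:
        if classify(x, S) != c:   # misclassified by the current S: keep this instance
            S.append((x, c))
    return S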
Another way to achieve a reduction in storage size is proposed by EACH, where hyper-rectangles are used to replace one or more instances. Using this idea, S becomes a set formed by instances and hyper-rectangles (generalized instances). EACH initializes S with several randomly selected instances. Then, for any given pattern x, EACH finds the distance to the nearest exemplar (i.e., an instance or a hyper-rectangle), which is 0 if the pattern lies inside a hyper-rectangle. When the pattern has the same class as its exemplar, the exemplar is generalized; otherwise it is specialized.

3. GALE

Genetic and Artificial Life Environment (GALE) explores an alternative path for reducing the storage requirements of IBL algorithms. It neither defines heuristics to prune instances of T nor modifies the instances themselves to achieve a reduced S set. Instead, GALE uses the T set to induce an S set formed by partially-defined instances. The induction algorithm explored by GALE is based on EAs, making it an evolutionary-driven induction approach. In this section, we briefly review some machine learning efforts of the EA community. Next, we present some definitions about the knowledge representation used (partially-defined instances). The section concludes by describing the EA proposed for inducing sets of partially-defined instances.

3.1 Evolutionary Algorithms and Machine Learning

Genetic Algorithms (GA) can induce rules using artificial evolution (Holland, 1975). The first approaches used GAs as learning algorithms in parallel rule-based systems (Holland, 1986), starting an EA discipline known as Classifier Systems (CS). Currently, XCS (Wilson, 1995; Wilson, 1999; Wilson, 2000) is the state of the art in CS. Its main contribution, among others, is measuring the performance (accuracy) of the evolved rules along the evolutionary-driven induction. There are other ways to induce rules using GAs. GABL, and its incremental version GABIL (De Jong & Spears, 1991; Spears & De Jong, 1993), sparked an alternative approach known as Learning Systems (LS). LS usually induce rule sets using GAs with genetic operators inspired by machine learning. Some examples of these algorithms are GIL (Janikow, 1993) and GA-MINER (Flockhart, 1995), but these algorithms tend to be time-consuming, and some authors have proposed exploiting the parallel behavior of evolution to reduce the time required for training (Flockhart, 1995; Freitas, 1999; Araujo et al., 2000). EAs can also induce decision trees, using both GAs and Genetic Programming (GP). GAs have been used to minimize the impurity of the splits defined by hyperplanes in oblique decision trees (Cantú-Paz & Kamath, 2000). On the other hand, GP can easily evolve decision trees (Koza, 1992; Bot, 2000), or function trees for improving k-NNA classifiers (Ahluwalia & Bull, 1999).

3.2 Partially-Defined Instances

GALE is an EA that induces a set of partially-defined instances. Throughout, when we speak about "inducing instances" we mean that GALE produces a set of partially-defined instances, S, using an evolutionary process. The computed instances of S are neither a subset of T nor a combination of instances in T (i.e., hyper-rectangles). In order to avoid repeating lengthy definitions, some notation is introduced in the rest of this subsection.

Definition 1 (Instance) An instance x is an input vector, with one value for each of several input attributes (x_{a_i}), and an output class (x_χ).

Definition 2 (Partially-defined instance) A partially-defined instance p is an input vector, with at least one known value among the input attributes (p_{a_i}), and an output class (p_χ). For instance, given a dataset with two attributes (a1 and a2), the following instances are partially-defined:

p1: ((a1 1.0) (a2 3.3) (χ α))
p2: ((a1 1.0) (χ β))
p3: ((a2 3.3) (χ α))

Definition 3 (Set of used attributes) Given a partially-defined instance p, the set of used attributes π(p) is formed by the input attributes that have a known value. |π(p)| is the number of used attributes. The sets of used attributes for the previous partially-defined instances are: π(p1) = {a1, a2}, π(p2) = {a1}, and π(p3) = {a2}. Therefore, the number of used attributes for each example is: |π(p1)| = 2, |π(p2)| = 1, and |π(p3)| = 1.

Definition 4 (Set of partially-defined instances) Γ is a set of partially-defined instances if it contains at least one partially-defined instance (Γ ≠ ∅).

Some changes are introduced into the nearest neighbor algorithm so that it works with partially-defined instances. These modifications rely on the distance function used.

Definition 5 (Partially-defined distance) Given an input vector v and a partially-defined instance p, the partially-defined distance function γ is:

\gamma(v, p) = \sqrt{ \frac{1}{|\pi(p)|} \sum_{a \in \pi(p)} \left( \frac{\mathrm{dist}(v_a, p_a)}{\max(a)} \right)^2 }

where dist computes the distance between two values of a given attribute. If the attribute a is numeric, then dist is computed as dist(v_a, p_a) = |v_a − p_a|. On the other hand, if the attribute a is categorical, dist(v_a, p_a) is equal to 1 if v_a ≠ p_a, and 0 otherwise. max(a) is the maximal distance for the given attribute if it is numeric, or 1 if it is categorical.

NNA uses γ to compute distances. Throughout, we use 1-NNA, leaving the performance of GALE using k-NNA for further work.
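A minimal sketch of γ and of the 1-NNA classification over partially-defined instances follows. It assumes instances are stored as dictionaries holding only their used attributes; the representation and all names are illustrative, not GALE's actual implementation:

import math

def gamma(v, p_attrs, max_range):
    """Partially-defined distance of Definition 5. v maps attribute names to the values
    of the input vector; p_attrs maps only the used attributes pi(p) to their values;
    max_range maps each numeric attribute to its maximal distance (1 for categorical)."""
    total = 0.0
    for a, pa in p_attrs.items():
        va = v[a]
        if isinstance(pa, (int, float)):
            d = abs(va - pa) / max_range[a]   # numeric attribute: range-normalized difference
        else:
            d = 0.0 if va == pa else 1.0      # categorical attribute: overlap distance
        total += d * d
    return math.sqrt(total / len(p_attrs))

def classify_1nna(v, S, max_range):
    """1-NNA over a set S of (p_attrs, output_class) partially-defined instances."""
    _, output_class = min(S, key=lambda p: gamma(v, p[0], max_range))
    return output_class

Note that the sum is averaged over π(p) only, so under this sketch a partially-defined instance with few used attributes (such as p2 above) can be closer to an input vector than a fully-defined one.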
3.3 Induction Algorithm

GALE uses a fine-grained parallel EA to obtain the set of partially-defined instances S. EAs usually evolve a population of individuals (a set of feasible solutions) that fit the environment where they are placed (i.e., solve a given problem efficiently). The survival of the fittest and the recombination of their genetic material model the artificial evolution of GALE. Detailed descriptions of EAs can be found in (Holland, 1975; Goldberg, 1989). Each individual in the population of GALE is a feasible set of partially-defined instances Γ. The population is placed on a 2D grid. Every cell of the grid contains up to one individual, which has a fitness measure of its degree of adaptation (accuracy solving the problem). The fitness measure chosen is (c/t)², where c is the number of instances in T that are classified correctly by the individual (Γ), and t is the number of instances in T. This fitness measure provides a non-linear bias toward correctly classifying instances in T while providing differential reward for imperfect Γ sets (De Jong & Spears, 1991). Each individual codifies Γ into its genotype (the genetic representation of the Γ set that undergoes evolution) by arranging the partially-defined instances into an array, as GABL (Spears & De Jong, 1993) does. GALE spreads the population over the 2D grid in order to exploit massive parallelism and locality relations.
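Under the same illustrative representation used in the distance sketch above, the fitness measure (c/t)² can be computed as follows (a sketch under those assumptions, not GALE's actual code):

def fitness(individual, T, max_range):
    """individual is the Gamma set: a list of (p_attrs, output_class) partially-defined
    instances; T is the list of (input_vector, output_class) training pairs."""
    t = len(T)
    c = sum(1 for v, cls in T if classify_1nna(v, individual, max_range) == cls)
    return (c / t) ** 2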
Every cell in the grid computes the same algorithm in parallel, which can be summarized as follows:

FOR-EACH cell C in Grid DO
    initialize the cell C
    evaluate the accuracy of the individual in C
    REPEAT
        merge among neighborhood(C)
        split the individual in C
        evaluate the accuracy of the individual in C
        survival among neighborhood(C)
    UNTIL <end-criterion>
DONE

The initialization phase decides whether the cell contains an individual or remains empty. Empirical experiments show that 70-80% of occupied cells is a good initialization rate. The individuals of the population, which have at least one partially-defined instance per class, are built at random. Thus, the set of partially-defined instances Γ contained in an individual has nothing in common with the instances in T. Next, if the cell contains an individual, C computes its fitness using the fitness function explained before. At this point, the cell is ready to enter the evolutionary process. Merge, split, and survival are the phases of the evolutionary cycle that improve the accuracy of the individuals. This cycle stops when an individual classifies correctly all the training instances in T, or when a certain number of evolutionary iterations have been done. Merge and split modify the sets of partially-defined instances (individuals), whereas survival implements the survival of the fittest, removing poorly adapted individuals from the grid and pruning unused partially-defined instances.

Merge recombines the individual in C with an individual chosen at random among the neighborhood of the cell, with a given probability pm. Thus, merge is a two-step process: (1) choose a mate among the neighborhood, and (2) recombine the genetic material, obtaining just one offspring. The neighborhood of cell C is the set of its adjacent cells; therefore, neighborhood(C) consists of the eight cells that surround C. The neighborhood topology is beyond the scope of this paper. Next, merge recombines the genetic material using two-point crossover (De Jong & Spears, 1991). Crossover can occur anywhere (i.e., both on instance boundaries and within partially-defined instances). The only requirement is that the corresponding crossover points on the two parents are valid. That is, if one parent is being cut on an instance boundary, then the other parent must also be cut on an instance boundary. Similarly, if one parent is being cut within a partially-defined instance, then the other parent must be cut in a similar spot. For instance, given two individuals and their valid cut points

ind1: [((a1 3.9) (a2 9.3) | (χ α)) | ((a1 6.9) (χ λ))]
ind2: [((a1 1.0) | (χ β)) ((a1 1.0) (a2 3.3) (χ δ)) | ]

crossover produces a new individual by recombining at random the pieces of genetic material provided by its parents:

indc: [((a1 3.9) (a2 9.3) | (χ β)) ((a1 1.0) (a2 3.3) (χ δ)) | ((a1 6.9) (χ λ))]

Finally, the offspring (indc) obtained after crossover replaces the individual in C.

Split reproduces and mutates the individual in C if the probability defined as ps · fitness(indc) is satisfied. The split rate is thus proportional to the performance of the individual, so well-adapted individuals expand their genetic material rapidly. The split individual is placed in the empty neighboring cell with the highest number of occupied neighbors or, if all the cells in the neighborhood are occupied, in the cell that contains the worst individual. This technique biases the evolution towards the emergence of useful subpopulations or demes, balancing the survival pressure introduced by the survival phase.
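The placement rule used by split can be sketched as follows, assuming the grid is stored as a list of lists where each cell holds either an individual with a fitness attribute or None, and mutate returns a mutated copy of an individual; the helper names and the grid representation are illustrative assumptions, not GALE's actual code:

import random

def neighborhood(grid, r, c):
    """The (up to) eight cells adjacent to (r, c)."""
    cells = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if (dr, dc) != (0, 0) and 0 <= r + dr < len(grid) and 0 <= c + dc < len(grid[0]):
                cells.append((r + dr, c + dc))
    return cells

def split(grid, r, c, ps, mutate):
    ind = grid[r][c]
    if ind is None or random.random() >= ps * ind.fitness:
        return                                   # split rate proportional to performance
    child = mutate(ind)                          # reproduce and mutate the individual in C
    neigh = neighborhood(grid, r, c)
    empty = [cell for cell in neigh if grid[cell[0]][cell[1]] is None]
    if empty:
        # place the offspring in the empty neighboring cell with most occupied neighbors
        target = max(empty, key=lambda cell: sum(
            grid[i][j] is not None for i, j in neighborhood(grid, *cell)))
    else:
        # all neighboring cells occupied: replace the worst neighboring individual
        target = min(neigh, key=lambda cell: grid[cell[0]][cell[1]].fitness)
    grid[target[0]][target[1]] = child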
Mutation is done by randomly changing some values of the genotype (De Jong & Spears, 1991). New attributes can be added to, or dropped from, partially-defined instances, and some attribute values can be changed (including the class tag χ). For instance, the individual obtained after the merge phase can be mutated as follows:

ind′c: [((a1 3.9) (χ β)) ((a1 8.1) (a2 3.3) (χ δ)) ((a1 6.9) (a2 7.2) (χ α))]

Survival decides whether an individual is kept in the cell for the next cycle. It also removes from individuals those partially-defined instances that are never used when classifying the instances in T. Survival uses the number of neighbors to decide whether the individual is removed from the cell.

0-1 Neighbors: these configurations show a quasi-isolated subpopulation without genetic diversity; thus, there are few chances to improve the genetic material by exchanging it with the neighborhood. The survival of the individual is proportional to its performance, psr(indc) = fitness(indc), leading to extreme survival pressure. If the individual is removed, the cell becomes empty.

2-6 Neighbors: the subpopulations change rapidly, exchanging a great deal of genetic material. The individual is removed, leaving the cell empty, if fitness(indc) < µnei + ksr × σnei, where µnei is the average fitness value of the occupied neighbor cells and σnei their standard deviation. The parameter ksr controls the survival pressure on the current cell, as well as the convergence of GALE.

7-8 Neighbors: this is a crowded situation, without space available in the grid, so extreme survival pressure is applied to the individual in C. The individual of the cell is always replaced by its best neighbor, thus psr(indc) = 0. Replacement introduces a distributed approach to elitism, a selection technique commonly used in many EAs (Goldberg, 1989).

GALE is a fine-grained parallel EA that scales linearly with the number of dimensions and instances in T. Detailed discussions can be found in (Llorà & Garrell, 2001). Theoretical analysis of the speedup equations shows that further work should study parallel implementations of this EA in depth.
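The survival decision for a cell, covering the three neighborhood cases above, can be sketched as follows (reusing the grid representation and the neighborhood helper from the split sketch; a sketch under those assumptions, not GALE's actual code):

import random

def survival(grid, r, c, k_sr):
    ind = grid[r][c]
    if ind is None:
        return
    neigh = [grid[i][j] for i, j in neighborhood(grid, r, c) if grid[i][j] is not None]
    n = len(neigh)
    if n <= 1:
        # 0-1 neighbors: survival probability equal to the individual's fitness
        if random.random() >= ind.fitness:
            grid[r][c] = None
    elif n <= 6:
        # 2-6 neighbors: removed if it falls below mu_nei + k_sr * sigma_nei
        mu = sum(x.fitness for x in neigh) / n
        sigma = (sum((x.fitness - mu) ** 2 for x in neigh) / n) ** 0.5
        if ind.fitness < mu + k_sr * sigma:
            grid[r][c] = None
    else:
        # 7-8 neighbors: always replaced by its best neighbor (distributed elitism)
        grid[r][c] = max(neigh, key=lambda x: x.fitness)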
4. Experimental Results

The test suite designed for evaluating the classification performance of GALE consists of several datasets and machine learning algorithms. This lets us study the performance of GALE using statistical tools, like stratified ten-fold cross-validation runs and paired t-tests on these runs (Witten & Frank, 2000). We also study the storage size required once the instance pruning techniques have been run. The time performance of GALE as a parallel processing algorithm is beyond the scope of this paper and is part of the further work.

4.1 Datasets

In order to evaluate the performance of GALE on different domains, we performed experiments on ten datasets. The datasets can be grouped into three different kinds: artificial, public, and private. We used two artificial datasets to tune GALE, because we knew their solutions in advance. MX11 is the eleven-input multiplexer, widely used by the CS community (Wilson, 1995). TAO is a dataset obtained by sampling the TAO figure using a grid. The grid used is 2D, ranging over [-6,6] centimeters with 4 instances per centimeter. The class of each instance is the color (black or white) of the sample in the TAO figure. This grid has nothing in common with the grid used by GALE; it is just used for sampling the TAO figure. The public datasets are obtained from the UCI repository (Merz & Murphy, 1996). We chose six datasets from this repository: breast-w, glass, iris, pima-indians, sonar, and vehicle. These datasets contain categorical and numeric attributes, as well as binary and n-ary classification tasks. Finally, we also used two private datasets from our own repository. They deal with the diagnosis of breast cancer: biopsies (Martínez & Santamaría, 1996) and mammograms (Martí et al., 1998). Biopsies is the result of digitally processing biopsy images, whereas mammograms uses mammographic images.

4.2 Classifier Schemes

We also ran several non-evolutionary classifier schemes on the previous datasets. We compare the results obtained by GALE to the ones obtained by IB1, IB2, IB3 and IB4 (Aha et al., 1991), using the code provided on David Aha's home page [http://www.aic.nrl.navy.mil/∼aha]. We chose IB1 as a performance baseline, while IB2, IB3 and IB4 let us study storage requirements. We also ran the DROP algorithms, provided on-line by Wilson and Martinez [ftp://axon.cs.byu.edu/pub/randy/ml/drop]. Space does not permit the inclusion of all these results, but they are briefly discussed in the following subsection.

4.3 Results

GALE was run using the same parameters for all datasets. The grid was 64×64 with 80% of occupied cells after initialization. The merge and split probabilities were set to pm = 0.4 and ps = 0.01, and ksr = −0.25. The maximum number of iterations was 150. These values have shown a good balance between exploration and exploitation along the evolutionary process (Llorà & Garrell, 2001). The non-evolutionary classifier schemes used their default configurations, using k = 1 for a fair comparison with GALE's distance function.

Table 1 shows the percentage of correct classifications, averaged over stratified ten-fold cross-validation runs, with the corresponding standard deviations. The same folds were used for each scheme. The table also marks the results of the non-evolutionary schemes with a ◦ if they show a significant improvement over the corresponding results for GALE, and with a • if they show a significant degradation. Throughout, we speak of results being "significantly different" if the difference is statistically significant at the 1% level according to a paired two-sided t-test, each pair of data points consisting of the estimates obtained in a stratified ten-fold cross-validation run for the two learning schemes being compared.

Table 1. Experimental results: percentage of correct classifications and standard deviation from stratified ten-fold cross-validation runs. Results are marked with a ◦ if they show a significant improvement (1% significance level on a paired two-sided t-test) over the corresponding results for GALE, and with a • if they show a significant degradation.

Dataset        IB1            IB2            IB3            IB4            GALE
Biopsies       82.78±3.49     75.74±2.80•    78.48±6.16     76.43±5.97•    83.64±4.71
Breast-w       95.99±1.45     91.85±3.93•    94.44±2.58     94.85±3.18     94.99±2.50
Glass          66.38±10.88    62.62±10.65    65.41±10.48    66.35±9.12     64.95±9.38
Iris           95.30±3.29     93.98±3.78     91.33±6.31     96.66±4.71     95.33±3.22
Mammograms     64.36±14.06    66.20±11.22    60.19±11.80    60.17±9.05     66.66±11.55
MX11           99.85±0.24     87.06±2.62•    81.59±3.47•    81.34±3.76•    100.00±0.0
Pima-indians   70.42±3.75•    64.30±4.28•    66.91±7.10•    70.82±5.14     75.39±4.21
Sonar          83.66±9.60     80.77±12.85    61.53±10.94•   63.47±11.28•   81.73±9.94
TAO            95.99±1.45     91.86±3.93     94.99±2.85     94.85±3.17     95.50±1.03
Vehicle        69.63±4.91     65.49±3.01     63.25±5.25     63.71±3.87     68.79±3.78
Average        82.44          77.99          75.81          76.86          82.67
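The significance test behind the markers in Table 1 amounts to a paired two-sided t-test at the 1% level over the per-fold accuracies of two schemes evaluated on the same ten folds. A sketch using scipy follows; the function name and data layout are illustrative, not the authors' code:

from scipy import stats

def significantly_different(folds_a, folds_b, alpha=0.01):
    """folds_a and folds_b are the ten per-fold accuracy estimates of two schemes
    obtained on the same stratified ten-fold cross-validation partition."""
    _, p_value = stats.ttest_rel(folds_a, folds_b)   # paired two-sided t-test
    return p_value < alpha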
Table 2 lists the mean percentage of stored instances, as well as the standard deviation, for the compared algorithms. Finally, Table 3 summarizes the performance of the different methods compared with each other; the numbers indicate how often the method in the row significantly outperforms the method in the column.

Table 2. Experimental results: mean storage size (in %) and standard deviation for the cross-validation runs.

Dataset        IB1          IB2           IB3           IB4           GALE
Biopsies       100.0±0.0    26.61±0.68    13.62±0.97    12.82±0.73    2.67±0.92
Breast-w       100.0±0.0    8.19±0.29     2.68±0.75     2.65±0.62     3.30±0.83
Glass          100.0±0.0    42.99±1.77    44.34±1.43    39.40±2.31    7.18±1.82
Iris           100.0±0.0    9.85±0.60     11.26±1.25    12.00±1.15    3.84±1.21
Mammograms     100.0±0.0    42.28±2.47    14.30±4.66    21.55±2.60    7.25±2.17
MX11           100.0±0.0    18.99±0.65    15.76±1.06    15.84±0.72    0.80±0.17
Pima-indians   100.0±0.0    36.02±0.97    15.62±0.92    15.02±1.12    2.77±1.10
Sonar          100.0±0.0    27.30±1.37    22.70±2.02    22.92±1.68    10.37±3.16
TAO            100.0±0.0    3.03±0.11     0.99±0.27     0.98±0.23     2.09±0.07
Vehicle        100.0±0.0    39.93±1.29    33.36±2.03    31.66±0.97    2.85±1.02
Average        100.0        25.52         17.46         17.48         4.31

GALE maintains, and slightly improves, the classification accuracy of the NNA (IB1) while reducing its storage requirements. IB2, IB3 and IB4 also reduce the size of S, but they show a mean drop of 5.5% in generalization accuracy. The mean storage required by the non-evolutionary techniques (the size of S) is, in the best case, about 17% of the instances in T. Using partially-defined instances induced through artificial evolution, GALE only needs about 4% of the number of instances in T. Finally, Table 3 shows that GALE is not significantly outperformed by any of the non-evolutionary reduction techniques, while GALE even clearly beats IB1 on the pima-indians dataset. These results confirm the robustness of EAs across different domains. The DROP results show that their generalization accuracy (79.98% for the best variant, DROP4) does not reach GALE's. The storage requirements of DROP are also worse, using up to 21.55% of the T set, five times larger than GALE. The partially-defined instances obtained along the runs do not use all the available attributes: the mean percentage of used attributes in the evolved partially-defined instances is about 57.97%. Some relevant results are obtained in the biopsies and vehicle datasets. In the biopsies dataset, GALE only uses 10% and 17.13% of the available attributes for the benign and malign classes, respectively. On the other hand, the vehicle dataset needs up to 82.33% of the attributes (Opel=81.41%, Saab=82.64%, bus=79.16%, and van=86.1%) to achieve a competitive accuracy. These results suggest that some post-processing technique applied to the evolved partially-defined instances might be helpful for identifying irrelevant attributes.

Table 3. Results of paired one-sided t-tests: each number indicates how often the method in the row significantly outperforms the method in the column. The table lists the results using p=0.05 and p=0.01.

one-tail paired t-test, p=0.05
        IB1   IB2   IB3   IB4   GALE
IB1     -     7     5     5     0
IB2     0     -     2     3     0
IB3     0     1     -     1     0
IB4     0     2     2     -     0
GALE    2     5     5     7     -

one-tail paired t-test, p=0.01
        IB1   IB2   IB3   IB4   GALE
IB1     -     6     4     5     0
IB2     0     -     2     2     0
IB3     0     1     -     0     0
IB4     0     2     1     -     0
GALE    1     4     4     4     -

5. Conclusions and Further Work

This paper presented a new way to reduce the storage requirements of IBL algorithms. Traditionally, instance reduction approaches try to remove instances from the training set T, obtaining a smaller subset S ⊆ T, or modify the instances themselves to reduce the storage requirements. In contrast, our approach induces a small set of partially-defined instances using a parallel Evolutionary Algorithm. Experiments were conducted on ten datasets.
Results showed that GALE maintains, or even slightly improves (by 0.23%), the generalization accuracy of the NNA, outperforming the other reduction techniques. The key point, however, is that GALE needed, on average, 4.32% of the storage size (in number of instances) of the training set T, instead of the 17% (best size) retained by IB3. The results obtained with GALE are encouraging, but further work is needed. We are interested in proving the robustness of GALE across different domains, using other representative datasets. We are also working on reducing the amount of time required by the evolutionary process. The serial implementation of GALE requires between 10 and 50 times more execution time than IB and DROP to induce the S set. Thus, exploiting parallel implementations of GALE is an important part of the further work. Finally, we are planning to adapt GALE to k-NNA, and to introduce improved distance functions (Wilson & Martinez, 2000). These functions should be adapted to deal with partially-defined instances.

Acknowledgments

We would like to thank CIRIT and Fondo de Investigación Sanitaria (Instituto Carlos III) for their support under grant numbers 1999FI-00719 and FIS00/0033-2, as well as Enginyeria i Arquitectura La Salle for their support of our research group. We would also like to thank D. Aha, D. R. Wilson and T. R. Martinez for providing their code on-line. Finally, we want to thank the anonymous reviewers for their useful comments, which helped us to improve this paper.

References

Aha, D. W. (1992). Tolerating noisy, irrelevant and novel attributes in instance-based learning algorithms. International Journal of Man-Machine Studies, 36, 267–287.

Aha, D. W. (1998). Feature Extraction, Construction and Selection: A Data Mining Perspective. H. Liu & H. Motoda (Eds.), Norwell, MA: Kluwer.

Aha, D. W., Kibler, D., & Albert, M. K. (1991). Instance-based Learning Algorithms. Machine Learning, 6, 37–66.

Ahluwalia, M., & Bull, L. (1999). Coevolving Functions in Genetic Programming: Classification using K-nearest-neighbour. Genetic and Evolutionary Computation Conference (pp. 947–952). Morgan Kaufmann.

Araujo, D. L., Lopes, H. S., & Freitas, A. A. (2000). Rule Discovery with a Parallel Genetic Algorithm. Workshop on Data Mining with Evolutionary Computation held in GECCO2000 (pp. 89–92).

Bot, M. C. (2000). Improving Induction of Linear Classification Trees with Genetic Programming. Genetic and Evolutionary Computation Conference (pp. 403–410). Morgan Kaufmann.

Brodley, C. E. (1993). Addressing the selective superiority problem: Automatic algorithm/model class selection. Proceedings of the 10th International Conference on Machine Learning (pp. 17–24).

Cantú-Paz, E., & Kamath, C. (2000). Using Evolutionary Algorithms to Induce Oblique Decision Trees. Genetic and Evolutionary Computation Conference (pp. 1053–1060). Morgan Kaufmann.

Cover, T. M., & Hart, P. E. (1967). Nearest Neighbor Pattern Classification. IEEE Transactions on Information Theory, 13, 21–27.

De Jong, K. A., & Spears, W. M. (1991). Learning Concept Classification Rules Using Genetic Algorithms. Proceedings of the 12th International Joint Conference on Artificial Intelligence (pp. 651–656). Morgan Kaufmann.
Domingos, P. (1995). Rule Induction and Instance-based Learning: A Unified Approach. Proceedings of the 14th International Joint Conference on Artificial Intelligence (pp. 1226–1232). Morgan Kaufmann.

Flockhart, I. W. (1995). GA-MINER: Parallel Data Mining with Hierarchical Genetic Algorithms (Final Report). Technical Report EPCC-AIKMS-GA-MINER-REPORT 1.0. University of Edinburgh.

Fogel, D. B. (1992). Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. IEEE Press.

Freitas, A. A. (1999). A genetic algorithm for generalized rule induction. Advances in Soft Computing - Engineering Design and Manufacturing, 340–353.

Goldberg, D. E. (1989). Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc.

Hart, P. E. (1968). The Condensed Nearest Neighbor Rule. IEEE Transactions on Information Theory, 14, 515–516.

Holland, J. H. (1975). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control and Artificial Intelligence. MIT Press/Bradford Books edition.

Holland, J. H. (1986). Escaping Brittleness: The Possibilities of General Purpose Learning Algorithms Applied to Parallel Rule-Based Systems. Machine Learning: An Artificial Intelligence Approach, II, 593–623.

Janikow, C. W. (1993). A knowledge-intensive genetic algorithm for supervised learning. Machine Learning, 13, 189–228.

Koza, J. R. (1992). Genetic Programming: On the Programming of Computers by Means of Natural Selection (Complex Adaptive Systems). MIT Press.

Llorà, X., & Garrell, J. M. (2001). Knowledge-Independent Data Mining with Fine-Grained Parallel Evolutionary Algorithms. Genetic and Evolutionary Computation Conference (to appear). Morgan Kaufmann.

Martí, J., Cufí, X., & Regincós, J. (1998). Shape-based feature selection for microcalcification evaluation. Proceedings of the SPIE Medical Imaging Conference on Image Processing (pp. 1215–1224).

Martínez, E., & Santamaría, E. (1996). Morphological Analysis of Mammary Biopsy Images. 8th Mediterranean Electrotechnical Conference on Industrial Applications in Power Systems, Computer Science and Telecommunications (pp. 1067–1070).

Merz, C. J., & Murphy, P. M. (1996). UCI Repository of Machine Learning Databases [http://www.ics.uci.edu/∼mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science.

Salzberg, S. (1991). A Nearest-Hyperrectangle Learning Method. Machine Learning, 6, 277–309.

Spears, W. M., & De Jong, K. A. (1993). Using Genetic Algorithms for Supervised Concept Learning. Machine Learning, 13, 161–188.

Wettschereck, D. (1994). A Hybrid Nearest-Neighbor and Nearest-Hyperrectangle Algorithm. Proceedings of the 7th European Conference on Machine Learning, LNAI (pp. 323–335).

Wettschereck, D., & Aha, D. W. (1995). Weighting Features. First International Conference on Case-Based Reasoning (pp. 347–358). Springer-Verlag, Lisbon, Portugal.

Wettschereck, D., & Dietterich, T. G. (1995). An Experimental Comparison of the Nearest-Neighbor and Nearest-Hyperrectangle Algorithms. Machine Learning, 19, 5–27.

Wilson, D. R., & Martinez, T. R. (1997). Instance Pruning Techniques. Proceedings of the 14th International Conference on Machine Learning (pp. 403–411). Morgan Kaufmann.

Wilson, D. R., & Martinez, T. R. (2000). Reduction Techniques for Instance-Based Learning Algorithms. Machine Learning, 38, 257–286.

Wilson, S. W. (1995). Classifier Fitness Based on Accuracy. Evolutionary Computation, 3, 149–175.

Wilson, S. W. (1999). Get Real! XCS with Continuous-Valued Inputs. Festschrift in Honor of John H. Holland (pp. 111–121). Center for the Study of Complex Systems, University of Michigan, Ann Arbor, MI.

Wilson, S. W. (2000). Mining Oblique Data with XCS. IlliGAL Report No. 2000028.
Witten, I. H., & Frank, E. (2000). Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann.