Applying Genetic and Symbolic Learning
Algorithms to Extract Rules from Artificial
Neural Networks
Claudia R. Milaré, Gustavo E. A. P. A. Batista,
André C. P. L. F. de Carvalho, and Maria C. Monard
University of São Paulo - USP
Institute of Mathematics and Computer Science - ICMC
Department of Computer Science and Statistics - SCE
Laboratory of Computational Intelligence - LABIC
P. O. Box 668, 13560-970, São Carlos, SP, Brazil
{claudia, gbatista, andre, mcmonard}@icmc.usp.br
Abstract. Several research works have shown that Artificial Neural
Networks — ANNs — have an appropriate inductive bias for several
domains, since they can learn any input-output mapping, i.e., ANNs
have the universal approximation property. Although symbolic learning algorithms have a less flexible inductive bias than ANNs, they are
needed when a good understanding of the decision process is essential,
since symbolic ML algorithms express the knowledge induced using symbolic structures that can be interpreted and understood by humans. On
the other hand, ANNs lack the capability of explaining their decisions,
since the knowledge is encoded as real-valued weights and biases of the
network, an encoding that is difficult for humans to interpret. Aiming
to take advantage of both approaches, this work proposes a method that
extracts symbolic knowledge, expressed as decision rules, from ANNs. The
proposed method combines knowledge induced by several symbolic ML
algorithms through the application of a Genetic Algorithm — GA. Our
method is experimentally analyzed in a number of application domains.
Results show that the method is able to extract symbolic knowledge
with high fidelity to the trained ANNs. The proposed method is also
compared to TREPAN, another method for extracting knowledge from
ANNs, showing promising results.
1 Introduction
Artificial Neural Networks — ANNs — have been successfully employed in several
application domains. However, the comprehensibility of the induced hypothesis
is as important as its performance in many of these applications. This is one
of the main criticisms of ANNs: the lack of capability to explain their
decisions, since the knowledge is encoded as real-valued weights and biases. On
the other hand, the comprehensibility of the induced hypothesis is one of the main
characteristics of symbolic Machine Learning — ML — systems. This work addresses the lack of comprehensibility of the models induced by ANNs, proposing
solutions for the following problem:
Given a model produced by a learning system, in this case ANNs, and
represented in a language that is difficult for the majority of users to
understand, how can this model be re-represented in a language that improves
comprehensibility, so that it can be easily understood by a user?
Several research works have investigated how to convert hypotheses induced
by ANNs to more human comprehensible representations, as surveyed in [1].
The majority of the proposed methods have several limitations: they can only
be applied to specific network models or training algorithms; they do not scale well
with the network size; and they are restricted to problems having exclusively discrete-valued features.
This work proposes the use of symbolic ML systems and GAs to extract
comprehensible knowledge from ANNs. The main goal of this work is to obtain
a symbolic description of an ANN that has a high degree of fidelity to the
knowledge induced by the ANN. The proposed method can be applied to any type
of ANN. In other words, the method does not assume that the network has any
particular architecture, nor that the ANN has been trained in any special way.
Furthermore, the induction of symbolic representations is not directly affected
by the network size, and the method can be used for applications involving both
real-valued and discrete-valued features.
This paper is organized as follows: Section 2 describes the proposed method
to extract comprehensible knowledge from ANNs; Section 3 experimentally
evaluates the method on several application domains; finally, Section 4 presents
the main conclusions of this work, as well as some directions for future work.
2 Proposed Method
The method proposed in this work uses symbolic ML algorithms and GAs to
extract symbolic knowledge from trained ANNs. In this method, an ANN is
trained over a data set E. After the training phase, the ANN is used as a “black
box” to classify the data set E, creating a new class attribute. The values of
the new class attribute reflect the knowledge learned by the ANN, i.e., these
values reflect the hypothesis h induced by the ANN. The data set labelled by the
trained ANN is subsequently used as input to p symbolic ML systems, resulting
in p symbolic classifiers h'_1, h'_2, ..., h'_p. Each classifier h'_i, 1 ≤ i ≤ p,
approximates the hypothesis h induced by the ANN.
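As a minimal sketch of this labelling step (assuming scikit-learn-style learners with fit and predict methods, an illustration choice only; the experiments reported here used C4.5, C4.5rules and CN2):

import numpy as np

def relabel_and_induce(ann, X, symbolic_learners):
    # Use the trained ANN as a black box to create the new class
    # attribute; only predict() queries are needed, nothing about the
    # network's architecture or training procedure is assumed.
    y_ann = np.asarray(ann.predict(X))
    # Fit each symbolic learner on the relabelled data: the resulting
    # classifiers h'_1, ..., h'_p approximate the ANN's hypothesis h,
    # not the original target concept.
    return [learner.fit(X, y_ann) for learner in symbolic_learners]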
Unfortunately, each symbolic ML system represents the induced concept in a
different language, hindering the integration of these classifiers. Thus, it is necessary to translate the representation of these classifiers into a common language. In
order to perform such a translation, we use the work of Prati et al. [7], which proposes a
standard syntax, called PBM, to represent rules. Thus, each symbolic classifier is
translated into the PBM format. After the classifiers h'_1, h'_2, ..., h'_p are converted
to the PBM syntax, they are integrated into a rule database, and a unique natural
number is assigned to each rule in the rule database to identify it.
The rules stored in the rule database are used to form the individuals of the
GA population. Each individual is formed by a set of rules. The representation
of each individual is a vector of natural numbers, where each number is the
identifier of a rule in the rule database. The initial population of the GA is
composed of vectors of random numbers, representing sets of rules drawn at
random from the rule database.
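A minimal sketch of this encoding, reusing the paper's parameter names ni (number of individuals) and ti (initial individual size); indexing the rule database from 0 is an implementation choice of this sketch:

import random

def random_individual(n_rules, ti):
    # an individual is a vector of ti rule identifiers; whether duplicates
    # are filtered is not specified, so this sketch allows them
    return [random.randrange(n_rules) for _ in range(ti)]

def initial_population(n_rules, ni, ti):
    # ni individuals, each composed of ti rules drawn at random
    # from the rule database
    return [random_individual(n_rules, ti) for _ in range(ni)]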
During the GA execution, the individuals, i.e. the rule sets, are modified
by the mutation and crossover operators. The mutation operator randomly replaces a rule in one of the rule sets with another rule from the rule database.
The crossover operator implemented is asymmetric, i.e., given two rule sets, two
crossover points are chosen and the sub-vectors defined by these points are exchanged. It is important to note that, as the crossover operator is asymmetric,
even though all initial individuals have the same number of rules, the selection
of the fittest individuals may lead to the survival of larger or smaller rule sets.
The GA fitness function calculates the infidelity rate between the ANN and each
individual; thus, the objective of the GA is to minimize the infidelity rate. The
infidelity rate is the percentage of instances for which the classification made by
the ANN disagrees with the classification made by the method used to explain
the ANN.
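The sketch below illustrates the operators and the fitness function under two assumptions the text leaves open: mutation is applied per position with probability pm, and the asymmetric crossover is read as choosing an independent pair of cut points in each parent, which is what lets offspring sizes drift away from ti:

import random

def mutate(individual, n_rules, pm):
    # with probability pm, exchange a rule for a random one from the database
    return [random.randrange(n_rules) if random.random() < pm else r
            for r in individual]

def asymmetric_crossover(a, b):
    # independent cut-point pairs in each parent: the exchanged sub-vectors
    # may differ in length, so offspring can be larger or smaller rule sets
    i, j = sorted(random.sample(range(len(a) + 1), 2))
    k, l = sorted(random.sample(range(len(b) + 1), 2))
    return a[:i] + b[k:l] + a[j:], b[:k] + a[i:j] + b[l:]

def infidelity_rate(rule_set, X, y_ann, classify):
    # fraction of instances where the rule set's classification disagrees
    # with the ANN's; the GA minimizes this quantity
    wrong = sum(classify(rule_set, x) != y for x, y in zip(X, y_ann))
    return wrong / len(X)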
Another important issue is how to classify a new instance given a rule set. The
strategies employed to classify an instance given a set of rules are: SingleRule,
which uses the classification given by the fired rule with the highest prediction
power; and MultipleRules, which uses all fired rules to classify an instance. After
the execution of the GA, a winner individual is obtained. Its set of rules is usually
composed of rules obtained from classifiers induced by different symbolic ML
systems. This set of rules is then used to explain the behavior of the ANN. A
post-processing phase may be applied to the winner; its objective is to remove
from the winner those rules that are not fired for any instance in the training
and validation sets. Additional details about the GA implementation can be
found in [6].
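A sketch of the two strategies follows. The Rule class, its accuracy field (standing in for the rule's estimated prediction power) and the majority vote used by MultipleRules are hypothetical details of this sketch; the paper does not fix the aggregation scheme at this level. The default argument plays the role of the default rule, which classifies an instance as belonging to the majority class:

from collections import Counter
from dataclasses import dataclass

@dataclass
class Rule:                 # hypothetical stand-in for a parsed PBM rule
    label: str              # class predicted by the rule
    accuracy: float         # estimated prediction power
    def fires(self, x):     # placeholder: test the rule's conditions on x
        raise NotImplementedError

def single_rule(rules, x, default):
    # SingleRule: the fired rule with the highest prediction power decides
    fired = [r for r in rules if r.fires(x)]
    return max(fired, key=lambda r: r.accuracy).label if fired else default

def multiple_rules(rules, x, default):
    # MultipleRules: all fired rules take part; here, by majority vote
    fired = [r for r in rules if r.fires(x)]
    if not fired:
        return default
    return Counter(r.label for r in fired).most_common(1)[0][0]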
3 Experimental Evaluation
Several experiments were carried out in order to evaluate the proposed method.
Experiments were conducted using six data sets collected from the UCI repository [2]. These data sets are related to classification problems in different application domains. Table 1 shows the corresponding data set names and summarizes
some of their characteristics. The characteristics are: #Instances – the total
number of instances; #Features – the total number of features as well as the
number of continuous and nominal features; Class and Class % – the class values
and their distribution; Majority Error – the error rate of a classifier that always
predicts the majority class; and Missing Values – whether the data set has
missing values.
The experiments were divided into three phases. In the following sections
these phases are described in the same order that they were conducted and the
results obtained in each phase are also presented.
Data Set  #Instances  #Features (cont., nom.)  Class        Class %   Majority Error  Missing Values
breast       699        9 (9,0)                benign       65.52%        34.48%           yes
                                               malignant    34.48%
crx          690       15 (6,9)                +            55.50%        44.50%           yes
                                               -            44.50%
heart        303       13 (6,7)                absence      54.13%        45.87%           yes
                                               presence     45.87%
pima         768        8 (8,0)                0            65.02%        34.98%           no
                                               1            34.98%
sonar        208       60 (60,0)               M            53.37%        46.63%           no
                                               R            46.63%
votes        435       16 (0,16)               republican   54.80%        45.20%           no
                                               democrat     45.20%

Table 1. Data sets summary description.
3.1 Phase 1
The objective of Phase 1 is to train the ANNs whose knowledge will be extracted
in the next phases. Phase 1 was performed as follows:

1. Each of the six data sets was divided using the 10-fold stratified cross-validation resampling method [10]. Each training set was divided into two subsets: a training set (with 90% of the instances) and a validation set (with 10% of the instances).
2. All the networks were trained using the Backpropagation with Momentum algorithm [9], and the validation set was used to decide when the training should stop. After several experiments, the architectures chosen were: breast (9-3-1), crx (43-7-1), heart (22-1-1), pima (8-2-1), sonar (60-12-1) and votes (48-1).
3. The error rate for each data set was measured on the test set. Table 2 shows the mean error rates obtained in the 10 folds, with the respective standard deviations between parentheses.

breast   2.98 (0.46)
crx     13.47 (1.01)
heart   17.78 (2.12)
pima    22.92 (1.15)
sonar   20.14 (3.21)
votes    3.69 (0.86)

Table 2. Error rate obtained by the ANNs — mean and standard deviation.
3.2 Phase 2
In this phase, the ANNs trained in Phase 1 are used to label the data sets,
i.e., they are used to create a new class attribute. The results obtained by the
proposed method are compared with another method for extracting knowledge
from ANNs, TREPAN [4]. Given a trained ANN and the training set used
for its training, the TREPAN method builds a decision tree to explain the ANN
behavior. TREPAN uses the trained ANN to label the training set and builds a
decision tree based on this data set. TREPAN can also generate artificial data
automatically, using the trained ANN to label the new instances. Unlike most
decision tree algorithms, which separate the instances of different classes by using
a single attribute to partition the input space, TREPAN uses m-of-n expressions
for its splits. An m-of-n expression is a boolean expression that is satisfied when
at least m (an integer threshold) of its n boolean conditions are satisfied.
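A minimal sketch of such a split test, where each condition is an arbitrary boolean predicate over an instance:

def m_of_n(m, conditions, x):
    # satisfied when at least m of the n boolean conditions hold for x
    return sum(1 for cond in conditions if cond(x)) >= m

# e.g., a hypothetical 2-of-3 split: true when at least two conditions hold
split = lambda x: m_of_n(2, [lambda v: v[0] > 0.5,
                             lambda v: v[1] == 1,
                             lambda v: v[2] < 3.0], x)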
Phase 2 was performed as follows:
1. TREPAN was executed with different values assigned to its main parameters, including its default parameters, as shown in Table 3. The MinSample parameter of TREPAN specifies the minimum number of instances (i.e. training instances plus artificial instances) to be considered before selecting each split. The default value is 1000. When this parameter is set to 0, no artificial instance is generated by TREPAN. The SplitTest parameter defines whether the splits of the internal nodes are m-of-n expressions (mofn) or simple splits (simple). The option mofn1000 is the default option of the TREPAN method, that is, m-of-n expressions are employed and 1000 instances must be available in a node for this node to be either expanded or converted into a leaf node.

              MinSample  SplitTest
   simple0        0       simple
   simple1000  1000       simple
   mofn0          0       mofn
   mofn1000    1000       mofn

   Table 3. Parameter values employed in the experiments with TREPAN.
2. The infidelity rate between TREPAN and the ANN was measured on the test
set.
3. The syntactic comprehensibility of the knowledge extracted by TREPAN was
measured. In this work, the syntactic comprehensibility was measured considering the number of induced rules and the average number of conditions
per induced rule.
4. The training, validation and test sets used to train the ANNs in Phase 1 were
labelled by the corresponding ANNs.
5. The symbolic ML systems C4.5, C4.5rules [8] and CN2 [3] were chosen to
be used in the experiments. These systems are responsible for generating
the rules for the proposed method, i.e., the rules that will be integrated
by the GA. C4.5, C4.5rules and CN2 were executed with their default
parameters.
6. The infidelity rate between the symbolic ML systems (C4.5, C4.5rules, and
CN2) and the ANN was measured on the test set, as well as the syntactic
comprehensibility of the models induced by the symbolic inducers.
3.3 Phase 3
In this phase, the classifiers produced in Phase 2 by the symbolic ML systems
C4.5, CN2 and C4.5rules on the training set labelled by the ANNs are used to
form the GA individuals.
Phase 3 was carried out as follows:

1. The classifiers induced by the symbolic ML systems C4.5, C4.5rules and CN2 were converted to the PBM syntax.
2. For each of the six data sets employed in this experiment, a GA was executed several times, varying its parameters as well as the strategies used to classify an instance given a set of rules. Table 4 shows the values used for the parameters: ng (number of generations); ni (number of individuals); ti (initial size of an individual, that is, the number of rules of each individual); pc (probability of crossover); and pm (probability of mutation), used with the approaches SingleRule and MultipleRules in the GA. The values ng = 40, ni = 15 and pc = 0.4 were also used with the approach SingleRule.
3. Finally, the infidelity rate and the syntactic comprehensibility of the winner individual were measured.

   parameter  value
   ng          20
   ni          20
   ti          10
   pc          0.25
   pm          0.01

   Table 4. GA's parameter values employed in the experiments.
Table 5 shows the infidelity rate (mean and standard deviation) obtained
by the symbolic ML systems, TREPAN and the GA. Table 6 shows the number of
induced rules and Table 7 shows the average number of conditions per rule.
The results of infidelity rate, number of induced rules and mean number of
conditions per induced rule, for the symbolic ML systems and TREPAN, were
obtained in Phase 2. SingleRule means that the GA was executed with the
SingleRule strategy and SingleRulePP refers to the same strategy followed
by the post-processing; MultipleRules means the MultipleRules strategy and
MultipleRulesPP refers to the same strategy followed by the post-processing.
The GA was executed with the parameters ng = 20, ni = 20, ti = 10, pc = 0.25
and pm = 0.01 for SingleRule, SingleRulePP, MultipleRules and MultipleRulesPP. For ∗SingleRule and ∗SingleRulePP, the parameters are ng = 40,
ni = 15, ti = 10, pc = 0.4 and pm = 0.01.
In what follows, the best results are shown in boldface. The 10-fold cross-validated paired t test [5] was used to compare the infidelity rate and comprehensibility. According to this test, the difference between two algorithms is
statistically significant with 95% confidence if the test statistic is greater than
2.262 in absolute value. The ↑ symbol indicates that a difference is significant
with 95% confidence when the method in boldface is compared to the other
methods.
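As a sketch, the statistic of this test over the per-fold results of two methods can be computed as below; with 10 folds the differences have 9 degrees of freedom, which is where the 2.262 critical value comes from:

import math

def cv_paired_t(results_a, results_b):
    # results_a, results_b: per-fold error (or infidelity) rates of the
    # two methods on the same 10 folds; |t| > 2.262 indicates a
    # significant difference at the 95% confidence level
    d = [a - b for a, b in zip(results_a, results_b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / (n - 1)
    return mean / math.sqrt(var / n)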
Table 8 shows the methods in ascending order of infidelity rate. Table 9
presents the methods in ascending order of mean number of induced rules, and
Table 10 shows the methods in ascending order of mean number of conditions
per induced rule. The numbers to the right of each method's name indicate
how many significant results, with 95% confidence, the method obtained when
compared with the remaining methods.
                  breast          crx             heart            pima             sonar           votes
C4.5            ↑ 2.93 (0.53)    2.91 (0.81)   ↑ 14.07 (1.54)     8.85 (1.07)     22.98 (3.21)    3.21 (0.97)
C4.5rules         2.48 (0.53)    3.06 (0.72)    10.00 (1.11)      8.46 (1.01)     23.43 (3.25)    2.98 (1.07)
CN2               1.90 (0.62)  ↑ 5.82 (0.75)  ↑ 13.70 (1.99)      9.12 (0.94)     22.57 (3.35)    2.75 (0.95)
simple0         ↑ 3.37 (0.44)    3.36 (0.71)  ↑ 14.07 (1.81)   ↑ 10.81 (0.65)     24.90 (3.06)    3.67 (1.03)
simple1000      ↑ 4.39 (0.95)    5.37 (1.85)    12.22 (2.53)      8.86 (1.15)     22.14 (2.91)    2.30 (0.96)
mofn0             2.78 (0.63)    3.83 (0.84)    12.22 (1.75)     10.42 (1.31)     25.57 (3.15)    2.99 (0.97)
mofn1000          3.07 (0.80)    4.75 (1.49)     7.04 (1.40)      7.95 (1.01)   ↑ 27.36 (2.73)    2.76 (0.75)
GA: ng = 20, ni = 20, ti = 10, pc = 0.25, pm = 0.01
SingleRule        1.61 (0.55)    3.52 (0.65)    12.22 (1.57)      8.85 (1.00)     22.05 (3.24)    2.76 (0.66)
SingleRulePP      2.20 (0.70)    4.44 (0.66)    12.59 (1.58)     10.42 (1.06)   ↑ 24.93 (3.36)    3.91 (1.07)
MultipleRules     2.04 (0.58)    3.22 (0.67)    13.33 (1.48)      8.20 (0.97)   ↑ 25.00 (2.11)    2.99 (0.83)
MultipleRulesPP   2.04 (0.58)    3.22 (0.67)    13.33 (1.48)      8.20 (0.97)   ↑ 25.00 (2.11)    2.99 (0.83)
GA: ng = 40, ni = 15, ti = 10, pc = 0.4, pm = 0.01
∗SingleRule       1.75 (0.64)    3.68 (0.66)     8.89 (0.99)      8.60 (1.10)     23.05 (2.74)    2.98 (1.32)
∗SingleRulePP     2.63 (0.71)    4.90 (0.60)    11.48 (2.24)      8.34 (1.21)     21.07 (2.24)    4.15 (1.37)

Table 5. Infidelity rate (mean and standard deviation).

                  breast           crx              heart            pima             sonar            votes
C4.5            ↑ 8.90 (0.91)   ↑ 14.10 (3.44)   ↑ 17.40 (1.33)   ↑ 25.00 (0.92)   ↑ 13.50 (0.34)   ↑ 9.40 (1.07)
C4.5rules       ↑ 8.00 (0.54)   ↑ 8.70 (1.58)    ↑ 11.40 (0.80)   ↑ 18.30 (1.16)   ↑ 8.10 (0.38)      5.80 (0.44)
CN2             ↑ 12.10 (0.55)  ↑ 13.50 (1.42)   ↑ 16.00 (0.75)   ↑ 24.40 (0.97)   ↑ 25.20 (0.53)   ↑ 13.30 (0.67)
simple0         ↑ 9.40 (0.48)   ↑ 31.50 (4.81)   ↑ 18.10 (0.99)   ↑ 26.10 (1.48)   ↑ 12.40 (0.60)   ↑ 13.60 (0.90)
simple1000      ↑ 10.60 (0.73)  ↑ 31.00 (8.03)   ↑ 20.00 (1.92)   ↑ 29.30 (2.01)   ↑ 9.50 (1.10)    ↑ 11.00 (1.61)
mofn0           ↑ 7.40 (0.78)   ↑ 11.10 (1.76)   ↑ 9.20 (0.83)    ↑ 23.30 (1.38)   ↑ 6.90 (0.53)    ↑ 7.10 (0.53)
mofn1000          2.40 (0.31)   ↑ 9.90 (1.94)      5.30 (0.83)    ↑ 26.00 (1.55)     4.10 (0.67)      6.50 (0.72)
GA: ng = 20, ni = 20, ti = 10, pc = 0.25, pm = 0.01
SingleRule      ↑ 14.60 (1.12)  ↑ 16.20 (3.15)   ↑ 21.50 (2.26)   ↑ 18.40 (1.97)   ↑ 20.80 (3.86)   ↑ 15.00 (1.62)
SingleRulePP    ↑ 7.00 (0.47)     4.50 (0.96)      7.90 (0.74)      9.60 (1.37)    ↑ 7.60 (1.24)      5.30 (0.67)
MultipleRules   ↑ 14.50 (1.92)  ↑ 19.80 (4.29)   ↑ 18.40 (3.06)   ↑ 30.20 (3.39)   ↑ 22.60 (2.91)   ↑ 14.00 (1.29)
MultipleRulesPP ↑ 14.50 (1.92)  ↑ 18.80 (4.08)   ↑ 18.10 (2.98)   ↑ 30.20 (3.39)   ↑ 21.60 (2.71)   ↑ 12.90 (1.22)
GA: ng = 40, ni = 15, ti = 10, pc = 0.4, pm = 0.01
∗SingleRule     ↑ 17.60 (1.33)  ↑ 20.30 (5.08)   ↑ 24.30 (2.03)   ↑ 42.00 (3.71)   ↑ 31.60 (2.09)   ↑ 14.40 (0.95)
∗SingleRulePP   ↑ 7.60 (0.56)   ↑ 5.70 (1.07)    ↑ 10.80 (0.87)   ↑ 19.60 (1.65)   ↑ 11.80 (1.45)     6.10 (0.64)

Table 6. Number of induced rules (mean and standard deviation).

It can be observed that:
– For the breast data set, even though the method SingleRule obtained the
best result for the infidelity rate, with 3 significant results when compared
with the remaining methods, this method did not obtain good results for the
number of induced rules and for the mean number of conditions per induced
rule. The method SingleRulePP, despite obtaining only 1 significant result
related to the infidelity rate, obtained 8 significant results for the number
of induced rules and 7 significant results for the mean number of conditions
per induced rule.
– For the crx data set, the method C4.5rules obtained good results for the
number of induced rules and for the mean number of conditions per induced
rule. For the infidelity rate, this method obtained only one significant result.
However, the other methods were not able to obtain many significant results.
– For the heart data set, the methods ∗SingleRule and mofn1000 obtained
the best results for the infidelity rate. However, for the number of induced
rules, the method ∗SingleRule did not obtain good results, and the method
mofn1000 did not obtain good results for the mean number of conditions per
induced rule.
                  breast          crx              heart            pima             sonar            votes
C4.5            ↑ 3.63 (0.17)    2.62 (0.46)    ↑ 3.25 (0.06)    ↑ 5.66 (0.11)    ↑ 4.27 (0.11)      2.68 (0.20)
C4.5rules         2.57 (0.05)    2.17 (0.10)      2.41 (0.04)      2.92 (0.05)    ↑ 2.84 (0.13)      2.52 (0.11)
CN2               2.56 (0.07)  ↑ 2.84 (0.08)    ↑ 2.92 (0.10)      3.02 (0.05)      2.03 (0.02)      2.64 (0.05)
simple0         ↑ 3.54 (0.11)  ↑ 2.78 (0.10)    ↑ 3.05 (0.11)    ↑ 5.58 (0.13)    ↑ 3.93 (0.13)      2.83 (0.10)
simple1000      ↑ 4.00 (0.19)    2.61 (0.27)    ↑ 3.27 (0.12)    ↑ 5.94 (0.17)    ↑ 3.81 (0.24)      2.61 (0.18)
mofn0           ↑ 6.74 (0.54)  ↑ 7.62 (0.75)    ↑ 9.47 (0.59)    ↑ 10.23 (0.49)   ↑ 9.35 (0.49)    ↑ 5.46 (0.41)
mofn1000        ↑ 7.49 (0.48)  ↑ 10.10 (0.91)   ↑ 10.17 (0.41)   ↑ 10.73 (0.32)   ↑ 23.41 (1.51)   ↑ 9.04 (0.90)
GA: ng = 20, ni = 20, ti = 10, pc = 0.25, pm = 0.01
SingleRule        2.78 (0.11)  ↑ 2.97 (0.14)      2.83 (0.08)      3.73 (0.16)      2.87 (0.09)      2.55 (0.11)
SingleRulePP      2.54 (0.08)    2.25 (0.27)      2.75 (0.12)      2.96 (0.09)      2.94 (0.14)      2.28 (0.16)
MultipleRules   ↑ 2.79 (0.10)  ↑ 3.00 (0.16)      2.89 (0.11)    ↑ 3.79 (0.08)      2.91 (0.07)      2.54 (0.11)
MultipleRulesPP ↑ 2.79 (0.10)  ↑ 2.96 (0.16)      2.88 (0.11)    ↑ 3.79 (0.08)      2.95 (0.08)      2.47 (0.09)
GA: ng = 40, ni = 15, ti = 10, pc = 0.4, pm = 0.01
∗SingleRule       2.65 (0.13)  ↑ 2.86 (0.15)      2.78 (0.10)    ↑ 3.83 (0.07)      2.94 (0.11)      2.59 (0.12)
∗SingleRulePP     2.46 (0.08)    2.41 (0.20)      2.75 (0.08)      3.14 (0.09)      2.91 (0.05)      2.31 (0.08)

Table 7. Average number of conditions per induced rule (mean and standard deviation).
– For the pima data set, the methods ∗SingleRulePP and C4.5rules obtained
good results for the syntactic complexity. For the infidelity rate, the maximum number of significant results obtained by any method was 1; thus, none
of them can be considered the best.
– For the sonar data set, the method ∗SingleRulePP obtained very good
results for the infidelity rate and the number of induced rules.
– For the votes data set, none of the methods presented significant results for
the infidelity rate. However, the methods SingleRulePP, ∗SingleRulePP and
C4.5rules presented good results for the syntactic complexity.
Finally, analyzing the results in a general way, it is possible to conclude that
the use of GAs for extracting comprehensible knowledge from ANNs is promising
and should be explored in more depth. One of the aspects that should be better
investigated is the good results obtained by the strategy SingleRule compared
to the strategy MultipleRules. Apparently, a GA builds better classifiers by
considering the best rule to classify a new instance, instead of considering all
the rules that cover an instance and deciding its classification based on the
global quality of these rules.
4 Conclusion
In this work, we propose a method based on symbolic ML systems and GAs in
order to extract comprehensible knowledge from ANNs. The main advantage of
the proposed method is that it can be applied to any supervised ANN. The use
of GAs allows the integration of the knowledge extracted by several symbolic ML
systems in a single set of rules. This set of rules should have a high fidelity to
the ANN, since it will be used to explain the ANN. This task is not trivial: in
order to obtain a high degree of fidelity, the rules have to complement each
other. In the experiments carried out, the proposed method achieved satisfactory
results and should be further explored.
     breast             crx                heart              pima               sonar              votes
 1   SingleRule^3       C4.5^1             mofn1000^3         mofn1000^1         ∗SingleRulePP^4    simple1000
 2   ∗SingleRule^3      C4.5rules^1        ∗SingleRule^6      MultipleRules^1    SingleRule^1       CN2
 3   CN2^1              MultipleRules^2    C4.5rules^1        MultipleRulesPP^1  simple1000         SingleRule
 4   MultipleRules^2    MultipleRulesPP^2  ∗SingleRulePP      ∗SingleRulePP      CN2                mofn1000
 5   MultipleRulesPP^2  simple0^1          SingleRule^1       C4.5rules^1        C4.5               C4.5rules
 6   SingleRulePP^1     SingleRule         mofn0              ∗SingleRule        ∗SingleRule        ∗SingleRule
 7   C4.5rules          ∗SingleRule        simple1000         SingleRule         C4.5rules          MultipleRulesPP
 8   ∗SingleRulePP^1    mofn0^1            SingleRulePP       C4.5               simple0            MultipleRules
 9   mofn0              SingleRulePP       MultipleRules      simple1000         SingleRulePP       mofn0
10   C4.5               mofn1000           MultipleRulesPP    CN2                MultipleRulesPP    C4.5
11   mofn1000           ∗SingleRulePP      CN2                SingleRulePP       MultipleRules      simple0
12   simple0            simple1000         C4.5               mofn0              mofn0              SingleRulePP
13   simple1000         CN2                simple0            simple0            mofn1000           ∗SingleRulePP

Table 8. Methods ordered by infidelity rate.
     breast           crx                heart              pima               sonar              votes
 1   mofn1000^12      SingleRulePP^11    mofn1000^11        SingleRulePP^12    mofn1000^12        SingleRulePP^9
 2   SingleRulePP^8   ∗SingleRulePP^10   SingleRulePP^10    C4.5rules^7        mofn0^8            C4.5rules^8
 3   mofn0^5          C4.5rules^3        mofn0^8            SingleRule^6       SingleRulePP^7     ∗SingleRulePP^8
 4   ∗SingleRulePP^7  mofn1000^3         ∗SingleRulePP^7    ∗SingleRulePP^8    C4.5rules^7        mofn1000^6
 5   C4.5rules^4      mofn0^3            C4.5rules^6        mofn0^2            simple1000^6       mofn0^6
 6   C4.5^4           CN2^2              CN2                CN2^2              ∗SingleRulePP^4    C4.5^4
 7   simple0^3        C4.5^2             C4.5               C4.5               simple0^5          simple1000
 8   simple1000^1     SingleRule^1       simple0            mofn1000^1         C4.5^3             MultipleRulesPP^2
 9   CN2              MultipleRulesPP^1  MultipleRulesPP    simple0^1          SingleRule^1       CN2
10   MultipleRules    MultipleRules^1    MultipleRules      simple1000         MultipleRulesPP^2  simple0
11   MultipleRulesPP  ∗SingleRule^1      simple1000         MultipleRules      MultipleRules      MultipleRules
12   SingleRule       simple1000         SingleRule         MultipleRulesPP    CN2                ∗SingleRule
13   ∗SingleRule      simple0            ∗SingleRule        ∗SingleRule        ∗SingleRule        SingleRule

Table 9. Methods ordered by number of induced rules.

     breast             crx                heart              pima               sonar              votes
 1   ∗SingleRulePP^7    C4.5rules^8        C4.5rules^6        C4.5rules^8        CN2^6              SingleRulePP^2
 2   SingleRulePP^7     SingleRulePP^2     ∗SingleRulePP^2    SingleRulePP^5     C4.5rules^5        ∗SingleRulePP^2
 3   CN2^4              ∗SingleRulePP^2    SingleRulePP^2     CN2^8              SingleRule^1       MultipleRulesPP^3
 4   C4.5rules^5        simple1000^2       ∗SingleRule^2      ∗SingleRulePP^9    MultipleRules^2    C4.5rules^3
 5   ∗SingleRule^7      C4.5^2             SingleRule^2       SingleRule^5       ∗SingleRulePP^1    MultipleRules^2
 6   SingleRule^5       simple0^2          MultipleRulesPP^2  MultipleRules^5    ∗SingleRule^1      SingleRule^3
 7   MultipleRules^5    CN2^2              MultipleRules^2    MultipleRulesPP^5  SingleRulePP^1     ∗SingleRule^2
 8   MultipleRulesPP^5  ∗SingleRule^2      CN2^4              ∗SingleRule^5      MultipleRulesPP^1  simple1000^2
 9   simple0^3          MultipleRulesPP^2  simple0^2          simple0^3          simple1000^2       simple0^2
10   C4.5^3             SingleRule^2       C4.5^2             C4.5^2             simple0^3          CN2^2
11   simple1000^2       MultipleRules^2    simple1000^2       simple1000^2       C4.5^2             C4.5^2
12   mofn0              mofn0^1            mofn0              mofn0              mofn0^1            mofn0^1
13   mofn1000           mofn1000           mofn1000           mofn1000           mofn1000           mofn1000

Table 10. Methods ordered by average number of conditions per induced rule.
Several ideas for future work will be evaluated, such as:
– More symbolic ML systems can be used to increase the diversity of rules.
Other strategies are to vary the parameters of these systems and to induce
classifiers on different samples.
– In the current implementation, we chose to generate the individuals of the GA
by randomly selecting rules from the whole set of classifiers. As the individuals of
the initial population are not necessarily “good” classifiers, there is a higher
chance of stopping at a local maximum. In this work, we opted for building
the initial population randomly, with the objective of verifying the potential
of the proposed method without favoring the GA. We intend to investigate
the behavior of the GA when each initial individual of the population is a
“good” classifier. This can be accomplished by using the induced classifiers
as initial individuals.
– Another important aspect regarding the set of generated rules is that this
set should cover an expressive region of the instance space. In other words,
it is of little use to have a set of rules in which each individual rule has high
fidelity to an ANN if all these rules cover the same instances. If a set of rules
has several redundant rules, then several instances might be classified by the
default rule. In the current implementation, the default rule classifies every
instance as belonging to the majority class. A strategy to create individuals
with complementary covering rules is to introduce this information in the
GA fitness function.
Acknowledgements. This research is partially supported by the Brazilian Research
Councils CNPq and FAPESP.
References
[1] R. Andrews, J. Diederich, and A. B. Tickle. A Survey and Critique of Techniques
for Extracting Rules from Trained Artificial Neural Networks. Knowledge-Based
Systems, 8(6):373–389, 1995.
[2] C. Blake, E. Keogh, and C. J. Merz. UCI Repository of Machine Learning
Datasets, 1998.
[3] P. Clark and T. Niblett. The CN2 Induction Algorithm. Machine Learning,
3:261–284, 1989.
[4] M. W. Craven. Extracting Comprehensible Models from Trained Neural Networks.
PhD thesis, University of Wisconsin - Madison, 1996.
[5] T. G. Dietterich. Approximate Statistical Tests for Comparing Supervised Classification Learning Algorithms. Neural Computation, 10(7):1895–1924, 1998.
[6] C. R. Milaré. Extraction of Knowledge from Artificial Neural Networks Using
Symbolic Learning Systems and Genetic Algorithms (in Portuguese). PhD thesis,
University of São Paulo, 2003.
[7] R. C. Prati, J. A. Baranauskas, and M. C. Monard. A Proposal for Unifying
the Representation Language of Symbolic Machine Learning Algorithms (in Portuguese). Technical Report 137, ICMC-USP, 2001.
[8] J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
[9] D. Rumelhart, G. Hinton, and R. Williams. Learning Internal Representations
by Error Propagation. In Parallel Distributed Processing: Explorations in the
Microstructure of Cognition, volume 1. MIT Press, 1986.
[10] S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn. Morgan Kaufmann, 1991.