
Use of Dempster-Shafer theory to
combine classifiers which use different
class boundaries
Mohammad Reza Ahmadzadeh and Maria Petrou
Centre for Vision, Speech and Signal Processing,
School of Electronics, Computing and Mathematics,
University of Surrey, Guildford GU2 7XH, UK
Tel: +44 1483 689801, Fax: +44 1483 686031
E-mail:[email protected]
Abstract
In this paper we present the Dempster-Shafer theory as a framework within which the results of a
Bayesian Network classifier and a fuzzy logic-based classifier are combined to produce a better final classification. We deal with the case when the two original classifiers use different classes for the outcome.
The problem of different classes is solved by using a superset of finer classes which can be combined to
produce classes according to either of the two classifiers. Within the Dempster-Shafer formalism, not only can the problem of a different number of classes be solved, but the relative reliability of the classifiers can also be taken into consideration.
Keywords
Classifier Combination, Dempster-Shafer Theory, Bayesian Networks, Fuzzy Logic, Expert Rules
1 Introduction
It has been established recently that combining classifiers improves the classification accuracy for many
problems. This has been established both theoretically, mainly within the framework of probability theory
[7], and experimentally by many researchers. In addition, in the neural network field several approaches
have been introduced to combine individual neural networks to improve the overall accuracy and performance.

*Currently with the School of Electronic Technology, Shiraz University, Shiraz, Iran, Tel: +98 711 7266262, Fax: +98 711 7262102, Email: [email protected]
Approaches to combining multiple neural network classifiers can be divided into two categories: ensemble and modular [14], [15]. In the ensemble-based approaches, the number of output classes is the same for all classifiers. Each classifier provides a complete solution to the problem; the combination of classifiers is used to improve the classification rate. In the modular approach, on the other hand, a problem is broken into several simpler sub-problems. For example, a problem with 5 output classes can be changed into several sub-problems with 2 output classes each. Each sub-problem can be solved using a neural network, and the combination of all the classifiers provides a solution to the original problem. In this approach no single classifier provides a solution to the whole problem; all classifiers together are used, in a complementary fashion, to find the final classification.
Combination of classifiers has also been investigated extensively when other types of classifier are
used. Generally, classifiers can be combined at different levels: abstract level, ranking level and measurement level [19], [6]. In the abstract level only the top choice of each classifier is used for the purpose of
combination. In the measurement level, complete information about the outputs of the classifiers, e.g. a score for
each possible output, is available and is used in the combination process. Although the combination at the
abstract level uses the least information (only the top choice of the classifiers), it has been used frequently
because all kinds of classifiers, such as statistical and syntactic, can be combined easily [19]. The ranking
level approach may also be used to combine classifiers of any type at the expense of decision detail. It
can be used not only to combine classifiers whose outputs are class rankings, or confidence measures that can easily be converted into class rankings, but also classifiers that output single classes. The latter, however, is achieved by coarsening the output of the other classifiers to comply with the classifier
that outputs the least information [5]. Approaches that combine classifiers at the measurement level can
combine any kind of classifiers that output measurements, but for the purpose of combination these measurements should be translated into the same kind of measurement. For example, a classifier whose output is based on distances cannot be directly combined with a classifier which outputs posterior probabilities.
Xu et al. in [19] used three methods of combining classifiers, all at the abstract level. They combined classifiers using the Bayesian formalism, the voting principle and the Dempster-Shafer formalism. Their results on a case study showed that the Dempster-Shafer theory of evidence performed best in comparison with the other methods. They used the recognition rate and the substitution rate of the classifiers to define
the mass functions. The mass function of the selected output (top choice) was defined from the recognition
rate and the mass function of the other outputs (complement of the selected output) was defined from the
substitution rate. If the sum of the recognition rate and the substitution rate was less than 100%, the remainder was called the rejection rate and was assigned to the frame of discernment. This remainder indicated that the classifier was not able to decide, so it was interpreted as lack of information according to the Dempster-Shafer theory.
Rogova in [10] uses the Dempster-Shafer theory of evidence to combine neural network classifiers. All
classifiers have the same number of outputs, so the frame of discernment, $\Theta$, is the same for all classifiers. Let $A_k$ represent the hypothesis that the output vector is of class $k$. Let the $n$th classifier be denoted by $e_n$, the input vector by $x$, and the output vector by $y(x)$. Further, let the mean vector of the outputs of classifier $n$ be denoted by $\mu_n^k$ when the input is an element of the training set for class $k$. A proximity measure can be defined using $\mu_n^k$ and $y$. Rogova used such a proximity measure, $d_n^k(\mu_n^k, y)$, to define the mass functions. She defined various proximity measures in order to identify the best one. For any classifier $n$ and each class $k$, the proximity measure $d_n^k$ was defined to represent the support for hypothesis $A_k$. Any evidence against $A_k$, or pro $\neg A_k$, was denoted by $\bar{d}_n^k$. The proximity measures, which were considered as mass functions of simple support functions, were combined using a simplified version of Dempster's rule of combination. Having combined the evidence from all classifiers, Dempster's rule of combination was used again to find the total confidence for each class $k$. The class with the maximum confidence was singled out as the output of the classification. Rogova claimed that this method of combining classifiers could reduce misclassification by 15-30% compared with the best individual classifiers [10].
There are other classifier combination approaches in the literature, some of which were compared in
[18]. The average [17], the weighted average, the Borda count [2], the fuzzy integral, the fuzzy connectives
[9], the fuzzy templates and neural network approaches are among those which have been investigated in
the literature. For an up-to-date overview of this research area, see for example the collection of papers in
[8].
In all studies found in the literature so far, the classifiers combined are expected to use the same classes
to classify the objects in question. In this paper we address the problem of different classes, which however
span the same classification space. Some clarification is necessary here: When the classes of objects are
expressed by nouns, they are discrete and they are uniquely defined. Examples of such classes are “chair”,
“door”, etc. However, there are problems where the classification of objects refers to some of their properties which may vary continuously. In such cases the defined classes are ad hoc quantisations of otherwise
continuously varying properties. Such classes are for example “very hot”, “hot”, “lukewarm”, “cold” etc.
In all such cases there is a hidden measurable quantity which takes continuous values and which characterises the state of the object. In the particular example, the measurable quantity is the temperature of the
object, and the attribute according to which the object has to be classified is its thermal state. The division
of this continuity of possible states into a small number of discrete classes can be done on the basis of the
actual temperature value, using class boundaries that may be specified arbitrarily. Two different experts
(classifiers) may specify different class boundaries. It is this type of problem we address here, with the
help of the Dempster-Shafer theory of evidence, where the results of the two classifiers are considered as
items of evidence in support of a certain proposition. The problem of different classes is solved by using a
superset of finer classes which can be combined to produce classes according to either of the two classifiers.
The research nearest to our work is that on using error correcting codes (ECC) [3] for classification. An ECC uses more characteristics than are necessary to classify something, and then maps the superset of characteristics to the minimum meaningful number needed for the classification. One may see
each characteristic as defining a different class and consider that ECC maps a larger number of classes to a
smaller one (the correct ones). Our method differs in several respects: 1) The classes used by either of the
classifiers are both legitimate sets of classes. 2) Our method can be used to refine/improve the performance
of either classifier. 3) ECC uses classifiers that have as output yes/no answers, while the approach we use
here comes up with probabilities assigned to each possible class.
Our method is demonstrated in conjunction with the problem of predicting the risk of soil erosion of
burned forests in the Mediterranean region using data concerning relevant factors like soil depth, ground
slope and rock permeability. This problem has been solved in the past using Pearl-Bayes networks [16]
and Fuzzy Logic [12], [11]. The results of these classifiers were available to us and they are combined to
produce a more reliable classification.
2 Data Description
Soil erosion depends on three variables: slope, soil depth and rock permeability. Other factors that
may influence soil erosion are not taken into account as they were uniform in the area of study which
our data refer to. Geophysical data are available from 39 sites of four areas in Greece. For each of these
sites the classification by a human expert to one of five possible classes of soil erosion is also available.
Each of the problem variables takes values from a small set of possible classes. Stassopoulou et al. [16]
implemented a Pearl-Bayes network with which they solved the problem of combining the values of the
attributes, alongside the uncertainties associated with them, in order to infer the probability with which
a site belonged to one of the possible classes of risk of soil erosion. The use of a Pearl-Bayes network
involved the use of conditional probability functions. For the case when the combined attributes and inferred
conclusions are discrete valued quantities, these conditional probabilities are matrices. In the particular
case, as three attributes were combined to assess the risk of soil erosion, if each variable involved could take $n$ possible values, the matrix ought to have been $n \times n \times n \times n$. So, for $n = 5$, there should be 625 elements of the matrix, each expressing the probability of the site belonging to a certain class of soil erosion, given that the site attributes have a certain combination of classes. The calculation of such a large
number of probabilities, however, required the availability of a large number of data. In research problems
one very seldom has at one's disposal enough relevant data for such an estimation. To reduce the severity of
the problem, Stassopoulou et al. quantised all variables of the problem into three classes each, thus having
to compute only 81 conditional probability values. Their results were quite satisfactory: They obtained
consistent results on the training set for 28 out of the 30 training sites, and hardening their conclusions
produced agreement with the expert in 7 out of the 9 test sites. However, in spite of their accuracy, these
results used gross classes, as each variable was quantised only into one of 3 possible classes.
Sasikala et al. [11], solved the same problem, using the same data, but as no numerical restriction
existed, their results classified the risk of soil erosion into one of five possible classes, the same ones
used by the expert who had made the assessment in the first place. In order to solve this problem, Sasikala et al. developed a new fuzzy methodology, which involved a training stage: weights were used for the
membership functions to reflect the relative importance of the combined attributes, and many different
combination rules were tried. The system was trained for the selection of the best set of weights and the
best combination rule. Upon hardening the final classification, they could have consistency in the training
data in 18 out of the 30 sites, and they could predict correctly the class of the test sites in 5 out of the 9 cases. However, the use of weights and a variety of combination rules produced a blunt decision system: in some cases more than one possible class had equally high membership values.
The idea we propose here is to combine the results of the accurate probabilistic classifier, which uses
gross classes, with the results of the blunt fuzzy classifier, which uses finer classes, in order to obtain a final
classification which will be more accurate and less blunt.
3 Dempster-Shafer Theory
The theory of evidence was introduced by Glenn Shafer in 1976 as a mathematical framework for the representation of uncertainty. Let $m_1$ and $m_2$ be two mass functions on the same frame of discernment, $\Theta$. The mass function $m = m_1 \oplus m_2$, which is called the orthogonal summation of $m_1$ and $m_2$, is defined as follows [13], [1], for all non-empty sets $A \subseteq \Theta$:

$$m(A) = \frac{1}{1-K} \sum_{X \cap Y = A} m_1(X)\, m_2(Y) \qquad (1)$$

where

$$K \equiv \sum_{X \cap Y = \emptyset} m_1(X)\, m_2(Y)$$

The factor $1/(1-K)$ is a normalisation factor and is needed to make sure that no mass is assigned to the empty set. In addition, $K$ is a measure of the conflict between the two masses. If $K = 0$ there is no conflict. If $K = 1$, the combination of $m_1$ and $m_2$ does not exist and the two masses are totally, or flatly, contradictory.
For illustration purposes it is convenient to represent this rule of combination graphically, as shown in figure 1. Let $A_1, A_2, \ldots, A_i, \ldots$ and $B_1, B_2, \ldots, B_j, \ldots$ be the focal elements of mass functions $m_1$ and $m_2$ respectively. Along the horizontal side of a unit square we show the mass functions of all focal elements of $m_1$. The width of each strip is proportional to the value of its corresponding mass function. The mass functions of $m_2$ are shown along the vertical side of the same square. The area of the intersection of strips $m_1(A_i)$ and $m_2(B_j)$ (dashed area) represents the amount of mass that is assigned to $A_i \cap B_j$. According to Dempster's combination rule, $m(C) = m_1 \oplus m_2(C)$ is proportional to the sum of the areas of all rectangles for which $C = A_i \cap B_j$. It is possible that for some $i$ and $j$ we have $A_i \cap B_j = \emptyset$. The mass functions have to be scaled in order to make sure that the sum of the mass function over all subsets of $\Theta$ is 1. This is done with the help of $K$ in equation (1). If $A_i \cap B_j = \emptyset$ for all $i$ and $j$, all the mass of the combination goes to the empty set and we say that $m_1$ and $m_2$ are not combinable, or totally contradictory.
Figure 1: Combination of mass functions
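To make the rule concrete, equation (1) can be sketched in a few lines of Python (our illustration, not the authors' implementation), representing each mass function as a dictionary from frozensets of class labels to masses:

```python
from itertools import product

def combine(m1, m2):
    """Orthogonal summation m1 (+) m2 of equation (1).

    m1, m2: dicts mapping frozensets (focal elements over the same
    frame of discernment) to masses summing to 1.
    """
    combined = {}
    conflict = 0.0  # K: total mass falling on the empty set
    for (x, mx), (y, my) in product(m1.items(), m2.items()):
        inter = x & y
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mx * my
        else:
            conflict += mx * my
    if conflict >= 1.0:
        raise ValueError("sources are flatly contradictory (K = 1)")
    # divide by the normalisation factor 1 - K
    return {a: v / (1.0 - conflict) for a, v in combined.items()}
```

If every pair of focal elements has an empty intersection, $K = 1$ and the sources are not combinable.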
4 The Proposed System of Classifier Combination
In this paper we use the Dempster-Shafer theory to combine the items of evidence that come from
the Bayesian network of Stassopoulou et al. and the fuzzy logic classifier of Sasikala et al. One of the
conditions to be able to use the Dempster-Shafer theory is that all sources should have the same frame of
discernment [13], [4]. In our case this is not true: for example, the risk of soil erosion is classified into 3 classes, which we denote by $A_1$, $A_2$, $A_3$, in the Bayesian network method, and into 5 classes, which we denote by $B_1, \ldots, B_5$, in the fuzzy logic method.

To be able to use the Dempster-Shafer theory in this application, we look for a definition of a frame of discernment in which both methods can be defined. Since both methods span the same classification space, we quantise the classification space into 15 classes, $C_1$ to $C_{15}$. These classes can be expressed in both methods because 15 is divisible by both 3 and 5. In other words, the union of, for example, the first 5 new classes, i.e. $C_1 \cup \cdots \cup C_5$, is the same as the first class of the Bayesian network method, i.e. $A_1$. Also, the union of the first 3 new classes, i.e. $C_1 \cup C_2 \cup C_3$, is the same as the first class, i.e. $B_1$, of the fuzzy logic method. Figure 2 shows schematically the idea of defining this superset of classes.
The next step is defining the mass functions from two classifiers. We interpret the beliefs of the Bayesian
network system as mass functions in the Dempster-Shafer theory. Since the output measurements of a
Bayesian network are in the form of probabilities, no further conversion is needed to use them as mass
functions. However, the membership grades of the classes in the fuzzy system, although in the range $[0, 1]$, do not sum up to 1. Therefore we cannot interpret them as mass functions directly. Instead, we use them in order to distribute the unit mass, proportionally, to the corresponding classes.
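A minimal sketch of this proportional distribution (the helper name is ours):

```python
def memberships_to_masses(grades):
    """Distribute the unit mass proportionally to the fuzzy
    membership grades, which lie in [0, 1] but need not sum to 1."""
    total = sum(grades.values())
    if total == 0.0:
        raise ValueError("no class received any membership")
    return {cls: g / total for cls, g in grades.items()}
```

For instance, membership grades of 0.8, 0.4 and 0.2 become masses of 4/7, 2/7 and 1/7.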
Figure 2: Definition of the superset of classes.
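The construction of the superset is easy to express in code (a sketch with illustrative labels, assuming the ordering of classes shown in figure 2):

```python
# The 15 fine classes C1..C15 (illustrative labels).
C = [f"C{k}" for k in range(1, 16)]

# Each Bayesian class A_i is the union of 5 consecutive fine classes;
# each fuzzy class B_j is the union of 3 consecutive fine classes.
A = {f"A{i + 1}": frozenset(C[5 * i: 5 * i + 5]) for i in range(3)}
B = {f"B{j + 1}": frozenset(C[3 * j: 3 * j + 3]) for j in range(5)}
```

Every intersection $A_i \cap B_j$ is then either empty or a union of fine classes; for instance $A_2 \cap B_2$ is the single class $C_6$, the one cell shared by the second Bayesian class and the second fuzzy class.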
Before using the defined mass functions in Dempster's combination rule, another factor that should be taken into consideration is the relative reliability of the two classifiers. If we have the recognition rate, the substitution rate (error rate) and the rejection rate of a classifier, its reliability can be defined as [19]:

$$\mathrm{Reliability} = \frac{\mathrm{Recognition\ rate}}{100\% - \mathrm{Rejection\ rate}} \qquad (2)$$
If a classifier does not include a rejection option, like the Bayesian classifier of Stassopoulou et al. [16], its
reliability is the same as its recognition rate. So, we are going to use as reliability of the Bayesian classifier
its recognition rate.
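As a sketch (rates expressed as percentages; the function name is ours), equation (2) and the no-rejection special case look like:

```python
def reliability(recognition_rate, rejection_rate):
    """Equation (2): recognition and rejection rates in percent.
    With no rejection option the reliability equals the recognition rate."""
    return recognition_rate / (100.0 - rejection_rate) * 100.0
```

For example, a classifier that recognises 85% of the samples and rejects 5% of them has a reliability of 85/95, roughly 89.5%.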
The fuzzy logic classifier, however, was based heavily on the use of individual production rules, which may themselves be treated as a collection of individual classifiers. One does not necessarily expect all the rules used to be equally reliable; indeed, some of them may even be wrong. To assign, therefore, an overall reliability factor to the fuzzy classifier would be equivalent to ignoring the peculiarities of the individual classifiers of which it is a collection. We decided instead to examine, with the help of a training phase, the reliability of the individual firing rules. It is these individual reliability factors that are used to moderate the mass functions of the fuzzy classifier.
In the Dempster-Shafer theory we can interpret unreliability of a source as lack of information. So,
after we scale down the mass functions which we have already defined for each classifier, by taking into
consideration the reliability of the classifier, we assign the remaining mass to the frame of discernment as
lack of information.
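This scaling is essentially a discounting of the mass function; a minimal sketch, with names of our choosing, is:

```python
def discount(masses, reliability, frame):
    """Scale a mass function by the source reliability (in [0, 1]) and
    assign the deficit to the frame of discernment Theta, i.e. treat
    unreliability as lack of information."""
    discounted = {s: reliability * v for s, v in masses.items()}
    discounted[frame] = discounted.get(frame, 0.0) + (1.0 - reliability)
    return discounted
```

The discounted masses again sum to 1, with the mass lost by the focal elements now sitting on $\Theta$.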
In figure 3 the mass functions derived from the Bayesian network and the fuzzy logic system, after taking into consideration the reliability of the classifiers, are denoted by $m_B$ and $m_F$ respectively.
Figure 3: Combination of mass functions
The combination of the two mass functions is denoted by $m$. Note that the square area denoted by, for example, $m(C_6)$ is equal to $m_B(A_2) \times m_F(B_2)$. This value is used in Dempster's rule of combination, given by equation (1), in order to assign mass to $C_6$. As can be seen, in fifteen cases the mass functions which result from the combination of the two sources can be assigned to non-empty sets. Here the factor $K$ of equation (1), summing the mass that falls on the empty set, is:

$$K = m_B(A_1)m_F(B_3) + m_B(A_1)m_F(B_4) + m_B(A_1)m_F(B_5) + m_B(A_2)m_F(B_1) + \qquad (3)$$
$$m_B(A_2)m_F(B_5) + m_B(A_3)m_F(B_1) + m_B(A_3)m_F(B_2) + m_B(A_3)m_F(B_3)$$

For example, we have:

$$m(C_1 \cup C_2 \cup C_3) = \frac{m_B(A_1)\, m_F(B_1)}{1 - K} \qquad (4)$$
(4)
Although we have classified the risk of soil erosion into 15 classes, we would like to have the result in
5 classes as used by the expert and by the fuzzy logic system. Thus, we calculate the belief function
of the classes of interest, by using the mass functions of the focal elements. So, after scaling all mass
functions which are assigned to non empty subsets, the summation of masses of the classes in each row
will be the belief function of the corresponding class. For example, summation of
8E%@_\ž _:hx& , 8!%—_i˜& and
8E%@_iž _:h _i˜& in the second row will be assigned as r mxo %@_Vž _:h _:˜x& which is the belief of the second class
out of the 5 possible classes, i.e.
r mxo %RZ & r mo %@_ ž _ h _ ˜ & !8 %—_ ž _ h &'š›8E%@_ ˜ &š›8E%@_ ž _ h _ ˜ &
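Reading off the belief of a coarse class from the combined mass function then amounts to summing the masses of all focal elements contained in it, as in this sketch with toy numbers of our choosing:

```python
def belief(masses, hypothesis):
    """Bel(H): total combined mass of the focal elements contained in H."""
    return sum(v for focal, v in masses.items() if focal <= hypothesis)

# e.g. Bel(B2) with B2 = C4 u C5 u C6, from illustrative combined masses:
B2 = frozenset({"C4", "C5", "C6"})
combined = {
    frozenset({"C4", "C5"}): 0.20,
    frozenset({"C6"}): 0.10,
    frozenset({"C4", "C5", "C6"}): 0.05,
    frozenset({"C1"}): 0.65,
}
bel_b2 = belief(combined, B2)  # 0.20 + 0.10 + 0.05
```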
Figure 4 shows schematically the proposed combination system.
[Figure 4 block diagram: the slope, soil depth and rock permeability inputs feed both the Bayesian network and the fuzzy logic system; the output of each passes through a mass function generator, moderated by the reliability of the classifiers, into Dempster's combination rule.]
Figure 4: Proposed combination of classifiers system to assess the risk of soil erosion.
5 Experimental Results
If we denote the output beliefs of the Bayesian network by $Bel(A_1)$, $Bel(A_2)$, $Bel(A_3)$, and its recognition rate, expressed as a percentage, by $r_B$, we used:

$$m_B(A_i) = Bel(A_i) \times r_B / 100, \qquad i = 1, 2, 3$$

To deal with the reliability of the fuzzy classifier, we multiplied with weights, $w_i$, $0 \le w_i \le 1$, the different mass functions which resulted from the different expert rules used by it. We used 30
training sites to identify the best weights which would produce the best results. It is worth mentioning that we used an exhaustive search to find the best weights. However, in every set of weights we fixed one of the weights to be 1, in order to make the search space smaller, and because this way the weights measured the relative importance of the various rules. We found that the best results could be obtained when the mass functions of classes $B_2$ and $B_4$ were scaled down by 0.5.*

*This is a very interesting outcome, as it indicates that perhaps less emphasis should be placed on rules that lead to classes other than the two extremes and the middle. We speculate that perhaps people find it easy to classify things in classes like “low”, “medium” and “high”, or “good”, “average” and “bad”, etc. It is more difficult to ask people to classify things in classes like “very low”, “low”, “medium”, “high” and “very high”, or “very good”, “good”, “average”, “bad” and “very bad”. It seems that heuristics devised by people to yield classification in classes inserted between the 3 gross ones may not be as reliable as rules that classify into the clear cut and well defined classes of the two extremes and a medium.
After the reliability of each classifier was taken into consideration, the sum of its mass functions was no longer 1. The difference of the sum from 1 was assigned to the frame of discernment, and is interpreted as the lack of information. For example, for the fuzzy classifier

$$m_F(\Theta) = 1 - \sum_{i=1}^{5} w_i\, m_F(B_i)$$

where $w_i$ are the weights identified during the training phase. For the Bayesian classifier

$$m_B(\Theta) = 1 - \sum_{i=1}^{3} Bel(A_i)\, r_B / 100$$
By using the maximum belief criterion in the decision making step, 6 out of the 9 test sites were correctly classified, and the other 3 sites were classified in the class next to that assigned by the expert. This should be compared with the 5 sites which were correctly classified by the fuzzy classifier alone.
6 Discussion and Conclusions
In classification problems where the classes used represent different grades of the same attribute, it is
possible to have different divisions into classes used by different classifiers. A similar situation may arise
when the individual classifiers are unsupervised and determine the data classes automatically. In such cases
it is not possible to combine the results of the different classifiers in a straightforward way. We proposed
here that one may handle such problems within the framework of the Dempster-Shafer theory. The Dempster-Shafer theory, as a classifier combination method, allows one to deal with the different numbers of classes used by the different classifiers because, unlike Bayesian theory, it assigns probabilities to sets of possible
hypotheses, not just to individual hypotheses. In addition, it allows one to take into consideration the
reliability of the classifiers in the process of mass definition.
We demonstrated our ideas using an example problem of prediction of soil erosion. Within the limitations of this problem, we showed that not only the accuracy of the individual classifiers was improved
but also that a finer set of output classes could be obtained. Although our results are limited and their statistical significance cannot be estimated due to the size of our dataset, this does not detract from
the proposed methodology which is generic and applicable to any situation where the classes defined by
the different classifiers are different. This stems from the ability of the Dempster-Shafer theory to assign
probabilities to sets of possible classes and not just to individual classes.
Acknowledgements
Mohammad Reza Ahmadzadeh was on leave from the University of Shiraz, Iran, when this work was
carried out as part of his PhD thesis. He is grateful to the Ministry of Science, Research and Technology of
Iran for its financial support during his research studies.
References
[1]
J. A. Barnett. Computational methods for a mathematical theory of evidence. In Proceedings of the 7th International Joint Conference on Artificial Intelligence, Vancouver, BC, Canada, volume 2, pages 868–875, Aug. 1981.
[2]
C. Black. The Theory of Committees and Elections. Cambridge University Press, 1963.
[3]
T. G. Dietterich and G. Bakiri. Solving multiclass learning problems via error-correcting output codes. Journal of
Artificial Intelligence Research, 2:263–286, 1995.
[4]
J. W. Guan and D. A. Bell. Evidence Theory and its Applications. Elsevier Science Publishers B.V., 1991.
[5]
T. K. Ho, J. J. Hull and S. N. Srihari. Decision Combination in Multiple Classifier Systems. IEEE Transactions
on Pattern Analysis and Machine Intelligence, 16(1):66–75, 1994.
[6]
H. Kang, K. Kim, and J. H. Kim. A framework for probabilistic combination of multiple classifiers at an abstract
level. Engineering Applications of Artificial Intelligence, 10(4):379–385, 1997.
[7]
J. Kittler. Combining classifiers: A theoretical framework. Pattern Analysis and Applications, 1:18–27, Jan 1998.
[8]
J. Kittler and F. Roli (Eds). Multiple Classifier Systems. Springer LNCS 2096, ISBN 3-540-42284-6, 2001.
[9]
L. Kuncheva. An application of OWA operators to the aggregation of multiple classification decisions. In R. Yager and J. Kacprzyk, editors, The Ordered Weighted Averaging Operators: Theory and Applications. Kluwer Academic Publishers, 1997.
[10] G. Rogova. Combining the result of several neural network classifiers. Neural Networks, 7(5):777–781, 1994.
[11] K. R. Sasikala and M. Petrou. Generalised fuzzy aggregation in estimating the risk of desertification of a burned
forest. Fuzzy Sets and Systems, 118(1):121–137, February 2001.
[12] K. R. Sasikala, M. Petrou, and J. Kittler. Fuzzy classification with a GIS as an aid to decision making. EARSeL Advances in Remote Sensing, 4(4):97–105, November 1996.
[13] G. Shafer. A Mathematical Theory of Evidence. Princeton University Press, 1976.
[14] A. J. C. Sharkey. On combining artificial neural nets. Connection Science, 8(3/4):299–314, 1996.
[15] A. J. C. Sharkey. Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Springer-Verlag,
1998.
[16] A. Stassopoulou, M. Petrou, and J. Kittler. Application of a Bayesian network in a GIS based decision making system. Int. J. Geographical Information Science, 12(1):23–45, 1998.
[17] M. Taniguchi and V. Tresp. Averaging regularized estimation. Neural Computation, 9:1163–1178, 1997.
[18] A. Verikas, A. Lipnickas, K. Malmqvist, M. Bacauskiene, and A. Gelzinis. Soft combination of neural classifiers:
A comparative study. Pattern Recognition Letters, 20(4):429–444, 1999.
[19] L. Xu, A. Krzyzak, and C. Y. Suen. Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Transactions on Systems, Man and Cybernetics, 22(3):418–435, 1992.