Computers in Biology and Medicine 35 (2005) 565 – 582
http://www.intl.elsevierhealth.com/journals/cobm
A mixture of experts network structure for modelling Doppler
ultrasound blood flow signals
İnan Güler∗, Elif Derya Übeyli
Department of Electronics and Computer Education, Faculty of Technical Education, Gazi University,
06500 Teknikokullar, Ankara, Turkey
Received 12 January 2004; accepted 13 April 2004
Abstract
Mixture of experts (ME) is a modular neural network architecture for supervised learning. This paper illustrates the use of the ME network structure to guide the modelling of Doppler ultrasound blood flow signals. The expectation–maximization (EM) algorithm was used for training the ME so that the learning process is decoupled in a manner that fits well with the modular structure. The ophthalmic and internal carotid arterial Doppler signals were decomposed into time–frequency representations using the discrete wavelet transform, and statistical features were calculated to depict their distribution. The ME network structures were implemented for the diagnosis of ophthalmic and internal carotid arterial disorders using the statistical features as inputs. To improve diagnostic accuracy, the outputs of the expert networks were combined by a gating network, trained simultaneously, that stochastically selects the expert performing best at solving the problem. The ME network structure achieved accuracy rates higher than those of the stand-alone neural network models.
© 2004 Elsevier Ltd. All rights reserved.
Keywords: Mixture of experts; Expectation–Maximization algorithm; Diagnostic accuracy; Discrete wavelet transform;
Doppler signal; Ophthalmic artery; Internal carotid artery
∗ Corresponding author. Tel.: +90-312-212-3976; fax: +90-312-212-0059.
E-mail address: [email protected] (İ. Güler).
0010-4825/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiomed.2004.04.001
1. Introduction
There has recently been widespread interest in the use of multiple models for pattern classification and regression in the statistics and neural network communities. The basic idea underlying these methods is the application of the so-called divide-and-conquer principle, which is often used to tackle a complex problem by dividing it into simpler problems whose solutions can be combined to yield a final solution. Utilizing this principle, Jacobs et al. [1] proposed a modular neural network architecture called mixture of experts (ME). The ME models the conditional probability density of the target output by mixing the outputs from a set of local experts, each of which separately derives a conditional probability density of the target output. The ME weights the input space by using the posterior probabilities that the expert networks generated for getting the output from the input. The outputs of the expert networks are combined by a gating network, trained simultaneously, that stochastically selects the expert performing best at solving the problem [2,3]. As pointed out by Jordan and Jacobs [4], the gating network performs a typical multiclass classification task [5–7].
The expectation–maximization (EM) algorithm has been introduced to the ME architecture so that the learning process is decoupled in a manner that fits well with the modular structure [2–4]. The EM algorithm can be extended to provide an effective training mechanism for MEs based on a Gaussian probability assumption. Although the model structure is originally predetermined and the training algorithm is based on the Gaussian probability assumption for each expert model output, the ME framework is a powerful concept that can be extended to a wide variety of applications, including medical diagnostic decision support systems, owing to several inherent advantages: (i) A global model can be decomposed into a set of simple local models, from which controller design is straightforward. Each model can represent a different data source with an associated state estimator/predictor. In this case, the ME system can be viewed as a data fusion algorithm. (ii) The local models operate independently but provide outputs that can be strongly correlated with each other, so that the overall system performance can be enhanced in terms of reliability or fault tolerance. (iii) The global output of the ME system is derived as a convex combination of the outputs from a set of N experts, and the predictive performance of the overall system is generally superior to that of any individual expert [2–5].
Neural networks have been successfully used in a variety of medical applications [8,9]. Recent advances in the field of neural networks have made them attractive for analyzing signals. The application of neural networks has opened a new area for solving problems not resolvable by other signal processing techniques [10,11]. However, neural network analysis of Doppler shift signals is a relatively new approach [12–14]. Doppler ultrasound is widely used as a noninvasive method for the assessment of blood flow in both the central and peripheral circulation [15,16]. It may be used to estimate blood flow, to image regions of blood flow and to locate sites of arterial disease, as well as to assess flow characteristics and resistance of ophthalmic and internal carotid arteries [17–21].
To date, no study in the literature has assessed the accuracy of ME in the analysis of Doppler shift signals. In this study, experimental results on ME predictions for the diagnosis of ophthalmic arterial diseases and internal carotid arterial diseases are presented. In the configuration of the ME for the diagnosis of ophthalmic arterial disorders, we used four local experts and a gating network, which were in the form of multilayer perceptron neural networks (MLPNNs), since there were four possible outcomes of the diagnosis of ophthalmic arterial conditions (healthy, ophthalmic artery stenosis, ocular Behcet disease, uveitis disease). In the development of the ME for the diagnosis of internal carotid arterial disorders, we used three local experts and a gating network, which were in the form of MLPNNs, because there were three possible outcomes of the diagnosis of internal carotid arterial conditions (healthy, internal carotid artery stenosis, internal carotid artery occlusion). We were able to achieve a significant improvement in accuracy by using the ME network structures compared to the stand-alone neural networks used in our previous studies [13,14].
The outline of this study is as follows. In Section 2, we explain spectral analysis of signals using the discrete wavelet transform (DWT) in order to extract features characterizing the behavior of the signal under study. In Section 3, we present a description of the neural network models, including the MLPNN and ME architectures used in this study. We also explain the EM algorithm used for training the ME networks. In Section 4, we present the results of applying the ME networks to ophthalmic and internal carotid arterial Doppler signals. Finally, in Section 5 we conclude the study.
2. Spectral analysis using discrete wavelet transform
The Doppler shift signal contains a wealth of information about blood flow occurring within the sample volume of the Doppler ultrasonography. The most complete way to display this information is to perform spectral analysis. The wavelet transform (WT) provides very general techniques which can be applied to many tasks in signal processing. One very important application is the ability to compute and manipulate data in compressed parameters which are often called features [22–24]. Thus, the Doppler signal, consisting of many data points, can be compressed into a few parameters. These parameters characterize the behavior of the Doppler signal. This feature of using a smaller number of parameters to represent the Doppler signal is particularly important for recognition and diagnostic purposes. The WT can be thought of as an extension of the classic Fourier transform, except that, instead of working on a single scale (time or frequency), it works on a multi-scale basis. This multi-scale feature of the WT allows the decomposition of a signal into a number of scales, each scale representing a particular coarseness of the signal under study. The procedure of multiresolution decomposition of a signal x[n] is schematically shown in Fig. 1. Each stage of this scheme consists of two digital filters and two downsamplers by 2. The first filter, g[·], is the discrete mother wavelet, high-pass in nature, and the second, h[·], is its mirror version, low-pass in nature. The downsampled outputs of the first high-pass and low-pass filters provide the detail, D1, and the approximation, A1, respectively. The first approximation, A1, is further decomposed and this process is continued as shown in Fig. 1.
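The two-channel scheme described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the Haar (db1) filter pair, a signal whose length is divisible by 2 at every level, and the hypothetical function names `filter_and_downsample` and `multiresolution_decompose`. The sign convention of the high-pass filter is also a choice made here for simplicity.

```python
import math

# ASSUMPTION: Haar (db1) analysis filters, chosen only for illustration.
LOW_PASS = [1 / math.sqrt(2), 1 / math.sqrt(2)]    # h[n], low-pass
HIGH_PASS = [1 / math.sqrt(2), -1 / math.sqrt(2)]  # g[n], high-pass


def filter_and_downsample(signal, taps):
    """Apply a two-tap filter to non-overlapping pairs, i.e. filter and keep every second sample."""
    return [taps[0] * signal[n] + taps[1] * signal[n + 1]
            for n in range(0, len(signal) - 1, 2)]


def multiresolution_decompose(signal, levels):
    """Return ([D1, ..., D_levels], A_levels) by repeatedly splitting the approximation."""
    details = []
    approximation = list(signal)
    for _ in range(levels):
        # The detail is taken from the current approximation before it is replaced.
        details.append(filter_and_downsample(approximation, HIGH_PASS))
        approximation = filter_and_downsample(approximation, LOW_PASS)
    return details, approximation


if __name__ == "__main__":
    x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    details, approx = multiresolution_decompose(x, levels=3)
    print([len(d) for d in details], len(approx))
```

Because the Haar pair is orthonormal, the squared coefficients across all details and the final approximation sum to the energy of the input, which is a convenient sanity check for any implementation of this scheme.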
All wavelet transforms can be specified in terms of a low-pass filter $h$, which satisfies the standard quadrature mirror filter condition:
$$H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 1, \tag{1}$$
where $H(z)$ denotes the z-transform of the filter $h$. Its complementary high-pass filter can be defined as
$$G(z) = zH(-z^{-1}). \tag{2}$$
A sequence of filters with increasing length (indexed by $i$) can be obtained:
$$H_{i+1}(z) = H(z^{2^i})H_i(z), \qquad G_{i+1}(z) = G(z^{2^i})H_i(z), \qquad i = 0, \ldots, I-1, \tag{3}$$
with the initial condition $H_0(z) = 1$. It is expressed as a two-scale relation in the time domain:
$$h_{i+1}(k) = [h]_{\uparrow 2^i} * h_i(k), \qquad g_{i+1}(k) = [g]_{\uparrow 2^i} * h_i(k), \tag{4}$$
Fig. 1. Subband decomposition of discrete wavelet transform implementation; g[n] is the high-pass filter, h[n] is the low-pass filter.
where the subscript $[\cdot]_{\uparrow m}$ indicates up-sampling by a factor of $m$ and $k$ is the equally sampled discrete time.
The normalized wavelet and scale basis functions $\varphi_{i,l}(k)$, $\psi_{i,l}(k)$ can be defined as
$$\varphi_{i,l}(k) = 2^{i/2} h_i(k - 2^i l), \qquad \psi_{i,l}(k) = 2^{i/2} g_i(k - 2^i l), \tag{5}$$
where the factor $2^{i/2}$ is an inner product normalization, and $i$ and $l$ are the scale parameter and the translation parameter, respectively. The DWT decomposition can be described as
$$a_{(i)}(l) = x(k) * \varphi_{i,l}(k), \qquad d_{(i)}(l) = x(k) * \psi_{i,l}(k), \tag{6}$$
where $a_{(i)}(l)$ and $d_{(i)}(l)$ are the approximation coefficients and the detail coefficients at resolution $i$, respectively [22–24].
3. Description of neural network models
3.1. Multilayer perceptron neural network
Artificial neural networks (ANNs) may be defined as structures comprised of densely interconnected adaptive simple processing elements (neurons) that are capable of performing massively parallel computations for data processing and knowledge representation. ANNs can be trained to recognize patterns, and the nonlinear models developed during training allow neural networks to generalize their conclusions and to apply them to patterns not previously encountered [10,11,25]. The MLPNNs are the most commonly used neural network architectures since they have features such as the ability to learn and generalize, smaller training set requirements, fast operation, and ease of implementation. An MLPNN consists of (i) an input layer with neurons representing input variables to the problem, (ii) an output layer with neurons representing the dependent variables (what is being modeled), and (iii) one or more hidden layers containing neurons to help capture the nonlinearity in the data. The MLPNN is a nonparametric technique for performing a wide variety of detection and estimation tasks [10,11,25].
Fig. 2. The architecture of mixture of experts.
3.2. Mixture of experts and expectation–maximization algorithm
As illustrated in Fig. 2, the ME architecture is composed of a gating network and several expert networks. The gating network receives the vector $x$ as input and produces scalar outputs that form a partition of unity at each point in the input space. Each expert network produces an output vector for an input vector. The gating network provides linear combination coefficients as veridical probabilities for the expert networks and, therefore, the final output of the ME architecture is a convex weighted sum of all the output vectors produced by the expert networks. Suppose that there are $N$ expert networks in the ME architecture. All the expert networks are linear with a single output nonlinearity, which is also referred to as “generalized linear”. The $i$th expert network produces its output $o_i(x)$ as a generalized linear function of the input $x$:
$$o_i(x) = f(W_i x), \tag{7}$$
where $W_i$ is a weight matrix and $f(\cdot)$ is a fixed continuous nonlinearity. The gating network is also a generalized linear function, and its $i$th output, $g(x, v_i)$, is the multinomial logit or softmax function of intermediate variables $\xi_i$:
$$g(x, v_i) = \frac{e^{\xi_i}}{\sum_{k=1}^{N} e^{\xi_k}}, \tag{8}$$
where $\xi_i = v_i^T x$ and $v_i$ is a weight vector. The overall output $o(x)$ of the ME architecture is
$$o(x) = \sum_{k=1}^{N} g(x, v_k)\, o_k(x). \tag{9}$$
The ME architecture can be given a probabilistic interpretation. For an input–output pair $(x, y)$, the values of $g(x, v_i)$ are interpreted as the multinomial probabilities associated with the decision that terminates in a regressive process that maps $x$ to $y$. Once the decision has been made, resulting in a choice of regressive process $i$, the output $y$ is then chosen from a probability density $P(y \mid x, W_i)$, where $W_i$ denotes the set of parameters or weight matrix of the $i$th expert network in the model. Therefore, the total probability of generating $y$ from $x$ is the mixture of the probabilities of generating $y$ from each of the component densities, where the mixing proportions are multinomial probabilities:
$$P(y \mid x, \Theta) = \sum_{k=1}^{N} g(x, v_k)\, P(y \mid x, W_k), \tag{10}$$
where $\Theta$ is the set of all the parameters, including both expert and gating network parameters. Moreover, the probabilistic component of the model is generally assumed to be a Gaussian distribution in the case of regression, a Bernoulli distribution in the case of binary classification, and a multinomial distribution in the case of multiclass classification [1–3].
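The forward pass in Eqs. (7)–(9) can be illustrated with a short Python sketch. It is not the authors' code: the logistic function is assumed for the fixed nonlinearity f, the weights are made-up numbers, and the names `softmax`, `matvec`, `logistic` and `mixture_output` are hypothetical helpers introduced only for this example.

```python
import math


def softmax(scores):
    """Softmax of Eq. (8); the maximum is subtracted first for numerical stability."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vector):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def logistic(value):
    """ASSUMPTION: the fixed nonlinearity f of Eq. (7) is taken to be the logistic function."""
    return 1.0 / (1.0 + math.exp(-value))


def mixture_output(x, expert_weights, gating_weights):
    """Return (o(x), gates): gates follow Eq. (8), experts Eq. (7), combination Eq. (9)."""
    gates = softmax(matvec(gating_weights, x))          # row i of gating_weights is v_i
    expert_outputs = [[logistic(a) for a in matvec(W, x)] for W in expert_weights]
    n_outputs = len(expert_outputs[0])
    combined = [sum(g * o[j] for g, o in zip(gates, expert_outputs))
                for j in range(n_outputs)]
    return combined, gates


if __name__ == "__main__":
    x = [1.0, -2.0]
    experts = [[[0.5, 0.1]], [[-0.3, 0.8]]]   # two experts, one output each
    gating = [[0.2, -0.1], [-0.4, 0.3]]       # one weight vector per expert
    output, gates = mixture_output(x, experts, gating)
    print(output, gates)
```

Since the gates are non-negative and sum to one, every component of the combined output lies between the smallest and largest corresponding expert output, which is what "convex weighted sum" means in practice.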
Based on the probabilistic model in Eq. (10), learning in the ME architecture is treated as a maximum likelihood problem. Jordan and Jacobs [4] have proposed an EM algorithm for adjusting the parameters of the architecture. Suppose that the training set is given as $\chi = \{(x_t, y_t)\}_{t=1}^{T}$. The EM algorithm consists of two steps. For the $s$th epoch, the posterior probabilities $h_i^{(t)}$ $(i = 1, \ldots, N)$, which can be interpreted as the probabilities $P(i \mid x_t, y_t)$, are computed in the E-step as
$$h_i^{(t)} = \frac{g(x_t, v_i^{(s)})\, P(y_t \mid x_t, W_i^{(s)})}{\sum_{k=1}^{N} g(x_t, v_k^{(s)})\, P(y_t \mid x_t, W_k^{(s)})}. \tag{11}$$
The M-step solves the following maximization problems:
$$W_i^{(s+1)} = \arg\max_{W_i} \sum_{t=1}^{T} h_i^{(t)} \log P(y_t \mid x_t, W_i) \tag{12}$$
and
$$V^{(s+1)} = \arg\max_{V} \sum_{t=1}^{T} \sum_{k=1}^{N} h_k^{(t)} \log g(x_t, v_k), \tag{13}$$
where $V$ is the set of all the parameters in the gating network. Therefore, the EM algorithm is summarized as
1. For each data pair $(x_t, y_t)$, compute the posterior probabilities $h_i^{(t)}$ using the current values of the parameters.
2. For each expert network $i$, solve the maximization problem in Eq. (12) with observations $\{(x_t, y_t)\}_{t=1}^{T}$ and observation weights $\{h_i^{(t)}\}_{t=1}^{T}$.
3. For the gating network, solve the maximization problem in Eq. (13) with observations $\{(x_t, h_k^{(t)})\}_{t=1}^{T}$.
4. Iterate by using the updated parameter values.
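Step 1 of the summary, the E-step of Eq. (11), can be sketched as follows for a single training pair. This is a hedged illustration only: it assumes a scalar target and a unit-variance Gaussian for the expert density P(y | x, W_i), and the function names `gaussian_density` and `e_step` are hypothetical. The M-step is not shown because its form depends on the expert and gating models chosen.

```python
import math


def gaussian_density(y, mean, variance=1.0):
    """ASSUMPTION: scalar Gaussian density standing in for P(y | x, W_i)."""
    return math.exp(-(y - mean) ** 2 / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


def e_step(gate_probs, expert_means, target):
    """Eq. (11) for one pair: h_i = g_i P(y | x, W_i) / sum_k g_k P(y | x, W_k)."""
    joint = [g * gaussian_density(target, m) for g, m in zip(gate_probs, expert_means)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # Two experts with equal prior gate probabilities; the first predicts closer to the target.
    posteriors = e_step(gate_probs=[0.5, 0.5], expert_means=[0.9, 0.1], target=1.0)
    print(posteriors)
```

The resulting posteriors sum to one and shift weight toward the expert whose prediction is closest to the target; in the M-step they act as the observation weights of Eq. (12) and as the targets of the gating network in Eq. (13).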
In this framework, a number of relatively small expert networks can be used together with a gating network designed to divide the global classification task into simpler subtasks (Fig. 2) [1–4]. In the present study, both the gating and expert networks were MLPNNs consisting of neurons arranged in contiguous layers. This configuration was chosen because the MLPNN has features such as the ability to learn and generalize, smaller training set requirements, fast operation, and ease of implementation. The ME network structures proposed for the diagnosis of ophthalmic arterial disorders and internal carotid arterial disorders were implemented by using the MATLAB software package (MATLAB version 6.5 with neural networks toolbox).
4. Experimental results
4.1. Feature extraction using discrete wavelet transform
Diagnosis of arterial diseases is feasible by analysis of spectral shape and parameters. Since flow in arteries is pulsatile and the moving targets have a random spatial distribution, the Doppler signal is time-varying and random. It is known that the WT is better suited to analyzing nonstationary signals, since it is well localized in time and frequency. The property of time and frequency localization is known as compact support and is one of the most attractive features of the WT. The main advantage of the WT is that it has a varying window size, being broad at low frequencies and narrow at high frequencies, thus leading to an optimal time–frequency resolution in all frequency ranges [22–24]. Therefore, spectral analysis of the ophthalmic and internal carotid arterial Doppler signals was performed using the DWT as described in Section 2.
Selection of an appropriate wavelet and the number of decomposition levels is very important in the analysis of signals using the WT. The number of decomposition levels is chosen based on the dominant frequency components of the signal. The levels are chosen such that those parts of the signal that correlate well with the frequencies required for classification of the signal are retained in the wavelet coefficients. In the present study, since the Doppler signals do not have any useful frequency components below 40 Hz, the number of decomposition levels was chosen to be 7. Thus, the ophthalmic and internal carotid arterial Doppler signals were decomposed into details D1–D7 and one final approximation, A7. The ranges of the various frequency bands are given in Table 1. Usually, tests are performed with different types of wavelets and the one which gives maximum efficiency is selected for the particular application. The smoothing feature of the Daubechies wavelet of order 1 (db1) made it more suitable for detecting changes in arterial Doppler signals. Therefore, the wavelet coefficients were computed using db1 in the present study. In order to investigate the effect of other wavelets on classification accuracy, tests were also carried out using other wavelets. Apart from db1, Symmlet of order 10 (sym10), Coiflet of order 4 (coif4), and Daubechies of order 8 (db8) were also tried. It was seen that the Daubechies wavelet offers better accuracy than the others, and db1 is marginally better than db8. The discrete wavelet coefficients were computed using the MATLAB software package.
Table 1
Ranges of frequency bands in wavelet decomposition
Decomposed signal    Frequency range (Hz)
D1                   2500–5000
D2                   1250–2500
D3                   625–1250
D4                   312.5–625
D5                   156.25–312.5
D6                   78.13–156.25
D7                   39.07–78.13
A7                   0–39.07
Feature selection is an important component of designing a neural network for pattern classification, since even the best classifier will perform poorly if the features used as inputs are not selected well. The computed discrete wavelet coefficients provide a compact representation that shows the energy distribution of the signal in time and frequency. Therefore, the computed detail wavelet coefficients of the ophthalmic and internal carotid arterial Doppler signals of each subject were used as the feature vectors representing the signals. It was observed that the values of the coefficients are very close to zero in A7. The coefficients corresponding to the frequency band A7 were therefore discarded, thus reducing the number of feature vectors representing the signal. In order to further reduce the dimensionality of the extracted feature vectors, statistics over the set of the wavelet coefficients were used. The following statistical features were used to represent the time–frequency distribution of the Doppler signals:
1. mean of the absolute values of the coefficients in each subband;
2. maximum of the absolute values of the coefficients in each subband;
3. average power of the wavelet coefficients in each subband;
4. standard deviation of the coefficients in each subband;
5. ratio of the absolute mean values of adjacent subbands;
6. distribution distortion of the coefficients in each subband.
Features 1–3 represent the frequency distribution of the signal and features 4–6 the amount of change in the frequency distribution. These feature vectors, calculated for the frequency bands D1–D7, were used for classification of the ophthalmic and internal carotid arterial Doppler signals. In some applications, in order to further reduce the dimensionality of the extracted feature vectors, only some of the statistical features given in this section can be used to represent the time–frequency distribution of the signal under study. However, in our applications all of the statistical features (6 statistical features) were used to represent the ophthalmic and internal carotid arterial Doppler signals.
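The six statistics listed above can be sketched in Python as follows. Several details are assumptions, since the text does not give formulas: the standard deviation is the population form, the ratio of feature 5 is taken between each subband and the next coarser one, and "distribution distortion" is approximated by sample skewness. The function names `subband_features` and `feature_vector` are hypothetical.

```python
import math


def subband_features(coefficients):
    """Statistics of one subband (features 1-4 and 6 in the list above)."""
    n = len(coefficients)
    magnitudes = [abs(c) for c in coefficients]
    mean = sum(coefficients) / n
    variance = sum((c - mean) ** 2 for c in coefficients) / n   # population variance
    std = math.sqrt(variance)
    # ASSUMPTION: "distribution distortion" is approximated by sample skewness.
    skewness = (sum((c - mean) ** 3 for c in coefficients) / n) / std ** 3 if std > 0 else 0.0
    return {
        "mean_abs": sum(magnitudes) / n,
        "max_abs": max(magnitudes),
        "avg_power": sum(c * c for c in coefficients) / n,
        "std": std,
        "distortion": skewness,
    }


def feature_vector(subbands):
    """Concatenate per-subband statistics, then append the adjacent-subband ratios (feature 5)."""
    per_band = [subband_features(band) for band in subbands]
    features = []
    for stats in per_band:
        features.extend(stats.values())
    for finer, coarser in zip(per_band, per_band[1:]):
        denominator = coarser["mean_abs"]
        features.append(finer["mean_abs"] / denominator if denominator > 0 else 0.0)
    return features


if __name__ == "__main__":
    bands = [[1.0, -1.0, 2.0, -2.0], [3.0, 1.0], [0.5, -0.5]]
    print(len(feature_vector(bands)))
```

With seven subbands this layout gives 5 × 7 per-subband values plus 6 adjacent ratios, i.e. 41 values, which agrees with the number of MLPNN inputs reported in Section 4.2; the paper does not spell out this breakdown, so the match is suggestive rather than confirmed.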
4.2. Application of mixture of experts to ophthalmic arterial Doppler signals
The ophthalmic arterial Doppler signals were obtained from 214 subjects. The group consisted of 103 females and 111 males with ages ranging from 19 to 65 years and a mean age of 33.5 years (standard deviation (SD) 9.8). Diasonics Synergy color Doppler ultrasonography was used during the examinations and sonograms were taken into consideration. According to the examination results, 52 of the 214 subjects suffered from ophthalmic artery stenosis, 54 suffered from ocular Behcet disease, 45 suffered from uveitis disease, and the rest were healthy subjects (control group) who had no ocular or systemic disease. The group suffering from ophthalmic artery stenosis consisted of 25 females and 27 males with a mean age of 35.5 years (SD 8.3, range 23–65), the group suffering from ocular Behcet disease consisted of 25 females and 29 males with a mean age of 35.5 years (SD 8.5, range 21–63), the group suffering from uveitis disease consisted of 22 females and 23 males with a mean age of 34.5 years (SD 8.1, range 22–62), and the healthy subjects were 31 females and 32 males with a mean age of 30.0 years (SD 9.3, range 19–64).
Ophthalmic artery examinations were performed with a Doppler unit using a 10 MHz ultrasonic transducer. The measurement system consisted of five units: a 10 MHz ultrasonic transducer, an analog Doppler unit (Diasonics Synergy), a recorder (Sony), an analog/digital interface board (Sound Blaster Pro-16 bit), and a personal computer with a printer. The ultrasonic transducer was applied on a horizontal plane to the closed eyelids using sterile methylcellulose as a coupling gel. Care was taken not to apply pressure to the eye in order to avoid artifacts. The probe was most often placed at an angle of 60° from the midline, pointing towards the orbital apex. Good and consistent signals were obtained at 37–42 mm depth. The ophthalmic arterial Doppler signals were sampled at 5 kHz and framed by equal time intervals. The frame length was chosen as 256.
The ME architecture used for the diagnosis of ophthalmic arterial disorders is shown in Fig. 2. Since we investigated four-group classification exclusively, the ME was configured with four local experts and a gating network, which were in the form of MLPNNs. ANN architectures are derived by trial and error, and the complexity of the neural network is characterized by the number of hidden layers. There is no general rule for selecting the appropriate number of hidden layers. A neural network with a small number of neurons may not be sufficiently powerful to model a complex function. On the other hand, a neural network with too many neurons may overfit the training set and lose its ability to generalize, which is the main desired characteristic of a neural network. The most popular approach to finding the optimal number of hidden layers is trial and error. Our architecture studies confirmed that for the ophthalmic arterial Doppler signals, a minimal network has better generalization properties and results in higher classification accuracy. For this data, MLPNNs with one hidden layer were superior to models with two hidden layers. The most suitable network configuration found had 10 neurons in the hidden layer and 4 outputs. Samples with target outputs healthy, ophthalmic artery stenosis, ocular Behcet disease, and uveitis disease were given the binary target values of (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), and (1, 0, 0, 0), respectively.
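The binary target coding described above can be written down directly. The following Python sketch uses the four codes given in the text; the decoding rule (pick the class whose code has its 1 at the position of the largest network output) is an assumption, since the paper does not state how outputs were mapped back to classes, and `TARGETS` and `decode` are hypothetical names.

```python
# Target codes exactly as listed in the text.
TARGETS = {
    "healthy": (0, 0, 0, 1),
    "ophthalmic artery stenosis": (0, 0, 1, 0),
    "ocular Behcet disease": (0, 1, 0, 0),
    "uveitis disease": (1, 0, 0, 0),
}


def decode(outputs):
    """ASSUMPTION: winner-takes-all decoding of a 4-element network output."""
    winner = max(range(len(outputs)), key=lambda j: outputs[j])
    for label, target in TARGETS.items():
        if target[winner] == 1:
            return label
    raise ValueError("no class encodes output index %d" % winner)


if __name__ == "__main__":
    print(decode([0.05, 0.10, 0.15, 0.90]))
```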
The adequate functioning of neural networks depends on the sizes of the training set and test set. In the ME, 80 of the 214 subjects were used for training and the rest for testing. A practical way to find a point of better generalization is to use a small percentage (around 20%) of the training set for cross validation. To obtain better network generalization, 16 training subjects were selected randomly to be used as a cross validation set. The training set consisted of 20 subjects suffering from ophthalmic artery stenosis, 20 subjects suffering from ocular Behcet disease, 20 subjects suffering from uveitis disease, and 20 healthy subjects. The testing set consisted of 32 subjects suffering from ophthalmic artery stenosis, 34 subjects suffering from ocular Behcet disease, 25 subjects suffering from uveitis disease, and 43 healthy subjects. The cross validation set consisted of 4 subjects suffering from ophthalmic artery stenosis, 4 subjects suffering from ocular Behcet disease, 4 subjects suffering from uveitis disease, and 4 healthy subjects.
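The subject counts above are consistent with a per-class split of 20 training subjects (4 of them held out for cross validation) and the remainder for testing. The Python sketch below reproduces that bookkeeping. Subject identifiers are placeholder integers, the random assignment of training subjects is an assumption (the text only says the cross validation subjects were selected randomly), and `stratified_split` is a hypothetical name.

```python
import random

# Class sizes from the text: 214 subjects in total, 63 of them healthy.
CLASS_SIZES = {"healthy": 63, "stenosis": 52, "behcet": 54, "uveitis": 45}
TRAIN_PER_CLASS = 20
VALIDATION_PER_CLASS = 4


def stratified_split(class_sizes, seed=0):
    """Return, per class, lists of placeholder subject indices for each subset."""
    rng = random.Random(seed)
    split = {}
    for label, size in class_sizes.items():
        subjects = list(range(size))
        rng.shuffle(subjects)
        train = subjects[:TRAIN_PER_CLASS]
        split[label] = {
            "train": train,
            "validation": train[:VALIDATION_PER_CLASS],   # drawn from the training subjects
            "test": subjects[TRAIN_PER_CLASS:],
        }
    return split


if __name__ == "__main__":
    split = stratified_split(CLASS_SIZES)
    for label, parts in split.items():
        print(label, len(parts["train"]), len(parts["validation"]), len(parts["test"]))
```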
The computed discrete wavelet coefficients were used as the inputs of the MLPNNs employed in the architecture of the ME. In order to extract features, the wavelet coefficients corresponding to the D1–D7 frequency bands of the ophthalmic arterial Doppler signals were computed. For each ophthalmic arterial Doppler signal frame (256 samples), the detail wavelet coefficients (d_k, k = 1, 2, 3, 4, 5, 6, 7) at the first, second, third, fourth, fifth, sixth and seventh levels (128 + 64 + 32 + 16 + 8 + 4 + 2 coefficients) were computed. Thus, 254 detail wavelet coefficients were obtained for each ophthalmic arterial Doppler signal frame. In order to reduce the dimensionality of the extracted feature vectors, the statistics explained in Section 4.1 were computed over the set of the wavelet coefficients. The MLPNNs then had 41 inputs, equal to the number of input features. The detail wavelet coefficients corresponding to the D1 frequency band of the ophthalmic arterial Doppler signals obtained from a 35-year-old healthy subject (subject no: 12), a 29-year-old subject suffering from ophthalmic artery stenosis (subject no: 17), a 37-year-old subject suffering from ocular Behcet disease (subject no: 35), and a 34-year-old subject suffering from uveitis disease (subject no: 41) are given in Figs. 3(a)–(d), respectively. It can be noted that the detail wavelet coefficients of the ophthalmic arterial Doppler signals obtained from a healthy subject (Fig. 3(a)) and subjects suffering from ophthalmic arterial diseases (Figs. 3(b)–(d)) are different from each other.
Fig. 3. The detail wavelet coefficients corresponding to the D1 frequency band of the ophthalmic arterial Doppler signals recorded from: (a) 35-year-old healthy subject (subject no: 12), (b) 29-year-old subject suffering from ophthalmic artery stenosis (subject no: 17), (c) 37-year-old subject suffering from ocular Behcet disease (subject no: 35), (d) 34-year-old subject suffering from uveitis disease (subject no: 41). In each panel, the horizontal axis is the number of detail wavelet coefficients and the vertical axis is the detail wavelet coefficient value.
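The coefficient counts quoted above follow from halving the frame length at each decomposition level. A short Python check, assuming a two-tap wavelet such as db1 and no boundary extension (longer filters with padding would generally give slightly different counts), is given below; `detail_lengths` is a hypothetical helper name.

```python
def detail_lengths(frame_length, levels):
    """Number of detail coefficients per level when each stage halves the length."""
    lengths = []
    remaining = frame_length
    for _ in range(levels):
        remaining //= 2
        lengths.append(remaining)
    return lengths


if __name__ == "__main__":
    lengths = detail_lengths(256, 7)
    print(lengths, sum(lengths))   # [128, 64, 32, 16, 8, 4, 2] 254
```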
Training holds the key to an accurate solution, so the criterion for stopping training must be well defined. When the network is trained too much, it memorizes the training patterns and does not generalize well. Cross validation is a highly recommended criterion for stopping the training of a network. When the error on the cross validation set increases, the training should be stopped because the point of best generalization has been reached. Training of the ME was done in 400 epochs since the cross validation errors began to rise at 400 epochs. Because the mean squared error (MSE) values converged to small constants of approximately zero in 400 epochs, training of the ME was determined to be successful. In contrast, in our previous study [13] the stand-alone MLPNN trained with the least-mean squares backpropagation algorithm converged slowly, and its MSE converged to a small constant of approximately zero in 3000 epochs. The backpropagation algorithm searched for a globally optimal solution to the classification problem, so the number of epochs required for convergence increased. In the ME, however, the classification problem was divided into simpler problems and the solutions were then combined. In addition, the training algorithm of the ME is a general technique for maximum likelihood estimation that fits well with the modular structure and enables a significant speed-up over the backpropagation algorithm. Thus, the convergence rate of the ME presented in this study was found to be higher than that of the stand-alone MLPNN used in the previous study [13].
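The cross validation stopping criterion described above can be sketched as a small Python function. This is an illustrative rule, not the procedure used in the study: the `patience` parameter (how many consecutive non-improving epochs to tolerate) and the name `stopping_epoch` are assumptions, and the error values in the example are made up.

```python
def stopping_epoch(validation_errors, patience=1):
    """Return the 1-based epoch with the lowest validation error seen before the error
    has failed to improve for `patience` consecutive epochs."""
    best_epoch, best_error, stalls = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors, start=1):
        if error < best_error:
            best_epoch, best_error, stalls = epoch, error, 0
        else:
            stalls += 1
            if stalls >= patience:
                break   # validation error stopped improving: stop training
    return best_epoch


if __name__ == "__main__":
    errors = [0.90, 0.50, 0.30, 0.35, 0.40]   # hypothetical validation errors per epoch
    print(stopping_epoch(errors))
```

A larger `patience` makes the rule less sensitive to small fluctuations in the validation error, at the cost of a few extra training epochs.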
In classification, the aim is to assign the input patterns to one of several classes, usually represented by outputs restricted to lie in the range from 0 to 1, so that they represent the probability of class membership. While the classification is carried out, a specific pattern is assigned to a specific class according to the characteristic features selected for it. In this application, there were four classes: healthy, ophthalmic artery stenosis, ocular Behcet disease, uveitis disease. Classification results of the ME were displayed by a confusion matrix. The confusion matrix showing the classification results of the ME is given below.
Confusion matrix
Output/desired                        Result      Result (ophthalmic   Result (ocular    Result (uveitis
                                      (healthy)   artery stenosis)     Behcet disease)   disease)
Result (healthy)                      42          0                    0                 0
Result (ophthalmic artery stenosis)   1           31                   1                 0
Result (ocular Behcet disease)        0           1                    32                1
Result (uveitis disease)              0           0                    1                 24
According to the confusion matrix, one healthy subject was classified incorrectly by the ME as a subject suffering from ophthalmic artery stenosis, one subject suffering from ophthalmic artery stenosis was classified as a subject suffering from ocular Behcet disease, one subject suffering from ocular Behcet disease was classified as a subject suffering from ophthalmic artery stenosis, one subject suffering from ocular Behcet disease was classified as a subject suffering from uveitis disease, and one subject suffering from uveitis disease was classified as a subject suffering from ocular Behcet disease.
The test performance of the ME was determined by the computation of the following statistical parameters:
Specificity: number of correctly classified healthy subjects/number of total healthy subjects.
Sensitivity (ophthalmic artery stenosis): number of correctly classified subjects suffering from ophthalmic artery stenosis/number of total subjects suffering from ophthalmic artery stenosis.
Sensitivity (ocular Behcet disease): number of correctly classified subjects suffering from ocular Behcet disease/number of total subjects suffering from ocular Behcet disease.
Sensitivity (uveitis disease): number of correctly classified subjects suffering from uveitis disease/number of total subjects suffering from uveitis disease.
Total classification accuracy: number of correctly classified subjects/number of total subjects.
Table 2
The values of statistical parameters of the ME used for the diagnosis of ophthalmic arterial disorders
Statistical parameters                      Values
Specificity                                 97.67%
Sensitivity (ophthalmic artery stenosis)    96.88%
Sensitivity (ocular Behcet disease)         94.12%
Sensitivity (uveitis disease)               96.00%
Total classification accuracy               96.27%
The values of these statistical parameters are given in Table 2. As seen from Table 2, the
ME classified healthy subjects, subjects suffering from ophthalmic artery stenosis, subjects suffering
from ocular Behcet disease, and subjects suffering from uveitis disease with accuracies of 97.67%,
96.88%, 94.12%, and 96.00%, respectively. The total classification accuracy over all four classes
was 96.27%. The correct classification rates of the stand-alone
MLPNN presented in our previous study [13] were 90.63% for healthy subjects and 88.89% for
subjects having ophthalmic artery stenosis. Thus, the accuracy rates of the ME network structure
presented for this application were found to be higher than those of the stand-alone MLPNN used in
the previous study [13].
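The statistical parameters defined above follow directly from the confusion matrix. The following is a minimal Python sketch, assuming the matrix is stored with rows as ME outputs and columns as desired results; the function and variable names are illustrative and do not come from the paper.

```python
# Rows: ME output; columns: desired result, in the order
# healthy, ophthalmic artery stenosis, ocular Behcet disease, uveitis disease.
CONFUSION = [
    [42, 0, 0, 0],
    [1, 31, 1, 0],
    [0, 1, 32, 1],
    [0, 0, 1, 24],
]

def per_class_rates(matrix):
    """Correctly classified subjects / total subjects of each desired class, in percent."""
    n = len(matrix)
    rates = []
    for j in range(n):
        column_total = sum(matrix[i][j] for i in range(n))
        rates.append(100.0 * matrix[j][j] / column_total)
    return rates

def total_accuracy(matrix):
    """Correctly classified subjects / all subjects, in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total

rates = per_class_rates(CONFUSION)      # specificity first, then the three sensitivities
accuracy = total_accuracy(CONFUSION)
print([round(r, 2) for r in rates], round(accuracy, 2))
```

The first rate is the specificity (healthy column) and the remaining three are the sensitivities; after rounding they agree with the values reported in Table 2.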
The performance of a test can be evaluated by plotting a ROC curve for the test. For a given
result obtained by a classifier system, four possible alternatives exist that describe the nature of the
result: (i) true positive (TP), (ii) false positive (FP), (iii) true negative (TN), and (iv) false negative
(FN) [26]. In this study, a TP decision occurred when the positive detection of the ME coincided
with a positive detection of the physician. A FP decision occurred when the ME made a positive
detection that did not agree with the physician. A TN decision occurred when both the ME and the
physician suggested the absence of a positive detection. A FN decision occurred when the ME made
a negative detection that did not agree with the physician. A good test is one for which sensitivity
rises rapidly and 1-specificity hardly increases at all until sensitivity becomes high. The ROC curves
shown in Fig. 4 represent the performances of the stand-alone MLPNN and ME network
structure on the ophthalmic arterial Doppler signals test file. Fig. 4 shows that the performance of
the ME is higher than that of the stand-alone MLPNN.
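To make the TP/FP/TN/FN definitions concrete, the sketch below shows one common way to obtain the points of a ROC curve: sweep a decision threshold over the classifier outputs and record sensitivity against 1-specificity at each threshold. The scores and labels are made-up illustrative values, not data from this study, and the function name is hypothetical.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) pairs, one per decision threshold.

    scores: classifier outputs in [0, 1].
    labels: 1 if the physician's detection is positive, 0 otherwise.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is called positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        # sensitivity = TP / (TP + FN); 1 - specificity = FP / (FP + TN)
        points.append((fp / negatives, tp / positives))
    return points

# Hypothetical scores and physician labels, for illustration only.
scores = [0.95, 0.90, 0.80, 0.70, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 0]
points = roc_points(scores, labels)
```

A curve that climbs towards sensitivity 1 while 1-specificity stays near 0 corresponds to the "good test" described above.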
Fig. 4. ROC curves (sensitivity versus 1-specificity) of the stand-alone MLPNN and ME network structure used for
the diagnosis of ophthalmic arterial disorders.
4.3. Application of mixture of experts to internal carotid arterial Doppler signals
The internal carotid arterial Doppler signals were obtained from 160 subjects. The group consisted
of 78 females and 82 males with ages ranging from 18 to 67 years and a mean age of 32.0 years
(SD 9.6). Toshiba 140A color Doppler ultrasonography was used during examinations and sonograms
were taken into consideration. According to the examination results, 59 of 160 subjects suffered from
internal carotid artery stenosis, 53 of them suffered from internal carotid artery occlusion, and the
rest were healthy subjects (control group) who had no arterial disease. The group suffering from
internal carotid artery stenosis consisted of 26 females and 33 males with a mean age of 33.0 years
(SD 8.6, range 21–67), the group suffering from internal carotid artery occlusion consisted of 29
females and 24 males with a mean age of 32.5 years (SD 8.5, range 20–65), and the healthy subjects
were 23 females and 25 males with a mean age of 31.5 years (SD 9.4, range 18–65).
Internal carotid artery examinations were performed with a Doppler unit using a 5 MHz ultrasonic
transducer. The measurement system consisted of five units: a 5 MHz ultrasonic transducer,
an analog Doppler unit (Toshiba 140A color Doppler ultrasonography), a recorder (Sony), an analog/digital
interface board (Sound Blaster Pro-16 bit), and a personal computer with a printer. The ultrasonic
transducer was applied on a horizontal plane to the neck using water-soluble gel as a coupling gel. Care
was taken not to apply pressure to the neck in order to avoid artifacts. The probe was most often
placed at an angle of 60° towards the internal carotid artery. The internal carotid arterial Doppler
signals were sampled at 5 kHz and framed by equal time intervals. The frame length was chosen as
256 samples.
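The framing step can be sketched as follows. The sampling rate and frame length are taken from the text; the use of consecutive, non-overlapping frames and the dropping of an incomplete final frame are assumptions, and the zero-valued signal is only a placeholder for a recorded Doppler signal.

```python
SAMPLING_RATE_HZ = 5000   # from the text
FRAME_LENGTH = 256        # samples per frame, from the text

def split_into_frames(signal, frame_length=FRAME_LENGTH):
    """Cut a sampled signal into consecutive, non-overlapping frames.

    Trailing samples that do not fill a whole frame are dropped (an assumption).
    """
    n_frames = len(signal) // frame_length
    return [signal[i * frame_length:(i + 1) * frame_length] for i in range(n_frames)]

# Duration of one frame in milliseconds.
frame_duration_ms = 1000.0 * FRAME_LENGTH / SAMPLING_RATE_HZ

# One second of placeholder samples stands in for a recorded Doppler signal.
signal = [0.0] * SAMPLING_RATE_HZ
frames = split_into_frames(signal)
```

At 5 kHz, a 256-sample frame spans 256/5000 s, that is 51.2 ms.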
The ME architecture used for the diagnosis of internal carotid arterial disorders is shown in Fig.
2. Since we investigated three-group classification exclusively, the ME was configured with three
local experts and a gating network, all in the form of MLPNNs. For these data, MLPNNs
with one hidden layer were superior to models with two hidden layers. The most suitable network
configuration found had 10 neurons in the hidden layer and three outputs. Samples
with target outputs healthy, internal carotid artery stenosis, and internal carotid artery occlusion were
given the binary target values of (0, 0, 1), (0, 1, 0), and (1, 0, 0), respectively.
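The configuration described above can be illustrated with a forward pass through a small ME: each expert and the gating network is a one-hidden-layer MLP, and the gating outputs weight the expert outputs. This is only a sketch under stated assumptions: the weights are random and untrained, biases are omitted, and the tanh hidden units and softmax outputs are choices made here rather than details given in the text. The 41 inputs match the feature count stated later in this section.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_CLASSES, N_EXPERTS = 41, 10, 3, 3

# Binary target coding from the text.
TARGETS = {"healthy": (0, 0, 1), "stenosis": (0, 1, 0), "occlusion": (1, 0, 0)}

def softmax(values):
    peak = max(values)                          # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_mlp(rng, n_in, n_hidden, n_out):
    """Small random weights for an untrained one-hidden-layer MLP (biases omitted)."""
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_out)]
    return w1, w2

def mlp_forward(params, x):
    w1, w2 = params
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    return softmax([sum(w * h for w, h in zip(row, hidden)) for row in w2])

def me_forward(experts, gate, x):
    """Combine the expert outputs as a gate-weighted sum."""
    gate_weights = mlp_forward(gate, x)                   # one weight per expert
    expert_outputs = [mlp_forward(p, x) for p in experts]
    mixed = [
        sum(g * out[c] for g, out in zip(gate_weights, expert_outputs))
        for c in range(N_CLASSES)
    ]
    return mixed, gate_weights

rng = random.Random(0)
experts = [init_mlp(rng, N_INPUTS, N_HIDDEN, N_CLASSES) for _ in range(N_EXPERTS)]
gate = init_mlp(rng, N_INPUTS, N_HIDDEN, N_EXPERTS)
x = [rng.gauss(0.0, 1.0) for _ in range(N_INPUTS)]        # placeholder feature vector
output, gate_weights = me_forward(experts, gate, x)
```

Because the gate weights and each expert's outputs sum to one, the mixed output also sums to one and can be compared against the binary target vectors.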
In the ME, 60 of 160 subjects were used for training and the rest for testing. To obtain
better network generalization, 12 training subjects were selected randomly to be used as a cross
validation set. The training set consisted of 20 subjects suffering from internal carotid artery stenosis,
20 subjects suffering from internal carotid artery occlusion, and 20 healthy subjects. The testing set
consisted of 39 subjects suffering from internal carotid artery stenosis, 33 subjects suffering from
internal carotid artery occlusion, and 28 healthy subjects. The cross validation set consisted of four
subjects suffering from internal carotid artery stenosis, four subjects suffering from internal carotid
artery occlusion, and four healthy subjects.
Fig. 5. The detail wavelet coefficients corresponding to the D1 frequency band of the internal carotid arterial Doppler
signals recorded from: (a) 33-year-old healthy subject (subject no: 10), (b) 35-year-old subject suffering from internal
carotid artery stenosis (subject no: 23), (c) 36-year-old subject suffering from internal carotid artery occlusion (subject
no: 28). Each panel plots the detail wavelet coefficients against the number of detail wavelet coefficients.
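The partition of the 160 subjects can be sketched as a per-class random split. The class sizes and set sizes come from the text; the random selection of training subjects within each class, the seed, and the function name are assumptions made for illustration.

```python
import random

CLASS_SIZES = {"healthy": 48, "stenosis": 59, "occlusion": 53}   # from the text
N_TRAIN_PER_CLASS = 20
N_CV_PER_CLASS = 4

def split_subjects(class_sizes, seed=0):
    """Return (train, test, cross_validation) lists of (class, subject index) pairs."""
    rng = random.Random(seed)
    train, test, cross_validation = [], [], []
    for label, size in class_sizes.items():
        subjects = [(label, i) for i in range(size)]
        rng.shuffle(subjects)
        class_train = subjects[:N_TRAIN_PER_CLASS]
        train += class_train
        test += subjects[N_TRAIN_PER_CLASS:]
        # The cross validation subjects are drawn from the training subjects,
        # as the text describes.
        cross_validation += rng.sample(class_train, N_CV_PER_CLASS)
    return train, test, cross_validation

train, test, cross_validation = split_subjects(CLASS_SIZES)
```

With these sizes the split yields 60 training, 100 testing, and 12 cross validation subjects, matching the counts given above.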
The computed discrete wavelet coefficients were used as the inputs of the MLPNNs employed
in the architecture of ME. In order to extract features, the wavelet coefficients corresponding to
the D1–D7 frequency bands of the internal carotid arterial Doppler signals were computed. For
each internal carotid arterial Doppler signal frame (256 samples), the detail wavelet coefficients
(d_k, k = 1, 2, 3, 4, 5, 6, 7) at the first, second, third, fourth, fifth, sixth and seventh levels (128 + 64 +
32 + 16 + 8 + 4 + 2 coefficients) were computed. Thus, 254 detail wavelet coefficients were obtained
for each internal carotid arterial Doppler signal frame. In order to reduce the dimensionality of the
extracted feature vectors, the statistics explained in Section 4.1 over the set of the wavelet coefficients
were used. The MLPNNs therefore had 41 inputs, equal to the number of elements in the input feature
vector. The detail
wavelet coefficients corresponding to the D1 frequency band of the internal carotid arterial Doppler
signals obtained from a 33-year-old healthy subject (subject no: 10), a 35-year-old subject suffering
from internal carotid artery stenosis (subject no: 23), and a 36-year-old subject suffering from internal
carotid artery occlusion (subject no: 28) are given in Figs. 5(a)–(c), respectively. It can be noted
that the detail wavelet coefficients of the internal carotid arterial Doppler signals obtained from a
healthy subject (Fig. 5(a)) and subjects suffering from internal carotid arterial diseases (Figs. 5(b)
and (c)) are different from each other.
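The coefficient counts quoted above can be checked with a short seven-level decomposition of a 256-sample frame. The sketch uses the Haar wavelet purely as a stand-in because it is easy to write out; the wavelet actually used is not restated in this passage. The per-band statistics are likewise hypothetical examples, so the sketch does not reproduce the 41-element feature vector, and the sinusoidal frame is a placeholder for a recorded signal.

```python
import math

def haar_detail_coefficients(frame, levels=7):
    """Return the detail coefficients d1..d_levels of a Haar DWT.

    Haar is a stand-in wavelet chosen for brevity (an assumption).
    """
    approximation = list(frame)
    details = []
    for _ in range(levels):
        pairs = list(zip(approximation[0::2], approximation[1::2]))
        details.append([(a - b) / math.sqrt(2.0) for a, b in pairs])
        approximation = [(a + b) / math.sqrt(2.0) for a, b in pairs]
    return details

def summarize(coefficients):
    """Illustrative per-band statistics (hypothetical, not the set from Section 4.1)."""
    mean = sum(coefficients) / len(coefficients)
    variance = sum((c - mean) ** 2 for c in coefficients) / len(coefficients)
    return {"max": max(coefficients), "min": min(coefficients),
            "mean": mean, "std": math.sqrt(variance)}

# Placeholder frame: a 256-sample sinusoid instead of a recorded Doppler frame.
frame = [math.sin(2.0 * math.pi * 8.0 * n / 256.0) for n in range(256)]
details = haar_detail_coefficients(frame)
band_sizes = [len(d) for d in details]        # number of coefficients in D1..D7
features = [summarize(d) for d in details]    # one small summary per band
```

Each level halves the number of coefficients, so the band sizes are 128, 64, 32, 16, 8, 4 and 2, which sum to the 254 detail coefficients stated in the text.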
Training of the ME was stopped at 300 epochs since the cross validation errors began to rise at
300 epochs. Because the MSE values converged to small constants close to zero within 300
epochs, training of the ME was determined to be successful. In contrast, in our previous study [14]
the stand-alone MLPNN trained with the backpropagation algorithm converged slowly, and its
MSE converged to a small constant close to zero in 5000 epochs. Since the EM algorithm
used in training the ME enabled a significant speed-up, the convergence rate of the ME presented in
this study was found to be higher than that of the stand-alone MLPNN used in the previous study
[14].
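The stopping rule described above, halting training once the cross validation error begins to rise, can be sketched as follows. The validation curve is synthetic and constructed so that its minimum falls at epoch 300; the function name and the `patience` parameter are illustrative assumptions, not part of the original method description.

```python
def early_stopping_epoch(validation_errors, patience=1):
    """Return the 1-based epoch of the best validation error, stopping once the
    error fails to improve for `patience` consecutive epochs."""
    best_epoch, best_error, epochs_without_improvement = 0, float("inf"), 0
    for epoch, error in enumerate(validation_errors, start=1):
        if error < best_error:
            best_epoch, best_error, epochs_without_improvement = epoch, error, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch

# Hypothetical validation curve: falls until epoch 300, then starts to rise.
validation_errors = [1.0 / epoch for epoch in range(1, 301)] + [0.01, 0.02, 0.03]
stop_epoch = early_stopping_epoch(validation_errors)
```

A larger `patience` tolerates brief fluctuations in the validation error before training is halted.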
In this application, there were three classes: healthy, internal carotid artery stenosis, and internal carotid
artery occlusion. The classification results of the ME are displayed in the confusion matrix given
below.
Confusion matrix
Output/desired                               Result      Result              Result
                                             (healthy)   (internal carotid   (internal carotid
                                                         artery stenosis)    artery occlusion)
Result (healthy)                             27          0                   0
Result (internal carotid artery stenosis)    1           38                  1
Result (internal carotid artery occlusion)   0           1                   32
According to the confusion matrix, one healthy subject was classified incorrectly by the ME as
a subject suffering from internal carotid artery stenosis, one subject suffering from internal carotid
artery stenosis was classified as a subject suffering from internal carotid artery occlusion, and one
subject suffering from internal carotid artery occlusion was classified as a subject suffering from
internal carotid artery stenosis.
The test performance of the ME was determined by computing the following statistical
parameters:
Specificity: number of correctly classified healthy subjects/number of total healthy subjects.
Sensitivity (internal carotid artery stenosis): number of correctly classified subjects suffering from
stenosis/number of total subjects suffering from stenosis.
Sensitivity (internal carotid artery occlusion): number of correctly classified subjects suffering from
occlusion/number of total subjects suffering from occlusion.
Total classification accuracy: number of correctly classified subjects/number of total subjects.
The values of these statistical parameters are given in Table 3. As seen from Table 3, the
ME classified healthy subjects, subjects suffering from internal carotid artery stenosis, and subjects
suffering from internal carotid artery occlusion with accuracies of 96.43%, 97.44%, and 96.97%,
respectively. The total classification accuracy over all three classes was 97.00%.
The correct classification rates of the stand-alone MLPNN presented in our previous study [14] were
95.24% for healthy subjects, 91.30% for subjects having internal carotid artery stenosis, and 91.67%
for subjects having internal carotid artery occlusion. Thus, the accuracy rates of the ME presented
for this application were found to be higher than those of the stand-alone MLPNN in the previous
study [14].
Table 3
The values of statistical parameters of the ME used for the diagnosis of internal carotid arterial disorders
Statistical parameters                           Values
Specificity                                      96.43%
Sensitivity (internal carotid artery stenosis)   97.44%
Sensitivity (internal carotid artery occlusion)  96.97%
Total classification accuracy                    97.00%
The ROC curves shown in Fig. 6 represent the performances of the stand-alone MLPNN and
ME network structure on the internal carotid arterial Doppler signals test file. Fig. 6 shows that the
performance of the ME is higher than that of the stand-alone MLPNN.
Fig. 6. ROC curves (sensitivity versus 1-specificity) of the stand-alone MLPNN and ME network structure used for
the diagnosis of internal carotid arterial disorders.
5. Conclusion
This paper presented the use of ME network structures to improve diagnostic accuracy of
ophthalmic and internal carotid arterial disorders, since the predictive performance of the overall
structure is generally superior to that of any of the individual experts. For the diagnosis of ophthalmic
arterial conditions, four local experts and a gating network, all in the form of MLPNNs,
were used in the configuration of the ME architecture. For the diagnosis of internal carotid arterial
conditions, three local experts and a gating network, all in the form of MLPNNs, were used
in the configuration of the ME architecture. The EM algorithm was used for training the ME networks so
that the learning process is decoupled in a manner that fits well with the modular structure. The ME
used for the diagnosis of ophthalmic arterial disorders was trained, cross validated and tested with
the features extracted using DWT of the ophthalmic arterial Doppler signals obtained from healthy
subjects, subjects suffering from ophthalmic artery stenosis, subjects suffering from ocular Behcet
disease, and subjects suffering from uveitis disease. The ME used for the diagnosis of internal carotid
arterial disorders was trained, cross validated and tested with the features extracted using DWT of
the internal carotid arterial Doppler signals obtained from healthy subjects, subjects suffering from
internal carotid artery stenosis, and subjects suffering from internal carotid artery occlusion. The
classification results, the values of statistical parameters, and ROC curves were used for evaluating
the performances of the classifiers. The accuracy rates achieved by the ME network structures presented
for the diagnosis of ophthalmic and internal carotid arterial disorders were found to be higher than
those of the stand-alone neural network models used in the previous studies.
Acknowledgements
This study has been supported by the Scientific Research Project of Gazi University (Project no:
07/2003-03).
References
[1] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts, Neural Comput. 3 (1)
(1991) 79–87.
[2] K. Chen, L. Xu, H. Chi, Improved learning algorithms for mixture of experts in multiclass classification, Neural
Networks 12 (9) (1999) 1229–1252.
[3] X. Hong, C.J. Harris, A mixture of experts network structure construction algorithm for modelling and control, Appl.
Intell. 16 (1) (2002) 59–69.
[4] M.I. Jordan, R.A. Jacobs, Hierarchical mixture of experts and the EM algorithm, Neural Comput. 6 (2) (1994)
181–214.
[5] P. Mangiameli, D. West, An improved neural classification network for the two-group problem, Comput. Oper. Res.
26 (5) (1999) 443–460.
[6] Y.H. Hu, S. Palreddy, W.J. Tompkins, A patient-adaptable ECG beat classifier using a mixture of experts approach,
IEEE Trans. Biomed. Eng. 44 (9) (1997) 891–900.
[7] G. Viardot, R. Lengelle, C. Richard, Mixture of experts for automated detection of phasic arousals in sleep signals,
IEEE International Conference on Systems, Man and Cybernetics, Vol. 1, Hammamet, Tunisia, 2002, pp. 551–555.
[8] W.G. Baxt, Use of an artificial neural network for data analysis in clinical decision making: the diagnosis of acute
coronary occlusion, Neural Comput. 2 (1990) 480–489.
[9] A.S. Miller, B.H. Blott, T.K. Hames, Review of neural network applications in medical imaging and signal processing,
Med. Biol. Eng. Comput. 30 (1992) 449–464.
[10] I.A. Basheer, M. Hajmeer, Artificial neural networks: fundamentals, computing, design, and application, J. Microbiol.
Methods 43 (1) (2000) 3–31.
[11] B.B. Chaudhuri, U. Bhattacharya, Efficient training and improved performance of multilayer perceptron in pattern
classification, Neurocomputing 34 (2000) 11–27.
[12] I.A. Wright, N.A.J. Gough, F. Rakebrandt, M. Wahab, J.P. Woodcock, Neural network analysis of Doppler ultrasound
blood flow signals: a pilot study, Ultrasound Med. Biol. 23 (5) (1997) 683–690.
[13] İ. Güler, E.D. Übeyli, Detection of ophthalmic artery stenosis by least-mean squares backpropagation neural network,
Comput. Biol. Med. 33 (4) (2003) 333–343.
[14] E.D. Übeyli, İ. Güler, Neural network analysis of internal carotid arterial Doppler signals: predictions of stenosis
and occlusion, Expert Systems Appl. 25 (1) (2003) 1–13.
[15] İ. Güler, N.F. Güler, The electronic detail of a pulsed Doppler blood flow measurement system, Meas. Sci. Technol.
1 (10) (1990) 1087–1092.
[16] B. Sigel, A brief history of Doppler ultrasound in the diagnosis of peripheral vascular disease, Ultrasound Med.
Biol. 24 (2) (1998) 169–176.
[17] İ. Güler, F. Hardalaç, E.D. Übeyli, Determination of Behcet disease with the application of FFT and AR methods,
Comput. Biol. Med. 32 (6) (2002) 419–434.
[18] İ. Güler, E.D. Übeyli, Application of classical and model-based spectral methods to ophthalmic arterial Doppler
signals with uveitis disease, Comput. Biol. Med. 33 (6) (2003) 455–471.
[19] E.D. Übeyli, İ. Güler, Spectral broadening of ophthalmic arterial Doppler signals using STFT and wavelet transform,
Comput. Biol. Med. 34 (4) (2004) 345–354.
[20] E.D. Übeyli, İ. Güler, Comparison of eigenvector methods with classical and model-based methods in analysis of
internal carotid arterial Doppler signals, Comput. Biol. Med. 33 (6) (2003) 473–493.
[21] E.D. Übeyli, İ. Güler, Spectral analysis of internal carotid arterial Doppler signals using FFT, AR, MA, and ARMA
methods, Comput. Biol. Med. 34 (4) (2004) 293–306.
[22] I. Daubechies, The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. Inform. Theory
36 (5) (1990) 961–1005.
[23] M. Akay, Wavelet applications in medicine, IEEE Spectrum 34 (5) (1997) 50–56.
[24] Y. Zhang, Y. Wang, W. Wang, B. Liu, Doppler ultrasound signal denoising based on wavelet frames, IEEE Trans.
Ultrason. Ferroelectrics, Frequency Control 48 (3) (2001) 709–716.
[25] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[26] M.H. Zweig, G. Campbell, Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical
medicine, Clin. Chem. 39 (4) (1993) 561–577.
İnan Güler graduated from Erciyes University in 1981. He took his M.S. degree from Middle East Technical University
in 1985, and his Ph.D. degree from İstanbul Technical University in 1990, all in Electronic Engineering. He is a professor
at Gazi University where he is Head of Department. His interest areas include biomedical systems, biomedical signal
processing, biomedical instrumentation, electronic circuit design, neural networks, and artificial intelligence. He has written
more than 100 articles on biomedical engineering.
Elif Derya Übeyli graduated from Çukurova University in 1996. She took her M.S. degree in 1998, all in electronic
engineering. She took her Ph.D. degree from Gazi University, electronics and computer technology. She is a research
assistant at the Department of Electronics and Computer Education at Gazi University. Her interest areas are biomedical
signal processing, neural networks, and artificial intelligence.