Computers in Biology and Medicine 35 (2005) 565–582
http://www.intl.elsevierhealth.com/journals/cobm

A mixture of experts network structure for modelling Doppler ultrasound blood flow signals

İnan Güler∗, Elif Derya Übeyli

Department of Electronics and Computer Education, Faculty of Technical Education, Gazi University, 06500 Teknikokullar, Ankara, Turkey

Received 12 January 2004; accepted 13 April 2004

Abstract

Mixture of experts (ME) is a modular neural network architecture for supervised learning. This paper illustrates the use of the ME network structure to guide the modelling of Doppler ultrasound blood flow signals. The Expectation–Maximization (EM) algorithm was used to train the ME, so that the learning process is decoupled in a manner that fits well with the modular structure. The ophthalmic and internal carotid arterial Doppler signals were decomposed into time–frequency representations using the discrete wavelet transform, and statistical features were calculated to depict their distribution. The ME network structures were implemented for the diagnosis of ophthalmic and internal carotid arterial disorders using the statistical features as inputs. To improve diagnostic accuracy, the outputs of the expert networks were combined by a simultaneously trained gating network, which stochastically selects the expert that performs best at solving the problem. The ME network structure achieved accuracy rates higher than those of the stand-alone neural network models.

© 2004 Elsevier Ltd. All rights reserved.

Keywords: Mixture of experts; Expectation–Maximization algorithm; Diagnostic accuracy; Discrete wavelet transform; Doppler signal; Ophthalmic artery; Internal carotid artery

1. Introduction

There has recently been widespread interest in the use of multiple models for pattern classification and regression in the statistics and neural network communities.
The basic idea underlying these methods is the application of the so-called divide-and-conquer principle, which tackles a complex problem by dividing it into simpler problems whose solutions can be combined to yield a final solution. Utilizing this principle, Jacobs et al. [1] proposed a modular neural network architecture called the mixture of experts (ME). The ME models the conditional probability density of the target output by mixing the outputs of a set of local experts, each of which separately derives a conditional probability density of the target output. The ME weights the input space using the posterior probabilities that the expert networks generate for producing the output from the input. The outputs of the expert networks are combined by a simultaneously trained gating network, which stochastically selects the expert that performs best at solving the problem [2,3]. As pointed out by Jordan and Jacobs [4], the gating network performs a typical multiclass classification task [5–7]. The Expectation–Maximization (EM) algorithm has been introduced to the ME architecture so that the learning process is decoupled in a manner that fits well with the modular structure [2–4]. The EM algorithm can be extended to provide an effective training mechanism for the ME based on a Gaussian probability assumption.

∗ Corresponding author. Tel.: +90-312-212-3976; fax: +90-312-212-0059.

0010-4825/$ - see front matter © 2004 Elsevier Ltd. All rights reserved.
doi:10.1016/j.compbiomed.2004.04.001
Although the model structure was originally predetermined and the training algorithm is based on a Gaussian probability assumption for each expert model output, the ME framework is a powerful concept that can be extended to a wide variety of applications, including medical diagnostic decision support systems, owing to several inherent advantages: (i) A global model can be decomposed into a set of simple local models, from which controller design is straightforward. Each model can represent a different data source with an associated state estimator/predictor; in this case, the ME system can be viewed as a data fusion algorithm. (ii) The local models operate independently but provide output information that can be strongly correlated across models, so that the overall system performance can be enhanced in terms of reliability or fault tolerance. (iii) The global output of the ME system is derived as a convex combination of the outputs of a set of N experts, so the overall predictive performance of the system is generally superior to that of any individual expert [2–5].

Neural networks have been used successfully in a variety of medical applications [8,9]. Recent advances in the field of neural networks have made them attractive for analyzing signals, and their application has opened a new area for solving problems not resolvable by other signal processing techniques [10,11]. However, neural network analysis of Doppler shift signals is a relatively new approach [12–14]. Doppler ultrasound is widely used as a noninvasive method for assessing blood flow in both the central and the peripheral circulation [15,16]. It may be used to estimate blood flow, to image regions of blood flow and to locate sites of arterial disease, as well as to assess the flow characteristics and resistance of the ophthalmic and internal carotid arteries [17–21].
Up to now, there has been no study in the literature assessing the accuracy of the ME in the analysis of Doppler shift signals. In this study, experimental results on ME predictions for the diagnosis of ophthalmic and internal carotid arterial diseases are presented. In the configuration of the ME for the diagnosis of ophthalmic arterial disorders, we used four local experts and a gating network, all in the form of multilayer perceptron neural networks (MLPNNs), since there were four possible outcomes of the diagnosis of ophthalmic arterial conditions (healthy, ophthalmic artery stenosis, ocular Behcet disease, uveitis disease). In the ME developed for the diagnosis of internal carotid arterial disorders, we used three local experts and a gating network, again in the form of MLPNNs, because there were three possible outcomes of the diagnosis of internal carotid arterial conditions (healthy, internal carotid artery stenosis, internal carotid artery occlusion). We achieved a significant improvement in accuracy by using the ME network structures compared to the stand-alone neural networks used in our previous studies [13,14].

The outline of this study is as follows. In Section 2, we explain the spectral analysis of signals using the discrete wavelet transform (DWT) to extract features characterizing the behavior of the signal under study. In Section 3, we describe the neural network models used in this study, namely the MLPNN and the ME architecture, together with the EM algorithm used to train the ME networks. In Section 4, we present the results of applying the ME networks to ophthalmic and internal carotid arterial Doppler signals. Finally, Section 5 concludes the study.

2. Spectral analysis using discrete wavelet transform

The Doppler shift signal contains a wealth of information about the blood flow occurring within the sample volume of the Doppler ultrasound system. The most complete way to display this information is to perform spectral analysis. The wavelet transform (WT) provides very general techniques that can be applied to many tasks in signal processing. One very important application is the ability to compute and manipulate data in compressed form, as parameters that are often called features [22–24]. Thus, the Doppler signal, consisting of many data points, can be compressed into a few parameters that characterize its behavior. Using a smaller number of parameters to represent the Doppler signal is particularly important for recognition and diagnostic purposes.

The WT can be thought of as an extension of the classic Fourier transform, except that, instead of working on a single scale (time or frequency), it works on a multi-scale basis. This multi-scale feature of the WT allows the decomposition of a signal into a number of scales, each scale representing a particular coarseness of the signal under study. The procedure of multiresolution decomposition of a signal x[n] is shown schematically in Fig. 1. Each stage of this scheme consists of two digital filters and two downsamplers by 2. The first filter, g[·], is the discrete mother wavelet, high-pass in nature, and the second, h[·], is its mirror version, low-pass in nature. The downsampled outputs of the first high-pass and low-pass filters provide the detail, D1, and the approximation, A1, respectively. The first approximation, A1, is further decomposed, and this process is continued as shown in Fig. 1. All wavelet transforms can be specified in terms of a low-pass filter h that satisfies the standard quadrature mirror filter condition:

$$H(z)H(z^{-1}) + H(-z)H(-z^{-1}) = 1, \tag{1}$$

where $H(z)$ denotes the z-transform of the filter h.
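The filter-bank procedure of Fig. 1 and the condition in Eq. (1) can both be checked numerically. The NumPy sketch below is an illustration under stated assumptions, not the MATLAB implementation used in the study: it uses the db1 (Haar) filters in the orthonormal scaling h = (1, 1)/√2, g = (1, −1)/√2 that is common in software, it assumes an input length divisible by 2^levels, and the function names (`qmf_lhs`, `analysis_stage`, `haar_wavedec`) are hypothetical. Note that Eq. (1), with 1 on its right-hand side, corresponds to the scaling h = (1/2, 1/2); the orthonormal filters give 2 instead.

```python
import numpy as np

H_LOW = np.array([1.0, 1.0]) / np.sqrt(2.0)    # low-pass h[n], db1 (Haar), orthonormal scaling
G_HIGH = np.array([1.0, -1.0]) / np.sqrt(2.0)  # high-pass g[n]

def qmf_lhs(h, n_points=512):
    """Left-hand side of Eq. (1) at z = exp(jw); for real h, H(z^-1) = conj(H(z)) on |z| = 1."""
    h = np.asarray(h, dtype=float)
    n = np.arange(len(h))
    w = np.linspace(-np.pi, np.pi, n_points)
    H = np.exp(-1j * np.outer(w, n)) @ h              # H(e^{jw}) = sum_n h[n] e^{-jwn}
    H_neg = np.exp(-1j * np.outer(w + np.pi, n)) @ h  # H(-e^{jw}) = H(e^{j(w + pi)})
    return np.abs(H) ** 2 + np.abs(H_neg) ** 2

def analysis_stage(x):
    """One stage of Fig. 1: filter with h and g, then downsample by 2."""
    # Full convolution gives y[n] = f[0] x[n] + f[1] x[n-1]; keeping odd n
    # pairs the samples (x[2k], x[2k+1]) without overlap.
    return np.convolve(x, H_LOW)[1::2], np.convolve(x, G_HIGH)[1::2]

def haar_wavedec(x, levels):
    """Return ([D1, ..., D_levels], A_levels); each approximation is split again."""
    approx, details = np.asarray(x, dtype=float), []
    for _ in range(levels):
        approx, detail = analysis_stage(approx)
        details.append(detail)
    return details, approx

print(np.allclose(qmf_lhs([0.5, 0.5]), 1.0), np.allclose(qmf_lhs(H_LOW), 2.0))  # True True
frame = np.random.default_rng(0).normal(size=256)   # stand-in for one 256-sample Doppler frame
details, a7 = haar_wavedec(frame, levels=7)
print([len(d) for d in details])  # [128, 64, 32, 16, 8, 4, 2]
```

For a 256-sample frame, the seven detail bands hold 254 coefficients in total, and because the scaled Haar filters are orthonormal, the squared coefficients sum to the energy of the frame, which gives a simple correctness check. PyWavelets' `pywt.wavedec(frame, 'db1', level=7)` offers a library equivalent (returned in the order [A7, D7, ..., D1]), although its sign convention for the detail coefficients may differ.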
The complementary high-pass filter of h can be defined as

$$G(z) = zH(-z^{-1}). \tag{2}$$

A sequence of filters with increasing length (indexed by i) can be obtained:

$$H_{i+1}(z) = H\left(z^{2^i}\right)H_i(z), \qquad G_{i+1}(z) = G\left(z^{2^i}\right)H_i(z), \qquad i = 0, \ldots, I-1, \tag{3}$$

with the initial condition $H_0(z) = 1$. This is expressed as a two-scale relation in the time domain:

$$h_{i+1}(k) = [h]_{\uparrow 2^i} * h_i(k), \qquad g_{i+1}(k) = [g]_{\uparrow 2^i} * h_i(k), \tag{4}$$

where the subscript $[\cdot]_{\uparrow m}$ indicates up-sampling by a factor of m and k is the equally sampled discrete time. The normalized scale and wavelet basis functions $\phi_{i,l}(k)$ and $\psi_{i,l}(k)$ can be defined as

$$\phi_{i,l}(k) = 2^{i/2} h_i(k - 2^i l), \qquad \psi_{i,l}(k) = 2^{i/2} g_i(k - 2^i l), \tag{5}$$

where the factor $2^{i/2}$ is an inner product normalization, and i and l are the scale parameter and the translation parameter, respectively. The DWT decomposition can be described as

$$a^{(i)}(l) = x(k) * \phi_{i,l}(k), \qquad d^{(i)}(l) = x(k) * \psi_{i,l}(k), \tag{6}$$

where $a^{(i)}(l)$ and $d^{(i)}(l)$ are the approximation coefficients and the detail coefficients at resolution i, respectively [22–24].

Fig. 1. Subband decomposition of discrete wavelet transform implementation; g[n] is the high-pass filter, h[n] is the low-pass filter. (Three stages are shown: x[n] is split into D1 and A1, A1 into D2 and A2, and A2 into D3 and A3, each filter being followed by downsampling by 2.)

3. Description of neural network models

3.1. Multilayer perceptron neural network

Artificial neural networks (ANNs) may be defined as structures comprised of densely interconnected adaptive simple processing elements (neurons) that are capable of performing massively parallel computations for data processing and knowledge representation. ANNs can be trained to recognize patterns, and the nonlinear models developed during training allow neural networks to generalize their conclusions and apply them to patterns not previously encountered [10,11,25].
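As a concrete reference point for the MLPNN structure described in the next paragraph, a feedforward network with one hidden layer reduces to two matrix products with a nonlinearity in between. The NumPy sketch below assumes logistic-sigmoid activations and small random initial weights, neither of which this section specifies; the 41-10-4 layer sizes in the usage lines are borrowed from the ophthalmic configuration reported in Section 4.2, and the class name `OneHiddenLayerMLP` is hypothetical. Training is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class OneHiddenLayerMLP:
    """Input layer -> one hidden layer -> output layer, sigmoid units (assumed)."""

    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(scale=0.1, size=(n_out, n_hidden))
        self.b2 = np.zeros(n_out)

    def forward(self, x):
        """x: (n_in,) or (batch, n_in); returns outputs in (0, 1)."""
        hidden = sigmoid(x @ self.W1.T + self.b1)     # hidden layer captures the nonlinearity
        return sigmoid(hidden @ self.W2.T + self.b2)  # one output neuron per modeled variable

net = OneHiddenLayerMLP(n_in=41, n_hidden=10, n_out=4, rng=np.random.default_rng(0))
outputs = net.forward(np.zeros((3, 41)))   # three dummy 41-feature inputs
print(outputs.shape)  # (3, 4)
```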
MLPNNs are the most commonly used neural network architectures because of features such as the ability to learn and generalize, small training set requirements, fast operation, and ease of implementation. An MLPNN consists of (i) an input layer with neurons representing the input variables of the problem, (ii) an output layer with neurons representing the dependent variables (what is being modeled), and (iii) one or more hidden layers containing neurons that help capture the nonlinearity in the data. The MLPNN is a nonparametric technique for performing a wide variety of detection and estimation tasks [10,11,25].

Fig. 2. The architecture of mixture of experts. (Block diagram: the input x feeds the gating network and expert networks 1 to N, whose outputs are combined into the overall output o(x).)

3.2. Mixture of experts and expectation–maximization algorithm

As illustrated in Fig. 2, the ME architecture is composed of a gating network and several expert networks. The gating network receives the vector x as input and produces scalar outputs that form a partition of unity at each point in the input space. Each expert network produces an output vector for an input vector. The gating network provides the linear combination coefficients for the expert networks as veridical probabilities; therefore, the final output of the ME architecture is a convex weighted sum of all the output vectors produced by the expert networks. Suppose that there are N expert networks in the ME architecture. All the expert networks are linear with a single output nonlinearity, a form also referred to as "generalized linear". The ith expert network produces its output $o_i(x)$ as a generalized linear function of the input x:

$$o_i(x) = f(W_i x), \tag{7}$$

where $W_i$ is a weight matrix and $f(\cdot)$ is a fixed continuous nonlinearity.
The gating network is also a generalized linear function, and its ith output, $g(x, v_i)$, is the multinomial logit or softmax function of intermediate variables $\xi_i$:

$$g(x, v_i) = \frac{e^{\xi_i}}{\sum_{k=1}^{N} e^{\xi_k}}, \tag{8}$$

where $\xi_i = v_i^{T} x$ and $v_i$ is a weight vector. The overall output $o(x)$ of the ME architecture is

$$o(x) = \sum_{k=1}^{N} g(x, v_k)\, o_k(x). \tag{9}$$

The ME architecture can be given a probabilistic interpretation. For an input–output pair (x, y), the values of $g(x, v_i)$ are interpreted as the multinomial probabilities associated with the decision that terminates in a regressive process that maps x to y. Once the decision has been made, resulting in a choice of regressive process i, the output y is chosen from a probability density $P(y \mid x, W_i)$, where $W_i$ denotes the set of parameters, or weight matrix, of the ith expert network in the model. Therefore, the total probability of generating y from x is the mixture of the probabilities of generating y from each component density, where the mixing proportions are multinomial probabilities:

$$P(y \mid x, \Theta) = \sum_{k=1}^{N} g(x, v_k)\, P(y \mid x, W_k), \tag{10}$$

where $\Theta$ is the set of all the parameters, including both expert and gating network parameters. Moreover, the probabilistic component of the model is generally assumed to be a Gaussian distribution in the case of regression, a Bernoulli distribution in the case of binary classification, and a multinomial distribution in the case of multiclass classification [1–3].

Based on the probabilistic model in Eq. (10), learning in the ME architecture is treated as a maximum likelihood problem. Jordan and Jacobs [4] proposed an EM algorithm for adjusting the parameters of the architecture. Suppose that the training set is given as $\{(x_t, y_t)\}_{t=1}^{T}$. The EM algorithm consists of two steps.
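Before the two steps are stated formally in Eqs. (11)–(13), the whole procedure can be previewed with a small NumPy sketch of the simplest case that admits closed-form expert updates: scalar regression with linear-Gaussian experts and the linear softmax gate of Eq. (8). This is an illustration under those assumptions, not the MLPNN-based ME trained in this study. With Gaussian experts, the expert maximization of the M-step reduces to weighted least squares; the gate maximization has no closed form, so the gate is updated here with a few gradient-ascent steps (iteratively reweighted least squares is a common alternative). The optional cross validation arguments mimic the stopping rule used in Section 4.2 by halting when the validation error stops decreasing. All function names, step sizes, and epoch counts are hypothetical.

```python
import numpy as np

def log_softmax(z):
    """Row-wise log of Eq. (8), computed stably."""
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def me_predict(X, W, V):
    """Eq. (9) for scalar outputs: gate-weighted sum of the linear expert predictions."""
    g = np.exp(log_softmax(X @ V.T))
    return np.sum(g * (X @ W.T), axis=1)

def em_mixture_of_experts(X, y, n_experts, n_epochs=100, gate_steps=20, lr=0.5,
                          X_cv=None, y_cv=None, seed=0):
    """X: (T, d) inputs including a constant bias column; y: (T,) targets.
    If X_cv/y_cv are given, stop as soon as the cross validation MSE stops decreasing."""
    rng = np.random.default_rng(seed)
    T, d = X.shape
    W = rng.normal(scale=0.1, size=(n_experts, d))   # linear expert weights
    V = np.zeros((n_experts, d))                      # gating weights v_i
    var = np.full(n_experts, np.var(y) + 1e-6)        # Gaussian noise variance per expert
    best, best_cv = (W.copy(), V.copy(), var.copy()), np.inf
    for _ in range(n_epochs):
        # E-step: posterior h_i^(t) proportional to g(x_t, v_i) P(y_t | x_t, W_i), in log space
        mu = X @ W.T
        log_post = (log_softmax(X @ V.T)
                    - 0.5 * (y[:, None] - mu) ** 2 / var
                    - 0.5 * np.log(2 * np.pi * var))
        h = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        h /= h.sum(axis=1, keepdims=True)
        # M-step for each expert: weighted least squares with observation weights h_i^(t)
        for i in range(n_experts):
            hw = h[:, i]
            A = X.T @ (hw[:, None] * X) + 1e-8 * np.eye(d)
            W[i] = np.linalg.solve(A, X.T @ (hw * y))
            var[i] = hw @ (y - X @ W[i]) ** 2 / (hw.sum() + 1e-12) + 1e-6
        # M-step for the gate: gradient ascent on sum_t sum_k h_k^(t) log g(x_t, v_k)
        for _ in range(gate_steps):
            g = np.exp(log_softmax(X @ V.T))
            V += lr * (h - g).T @ X / T
        if X_cv is None:
            best = (W.copy(), V.copy(), var.copy())
            continue
        cv_err = np.mean((me_predict(X_cv, W, V) - y_cv) ** 2)
        if cv_err >= best_cv:
            break                                     # cross validation error rose: stop
        best, best_cv = (W.copy(), V.copy(), var.copy()), cv_err
    return best

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = np.where(x < 0, 2 * x, -3 * x) + 0.05 * rng.normal(size=200)  # two linear regimes
X = np.column_stack([x, np.ones_like(x)])                         # bias column
W, V, var = em_mixture_of_experts(X, y, n_experts=2)
print(np.mean((me_predict(X, W, V) - y) ** 2) < np.var(y))
```

The gate update follows from the gate objective: because the posteriors $h_k^{(t)}$ sum to 1 over k, its gradient with respect to $v_j$ is $\sum_t (h_j^{(t)} - g(x_t, v_j))\, x_t$, which is exactly the step applied in the inner loop.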
For the sth epoch, the posterior probabilities $h_i^{(t)}$ (i = 1, ..., N), which can be interpreted as the probabilities $P(i \mid x_t, y_t)$, are computed in the E-step as

$$h_i^{(t)} = \frac{g(x_t, v_i^{(s)})\, P(y_t \mid x_t, W_i^{(s)})}{\sum_{k=1}^{N} g(x_t, v_k^{(s)})\, P(y_t \mid x_t, W_k^{(s)})}. \tag{11}$$

The M-step solves the following maximization problems:

$$W_i^{(s+1)} = \arg\max_{W_i} \sum_{t=1}^{T} h_i^{(t)} \log P(y_t \mid x_t, W_i) \tag{12}$$

and

$$V^{(s+1)} = \arg\max_{V} \sum_{t=1}^{T} \sum_{k=1}^{N} h_k^{(t)} \log g(x_t, v_k), \tag{13}$$

where V is the set of all the parameters in the gating network. The EM algorithm is therefore summarized as follows:

1. For each data pair $(x_t, y_t)$, compute the posterior probabilities $h_i^{(t)}$ using the current values of the parameters.
2. For each expert network i, solve the maximization problem in Eq. (12) with observations $\{(x_t, y_t)\}_{t=1}^{T}$ and observation weights $\{h_i^{(t)}\}_{t=1}^{T}$.
3. For the gating network, solve the maximization problem in Eq. (13) with observations $\{(x_t, h_k^{(t)})\}_{t=1}^{T}$.
4. Iterate using the updated parameter values.

In this framework, a number of relatively small expert networks can be used together with a gating network designed to divide the global classification task into simpler subtasks (Fig. 2) [1–4]. In the present study, both the gating and expert networks were MLPNNs consisting of neurons arranged in contiguous layers. This configuration was chosen because MLPNNs have features such as the ability to learn and generalize, small training set requirements, fast operation, and ease of implementation. The ME network structures proposed for the diagnosis of ophthalmic and internal carotid arterial disorders were implemented using the MATLAB software package (MATLAB version 6.5 with the Neural Network Toolbox).

4. Experimental results

4.1. Feature extraction using discrete wavelet transform

Diagnosis of arterial diseases is feasible by analysis of spectral shape and parameters.
Since flow in arteries is pulsatile and the moving targets have a random spatial distribution, the Doppler signal is time-varying and random. The WT is known to be better suited to analyzing nonstationary signals, since it is well localized in time and frequency. This time and frequency localization, which the compact support of the wavelet basis functions helps to provide, is one of the most attractive features of the WT. The main advantage of the WT is its varying window size, broad at low frequencies and narrow at high frequencies, which leads to an optimal time–frequency resolution in all frequency ranges [22–24]. Therefore, spectral analysis of the ophthalmic and internal carotid arterial Doppler signals was performed using the DWT as described in Section 2.

Selection of an appropriate wavelet and of the number of decomposition levels is very important in the analysis of signals using the WT. The number of decomposition levels is chosen based on the dominant frequency components of the signal: the levels are chosen such that those parts of the signal that correlate well with the frequencies required for classification are retained in the wavelet coefficients. In the present study, since the Doppler signals do not have any useful frequency components below 40 Hz, the number of decomposition levels was chosen to be 7. Thus, the ophthalmic and internal carotid arterial Doppler signals were decomposed into details D1–D7 and one final approximation, A7. The ranges of the various frequency bands are given in Table 1.

Usually, tests are performed with different types of wavelets, and the one that gives maximum efficiency is selected for the particular application. The smoothing feature of the Daubechies wavelet of order 1 (db1, also known as the Haar wavelet) made it more suitable for detecting changes in the arterial Doppler signals. Therefore, the wavelet coefficients were computed using db1 in the present study.
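The choice of seven levels follows from the dyadic halving of the band edges at each decomposition stage. The short Python sketch below reproduces that arithmetic under the assumption, taken from Table 1, that the D1 band extends up to 5000 Hz; the helper names `choose_levels` and `band_edges` are hypothetical.

```python
def choose_levels(f_top, f_min_useful):
    """Smallest L for which the final approximation band [0, f_top / 2**L] lies below f_min_useful."""
    levels = 0
    while f_top / 2 ** levels >= f_min_useful:
        levels += 1
    return levels

def band_edges(f_top, levels):
    """Frequency ranges (Hz) of D1..D_levels and A_levels under dyadic subband splitting."""
    bands = {f"D{j}": (f_top / 2 ** j, f_top / 2 ** (j - 1)) for j in range(1, levels + 1)}
    bands[f"A{levels}"] = (0.0, f_top / 2 ** levels)
    return bands

L = choose_levels(5000.0, 40.0)
print(L)                             # 7
print(band_edges(5000.0, L)["D7"])   # (39.0625, 78.125)
```

Six levels would leave an approximation band reaching 78.125 Hz, above the 40 Hz limit, whereas seven levels bring it down to 39.0625 Hz; the computed edges match the Table 1 entries to within 0.01 Hz.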
To investigate the effect of other wavelets on classification accuracy, tests were also carried out using other wavelets. Apart from db1, Symmlet of order 10 (sym10), Coiflet of order 4 (coif4), and Daubechies of order 8 (db8) were tried. The Daubechies wavelets were seen to offer better accuracy than the others, with db1 marginally better than db8. The discrete wavelet coefficients were computed using the MATLAB software package.

Feature selection is an important component of designing neural-network-based pattern classifiers, since even the best classifier will perform poorly if the features used as inputs are not selected well. The computed discrete wavelet coefficients provide a compact representation that shows the energy distribution of the signal in time and frequency. Therefore, the computed detail wavelet coefficients of the ophthalmic and internal carotid arterial Doppler signals of each subject were used as the feature vectors representing the signals. The values of the coefficients in A7 were observed to be very close to zero, so the coefficients corresponding to the A7 frequency band were discarded, reducing the number of features representing the signal.

Table 1
Ranges of frequency bands in wavelet decomposition

Decomposed signal    Frequency range (Hz)
D1                   2500–5000
D2                   1250–2500
D3                   625–1250
D4                   312.5–625
D5                   156.25–312.5
D6                   78.13–156.25
D7                   39.07–78.13
A7                   0–39.07

To further reduce the dimensionality of the extracted feature vectors, statistics over the set of wavelet coefficients were used. The following statistical features were used to represent the time–frequency distribution of the Doppler signals:

1. mean of the absolute values of the coefficients in each subband;
2. maximum of the absolute values of the coefficients in each subband;
3. average power of the wavelet coefficients in each subband;
4. standard deviation of the coefficients in each subband;
5. ratio of the absolute mean values of adjacent subbands;
6. distribution distortion of the coefficients in each subband.

Features 1–3 represent the frequency distribution of the signal, and features 4–6 the amount of change in the frequency distribution. These feature vectors, calculated for the frequency bands D1–D7, were used for classification of the ophthalmic and internal carotid arterial Doppler signals. In some applications, only some of the statistical features given in this section may be used, to further reduce the dimensionality of the feature vectors. In our applications, however, all six statistical features were used to represent the ophthalmic and internal carotid arterial Doppler signals.

4.2. Application of mixture of experts to ophthalmic arterial Doppler signals

The ophthalmic arterial Doppler signals were obtained from 214 subjects. The group consisted of 103 females and 111 males with ages ranging from 19 to 65 years and a mean age of 33.5 years (standard deviation, SD, 9.8). Diasonics Synergy color Doppler ultrasonography was used during the examinations, and the sonograms were taken into consideration. According to the examination results, 52 of the 214 subjects suffered from ophthalmic artery stenosis, 54 suffered from ocular Behcet disease, 45 suffered from uveitis disease, and the rest were healthy subjects (control group) who had no ocular or systemic disease.
The group suffering from ophthalmic artery stenosis consisted of 25 females and 27 males with a mean age of 35.5 years (SD 8.3, range 23–65); the group suffering from ocular Behcet disease consisted of 25 females and 29 males with a mean age of 35.5 years (SD 8.5, range 21–63); the group suffering from uveitis disease consisted of 22 females and 23 males with a mean age of 34.5 years (SD 8.1, range 22–62); and the healthy subjects were 31 females and 32 males with a mean age of 30.0 years (SD 9.3, range 19–64).

Ophthalmic artery examinations were performed with a Doppler unit using a 10 MHz ultrasonic transducer. The measurement system consisted of five units: a 10 MHz ultrasonic transducer, an analog Doppler unit (Diasonics Synergy), a recorder (Sony), an analog/digital interface board (Sound Blaster Pro-16 bit), and a personal computer with a printer. The ultrasonic transducer was applied in a horizontal plane to the closed eyelids, using sterile methylcellulose as a coupling gel. Care was taken not to apply pressure to the eye, in order to avoid artifacts. The probe was most often placed at an angle of 60° from the midline, pointing towards the orbital apex. Good and consistent signals were obtained at a depth of 37–42 mm. The ophthalmic arterial Doppler signals were sampled at 5 kHz and framed into equal time intervals, with a frame length of 256 samples.

The ME architecture used for the diagnosis of ophthalmic arterial disorders is shown in Fig. 2. Since we investigated four-group classification exclusively, the ME was configured with four local experts and a gating network, all in the form of MLPNNs. ANN architectures are derived by trial and error, and the complexity of a neural network is characterized by the number of hidden layers. There is no general rule for selecting the appropriate number of hidden layers.
A neural network with a small number of neurons may not be sufficiently powerful to model a complex function. On the other hand, a neural network with too many neurons may overfit the training set and lose its ability to generalize, which is the main desired characteristic of a neural network. The most popular approach to finding the optimal number of hidden layers is trial and error. Our architecture studies confirmed that, for the ophthalmic arterial Doppler signals, a minimal network has better generalization properties and results in higher classification accuracy. For these data, MLPNNs with one hidden layer were superior to models with two hidden layers. The most suitable network configuration found had 10 neurons in the hidden layer and 4 outputs. Samples with target outputs healthy, ophthalmic artery stenosis, ocular Behcet disease, and uveitis disease were given the binary target values (0, 0, 0, 1), (0, 0, 1, 0), (0, 1, 0, 0), and (1, 0, 0, 0), respectively.

The adequate functioning of neural networks depends on the sizes of the training and test sets. In the ME, 80 of the 214 subjects were used for training and the rest for testing. A practical way to find a point of better generalization is to use a small percentage (around 20%) of the training set for cross validation. To obtain better network generalization, 16 training subjects were selected randomly to be used as a cross validation set. The training set consisted of 20 subjects suffering from ophthalmic artery stenosis, 20 suffering from ocular Behcet disease, 20 suffering from uveitis disease, and 20 healthy subjects. The testing set consisted of 32 subjects suffering from ophthalmic artery stenosis, 34 suffering from ocular Behcet disease, 25 suffering from uveitis disease, and 43 healthy subjects.
The cross validation set consisted of 4 subjects suffering from ophthalmic artery stenosis, 4 suffering from ocular Behcet disease, 4 suffering from uveitis disease, and 4 healthy subjects.

Fig. 3. The detail wavelet coefficients corresponding to the D1 frequency band of the ophthalmic arterial Doppler signals recorded from: (a) 35-year-old healthy subject (subject no: 12), (b) 29-year-old subject suffering from ophthalmic artery stenosis (subject no: 17), (c) 37-year-old subject suffering from ocular Behcet disease (subject no: 35), (d) 34-year-old subject suffering from uveitis disease (subject no: 41). (Each panel plots the detail wavelet coefficients against the coefficient index.)

The computed discrete wavelet coefficients were used as the inputs of the MLPNNs employed in the ME architecture. To extract features, the wavelet coefficients corresponding to the D1–D7 frequency bands of the ophthalmic arterial Doppler signals were computed. For each ophthalmic arterial Doppler signal frame (256 samples), the detail wavelet coefficients ($d_k$, k = 1, 2, ..., 7) at the first through seventh levels (128 + 64 + 32 + 16 + 8 + 4 + 2 coefficients) were computed, giving 254 detail wavelet coefficients per frame. To reduce the dimensionality of the extracted feature vectors, the statistics described in Section 4.1 were computed over the set of wavelet coefficients.
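The per-frame processing just described (256-sample frames at 5 kHz, a seven-level db1 decomposition, and statistics over D1–D7) can be sketched end to end in NumPy. This is an illustrative reconstruction, not the authors' MATLAB code. Three details are assumptions because the text does not define them: frames are non-overlapping, the ratio feature is mean|D_j| / mean|D_{j+1}| for the six adjacent pairs, and "distribution distortion" is approximated by the sample skewness. Under these choices each frame yields 5 × 7 + 6 = 41 features, which is consistent with the 41 network inputs reported for the ophthalmic ME. Function names are hypothetical.

```python
import numpy as np

FS_HZ, FRAME_LEN, LEVELS = 5000, 256, 7   # values stated in the text

def frame_signal(x, frame_len=FRAME_LEN):
    """Consecutive non-overlapping frames (assumed); an incomplete tail is dropped."""
    x = np.asarray(x, dtype=float)
    n_frames = len(x) // frame_len
    return x[: n_frames * frame_len].reshape(n_frames, frame_len)

def haar_details(frame, levels=LEVELS):
    """Detail coefficients [D1, ..., D_levels] of an orthonormal db1 (Haar) decomposition."""
    approx, details = frame, []
    for _ in range(levels):
        even, odd = approx[0::2], approx[1::2]
        details.append((odd - even) / np.sqrt(2.0))  # high-pass filter + downsample by 2
        approx = (even + odd) / np.sqrt(2.0)          # low-pass filter + downsample; A7 is discarded
    return details

def skewness(c):
    """Sample skewness: an ASSUMED stand-in for 'distribution distortion'."""
    centered = c - c.mean()
    s = centered.std()
    return 0.0 if s == 0 else float(np.mean(centered ** 3) / s ** 3)

def subband_features(details):
    mean_abs = np.array([np.mean(np.abs(d)) for d in details])  # 1: mean |coefficient|
    max_abs = np.array([np.max(np.abs(d)) for d in details])    # 2: max |coefficient|
    power = np.array([np.mean(d ** 2) for d in details])        # 3: average power
    std = np.array([np.std(d) for d in details])                # 4: standard deviation
    ratio = mean_abs[:-1] / (mean_abs[1:] + 1e-12)              # 5: adjacent-subband ratio (assumed order)
    skew = np.array([skewness(d) for d in details])             # 6: distribution distortion (assumed)
    return np.concatenate([mean_abs, max_abs, power, std, ratio, skew])

signal = np.random.default_rng(0).normal(size=FS_HZ)   # one second of stand-in samples
frames = frame_signal(signal)
features = np.array([subband_features(haar_details(f)) for f in frames])
print(frames.shape, features.shape)   # (19, 256) (19, 41)
```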
The MLPNNs therefore had 41 inputs, equal to the number of features in the input feature vector. The detail wavelet coefficients corresponding to the D1 frequency band of the ophthalmic arterial Doppler signals obtained from a 35-year-old healthy subject (subject no: 12), a 29-year-old subject suffering from ophthalmic artery stenosis (subject no: 17), a 37-year-old subject suffering from ocular Behcet disease (subject no: 35), and a 34-year-old subject suffering from uveitis disease (subject no: 41) are given in Figs. 3(a)–(d), respectively. The detail wavelet coefficients of the ophthalmic arterial Doppler signals obtained from the healthy subject (Fig. 3(a)) and from the subjects suffering from ophthalmic arterial diseases (Figs. 3(b)–(d)) differ from each other.

Training holds the key to an accurate solution, so the criterion for stopping training must be well defined. When the network is trained too much, it memorizes the training patterns and does not generalize well. Cross validation is a highly recommended criterion for stopping the training of a network: when the error on the cross validation set increases, training should be stopped, because the point of best generalization has been reached. Training of the ME was stopped at 400 epochs, since the cross validation error began to rise at that point. Because the MSE values converged to small constants, approximately zero, within 400 epochs, training of the ME was judged successful. In our previous study [13], by contrast, the stand-alone MLPNN trained with the least-mean-squares backpropagation algorithm converged slowly, with the MSE reaching a small constant of approximately zero only after 3000 epochs. The backpropagation algorithm searched for a globally optimal solution to the whole classification problem, so the number of epochs required for convergence increased.
In the ME, however, the classification problem was divided into simpler problems whose solutions were then combined. In addition, the training algorithm of the ME is a general technique for maximum likelihood estimation that fits well with the modular structure and enables a significant speed-up over the backpropagation algorithm. Thus, the convergence rate of the ME presented in this study was found to be higher than that of the stand-alone MLPNN used in the previous study [13].

In classification, the aim is to assign the input patterns to one of several classes, usually represented by outputs restricted to lie in the range from 0 to 1, so that they represent the probability of class membership. During classification, a specific pattern is assigned to a specific class according to the characteristic features selected for it. In this application, there were four classes: healthy, ophthalmic artery stenosis, ocular Behcet disease, and uveitis disease. The classification results of the ME were displayed by a confusion matrix, which is given below.
Confusion matrix (rows: ME output; columns, left to right: desired result healthy, ophthalmic artery stenosis, ocular Behcet disease, uveitis disease)

Output/desired                          Healthy   Stenosis   Behcet   Uveitis
Result (healthy)                             42          0        0         0
Result (ophthalmic artery stenosis)           1         31        1         0
Result (ocular Behcet disease)                0          1       32         1
Result (uveitis disease)                      0          0        1        24

According to the confusion matrix, one healthy subject was incorrectly classified by the ME as suffering from ophthalmic artery stenosis, one subject suffering from ophthalmic artery stenosis was classified as suffering from ocular Behcet disease, one subject suffering from ocular Behcet disease was classified as suffering from ophthalmic artery stenosis, one subject suffering from ocular Behcet disease was classified as suffering from uveitis disease, and one subject suffering from uveitis disease was classified as suffering from ocular Behcet disease.

Table 2
The values of statistical parameters of the ME used for the diagnosis of ophthalmic arterial disorders

Statistical parameters                        Values
Specificity                                   97.67%
Sensitivity (ophthalmic artery stenosis)      96.88%
Sensitivity (ocular Behcet disease)           94.12%
Sensitivity (uveitis disease)                 96.00%
Total classification accuracy                 96.27%

The test performance of the ME was determined by computing the following statistical parameters:

Specificity: number of correctly classified healthy subjects / number of total healthy subjects.
Sensitivity (ophthalmic artery stenosis): number of correctly classified subjects suffering from ophthalmic artery stenosis / number of total subjects suffering from ophthalmic artery stenosis.
Sensitivity (ocular Behcet disease): number of correctly classified subjects suffering from ocular Behcet disease / number of total subjects suffering from ocular Behcet disease.
Sensitivity (uveitis disease): number of correct classi;ed subjects suFering from uveitis disease/number of total subjects suFering from uveitis disease. Total classi9cation accuracy: number of correct classi;ed subjects/number of total subjects. The values of these statistical parameters are given in Table 2. As it is seen from Table 2, the ME classi;ed healthy subjects, subjects suFering from ophthalmic artery stenosis, subjects suFering from ocular Behcet disease and subjects suFering from uveitis disease with the accuracy of 97.67%, 96.88%, 94.12% and 96.00%, respectively. The healthy subjects, subjects suFering from ophthalmic artery stenosis, subjects suFering from ocular Behcet disease and subjects suFering from uveitis disease were classi;ed with the accuracy of 96.27%. The correct classi;cation rates of the stand-alone MLPNN presented in our previous study [13] were 90.63% for healthy subjects and 88.89% for subjects having ophthalmic artery stenosis. Thus, the accuracy rates of the ME network structure presented for this application were found to be higher than that of the stand-alone MLPNN used in the previous study [13]. The performance of a test can be evaluated by plotting a ROC curve for the test. For a given result obtained by a classi;er system, four possible alternatives exist that describe the nature of the result: (i) true positive (TP), (ii) false positive (FP), (iii) true negative (TN), and (iv) false negative (FN) [26]. In this study, a TP decision occured when the positive detection of the ME coincided with a positive detection of the physician. A FP decision occured when the ME made a positive detection that did not agree with the physician. A TN decision occured when both the ME and the physician suggested the absence of a positive detection. A FN decision occured when the ME made a negative detection that did not agree with the physician. 
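The statistical parameters defined above can be recomputed directly from the confusion matrix, which is a useful consistency check on Table 2. The sketch below assumes the layout of the matrix (rows: ME output, columns: desired class) and treats each class in turn as the positive class to obtain one-vs-rest TP, FP, TN, and FN counts in the sense just described. The ROC helpers are hypothetical illustrations of how an ROC curve is traced by sweeping a decision threshold over continuous classifier scores; the scores in the usage block are invented, not data from this study.

```python
CLASSES = ["healthy", "ophthalmic artery stenosis", "ocular Behcet disease", "uveitis disease"]

# CONFUSION[i][j]: test subjects of desired class j assigned to class i by the ME
CONFUSION = [
    [42, 0, 0, 0],
    [1, 31, 1, 0],
    [0, 1, 32, 1],
    [0, 0, 1, 24],
]


def one_vs_rest(cm, k):
    """Return (TP, FP, TN, FN) when class k is the 'positive' class."""
    n = len(cm)
    total = sum(map(sum, cm))
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                        # assigned to k, desired class differs
    fn = sum(cm[i][k] for i in range(n)) - tp   # desired class k, assigned elsewhere
    return tp, fp, total - tp - fp - fn, fn


def class_rate(cm, k):
    """Correctly classified subjects of class k / all subjects of class k.
    For k = 0 (healthy) this is the paper's specificity; otherwise a sensitivity."""
    tp, _, _, fn = one_vs_rest(cm, k)
    return tp / (tp + fn)


def total_accuracy(cm):
    """Correctly classified subjects / all subjects."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))


def roc_points(scores, labels):
    """(1 - specificity, sensitivity) pairs from sweeping a threshold over
    continuous scores; labels are 1 (diseased) or 0 (healthy)."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= thr and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    for k, name in enumerate(CLASSES):
        print(f"{name}: {100 * class_rate(CONFUSION, k):.2f}%")
    print(f"total classification accuracy: {100 * total_accuracy(CONFUSION):.2f}%")
    # Hypothetical scores, not data from the study:
    print("AUC:", auc(roc_points([0.9, 0.8, 0.6, 0.3, 0.1], [1, 1, 0, 1, 0])))
```

Run as a script, the per-class rates print as 97.67%, 96.88%, 94.12%, and 96.00%, and the total accuracy as 96.27%, matching Table 2.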
A good test is one for which sensitivity rises rapidly while 1-specificity hardly increases at all until sensitivity becomes high. The ROC curves shown in Fig. 4 represent the performances of the stand-alone MLPNN and the ME network structure on the ophthalmic arterial Doppler signal test file. Fig. 4 shows that the performance of the ME is higher than that of the stand-alone MLPNN.

Fig. 4. ROC curves of the stand-alone MLPNN and ME network structure used for the diagnosis of ophthalmic arterial disorders. [Plot of sensitivity versus 1-specificity for the ME and the MLPNN.]

4.3. Application of mixture of experts to internal carotid arterial Doppler signals

The internal carotid arterial Doppler signals were obtained from 160 subjects. The group consisted of 78 females and 82 males with ages ranging from 18 to 67 years and a mean age of 32.0 years (SD 9.6). A Toshiba 140A color Doppler ultrasonography system was used during the examinations, and the sonograms were taken into consideration. According to the examination results, 59 of the 160 subjects suffered from internal carotid artery stenosis, 53 suffered from internal carotid artery occlusion, and the remaining 48 were healthy subjects (control group) with no arterial disease. The group suffering from internal carotid artery stenosis consisted of 26 females and 33 males with a mean age of 33.0 years (SD 8.6, range 21–67), the group suffering from internal carotid artery occlusion consisted of 29 females and 24 males with a mean age of 32.5 years (SD 8.5, range 20–65), and the healthy subjects were 23 females and 25 males with a mean age of 31.5 years (SD 9.4, range 18–65). Internal carotid artery examinations were performed with a Doppler unit using a 5 MHz ultrasonic transducer. The measurement system consisted of five units.
These were a 5 MHz ultrasonic transducer, an analog Doppler unit (Toshiba 140A color Doppler ultrasonography), a recorder (Sony), an analog/digital interface board (Sound Blaster Pro-16 bit), and a personal computer with a printer. The ultrasonic transducer was applied to the neck in a horizontal plane, with water-soluble gel as the coupling medium. Care was taken not to apply pressure to the neck in order to avoid artifacts. The probe was most often placed at an angle of 60° to the internal carotid artery. The internal carotid arterial Doppler signals were sampled at 5 kHz and divided into frames of equal duration; the frame length was 256 samples.

The ME architecture used for the diagnosis of internal carotid arterial disorders is shown in Fig. 2. Since the task was a three-class classification problem, the ME was configured with three local experts and a gating network, all in the form of MLPNNs. For these data, MLPNNs with one hidden layer performed better than those with two hidden layers. The most suitable network configuration found had 10 neurons in the hidden layer and 3 outputs. Samples with the target outputs healthy, internal carotid artery stenosis, and internal carotid artery occlusion were given the binary target values (0, 0, 1), (0, 1, 0), and (1, 0, 0), respectively. Of the 160 subjects, 60 were used for training the ME and the remaining 100 for testing. To obtain better network generalization, 12 training subjects were selected randomly to be used as a cross-validation set. The training set consisted of 20 subjects suffering from internal carotid artery stenosis, 20 subjects suffering from internal carotid artery occlusion, and 20 healthy subjects. The testing set consisted of 39 subjects suffering from internal carotid artery stenosis, 33 subjects suffering from internal carotid artery occlusion, and 28 healthy subjects. The cross-validation set consisted of four subjects suffering from internal carotid artery stenosis, four subjects suffering from internal carotid artery occlusion, and four healthy subjects.

Fig. 5. The detail wavelet coefficients corresponding to the D1 frequency band of the internal carotid arterial Doppler signals recorded from: (a) 33-year-old healthy subject (subject no: 10), (b) 35-year-old subject suffering from internal carotid artery stenosis (subject no: 23), (c) 36-year-old subject suffering from internal carotid artery occlusion (subject no: 28). [Each panel plots the detail wavelet coefficients against the coefficient index.]

The computed discrete wavelet coefficients were used as the inputs of the MLPNNs employed in the ME architecture. To extract features, the wavelet coefficients corresponding to the D1–D7 frequency bands of the internal carotid arterial Doppler signals were computed. For each internal carotid arterial Doppler signal frame (256 samples), the detail wavelet coefficients (d_k, k = 1, 2, ..., 7) at the first through seventh levels (128 + 64 + 32 + 16 + 8 + 4 + 2 coefficients) were computed, giving 254 detail wavelet coefficients per frame.
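The coefficient bookkeeping above can be checked with a short sketch. The wavelet family is not restated in this part of the paper, so the sketch uses the Haar wavelet purely as a simplifying assumption: any dyadic decomposition that halves the length at each level without boundary padding yields the same 128 + 64 + ... + 2 = 254 detail coefficients per 256-sample frame, whereas longer filters with boundary extension produce a few extra coefficients per level. The nominal band edges assume the 5 kHz sampling rate stated above and ideal half-band filters. All function names and the test frame are hypothetical.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    r = 1 / math.sqrt(2)
    half = len(x) // 2
    approx = [r * (x[2 * i] + x[2 * i + 1]) for i in range(half)]
    detail = [r * (x[2 * i] - x[2 * i + 1]) for i in range(half)]
    return approx, detail


def detail_coefficients(frame, levels=7):
    """Return [d1, ..., d_levels] and the final approximation of the frame."""
    details, approx = [], list(frame)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return details, approx


def nominal_band_hz(k, fs=5000.0):
    """Nominal frequency range (Hz) of detail band D_k for sampling rate fs."""
    return fs / 2 ** (k + 1), fs / 2 ** k


if __name__ == "__main__":
    fs = 5000.0
    # Hypothetical 256-sample frame (a 1 kHz tone), not a recorded Doppler signal.
    frame = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(256)]
    details, approx = detail_coefficients(frame)
    for k, d in enumerate(details, start=1):
        lo, hi = nominal_band_hz(k, fs)
        print(f"D{k}: {len(d):3d} coefficients, about {lo:.1f}-{hi:.1f} Hz")
    print("total detail coefficients:", sum(len(d) for d in details))
```

Run as a script, it reports 128, 64, 32, 16, 8, 4, and 2 coefficients for D1 to D7 (254 in total), with D1 nominally covering 1250–2500 Hz and D7 about 19.5–39.1 Hz.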
To reduce the dimensionality of the extracted feature vectors, the statistics described in Section 4.1 were computed over the set of wavelet coefficients. The MLPNNs therefore had 41 inputs, equal to the number of elements of the input feature vector. The detail wavelet coefficients corresponding to the D1 frequency band of the internal carotid arterial Doppler signals obtained from a 33-year-old healthy subject (subject no: 10), a 35-year-old subject suffering from internal carotid artery stenosis (subject no: 23), and a 36-year-old subject suffering from internal carotid artery occlusion (subject no: 28) are given in Figs. 5(a)–(c), respectively. It can be noted that the detail wavelet coefficients of the internal carotid arterial Doppler signals obtained from the healthy subject (Fig. 5(a)) and from the subjects suffering from internal carotid arterial diseases (Figs. 5(b) and (c)) differ from each other. Training of the ME was stopped at 300 epochs, where the cross-validation error began to rise. Since the mean square error (MSE) had converged to a small constant close to zero within 300 epochs, training of the ME was judged successful. In our previous study [14], by contrast, the stand-alone MLPNN trained with the backpropagation algorithm converged slowly, with the MSE reaching a small constant close to zero only after 5000 epochs. Since the EM algorithm used in training the ME enabled a significant speed-up, the ME presented in this study converged faster than the stand-alone MLPNN used in the previous study [14]. In this application there were three classes: healthy, internal carotid artery stenosis, and internal carotid artery occlusion. The classification results of the ME are displayed in the confusion matrix given below.
Confusion matrix (internal carotid arterial Doppler signals; rows: output of the ME, columns: desired class)
Output/desired                               Healthy  Stenosis  Occlusion
Result (healthy)                                  27         0          0
Result (internal carotid artery stenosis)          1        38          1
Result (internal carotid artery occlusion)         0         1         32

According to the confusion matrix, one healthy subject was incorrectly classified by the ME as suffering from internal carotid artery stenosis, one subject with internal carotid artery stenosis was classified as having internal carotid artery occlusion, and one subject with internal carotid artery occlusion was classified as having internal carotid artery stenosis. The test performance of the ME was determined by computing the following statistical parameters:
Specificity: number of correctly classified healthy subjects / total number of healthy subjects.
Sensitivity (internal carotid artery stenosis): number of correctly classified subjects suffering from stenosis / total number of subjects suffering from stenosis.
Sensitivity (internal carotid artery occlusion): number of correctly classified subjects suffering from occlusion / total number of subjects suffering from occlusion.
Total classification accuracy: number of correctly classified subjects / total number of subjects.
The values of these statistical parameters are given in Table 3.

Table 3
The values of statistical parameters of the ME used for the diagnosis of internal carotid arterial disorders
Statistical parameters                             Values
Specificity                                        96.43%
Sensitivity (internal carotid artery stenosis)     97.44%
Sensitivity (internal carotid artery occlusion)    96.97%
Total classification accuracy                      97.00%

As seen from Table 3, the ME classified healthy subjects, subjects suffering from internal carotid artery stenosis, and subjects suffering from internal carotid artery occlusion with accuracies of 96.43%, 97.44%, and 96.97%, respectively. The total classification accuracy over all three groups was 97.00%. The correct classification rates of the stand-alone MLPNN presented in our previous study [14] were 95.24% for healthy subjects, 91.30% for subjects with internal carotid artery stenosis, and 91.67% for subjects with internal carotid artery occlusion. Thus, the accuracy rates of the ME presented for this application were higher than those of the stand-alone MLPNN in the previous study [14]. The ROC curves shown in Fig. 6 represent the performances of the stand-alone MLPNN and the ME network structure on the internal carotid arterial Doppler signal test file. Fig. 6 shows that the performance of the ME is higher than that of the stand-alone MLPNN.

Fig. 6. ROC curves of the stand-alone MLPNN and ME network structure used for the diagnosis of internal carotid arterial disorders. [Plot of sensitivity versus 1-specificity for the ME and the MLPNN.]

5. Conclusion

This paper presented the use of ME network structures to improve the diagnostic accuracy for ophthalmic and internal carotid arterial disorders, since the predictive performance of the overall structure is generally superior to that of any of the individual experts.
To diagnose ophthalmic arterial conditions, the ME architecture was configured with four local experts and a gating network, all in the form of MLPNNs. To diagnose internal carotid arterial conditions, it was configured with three local experts and a gating network, again in the form of MLPNNs. The EM algorithm was used for training the ME networks so that the learning process is decoupled in a manner that fits well with the modular structure. The ME used for the diagnosis of ophthalmic arterial disorders was trained, cross-validated, and tested with features extracted by DWT from the ophthalmic arterial Doppler signals of healthy subjects, subjects suffering from ophthalmic artery stenosis, subjects suffering from ocular Behcet disease, and subjects suffering from uveitis disease. The ME used for the diagnosis of internal carotid arterial disorders was trained, cross-validated, and tested with features extracted by DWT from the internal carotid arterial Doppler signals of healthy subjects, subjects suffering from internal carotid artery stenosis, and subjects suffering from internal carotid artery occlusion. The classification results, the values of the statistical parameters, and the ROC curves were used to evaluate the performance of the classifiers. The accuracy rates achieved by the ME network structures presented for the diagnosis of ophthalmic and internal carotid arterial disorders were higher than those of the stand-alone neural network models used in the previous studies.

Acknowledgements

This study has been supported by the Scientific Research Project of Gazi University (Project no: 07/2003-03).

References

[1] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts, Neural Comput. 3 (1) (1991) 79–87.
[2] K. Chen, L. Xu, H. Chi, Improved learning algorithms for mixture of experts in multiclass classification, Neural Networks 12 (9) (1999) 1229–1252.
[3] X. Hong, C.J. Harris, A mixture of experts network structure construction algorithm for modelling and control, Appl. Intell. 16 (1) (2002) 59–69.
[4] M.I. Jordan, R.A. Jacobs, Hierarchical mixtures of experts and the EM algorithm, Neural Comput. 6 (2) (1994) 181–214.
[5] P. Mangiameli, D. West, An improved neural classification network for the two-group problem, Comput. Oper. Res. 26 (5) (1999) 443–460.
[6] Y.H. Hu, S. Palreddy, W.J. Tompkins, A patient-adaptable ECG beat classifier using a mixture of experts approach, IEEE Trans. Biomed. Eng. 44 (9) (1997) 891–900.
[7] G. Viardot, R. Lengelle, C. Richard, Mixture of experts for automated detection of phasic arousals in sleep signals, IEEE International Conference on Systems, Man and Cybernetics, Vol. 1, Hammamet, Tunisia, 2002, pp. 551–555.
[8] W.G. Baxt, Use of an artificial neural network for data analysis in clinical decision making: the diagnosis of acute coronary occlusion, Neural Comput. 2 (1990) 480–489.
[9] A.S. Miller, B.H. Blott, T.K. Hames, Review of neural network applications in medical imaging and signal processing, Med. Biol. Eng. Comput. 30 (1992) 449–464.
[10] I.A. Basheer, M. Hajmeer, Artificial neural networks: fundamentals, computing, design, and application, J. Microbiol. Methods 43 (1) (2000) 3–31.
[11] B.B. Chaudhuri, U. Bhattacharya, Efficient training and improved performance of multilayer perceptron in pattern classification, Neurocomputing 34 (2000) 11–27.
[12] I.A. Wright, N.A.J. Gough, F. Rakebrandt, M. Wahab, J.P. Woodcock, Neural network analysis of Doppler ultrasound blood flow signals: a pilot study, Ultrasound Med. Biol. 23 (5) (1997) 683–690.
[13] İ. Güler, E.D. Übeyli, Detection of ophthalmic artery stenosis by least-mean squares backpropagation neural network, Comput. Biol. Med. 33 (4) (2003) 333–343.
[14] E.D. Übeyli, İ. Güler, Neural network analysis of internal carotid arterial Doppler signals: predictions of stenosis and occlusion, Expert Systems Appl. 25 (1) (2003) 1–13.
[15] İ. Güler, N.F. Güler, The electronic detail of a pulsed Doppler blood flow measurement system, Meas. Sci. Technol. 1 (10) (1990) 1087–1092.
[16] B. Sigel, A brief history of Doppler ultrasound in the diagnosis of peripheral vascular disease, Ultrasound Med. Biol. 24 (2) (1998) 169–176.
[17] İ. Güler, F. Hardalaç, E.D. Übeyli, Determination of Behcet disease with the application of FFT and AR methods, Comput. Biol. Med. 32 (6) (2002) 419–434.
[18] İ. Güler, E.D. Übeyli, Application of classical and model-based spectral methods to ophthalmic arterial Doppler signals with uveitis disease, Comput. Biol. Med. 33 (6) (2003) 455–471.
[19] E.D. Übeyli, İ. Güler, Spectral broadening of ophthalmic arterial Doppler signals using STFT and wavelet transform, Comput. Biol. Med. 34 (4) (2004) 345–354.
[20] E.D. Übeyli, İ. Güler, Comparison of eigenvector methods with classical and model-based methods in analysis of internal carotid arterial Doppler signals, Comput. Biol. Med. 33 (6) (2003) 473–493.
[21] E.D. Übeyli, İ. Güler, Spectral analysis of internal carotid arterial Doppler signals using FFT, AR, MA, and ARMA methods, Comput. Biol. Med. 34 (4) (2004) 293–306.
[22] I. Daubechies, The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. Inform. Theory 36 (5) (1990) 961–1005.
[23] M. Akay, Wavelet applications in medicine, IEEE Spectrum 34 (5) (1997) 50–56.
[24] Y. Zhang, Y. Wang, W. Wang, B. Liu, Doppler ultrasound signal denoising based on wavelet frames, IEEE Trans. Ultrason. Ferroelectrics, Frequency Control 48 (3) (2001) 709–716.
[25] S. Haykin, Neural Networks: A Comprehensive Foundation, Macmillan, New York, 1994.
[26] M.H. Zweig, G. Campbell, Receiver-operating characteristic (ROC) plots: a fundamental evaluation tool in clinical medicine, Clin. Chem. 39 (4) (1993) 561–577.

İnan Güler graduated from Erciyes University in 1981. He received his M.S. degree from Middle East Technical University in 1985 and his Ph.D. degree from İstanbul Technical University in 1990, all in electronic engineering. He is a professor at Gazi University, where he is Head of Department. His interest areas include biomedical systems, biomedical signal processing, biomedical instrumentation, electronic circuit design, neural networks, and artificial intelligence. He has written more than 100 articles on biomedical engineering.

Elif Derya Übeyli graduated from Çukurova University in 1996. She received her M.S. degree in 1998, both in electronic engineering. She received her Ph.D. degree in electronics and computer technology from Gazi University. She is a research assistant at the Department of Electronics and Computer Education at Gazi University. Her interest areas are biomedical signal processing, neural networks, and artificial intelligence.