32nd Annual International Conference of the IEEE EMBS
Buenos Aires, Argentina, August 31 - September 4, 2010

Application of Crisp and Fuzzy Clustering Algorithms for Identification of Hidden Patterns from Plethysmographic Observations on the Radial Pulse

Sunil Karamchandani, S. N. Merchant
Indian Institute of Technology Bombay, Mumbai, India, 400076

U. B. Desai
Indian Institute of Technology Hyderabad, India, 502205

G. D. Jindal
Bhabha Atomic Research Centre, Mumbai, India, 400085

Abstract: The radial pulse is the most basic and essential physical sign in clinical medicine. This paper proposes the application of crisp and fuzzy clustering algorithms, under supervised and unsupervised learning scenarios, for identifying non-trivial regularities and relationships in radial pulse patterns obtained with the Impedance Plethysmographic technique. Our objective is to unearth the hidden patterns that capture the physiological variabilities of the arterial pulse for clinical analysis, thus providing a useful tool for disease characterization. A variety of fuzzy algorithms, including Gustafson-Kessel (GK) and Gath-Geva (GG), have been intensively tested across a diverse group of subjects and 4855 data sets. Exhaustive testing over the data set shows that about 80 % of the patterns are successfully classified, which is a promising result. A Rand Index of 0.7739 is obtained under supervised learning, indicating close agreement of our process with the assessments of plethysmographic experts. The patterns are also correlated with diseases of the heart, liver and lungs.

Index Terms: Peripheral Pulse Analyzer - Impedance Plethysmography - fuzzy clustering.

I. INTRODUCTION

The pulse of the radial artery is an important diagnostic tool for all physicians. The radial artery is not only easily accessible but is in direct continuation of the heart and close to it.
Examination of the pulse throws light on the gravity of illness and gives a guideline for prognosis. The pulse provides evidence of great value about both the state of the central circulatory system and the general pathophysiological condition of the subject. Fluctuations in physiological conditions are reflected as changes in the morphology of the arterial pulse. The pulse identifies the presence and location of disorders in a patient's body, unlike the ECG, which mainly reflects the electrical activity of the heart, and thus it contains much more useful information than the ECG [1]. In traditional Indian medicine the clinician palpates the area above the radial artery at the patient's wrist and monitors the rhythm, pulse pressure, pulse propagation and elasticity of the pulse to arrive at a diagnosis [2]. The diagnosis requires a long period of study and practice by the physician, without the benefit of any recording aids.

978-1-4244-4124-2/10/$25.00 ©2010 IEEE

To extract information from the radial pulse we use the principle of Impedance Plethysmography (IP), a convenient, inexpensive, painless and non-invasive technique. IP provides an indirect assessment of blood volume changes in any body segment as a function of time [3]. Since blood is a good conductor of electricity, the amount of blood in a given body segment is inversely reflected in the electrical impedance of that segment. The pulsatile blood volume changes caused by systemic blood circulation therefore cause proportional changes in the electrical impedance [4]. Using IP, we record the impedance changes at the radial artery as a measure of the blood flow. After peak detection, pulse waveforms of sixty-four sample points each are recorded, with twenty sample points prior to the peak and forty-four thereafter. Over four thousand such data sets are collected from more than three hundred subjects.
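The windowing step described above (peak detection followed by extraction of 64-sample waveforms) can be sketched as follows. This is a minimal illustration and not the authors' acquisition code: the function names, the simple local-maximum peak detector and the `min_distance` spacing of 50 samples are assumptions, as is the convention that the peak is counted as the first of the forty-four trailing samples.

```python
def find_peaks(signal, min_distance=50):
    """Return indices of local maxima that are at least min_distance apart.

    min_distance is a hypothetical refractory gap (0.5 s at 100 Hz).
    """
    peaks = []
    for i in range(1, len(signal) - 1):
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            if peaks and i - peaks[-1] < min_distance:
                # Two candidates too close together: keep the taller one.
                if signal[i] > signal[peaks[-1]]:
                    peaks[-1] = i
            else:
                peaks.append(i)
    return peaks


def extract_windows(signal, peaks, before=20, after=44):
    """Cut one window per peak: `before` samples ahead of the peak and
    `after` samples from the peak onward (64 samples with the defaults).
    Peaks too close to either end of the recording are skipped."""
    windows = []
    for p in peaks:
        start, stop = p - before, p + after
        if start >= 0 and stop <= len(signal):
            windows.append(signal[start:stop])
    return windows
```

Each returned window is then one 64-dimensional data set of the kind passed to the clustering algorithms.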
We apply clustering algorithms to identify groups of related data that can be explored further.

II. DATA ACQUISITION

Pulse signals from the radial artery are measured using the Peripheral Pulse Analyzer developed at Bhabha Atomic Research Centre (B.A.R.C.), Mumbai, India. The radial artery begins about 1 cm below the bend of the elbow and passes along the radial side of the forearm to the wrist, where its pulsation can be readily felt. With the subject in the supine position, carrier electrodes are applied around the upper arm and the palm while sensing electrodes are applied on the distal segment around the wrist. The Peripheral Pulse Analyzer uses the principle of IP, wherein a sinusoidal current of constant amplitude (2 mA) is passed across the wrist of the subject using band electrodes. The amplitude of the signal thus obtained is directly proportional to the electrical impedance of the body segment. The waveforms obtained are sampled at 100 Hz as time series data. The data is recorded from normal and diseased subjects at the Biomedical Division, Modular Labs, B.A.R.C., for about four minutes on the LabWindows platform. The subjects are in the age range of about 18 to 60 years. Approximately 240 such samples are obtained from a single subject. The observed impedance signal is shown in Figure 1.

Fig. 1. Acquisition of Pulse Patterns in LabWindows environment

III. PROPOSED CLUSTERING ALGORITHMS

We propose to unearth suitable techniques for clustering plethysmographic data. We assume two scenarios: in the first, the true class labels of the data set are unknown (unsupervised learning), while in the second we pose a pattern classification problem wherein an expert assigns class labels to the data set (supervised learning).

In the first technique, we apply hard and fuzzy clustering methods to estimate the number of clusters in the data. The performance of these clustering techniques is validated with the help of standard indices. The crisp clustering methods include k-means, the silhouette index and k-medoid methods, while for fuzzy clustering we use the fuzzy C-means (FCM), GK and GG clustering algorithms. Cluster validity is evaluated by computing the standard indices, namely Dunn's Index (DI), the Alternative Dunn's Index (ADI), the Partition Index (SC), the Separation Index (S) and Xie and Beni's Index (Xb), and comparing their performance. Based on the performance of the cluster validity indices we decide the number of hidden patterns in our data.

In the second technique, each of the 4855 data sets has been individually labeled into one of eight different classes by a plethysmographic expert. Once the true class labels are known, the k-means algorithm is applied with eight clusters specified. The performance of the algorithm is judged using the Rand Index (RI).

A. Unsupervised Clustering

During unsupervised learning we use various cluster validation parameters. DI [5] is suited to compact, well-separated clusters and does not work well with overlapping clusters. The ADI [5] is used to simplify the calculation of DI. In this work the optimal clustering is taken to be the one with lower values of DI and ADI. The Xb index [5] aims to quantify the ratio of the total variation within clusters to their separation, hence its minimum value decides the optimal number of clusters. SC [6] is the ratio of the sum of compactness and separation of the clusters and should have a lower value for good clustering. SC is useful when comparing different partitions having an equal number of clusters. S [6], being the separation index, should be as high as possible.

1) k-means and Silhouette Index: The k-means algorithm seeks to partition the observations in the n × p data matrix into k mutually exclusive clusters and returns a vector of indices indicating to which of the k clusters it has assigned each observation. The technique uses the output of the k-means clustering algorithm, comparing the change in within-cluster dispersion to that expected under a uniform null distribution. Each cluster in the partition is defined by its member objects and by its centroid. The centroid of each cluster is the point to which the sum of distances from all objects in that cluster is minimized. The algorithm computes the cluster centroids in such a way that it minimizes the Euclidean distance measure given by equation 1.

$$E = \sum_{j=1}^{k} \sum_{i=1}^{n} \left\| x_i^{(j)} - c_j \right\|^2 \qquad (1)$$

E is the sum-of-squares error for all objects in the database and $\| x_i^{(j)} - c_j \|^2$ is the Euclidean distance measure between a data point $x_i^{(j)}$ and the cluster centroid $c_j$. The k-means method requires the user to specify the number of clusters as an input parameter. To find a satisfactory clustering result, the algorithm is executed repeatedly with different values of k (the number of clusters). The distortion of a cluster is the sum of squared distances between the objects in the cluster and its centroid. The lower the value of this measure, the better the clustering. However, for our data the distortion continues to decrease as the number of clusters increases, so it does not help us decide the number of clusters.

The silhouette index defined in [7] is used to validate the k-means algorithm. The silhouette measures how similar a data point is to the other points in its own cluster compared with the points in the other clusters. As seen from Figure 2, silhouette values range from -1, for data assigned to the wrong cluster, to +1, for data points that are very far from neighboring clusters. A silhouette value of zero indicates that the data point cannot be clearly assigned to one cluster or another. Subjectively, for k = 4, 6 and 10 we observe that most of the silhouette values are greater than 0.6, hence they are examples of good clustering.
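The procedure described above (running k-means for several values of k and comparing the mean silhouette width) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the implementation used in the paper: the function names are hypothetical, initial centroids are drawn at random from the data with a fixed seed, and a mean silhouette of 0.0 is returned by convention when fewer than two clusters are populated.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed=0, max_iter=100):
    """Lloyd's k-means. Returns (labels, centroids, E) with E as in equation 1."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def mean_silhouette(points, labels):
    """Average silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        return 0.0  # assumption: undefined case reported as 0
    total = 0.0
    for i, p in enumerate(points):
        own = clusters[labels[i]]
        if len(own) == 1:
            continue  # s(i) is taken as 0 for a singleton cluster
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, points[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(math.dist(p, points[j]) for j in idxs) / len(idxs)
                for lab, idxs in clusters.items() if lab != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)


def scan_k(points, ks, seed=0):
    """Mean silhouette width for each candidate number of clusters."""
    return {k: mean_silhouette(points, kmeans(points, k, seed)[0]) for k in ks}
```

Calling `scan_k(windows, range(3, 11))` on a list of 64-sample waveforms would give one mean silhouette width per k, which is the quantity compared when choosing the number of clusters.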
The silhouette plot for k = 8 shows quite a few clusters with negative values, indicating that eight is not the right number of clusters. We use the largest average silhouette width to find the optimum number of clusters. As seen from Table I, k = 4 has the largest average silhouette width, which suggests that four different patterns exist in the radial pulse data set. However, with k = 4 the major part of the data set falls in cluster three, while for k = 10 the data sets are equally distributed among all the clusters. Thus the optimum number of clusters is not clearly defined by the silhouette plot alone, although the silhouette index (SI), which is a measure of the silhouette width, points to four patterns.

Fig. 2. Silhouette Plot for different clusters (k): (a) k = 4, (b) k = 6, (c) k = 8, (d) k = 10

TABLE I
MEAN SILHOUETTE WIDTH FOR DIFFERENT CLUSTERS

No. of Clusters (k)      3       4       5       6       7       8       9       10
Mean Silhouette Width    0.3052  0.6046  0.4743  0.5150  0.4174  0.4614  0.4741  0.4988

Table II gives the cluster validity indices for the k-means algorithm for the various numbers of clusters. The values of the indices SC and ADI suggest eight as the optimal number of clusters. Xb, however, does not provide any information about the clusters, as it is monotonically decreasing.

2) k-medoid Clustering: A medoid is the sample of a cluster whose average dissimilarity to all objects in the cluster is minimal [8]. This algorithm is derived from the k-means algorithm, with the centroid replaced by the medoid. Initially we choose k random data points as the initial cluster medoids. We then assign each data point to the cluster associated with the closest medoid. By minimizing the cost function, namely the Minkowski distance metric, we recalculate the positions of the k medoids. This procedure is repeated iteratively until the medoids are fixed. As seen from Table III, the optimal number of clusters is nine, so there is one pattern that the k-means algorithm is not able to identify. All the validation parameters are in complete agreement on the optimal number of clusters (there is hardly any variation in the values of S, as they are very small). The parameter Xb is infinite in the case of k-medoid clustering and therefore does not exist.

3) Fuzzy C-means Clustering: The name fuzzy suggests that a data point can belong to two or more clusters. FCM is based on the minimization of the objective function given by equation 2.

$$J_m = \sum_{i=1}^{n} \sum_{j=1}^{c} u_{ij}^{m} \left\| x_i - c_j \right\|^2 \qquad (2)$$

Fuzzy partitioning is carried out through an iterative optimization of the objective function $J_m$, with updates of the membership function $u_{ij}$ and the cluster centers $c_j$. We use a termination parameter $\epsilon$ = 1e-5 and a fuzziness parameter m = 2. We assume an initial value of the partition matrix $U = [u_{ij}]$ to calculate the cluster centers according to equation 3.

$$c_j = \frac{\sum_{i=1}^{n} u_{ij}^{m} x_i}{\sum_{i=1}^{n} u_{ij}^{m}}, \quad m = 2 \qquad (3)$$

After updating the partition matrix using equation 4, we continue the iteration until $\| U_{k+1} - U_k \| < \epsilon$.

$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{\| x_i - c_j \|}{\| x_i - c_k \|} \right)^{2/(m-1)}}, \quad m = 2 \qquad (4)$$

The final partition matrix is used to identify the data patterns.

TABLE IV
VALIDATION OF FUZZY C-MEAN CLUSTERING

Cluster   DI      ADI      SC      S       Xb
k = 3     0.0879  0.0782   1.1059  3.0863  1.5239
k = 4     0.0491  0.0145   0.8313  2.10    1.2543
k = 5     0.0445  0.0145   0.7950  2.2512  0.9572
k = 6     0.0445  0.0052   0.8613  2.49    0.8128
k = 7     0.0438  4.84e-4  0.7198  1.949   0.7629
k = 8     0.0438  5.3e-4   0.7572  2.123   0.6530
k = 9     0.0438  3.5e-5   0.7696  1.9866  0.5973
k = 10    0.0438  1.25e-4  0.7850  2.11    0.5288

The three cluster validation parameters DI, ADI and Xb have no fixed local minimum and are monotonically decreasing. The separation indices give an optimum number of clusters of seven for SC and six for S. The cluster validity indices are shown in Table IV.
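The FCM iteration of equations 2 to 4 can be sketched in plain Python as below. It is a minimal illustration, not the code used for the experiments: the function names and the random initialisation of the partition matrix are assumptions, and the stopping test uses the largest element-wise change in U as the norm, which the paper does not specify.

```python
import math
import random


def update_centers(points, U, m):
    """Equation 3: cluster centers as membership-weighted means."""
    c, dim = len(U[0]), len(points[0])
    centers = []
    for j in range(c):
        w = [row[j] ** m for row in U]
        total = sum(w)
        centers.append([sum(wi * p[d] for wi, p in zip(w, points)) / total
                        for d in range(dim)])
    return centers


def update_memberships(points, centers, m):
    """Equation 4: memberships from relative distances to the centers."""
    power = 2.0 / (m - 1.0)
    U = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:
            # The point sits exactly on a center: give it full membership there.
            U.append([1.0 / len(zeros) if j in zeros else 0.0
                      for j in range(len(d))])
        else:
            U.append([1.0 / sum((dj / dk) ** power for dk in d) for dj in d])
    return U


def fcm(points, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Fuzzy C-means. Returns (partition matrix U, cluster centers)."""
    rng = random.Random(seed)  # fixed seed so that runs are repeatable
    U = []
    for _ in points:
        row = [rng.random() + 1e-12 for _ in range(c)]
        s = sum(row)
        U.append([u / s for u in row])  # each row of U sums to 1
    for _ in range(max_iter):
        centers = update_centers(points, U, m)
        new_U = update_memberships(points, centers, m)
        # Assumed norm: largest absolute change of any membership value.
        change = max(abs(a - b)
                     for ra, rb in zip(new_U, U) for a, b in zip(ra, rb))
        U = new_U
        if change < eps:
            break
    return U, update_centers(points, U, m)
```

A crisp label for each waveform can be read off the final partition matrix by taking the cluster with the largest membership in each row.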
TABLE II
VALIDATION OF K-MEAN CLUSTERING

Cluster   DI      ADI     SC      S       Xb
k = 3     0.0876  0.0024  0.4865  1.5512  2.6734
k = 4     0.0681  0.0029  0.3461  1.0285  2.6529
k = 5     0.525   0.0022  0.3527  1.1027  2.3535
k = 6     0.0438  0.0022  0.3707  1.23    2.3535
k = 7     0.0449  0.0020  0.3770  1.2614  2.2494
k = 8     0.0449  0.001   0.3014  0.968   2.0993
k = 9     0.0460  0.0016  0.3196  0.9639  2.0772
k = 10    0.0505  0.0016  0.3218  1.045   2.0555

TABLE III
VALIDATION OF K-MEDOID CLUSTERING

Cluster   DI      ADI     SC      S(*1e-4)
k = 3     0.0379  0.0021  0.8902  2.277
k = 4     0.0462  0.0021  0.4811  1.49
k = 5     0.0376  0.0021  0.5351  1.9477
k = 6     0.0433  0.0024  0.4337  1.4875
k = 7     0.0350  0.0016  0.4616  1.57
k = 8     0.0376  7.7e-4  0.3560  1.193
k = 9     0.0343  7.6e-4  0.3326  1.0788
k = 10    0.0409  7.6e-4  0.4077  1.379

4) Fuzzy GK Clustering: The GK clustering algorithm used is a variation of the fuzzy C-means algorithm in which we employ an adaptive distance norm using the Mahalanobis distance. As seen from Table V, only the separation indices can provide the optimal number of clusters. So the GK algorithm gives results consistent with the FCM algorithm and is not much help in providing better clustering.

TABLE V
VALIDATION OF FUZZY GK CLUSTERING

Cluster   DI      ADI      SC      S       Xb
k = 3     0.0217  0.0017   817.25  0.2025  0.6707
k = 4     0.0217  0.0017   472.38  0.1171  0.5034
k = 5     0.0217  0.0091   313.82  0.0763  0.4030
k = 6     0.0217  0.017    221.09  0.0527  0.3362
k = 7     0.0217  8.5e-4   160.4   0.0376  0.2884
k = 8     0.0223  0.0059   122.3   0.0283  0.2526
k = 9     0.0217  8.04e-4  263.9   0.064   0.2237
k = 10    0.0217  8.02e-4  205.6   0.0497  0.2015

5) Fuzzy GG Clustering: The GG clustering algorithm, a further extension of the FCM algorithm, uses a distance norm based on fuzzy maximum likelihood estimates. As seen from Table VI, only two validation parameters can be used for GG clustering. Due to the close proximity of the values of DI and the very low values of ADI, no proper decision can be derived from the validation parameters, hence the GG method fails to be a good clustering algorithm here. Normally this method of clustering yields good results for high-dimensional data, but in our case it fails.

TABLE VI
VALIDATION OF FUZZY GG CLUSTERING

Cluster   DI      ADI(*1e-45)
k = 3     0.0251  2.5
k = 4     0.0255  2.165
k = 5     0.0242  2.66
k = 6     0.0266  3.209
k = 7     0.0247  3.889
k = 8     0.0251  4.57
k = 9     0.0247  5.12
k = 10    0.0251  5.53

B. Supervised Learning

Under the guidance of a plethysmographic expert, eight different patterns are observed among the 4855 data sets that we have obtained. Figure 3 shows these patterns, to which we apply a supervised clustering algorithm for classification. This classification serves as an aid to the clinician in correlating the patterns with different diseases.

Fig. 3. Hidden patterns in the radial pulse

For the validation of the supervised clustering algorithm we use the Rand Index (RI), which measures the similarity between the k-means clustering and the true class labels. If $C_k$ is the clustering obtained with k-means and $T_k$ is the true class labeling, then RI is given by equation 5

$$RI(C_k, T_k) = \frac{n_s + n_d}{\binom{n}{2}} \qquad (5)$$

where n is the number of data sets, $n_s$ is the number of pairs of data sets that are placed in the same cluster by both the k-means algorithm and the expert, and $n_d$ is the number of pairs that are placed in different clusters by both. We execute the k-means algorithm with the cluster parameter set to eight. The resulting Rand Index has a value of 0.7739 for k = 8, which indicates good agreement between the two classifications. This indicates that the plethysmographic data has been classified into the above eight patterns. The confusion matrix obtained is shown in Table VII. The rows give the clustering statistics obtained from the k-means algorithm while the columns are the clusters provided by the expert. In all the simulations we have used a fixed seed so that uniform variation exists for all algorithms, both supervised and unsupervised. We evaluate the results of the clustering algorithms based on quality indices and select the clustering scheme that best fits the data.

TABLE VII
CONFUSION MATRIX FOR CALCULATION OF RI

Cluster   k = 1  k = 2  k = 3  k = 4  k = 5  k = 6  k = 7  k = 8
k = 1     504    229    241    206    27     1      2      0
k = 2     1      0      0      16     0      9      86     81
k = 3     24     0      0      41     17     44     8      0
k = 4     364    10     17     386    145    121    170    9
k = 5     0      1      0      41     380    75     76     39
k = 6     0      0      0      22     66     287    179    240
k = 7     91     121    112    34     0      0      0      0
k = 8     147    88     109    49     11     0      1      0

IV. DATA ANALYSIS

Data analysis has shown over 80 % correlation with the variability analysis of pulse patterns as assessed by a group of plethysmographic experts. Normal subjects record patterns 1, 2 and 3 predominantly, with brief interpositions of patterns 4 to 8. Patients suffering from disorders of the lung, liver and heart record patterns 5 to 8 predominantly, with brief interpositions of patterns 1 to 4. These characteristic pattern interpositions can be helpful in the predictive diagnosis of disease in the subjects.

V. CONCLUSION

IP has an advantage over the traditional methods because it requires no technical skill in reading the pulse. The conventional methods have several listed limitations [9], while our technique only requires that the person is calm and sits in the upright position. The morphology of the radial pulse will help us provide a scientific validation of an abstract science. The main advantages of the k-means algorithm are its computational simplicity and low memory requirements. The principal disadvantage of all unsupervised algorithms is the dependence of the results on the selection of an initial set of centroids, medoids or the partition matrix. In our application, crisp clustering algorithms provide the best results in both supervised and unsupervised learning.

REFERENCES

[1] A. Joshi, S. Chandran, Arterial Pulse Rate Variability Analysis for Diagnoses, 19th Intl. Conference on Pattern Recognition, pp. 1-4, 2008.
[2] Abhinav, Meghna Sareen, Mahender Kumar, Sneh Anand, Nadi Yantra: A Robust System Design to Capture the Signals from the Radial Artery for Non-Invasive Diagnosis, The 2nd Intl. Conference on Bioinformatics and Biomedical Engineering, Shanghai, China, May 2008.
[3] G. D. Jindal, T. S. Annanthakrishnan, S. K. Kataria, Electrical Impedance and Photo Plethysmography for Medical Applications, BARC/2005/E/025.
[4] S. Karamchandani, M. Dixit, R. Jain, M. Bhowmick, Application of neural networks in the interpretation of impedance cardiovasograms for the diagnoses of peripheral vascular diseases, Conf Proc IEEE Eng Med Biol Soc, vol. 7, pp. 7537-40, 2005.
[5] F. Hahne, W. Huber, R. Gentleman, S. Falcon, Bioconductor Case Studies, Springer, 2008.
[6] J. Oliveira, W. Pedrycz, Advances in fuzzy clustering and its applications, Wiley and Sons, June 2007.
[7] Separation index and partial membership for clustering, Computational Statistics and Data Analysis, Volume 50, Issue 3, pp. 585-603, 2006.
[8] N. Zullkurnain, A. A. Aburas, Investigation of Time Series Medical Data based on Wavelets and K-means clustering, Ariser, 3, pp. 112-122, 2007.
[9] Aniruddha Joshi, Anand Kulkarni, Sharat Chandran, V. K. Jayaraman, B. D. Kulkarni, Nadi Tarangini: A pulse based diagnostic system, Proceedings of the 29th Annual Conference IEEE, EMBS, 2007.