Leaf Identification Based on K-means Clustering and Naïve Bayesian Classification

Shilpa Ankalaki, M.Tech, Dept. of CSE, NMIT, Bangalore-560064, India, [email protected]
Dr. Jharna Majumdar, Dean R&D, Prof. and Head, CSE (PG), Nitte Meenakshi Institute of Technology, Bangalore-560064, India

Abstract— Recognition of plants has become an active area of research, as many plant species are at risk of extinction. In the proposed system, efficient shape features, including Hu moment invariants, are extracted during the feature extraction phase. K-means clustering is used to group similar leaf images, and Naïve Bayesian classification assigns a query leaf image to one of the clusters. Different distance measures can be used to identify the closest match of the leaf; here, Euclidean distance is used to search for the most similar leaf within the selected cluster.

Key words— Hu moment invariants, K-means clustering, Naïve Bayesian classification.

1. INTRODUCTION
Plants are one of the most important forms of life on earth. They maintain the balance of oxygen and carbon dioxide in the earth's atmosphere [13]. The relations between plants and human beings are also very close: plants are important means of livelihood and production, and they are vitally important for environmental protection. However, recognizing the plant species on earth is an important and difficult task. Many species carry significant information for the development of human society, and the urgent situation is that many plants are at risk of extinction [10], so it is necessary to set up a database for plant protection [3-4]. The proposed method concentrates on leaf shape features and disregards color features, because the color of a leaf may change with climate or disease, which makes color features unreliable.
2. PROPOSED METHODOLOGY
The proposed methodology consists of two phases: (i) a learning phase and (ii) an identification phase. Both phases involve four stages, which are explained in consecutive sub-sections. Fig. 1 shows the flow diagram of the learning and identification phases.

2.1 Image Acquisition
Leaves usually grow clustered together, so it is difficult to automatically extract the features of one leaf from an unneeded background. We therefore created leaf image plates: each leaf is placed on a light panel and photographed with a digital camera. In this way, we obtain an image containing only one leaf.

Fig. 1. Flow diagram of the learning and identification phases

2.2 Image Preprocessing
The raw data, depending on the acquisition type, is subjected to a number of preprocessing steps to make it usable in the descriptive stages of analysis. Preprocessing aims to produce image data on which the leaf identification system can operate quickly and accurately.

2.2.1 Converting a Color Leaf Image to a Gray Leaf Image
Plant leaves are usually green, but shade and variations in water, nutrients, atmosphere and season can change the color, so color features have low reliability. We therefore recognize plants from the grey-level image of the leaf. Fig. 2 shows the preprocessing of a leaf image. An RGB image is first converted into a grayscale image using Eq. (1):

Gray = 0.2989 * R + 0.5870 * G + 0.1140 * B   (1)

where R, G and B are the red, green and blue components of the pixel, respectively.

2.2.2 Generation of the Binary Image
In image analysis it is essential to separate the objects of interest from the rest of the image. The techniques used for this are referred to as thresholding techniques; the cluster of pixels corresponding to the region of interest are known as foreground pixels and the remaining pixels as background pixels.
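The conversion in Eq. (1) can be sketched as follows. This is a minimal NumPy illustration; the array layout (H x W x 3, channel order R, G, B) and the sample pixels are assumptions for the example:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an H x W x 3 RGB array to grayscale using Eq. (1)."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

# Example: a 2 x 2 RGB image (white, black, pure red, pure green)
rgb = np.array([[[255, 255, 255], [0, 0, 0]],
                [[255, 0, 0], [0, 255, 0]]], dtype=float)
gray = rgb_to_gray(rgb)
# The luminance weights sum to ~1, so white maps to ~255 and black to 0
```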
The image data is converted into a two-level binary image with pixel values 0 and 255 by thresholding: all pixels above a certain level are assigned 255 and the rest are assigned 0. The proposed methodology uses the statistical mean method and Otsu's method for automatic thresholding.

2.2.3 Generation of the Boundary Image
Boundary extraction produces an image containing only boundary information. Once a single boundary point is found, the operation seeks all other pixels on that boundary. The boundary can be extracted using the chain code technique: the system defines the boundary of the leaf in terms of x-y coordinates [11] and, from a starting point, traces the boundary coordinates in a clockwise direction.

Fig. 2. Preprocessing of a leaf image

2.3 Feature Extraction
Feature extraction involves the extraction of efficient leaf shape features; these features are used by the classifier to classify the leaf image. The extracted features are discussed below.

2.3.1 Aspect ratio: The aspect ratio [1] is the ratio between the length and the width of the minimum bounding rectangle of the leaf image. It is a scale-invariant feature.

Aspect ratio = Length / Width   (2)

2.3.2 Rectangularity: Rectangularity measures how closely the shape of the leaf approaches a rectangle. The first step is to create the bounding box of the leaf image; rectangularity is then the ratio of the leaf area to the area of the bounding box.

Rectangularity = Leaf area / (Length * Width)   (3)

2.3.3 Perimeter: The total number of pixels on the leaf boundary.

2.3.4 Roundness: Roundness [2][8] measures how closely the shape of the leaf approaches a circle. The difference between a leaf and a circle is calculated using Eq. (4):

Roundness = (4π * Area) / Perimeter²   (4)

2.3.5 Sphericity: Sphericity [4][5] is the ratio of the radius of the incircle of the leaf object (ri) to the radius of the excircle of the leaf object (rc). The incircle and excircle are shown in Fig. 3(a).

2.3.6 Principal axes: The principal axes of a shape are the two line segments that cross orthogonally at the centroid of the shape and represent the directions of zero cross-correlation; in this way, a contour is seen as an instance of a statistical distribution. Fig. 3(b) shows the principal axes.

Fig. 3. (a) Incircle and excircle (b) Principal axes

2.3.7 Eccentricity: Eccentricity is the ratio of the minor principal axis to the major principal axis.

2.3.8 Tooth feature: A tooth point [14] is a pixel on the contour with high curvature, i.e., a peak. To determine whether a contour point Pi is a tooth point, we examine the angle subtended at Pi by its neighbors Pi-k and Pi+k (where k is a threshold); Fig. 4 shows an example. If the angle is within a particular range, Pi is a tooth; otherwise it is not. Two different types of leaves may have nearly the same number of teeth at a particular threshold [12], so the tooth-based features are computed at multiple increasing threshold values.

Fig. 4. Tooth detection at two different thresholds

2.3.9 Leaf vein extraction: Leaf veins [3][6] are one of the important features of the leaf and can be extracted using grayscale morphological operations. The algorithm for vein extraction is as follows.

ALGORITHM: Find the veins of a leaf
Input: Grayscale leaf image
Output: Leaf vein structure
Step 1: Read the grayscale image.
Step 2: Let f be the grayscale leaf image and b a disk-shaped structuring element; a structuring element of radius 2, 3, 4 or 5 can be used.
Step 3: Perform erosion on the grayscale image with the disk-shaped structuring element, i.e., replace the pixel at the origin of the structuring element with the minimum of its neighborhood.
Step 4: Perform dilation on the eroded image with the same structuring element, i.e., replace the pixel at the origin with the maximum of its neighborhood. Erosion followed by dilation is the morphological opening operation.
Step 5: Subtract the result of the opening operation from the original grayscale image; this is the top-hat transformation.
Step 6: Convert the result of the top-hat transformation into a binary image.
Step 7: Compute Av1/A, where Av1 is the vein area obtained with the structuring element of radius 2 and A is the leaf area.
Step 8: Repeat with disk-shaped structuring elements of radius 2, 3, 4 and 5; the vein measure for each structuring element is stored as a feature.

2.3.10 Moment invariants: Moment invariants are frequently used as features for image processing, remote sensing, shape recognition and classification. Hu (1962) first set out the mathematical foundation for two-dimensional moment invariants and demonstrated their application to shape recognition [16]. These moment invariant values are invariant to translation, scale and rotation of the shape. For a digital image [9][15], the moment of a pixel P(x, y) at location (x, y) is defined as the product of the pixel value with its coordinate distances, m = x · y · P(x, y), and the moment of the entire image is the sum of the moments of all its pixels. More generally, the moment of order (p, q) of an image I(x, y) is

Moment_pq = Σx Σy [x^p y^q I(x, y)]

Based on the values of p and q, the following moments are defined:

Moment00 = Σx Σy [I(x, y)]
Moment10 = Σx Σy [x · I(x, y)]
Moment01 = Σx Σy [y · I(x, y)]
Moment11 = Σx Σy [x y · I(x, y)]
Moment20 = Σx Σy [x² I(x, y)]
Moment02 = Σx Σy [y² I(x, y)]
Moment21 = Σx Σy [x² y I(x, y)]
Moment12 = Σx Σy [x y² I(x, y)]
Moment30 = Σx Σy [x³ I(x, y)]
Moment03 = Σx Σy [y³ I(x, y)]

To make the moments invariant to translation, the image is shifted so that its centroid coincides with the origin of the coordinate system. The centroid of the image in terms of the moments is given by

Xc = Moment10 / Moment00,  Yc = Moment01 / Moment00

The central moments are then defined as

μpq = Σx Σy [(x − Xc)^p (y − Yc)^q I(x, y)]   (5)

It can be verified that μ00 = Moment00 and μ10 = μ01 = 0. To make the moments invariant to scaling, they are normalized by a power of μ00. The normalized central moments are defined using Eq. (6):

ηpq = μpq / μ00^γ, where γ = 1 + (p + q)/2   (6)

The first four Hu invariant moments, which are invariant to rotation, are defined as follows:

φ1 = Moment20 + Moment02
φ2 = (Moment20 − Moment02)² + (2 Moment11)²
φ3 = (Moment30 − 3 Moment12)² + (3 Moment21 − Moment03)²
φ4 = (Moment30 + Moment12)² + (Moment21 + Moment03)²

To make the Hu moments invariant to translation and scale, the Moment terms above are replaced by the corresponding normalized central moments ηpq.

2.4 Classification
All the features extracted from all leaves during the learning phase are stored in the database, and an unsupervised classification method is applied to group the similar leaves present in the database. The proposed system uses K-means clustering to cluster similar leaves based on the features in the database. K-means takes the database features as input; the user specifies the number of clusters required. The K-means algorithm is as follows:

Algorithm: K-means partitioning, where each cluster center is represented by the mean value of the objects in the cluster.
Input: K, the number of clusters; D, a data set containing n objects.
Output: A set of K clusters.
Method:
Step 1: Arbitrarily choose K objects from D as the initial cluster centers.
Step 2: Repeat:
Step 3: (Re)assign each object to the cluster to which it is most similar, based on the mean value of the objects in the cluster.
Step 4: Update the cluster means, i.e., recalculate the mean value of the objects in each cluster.
Step 5: Until no change.

At the end of K-means clustering, K clusters are formed.

2.5 Recognition Phase
During testing, the first step is to read the test leaf image; the second step is preprocessing of the given input leaf image; the third step is feature extraction, with the features of the given leaf stored in a feature vector; and the fourth step is to find the cluster of the given leaf, for which Naïve Bayesian classification is used.

2.5.1 Naïve Bayesian classification: Naïve Bayesian classification is a supervised method; it takes its prior knowledge from the clusters. It is possible to use Naïve Bayesian classification without the clustering step, but that requires a database in which all similar leaves are already grouped into one class. The proposed system therefore uses unsupervised classification for clustering and supervised classification to assign the leaf image to one of the resulting classes. The algorithm for Naïve Bayesian classification is as follows:

Algorithm: Naïve Bayesian classifier
Input: Leaf features of the training data set
Output: Classification of the test leaf
Step 1: Apply a clustering method to the training data set to form the clusters.
Step 2: Store the input leaf features in a feature vector.
Step 3: Find the probability of each cluster given the leaf image features. This probability is called the posterior probability and is calculated using Eq. (7):

P(Ci | X) = P(X | Ci) P(Ci) / P(X)   (7)

where X is the feature vector of the given leaf, C denotes the leaf clusters and i indexes the clusters.

Step 4: Calculate the class prior P(Ci) using Eq. (8); it is a constant value:

P(Ci) = |Ci,D| / |D|   (8)

where D is the set of training tuples in the database and |Ci,D| is the number of training tuples of class Ci in D.

Step 5: P(X) is constant for all clusters, so maximizing the posterior amounts to maximizing P(X|Ci) · P(Ci). P(X) can be calculated using Eq. (9):

P(X) = P(Xk | C1) + … + P(Xk | Ci)   (9)

Step 6: To reduce the computation in evaluating P(X|Ci), the naïve assumption of class-conditional independence is made: the attribute values are presumed conditionally independent of one another, given the class label of the tuple. Thus,

P(X|Ci) = P(x1, x2, …, xk | Ci) = P(x1|Ci) · P(x2|Ci) · … · P(xk|Ci) = Π(j=1..k) P(xj|Ci)

where X is the feature vector with attributes {x1, x2, …, xk} and k is the total number of attributes.

Step 7: For continuous features, P(xk|Ci) is modeled with a Gaussian distribution, i.e., the probability that the leaf belongs to a cluster is computed per feature. For each cluster, calculate the mean and the variance of each feature, and apply the Gaussian density of Eq. (10) [17]:

g(x, μ, σ) = (1 / (√(2π) σ)) · e^(−(x − μ)² / (2σ²))   (10)

Finally, the probability of the leaf belonging to a particular class is given by Eq. (11):

P(xk | Ci) = g(xk, μCi, σCi)   (11)

The test leaf image is assigned to the cluster with the highest probability.

Euclidean distance: To recognize the leaf within the cluster, the Euclidean distance method [7][13] is used. The distance is computed between the feature vector of the input leaf and the feature vectors of all leaves present in the cluster selected by the Naïve Bayesian classifier; the leaf with the minimum distance to the input leaf is selected as the recognized leaf.

EXPERIMENTAL RESULTS
The proposed methodology uses the invariant shape features and the Hu moment invariants, which are invariant to rotation, scaling and translation. In all experiments, a leaf image contains a single leaf on a uniform background. First, we apply Otsu's method or the statistical mean threshold to remove the background and keep only the mask corresponding to the leaf. A closed contour is then extracted from the leaf mask. Note that the input of all the representations described above is a sequence of N boundary points, regardless of other leaf properties such as texture and color. The proposed methodology is able to identify the correct leaf even when the input image is rotated or scaled. Fig. 5 shows the leaf image used to compare the features under rotation and scaling. Tables 1 and 2 show the features of the original, rotated, resized and damaged leaf images.
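The learning and identification pipeline described above (K-means clustering in Section 2.4, Gaussian Naïve Bayesian cluster selection per Eqs. 8-11, and Euclidean matching in Section 2.5) can be sketched as follows. This is a minimal illustration on synthetic feature vectors, not the original system: the data, the initial center indices and the variance floor are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def kmeans(X, k, init_idx, iters=100):
    """K-means (Steps 1-5 of Section 2.4): assign to nearest mean, update, repeat."""
    centers = X[init_idx].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def nb_select_cluster(X, labels, x, k, var_floor=1e-6):
    """Gaussian Naive Bayes cluster selection (Eqs. 8-11), done in log space."""
    scores = []
    for j in range(k):
        Xj = X[labels == j]
        mu, var = Xj.mean(axis=0), Xj.var(axis=0) + var_floor
        log_prior = np.log(len(Xj) / len(X))                        # Eq. (8)
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
        scores.append(log_prior + log_lik)                          # Eqs. (10)-(11)
    return int(np.argmax(scores))

# Synthetic "leaf feature" vectors: two well-separated groups of 20 leaves each
X = np.vstack([rng.normal(0.0, 0.1, (20, 3)), rng.normal(1.0, 0.1, (20, 3))])
labels = kmeans(X, 2, init_idx=[0, -1])   # arbitrary initial centers, one per group
query = np.full(3, 0.95)                  # hypothetical query leaf features
cluster = nb_select_cluster(X, labels, query, 2)
members = X[labels == cluster]
# Euclidean matching: nearest leaf within the selected cluster
nearest = members[np.argmin(np.linalg.norm(members - query, axis=1))]
```

Working in log space avoids numerical underflow when many per-feature Gaussian densities are multiplied together, which is the usual implementation choice for Eq. (11).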
Table 1. Features of the original leaf image and the same image rotated by 45 degrees

Feature                      Original image   Rotated 45°
Aspect ratio                 0.239521         0.946939
Roundness                    0.136196         0.1614271
Sphericity                   0.056033         0.061743
Rectangularity               0.275299         0.129803
Eccentricity                 0.014657         0.014501
Tooth feature at angle 30    2                3
Tooth feature at angle 45    6                6
Tooth feature at angle 60    8                8
Moment 1                     0.844066         0.847128
Moment 2                     0.672120         0.677481
Moment 3                     0.028135         0.030383
Moment 4                     0.025953         0.026138
Moment 5                     0.000701         0.000823
Moment 6                     0.021267         0.023151
Moment 7                     -0.000187        0.000004

Table 2. Features of the resized and damaged leaf image

Feature                      Resized image    Damaged image
Aspect ratio                 0.257485         0.793478
Roundness                    0.142336         0.220737
Sphericity                   0.057785         0.096403
Rectangularity               0.271550         0.262685
Eccentricity                 0.017142         0.038267
Tooth feature at angle 30    6                2
Tooth feature at angle 45    6                4
Tooth feature at angle 60    10               6
Moment 1                     0.779586         0.608201
Moment 2                     0.567475         0.317910
Moment 3                     0.019643         0.077895
Moment 4                     0.017876         0.055264
Moment 5                     0.000335         0.003626
Moment 6                     0.013452         0.031154
Moment 7                     -0.000100        0.001247

Fig. 5. Original, rotated, resized and damaged leaf images used to compare the features in Tables 1 and 2

Fig. 6. Sample leaves of the database, one leaf per species

Fig. 6 shows sample leaves of the database; some of the leaves are taken from the Flavia dataset. The proposed methodology identifies the leaf correctly even when part of the leaf is damaged, as shown in Fig. 5. It is important to measure the effectiveness of the approach; here recall, precision and error rate are calculated [18].

Precision is defined as the ratio of the number of relevant retrieved images to the total number of retrieved images:

Precision = (Number of relevant retrieved images) / (Total number of retrieved images)

Error rate is defined as the ratio of non-relevant retrieved images to the total number of retrieved images:

Error rate = (Number of irrelevant retrieved images) / (Total number of retrieved images)
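The retrieval metrics used in the evaluation can be computed with a short sketch. The retrieved and relevant sets below are hypothetical counts chosen only to illustrate the formulas:

```python
def retrieval_metrics(retrieved, relevant):
    """Compute precision, recall and error rate from result sets."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved)
    recall = len(hits) / len(relevant)
    error_rate = len(retrieved - relevant) / len(retrieved)
    return precision, recall, error_rate

# Hypothetical query: 10 images retrieved, 8 of them relevant, 12 relevant in total
retrieved = range(10)
relevant = list(range(8)) + list(range(20, 24))
p, r, e = retrieval_metrics(retrieved, relevant)
# p = 0.8, r = 8/12, e = 0.2
```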
Recall is defined as the ratio of the number of relevant retrieved images to the number of all relevant images:

Recall = (Number of relevant retrieved images) / (Number of all relevant images)

CONCLUSION
The work described in this research concerns two challenging phases of image analysis applications: feature extraction and classification. Since there is no general feature extraction method available for all types of images, experiments must be conducted to determine suitable methods for plant leaf images. Therefore, an investigation of suitable shape features and moment invariant techniques was presented and implemented for the feature extraction of plant leaf images. The proposed methodology gives 80 to 85% accuracy, even in the presence of damaged leaves. One limitation of this research is the use of a limited sample of leaf images; future work includes the identification of compound leaves with different backgrounds and improvement of the performance of the identification system.

REFERENCES
1. Chia-Ling Lee and Shu-Yuan Chen, "Classification for Leaf Images", 16th IPPR Conference on Computer Vision, Graphics and Image Processing (CVGIP 2003).
2. Qingfeng Wu, Changle Zhou and Chaonan Wang, "Feature Extraction and Automatic Recognition of Plant Leaf Using Artificial Neural Network", in A. Gelbukh, S. Torres, I. López (Eds.), Avances en Ciencias de la Computación, 2006, pp. 5-12.
3. S. G. Wu, F. S. Bao, E. Y. Xu, Y.-X. Wang, Y.-F. Chang and Q.-L. Xiang, "A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network", IEEE 7th International Symposium on Signal Processing and Information Technology, Cairo, 2007.
4. J. Du, X. Wang and G. Zhang, "Leaf shape based plant species recognition", Applied Mathematics and Computation, vol. 185, no. 2, pp. 883-893, February 2007.
5. David Knight, James Painter and Matthew Potter, "Automatic Plant Leaf Classification for a Mobile Field Guide".
6.
Xiaodong Zheng and Xiaojie Wang, "Leaf Vein Extraction Based on Gray-scale Morphology", I.J. Image, Graphics and Signal Processing, 2010, 2, 25-31, published online December 2010 in MECS (http://www.mecs-press.org/).
7. Chomtip Pornpanomchai, Chawin Kuakiatngam, Pitchayuk Supapattranon and Nititat Siriwisesokul, "Leaf and Flower Recognition System (e-Botanist)", IACSIT International Journal of Engineering and Technology, vol. 3, no. 4, August 2011.
8. Abdul Kadir, Lukito Edi Nugroho, Adhi Susanto and Paulus Insap Santosa, "Leaf Classification Using Shape, Color, and Texture Features", International Journal of Computer Trends and Technology, July-August 2011.
9. Jyotismita Chaki and Ranjan Parekh, "Plant Leaf Recognition using Shape based Features and Neural Network classifiers", (IJACSA) International Journal of Advanced Computer Science and Applications, vol. 2, no. 10, 2011.
10. Meeta Kumar, Mrunali Kamble, Shubhada Pawar, Prajakta Patil and Neha Bonde, "Survey on Techniques for Plant Leaf Classification", International Journal of Modern Engineering Research (IJMER), vol. 1, no. 2, pp. 538-544, ISSN: 2249-6645.
11. Chomtip Pornpanomchai, Supolgaj Rimdusit, Piyawan Tanasap and Chutpong Chaiyod, "Thai Herb Leaf Image Recognition System (THLIRS)", Kasetsart Journal: Natural Science, 45: 551-562, May 2011.
12. Akhil Arora, Ankit Gupta, Nitesh Bagmar, Shashwat Mishra and Arnab Bhattacharya, "A Plant Identification System using Shape and Morphological Features on Segmented Leaflets: Team IITK, CLEF 2012".
13. Anant Bhardwaj, Manpreet Kaur and Anupam Kumar, "Recognition of plants by Leaf Image using Moment Invariant and Texture Analysis", International Journal of Innovation and Applied Studies, ISSN 2028-9324, vol. 3, no. 1, pp. 237-248, May 2013.
14.
Vijay Satti and Anshul Satya, "An Automatic Leaf Recognition System for Plant Identification Using Machine Vision Technology", International Journal of Engineering Science and Technology (IJEST), ISSN: 0975-5462, vol. 5, no. 4, April 2013.
15. Jyotismita Chaki and Ranjan Parekh, "Designing an Automated System for Plant Leaf Recognition", International Journal of Advances in Engineering & Technology, ISSN: 2231-1963, January 2012.
16. Laura Keyes and Adam Winstanley, "Using Moment Invariants for Classifying Shapes on Large-Scale Maps".
17. George H. John and Pat Langley, "Estimating Continuous Distributions in Bayesian Classification", in Proceedings of the Conference on Uncertainty in Artificial Intelligence, Morgan Kaufmann, San Mateo, 1995.
18. Komal Asrani and Renu Jain, "Contour Based Retrieval for Plant Species", I.J. Image, Graphics and Signal Processing, 2013, 9, 29-35, published online July 2013 in MECS (http://www.mecs-press.org/).