Automatic Detection of Blood Vessels in Medical Images Nikolaos Tzoannou BSc Computer Science 2012/2013 The candidate confirms that the work submitted is their own and the appropriate credit has been given where reference has been made to the work of others. I understand that failure to attribute material which is obtained from another source may be considered as plagiarism. (Signature of student) _______________________________ 2013 Nikolaos Tzoannou Summary The examination and analysis of human tissue in order to diagnose diseases such as cancer is called histopathology which derives from the greek words of disease suffering (πάθος pathos) and tissue (ιστός histos). Histopathologists use virtual slides of human tissue which represent a section of the specific organ tissue which is to be examined. A way to diagnose diseases such as cancer which affect the circulatory system is to analyse the blood vessels of a specific tissue section. More specifically, a quantification analysis of blood vessels can provide a large amount of information to histopathologists. When a human is suffering from cancer there are noticeable differences in the structure and the number of blood vessels in the specific suffering organ. The digitisation of human tissue led to the development of several algorithms for microvessel analysis. This project presents and discusses the performance of a similar commercial algorithm which is used among others by the Leeds Institute of Molecular Medicine and proposes an improved pipeline to detect blood vessels in medical images of human tissue. The proposed algorithm is then subject to further evaluation and discussion. i 2013 Nikolaos Tzoannou Acknowledgements First of all, I would like to thank my supervisor, Dr Derek Magee for his advice, support and guidance throughout the project and especially his help during the implementation phase. Without his contribution, the delivery of this piece of work in time would be impossible. I would also like to thank Dr Darren Treanor and Alexander Wright of Leeds Institute of Molecular Medicine (LIMM) for helping me to understand and learn about histopathology, tissue staining and all these relevant technologies. Furthermore, I would like to thank my assessor, Professor Kristina Vuskovic for her valuable feedback on the mid-project report and during the progress meeting. Finally, I would like to thank my mother and my father for their invaluable support throughout my degree studies. I dedicate this piece of work to them. ii 2013 Nikolaos Tzoannou Contents 1 Introduction .................................................................................................................................... 1 1.1 Overview ................................................................................................................................. 1 1.2 Aim ......................................................................................................................................... 2 1.3 Objectives ............................................................................................................................... 2 1.4 Minimum Requirements ......................................................................................................... 3 1.5 Deliverables ............................................................................................................................ 
3 1.6 Possible Extensions ................................................................................................................. 3 1.7 Schedule and Project Management ......................................................................................... 4 1.7.1 Initial Schedule ............................................................................................................... 4 1.7.2 Revised Schedule ............................................................................................................ 6 1.8 2 Relevance to Degree Programme ............................................................................................ 7 Background Research .................................................................................................................... 8 2.1 Image data ............................................................................................................................... 8 2.1.1 Tissue slides .................................................................................................................... 8 2.1.2 Staining ........................................................................................................................... 9 2.1.2.1 Haematoxylin and eosin (HEM&E) staining .............................................................. 9 2.1.2.2 Haematoxylin and Diaminobenzidine (HEM&DAB) staining ................................. 10 2.1.2.3 Multiple staining methods ......................................................................................... 10 2.1.3 Digitisation .................................................................................................................... 11 2.2 Cancer and the importance of microvessel analysis ............................................................. 11 2.3 Aperio software ..................................................................................................................... 11 2.4 Image pre-processing techniques .......................................................................................... 13 2.4.1 Segmentation by thresholding ....................................................................................... 13 2.4.2 Colour Deconvolution ................................................................................................... 13 2.5 Local pre-processing techniques ........................................................................................... 16 2.5.1 Gaussian smoothing filters ............................................................................................ 16 2.5.2 Edge detectors ............................................................................................................... 18 2.5.2.1 Introduction ............................................................................................................... 18 2.5.2.2 The Laplacian............................................................................................................ 18 2.5.2.3 Zero-crossings of the second derivative .................................................................... 19 iii 2013 Nikolaos Tzoannou 2.5.2.4 The Laplacian of Gaussian ........................................................................................ 19 2.5.2.5 Canny edge detection operator .................................................................................. 20 2.5.3 2.6 3 Morphological operations on binary images ................................................................. 
21 2.5.3.1 Dilation and Erosion ................................................................................................. 21 2.5.3.2 Opening and Closing................................................................................................. 23 2.5.3.3 Skeletonisation .......................................................................................................... 23 Region Labeling .................................................................................................................... 24 Project Analysis & System Design .............................................................................................. 25 3.1 Schedule ................................................................................................................................ 25 3.2 Design methodology ............................................................................................................. 25 3.3 Implementation methodology ............................................................................................... 26 3.4 Programming Language ........................................................................................................ 27 3.5 Evaluation ............................................................................................................................. 28 3.5.1 Hausdorff distance ............................................................................................................... 28 4 5 6 Implementation ............................................................................................................................ 29 4.1 Introduction ........................................................................................................................... 29 4.2 Data collection ...................................................................................................................... 30 4.3 Load image into memory ...................................................................................................... 31 4.4 Colour deconvolution............................................................................................................ 31 4.5 Gaussian filter ....................................................................................................................... 31 4.6 Segmentation......................................................................................................................... 32 4.7 Skeletonisation ...................................................................................................................... 33 4.8 Edge extension ...................................................................................................................... 34 4.9 Connected components labelling .......................................................................................... 35 4.10 Edge detection ....................................................................................................................... 36 4.11 Pipeline output ...................................................................................................................... 37 Evaluation ..................................................................................................................................... 38 5.1 Ground truth extraction ......................................................................................................... 
38 5.2 Similarity matrix ................................................................................................................... 39 5.3 Blood vessel matching .......................................................................................................... 40 5.4 Matching evidence ................................................................................................................ 41 5.5 Precision and Recall .............................................................................................................. 47 5.6 Observations ......................................................................................................................... 49 Conclusions ................................................................................................................................... 50 6.1 Objectives & Requirements .................................................................................................. 50 6.2 Possible extensions and future work ..................................................................................... 51 iv 2013 Nikolaos Tzoannou 6.3 Conclusion ............................................................................................................................ 52 Bibliography ........................................................................................................................................ 53 A Personal Reflection ..................................................................................................................... 56 B External material used ................................................................................................................ 58 C Detection Evidence ...................................................................................................................... 59 v 2013 Nikolaos Tzoannou Chapter 1 Introduction 1.1 Overview According to Cancer Research UK, cancer (of any type) was responsible for 28% of deaths in the UK in 2010[6]. It has been observed that more than 200 types of human cancer exist [7]. Malignant tumours, medically referred as malignant neoplasms, commonly known as cancer, are diseases which mainly involve the creation of new cells in a specific organ abnormally. This also involves blood vessels since the new created cells require blood supply to grow. Angiogenesis is the process through new blood vessels are created from existing ones [31]. This process is completely normal for a human being since is required in development and in wound healing among other functions. Although this process is vital for humans, it is also observed when malignant tumours start to grow. Due to the significance of tumour angiogenesis, it is now common to use angiogenesis inhibitors as part of cancer treatment [12]. By this way, blood supply of malignant tumours is blocked and therefore, expansion is limited. Histopathology, which is the examination of human tissue that suffers from some sort of disease, has been applied to the detection of tumour angiogenesis for several years now. A specific section of an organ tissue can be examined to determine factors such as the density and the quantity of blood vessels in order to detect tumour angiogenesis which may result in the diagnosis of cancer. In the past, histopathologists used to analyse these tissue slides manually using microscopes. 
The evolution of machine vision techniques along with the digitisation of sections of human tissue using immunohistochemistry as explained further in the slide preparation section, has made this process completely automatic by using certain algorithms and pipelines which analyse these medical images of human tissue. 1 2013 Nikolaos Tzoannou This project mainly presents a pipeline based on the commercial algorithm that the Leeds Institute of Molecular Medicine (LIMM) uses for microvessel detection developed by Aperio, a commercial ePathology organisation. The proposed algorithm aims to have an improved performance in terms of accuracy over the already existing pipeline especially in situations when Aperio’s algorithm performance is poor. 1.2 Aim The aim of this project is to develop a pipeline of image processing and machine vision techniques, which use digitised slides of human tissue to analyse and detect blood vessels. The proposed algorithm should be configurable depending on the type of histopathology images that are used. The aim is to overcome any flaws of the already existing algorithms and also to propose a solid pipeline which histopathologists can use to diagnose any type of cancer which involves angiogenesis. Figure 1.1 shows a tissue slide image along with the desired detection that this project aims to achieve. Figure 1.1: Virtual tissue slide before (left) and after the blood vessel detection (right) 1.3 Objectives The objectives of the project are to: • Research and understand various machine vision techniques and methodologies used in medical imaging and blood vessel analysis • Test existing commercial algorithms and analyse their performance • Experiment with different methods of microvessel detection and analysis 2 2013 Nikolaos Tzoannou • Design and implement an algorithm for blood vessel detection and analysis in medical images • Evaluate the algorithm using different methods. The evaluation is both quantitative (accuracy based on pathologist supplied ground truth) and qualitative meaning that the method meets requirements extracted by talking to pathologists about their needs. 1.4 Minimum Requirements The minimum requirements are: • A method for detection of blood vessels in a specific region • Design of quantification and characterisation of blood vessels • Quantitative accuracy evaluation using ground truth provided by expert pathologists 1.5 Deliverables • Implementation of the algorithm • MATLAB code • Evaluation using Ground Truth • Evaluation code 1.6 Possible Extensions • XML/text output of the algorithm • Visualisation of blood vessel detection • Better accuracy for borderline cases by using rotating matching filters • Quantification analysis of the tissue slide based on parameters such as size and shape of blood vessels. 3 2013 1.7 Nikolaos Tzoannou Schedule and Project Management 1.7.1 Initial Schedule The initial schedule was set during the second week of the project. It was a first attempt to break down the different stages of the project aligned to the predefined deadlines that have been given from the very beginning. The initial fragmentation of the project’s work load was based mainly on estimates about the difficulty and requirements in terms of time for each of the activities. It was crucial to make the schedule as flexible as possible in case changes would be necessary throughout the duration of the project. Having a schedule with detailed deadlines and milestones proved very helpful and vital for a project of this type. 
Figure 1.2 shows a graphical representation of the initial schedule during the 16 week period of the project along with the milestones that were set. W3 (04/02) W4 (11/02) W5 (18/02) W6 (25/02) W7 (04/03) W8 (11/03) |4Ws Break| W9 (15/04) W10 (22/04) W11 (29/04) W12 (06/05) Literature research Background reading Experimenting with different methods Research & discussion on how to improve the algorithm Discover flaws Implement improved algorithm Test Aperio alg using several images Testing and evaluation Design Improved algorithm Report Write-up & proof-reading Mid-Project report write up Mid-Project report submission Set Ground Truth Tweak, calibrate and improve Figure 1.2: Initial Project Schedule 4 Final Submission Evaluate results 2013 Nikolaos Tzoannou The milestones and deadlines of the project’s lifetime are described briefly below: Weeks 1 & 2: Define the aim of the project and set the minimum requirements. Organise the background reading and literature review and plan an initial schedule for the project. Background reading on cancer, histopathology techniques and staining methods. Week 3: Examine and test the performance of Aperio algorithm. Understand how it works and what methodology utilizes. Background reading and literature research on these methods. Week 4: Research upon any previous work in the field and similar projects. Read and understand the theory and methodology that has been used. Week 5: Experiment with different methods of computer vision that could be used in the proposed algorithm. As the majority of the background information has been collected, proceed with the mid-project report write-up. Week 6: Finalise the background research on relevant literature and complete the mid-project report write-up. Week 7: Design an initial implementation of the algorithm using vision methods taken from the relevant literature and previous similar work. Weeks 7-9 (including 4 weeks of Easter break): Implementation of the algorithm using an iterative approach. The implemented system is subject to several iterations during the development until the result had reached a satisfying level. Moreover, evaluation using ground truth, provided by the histopathologists, takes place during week 9. Weeks 10-11: Write up the final report which includes the implementation and evaluation of the algorithm along with the background reading chapter which was mainly written during weeks 5 and 6. Week 12: Final check and proof reading of the report. Submission deadline is on the Wednesday of this week. 5 2013 Nikolaos Tzoannou 1.7.2 Revised Schedule Due to several issues that presented during the implementation phase and because an iterative development approach was used as a guideline, the initial schedule had to be revised in order to meet the new criteria that appeared especially during the design and implementation phases. The specific reasons which led to the revision of the initial schedule are presented extensively in the Design chapter. Figure 1.3 shows a graphical representation of the revised schedule during the 16 week period of the project along with the milestones that were set. 
W3 (04/02) W4 (11/02) W5 (18/02) W6 (25/02) W7 (04/03) W8 (11/03) |4Ws Break| W9 (15/04) W10 (22/04) W11 (29/04) W12 (06/05) Literature research Background reading Experimenting with different methods Research & discussion on how to improve the algorithm Discover flaws Implement improved algorithm Test Aperio alg using several images Algorithm evaluation Design Improved algorithm Testing Redesign the algorithm Report Write-up & proofreading Mid-Project report write up Mid-Project report submission Set Ground Truth Tweak, calibrate and improve Final Submission Evaluate results Figure 1.3: Revised schedule The revised milestones and deadlines of the project’s lifetime are described briefly below: Weeks 1 to 5: Same as the initial schedule. Week 6: Finalise the background research on relevant literature and complete the mid-project report write-up. Start of the design phase with choosing appropriate methodologies. 6 2013 Nikolaos Tzoannou Week 7: Design an initial implementation of the algorithm using vision methods taken from the relevant literature and previous similar work. First iteration of the implementation phase including the reconsideration of the design. Weeks 7-9 (including 4 weeks of Easter break): Implementation of the algorithm using an iterative approach. The algorithm is redesigned several times until satisfying methods and techniques have been chosen. Each design iteration is followed by an implementation phase. Weeks 10-11: Quantitative and qualitative evaluation of the proposed algorithm. Write-up of the final report which includes the implementation and evaluation chapters along with the background reading chapter which was mainly written during weeks 5 and 6. Week 12: Final check and proof reading of the report. Submission deadline is on the Wednesday of this week. 1.8 Relevance to Degree Programme Every aspect of this project presents a significant relevance to the modules I studied during the BSc Computer Science course. The planning, the design and the development of the project were based on methodology taught in CR21 (Software Systems Engineering). Furthermore, the majority of computer vision techniques that were used, are part of AI31 (Computer Vision) module syllabus and most of the evaluation methods are in extension of AI20 (Artificial Intelligence) module. 7 2013 Nikolaos Tzoannou Chapter 2 Background Research In this chapter, background information is provided regarding the data used in the project. Moreover, some general information about the specific problem of this project is also presented here. More specifically, a thorough prior art research took place in order to identify possible methods and techniques already used in the past on similar projects, which will help understand and provide all the important background knowledge required to address the problem. 2.1 Image data This section presents the dataset that is used in the project. The data consist of high resolution images which represent a small section of human tissue. The process of acquiring tissue samples, staining and digitisation is described in detail in the following sections. These techniques are used by the Leeds Institute of Molecular Medicine (LIMM) in order to perform microvessel analysis and diagnose diseases. 2.1.1 Tissue slides The first step of this process is to surgically obtain a tissue sample from a specific human organ that histopathologists need to examine. This practice is called biopsy. 
Then, the tissue sample is enclosed into paraffin wax in order to be cut into very thin slices. The following steps of this process are to take 8 2013 Nikolaos Tzoannou these thin slices of tissue and wax, remove the wax using hot water and place the remaining slice of tissue on a glass slide where the process of staining follows. 2.1.2 Staining Staining is called the general practice that is used in microscopy where specific dyes are applied on tissue samples in order to highlight certain structures. These dyes consist of special chemical substances which react in certain ways when they are applied on certain biological matters such as proteins, nucleic acid etc. In the case of LIMM staining procedure, it is mostly common to use two specific staining methods which their names derive from the chemical substances that are used. 2.1.2.1 Haematoxylin and eosin (HEM&E) staining Haematoxylin and eosin (HEM&E) staining method is a very popular staining protocol which is being used for many years until now and is one of the most reliable methods to diagnose diseases such as cancer. Two chemical substances are used. The first one is Haematoxylin which is applied usually first and the second one is Eosin [23]. Haematoxylin which shows a blue-purple colour, stains nucleic acids, whereas Eosin has a pink colour and stains cytoplasm proteins [1]. Figure 2.1 shows an example of placenta tissue that has been stained using the H&E method. Figure 2.1: Placenta tissue stained with H&E method 9 2013 Nikolaos Tzoannou 2.1.2.2 Haematoxylin and Diaminobenzidine (HEM&DAB) staining Another very useful staining protocol, especially for the detection of blood vessels, is the use of Haematoxylin and Diaminobenzidine, an organic compound which when applied to tissue it presents an oxidation reaction to the iron ion which is contained in the heme groups of haemoglobin [25]. This reaction produces a brown colour. Tissue sample stained using this method show blue-purple stains which represent nucleic acids and brown colour stains for protein structures such as haemoglobin. Figure 2.2 shows a placenta tissue sample stained using the above protocol. Figure 2.2: Placenta tissue stained with H&DAB method 2.1.2.3 Multiple staining methods It is possible to apply more than one staining protocols on a single tissue sample when histopathologists need to visualise and detect multiple antigens. One of the main reasons to perform multi-staining is the detection of more than one antigen in the same cell [18]. To observe successfully multiple stains on the same tissue sample, the choice of staining techniques is important because of the possible cross-reaction between the different staining substances. Moreover, colour combinations are important in order to achieve higher contrast between them when two or more antigens are observed [18]. In this project, data produced by multi-staining methods are not used and the main focus is on H&DAB stained tissue samples. 10 2013 Nikolaos Tzoannou 2.1.3 Digitisation In order for these tissue samples to be analysed, they require firstly to be digitised. This is performed at LIMM by high resolution scanners provided by Aperio. These scanners produce high resolution virtual slides with 20x and 40x magnification capabilities [3] and image resolution which can be up to 55,000 x 50,000 pixels. Consequently, the size of such a virtual slide can be up to 1 GB. For this reason these slides are stored on servers at LIMM where researchers can access them remotely. 
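As a rough, illustrative calculation (not taken from the report), an uncompressed RGB scan at the maximum quoted resolution would occupy about

55,000 x 50,000 pixels x 3 bytes per pixel ≈ 8.25 x 10^9 bytes, i.e. roughly 8 GB,

so a stored file size of around 1 GB implies that the scanner's file format already applies substantial (typically JPEG-based) compression. This, together with the sheer number of slides, is why the images are held centrally on servers and accessed remotely.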
LIMM servers also provide several installed Aperio algorithms for microvessel analysis and detection. Again, this process can be performed remotely. 2.2 Cancer and the importance of microvessel analysis As mentioned in the introduction of this report, cancer is called a broad group of diseases that mainly present unregulated cell growth in a certain human organ. One of the main symptoms of cancer is angiogenesis. Angiogenesis, which under healthy circumstances is completely normal and vital for the human, is the procedure of the creation of new blood vessels from existing ones. Angiogenesis is also observed in a tumours since the new created cells require blood supply in order to evolve. For this reason angiogenesis can be used as an early prognostic indicator of cancer in a specific organ. Microvessel density is highly associated with angiogenesis. It is a factor that can determine the existence of a tumour. Although changes in microvessel density, as a result of angiogenesis are a good indicator for the diagnosis of cancer, it is not by itself a proof for the existence of cancer [13]. With this to be considered, studies have proven the importance of angiogenesis and more specifically microvessel density and its association with epidermal growth factor receptor expression and tumour size, both being indications of cancer [20]. Another important application of microvessel analysis is the assessment of anti-angiogenic treatment of cancer. Anti-angiogenic treatment involves the administration of angiogenesis inhibitors as medicines to the patient. It has been shown that microvessel density shows a decrease after antiangiogenic treatment [29]. 2.3 Aperio software 11 2013 Nikolaos Tzoannou As mentioned before the data which is used consists of high resolution TIFF (SVS) images of around 1GB file size each. They are stored on a server at the Leeds Institute of Molecular Medicine (LIMM) where the Aperio software is also installed, which allows the user to perform microvessel analysis and other operations on the images on the same machine, by connecting to this server remotely. In addition, the user can also download small segments of the images for further investigation. Figure 2.3 shows an example of such an image along with the GUI of Aperio Imagescope program which is used to run microvessel analysis algorithms. Imagescope has several tools such as markers, region selection and pen tools which are used to annotate a specific region on the image. The algorithm takes several parameters as input such as filtering/smoothing level, dark staining threshold, light staining threshold, region joining parameter, vessel competition parameter, minimum vessel area threshold, maximum vessel area threshold, maximum vessel wall thickness, output histogram, number of bins, endothelial stain, background stain etc. The output parameters are: number of vessels, total analysis area, total stain area, average stain intensity, microvessel density, mean vessel area, median vessel area, standard deviation of vessel area, histogram results etc. Microvessel analysis using Aperio software consists of 5 stages [2]: I. II. Scan Digital Slide Colour Deconvolution III. Light and Dark Staining Thresholds IV. Completing Vessels V. Analysis Metrics Figure 2.3: Aperio Imagescope 12 2013 Nikolaos Tzoannou Figure 2.3 shows Aperio software performing the microvessel analysis algorithm on an HEM&E virtual slide using the default input parameters and generating a markup image. 
The markup image in this case consists of annotated regions (green colour). Although the algorithm manages to detect a number of vessels, it fails to detect another significant number of vessels and it requires a lot of tuning and tweaking to reach an acceptable level of performance. 2.4 Image pre-processing techniques 2.4.1 Segmentation by thresholding Segmentation by grey-level thresholding is the simplest method of dividing an image into regions with similar texture and defined borders. A very common application is the background or foreground pixel classification. To achieve this, every pixel of the image is converted initially into a single greyscale value by extracting the average value of the three colour values (Red Green Blue), in case of RGB images, and then a threshold value is defined. Then, we perform a search through every pixel of the image and if the pixel value is higher than the threshold, it is an object pixel [27]. Otherwise it is a background pixel. This method is very simplistic and rarely produces good results since a single threshold for a whole image is not sufficient because of the large variation in grey-level and background of the majority of the images in the real world. A way to overcome this problem has been described by Nobuyuki Otsu [22]. Otsu’s method basically proposes that every pixel in the image is classified as either foreground or background and then it computes an optimal threshold which minimises the intra-class difference of the foreground and background pixels. In order to achieve this, it searches for thresholds where the standard deviation of the weighted sum of the variances in the classes reaches a maximum and it takes the average of these two thresholds. This method yields satisfactory results even in cases where there is a large variation of greyscale changes in a single image and can be applied in many practical problems. 2.4.2 Colour Deconvolution Colour deconvolution is a basic technique which is used to determine the contribution of certain colours in images. In this case, colour deconvolution is used to calculate the effect of different stain levels on a virtual tissue slide. As mentioned before, hematoxylin, eosin and DAB stainings are used, 13 2013 Nikolaos Tzoannou which produce stains that are close to three different colours. Hematoxylin is blue, eosin is magenta (pink) and DAB is brown. The reason behind this procedure is that these different chemical substances indicate different biological materials. For example, hematoxylin is mainly used for the staining of the cell nuclei and eosin is used for the staining of the cytoplasm [24]. The majority of the slides usually have multiple stains in order to examine different structures and morphologies simultaneously. Briefly, colour deconvolution as described by Ruifrok and Johnston [24], is performed by applying an ortho-normal transformation of the RGB information matrix of the optical density (OD) of each channel for each stain. Ruifrok and Johnston show that the vector of the optical density of the three channels at a pixel is defined by the equation below y=CM (2.1) C is the 3x1 vector for the amount of the three different stains at that pixel and M is a normalised OD matrix for the combination of the three chemical substances. The theory behind colour deconvolution makes a basic assumption about the colour representation in these images. It is assumed that grey levels of the RGB channels are linear with the transmission (brightness) T. 
T is the ratio of the transmitted light to the incident light, and this assumption is reasonable for this case.

T = I_C / I_0,C    (2.2)

Lambert-Beer's law [15] describes the intensity of light I_C after passing through a stained specimen, as given by equation 2.3. I_0,C is the intensity of the light entering the specimen, A is the amount of stain and c_C is an absorption factor which characterises every stain for each of the three RGB channels; the subscript C indicates the detection channel.

I_C = I_0,C · exp(−A·c_C)    (2.3)

Because the grey level values are non-linear in the intensity values [15], the optical density for each RGB channel is defined as

OD_C = −log10(I_C / I_0,C) = A·c_C    (2.4)

Equation 2.4 shows that the OD of each channel is linear in the concentration of absorbing material. For this reason, OD is used to separate the contribution of each stain colour in the image, where every stain has a specific OD in each of the RGB channels. These values define the OD matrix of a specimen for each of the RGB channels. An example of such a matrix can be seen below.

Table 2.1: Example of OD matrix for the RGB channels for each stain

As mentioned before, to perform colour deconvolution an orthogonal transformation of the above matrix must take place. Before that, the matrix must be normalised in order to balance correctly the absorption factor of each stain. This is done by dividing each OD value by the Euclidean length of its stain vector, for example

ĉ_R = c_R / √(c_R² + c_G² + c_B²)

Similarly, a normalised OD matrix is constructed. For this example the matrix is as shown in table 2.2.

Table 2.2: Example of normalised OD matrix

The colour deconvolution matrix D, which is the inverse of the above matrix, multiplied by the OD image as defined by equation 2.1, yields the orthogonal representation of the stains which form the image. For this example the colour deconvolution matrix can be seen below.

Table 2.3: Example of colour deconvolution matrix

The next four figures show how colour deconvolution separates the contributions of the three different stains on a tissue slide which has been stained with HEM, Eosin and DAB. This analysis was performed by the Aperio microvessel algorithm using different values of the endothelial stain red, green and blue parameters in order to obtain the three different contributions.

Figure 2.4: Placenta tissue slide
Figure 2.5: Hematoxylin stain (blue)
Figure 2.6: DAB stain (brown)
Figure 2.7: Eosin stain (pink)

2.5 Local pre-processing techniques

2.5.1 Gaussian smoothing filters

A very important and useful local pre-processing method in computer vision is smoothing with Gaussian filters [27]. The purpose of smoothing in general is to remove noise and other small fluctuations that are present in virtually all images. The drawback of this technique is the loss of important information, since smoothing also blurs sharp edges that may define structures important to the contents of the image. Smoothing techniques can be combined with gradient operators, an edge enhancement technique, in order to indicate certain locations in the image. Gradient operators use the local derivatives of the image function. These derivatives are largest at locations where significant changes in the image function take place, and such locations can therefore be indicated by gradient operators.
A disadvantage of gradient operators is that they increase the level of noise in an image, since they suppress low frequencies while noise is concentrated in the high frequencies. Smoothing is obtained by convolving each pixel of the greyscale image locally with a convolution mask (kernel) which is a Gaussian function. The equation of a Gaussian function in one dimension is given by equation 2.5 and in two dimensions by equation 2.6 [26][21], where σ is the standard deviation of the distribution.

G(x) = (1 / (√(2π)·σ)) · exp(−x² / (2σ²))    (2.5)

G(x, y) = (1 / (2πσ²)) · exp(−(x² + y²) / (2σ²))    (2.6)

An application of this method was made in 1989 by Chaudhuri et al. to detect blood vessels in retinal images [9]. Briefly, they used a matched filter based on two important properties of their dataset. Firstly, the blood vessels in retinal images have small curvatures and can be approximated by linear segments. Secondly, these blood vessels appear darker than the background of the image, and their grey-level profile in the direction perpendicular to them can be approximated by the Gaussian curve described by equation 2.7, where A is the grey-level intensity of the background, k is a measure of the reflectance of the blood vessel and d is the distance between the point (x, y) and the centre of the blood vessel.

f(x, y) = A·(1 − k·exp(−d² / (2σ²)))    (2.7)

For this reason, a Gaussian kernel was applied to the image in 12 different orientations with a 15° angular difference between them. In this way, every possible direction of the blood vessels could be detected. The main advantage of this method is that it smooths the image while preserving the continuous structure of the blood vessels, and it performs relatively well even when the contrast between blood vessels and background is low. The results of this implementation can be seen in figure 2.8.

Figure 2.8: Typical retinal image (left) and the result of the matched filter application (right)

2.5.2 Edge detectors

2.5.2.1 Introduction

Edge detection operators are a broad set of local pre-processing methods which are used to identify significant changes in the intensity (brightness) function of an image. Usually, these significant changes indicate edge pixels. In most cases, the edges of structures and objects in an image are independent of changes in viewpoint and illumination. They provide solid structural information, important for understanding the content of an image. However, because edge detectors reduce the amount of information in the image, significant information can sometimes be lost, which may lead to a misinterpretation of the content. Mathematically, an edge is a vector variable attached to a single pixel: its magnitude indicates the magnitude of the gradient, and its direction shows the direction of maximum growth of the image function. Typically, the gradient direction is perpendicular to the edge direction, and edges usually indicate region boundaries. Equations 2.8 and 2.9 show how the gradient magnitude and the gradient direction ψ are calculated. The notation arg(x, y) means the angle from the x axis to the point (x, y), calculated in radians [27].

|grad g(x, y)| = √((∂g/∂x)² + (∂g/∂y)²)    (2.8)

ψ = arg(∂g/∂x, ∂g/∂y)    (2.9)

2.5.2.2 The Laplacian

Another linear differential operator, which can be used when we are not interested in the direction of the edge, is the Laplacian. It takes into account only the edge magnitude and is defined as

∇²g(x, y) = ∂²g(x, y)/∂x² + ∂²g(x, y)/∂y²    (2.10)

These edge detectors are tuned depending on the type of edge profile they are applied to. Figure 2.9 shows some typical edge profiles where these operators can be used [27].
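To make the operators above concrete, the following short MATLAB sketch (using standard Image Processing Toolbox functions; the kernel size and σ are illustrative, not values used in this project, and the standard MATLAB demo image stands in for a deconvolved stain channel) builds a Gaussian kernel as in equation 2.6, smooths a greyscale image with it, and then computes the gradient magnitude and direction of equations 2.8 and 2.9 and the Laplacian response of equation 2.10.

% Illustrative sketch of Gaussian smoothing, gradient and Laplacian operators.
I = im2double(imread('cameraman.tif'));        % demo image as a stand-in input

sigma = 2;
h = fspecial('gaussian', 6*sigma + 1, sigma);  % kernel spans roughly ±3*sigma
Ismooth = imfilter(I, h, 'replicate');         % Gaussian smoothing (eq. 2.6)

[gx, gy] = gradient(Ismooth);                  % first derivatives along x and y
gradMag = sqrt(gx.^2 + gy.^2);                 % gradient magnitude (eq. 2.8)
gradDir = atan2(gy, gx);                       % gradient direction (eq. 2.9)

lap = imfilter(Ismooth, fspecial('laplacian', 0), 'replicate');   % eq. 2.10

figure;
subplot(2,2,1); imshow(I);           title('Input');
subplot(2,2,2); imshow(Ismooth);     title('Gaussian smoothed');
subplot(2,2,3); imshow(gradMag, []); title('Gradient magnitude');
subplot(2,2,4); imshow(lap, []);     title('Laplacian response');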
Figure 2.9: Typical edge profiles

2.5.2.3 Zero-crossings of the second derivative

An important category of edge detection operators comprises those based on the zero-crossings of the second derivative. The basic concept of these operators is that it is easier to find the points where the first derivative of the image function reaches a maximum by locating the positions where the second derivative crosses zero. The typical procedure is that, firstly, a smoothing filter is applied in order to remove noise and, secondly, the second derivatives are computed.

2.5.2.4 The Laplacian of Gaussian

In order to obtain smooth filtering while keeping the response of the filter local, a Gaussian filter is used. Because of the nature of the Gaussian distribution, and since σ (the standard deviation) is the only parameter of such a filter, σ is proportional to the size of the neighbourhood on which the filter operates; only pixels located up to about 3σ from the centre have a noticeable influence on the filtering result. The combination of applying a Gaussian filter to smooth an image with the Laplacian operator to detect the edges is called the Laplacian of Gaussian. The result, obtained by convolution, is the second derivative of a smoothed 2D function f(x, y) [27]. Equation 2.11 shows the mathematical expression of such an operator, where * denotes convolution.

∇²[G(x, y, σ) * f(x, y)] = [∇²G(x, y, σ)] * f(x, y)    (2.11)

2.5.2.5 Canny edge detection operator

The Canny edge detector is an edge detection operator which follows a multi-stage algorithm in order to obtain optimal edge detection [8]. Optimality in this context is defined by three criteria. The first is detection performance, in the sense that every important edge should be detected. Secondly, the localisation criterion requires that the distance between the edge as detected by the algorithm and the actual edge should be minimal. The third and last criterion is that the number of responses to a single edge should also be minimal: a detected edge should be marked only once, and any additional responses for the same edge should be rejected.

The stages of the Canny edge detector are the following. First of all, the image data go through a noise reduction operation, such as convolution with a Gaussian filter of a specific σ value; this is necessary because the Canny detector is very sensitive to noise. The next stage is to identify the intensity gradient of the image. This can be done by any edge detector which calculates the first derivative values in both the horizontal and the vertical direction. This information is necessary because the Canny edge detector uses four filters to detect edges in the filtered image in the four different directions (0, 45, 90 and 135 degrees) to which edge directions are rounded. The next stage involves a search through all four possible directions and a comparison between the gradient magnitude at each pixel and the magnitudes of its neighbours along that direction. For example, if the rounded gradient angle is 0 degrees (i.e. the gradient points horizontally), a point is considered to be on an edge only if its gradient magnitude is greater than the magnitudes of the pixels to its east and west. This stage is called non-maximum suppression and results in a set of candidate edge points. The following stage is called thresholding with hysteresis. False responses caused by noise in the image can produce false edge contours or even break up an edge contour into small segments.
This effect can be overcome by setting a low and a high threshold, so that the operator treats strong and weak responses differently. More specifically, responses higher than the high threshold always define edge pixels, and responses above the low threshold are accepted as edge pixels only if they are connected to pixels with high responses; otherwise, isolated weak responses are usually noise. By the end of this last stage we obtain a binary image where every pixel is either an edge pixel or a non-edge pixel.

2.5.3 Morphological operations on binary images

There are some useful operations that can be performed on binary images which affect the form or shape of an object in an image. These operations help us to obtain specific information such as the boundaries or skeletons of an object or a structure.

2.5.3.1 Dilation and Erosion

The main morphological operations on binary images in computer vision are dilation and erosion [30]. Generally, dilation expands an object to a certain degree, filling holes and connecting neighbouring objects together. Erosion is the opposite operation, where the boundaries of an object shrink to a certain point. Both operations are configurable in terms of the size of the neighbourhood on which they are applied. For both operations, the structural element must be defined first; typically it is much smaller than the original image. Figures 2.10 and 2.11 show an example of a binary image and a typical structural element.

Figure 2.10: Example of a binary image
Figure 2.11: Typical 3x3 structural element

The mathematical morphology of these operations is based on set theory, where the union of two sets is the collection of elements belonging to either of the two sets and the intersection is the collection of elements that belong to both sets at the same time. This can easily be extended to represent pixels in a binary image, as shown in figures 2.12 and 2.13 [14].

Figure 2.12: Structural elements representing sets A (left) and B (right)
Figure 2.13: Union representation (left) and intersection representation (right) of sets A and B

To perform dilation, the structural element is placed over each object pixel of the image and, wherever its centre coincides with an object pixel, the union (logical OR) of the image with the translated structural element is taken for that neighbourhood of pixels. When performing erosion, the intersection (logical AND) is used instead: a pixel is retained only if the structural element, centred on it, fits entirely inside the object. Examples of dilation and erosion for the binary image in figure 2.10 combined with the structural element in figure 2.11 can be seen in the next figure.

Figure 2.14: Dilation (left) and erosion (right) of the binary image using the 3x3 structural element

2.5.3.2 Opening and Closing

The morphological opening and closing operations are combined sequences of dilation and erosion. More specifically, opening is defined as an erosion followed by a dilation, and closing as a dilation followed by an erosion. Opening can be used to remove pixels of regions which cannot contain the structural element. Closing is useful when we want to remove gaps or fill in holes that might exist inside an object. Although both techniques use the same two methods, erosion and dilation, they yield completely different results due to the order in which they are applied. Examples of morphological opening and closing can be seen in figures 2.15 and 2.16.
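The following MATLAB fragment also illustrates the four operations on a small synthetic binary image, using a 3x3 square structuring element like the one in figure 2.11. It is a sketch only, not code from the project.

% Dilation, erosion, opening and closing on a small synthetic binary image.
bw = false(9, 12);
bw(3:7, 3:6) = true;           % a small rectangular object
bw(5, 7:9)   = true;           % a thin protrusion
bw(2, 10)    = true;           % an isolated noise pixel

se = strel('square', 3);
bwDilated = imdilate(bw, se);  % expands objects and fills small gaps
bwEroded  = imerode(bw, se);   % shrinks object boundaries
bwOpened  = imopen(bw, se);    % erosion then dilation: removes thin spurs/noise
bwClosed  = imclose(bw, se);   % dilation then erosion: closes small holes/gaps

figure;
subplot(2,3,1); imshow(bw);        title('Original');
subplot(2,3,2); imshow(bwDilated); title('Dilation');
subplot(2,3,3); imshow(bwEroded);  title('Erosion');
subplot(2,3,5); imshow(bwOpened);  title('Opening');
subplot(2,3,6); imshow(bwClosed);  title('Closing');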
Figure 2.15: Example of morphological opening on a binary image using a 3x3 square structuring element Figure 2.16: Example of morphological closing on a binary image using a 3x3 square structuring element 2.5.3.3 Skeletonisation Skeletonisation on a binary image is the process of extracting a subset of the original object which shows the fundamental structure of it. The skeleton consists only of single pixels which are connected together, expanding towards every direction without having any gaps between them, according to the 23 2013 Nikolaos Tzoannou shape of the object which is being skeletonised. A common method to perform skeletonisation is to iterate a thinning algorithm, usually sequences of successive erosions, while preserving the connectivity of the pixels until no further change can be made. In general an optimal skeleton must comply with the three following criteria [19]: a. The skeleton should be connected and consist of single pixels only. b. Erosion beyond the point of which a feature is represented should be detected and avoided. c. The skeleton should not be affected by noise or any other small convexities which are not part of the object’s shape. Figure 2.17 shows an example of skeletonisation performed on a simple binary image. It is noticeable that the resulting skeleton is connected and indicates the shape of the original object. Figure 2.17: Example of skeletonisation on a simple binary image 2.6 Region Labeling Connected component labelling in binary images is an algorithmic method which is widely used in computer vision applications in order to detect connected regions and uniquely label them. Generally the algorithm goes through the image twice. Thus, it is called two-pass algorithm. During the first pass it identifies equivalences and labels each pixel temporarily. During the second pass it labels again each pixel with the final label of its equivalence class. The algorithm can use either 4connectivity, which means that connectivity checks are performed only between the north and the west neighbour of each pixel, or 8-connectivity where pixels north-east and north-west of current pixel are additionally checked. Because the algorithm creates continuously new labels, it can be significantly slow. In order to speed up this process, the algorithm uses a disjoint-set data structure [10]. This data structure provides an interface which makes it easier to keep track of equivalence relationships. 24 2013 Nikolaos Tzoannou Chapter 3 Project Analysis & System Design 3.1 Schedule In order for a project to be successful and deliver in time, it is crucial to have a solid plan which will act as a guide throughout the project’s lifetime. As mentioned in the introduction, the workload of the project was divided according to estimations of the amount of time needed for each of the different stages. Initially, the design stage of the project was scheduled for the 7th week. Due to certain characteristics of this project, the design phase continued in the weeks after. More specifically, most of the design of the algorithm took place along with the early stages of the implementation. There were two main reasons for this practice. Firstly, the initial design phase was based on estimations. Secondly, during the actual implementation, methods and techniques that were originally decided to be included in the pipeline, were proved inefficient which led to the need of redesigning important stages of the implemented system. 
This was the major change which occurred upon the initial plan for the whole project and affected significantly the implementation phase. Consequently, the implementation stage lasted longer, leaving less time for the evaluation and the report write-up as it was shown in the revised schedule. 3.2 Design methodology In order to design the proposed pipeline, several experiments and tests took place. Initially, Aperio algorithm was tested in order to determine its performance. Aperio provides the users with the pipeline that uses, although it is very abstract since it is written for histopathologists and physicians. Most of the technical details and methodologies are hidden from the users providing only a set of adjustable parameters. After extensive experimenting on several tissue slides with a variety of stains, 25 2013 Nikolaos Tzoannou it was found that Aperio software performed optimally only after excessive tweaking of its parameters. Furthermore, different parameterisation was needed for each case depending on the stain used, type of tissue and size of blood cells. The post-experiment analysis showed that Aperio algorithm failed to classify significantly large and connected blood vessels. An example of this poor performance was presented in Figure 2.3 of the background research chapter. The first design step was to experiment with the methods and techniques presented in Chapter 2 on the specific dataset of this project. Colour deconvolution was the first method that was tested extensively on a set of 10 different placenta tissue slides stained with HEM&DAB. Same as the Aperio algorithm, it was decided to include it as the initial pre-processing method of the implemented pipeline since it performed significantly well in extracting the desired stain from the image. Similarly, inspired by the algorithm regarding the detection of blood vessels in retinal images, which was presented before, a filtering method was found to be necessary. The reason for including a filter is that noise and unnecessary information needed to be removed while at the same the implemented system was aiming to detect certain structures in the images (i.e. closed elliptical shapes). The design of the Gaussian filter itself was made during the implementation process. Due to certain difficulties in obtaining optimal results, the filter had to be redesigned many times during the implementation of the algorithm. This was also the case for the morphological methods since they were depending heavily on the results of the previous stages of the pipeline. 3.3 Implementation methodology There were two important factors that determine the development methodology that was followed. First of all, the lack of any prior experience in large scale projects and more specifically in designing and implementing a system of such kind, made it hard to predict the necessary requirements and features of the implemented algorithm. Secondly, due to the complexity of the images in terms of the large variance in structure of blood vessels between different tissue slides along with the significant amounts of noise in the images, the product needed to be redesigned, implemented and tested several times to reach an acceptable level of performance. For the above reasons, iterative development was selected as the appropriate model of development. Iterative development is an agile software engineering methodology very popular in the recent years due to the flexibility it provides, especially in cases where limited factors can be predicted in advance. 
The main idea consists of repeated cycles (iterations) of the design, implementation, testing and possibly evaluation stages of the development of a system, in order to achieve the optimal final result 26 2013 Nikolaos Tzoannou [17]. Each iteration provides the developer with essential new information, obtained by using and testing the present version of the system, which can be used in the next improved version and usually each cycle adds new functional capabilities. Although the iterative model usually includes the reconsideration of the system requirements, in this case this was not necessary. The most important advantage of iterative approach is that it delivers an early version of the product sooner than other methodologies, which the developer can work on it and improve. In extend of this, it allows early improvements from the first iterations. Furthermore, improvements are accurate since they are based on the experience gained from the most recent testing phase. 3.4 Programming Language Choosing the suitable programming language for implementing the system was very crucial since it can affect the result of the project in a large scale. Throughout my studies during my Computer Science degree, several languages were used (Python, Java, C, Matlab etc), each of them being significantly better in certain projects and applications than the others mainly because of the different characteristics each one has, such as being object oriented or functional, high level or low level. Computer vision applications are usually implemented in C, C++ or Matlab with C being the fastest and most efficient since it is a low level language. However for this project MATLAB was chosen for the following reasons: My exposure to Matlab was significantly larger than any other language during my degree. More specifically, labs and assignments for the Computer Vision module used Matlab. Thus, my confidence in implementing the system in Matlab was much higher. Since Matlab is not as low level as C it requires less complex coding which means less time spending in coding and debugging, both procedures being very time-consuming when implementing a system in a low level language. The limited overall time for the project made it almost necessary to use Matlab. Matlab provides the user with many built-in functions for image processing such as the Image Processing Toolbox which makes it much easier to implement certain computer vision techniques. Matlab uses matrices (arrays) as the main data structure which makes it even more suitable for image processing, since images are in a sense, large matrices. Plotting abilities of Matlab makes it easier to visualise the results of the evaluation stage. 27 2013 3.5 Nikolaos Tzoannou Evaluation The quantitative evaluation of the algorithm involves the comparison of the results obtained from the implemented proposed algorithm with the Ground Truth as provided by expert pathologists. More specifically, the algorithm is tested using 10 different tissue slides stained with HEM&DAB. The final output of the implemented pipeline produces a text file which contains all the regions of the image which have been identified as blood vessels. Ground truth is obtained with the help of Aperio Imagescope. The same tissue slides that have been used to test the algorithm, are hand-annotated by pathologists, marking the blood vessels that actually exist and should be detected. Imagescope produces an XML file containing these annotated regions. 
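Because the Imagescope annotations are plain XML, the parsing step can be done directly in MATLAB. The fragment below is a minimal, hypothetical sketch: the element and attribute names assume the layout ImageScope typically writes and should be checked against the actual ground-truth files, and the file name is invented; the parser used in the project instead wrote the regions out to a text file in the same format as the pipeline output.

% Sketch of reading annotated region vertices from an ImageScope XML file.
doc     = xmlread('ground_truth.xml');               % hypothetical file name
regions = doc.getElementsByTagName('Region');

groundTruth = cell(regions.getLength, 1);
for r = 0:regions.getLength - 1                      % DOM lists are 0-based
    verts = regions.item(r).getElementsByTagName('Vertex');
    xy = zeros(verts.getLength, 2);
    for v = 0:verts.getLength - 1
        xy(v+1, 1) = str2double(char(verts.item(v).getAttribute('X')));
        xy(v+1, 2) = str2double(char(verts.item(v).getAttribute('Y')));
    end
    groundTruth{r+1} = xy;                           % one polygon per region
end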
Chapter 4
Implementation

4.1 Introduction

The initial design of the algorithm specified that the first stage of the pipeline would be the colour deconvolution of the tissue slide. Thereafter, the desired stain would go through a filter that enhances the structures we wish to detect. The next stages of the pipeline involve the detection of the blood vessels using morphological operations on binary images. Finally, the pipeline outputs a text file containing information about the blood vessels that have been detected. In this chapter the implementation phase of the project is presented in detail. A flowchart of the algorithm's pipeline can be seen in figure 4.1. As mentioned before, a large part of the design took place alongside the implementation, especially the design of the Gaussian filter and of the necessary morphological operations. This iterative approach proved to be very effective since the initial requirements were met, although it was time-consuming, with the result that the implementation phase lasted longer than initially planned.

Figure 4.1: Pipeline flowchart of the proposed algorithm

4.2 Data collection

LIMM stores hundreds of stained tissue slides on its servers, giving researchers the ability to examine them and run experiments on them. For the implementation of the algorithm we used a virtual slide of a pathological placenta stained with HEM&DAB, since the DAB stain is used to highlight the existence of blood vessels. For the implementation phase a small segment of the tissue slide was used. The resolution of the image was 686x991 pixels and, because it was magnified 20 times, the actual size of the tissue segment was approximately 650 μm by 350 μm (1 μm = 10^-4 cm).

4.3 Load image into memory

The first step of the pipeline is to load the tissue slide image into memory. The image is stored in an array (matrix) data structure. The matrix has the same size as the image in pixels. Each element of the matrix represents a pixel in the image and contains three values, one for each colour channel (RGB). The data is then ready to be passed to the next stage of the pipeline.
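As an illustration, a tissue segment exported as an image file can be loaded into such a matrix with a few lines of Matlab; the file name below is a placeholder, not the actual data file used in the project.

    % Load the RGB tissue segment into an M-by-N-by-3 uint8 matrix.
    rgb = imread('placenta_segment.tif');   % hypothetical file name
    [rows, cols, channels] = size(rgb);     % image height, width and number of channels
    pixel = squeeze(rgb(1, 1, :));          % the R, G and B values of the top-left pixel

The whole pipeline operates on matrices of this form, which is one of the reasons Matlab was chosen in section 3.4.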
4.4 Colour deconvolution

This stage of the pipeline involves the colour deconvolution of the original image in order to obtain the contribution of each stain. For the implementation, a colour deconvolution function written in C by Tom Macura and compiled as a Matlab function was used. This function allows the user to select, through a specific argument, the type of stain he or she is interested in. Figure 4.2 shows the original image of the tissue slide and the DAB stain grayscale image obtained from colour deconvolution. The stain image is stored in memory and passed to the next step of the pipeline.

Figure 4.2: Original tissue slide (left) and the DAB stain contribution (right)

4.5 Gaussian filter

The next step of the algorithm involves the convolution of the stain image with a specific Gaussian filter. Equation 2.7 in chapter 2 describes the Gaussian function that is used. A standard deviation of σ = 0.1 and a distance of d = 10 pixels were chosen since, after experimentation, it was found that with these values even very small blood vessels (with a diameter of approximately 10 pixels) are enhanced and a significant amount of noise is removed without losing any important information. The results of the Gaussian filter convolution are shown in figure 4.3.

Figure 4.3: The result of the Gaussian filter convolution (right) on the DAB stain image (left)

4.6 Segmentation

The next stage of the pipeline takes the convolved image and creates a binary image from it. A binary image is required because it is much easier to perform morphological operations on binary images and to detect structures and contours in them. The initial design of the segmentation step was to choose an appropriate threshold using Otsu's method, as described in Chapter 2. However, after several tests on different images it was found that, due to the particular nature of these images, Otsu's method proposed a threshold which performed rather poorly in separating the foreground from the background. Hence, the segmentation step was redesigned and a threshold determined experimentally on different images was chosen instead. The resulting binary image is stored in memory and pushed to the next step of the algorithm. The result of binarisation can be seen in figure 4.4.

Figure 4.4: Segmentation of the grayscale image by thresholding

4.7 Skeletonisation

After extracting the binary image, the next stage is the morphological operation of skeletonisation, used to extract the basic shape of the contours of the blood vessels. The result was that the majority of the skeletons in the image were extracted without breaking any objects apart. However, some objects lost their connectivity, which led to the need to implement a way of connecting any hanging edges and closing the open structures. Figure 4.5 shows the result of the skeletonisation operation on the binary image.

Figure 4.5: The result of performing skeletonisation on the binary image
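For illustration, the three steps just described (sections 4.5 to 4.7) can be expressed with a few Image Processing Toolbox calls. This is a minimal sketch rather than the project's code: the input dab is assumed to be the DAB-channel image produced by the colour deconvolution stage, and the kernel size, standard deviation and threshold are placeholder values, not the tuned parameters reported above.

    % dab: grayscale DAB-channel image from colour deconvolution (assumed input).
    dab = im2double(dab);                         % ensure values lie in [0, 1]

    % 4.5 Gaussian filtering: build a d-by-d Gaussian kernel and convolve it with the image.
    d     = 10;                                   % kernel size in pixels (assumed)
    sigma = 2;                                    % standard deviation (assumed for this sketch)
    g     = fspecial('gaussian', [d d], sigma);   % 2-D Gaussian kernel
    smoothed = imfilter(dab, g, 'replicate');     % convolution with border replication

    % 4.6 Segmentation: binarise with a fixed, experimentally chosen threshold.
    % (graythresh(smoothed) would give the Otsu threshold that was found unsuitable here.)
    T  = 0.4;                                     % placeholder threshold
    BW = smoothed > T;                            % assumes stained pixels have higher values

    % 4.7 Skeletonisation: reduce the foreground regions to one-pixel-wide skeletons.
    skel = bwmorph(BW, 'skel', Inf);

In the actual pipeline the filter follows equation 2.7 of chapter 2, and both its parameters and the threshold were tuned experimentally as described above.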
4.8 Edge extension

As mentioned before, because some contours lost their connectivity, a method to close any disconnected edges needed to be implemented. The basic idea behind the implemented method is to extend the end point of each edge in the direction the edge is already heading, until it reaches another foreground pixel. More specifically, if a 3x3 pixel neighbourhood has a form that shows continuity in a certain direction, this neighbourhood is reverted (mirrored) and combined with the original neighbourhood using a logical AND operation. In this way the edge is extended in its trending direction. An example of this algorithm is shown in figure 4.6.

Figure 4.6: Example of the edge extension algorithm, showing the reversion of a 3x3 neighbourhood and its combination with the original using a logical AND

Figure 4.7 shows the results of the above algorithm on the skeletonised image obtained at the previous stage of the pipeline. It can be seen that the majority of the disconnected contours are now connected. However, the algorithm also produced some unnecessary extra edges, which are removed at a later stage.

Figure 4.7: The skeletonised image before the edge extension (left) and after the extension (right)

4.9 Connected components labelling

The next step of the algorithm is to label the connected components in order to find and extract every connected region in the image. These regions should now represent the blood vessels as they have been detected. To implement this operation, the built-in connected components labelling function of Matlab's Image Processing Toolbox was used. To avoid neighbouring objects being merged into one, 4-connectivity was used instead of 8-connectivity. An example of this operation is shown in figure 4.8; note that if 8-connectivity were used, objects 2 and 3 would be considered a single object [16].

Figure 4.8: Example of the bwlabel Matlab function on a binary image containing 3 connected components

The connected component regions as identified are shown in figure 4.9. For visualisation purposes, each region is labelled with a different colour.

Figure 4.9: Labelled regions as identified by Matlab's connected components function

4.10 Edge detection

At this stage of the algorithm, according to the initial design, a Canny edge detector would be applied in order to detect the edges of every region and so identify the blood vessels. During the first iterations of the implementation, and after extensive testing, it was decided that the Canny edge detector was not suitable. Its performance was poor since it failed to identify every edge and, on top of that, a significant amount of the noise created during the extension phase was identified as edges. Instead of the Canny operator, an implementation of the Moore-Neighbour tracing algorithm was used to trace the exterior boundaries of the detected regions. The idea behind the Moore-Neighbour tracing algorithm is the following [11]:

- A neighbourhood of 8 pixels around a starting pixel is defined; these are the 8 directions including the diagonals.
- Cycle either clockwise or anticlockwise (it makes no difference, as long as it is consistent throughout the algorithm) through the 8-pixel neighbourhood, beginning from the starting pixel.
- When the first foreground pixel is found, mark it as a boundary pixel.
- Cycle again through the neighbourhood of the previously detected boundary pixel.
- Repeat the same process until every pixel in the image has been searched in a similar manner.
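The labelling and boundary tracing of sections 4.9 and 4.10 map closely onto two Image Processing Toolbox functions, as the brief sketch below illustrates. The variable names are placeholders and the sketch is only an approximation of this pipeline stage: extended refers to the binary image after edge extension, rgb to the original image loaded in section 4.3, and the Matlab documentation describes bwboundaries as using Moore-neighbour tracing, which is why it appears here in place of a hand-written tracer.

    % extended: binary image produced by the edge extension stage (assumed variable name).

    % 4.9 Label connected components with 4-connectivity so that diagonally
    % touching objects are kept separate.
    [labels, numRegions] = bwlabel(extended, 4);

    % 4.10 Trace the exterior boundary of each region. Each cell of 'boundaries'
    % holds an N-by-2 list of (row, column) boundary coordinates.
    boundaries = bwboundaries(extended, 4, 'noholes');

    % Overlay the traced boundaries on the original image for visual inspection.
    imshow(rgb); hold on;
    for k = 1:numel(boundaries)
        b = boundaries{k};
        plot(b(:,2), b(:,1), 'g', 'LineWidth', 1);   % columns are x, rows are y
    end
    hold off;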
4.11 Pipeline output

The last stage of the implemented pipeline is to visualise the results for the qualitative evaluation of the procedure and to extract the detected region information to a text file for the quantitative evaluation. To extract the data needed for the quantitative evaluation, a loop over each region was implemented, extracting the coordinates of every point of the detected boundaries and storing them in a text file. The text file contains three columns: the first two columns are the x and y coordinates of a pixel and the third column contains a number identifying the region to which this pixel belongs. Figure 4.10 shows the visualisation of the final output of the pipeline, plotted on top of the original tissue image.

Figure 4.10: Final output of the implemented algorithm for blood vessel detection. The regions marked in green annotate the detected blood vessels.
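A minimal sketch of this export step is shown below, reusing the boundaries cell array from the sketch in section 4.10; the output file name and the exact formatting are assumptions rather than the project's actual conventions.

    % Write one line per boundary point: x coordinate, y coordinate, region index.
    fid = fopen('detected_regions.txt', 'w');     % hypothetical file name
    for k = 1:numel(boundaries)
        b = boundaries{k};                        % N-by-2 matrix of (row, column) points
        for p = 1:size(b, 1)
            % Columns are x, rows are y; the third value is the region number.
            fprintf(fid, '%d %d %d\n', b(p, 2), b(p, 1), k);
        end
    end
    fclose(fid);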
Chapter 5
Evaluation

For the quantitative evaluation of the implemented system a piece of code was written which receives as input the two text files containing the region information, one for the ground truth and one for the results of the detection. From these two files, the evaluation program computes the similarity between the two sets and matches the detected blood vessels to those of the ground truth. This evaluation program was also implemented in Matlab, since Matlab makes it easier to handle and perform operations on large matrices. In this chapter, a detailed description of the evaluation process is presented along with a statistical analysis of the evaluation results.

5.1 Ground truth extraction

As described before, hand-annotated images of the tissue slides were produced prior to the detection, which contain the ground truth for each specific segment of tissue. A total of 10 different segments of tissue slides were used for the evaluation of the proposed algorithm. Using Aperio Imagescope, an XML file was extracted for each tissue slide containing the coordinates of the pixels of every blood vessel according to the ground truth. Then, using an XML parser written in C by Frank Vanden Berghen [4], 10 text files were created from the XML files.

5.2 Similarity matrix

Using the coordinate information for both the ground truth and the detected blood vessels, the evaluation program produces a similarity matrix based on the Hausdorff distance between every blood vessel of the ground truth and every vessel of the detected regions set. In detail, the process is the following. Firstly, the two sets are read from the two text files, one for the ground truth and one for the detected blood vessels. An example of the format of these files is shown below.

Table 5.1: Example of the format used for the region information text files. The first column contains the X coordinate, the second column the Y coordinate and the third column indicates the region to which the point belongs.

The evaluation program then outputs a two-dimensional matrix in which each (i, j) element holds the Hausdorff distance between the ith region of the ground truth set and the jth region of the detected vessels set. If the Hausdorff distance between two regions of the two sets is zero, the two regions are exactly the same and this particular blood vessel has been detected perfectly. Similarly, if the Hausdorff distance is very large, the two regions are unrelated. A visualisation of a similarity matrix is shown in figure 5.1. The vertical axis represents the set of detected blood vessels and the horizontal axis represents the ground truth set. Warm colours indicate a high Hausdorff distance value, which means low similarity between the two regions; cold colours indicate a low Hausdorff distance value, meaning high similarity between the two regions.

Figure 5.1: Visualisation of the similarity matrix between the two sets

5.3 Blood vessel matching

The next stage of the quantitative evaluation process is the optimal matching between the two sets of blood vessels. To accomplish this, a Matlab implementation of the Hungarian algorithm written by Markus Buehren [5] is used. This algorithm matches each blood vessel of the ground truth set to a vessel of the results set. The implementation returns an assignment column vector which contains, for each vessel of the ground truth set, the matching blood vessel of the results set. An example of this vector can be seen in figure 5.2. In this example, the 9th region (blood vessel) of the results set is matched to the 1st region of the ground truth set, the 76th to the 2nd, and so on. Note that in line 18 the value is zero; this means that none of the detected blood vessels matches the 18th vessel of the ground truth. For optimal results, a threshold is applied to the similarity matrix in order to remove completely unrelated regions. This threshold was found experimentally and was usually between 20 and 40 pixels, depending on the image. Consequently, if the Hausdorff distance was larger than this threshold, the value for these two regions in the similarity matrix was set to infinity. In this way, the matching algorithm makes no attempt to match this particular pair of regions.

Figure 5.2: Example of the assignment column vector as produced by the Hungarian algorithm implementation
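The following sketch shows how sections 5.2 and 5.3 fit together in Matlab. It assumes the hausdorff_distance sketch given in section 3.5.1, two cell arrays gtRegions and detRegions holding the N-by-2 point lists of each region, and that the assignment routine from the referenced File Exchange package [5] is called assignmentoptimal; all of these names are assumptions made for illustration, not the actual evaluation code.

    % gtRegions, detRegions: cell arrays of N-by-2 (x, y) point sets, one cell per region.
    nGT  = numel(gtRegions);
    nDet = numel(detRegions);

    % 5.2 Build the similarity matrix of pairwise Hausdorff distances.
    S = zeros(nGT, nDet);
    for i = 1:nGT
        for j = 1:nDet
            S(i, j) = hausdorff_distance(gtRegions{i}, detRegions{j});
        end
    end

    % 5.3 Remove clearly unrelated pairs before matching (threshold in pixels).
    T = 30;                             % experimentally chosen, between 20 and 40 in the report
    S(S > T) = Inf;

    % Optimal one-to-one matching with the Hungarian algorithm implementation [5].
    % assignment(i) is the index of the detected vessel matched to ground truth
    % vessel i, or 0 if no admissible match exists.
    assignment = assignmentoptimal(S);

Pairs whose entry has been set to infinity are never matched, which reproduces the thresholding behaviour described above.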
5.4 Matching evidence

In order to obtain a representative analysis of the evaluation, some statistical measures need to be extracted and discussed. In total, 10 different tissue slide segments were used, and for each of these images the similarity matrix and the matching allocation were computed. From the zero entries of the assignment vector we can determine which ground truth vessels have no matching detection, and hence an initial count of false positives, i.e. blood vessels that have been falsely detected and have no match in the ground truth set. The remaining detected blood vessels each have a match in the ground truth set.

In the following pages, a statistical analysis of the 10 image samples used for the evaluation is presented. This includes visual evidence of the detection and of the matching, along with a table containing the numbers of True Positives (detected and matched blood vessels), False Positives (detected but unmatched blood vessels), detected blood vessels in total (TP + FP) and False Negatives (undetected blood vessels according to the ground truth).

Figure 5.3: Visualisation of matching and detection for tissue sample (a). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.4: Visualisation of matching and detection for tissue sample (b). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.5: Visualisation of matching and detection for tissue sample (c). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.6: Visualisation of matching and detection for tissue sample (d). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.7: Visualisation of matching and detection for tissue sample (e). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.8: Visualisation of matching and detection for tissue sample (f). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.9: Visualisation of matching and detection for tissue sample (g). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.10: Visualisation of matching and detection for tissue sample (h). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.11: Visualisation of matching and detection for tissue sample (i). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).

Figure 5.12: Visualisation of matching and detection for tissue sample (j). The left image shows the detected blood vessels, annotated in green, after thresholding the Hausdorff distance in the similarity matrix, with the ground truth shown in yellow. The right image shows, in pink, the blood vessels matched to the ground truth (true positives).
For each tissue sample the evaluation program outputs its results, which are presented in table 5.2. From this information a statistical analysis is also presented, using the concepts of precision and recall to evaluate the accuracy of the algorithm.

Tissue sample   Ground Truth   Detected in total (TP+FP)   Matched blood vessels (TP)   Unmatched blood vessels (FP)   Undetected blood vessels (FN)
a               77             144                         61                           83                             16
b               39             100                         35                           65                             4
c               33             83                          27                           56                             6
d               52             102                         52                           50                             0
e               35             75                          27                           48                             8
f               43             132                         40                           92                             3
g               27             53                          22                           31                             5
h               31             66                          30                           36                             1
i               33             53                          29                           24                             4
j               31             71                          27                           44                             4

Table 5.2: True Positives, False Positives, detected blood vessels in total (TP + FP) and False Negatives for each tissue sample

5.5 Precision and Recall

In statistics, and especially in classification problems, precision and recall are widely used to evaluate the performance of a classification process. In this context precision is defined as precision = TP / (TP + FP) and recall as recall = TP / (TP + FN). Consequently, when precision is equal to 1 there are no false positives at all, and when recall is equal to 1 we have FN = 0, hence the algorithm has successfully detected every blood vessel in the ground truth. These two statistical measures are rarely presented in isolation. Another measure that is used, the f-measure (or f-score), combines precision and recall into their harmonic mean in order to measure the accuracy of the classification: F = 2 x (precision x recall) / (precision + recall). The following table contains the precision and recall values, along with the f-score, for each tissue sample. Figure 5.13 shows a Recall-Precision diagram, including the f-measure values as contour lines, for each of the tissue samples.

Tissue sample   Precision   Recall   F-score
a               0.42        0.79     0.55
b               0.35        0.89     0.50
c               0.32        0.81     0.46
d               0.51        1.00     0.67
e               0.36        0.77     0.49
f               0.30        0.93     0.45
g               0.41        0.81     0.55
h               0.45        0.96     0.62
i               0.54        0.88     0.67
j               0.38        0.87     0.53

Table 5.3: Precision, Recall and F-score for each tissue sample

Figure 5.13: Precision-Recall diagram containing the f-score for each tissue sample
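As a small worked illustration (not part of the evaluation program itself), the counts for tissue sample (a) in table 5.2 reproduce the values reported in table 5.3:

    % Counts for tissue sample (a), taken from table 5.2.
    TP = 61;  FP = 83;  FN = 16;

    precision = TP / (TP + FP);                                   % 61/144, approximately 0.42
    recall    = TP / (TP + FN);                                   % 61/77,  approximately 0.79
    fscore    = 2 * (precision * recall) / (precision + recall);  % approximately 0.55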
5.6 Observations

It is noticeable that in almost all cases the proposed algorithm over-detects blood vessels, producing a large number of false positives alongside the correctly detected vessels. A possible reason for this behaviour is the design of the Gaussian filter in the early stages of the pipeline, which appears to need further adjustment to successfully remove noise while enhancing the desired structures. As a consequence, the edge extension stage of the pipeline created false edges from this noise, producing new connected components which the algorithm falsely detects as blood vessels. On the other hand, in almost all cases the algorithm manages to detect the vast majority of the blood vessels included in the ground truth set. For example, on tissue sample (d) the algorithm reaches a recall value of 100%, which means that every blood vessel in the ground truth set has been successfully detected.

Chapter 6
Conclusions

The overall aim of this project was to present a pipeline of computer vision and image processing techniques to detect blood vessels in medical images of a certain type. The qualitative evaluation showed that the implemented algorithm performs well, since there are very few undetected blood vessels. The quantitative evaluation showed that the algorithm succeeds in detecting the vast majority of blood vessels in the tissue slide, although it produces a significant number of false positives.

6.1 Objectives & Requirements

The objectives and minimum requirements defined in chapter 1 have been completed, and the extensions described in chapter 1 have also been completed to a certain degree. More specifically, the aim and objectives have been met as described below:

- Research into, and understanding of, the computer vision methods and techniques used in the field of histopathology and blood vessel detection has been accomplished, as described extensively in chapter 2.
- The testing and the qualitative evaluation of the commercial algorithm which LIMM uses have also been accomplished, as presented in chapter 2.
- Experimentation with different methods of microvessel detection and analysis, for the purposes of learning, understanding and designing the proposed algorithm, has taken place. Chapters 2 and 3 describe this process thoroughly.
- The design and the implementation of the proposed algorithm have been accomplished. The implementation meets the minimum requirement, which is a method for detecting blood vessels in medical images. Moreover, it has met the extensions to a certain degree, since it visualises the detection and also outputs a text file with quantitative information about the detection. The design and implementation phases are presented thoroughly in chapters 3 and 4. Evidence of blood vessel detection is presented in Appendix C.
- Quantitative accuracy evaluation using ground truth provided by expert pathologists has been accomplished, as described in chapter 5. Qualitative evaluation has also taken place, by providing pathologists with examples of the algorithm's output.

6.2 Possible extensions and future work

One possible extension for further development of the system would be to produce an XML file from the output of the detection algorithm. Provided that it follows the format used by Aperio Imagescope, this XML file could be used to import the annotated regions and visualise them in Imagescope. This feature could be very useful for pathologists, considering that they rely heavily on this piece of software to analyse and visualise blood vessel detection. Another extension would be the calculation of the area (in μm²) covered by blood vessels in a specific tissue segment. Pathologists could use this information to diagnose angiogenesis in a certain organ by comparing tissue samples from different time periods. The algorithm could also be extended to detect other structures besides blood vessels, provided that a suitable staining method exists for the colour deconvolution and that some ground truth is available for designing the necessary filters. A final possible extension would be to provide a supervised machine learning interface, in which the system would learn from the results it produces after they have been evaluated by the user. In this way, the detection and classification of blood vessels could be improved significantly in terms of accuracy.

6.3 Conclusion

This project was about developing and evaluating an alternative pipeline to the algorithm used by histopathologists at LIMM for blood vessel detection. The implemented algorithm uses a series of computer vision methods which were described extensively and proved to be suitable for this application.
The evaluation of the algorithm showed that it meets the original criteria and performs as expected, although there is still room for improvement.

Bibliography

[1] Andrew H. Fischer, Kenneth A. Jacobson, Jack Rose and Rolf Zeller. Hematoxylin and Eosin Staining of Tissue and Cell Sections. Cold Spring Harbor Protocols, doi:10.1101/pdb.prot4986, 2008.
[2] Aperio Algorithm User Guide: Microvessel Analysis, the University of Chicago, 2011. http://htrc.uchicago.edu/APERIO/downloads/HTRC_microvesselUG.pdf
[3] Aperio ScanScope Systems. http://www.aperio.com/lifescience/capture
[4] Ir. Frank Vanden Berghen. Small, simple, cross-platform, free and fast C++ XML Parser. http://www.applied-mathematics.net/tools/xmlParser.html
[5] Markus Buehren. Functions for the rectangular assignment problem. Matlab Central File Exchange, 2004. http://www.mathworks.com/matlabcentral/fileexchange/6543
[6] Cancer mortality for all cancers combined. Cancer Research UK. http://www.cancerresearchuk.org/cancer-info/cancerstats/mortality/all-cancerscombined/newpagetemp/
[7] How many different types of cancer are there? Cancer Research UK. http://www.cancerresearchuk.org/cancer-help/about-cancer/cancer-questions/how-many-differenttypes-of-cancer-are-there/
[8] Canny J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Analysis and Machine Intelligence, Vol.8(6), pages 679-698, 1986.
[9] Subhasis Chaudhuri, Shankar Chatterjee, Norman Katz, Mark Nelson, Michael Goldbaum. Detection of Blood Vessels in Retinal Images Using Two-Dimensional Matched Filters. IEEE Transactions on Medical Imaging, Vol.8, No.3, pages 263-269, 1989.
[10] Michael B. Dillencourt, Hannan Samet, Markku Tamminen. A general approach to connected-component labelling for arbitrary image representations. Journal of the ACM, Vol.39, Issue 2, pages 253-280, 1992.
[11] Gonzalez, R. C., R. E. Woods, S. L. Eddins. Digital Image Processing Using MATLAB. New Jersey, Pearson Prentice Hall, 2004.
[12] Erika Check Hayden. Cutting off cancer's supply lines. Nature, 458, pages 686-687, 2009.
[13] Lynn Hlatky, Philip Hahnfeldt and Judah Folkman. Clinical Application of Antiangiogenic Therapy: Microvessel Density, What It Does and Doesn't Tell Us. Journal of the National Cancer Institute, Vol.94, Issue 12, pages 883-893, 2002.
[14] Introduction to morphology operations on images. Computer vision talks. http://computer-vision-talks.com/2011/02/introduction-to-morphology-operations-on-images/
[15] Jahne B. Practical Handbook on Image Processing for Scientific Applications. CRC Press, Boca Raton, Florida, pages 76-82, 1997.
[16] Label connected components in 2-D binary image. Matlab documentation centre. http://www.mathworks.co.uk/help/images/ref/bwlabel.html
[17] Craig Larman, Victor R. Basili. Iterative and Incremental Development: A Brief History. IEEE Computer Society, Vol.36, Issue 6, pages 47-56, 2003.
[18] Chris van der Loos. User Protocol: Practical Guide to Multiple Staining. University of Amsterdam, Academic Medical Centre. http://www.biotechniques.com/multimedia/archive/00074/CRI-FP-Microscopy_74545a.pdf
[19] G. S. Ng, R. W. Zhou, C. Quek. A Novel Single Pass Thinning Algorithm. Pattern Recognition Letters, Vol.16, Issue 12, pages 1267-1275, 1995.
[20] Y. Nieto, J. Woods, F. Nawaz, A. Baron, R. B. Jones, E. J. Shpall and S. Nawaz.
Prognostic analysis of tumour angiogenesis, determined by microvessel density and expression of vascular endothelial growth factor, in high-risk primary breast cancer patients treated with high-dose chemotherapy. British Journal of Cancer, Vol.97, pages 391-397, 2007.
[21] Mark S. Nixon, Alberto S. Aguado. Feature Extraction and Image Processing, page 88. Academic Press, 2008.
[22] Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE Trans. Sys., Man., Cyber. 9(1), pages 62-66, 1979.
[23] Routine Mayer's Hematoxylin and Eosin Stain (H&E). Manual of Histologic Staining Methods of the Armed Forces Institute of Pathology (Third Edition). American Registry of Pathology (Luna, Lee G., HT(ASCP), editor), New York, 1960.
[24] Arnout Ruifrok, Dennis Johnston. Quantification of histochemical staining by color deconvolution. Analytical and Quantitative Cytology and Histology, Vol.23, Issue 4, 2001.
[25] P. Sahs. DAB: An Advancement in Blood Print Detection. Journal of Forensic Identification, Vol.42, No.5, page 412, 1992.
[26] Shapiro, L. G., Stockman, G. C. Computer Vision, pages 137, 150. Prentice Hall, 2001.
[27] Milan Sonka, Vaclav Hlavac, Roger Boyle. Image Processing: Analysis and Machine Vision (Third Edition), 2008.
[28] R. Tyrrell Rockafellar, Roger J-B Wets. Variational Analysis, page 117. Springer-Verlag, 2005.
[29] Roland T. Ullrich, Jan F. Jikeli, Michael Diedenhofen, Philipp Böhm-Sturm, Maike Unruh, Stefan Vollmar, Mathias Hoehn. In-Vivo Visualization of Tumor Microvessel Density and Response to Anti-Angiogenic Treatment by High Resolution MRI in Mice. PLoS ONE 6(5): e19592, doi:10.1371/journal.pone.0019592, 2011.
[30] Umbaugh Scot E. Computer Vision and Image Processing. Prentice Hall, NJ, 1998.
[31] Bruce R. Zetter. Angiogenesis and Tumor Metastasis. Harvard Medical School, Boston, 1998. http://www2.uah.es/farmamol/Public/AnnReviews/PDF/Medicine/Angiogenesis_metastasis.pdf

Appendix A
Personal Reflection

This project was the most challenging part of my degree studies. The piece of work I had to produce required endless hours of effort, and the lack of any prior experience in delivering large-scale projects like this made the whole process much harder. Through this process I learnt that successful delivery is the result of a complicated combination of several factors.

First of all, it is important to choose a project that really interests you. This means that you need to research every aspect of the project thoroughly and very carefully before taking the final decision. Since it is a very long process, keeping yourself motivated to work and to stay on schedule is crucial. Secondly, you should follow the guidance provided by your supervisor. As mentioned before, final year project students are totally inexperienced in projects like this. Your supervisor is there to provide guidance, help and advice; without this crucial contribution, it would be almost impossible to complete the project successfully.

Time management is as important as keeping yourself motivated during the project's lifetime, and the two factors are related: keeping yourself motivated will most likely keep you on schedule. But again, due to the lack of experience, even if you work constantly during these months you might find yourself stuck on a particular problem, spending more time than you should on a specific process, and consequently spending less time on processes which could be more important.
The most significant lesson I learnt is that this project is not only about designing and producing a solution to a problem. From my point of view, the most important aspect of the final year project is the learning experience it provides. More specifically, extensive background reading and research are needed in order to understand the problem completely and to be able to produce a solution. But the learning experience continues throughout the project's lifetime: you need to experiment and apply in practice everything you have learnt. Moreover, the need for redesigning and re-implementing, which will most likely occur at least once, extends this learning experience until the very end of the project.

In addition to specific knowledge in a certain field such as computer vision, the learning extends to more general skills as well. During a project such as this one you learn how to be methodical and consistent, and you also improve your presentation and writing skills. You learn how to break down a large piece of work and organise your time, and how to follow the guidelines provided by your supervisor. All these skills are crucial for future employability, in any work environment, regardless of the actual nature and field of the job.

Overall, it was a learning experience for me in all the aspects described above, providing both field-specific knowledge and general skills, and this is part of the successful outcome of the project. If you haven't learnt anything during this project, then you did something completely wrong. A personal way to evaluate the outcome of such a project as a final year student is to ask yourself after its completion, "What have I learnt?" The more answers you get, the more successful the project was.

Appendix B
External material used

B.1 Colour deconvolution Matlab mex C function written by Tom Macura. http://svn.openmicroscopy.org.uk/svn/ome/trunk/src/matlab/OME/Filters/colour_deconvolution/colour_deconvolution.c

B.2 Implementation of the Hungarian algorithm in Matlab written by Markus Buehren. http://www.mathworks.com/matlabcentral/fileexchange/6543

B.3 XML parser written in C by Frank Vanden Berghen. http://www.applied-mathematics.net/tools/xmlParser.html

Appendix C
Detection Evidence

Figure C.1: Successful blood vessel detection for samples a (top left), b (top right), c (bottom left), d (bottom right)

Figure C.2: Successful blood vessel detection for samples e (top left), f (top right), g (middle left), h (middle right), i (bottom left), j (bottom right)