ARTICLE IN PRESS
Pattern Recognition
www.elsevier.com/locate/patcog

Convex hull based skew estimation

Bo Yuan a,∗, Chew Lim Tan b
a Centre for Remote Imaging, Sensing and Processing, National University of Singapore, Singapore
b Department of Computer Science, School of Computing, National University of Singapore, Singapore

Received 16 August 2005; received in revised form 27 January 2006; accepted 1 February 2006

Abstract

Skew estimation and page segmentation are two closely related processing stages in document image analysis. Skew estimation needs proper page segmentation, especially for document images with multiple skews, which are common in scanned images of thick bound publications in 2-up style or of postal envelopes with various printed labels. Even when only a single skew is of concern, the presence of minority regions with different skews or with undefined skew, such as noise, may severely affect the estimation of the dominant skew. Page segmentation, on the other hand, may need to know the exact skew angle of a page in order to work properly. This paper presents a skew estimation method with built-in, skew-independent segmentation that is capable of handling document images with multiple regions of different skews. It is based on the convex hulls of the individual components (i.e. the smallest convex polygon that fully contains a component) and of the component groups (i.e. the smallest convex polygon that fully contains all the components in a group) in a document image. The proposed method first extracts the convex hulls of the components and then segments an image into groups of components according to both the spatial distances and the size similarities among the convex hulls of the components. This process not only extracts hints of the alignments of the text groups, but also separates noise and graphical components from the textual ones.
To verify the proposed algorithms, the full sets of real and synthetic samples of the University of Washington English Document Image Database I (UW-I) are used. Quantitative and qualitative comparisons with some existing methods are also provided.

© 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.

∗ Corresponding author. Tel.: +65 65165389. E-mail address: [email protected] (B. Yuan).

Keywords: Document processing; Skew estimation; Component grouping; Page segmentation; Convex hulls; Segregation effect; UW-I

1. Introduction

Printed documents are customarily rectangular. Ideally, text lines in documents are horizontal or vertical relative to the edges of the pages. Because it is difficult to place the original documents precisely during scanning, the captured edges of the documents may not align with the edges of the images. The amount of this misalignment is usually referred to as the skew angle of an image. Skew estimation is one of the important processing steps in document image understanding. In-depth reviews [1–4] and comparative evaluations [5] are available for the large array of techniques that have been developed in the research literature [6–26].

There are various hints of skew in a textual document image. The most widely explored orientation reference is the straight text line. To approximate text lines, various strategies are deployed, among which the most popular are projection-profile based [6–11], Hough-transform based [12–16], nearest-neighborhood based [17–19], morphological-operation based [20–22], and spatial-frequency based [23–26]. Different skew estimation methods compete on detection accuracy, time and space efficiency, the ability to detect multiple skews in the same image, and robustness to noise and scan-introduced distortions.
A typical projection-profile based skew estimation method uses a single point, called a fiducial point, to represent each component in an image. The set of fiducial points is projected onto a 1-D accumulator array along an angle, and a chosen premium function is evaluated on the accumulator array. If the projection is successively rotated over a range of angles, a series of premium values is obtained; the premium function should reach its extreme when the projection is along the text lines. Detection can be accelerated with a two-round search, either from coarse to fine rotation steps or from subsampled to full resolution.

0031-3203/$30.00 doi:10.1016/j.patcog.2006.02.016

Fig. 1. The convex hulls of the components with their vertices and centroids marked. This is a clip of the sample A00O from UW-I.

A typical Hough-transform based skew estimation method selects a set of fiducial points (x, y) to represent the components and then maps them to the parameter space (Hough space) under a certain parameterization. With the normal parameterization $\rho = x\cos\theta + y\sin\theta$, a single fiducial point (x, y) is mapped to a sinusoidal curve in the quantized ρ–θ parameter space by scanning the whole range of the parameter θ. When the mapped curves are accumulated in the 2-D parameter space, the global maxima $(\rho_{\max}, \theta_{\max})$ correspond to the prominent text line orientations of the image.

A typical nearest-neighborhood based skew estimation method exploits spatial proximity to establish groups of components that are assumed to belong to a text line. The positions of the grouped components are then used to approximate the orientation of the text line. Because the estimation is based on local groups, the precision of the estimated skew angle is usually lower than that of methods that work over longer distances on a global scale.
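As a rough illustration of the projection-profile idea described above (a generic sketch, not the method proposed in this paper), the code below projects fiducial points onto a 1-D accumulator for each candidate angle and scores the profile with the sum of squared bin counts. That premium function, the 1-pixel bin size, the y-up coordinate convention and the coarse-to-fine step sizes are all assumptions made for the example.

```python
import math
from collections import Counter


def premium(points, angle_deg, bin_size=1.0):
    """Score one projection with the sum of squared accumulator counts (assumed premium).

    Each fiducial point (x, y) is projected onto the normal of a line at
    angle_deg, so points on a text line with that orientation share a bin.
    """
    t = math.radians(angle_deg)
    bins = Counter(
        math.floor((-x * math.sin(t) + y * math.cos(t)) / bin_size)
        for x, y in points
    )
    return sum(c * c for c in bins.values())


def projection_profile_skew(points, lo=-45.0, hi=45.0, coarse=1.0, fine=0.05):
    """Two-round search: a coarse scan over [lo, hi], then a fine scan around the best angle."""

    def best_angle(start, stop, step):
        n = int(round((stop - start) / step))
        candidates = [start + i * step for i in range(n + 1)]
        # max() keeps the first of several equally good angles.
        return max(candidates, key=lambda a: premium(points, a))

    rough = best_angle(lo, hi, coarse)
    return best_angle(rough - coarse, rough + coarse, fine)


# Toy data: five parallel "text lines" of fiducial points skewed by 2 degrees.
slope = math.tan(math.radians(2.0))
points = [(x, x * slope + 30 * row + 15.5) for row in range(5) for x in range(0, 301, 5)]
print(round(projection_profile_skew(points), 2))
```

Because nearby angles can tie under this simple premium function, the estimate is only accurate to about one fine step; a finer bin size or a smoother premium function would sharpen the peak.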
A typical morphological-operation based skew estimation method uses pixel-level morphological operations to merge or erase neighboring foreground pixels so as to form stripes that represent the text lines. Line fitting is then used to find the elongation of the stripes, which gives their dominant orientation and hence that of the text lines.

A typical spatial-frequency based skew estimation method treats the text lines in a textual document image as a texture or pattern. The Fourier transform or other transforms are used to reveal this global trend in the frequency domain. This class of methods usually depends on the presence of dominant text lines.

In principle, the above-mentioned classes of skew estimation methods can detect the existence of multiple skews in a document image. However, they cannot locate the skewed regions without the help of page segmentation methods. Even with such help, they work within each segment the same way as they do on a non-segmented page. In other words, they are designed on the principles of single-skew estimation.

This paper presents a multi-skew estimation method that detects the component groups in a document image and estimates the skew of each detected group. It is based on the convex hulls of the components and of their groups. The convex hull of a component, as shown in Fig. 1, is the smallest convex polygon that contains all the points of the component. The convex hull of a group of components, as shown in Fig. 2, is the smallest convex polygon that contains all the components in the group. Unlike other methods that rely on hints from text lines, this method derives skew hints mainly from the alignment of text blocks. Text is intentionally aligned with the edges of its block: every text block, be it a paragraph or a column, forms one or more straight edges. These alignments are generally termed left aligned, right aligned or adjusted (justified), according to whether the text in a block forms a straight edge at the left side, the right side or both sides of the block. Text lines are also usually straight.

Fig. 2. The convex hulls of the component groups with their vertices marked for the same clip as in Fig. 1. The direct links from component to component in the same group are drawn to illustrate the actual grouping process. This initial grouping result is consolidated by the containment and intersection criteria on the convex hulls of all the groups.

To obtain text blocks, a component grouping function is used to separate the components into groups by their relative distances and sizes. The convex hulls of the component groups are then extracted, and the slopes of their edges are used to estimate the dominant skew angle of each group. This method is robust even for documents that contain a large amount of graphical components or noise. This noise immunity comes from two properties of the proposed method: (1) the grouping function can separate components of distinctly different sizes and shapes; and (2) the edges of the convex hull of a component group indicate whether the contents of the group are well aligned. The proposed method is general-purpose, full-ranged (±45°, assuming the pages are rectangular), automatic in parameter setting, and highly competitive in detection accuracy and execution speed.

2. Component grouping

Component grouping extracts the text blocks from which skew can be estimated, and minimizes interference from outside the text blocks. The connected components extracted from an image are first grouped by a component grouping function, which is based on the spatial distances and size similarities among the components. The geometry of the extracted groups is then analyzed to derive hints for skew estimation.

2.1. The proposed grouping function

Given a component c1 of area s1 and another component c2 of area s2, the two components are said to have a direct link if the Euclidean distance between the centroids of c1 and c2 is less than or equal to the grouping function in Eq. (1). A component group is a set of components in which any two components are connected by at least one path of direct links. The k in Eq. (1) is a scalar parameter that can be adjusted to control the establishment of direct links among components.

$$ f(s_1, s_2) = \sqrt{\frac{k\, s_1 s_2}{s_1 + s_2}} \tag{1} $$

The grouping function in Eq. (1) has several desirable properties:
• It is a distance measure, so it can be compared directly with the distance between centroids;
• It is symmetric in c1 and c2;
• It is rotation invariant, if the aliasing effect is discounted;
• It is magnification (resolution) invariant, if the aliasing effect is discounted;
• There are several options for the areas of components in Eq. (1), such as the total number of pixels [27], the area of the convex hull, or the area of the upright bounding box of the component;
• It favors close components with similar sizes. Textual components are unlikely to be grouped with graphical elements because of the significant difference between their sizes. Therefore, the proposed method is highly resistant to noise.

Fig. 3 provides a reference implementation of the component grouping algorithm using Eq. (1). Its time complexity in the worst case, when all components are isolated, is O(n²), where n is the total number of components in the image. In practice it is much better, because once a component A is in the same group as a component B, and B is in the same group as a component C, A and C belong to the same group by definition, so that pair need not be tested. The reference implementation guarantees that only one path of direct links is established between any two components of a group during grouping, which appears as acyclic trees in Fig. 2. It also guarantees that the grouping result is independent of the seed component from which the grouping starts. Finally, the groups produced by this initial grouping are merged according to the containment and intersection criteria on their convex hulls.

Fig. 3. A reference implementation of the component grouping algorithm in pseudo code. In principle, this is a partition algorithm with a binary predicate.

2.2. The advantages of using convex hulls

Textual components in the same group usually have similar areas; for instance, the characters in a paragraph are usually of the same size. This is part of the rationale for involving the areas of components in Eq. (1). There are several ways of measuring the areas of components, such as the areas of the connected components themselves, of their upright bounding boxes, or of their best-fit bounding boxes. We introduce here yet another choice: the convex hulls of the connected components. This choice has several advantages over the others. The areas of convex hulls have much closer values than those of the commonly used connected components. Furthermore, the area of a convex hull remains constant at any skew angle because it is rotation invariant, unlike the area of the popular upright bounding box, which changes with the skew angle. This can easily be observed from Fig. 1.

In Eq. (1), the distances among components are measured between their fiducial points, which in our case are the centroids of the convex hulls of the components. Other choices of fiducial points include the bottom centers of the upright bounding boxes of the components [7] and the horizontal tangent points of the bottoms of the components [9].
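A minimal Python sketch of component grouping in the spirit of Fig. 3 follows, under stated assumptions: each component is given as a list of boundary points, its convex hull is computed with Andrew's monotone chain algorithm, the hull area and area centroid (used as the fiducial point) come from the shoelace formula, and Eq. (1) is read as f(s1, s2) = sqrt(k s1 s2 / (s1 + s2)). Groups are found as connected components of the direct-link graph by breadth-first search; the containment and intersection merging step is omitted. All function names are illustrative, not the authors'.

```python
import math
from collections import deque


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def area_and_centroid(hull):
    """Shoelace area and area centroid of a polygon; degenerate hulls use the vertex mean."""
    n = len(hull)
    a2 = cx = cy = 0.0  # a2 is twice the signed area
    for i in range(n):
        (x0, y0), (x1, y1) = hull[i], hull[(i + 1) % n]
        w = x0 * y1 - x1 * y0
        a2 += w
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    if abs(a2) < 1e-12:
        return 0.0, (sum(p[0] for p in hull) / n, sum(p[1] for p in hull) / n)
    return abs(a2) / 2.0, (cx / (3.0 * a2), cy / (3.0 * a2))


def link_threshold(s1, s2, k):
    """Eq. (1), read here as f = sqrt(k * s1 * s2 / (s1 + s2))."""
    return math.sqrt(k * s1 * s2 / (s1 + s2)) if s1 + s2 > 0 else 0.0


def group_components(components, k):
    """Partition components into groups connected by direct links (breadth-first search)."""
    props = [area_and_centroid(convex_hull(c)) for c in components]
    label = [-1] * len(components)
    groups = []
    for seed in range(len(components)):
        if label[seed] != -1:
            continue
        gid = len(groups)
        label[seed] = gid
        members, queue = [seed], deque([seed])
        while queue:
            i = queue.popleft()
            si, (xi, yi) = props[i]
            for j, (sj, (xj, yj)) in enumerate(props):
                # Components already placed in a group are never re-tested.
                if label[j] == -1 and math.hypot(xi - xj, yi - yj) <= link_threshold(si, sj, k):
                    label[j] = gid
                    members.append(j)
                    queue.append(j)
        groups.append(members)
    return groups


def box(x, y, w, h):
    """Corner points of an axis-aligned rectangle, used as a toy component."""
    return [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]


# Three character-sized boxes on one line, one large graphic, and one tiny noise speck.
comps = [box(0, 0, 10, 10), box(15, 0, 10, 10), box(30, 0, 10, 10),
         box(200, 0, 100, 100), box(45, 0, 1, 1)]
print(group_components(comps, k=5))
```

With k = 5, the three character-sized boxes form one group, while the large graphic and the small speck each remain isolated, which illustrates how the size term in Eq. (1) keeps components of very different sizes apart. Because the groups are connected components of the link graph, the result does not depend on which seed the search starts from.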
Bounding-box bottom centers and bottom tangent points, however, are limited to images with only small skew angles. In contrast, the centroid of a convex hull is rotation invariant and therefore valid at any skew angle. Finally, a comparison between the connected components and their convex hulls also favors the latter, because the areas and shapes of the convex hulls vary significantly less than those of the components, and so do their centroids. This can also be observed from Fig. 1. A quantitative comparison is given in Fig. 4.

Fig. 4 shows the area distributions of the components (top) and of their convex hulls (bottom) for the sample A00O from UW-I (shown in the background). Compared with the components, the distribution of the convex hulls shows an apparent "segregation effect", manifested as wider and deeper valleys in the area distribution. These valleys show that the areas of the convex hulls of the components are more clustered than the sizes of the components themselves. This is because individual components may have very different sizes while their convex hulls may not. This is obvious when comparing Figs. 1 and 2, especially the last line. The segregation effect is more apparent in Chinese text, because in Chinese and many other East Asian languages the characters occupy uniform rectangular cells, as Fig. 5 demonstrates. This is especially so in low-resolution images, where the strokes of a character touch one another to form a single component.

Fig. 4. The area distributions of the components (top) and their convex hulls (bottom) of the sample A00O from UW-I. The background is the original image represented by its components (top) and convex hulls (bottom) of the corresponding half.

Fig. 5. The area distributions of the components (top) and their convex hulls (bottom) of a Chinese newspaper clip. The background is the original image represented by its components (top) and convex hulls (bottom) of the corresponding half.

The segregation effect also provides a better basis for selecting thresholds for component filters.

For a group with n_g components, its density ρ_g is given by Eq. (2) as the ratio of the sum of the areas s_i of the convex hulls of the components in the group to the area s_g of the convex hull of the group:

$$ \rho_g = \frac{\sum_{i=0}^{n_g-1} s_i}{s_g} \tag{2} $$

For an image with N component groups, the weighted density ρ_wtd is given by Eq. (3) as the sum of the group densities ρ_g weighted by their areas s_g:

$$ \rho_{wtd} = \frac{\sum_{g=0}^{N-1} \rho_g\, s_g}{\sum_{g=0}^{N-1} s_g} = \frac{\sum_{g=0}^{N-1} \sum_{i=0}^{n_g-1} s_i}{\sum_{g=0}^{N-1} s_g} \tag{3} $$

2.3. The choice of the parameter k

Fig. 6 shows how different values of the parameter k produce different component grouping results. When k is 0, all the components are isolated, so the weighted density is 1.0. As k increases, the components evolve progressively from smaller groups to larger groups, which can be conveniently represented as: characters → words → text lines → paragraphs → columns → a single group. The weighted density decreases as k increases because the convex hulls of the merged groups include more and more empty space: the spaces between components, text lines, paragraphs and columns.

The curve in Fig. 6 can be used to determine the value of k automatically for any document image, since the weighted-density-versus-k curves of all documents have similar characteristics, regardless of the fonts, sizes, or line spacing used. The aim is to find the smallest k at which the formation of paragraphs stabilizes. Given a document image, start with k = 35 and decrement it, computing the change in the weighted density at each step and testing whether this change exceeds a threshold value, say 0.05. Note that with descending k, every subsequent grouping is in fact a splitting of existing groups.
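Eqs. (2) and (3) and the descending-k scan can be sketched as follows. The group representation (a group hull area paired with a list of member hull areas), the hypothetical callable `density_at` that regroups the page at a given k and returns Eq. (3), and the reading of the stopping rule (stop at the first step where the density rises by more than the threshold, and return the k just before it) are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

Group = Tuple[float, Sequence[float]]  # (hull area of the group, hull areas of its members)


def group_density(group_area: float, member_areas: Sequence[float]) -> float:
    """Eq. (2): summed member hull areas over the hull area of the group."""
    return sum(member_areas) / group_area


def weighted_density(groups: Sequence[Group]) -> float:
    """Eq. (3): group densities weighted by group hull areas.

    Algebraically this equals (all member hull areas) / (all group hull areas).
    """
    total_area = sum(area for area, _ in groups)
    return sum(group_density(area, members) * area for area, members in groups) / total_area


def select_k(density_at: Callable[[int], float], k_start: int = 35,
             threshold: float = 0.05, k_min: int = 1) -> int:
    """Decrease k from k_start; return the smallest k before the density jumps.

    density_at(k) is a hypothetical callable returning Eq. (3) for the page
    grouped at k. Lowering k splits groups, so the density generally rises.
    """
    k, prev = k_start, density_at(k_start)
    while k > k_min:
        cur = density_at(k - 1)
        if cur - prev > threshold:  # paragraphs started to break apart
            return k
        k, prev = k - 1, cur
    return k


# Two groups: a loose one (density 0.5) and a tight one (density 0.8).
print(weighted_density([(200.0, [50.0, 50.0]), (100.0, [80.0])]))
# A toy density curve with a jump between k = 14 and k = 13.
print(select_k(lambda k: 0.30 if k >= 14 else 0.60))
```

When every component is isolated, each group's hull is the component's own hull, so every ρ_g and therefore ρ_wtd equal 1.0, matching the k = 0 case described above.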
With descending k, links need to be re-tested only within the current groups of components; no cross-group testing is needed, which makes the scan much faster than one with ascending k from a small value.

To validate the above-mentioned k selection scheme, all 979 images scanned from real publications and the 168 samples synthesized from LaTeX documents in UW-I are used. The top chart in Fig. 7 is the result of using the components directly, while the bottom chart is that of using the convex hulls of the components. The curves for the convex hulls show sharp peaks that are almost the same for the real and the synthetic images. In contrast, the curves for the components show diffuse peaks that differ significantly between the real and the synthetic images. This is further evidence in favor of using the convex hulls of the components rather than the components themselves.

Fig. 6. Various component grouping stages for the sample A00O from UW-I: (foreground) the weighted density of the groups versus the k value; (background) the component groups and their convex hulls at k = 6, 12 and 20.

These results also suggest that, if the convex hulls of the components are used, a single k value (for example, 16) is sufficient for all the sample images in UW-I. More generally, one k value is sufficient for any batch of images scanned from the same source, such as a journal or newspaper over a period in which its typesetting style stays consistent. Note that this component grouping mechanism is independent of the scanning resolution, provided the resolution is not so low that pixel aliasing and touching components become severe problems.

2.4. The k adjustment loop

The purpose of adding a k adjustment loop to the component grouping process, as shown in the flowchart of Fig. 8, is to compensate for possible under- or over-grouping. Although a single k value is effective for most documents with normal typesetting, there are always exceptions. For instance, the line spacing in some documents can exceed double spacing, or two nearby paragraphs may accidentally merge because of noise between them. In such cases, k needs to be adjusted dynamically from its initial value. The adjustment of k follows the same principle as the selection of the initial k value, namely the weighted-density-versus-k curve in Fig. 6.

Based on our estimates, the weighted group density lies between 20% and 50% for normal documents, and the proposed skew estimation method works well within this range. If the weighted group density falls outside this range, k is increased (when the density is above the range) or decreased (when it is below the range). The loop terminates when the weighted density stabilizes within the range, or when k reaches a preset upper limit (40 in our case). Each round of the loop regroups the components and therefore adds computational cost, especially when multiple rounds are involved. However, if a proper initial k value is used, the k adjustment loop is rarely, if ever, triggered for normal typeset documents.

3. Skew estimation

The orientation of a document can be estimated from the layout of its text. Properly typeset text has straight baselines as well as paragraph alignments (left, right or adjusted). When the textual components in a document image are grouped effectively, the edges of the groups closely follow the layout of the original document. The convex hulls of the component groups are therefore a good choice for revealing edge alignments and directions. Fig. 8 shows the processing stages of the convex hull based skew estimation method.

3.1. The edge slope histogram

Among the edges of the convex hull of a component group, only the long edges that have parallel or perpendicular
counterparts in the same convex hull are significant in revealing the rectangular shape of the text paragraph or column that the group represents. This detection principle is built into the design of the mapping from the edge slopes of the convex hull to the bins of the histogram. The slope histogram of the edges of the convex hulls has N_bin = 9000 bins representing tangents in [−1, 1), which corresponds to skew angles in [−45°, 45°) with an angular resolution of 0.01°/bin on average. Suppose (x_i, y_i) and (x_{i+1}, y_{i+1}) are the two vertices (in floating point) of an edge of a convex hull; the slope of the edge is quantized to the bin whose index is

$$ I_{bin} = \begin{cases} \left\lfloor \dfrac{N_{bin}}{2}\left(1 + \dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}\right) \right\rfloor, & \left|\dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}\right| \le 1, \\[2ex] \left\lfloor \dfrac{N_{bin}}{2}\left(1 - \dfrac{x_{i+1} - x_i}{y_{i+1} - y_i}\right) \right\rfloor, & \left|\dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}\right| > 1. \end{cases} \tag{4} $$

It is assumed that the paragraphs and columns in a textual document are rectangular: if one side of a block lies outside the range [−45°, 45°), the perpendicular side must lie inside it. The significance of Eq. (4) is that parallel edges fall into the same bin of the histogram, and so do perpendicular ones. The histogram therefore has prominent peaks in these directions.

Fig. 7. The frequency distribution of the smallest k at which the formation of the paragraphs stabilizes for the 979 real images and the 168 synthetic images from UW-I. Using the convex hulls of the components (bottom) is superior to using the components directly (top). (Both panels plot the number of samples, 0 to 300, against k, 0 to 35, for the Latex-168 and Real-979 sets.)

To give more weight to long edges, the increment added to bin I_bin is set to the length of the edge:

$$ \mathrm{inc}(I_{bin}) = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2} \tag{5} $$

The slope histogram is obtained by computing the slopes of all the edges of the convex hulls of the component groups, quantizing them to bins with Eq. (4), and accumulating the increments of Eq. (5).

Fig. 8. The flowchart of the convex hull based skew estimation model. The k adjustment loop compensates for possible under-/over-grouping. If a proper initial k value is used, the k adjustment loop is rarely, if ever, triggered for normal typeset documents. (Stages: Input image → Connected-component analysis → Band-pass components filtering → Convex hull detection for each component → Component grouping → Grouping acceptable? If no, k adjustment and regrouping; if yes → Convex hull detection for each group → Edge slope accumulation for each group → Histogram convolution → Peak evaluation for each group → Skew(s).)

3.2. The search for peaks

For each component group, the accumulated slope histogram usually contains multiple peaks. Owing to the limited precision of the convex hulls of the component groups, parallel or perpendicular edges may spread into neighboring bins rather than falling into a single bin. To bring out the peak of such a concentration, the slope histogram h[i], where i ∈ [0, N_bin), is convolved with a finite, symmetric kernel generated from an un-normalized normal distribution in Eq. (6), where σ²_bin is the variance and δ is a positive integer that represents the half-size of the kernel:

$$ s[j] = \begin{cases} \exp\left(-\dfrac{(j - \delta)^2}{2\sigma_{bin}^2}\right), & 0 \le j \le 2\delta, \\ 0, & \text{otherwise.} \end{cases} \tag{6} $$

The convolved histogram h′ in Eq. (7) is used to search for the highest peak, which corresponds to the dominant skew angle of a component group:

$$ h'[i] = \sum_{j=-\delta}^{\delta} h\big[(i - j - \delta + N_{bin}) \bmod N_{bin}\big]\; s[j + \delta] \tag{7} $$

The modulo operation in Eq. (7) wraps values around the two endpoints of the histogram. If a document page is considered to have only one dominant skew angle, only one slope histogram is needed, on which all the edge slopes are accumulated. The convolution of Eq. (7) is applied to this slope histogram, and the highest peak is extracted to determine the dominant skew angle of the page.

4. Experimental results

To evaluate the effectiveness and robustness of the proposed component grouping and skew estimation method, the real images from the University of Washington English Document Image Database I (UW-I) are used. In this database, a total of 979 samples are scanned from real printed journals. Many samples contain large areas of disjoint, non-textual components that result from the binarization of photographic objects or from scanning artifacts. In this experiment, only a [10, 3000) band-pass size filter and an aspect-ratio filter are used to remove noise. This filtering is one of the measures to improve computational efficiency, not a prerequisite for the proposed method to work. The initial value of the parameter k is set to 16 according to the results in Fig. 7.

Fig. 9 shows the accumulated percentage of samples versus the absolute detection error, while Fig. 10 gives the regression analysis of this test suite. The linear correlation coefficient from Fig. 10 is 92.09%. If the labeled outliers are excluded, the linear correlation coefficient becomes 95.45%.

Fig. 11 is the sample A002 that is labeled in Fig. 10. The ground truth is 0.4°, and the detected skew angles are −2.54° for the left page and 0.28° for the right page. Evidently, the ground truth is meant for the right page, while the most prominent detected value is for the left page. This can be seen as an example of multi-skew detection.

Fig. 12 is the sample A03I that is labeled in Fig. 10. The ground truth is −0.65°, and the detected skew angle is 1.06°.
The ground truth is meant for the right page, whereas the detected skew angle represents the left page, which is dominant in terms of size and content.

Fig. 9. The accumulated percentage of samples versus the absolute error on the 979 real document images in UW-I. (Axes: absolute error in degrees, 0.00 to 0.50, versus accumulated percentage of samples, 0% to 100%.)

Fig. 10. Regression analysis using the 979 real document images in UW-I. The linear correlation coefficient is 92.1%. (Axes: ground truth versus detected skew angle, both in degrees from −3.5 to 3.5; labeled outliers: A002, A03I, A05G, J00B and N042.)

Fig. 13 is the sample A05G that is labeled in Fig. 10. The ground truth is −2.12°, and the detected skew angle is 0.14°. Visual inspection shows no hint of a −2.12° skew, although there is observable warping along the spine of the facing pages.

Fig. 11. The sample A002 from UW-I (labeled in Fig. 10). The ground truth is 0.4°, and the detected skew angle is −2.54° (highest peak) for the left half and 0.28° (second highest peak) for the right half of the image. The components in gray are those filtered out by the size filter or the aspect-ratio filter, while the components in black are those grouped by the grouping function. The edges and vertices of their convex hulls are drawn in gray.

Fig. 14 is the sample J00B that is labeled in Fig. 10. The ground truth is −0.48°, and the detected skew angles are 0.95° for the left page and −0.52° for the right page. The parameter k increases from the initial 16 to the final 35. By visual inspection, the ground truth represents the dominant right page, while the most significant detected skew angle is for the left page, which contains more groups.

Fig. 15 is the sample N042 that is labeled in Fig. 10.
The ground truth is 0.79°, and the detected skew angle is 0.00°. This example reveals one of the intrinsic limitations of angle measurement on imaging grids: the inadequacy of angular resolution at short distances. The angular resolution can be estimated as the arctangent of the reciprocal of the distance in pixels: 0.57° at 100 pixels, 0.11° at 500 pixels, 0.06° at 1000 pixels, and 0.01° at 5730 pixels. The longer the distance, the finer the angular resolution. Because the edges that are parallel or perpendicular to one another in this sample are short, the detected angle becomes 0.00°.

Fig. 12. The sample A03I from UW-I (labeled in Fig. 10). The ground truth is −0.65°, and the detected skew angle is 1.06°.

Fig. 16 is the sample H04I from UW-I. The ground truth is −0.10°, and the detected skew angle is −0.19°. In this sample, the graphical components far outnumber the textual ones, yet the grouping method and the convex hull based skew estimation method still function correctly. This sample, together with the sample I047 in Fig. 17, shows that the proposed method is rather robust in the presence of excessive noise.

Fig. 18 is the sample A06M from UW-I. The ground truth is −3.00°, and the detected skew angle is −2.75°. There is apparent warping along the spine of the original document. However, this does not pose a serious problem for detecting the skew angle correctly, because even though the warping changes the shapes and edges of the groups, the left edges (no significant warping) and the bottom edges (against the warping direction) are much less affected, if affected at all.

Fig. 13. The sample A05G from UW-I (labeled in Fig. 10). The ground truth is −2.12°, and the detected skew angle is 0.14°. There is no hint of −2.12° by visual inspection.

Fig. 14. The sample J00B from UW-I (labeled in Fig. 10). The ground truth is −0.48°, and the detected skew angle is 0.95° for the left page and −0.52° for the right page. The most prominent peak is for the left page, which has more groups. The value of the parameter k has been increased automatically from 16 to 35.

5. Comparisons and conclusions

In this paper, we compare our results with two published works that use the same UW-I database: Chen et al. [20] (the creators of the UW-I database) and Bloomberg et al. [10]. Chen's method is based on a recursive morphological transform on a down-sampled image, with a regression method for parameter fitting. Bloomberg's method is projection-profile based and counts pixels along varying scan lines. Since neither team provided numerical results in their papers, the comparisons are made from values read off the charts in their published papers. All three participants (Chen, Bloomberg and us) use the full set of 979 samples from UW-I against the provided ground truth.

Table 1 shows the accumulated percentage of samples versus the absolute detection error, defined as the absolute difference between the detected skew angle and the given ground truth, for the participating methods. As shown in Table 1, within 0.1° of absolute error, the best performer is Chen's method (manual mode, 2 × 3 structuring element) at 86%, followed by the convex hull based method at 61%, Chen's method in automatic mode at 55%, and Bloomberg's method (quarter-sized) at 44%. All the participants are able to detect about 99% of the samples within 0.5° with their best parameter settings. Fig. 19 shows the best performances of the participants, and the numerical values are given in Table 1.

There are some concerns about the results of Chen's method.
Considering the difficulty of the real document samples in UW-I, the results of Chen's method are exceptional: 86% of the samples are detected within 0.1° of absolute error with the manually tuned optimal parameters. However, as we understand from their original paper [20], their machine-learning algorithm uses the same set of samples for both training and testing. The appropriate procedure is to divide the whole sample set into two parts, one for training and the other for testing. Even though they created new samples by rotating the original samples by known angles, this does not introduce data independence. Therefore, it is not clear how well their method works on images other than those of UW-I, on which their parameters were tuned.

Fig. 15. The sample N042 from UW-I (labeled in Fig. 10). The ground truth is 0.79°, and the detected skew angle is 0.0°. This sample reveals the limitation of angular resolution at short distances, which applies to any skew estimation method.

There is another measurement, the regression analysis, on the results of the second test. It is provided only for our method; no such analysis is available for the other two parties. This measurement matters because the test images all have very small skew angles (< 3°), so a method needs to show that the results it produces are not merely random small values. Furthermore, the regression analysis chart reveals all the outliers with large errors, which cannot be seen from the accumulated percentage chart provided above.

Fig. 16. The sample H04I from UW-I. The ground truth is −0.10°, and the detected skew angle is −0.19°. This is one of the samples that demonstrate the robustness of the proposed skew estimation method in the presence of excessive noise.

We report both the accumulated percentage of samples and the regression analysis so that our method can be scrutinized in detail. Based on the available experimental results in Table 1, and given the concerns about the training methodology of Chen's method, we conclude that our convex hull based method has the best performance in terms of detection accuracy and reliability. Bloomberg's method at 2× resolution reduction performs comparably at the larger error thresholds (0.3° and above). As for Chen's method, if its authors can address our concerns about their training methodology and confirm the results shown in their papers, its performance with the manually tuned optimal parameter setting is hard to surpass; this is not the case for its automatic mode.

Fig. 17. The sample I047 from UW-I. The ground truth is 0.50°, and the detected skew angle is 0.25°. This is one of the samples that demonstrate the robustness and versatility of the convex hull based model in selecting hints for skew estimation.

Fig. 18. The sample A06M from UW-I. The ground truth is −3.00°, and the detected skew angle is −2.75°. The warping along the spine of the original document does not impede the correct detection of the skew angle.

Table 1
Performance comparison using the 979 real document images in UW-I: accumulated percentage of samples whose absolute error does not exceed each threshold. Rows marked * are the best performances from Chen, Bloomberg and ours.

                              Absolute error
Method                        0.0°   0.1°   0.2°   0.3°   0.4°   0.5°
Chen (2 × 3, manual) *        –      86%    93%    97%    99%    99%
Chen (2 × 2, manual)          –      75%    88%    93%    95%    97%
Chen (2 × 3, auto) *          –      55%    78%    89%    93%    95%
Chen (2 × 2, auto)            –      24%    43%    61%    74%    83%
Bloomberg (2× reduction) *    –      44%    75%    93%    98%    99%
Bloomberg (8× reduction)      –      37%    64%    82%    90%    95%
Convex hull based *           4%     61%    86%    95%    98%    99%

Sources: Figs. 1 and 3 in Ref. [20]; Figs. 3 and 6 in Ref. [10]. Chart digitization uncertainty: ±0.5%.

Fig. 19. The 3-party performance evaluation using the test suite UW-I (accumulated percentage of samples versus absolute error in degrees, for Chen (2×3, manual), Chen (2×3, auto), Bloomberg (2× reduction) and the convex hull based method). See Table 1 for the numerical values, the original sources, and the uncertainty of the data acquisition process.

It is also helpful to compare the convex hull based skew estimation method in this paper with the fiducial line based skew estimation method in our previous paper [28]. Both are general-purpose skew estimation methods for textual documents. The major differences between the two are:
• Principles: The method in Ref. [28] is based on the alignment of individual text lines, while the method in this paper is based on the layout of text blocks.
• Accuracy: The two methods have comparable accuracy, with slightly better results from the method in this paper. The difference is mainly due to some of the "difficult" samples in UW-I.
• Efficiency: The two methods have comparable execution speed, with the method in this paper slightly faster.
• Excessive noise: Both can work on images with excessive noise, but in different ways. The method in Ref. [28] depends on the alignment of text lines and the misalignment of noise, while the method in this paper depends on the separation of noise from text by component grouping.
• Multiple skews: Both can detect the existence of multiple skews in an image, but only the method in this paper can locate the regions with different skews.
• Cross-column correlation: Some documents have two or more columns in which the text baselines are not aligned (collinear), or in which different font sizes are used across the columns. The method in Ref. [28] may fail on this type of document, especially when mono-spaced fonts (Courier, etc.) are used.
The method in this paper has no difficulty with this type of document because the columns are separated by the component grouping process.
• Component touching: This can be a serious problem in documents that are scanned at low resolution or whose original printing quality is poor. When component touching is severe, the method in Ref. [28] can still work in its background mode, but the method in this paper may produce grouping errors.

The benefits of using convex hulls can be extended to other applications. For instance, if the fiducial line based skew estimation method in Ref. [28] uses the centroids of the convex hulls rather than the centroids of the connected components, there are observable improvements in both the accumulated percentage of samples and the regression analysis on the same set of UW-I samples, and the overall execution speed improves as well.

The proposed skew estimation method is highly competitive in execution speed. It takes 1617 s to process the 979 real images (2592 × 3300 pixels each) in UW-I (about 1.7 s per sample), excluding image I/O, on the Java 5 platform running on a 2.3 GHz Pentium IV personal computer.

References
[1] G. Nagy, Twenty years of document image analysis in PAMI, IEEE Trans. Pattern Anal. Mach. Intell. 22 (1) (2000) 38–62.
[2] L. O’Gorman, R. Kasturi, Document Image Analysis, IEEE Computer Society Press, Los Alamitos, CA, 1995.
[3] R. Cattoni, T. Coianiz, S. Messelodi, C.M. Modena, Geometric layout analysis techniques for document image understanding: a review, ITC-IRST Technical Report #9703-09, 1998.
[4] J.J. Hull, in: J.J. Hull, S.L. Taylor (Eds.), Document Analysis Systems II, World Scientific, Singapore, 1998, pp. 40–64.
[5] A.D. Bagdanov, Evaluation of document image skew estimation techniques, Proc. SPIE 2660 (1996) 343–353.
[6] W. Postl, Detection of linear oblique structures and skew scan in digitized documents, in: Proceedings of the Eighth International Conference on Pattern Recognition, Paris, October 1986, pp. 687–689.
[7] H.S. Baird, The skew angle of printed documents, in: Proceedings of SPSE 40th Annual Conference and Symposium on Hybrid Imaging Systems, Rochester, NY, May 1987, pp. 21–24.
[8] Y. Nakano, Y. Shima, H. Fujisawa, J. Higashino, M. Fujinawa, An algorithm for skew normalization of document images, in: Proceedings of the 10th International Conference on Pattern Recognition, Atlantic City, NJ, 1990, pp. 8–13.
[9] A.L. Spitz, Skew determination in CCITT group 4 compressed images, in: Proceedings of the First Annual Symposium on Document Analysis and Information Retrieval, Las Vegas, 16–18 March 1992, pp. 11–25.
[10] D.S. Bloomberg, G.E. Kopec, L. Dasari, Measuring document image skew and orientation, in: Document Recognition II, Proceedings of SPIE, vol. 2422, San Jose, CA, 6–7 February 1995, pp. 302–316.
[11] N. Liolios, N. Fakotakis, G. Kokkinakis, On the generalization of the form identification and skew detection problem, Pattern Recogn. 35 (1) (2002) 253–264.
[12] S.N. Srihari, V. Govindaraju, Analysis of textual images using the Hough transform, Mach. Vision Appl. 2 (3) (1989) 141–153.
[13] S. Hinds, J. Fisher, D. D’Amato, A document skew detection method using run-length encoding and the Hough transform, in: Proceedings of the 10th International Conference on Pattern Recognition, Atlantic City, NJ, 17–21 June 1990, pp. 464–468.
[14] D.S. Le, G.R. Thoma, H. Wechsler, Automated page orientation and skew angle detection for binary document images, Pattern Recogn. 27 (10) (1994) 1325–1344.
[15] B. Yu, A.K. Jain, A robust and fast detection algorithm for generic documents, Pattern Recogn. 29 (10) (1996) 1599–1629.
[16] U. Pal, B.B. Chaudhuri, An improved document skew angle estimation technique, Pattern Recogn. Lett. 17 (8) (1996) 899–904.
[17] A. Hashizume, P.S. Yeh, A. Rosenfeld, A method of detecting the orientation of aligned components, Pattern Recogn. Lett. 4 (1986) 125–132.
[18] L. O’Gorman, The document spectrum for page layout analysis, IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1162–1173.
[19] R. Smith, A simple and efficient skew detection algorithm via text row accumulation, in: Proceedings of the Third International Conference on Document Analysis and Recognition, Montreal, Canada, August 1995, pp. 1145–1148.
[20] S. Chen, R.M. Haralick, An automatic algorithm for text skew estimation in document images using recursive morphological transforms, in: Proceedings of the IEEE International Conference on Image Processing, Austin, TX, 13–16 November 1994, pp. 139–143.
[21] B. Gatos, N. Papamarkos, C. Chamzas, Skew detection and text line position determination in digitized documents, Pattern Recogn. 30 (9) (1997) 1505–1519.
[22] S. Lu, B.M. Chen, C.C. Ko, Document image rectification using fuzzy sets and morphological operators, in: Proceedings of the IEEE International Conference on Image Processing, Singapore, 24–27 October 2004, pp. 2877–2880.
[23] J. Sauvola, M. Pietikäinen, Skew angle detection using texture direction analysis, in: Proceedings of the Ninth Scandinavian Conference on Image Analysis, Uppsala, Sweden, June 1995, pp. 1099–1106.
[24] C. Sun, D. Si, Skew and slant correction for document images using gradient direction, in: Proceedings of the Fourth International Conference on Document Analysis and Recognition, Ulm, Germany, 18–20 August 1997, pp. 142–146.
[25] H.K. Aghajan, T. Kailath, SLIDE: subspace-based line detection, IEEE Trans. Pattern Anal. Mach. Intell. 16 (11) (1994) 1057–1073.
[26] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, Skew angle estimation in document processing using Cohen’s class distributions, Pattern Recogn. Lett. 2 (1999) 1305–1311.
[27] B. Yuan, C.L. Tan, A multi-level component grouping algorithm and its applications, in: Proceedings of the Eighth International Conference on Document Analysis and Recognition, Seoul, Korea, 29 August–1 September 2005, pp. 1178–1181.
[28] B. Yuan, C.L. Tan, Fiducial line based skew estimation, Pattern Recogn. 38 (12) (2005) 2333–2350.

About the Author: B. YUAN received the B.Sc. and M.Sc. degrees in Nuclear Physics in 1985 and 1988 from Peking University, China. He received his M.Sc. and Ph.D. degrees in Computer Science in 2000 and 2006 from the National University of Singapore. His current research interests include automatic and semi-automatic extraction of man-made objects from satellite images. He is currently a research scientist in the Centre for Remote Imaging, Sensing and Processing (CRISP), National University of Singapore.

About the Author: C.L. TAN received the B.Sc. (Hons.) degree in Physics in 1971 from the University of Singapore, the M.Sc. degree in Radiation Studies in 1973 from the University of Surrey, UK, and the Ph.D. degree in Computer Science in 1986 from the University of Virginia, USA. His research interests include document image and text processing, neural networks and genetic programming. He has published more than 200 research publications in these areas. He is an associate editor of Pattern Recognition. He has served on the program committees of many international conferences and workshops, including the International Conference on Document Analysis and Recognition (ICDAR) 2005, the International Workshop on Graphics Recognition (GREC) 2005, and the International Conference on Pattern Recognition (ICPR) 2006. He is currently an Associate Professor in the Department of Computer Science, School of Computing, National University of Singapore.