Convex hull based skew estimation

ARTICLE IN PRESS
Pattern Recognition
www.elsevier.com/locate/patcog
Convex hull based skew estimation
Bo Yuan a,∗, Chew Lim Tan b
a Centre for Remote Imaging, Sensing and Processing, National University of Singapore, Singapore
b Department of Computer Science, School of Computing, National University of Singapore, Singapore
Received 16 August 2005; received in revised form 27 January 2006; accepted 1 February 2006
Abstract
Skew estimation and page segmentation are two closely related processing stages in document image analysis. Skew estimation
needs proper page segmentation, especially for document images with multiple skews that are common in scanned images from thick
bound publications in 2-up style or postal envelopes with various printed labels. Even if only a single skew is concerned for a document
image, the presence of minority regions of different skews or undefined skew such as noise may severely affect the estimation for the
dominant skew. Page segmentation, on the other hand, may need to know the exact skew angle of a page in order to work properly. This
paper presents a skew estimation method with built-in skew-independent segmentation functionality that is capable of handling document
images with multiple regions of different skews. It is based on the convex hulls of the individual components (i.e. the smallest convex
polygon that fully contains a component) and those of the component groups (i.e. the smallest convex polygon that fully contains all the
components in a group) in a document image. The proposed method first extracts the convex hulls of the components, then segments the image
into groups of components according to both the spatial distances and size similarities among the convex hulls of the components. This
process not only extracts hints of the alignments of the text groups, but also separates noise and graphical components from the
textual ones. To verify the proposed algorithms, the full sets of the real and the synthetic samples of the University of Washington English
Document Image Database I (UW-I) are used. Quantitative and qualitative comparisons with some existing methods are also provided.
© 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Document processing; Skew estimation; Component grouping; Page segmentation; Convex hulls; Segregation effect; UW-I
1. Introduction
Printed documents are customarily rectangular. Ideally,
text lines in documents are horizontal or vertical relative to
the edges of the pages. Due to the imprecision or difficulty
in the placement of the original documents in the scanning
process, the captured edges of the documents may not always align with the edges of the images. This amount of
misalignment is usually referred to as the skew angle of an
image. Skew estimation is one of the important processing
steps in document image understanding. There are some in-depth reviews [1–4] and comparative evaluations [5] available for the large array of techniques that have been developed in the research literature [6–26].
∗ Corresponding author. Tel.: +65 65165389.
E-mail address: [email protected] (B. Yuan).
There are various hints of skew in a textual document
image. The most explored reference of orientation is the
straight text lines. To approximate these text lines, various
strategies are deployed, among which the most popular are
the projection-profile based [6–11], the Hough-transform
based [12–16], the nearest-neighborhood based [17–19], the
morphological operation based [20–22], and the spatial frequency based [23–26]. Different skew estimation methods
compete on the ground of detection accuracy, time and
space efficiencies, abilities to detect the existence of multiple skews in the same image, and robustness in noisy environments and scan-introduced distortions.
A typical projection-profile based skew estimation method
uses a single point, called fiducial point, to represent each
component in an image. The set of fiducial points is projected onto a 1-D accumulator array along an angle and a
chosen premium function is evaluated on the accumulator
0031-3203/$30.00 © 2006 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2006.02.016
B. Yuan, C.L. Tan / Pattern Recognition
Fig. 1. The convex hulls of the components with their vertices and centroids marked. This is a clip of the sample A00O from UW-I.
array. If the projection is successively rotated in a range, a
series of premium profiles are obtained. The premium function should reach the extremes when the projection is along
the text lines. The detection speed can be accelerated in
two rounds, either from coarse to fine rotation or from subsampled to full resolution.
A typical Hough-transform based skew estimation method
selects a set of fiducial points {x, y} to represent the components and then maps them to the parameter space (Hough
space) with certain parameterization. If the normal parameterization is used (ρ = x cos θ + y sin θ), a single fiducial
point (x, y) is mapped to a sinusoidal curve in the quantized ρ–θ parameter space by scanning the whole range
of the parameter θ. If the mapped curves are accumulated in
the 2-D parameter space, the global maxima {ρ_max, θ_max}
correspond to the prominent text line orientations of
the image.
A typical nearest-neighborhood based skew estimation
method exploits spatial proximity to establish groups of components that are supposed to belong to a text line. The positions of grouped components are then used to approximate
the orientation of the text line. Since the skew estimation
process is based on local groups, the precision of the estimated skew angle is usually not as high as that of methods that work
over longer distances on a global scale.
A typical morphological operation based skew estimation method uses pixel level morphological operations to
group or erase neighboring foreground pixels in order to
form stripes that represent the text lines. Subsequent line fitting is used to find the elongation of the stripes in order to
estimate the major orientations of the stripes and thus those of the
text lines.
A typical spatial frequency based skew estimation method
treats the text lines in a textual document image as textures
or patterns. The Fourier transform or other waveforms are
used to reveal such global trend from the frequency domain.
This class of methods usually depends on the availability of
dominant text lines.
In principle, the above-mentioned classes of skew estimation methods can detect the existence of multiple skews in a
document image. However, they cannot give locations of the
skews without the help of page segmentation methods. Even
with the help of page segmentation methods, they work the
same way in the segments as they do on the non-segmented
page. In other words, they are designed on principles of single skew estimation.
This paper presents a multi-skew estimation method that
detects the component groups in a document image and estimates the skews of the detected groups. It is based on the
convex hulls of the components and their groups. The convex hull of a component, as shown in Fig. 1, is the smallest
convex polygon that contains all the points in the component. The convex hull of a group of components, as shown
in Fig. 2, is the smallest convex polygon that contains all
the components in the group. Unlike other methods that
rely on the hints of text lines, this method explores the hints
of skew mainly from the alignment of text blocks. Texts are
intentionally aligned with the edges of their blocks. Every
text block, be it a paragraph or a column, forms one or more
straight edges. These alignments are generally termed left
Fig. 2. The convex hulls of the component groups with their vertices marked for the same clip in Fig. 1. The direct links from component to component
in the same group are drawn to illustrate the actual grouping process. This initial grouping result will be consolidated by the containment and intersection
criteria on the convex hulls of all the groups.
aligned, right aligned or adjusted if the text in a block forms
a straight edge at the left side, or right side or both sides of
the block. Text lines are usually straight. In order to obtain
text blocks, a component grouping function is used to separate the components into groups by their relative distances
and sizes. Then, the convex hulls of the component groups
are extracted, of which the slopes of the edges are used to
estimate the dominant skew angle of the group. This method
is robust, even for documents that contain a large amount
of graphical components or noise. This noise immunity
comes from two properties of the proposed method: (1) the
grouping function can separate components of distinctively
different sizes and shapes; and (2) the edges of the convex hull of a component group can indicate whether the
contents of the group are well aligned or not. The proposed
method is general-purpose, full-ranged (±45°, assuming
the pages are rectangular), automatic in parameter setting,
and highly competitive in detection accuracy and execution
speed.
2. Component grouping
Component grouping extracts the text blocks, from which
skew can be estimated, and by which interferences from outside the text blocks are minimized. The extracted connected
components of an image are first grouped by a component
grouping function, which is based on the spatial distances
and size similarities among the components. The geometry
of the extracted groups is then analyzed to derive hints for
skew estimation.
2.1. The proposed grouping function
Given a component c1 of area s1 and another component
c2 of area s2, if the Euclidean distance between the centroids
of c1 and c2 is less than or equal to the grouping function
in Eq. (1), the two components are said to have a direct
link. A component group is a set of components among
which there always exists at least one path of direct links
for any two components. The k in Eq. (1) is a scalar parameter that can be adjusted for establishing direct links among
components.
$$f(s_1, s_2) = \frac{k\, s_1 s_2}{s_1 + s_2}. \qquad (1)$$
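The direct-link rule and the resulting partition into groups can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference implementation of Fig. 3: it uses a union-find structure, the names `grouping_threshold` and `group_components` and the `(x, y, area)` tuple layout are hypothetical, and the threshold follows Eq. (1) as printed in this text. The threshold is passed in as a parameter so that a different form can be substituted.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical layout: (centroid_x, centroid_y, area) of a component's convex hull.
Component = Tuple[float, float, float]


def grouping_threshold(s1: float, s2: float, k: float) -> float:
    """Grouping function of Eq. (1) as printed here (assumes s1 + s2 > 0)."""
    return k * s1 * s2 / (s1 + s2)


def group_components(
    components: List[Component],
    k: float,
    threshold: Callable[[float, float, float], float] = grouping_threshold,
) -> List[List[int]]:
    """Partition component indices into groups connected by direct links."""
    n = len(components)
    parent = list(range(n))

    def find(i: int) -> int:
        # Find the group representative, halving the path on the way up.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a in range(n):
        xa, ya, sa = components[a]
        for b in range(a + 1, n):
            ra, rb = find(a), find(b)
            if ra == rb:
                continue  # already in the same group via another path
            xb, yb, sb = components[b]
            # Direct link: centroid distance <= grouping function.
            if math.hypot(xa - xb, ya - yb) <= threshold(sa, sb, k):
                parent[rb] = ra

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Skipping pairs that already share a representative mirrors the observation that components linked through an intermediate component need not be tested against each other.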
The grouping function in Eq. (1) has several desirable
properties:
• It is a distance measure;
• It is symmetric for c1 and c2 ;
• It is rotation invariant, if the aliasing effect is discounted;
• It is magnification or resolution invariant, if the aliasing
effect is discounted;
• There are several options for choosing the areas of components in Eq. (1), such as the total number of pixels [27], the area of the convex hull, or the area of the upright bounding box of the component;
• It favors close components with similar sizes. The textual components are less likely to be grouped with the graphical elements due to the significant difference between their sizes. Therefore, the proposed method is highly noise resistant.
Fig. 3 provides a reference implementation of the component-grouping algorithm using Eq. (1). Its time complexity in the worst case is O(n²) when all components are isolated, where n is the total number of components in the image. In practice, the time complexity is much better than this because once a component A is in the same group as a component B and the component B is in the same group as a component C, by definition the components A and C belong to the same group, so there is no need to test the two. The reference implementation guarantees that for any two components in a group, only one path of direct links is established in the grouping process, which is shown as acyclic trees in Fig. 2. The reference implementation also guarantees that the grouping result is independent of the seed component from which the grouping starts. In the end, the result of the initial grouping is merged by the containment and intersection criteria on the convex hulls of the component groups.
Fig. 3. A reference implementation of the components grouping algorithm in pseudo code. In principle, this is a partition algorithm with a binary predicate.
2.2. The advantages of using convex hulls
Textual components in the same group usually have similar areas. For instance, characters in a paragraph are usually of the same size. This is part of the rationale for involving areas of components in Eq. (1). There are several
choices of representing areas of components, such as the
connected components, the upright bounding boxes of the
connected components, and the best-fit bounding boxes of
the connected components. We introduce here yet another
choice—the convex hulls of the connected components. This
choice has several advantages over the other choices. The areas of convex hulls have much closer values than those of the
commonly used connected components. Furthermore, the areas of convex hulls remain constant at any skew angle due
to the rotation-invariant property, unlike those of the popular upright
bounding boxes of the connected components, whose values
change with the skew angle. This can be easily observed from
Fig. 1.
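The rotation-invariance argument can be checked numerically. The sketch below is illustrative only: the hull is built with Andrew's monotone chain (the text does not say which hull algorithm is used), and the helper names and the L-shaped test component are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly: List[Point]) -> float:
    """Shoelace formula for a simple polygon."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return 0.5 * abs(sum(x0 * y1 - x1 * y0 for (x0, y0), (x1, y1) in pairs))


def bbox_area(points: List[Point]) -> float:
    """Area of the upright (axis-aligned) bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def rotate(points: List[Point], degrees: float) -> List[Point]:
    """Rotate points about the origin by the given angle."""
    c, s = math.cos(math.radians(degrees)), math.sin(math.radians(degrees))
    return [(x * c - y * s, x * s + y * c) for x, y in points]
```

For an L-shaped outline such as `[(0, 0), (4, 0), (4, 1), (1, 1), (1, 3), (0, 3)]`, the hull area is the same before and after `rotate(..., 30.0)`, while the upright bounding-box area changes.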
In Eq. (1), the distances among components are measured
from their fiducial points, which in our case are the centroids of the convex hulls of the components. There are other
choices of fiducial points, such as the bottom centers of the
upright bounding-boxes of the components [7], or the horizontal tangent points of the bottoms of the components [9].
However, these two choices are limited to images with only
small skew angles. On the contrary, the centroids of convex hulls are rotation-invariant, thus are valid at any skew
angles. Finally, the comparison between connected components and their convex hulls is also in the latter’s favor. This
is because the areas and shapes of the convex hulls of components vary significantly less than those of the components themselves, and so
do their centroids. This can also be observed from Fig. 1. A
quantitative comparison is given in Fig. 4.
Fig. 4 shows the area distributions of the components
(top) and their convex hulls (bottom) for the sample A00O
(in the background) from UW-I. Compared to using the
components, the distribution using the convex hulls shows
an apparent “segregation effect”, which is manifested by the
wider and deeper valleys in the area distribution. These valleys show that the areas of the convex hulls of the components have a more clustered distribution than the sizes of
the components. This is because the individual components
may have very different sizes, but their convex hulls may
not. This is obvious when comparing Figs. 1 and 2, especially the last line. The segregation effect is more apparent
in the Chinese text because in Chinese and many East Asian
languages the characters are rectangular as Fig. 5 demonstrates. This is especially so in images of low resolutions
Fig. 4. The area distributions of the components (top) and their convex hulls (bottom) of the sample A00O from UW-I. The background is the original
image represented by its components (top) and convex hulls (bottom) of the corresponding half.
when the strokes of a character touch one another to form
a single component. The segregation effect also provides
a better condition for selecting thresholds for component
filters.
2.3. The choice of the parameter k
For a group with n_g components, its density ρ_g is given by Eq. (2) as the ratio of the sum of the areas of the convex hulls of the components s_i in the group to the area of the convex hull of the group s_g.
$$\rho_g = \frac{\sum_{i=0}^{n_g-1} s_i}{s_g}. \qquad (2)$$
For an image with N component groups, the weighted density ρ_wtd is given by Eq. (3) as the sum of the densities of the groups ρ_g weighted by their areas s_g.
$$\rho_{wtd} = \frac{\sum_{g=0}^{N-1} \rho_g s_g}{\sum_{g=0}^{N-1} s_g} = \frac{\sum_{g=0}^{N-1} \sum_{i=0}^{n_g-1} s_i}{\sum_{g=0}^{N-1} s_g}. \qquad (3)$$
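Eqs. (2) and (3) translate directly into a few lines of code. In this minimal sketch the function names and the `(component_hull_areas, group_hull_area)` pair layout are hypothetical; the areas are assumed to have been computed beforehand.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (areas of the component hulls in a group, area of the group hull).
Group = Tuple[Sequence[float], float]


def group_density(component_hull_areas: Sequence[float], group_hull_area: float) -> float:
    """Eq. (2): summed component hull areas over the group hull area."""
    return sum(component_hull_areas) / group_hull_area


def weighted_density(groups: List[Group]) -> float:
    """Eq. (3): group densities weighted by the group hull areas."""
    numerator = sum(group_density(areas, sg) * sg for areas, sg in groups)
    denominator = sum(sg for _, sg in groups)
    return numerator / denominator
```

When every component is its own group, each group hull equals its component hull and the weighted density is 1.0, which matches the k = 0 case discussed in the text.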
Fig. 6 shows how different values of the parameter k
produce different component grouping results. When k is
0, all the components are isolated, thus the weighted density is 1.0. With the increase of k, the components evolve
from smaller groups progressively to larger groups, which
can be conveniently represented via the following symbolic representations: characters → words → text lines →
Fig. 5. The area distributions of the components (top) and their convex hulls (bottom) of a Chinese newspaper clip. The background is the original
image represented by its components (top) and convex hulls (bottom) of the corresponding half.
paragraphs → columns → a single group. The weighted density decreases with the increase of k because the convex hulls
of the merged groups include more and more spaces—the
spaces in between components, text lines, paragraphs
and columns.
Fig. 6 can be used to determine the value of k automatically for any document image, since all documents have
weighted-density-versus-k curves with similar characteristics, regardless of the fonts, sizes, or line spacing used.
The purpose is to find the smallest k at which the formation of paragraphs stabilizes. Given a document image,
start with k at an initial value of 35 and decrement it; at each step, calculate
the change in the weighted density and test whether this change
exceeds a threshold value, say 0.05. Note that in grouping
with descending k, all subsequent regroupings are in fact splits of existing groups. Therefore, detection is only needed within the current
groups of components; no cross-group detection is needed,
resulting in a much faster detection speed than detection
with ascending k from a small value.
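One possible reading of this search is sketched below. It rests on several assumptions that the text does not state: k is decremented in unit steps, the change is measured between consecutive k values, and the value returned is the last k before the jump. The function name `select_k` and the callable `weighted_density_at`, which stands in for regrouping at a given k and evaluating Eq. (3), are hypothetical.

```python
from typing import Callable


def select_k(
    weighted_density_at: Callable[[int], float],
    k_start: int = 35,
    k_min: int = 1,
    threshold: float = 0.05,
) -> int:
    """Return the smallest k before the weighted density jumps by more than threshold."""
    previous = weighted_density_at(k_start)
    for k in range(k_start - 1, k_min - 1, -1):
        current = weighted_density_at(k)
        if abs(current - previous) > threshold:
            # Groups started to split at k, so k + 1 is the last stable value.
            return k + 1
        previous = current
    return k_min
```

Because the groups can only split as k decreases, a real implementation could regroup within each current group instead of over the whole image, as the text notes.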
To validate the above-mentioned k selection scheme, all
the 979 images scanned from real publications and the 168
samples synthesized from LaTeX documents in UW-I are
used. The top chart in Fig. 7 is the result of using the components directly, while the bottom chart is that of using the
convex hulls of the components. The curve of using the convex hulls shows sharp peaks that are almost the same for both
the real and the synthesized images. In contrast, the curve
of using the components directly shows diffused peaks that
ARTICLE IN PRESS
B. Yuan, C.L. Tan / Pattern Recognition
(
)
–
7
Fig. 6. Various components grouping stages for the sample A00O from UW-I: (foreground) the weighted density of the groups versus the k value;
(background) the component groups and their convex hulls at k = 6, 12 and 20.
differ significantly for the real and the synthesized images.
This is further evidence in favor of the convex hulls
of the components over the components themselves. These
results also suggest that if the convex hulls of the components are used, a single k value (for example,
16) is sufficient for all the sample images in UW-I. In fact, for any batch of images scanned from the same source,
such as journals or newspapers from a period in which the typesetting style remains consistent, one k value is sufficient. Note
that this component grouping mechanism is independent of
the image scanner’s resolution, provided the resolution is not so low that the
pixel aliasing and component touching problems become
severe.
2.4. The k adjustment loop
The purpose of adding a k-adjustment loop in the component grouping process, which is shown in the flowchart of Fig. 8, is to compensate for possible under-/over-grouping. Although a single k value is effective for most documents with normal typesetting, there are always exceptions. For instance, the line spacing in certain documents can be larger than double, or two nearby paragraphs may accidentally merge because of the noise in between them. In such cases, k needs to be dynamically adjusted from its initial value.
The adjustment of k is based on the same principle by which the initial k value is selected using the weighted-density-versus-k curve in Fig. 6. Based on our estimates, the weighted group density is between 20% and 50% for normal documents. The proposed skew estimation method works well in this range of grouping. If the weighted group density falls outside this range, the k value needs to be increased (above this range) or decreased (below this range). This k adjustment loop terminates when the weighted density stabilizes within the range, or when k reaches a preset upper limit (40 in our case).
The k adjustment loop causes regrouping of components and thus adds computational cost, especially when multiple rounds are involved. However, if a proper initial k value is used, the k adjustment loop is rarely, if at all, triggered for normally typeset documents.
3. Skew estimation
The orientation of a document can be estimated from the
layout of its text. Properly typeset text has straight
baselines as well as paragraph alignments (left, right or
adjusted). By effectively grouping the textual components
in document images, the alignment along the edges of the
groups closely resembles the layout of the original documents. The convex hulls of the component groups are a good
choice for revealing the edge alignments and directions.
Fig. 8 shows the processing stages of the convex hull based
skew estimation method.
3.1. The edge slope histogram
Among the edges of the convex hull of a component
group, only the long edges that have parallel or perpendicular
ARTICLE IN PRESS
[Fig. 7 plots: number of samples (0 to 300) versus k (0 to 35) for the Latex-168 and Real-979 sample sets; top panel using the components, bottom panel using the convex hulls.]
Fig. 7. The frequency distribution of the smallest k at which the formation of the paragraphs stabilizes for the 979 real images and the 168 synthetic
images from UW-I. Using the convex hulls of the components (bottom) is superior to using the components directly (top).
counterparts in the same convex hull are significant in
revealing the rectangular shape of the text paragraph or
column that this group represents. This detection principle is integrated into the design of the mapping from
the edge slopes of the convex hull to the bins in the
histogram.
The slope histogram of the edges of the convex hulls has
Nbin = 9000 bins representing tangents in [−1, 1), which
corresponds to skew angles in [−45°, 45°) with an angular
resolution of 0.01°/bin on average.
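Suppose (xi, yi) and (xi+1, yi+1) are the two vertices of an edge of a convex hull in floating point;
the slope of the edge is quantized to the bin whose
index is
$$I_{bin} = \begin{cases} \dfrac{N_{bin}}{2}\left(1 + \dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}\right), & \left|\dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}\right| \le 1, \\[2ex] \dfrac{N_{bin}}{2}\left(1 - \dfrac{x_{i+1} - x_i}{y_{i+1} - y_i}\right), & \left|\dfrac{y_{i+1} - y_i}{x_{i+1} - x_i}\right| > 1. \end{cases} \qquad (4)$$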
It is assumed that paragraphs and columns in a textual
document are rectangular. If one side is outside the range of
[−45°, 45°), the other must be inside. The significance of
Eq. (4) is that parallel edges will fall into the same bin in the
histogram, and so will perpendicular ones. The histogram
will have prominent peaks in these directions.
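The mapping of Eq. (4) can be sketched as below. Two details are assumptions of this sketch rather than statements from the text: the real-valued result is floored to obtain an integer index, and the index is wrapped modulo Nbin so that a slope of exactly 1 lands in bin 0. The function name `slope_bin` is hypothetical.

```python
import math

N_BIN = 9000  # number of histogram bins, as given in the text


def slope_bin(x0: float, y0: float, x1: float, y1: float, n_bin: int = N_BIN) -> int:
    """Map the edge (x0, y0)-(x1, y1) to a slope-histogram bin following Eq. (4)."""
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        raise ValueError("degenerate edge: the two vertices coincide")
    if abs(dy) <= abs(dx):
        # |slope| <= 1: use the tangent of the edge itself.
        value = 0.5 * n_bin * (1.0 + dy / dx)
    else:
        # |slope| > 1: use the tangent of the perpendicular direction.
        value = 0.5 * n_bin * (1.0 - dx / dy)
    # Floor and wrap-around are assumptions of this sketch.
    return int(math.floor(value)) % n_bin
```

With this mapping a horizontal edge and a vertical edge share the middle bin, and an edge of slope 0.25 shares its bin with any edge perpendicular to it, which is the property that makes rectangular blocks produce a single peak.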
[Fig. 8 flowchart. Stages shown: Input image; Connected-component analysis; Band-pass components filtering; Convex hull detection for each component; Component grouping; Convex hull detection for each group; Grouping acceptable? (No: k adjustment and regrouping, i.e. the k adjustment loop; Yes: proceed); Edge slope accumulation for each group; Histogram convolution; Peak evaluation for each group; Skew(s).]
Fig. 8. The flowchart of the convex hull based skew estimation model. The k adjustment loop compensates for possible under-/over-grouping. If a proper initial k value is used, the k adjustment loop is rarely, if at all, triggered for normally typeset documents.
In order to give more weight to long edges, the amount of increment in the bin Ibin is set to the length of the edge:
$$inc(I_{bin}) = \sqrt{(x_{i+1} - x_i)^2 + (y_{i+1} - y_i)^2}. \qquad (5)$$
The slope histogram is obtained by computing the slopes of all the edges of the convex hulls of the component groups and quantizing them to one of the bins in the histogram by Eqs. (4) and (5).
3.2. The search for peaks
For each component group, the accumulated slope histogram usually contains multiple peaks. Due to the detection limitation of the convex hulls of the component groups, the parallel or perpendicular edges may spread into neighboring bins rather than a single bin. To bring out the peak of this concentration, the slope histogram h[i], where i ∈ [0, Nbin), is convolved with a finite, symmetric kernel generated from an un-normalized normal distribution in Eq. (6), where σ²_bin is the variance and μ is a positive integer that represents the half-size of the kernel.
$$s[j] = \begin{cases} \exp\left(-\dfrac{(j - \mu)^2}{2\sigma_{bin}^2}\right), & 0 \le j \le 2\mu, \\[2ex] 0, & \text{otherwise}. \end{cases} \qquad (6)$$
The convolved histogram in Eq. (7) is used to search for the highest peak that corresponds to the dominant skew angle of a component group.
$$h'[i] = \sum_{j=-\mu}^{\mu} h[(i - j - \mu + N_{bin}) \bmod N_{bin}]\, s[j + \mu]. \qquad (7)$$
The modular operation in Eq. (7) indicates the wrapping of values at the two endpoints of the histogram.
If a document page is considered to have only one dominant skew angle, only one slope histogram is needed, on which all the edge slopes are accumulated. A convolution with Eq. (7) is applied on the slope histogram, from which the highest peak is extracted to determine the dominant skew angle of this page.
4. Experimental results
To evaluate the effectiveness and robustness of the proposed component grouping and skew estimation method, the
real images from the University of Washington English Document Image Database I (UW-I) are used. In this database,
a total of 979 samples are scanned from real printed journals.
Many samples contain large areas of disjoint, non-textual
components that are the results of binarization on photographic objects, or the artifacts of the scanning process.
In this experiment, only a [10, 3000) band-pass size filter
and an aspect-ratio filter are used to remove noise. This is
one of the measures to improve computing efficiency, not
a prerequisite for the proposed method to work. The initial
value of the parameter k is set to 16 according to the results
in Fig. 7.
Fig. 9 shows the accumulated percentage of samples
versus the absolute detection error, while Fig. 10 gives
the regression analysis of this test suite. The linear correlation coefficient from Fig. 10 is 92.09%. If the labeled
outliers are excluded, the linear correlation coefficient
becomes 95.45%.
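For reference, a linear correlation coefficient between ground-truth and detected angles can be computed as in the sketch below. It assumes that the coefficient reported here is the usual Pearson coefficient, which the text does not state explicitly; the function name `linear_correlation` is hypothetical.

```python
import math
from typing import Sequence


def linear_correlation(truth: Sequence[float], detected: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_d = sum(detected) / n
    cov = sum((t - mean_t) * (d - mean_d) for t, d in zip(truth, detected))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_d = sum((d - mean_d) ** 2 for d in detected)
    return cov / math.sqrt(var_t * var_d)
```

A few samples whose detected angle belongs to a different page than the ground truth, such as the outliers labeled in Fig. 10, pull this coefficient down noticeably, which is consistent with the improvement reported when they are excluded.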
Fig. 11 is the sample A002 that is labeled in Fig. 10. The
ground truth is 0.4°, and the detected skew angle is −2.54°
for the left page and 0.28° for the right page. Obviously,
the ground truth is meant for the right page and the detected
most prominent value is for the left. This can be seen as
a multi-skew detection example.
Fig. 12 is the sample A03I that is labeled in Fig. 10. The
ground truth is −0.65°, and the detected skew angle is 1.06°.
[Fig. 9 plot: accumulated percentage of samples (0% to 100%) versus absolute error (0.00 to 0.50 degrees).]
Fig. 9. The accumulated percentage of samples versus the absolute error on the 979 real document images in UW-I.
[Fig. 10 plot: skew angle (degrees) versus ground truth (degrees), both axes from −3.5 to 3.5, with the labeled samples A03I, J00B, A05G, N042 and A002.]
Fig. 10. Regression analysis using the 979 real document images in UW-I. The linear correlation coefficient is 92.1%.
The ground truth is meant for the right page and the detected
skew angle represents the left page, which is dominant in
terms of size and content.
Fig. 13 is the sample A05G that is labeled in Fig. 10.
The ground truth is −2.12°, and the detected skew angle is
0.14°. There is no hint of a skew angle of −2.12° by visual
Fig. 11. The sample A002 from UW-I (labeled in Fig. 10). The ground truth is 0.4°, and the detected skew angle is −2.54° (highest peak) for the
left half and 0.28° (second highest peak) for the right half of the image. The components in gray are those filtered out by the size filter or the
aspect-ratio filter, while the components in black are those grouped by the grouping function. The edges and vertices of their convex hulls are drawn
in gray.
inspection. There is observable warping along the spine of
the facing pages.
Fig. 14 is the sample J00B that is labeled in Fig. 10.
The ground truth is −0.48°, and the detected skew angle is
0.95° for the left page and −0.52° for the right page. The
parameter k increases from the initial 16 to the final 35. By
visual inspection, the ground truth represents the dominant
right page, while the detected most significant skew angle
is for the left page with more groups.
Fig. 15 is the sample N042 that is labeled in Fig. 10. The
ground truth is 0.79°, and the detected skew angle is 0.00°.
This example reveals one of the intrinsic limitations of angle
measurement in imaging grids—the inadequacy of angular
resolution at short distances. The angular resolution can be
Fig. 12. The sample A03I from UW-I (labeled in Fig. 10). The ground truth is −0.65°, and the detected skew angle is 1.06°.
estimated by the arctangent of the reciprocal of the distance
in pixels: 0.57° at 100 pixels; 0.12° at 500 pixels; 0.06°
at 1000 pixels; and 0.01° at 5730 pixels. The longer the
distance, the higher the angular resolution becomes. Because
the edges that are parallel or perpendicular
to one another in this sample are short, the detected angle becomes
0.00°.
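The relation between edge length and angular resolution is a one-line formula. The sketch below simply evaluates the arctangent of one pixel over the distance; the function name `angular_resolution_deg` is hypothetical.

```python
import math


def angular_resolution_deg(distance_px: float) -> float:
    """Smallest angle step, in degrees, resolvable over a distance given in pixels."""
    return math.degrees(math.atan(1.0 / distance_px))


if __name__ == "__main__":
    for distance in (100, 500, 1000, 5730):
        print(distance, angular_resolution_deg(distance))
```

This also explains the histogram design: a bin width of 0.01° on average is only meaningful for edges several thousand pixels long, so short edges necessarily quantize to coarse angles.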
Fig. 16 is the sample H04I from UW-I. The ground truth is
−0.10°, and the detected skew angle is −0.19°. In this sample, the graphical components far outnumber
the textual ones, yet the grouping method and the convex hull based skew estimation method still function
correctly. It shows, together with the sample I047 in Fig. 17,
that the proposed method is rather robust in the presence of
excessive noise.
Fig. 18 is the sample A06M from UW-I. The ground truth
is −3.00°, and the detected skew angle is −2.75°. There
is apparent warping along the spine of the original document. However, this does not pose a serious problem for
the correct detection of the skew angle. This is because even
though the warping does change the shape and edges of the
groups, the left edges (no significant warping) and the bottom edges (against the warping direction) are much less
affected, if affected at all.
Fig. 13. The sample A05G from UW-I (labeled in Fig. 10). The ground truth is −2.12°, and the detected skew angle is 0.14°. There is no hint of −2.12°
by visual inspection.
5. Comparisons and conclusions
In this paper, we compare our results with two published
works that use the same UW-I database, by Chen et al. [20]
(the creators of the UW-I databases) and by Bloomberg
et al. [10]. Chen’s method is based on a recursive morphological transform on a down-sampled image with a regression method for parameter fitting. Bloomberg’s method
is projection-profile based and counts pixels along varying
scanning lines. Since neither team provided numerical results in their papers, the comparisons have to be made
from readings of the charts in their published papers.
The three participants (Chen, Bloomberg and us) all use
the full set of 979 samples from UW-I against the provided ground truth. Table 1 shows the accumulated percentage of samples versus the absolute detection error, which
is defined as the absolute difference between the detected
skew angle and the given ground truth, of the participating
methods.
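As an illustration of how the accumulated percentage in Table 1 is defined, the following Python sketch counts the samples whose absolute detection error falls within each threshold. The function name, the inclusive comparison and the choice of thresholds are assumptions made for illustration; the four (ground truth, detected) pairs are those quoted in this paper for the samples H04I, A06M, I047 and N042.

```python
from typing import List, Sequence


def accumulated_percentage(detected: Sequence[float],
                           truth: Sequence[float],
                           thresholds: Sequence[float]) -> List[float]:
    """Percentage of samples whose absolute error is <= each threshold."""
    if len(detected) != len(truth) or len(detected) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    # Absolute detection error: |detected skew angle - ground truth|.
    errors = [abs(d - t) for d, t in zip(detected, truth)]
    return [100.0 * sum(e <= th for e in errors) / len(errors)
            for th in thresholds]


# (ground truth, detected) in degrees for H04I, A06M, I047 and N042.
truth = [-0.10, -3.00, 0.50, 0.79]
detected = [-0.19, -2.75, 0.25, 0.00]

# Illustrative thresholds (assumed); Table 1 uses steps of 0.1 degrees.
percentages = accumulated_percentage(detected, truth, [0.1, 0.3, 0.5, 1.0])
print(percentages)
```

With only four samples the percentages are coarse; over the 979 UW-I samples the same computation yields one row of Table 1.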
Fig. 14. The sample J00B from UW-I (labeled in Fig. 10). The ground truth is −0.48°, and the detected skew angle is 0.95° for the left page and −0.52° for the right page. The most prominent peak is for the left page with more groups. The value of the parameter k has been increased from 16 to 35 automatically.
As shown in Table 1, within 0.1° of absolute error, the best performer is Chen’s method (manual mode, 2 × 3 structuring element) at 86%, followed by the convex hull based method at 61%, then Chen’s method in auto mode at 55%, and Bloomberg’s method (quarter-sized, listed as 2× reduction in Table 1) at 44%. All the participants are able to detect about 99% of the samples within 0.5° in their best parameter settings. Fig. 19 shows the best performances of the participants listed, and the numerical values are in Table 1.
Fig. 15. The sample N042 from UW-I (labeled in Fig. 10). The ground truth is 0.79°, and the detected skew angle is 0.0°. This sample reveals the limitation of angular resolution at short distances, which is true for any skew estimation method.
There are some concerns about the results of Chen’s method. Considering the difficulty of the real document samples in UW-I, their results are exceptional: 86% of the samples are detected within 0.1° of absolute error with the manually tuned optimal parameters. However, as we understand from their original paper [20], their machine-learning algorithm uses the same set of samples for both training and testing. The appropriate procedure is to divide the whole sample set into two parts, one for training and the other for testing. Even though they created new samples by rotating the original samples by known angles, this does not introduce data independence. Therefore, it is not clear how well their method works for images other than those of UW-I, on which their parameters are tuned.
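The hold-out procedure described above can be sketched as follows. This is a minimal illustration and not the procedure of Ref. [20]: the function names, the 50/50 split and the rotated-variant labels are hypothetical. The point it makes is that rotated copies must follow their original sample into the same partition, since a rotated copy is not independent of its source.

```python
import random
from typing import Dict, List, Set, Tuple


def split_originals(sample_ids: List[str], test_fraction: float = 0.5,
                    seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Randomly partition original sample IDs into disjoint train/test sets."""
    ids = sorted(set(sample_ids))
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


def assign_variants(variant_to_original: Dict[str, str],
                    train_ids: Set[str]) -> Tuple[List[str], List[str]]:
    """Place each rotated variant in the partition of its original sample."""
    train: List[str] = []
    test: List[str] = []
    for variant, original in sorted(variant_to_original.items()):
        (train if original in train_ids else test).append(variant)
    return train, test


# Sample names from UW-I as quoted in this paper; the rotations are hypothetical.
originals = ["A05G", "J00B", "N042", "H04I", "I047", "A06M"]
variants = {f"{name}@{angle:+.1f}": name
            for name in originals for angle in (-1.0, 0.0, 1.0)}

train_ids, test_ids = split_originals(originals)
train_variants, test_variants = assign_variants(variants, train_ids)
print(len(train_variants), len(test_variants))
```

Splitting on the original sample IDs first, and only then distributing the rotated variants, guarantees that no page contributes to both parameter tuning and evaluation.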
Fig. 16. The sample H04I from UW-I. The ground truth is −0.10°, and the detected skew angle is −0.19°. This is one of the samples that demonstrate the robustness of the proposed skew estimation method in the presence of excessive noise.
There is another measurement, the regression analysis, on the results of the second test. It is provided only for our method, as it is not available from the other two parties. This measurement is needed because the test images all have very small skew angles (< 3°), so a method needs to prove that the results it produces are not random small values. Furthermore, the regression analysis chart reveals all the outliers with large errors, which cannot be seen from the accumulated percentage chart provided above. We provide the results of both the accumulated percentage of samples and the regression analysis in order to provide detailed information about our method for scrutiny.
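A regression analysis of this kind can be sketched as below, assuming an ordinary least-squares fit of the detected angle against the ground truth. The fitting details are an assumption, and the function names and the 1° outlier threshold are illustrative only. A slope near 1 and an intercept near 0 indicate that the detected angles track the ground truth rather than being random small values, and the error test lists the outliers that an accumulated percentage chart hides. The data are the (ground truth, detected) pairs quoted in this paper for A05G, H04I, I047, N042 and A06M.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all ground-truth angles are identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def find_outliers(truth: Sequence[float], detected: Sequence[float],
                  max_error: float) -> List[int]:
    """Indices of samples whose absolute error exceeds max_error."""
    return [i for i, (t, d) in enumerate(zip(truth, detected))
            if abs(d - t) > max_error]


# (ground truth, detected) in degrees for A05G, H04I, I047, N042 and A06M.
truth = [-2.12, -0.10, 0.50, 0.79, -3.00]
detected = [0.14, -0.19, 0.25, 0.00, -2.75]

slope, intercept = fit_line(truth, detected)
outliers = find_outliers(truth, detected, max_error=1.0)  # assumed threshold
print(slope, intercept, outliers)
```

Here the only flagged sample is A05G (index 0), the case in Fig. 13 where the ground truth of −2.12° is not supported by visual inspection.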
Fig. 17. The sample I047 from UW-I. The ground truth is 0.50°, and the detected skew angle is 0.25°. This is one of the samples that demonstrate the robustness and versatility of the convex hull based model in selecting hints for skew estimation.
Based on the available experimental results in Table 1, the conclusion can be made that our convex hull based method has the best performance in terms of detection accuracy and reliability. Bloomberg’s method at 2× resolution reduction has close performance at larger ranges of absolute error. As for Chen’s method, if they can address our concerns about their training methodology and confirm the results shown in their papers, its performance with the manually tuned optimal parameter setting is hard to surpass, though not quite so in the automatic mode.
It is also helpful to compare the convex hull based skew estimation method in this paper with the fiducial line based skew estimation method of our previous paper, Ref. [28]. Both are general-purpose skew estimation methods for textual documents. The major differences between the two are:
• Principles: The method in Ref. [28] is based on the
alignment of individual text lines, while the method in
this paper is based on the layout of text blocks.
• Accuracy: The two methods have comparable accuracy,
with slightly better results from the method in this paper.
The difference is mainly due to some of the “difficult”
samples in UW-I.
• Efficiency: The two methods have comparable execution
speed, with slightly better results from the method in
this paper.
Fig. 18. The sample A06M from UW-I. The ground truth is −3.00°, and the detected skew angle is −2.75°. The warping along the spine of the original document does not impede the correct detection of the skew angle.
Table 1
Performances comparison using the 979 real document images in UW-I. Shaded rows are the best performances from Chen, Bloomberg and ours

Method                       0.0°    0.1°    0.2°    0.3°    0.4°    0.5°
Chen (2 × 3, manual)         –       86%     93%     97%     99%     99%
Chen (2 × 2, manual)         –       75%     88%     93%     95%     97%
Chen (2 × 3, auto)           –       55%     78%     89%     93%     95%
Chen (2 × 2, auto)           –       24%     43%     61%     74%     83%
Bloomberg (2× reduction)     –       44%     75%     93%     98%     99%
Bloomberg (8× reduction)     –       37%     64%     82%     90%     95%
Convex hull based            4%      61%     86%     95%     98%     99%

Sources: Figs. 1 and 3 in Ref. [20]; Figs. 3 and 6 in Ref. [10]. Charts digitization uncertainty: ±0.5%.
[Fig. 19 chart: accumulated percentage of samples (40% to 100%) versus absolute error (0.1 to 0.5 degrees), with curves for Chen (2×3, manual), Chen (2×3, auto), Bloomberg (2× reduction) and Convex hull based.]
Fig. 19. The 3-party performance evaluation using the test suite UW-I. See Table 1 for the numerical values, the original sources, and the uncertainty of the data acquisition process.
• Excessive noise: Both can work on images with excessive noise, but in different ways. The method in Ref. [28] depends on the alignment of text lines and the misalignment of noise, while the method in this paper depends on the separation of noise and text by the component grouping.
• Multiple skews: Both can detect the existence of multiple skews in an image, but only the method in this paper can locate the regions with different skews.
• Cross-column correlation: Some documents have two or more columns in which the baselines of text are not aligned (collinear), or in which different font sizes are used across the columns. The method in Ref. [28] may fail on this type of document, especially when mono-spaced fonts (Courier, etc.) are used. The method in this paper has no difficulty with this type of document because the columns are separated by the component grouping process.
• Component touching: This can be serious in documents that are scanned at low resolution or whose original printing quality is poor. When the component touching is severe, the method in Ref. [28] can still process in the background mode, but the method in this paper may produce grouping errors.
The benefits of using the convex hulls in this paper can be extended to other applications. For instance, if the fiducial line based skew estimation method in Ref. [28] uses the centroids of the convex hulls rather than the centroids of the connected components, there are observable improvements in both the accumulated percentage of samples and the regression analysis on the same set of UW-I samples, and the overall execution speed improves as well.
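To make the distinction between the two kinds of centroid concrete, the sketch below computes both for a small L-shaped component. It is an illustration under assumptions: the hull is built with Andrew's monotone chain algorithm and its centroid is taken as the area centroid of the hull polygon, neither of which is specified in this paper, and all function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_centroid(poly: Sequence[Point]) -> Point:
    """Area centroid of a simple polygon (shoelace formula)."""
    twice_area = cx = cy = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        w = x0 * y1 - x1 * y0
        twice_area += w
        cx += (x0 + x1) * w
        cy += (y0 + y1) * w
    if twice_area == 0:
        raise ValueError("degenerate polygon with zero area")
    return cx / (3 * twice_area), cy / (3 * twice_area)


def pixel_centroid(points: Sequence[Point]) -> Point:
    """Mean position of all pixels of a connected component."""
    n = len(points)
    return sum(x for x, _ in points) / n, sum(y for _, y in points) / n


# Hypothetical L-shaped component: one vertical and one horizontal stroke.
component = [(0, y) for y in range(5)] + [(x, 0) for x in range(1, 5)]
hull = convex_hull(component)
hull_c = polygon_centroid(hull)
pixel_c = pixel_centroid(component)
print(hull, hull_c, pixel_c)
```

For this component the hull is the triangle (0, 0), (4, 0), (0, 4). Its centroid (4/3, 4/3) depends only on the three extreme points, while the pixel centroid (10/9, 10/9) depends on every pixel of the component.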
The proposed skew estimation method is highly competitive in execution speed. It takes 1617 seconds to process the 979 real images (2592 × 3300 pixels each) in UW-I (about 1.7 s/sample), excluding image I/O, on the Java 5 platform on a 2.3 GHz Pentium IV personal computer.
References
[1] G. Nagy, Twenty years of document image analysis in PAMI, IEEE
Trans. Pattern Anal. Mach. Intell. 22 (1) (2000) 38–62.
[2] L. O’Gorman, R. Kasturi, Document Image Analysis, IEEE Computer
Society Press, Los Alamitos CA, 1995.
[3] R. Cattoni, T. Coianiz, S. Messelodi, C.M. Modena, Geometric layout
analysis techniques for document image understanding: a review,
ITC-IRST Technical Report #9703-09, 1998.
[4] J.J. Hull, in: J.J. Hull, S.L. Taylor (Eds.), Document Analysis Systems
II, World Scientific, Singapore, 1998, pp. 40–64.
[5] A.D. Bagdanov, Evaluation of document image skew estimation
techniques, Proc. SPIE 2660 (1996) 343–353.
[6] W. Postl, Detection of linear oblique structures and skew scan
in digitized documents, in: Proceedings of the Eighth International Conference on Pattern Recognition, Paris, October 1986,
pp. 687–689.
[7] H.S. Baird, The skew angle of printed documents, in: Proceedings of
SPSE 40th Annual Conference and Symposium on Hybrid Imaging
Systems, Rochester, NY, May 1987, pp. 21–24.
[8] Y. Nakano, Y. Shima, H. Fujisawa, J. Higashino, M. Fujinawa,
An algorithm for skew normalization of document images, in:
Proceedings of the 10th International Conference on Pattern
Recognition, Atlantic City, NJ, 1990, pp. 8–13.
[9] A.L. Spitz, Skew determination in CCITT group 4 compressed
images, in: Proceedings of the First Annual Symposium on Document
Analysis and Information Retrieval, Las Vegas, 16–18 March 1992,
pp. 11–25.
[10] D.S. Bloomberg, G.E. Kopec, L. Dasari, Measuring document image
skew and orientation, Document Recognition II, Proceedings of SPIE,
vol. 2422, San Jose, CA, 6–7 February 1995, pp. 302–316.
[11] N. Liolios, N. Fakotakis, G. Kokkinakis, On the generalization of
the form identification and skew detection problem, Pattern Recogn.
35 (1) (2002) 253–264.
[12] S.N. Srihari, V. Govindaraju, Analysis of textual images using the
Hough transform, Mach. Vision Appl. 2 (3) (1989) 141–153.
[13] S. Hinds, J. Fisher, D. D’Amato, A document skew detection method
using run-length encoding and the Hough transform, in: Proceedings
of the 10th International Conference on Pattern Recognition, Atlantic
City, NJ, 17–21 June 1990, pp. 464–468.
[14] D.S. Le, G.R. Thoma, H. Wechsler, Automated page orientation and
skew angle detection for binary document images, Pattern Recogn.
27 (10) (1994) 1325–1344.
[15] B. Yu, A.K. Jain, A robust and fast detection algorithm for generic
documents, Pattern Recogn. 29 (10) (1996) 1599–1629.
[16] U. Pal, B.B. Chaudhuri, An improved document skew angle
estimation technique, Pattern Recogn. Lett. 17 (8) (1996) 899–904.
[17] A. Hashizume, P.S. Yeh, A. Rosenfeld, A method of detecting the
orientation of aligned components, Pattern Recogn. Lett. 4 (1986)
125–132.
[18] L. O’Gorman, The document spectrum for page layout analysis,
IEEE Trans. Pattern Anal. Mach. Intell. 15 (11) (1993) 1162–1173.
[19] R. Smith, A simple and efficient skew detection algorithm via text row
accumulation, in: Proceedings of the Third International Conference
on Document Analysis and Recognition, Montreal, Canada, August
1995, pp. 1145–1148.
[20] S. Chen, R.M. Haralick, An automatic algorithm for text skew
estimation in document images using recursive morphological
transforms, in: Proceedings of the IEEE International Conference
on Image Processing, Austin, TX, 13–16 November 1994,
pp. 139–143.
[21] B. Gatos, N. Papamarkos, C. Chamzas, Skew detection and text line
position determination in digitized documents, Pattern Recogn. 30
(9) (1997) 1505–1519.
[22] S. Lu, B.M. Chen, C.C. Ko, Document image rectification using
fuzzy sets and morphological operators, in: Proceedings of the IEEE
International Conference on Image Processing, Singapore, 24–27
October 2004, pp. 2877–2880.
[23] J. Sauvola, M. Pietikäinen, Skew angle detection using texture
direction analysis, in: Proceedings of the Ninth Scandinavian
Conference on Image Analysis, Uppsala Sweden, June 1995, pp.
1099–1106.
[24] C. Sun, D. Si, Skew and slant correction for document images
using gradient direction, in: Proceedings of the Fourth International
Conference on Document Analysis and Recognition, Ulm Germany,
18–20 August 1997, pp. 142–146.
[25] H.K. Aghajan, T. Kailath, SLIDE: subspace-based line detection,
IEEE Trans. Pattern Anal. Mach. Intell. 16 (11) (1994) 1057–1073.
[26] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, Skew angle estimation
in document processing using Cohen’s class distributions, Pattern
Recogn. Lett. 2 (1999) 1305–1311.
[27] B. Yuan, C.L. Tan, A multi-level component grouping algorithm
and its applications, in: Proceedings of the Eighth International
Conference on Document Analysis and Recognition, Seoul, Korea,
29 August–1 September 2005, pp. 1178–1181.
[28] B. Yuan, C.L. Tan, Fiducial line based skew estimation, Pattern
Recogn. 38 (12) (2005) 2333–2350.
About the Author — B. YUAN received the B.Sc. and M.Sc. degrees in Nuclear Physics in 1985 and 1988 from Peking University, China. He received
his M.Sc. and Ph.D. degrees in Computer Science in 2000 and 2006 from National University of Singapore. His current research interests include
automatic and semi-automatic extraction of man-made objects from satellite images. He is currently a research scientist in the Centre for Remote Imaging,
Sensing and Processing (CRISP), National University of Singapore.
About the Author — C.L. TAN received the B.Sc. (Hons.) degree in Physics in 1971 from University of Singapore, the M.Sc. degree in Radiation Studies
in 1973 from University of Surrey, UK, and the Ph.D. degree in Computer Science in 1986 from University of Virginia, USA. His research interests
include document image and text processing, neural networks and genetic programming. He has published more than 200 research publications in these
areas. He is an associate editor of Pattern Recognition. He has served on the program committees of many international conferences and workshops,
including the International Conference on Document Analysis and Recognition (ICDAR) 2005, International Workshop on Graphics Recognition (GREC)
2005, and the International Conference on Pattern Recognition (ICPR) 2006. He is currently an Associate Professor in the Department of Computer
Science, School of Computing, National University of Singapore.