Slant estimation algorithm for OCR systems

Pattern Recognition 34 (2001) 2515–2522
E. Kavallieratou*, N. Fakotakis, G. Kokkinakis
Wire Communications Laboratory, Electrical and Computer Engineering, University of Patras, 26500 Patras, Greece
Received 13 December 1999; accepted 10 October 2000
Abstract
A slant removal algorithm is presented, based on the vertical projection profile of word images and the
Wigner–Ville distribution. The slant correction does not affect the connectivity of the word, and the resulting words are
natural. The algorithm was evaluated by both subjective and objective means. It has been
tested on English and Modern Greek samples from more than 500 writers, taken from the IAM-DB and GRUHD databases.
The extracted results are natural and almost always improved with respect to the original image, even in the case of
variant-slanted writing. The performance of an existing character recognition system increased by up to 9% on
the same data, while the training time cost was significantly reduced. Owing to its simplicity, the algorithm can easily be
incorporated into any optical character recognition system. © 2001 Pattern Recognition Society. Published by Elsevier
Science Ltd. All rights reserved.
Keywords: Slant estimation; Script writing; Wigner–Ville distribution; Character recognition
1. Introduction
Handwritten text is usually characterized by slanted
characters. In particular, the slanted characters slope
either from right to left or vice versa. Moreover, different
deviations may appear not only within a text but also
within a single word. Some examples illustrating these
cases are shown in Fig. 1. Slanted characters constitute
a common feature of any natural language with a Latin-style
alphabet (e.g., English, Modern Greek, etc.). For
example, the percentage of slanted writing in the IAM-DB
[1] database of English reaches 77%, while the corresponding
percentage in GRUHD, a database of Modern
Greek (1000 writers are included) [2], approaches 59%.
These rates were obtained by manually counting the
forms with apparent slanted writing, either to the left or
to the right, as judged by a human. Furthermore,
slanted characters appear in both hand-printed and cursive writing.
* Corresponding author. Tel.: +30-61-991722; fax: +30-61-991855.
E-mail address: [email protected] (E. Kavallieratou).
Consequently, a robust optical character recognition
(OCR) system has to be able to cope with slanted characters.
Watanabe [3] conducted comparative experiments
showing that slant normalization minimizes the recognition
error. More specifically, the two most important
problems that slanted characters may cause for an OCR
system are the following:
• A character segmentation procedure that produces
vertical segment boundaries (e.g., one based on histograms),
if such a procedure is required, could yield defectively
segmented characters as well as noisy segments when
applied to slanted words.
• Both the computational cost of the training procedure
and the accuracy of the recognition stage would be
negatively affected. Indeed, for a single
character, the amount of training data required to
cover as many slant angles as possible is substantial.
Moreover, classifying a given character
into the correct class is much harder, since the set of
possible classes is now larger.
Therefore, the majority of recent OCR systems contain
a preprocessing stage dealing with slant correction [4–7].
0031-3203/01/$20.00 © 2001 Pattern Recognition Society. Published by Elsevier Science Ltd. All rights reserved.
PII: S0031-3203(00)00153-9
E. Kavallieratou et al. / Pattern Recognition 34 (2001) 2515–2522
Fig. 1. Examples of slanted words: (a) a word slanted to the right,
(b) a word slanted to the left, (c) a variant-slanted word.
This stage is usually located before the segmentation
module, if it exists, or just before the recognition stage
otherwise.
The most commonly used method for slant estimation
is the calculation of the average angle of near-vertical
strokes [4–6]. This approach requires the detection of the
edges of the characters, and its accuracy depends on the
particular characters included in the word. Shridar [7]
presents two more methods for slant estimation
and correction. The first one uses the vertical projection
profile, while the second one makes use
of the chain code of the entire border pixels.
Vinciarelli [8] presents a technique based on a function $S(\alpha)$
that provides a measure of the slant absence
across the word. The calculation of this function relies
on the vertical density histogram, which is easier to
obtain than the direction of the strokes. For each angle $\alpha$
in the interval $[-15^\circ, 15^\circ]$, a shear transform $t_\alpha$ is
applied and the following histogram is calculated:

$$H_\alpha(m) = \frac{h_\alpha(m)}{\Delta y(m)}, \qquad 0 \le m < nCol,$$

where $h_\alpha(m)$ is the value of the vertical density histogram
of the image shear transformed by the angle $\alpha$, $\Delta y(m)$ is
the difference between the maximum and minimum y
coordinates of the foreground pixels in column m, and
nCol is the number of columns in the image. The number
of foreground pixels of each column is divided by the
distance between the highest and the lowest pixel, giving
$H_\alpha(m) = 1$ if the column contains a continuous stroke
and $H_\alpha(m) \in [0, 1]$ otherwise.
Then, the following quantity is computed:

$$S(\alpha) = \sum_{\{i \,:\, H_\alpha(i) = 1\}} h_\alpha(i)^2.$$

The value of $\alpha$ for which $S(\alpha)$ is maximum is taken as the
slant estimate, and the corresponding shear transform,
when applied to the desloped original image, gives the
deslanted image. Although its shearing transformation is
time consuming, the above method was reported
to decrease the error rate by 12% compared with the method
proposed by Bozinovic [9].
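The measure described above can be illustrated with a minimal Python sketch. It is not the implementation of [8]: the function names are hypothetical, the image is assumed to be a list of rows of 0/1 values that has already been sheared, the column extent is assumed to be counted inclusively (so that a continuous stroke gives a ratio of exactly 1), and the squared form of the sum is assumed.

```python
def slant_absence(image):
    """S for one already-sheared binary image (list of rows of 0/1)."""
    total = 0
    for column in zip(*image):  # iterate over the columns of the image
        rows = [y for y, pixel in enumerate(column) if pixel]
        if not rows:
            continue  # empty column: contributes nothing
        count = len(rows)                # h(m): number of foreground pixels
        extent = rows[-1] - rows[0] + 1  # inclusive vertical extent (assumption)
        if count == extent:              # H(m) == 1: one continuous stroke
            total += count ** 2
    return total


def best_shear_angle(sheared_images):
    """sheared_images: dict mapping a candidate angle to its sheared image."""
    return max(sheared_images, key=lambda a: slant_absence(sheared_images[a]))
```

Columns that hold one unbroken vertical run dominate the score, so the candidate whose shear makes the strokes upright is the one selected.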
However, the evaluation of slant correction approaches
is difficult, since the slant may vary even within
a single word. Additionally, in the relevant literature a
slant correction procedure is rarely evaluated separately,
so that comparative results cannot be given.
In this paper a new method for slant estimation is
presented, based on a combination of the projection profile
technique and the Wigner–Ville distribution. Moreover,
it uses a simple and fast shearing transformation
technique. Our approach is character independent and
can easily be adapted to satisfy the requirements
of any OCR system. Its performance is measured in
terms of the improvement of the results of a character
recognition system.
In the next section the Wigner–Ville distribution
(WVD) is briefly presented. The algorithm is described in
Section 3. Finally, some experimental results are given in
Section 4 and the conclusions drawn from this study are
included in Section 5.
2. Wigner–Ville distribution
An important topic in signal processing is that of non-stationary
signals, that is, signals whose characteristics
vary with time, in contrast to stationary signals, whose
characteristics are time-independent. To obtain a better
representation of non-stationary signals, joint time–frequency
distributions are used.
A first class of time–frequency representations is the
atomic decompositions, or linear time–frequency representations.
These distributions decompose the signal on the basis
of elementary signals (i.e., the atoms), which have to be well
localized in both time and frequency. However, there is
a trade-off between time and frequency resolution, as the
decomposition is achieved by windowing the signal.
A good time resolution requires a short window; on the
other hand, a good frequency resolution requires a long
window. This is a consequence of the time–frequency resolution
relation given by the Heisenberg–Gabor inequality [10].
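For reference, the Heisenberg–Gabor inequality can be stated as follows. The symbols are assumptions introduced here for illustration and do not appear in the original text: $\Delta t$ and $\Delta f$ denote the root-mean-square time and frequency spreads of a finite-energy signal x with Fourier transform X and energy $E_x$, and $t_0$, $f_0$ are its mean time and mean frequency.

```latex
\Delta t \, \Delta f \;\ge\; \frac{1}{4\pi},
\qquad
\Delta t^{2} = \frac{1}{E_x} \int_{-\infty}^{+\infty} (t - t_0)^{2} \, |x(t)|^{2} \, dt,
\qquad
\Delta f^{2} = \frac{1}{E_x} \int_{-\infty}^{+\infty} (f - f_0)^{2} \, |X(f)|^{2} \, df .
```

Equality is reached only for Gaussian-shaped signals, which is why shortening the analysis window to gain time resolution necessarily widens its spectrum.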
In contrast, the energy distributions, another class
of time–frequency representations, distribute the energy
of the signal over two description variables: time and
frequency. The starting point is that, since the energy of
a signal x can be deduced from the squared modulus of
either the signal or its Fourier transform,

$$E_x = \int_{-\infty}^{+\infty} |x(t)|^2 \, dt = \int_{-\infty}^{+\infty} |X(f)|^2 \, df, \tag{1}$$

we can interpret $|x(t)|^2$ and $|X(f)|^2$ as energy densities
in time and in frequency, respectively. It is then natural
to look for a joint time and frequency energy density
$\rho_x(t, f)$ such that

$$E_x = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \rho_x(t, f) \, dt \, df, \tag{2}$$

which is an intermediary situation between those described
by Eq. (1). As the energy is a quadratic function of
the signal, the time–frequency energy distributions will
be, in general, quadratic representations.
A very well-known representative of the energy distributions,
and a member of Cohen's class [11], is the
Wigner–Ville distribution:

$$W(t, f) = \int_{-\infty}^{+\infty} z(t + \tau/2) \, z^*(t - \tau/2) \, e^{-j 2 \pi f \tau} \, d\tau, \tag{3}$$

where z(t) represents the analytic signal associated with
the signal s(t). The WVD can also be expressed as a function
of the spectrum Z(f) of the signal under analysis, as
follows:

$$W(t, f) = \int_{-\infty}^{+\infty} Z(f + u/2) \, Z^*(f - u/2) \, e^{j 2 \pi u t} \, du. \tag{4}$$

The WVD is a particularly popular distribution
owing to the large number of desirable mathematical properties
it satisfies. Claasen and Mecklenbrauker [10] proved
the uniqueness of the Wigner–Ville distribution "in
the sense that it is that single energy distribution that
possesses all the stated desirable properties". This justifies
the numerous applications of the WVD in pattern recognition
[12,13], synthesis [14], seismic signals [15], optics
[16], etc.
3. The algorithm
The vertical projection profile of a non-slanted word
presents the most pronounced dips between the characters, even if
they are connected, and the highest peaks at the main
body of the characters. The latter is much more evident
when ascenders and descenders are included. In the case
of slanted words, the otherwise vertical strokes of the
characters now cover the intra-character gaps. Hence, the
dips of the histogram are less deep, while the peaks are
smoother. The alternations (i.e., between dips and peaks)
of the vertical projection profile of a given word are
more intense when it is non-slanted than when it is slanted. In
Fig. 2 the vertical projection profiles for different slants
of the same word are shown. These slanted words were
produced automatically by applying the technique described below.
The proposed algorithm for slant estimation and correction
of a given word consists of six steps:
1. The word image is artificially slanted to both left and
right for different slant angles. The maximum slant
angle is approximately 45° and the slant angle step
depends on the height of the word image, as described
below.
2. For each of the extracted word images, the vertical
histogram is calculated.
3. The WVD is calculated for all the above histograms.
4. The curves of maximum intensity of the WVDs are
extracted.
5. The curve of maximum intensity with the greatest
peak, corresponding to the histogram with the most
intense alternations, is selected. In Fig. 3 several
curves of maximum intensity are shown.
6. The corresponding word image is selected as the most
non-slanted word.
In order to slant a word image to right or left we follow
the procedure below. The word image is segmented in
equally wide horizontal zones. The lowest zone is considered to be the base. The zone above the base is shifted
one pixel to the right or to the left. The next zone (if it
exists) is shifted two pixels to the same direction, etc.
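Thus, each pixel p(x, y) of the image is shifted to the point
(x', y'):

$$x' = x + i, \qquad -Z < i < Z, \qquad 0 < Z \le h \tag{5}$$

and

$$y' = y, \tag{6}$$

where i is the shift, in pixels, of the zone containing the pixel
(positive to the right, negative to the left), successive zones are
identified by successive ordinals, Z is the number of zones and
h is the height of the image. The corresponding
slant will be

$$\tan\theta = \frac{Z - 1}{h}. \tag{7}$$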
The more zones there are, the greater the slant angle. The
maximum slant angle corresponds to one-pixel-wide
zones (i.e., when the number of zones is equal to the
height h of the word image in pixels). In this case, the
highest zone is shifted by h−1 pixels and the corresponding
slant angle is the maximum one; it can be
calculated as

$$\tan\theta = \frac{h - 1}{h} \approx 1 \;\Rightarrow\; \theta \approx 45^\circ. \tag{8}$$
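The zone-shifting procedure and the resulting slant angle can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the image is a list of rows (top row first) of 0/1 values, the zones are obtained by integer division and so are only approximately equal when the height is not a multiple of the number of zones, and the output is padded so that no pixel is lost.

```python
import math


def slant_image(image, zones, direction=1):
    """Shift each horizontal zone one pixel further than the zone below it.

    direction=+1 slants to the right, direction=-1 to the left.
    """
    height = len(image)
    if not 1 <= zones <= height:
        raise ValueError("zones must lie between 1 and the image height")
    max_shift = zones - 1
    slanted = []
    for row_index, row in enumerate(image):
        rows_above_base = height - 1 - row_index
        zone = rows_above_base * zones // height  # 0 for the base zone
        # For a left slant the base is padded most, so upper zones move left.
        offset = zone if direction > 0 else max_shift - zone
        slanted.append([0] * offset + list(row) + [0] * (max_shift - offset))
    return slanted


def slant_angle_degrees(zones, height):
    """Slant angle from tan(theta) = (Z - 1) / h."""
    return math.degrees(math.atan((zones - 1) / height))


def vertical_projection(image):
    """Number of foreground pixels in each column."""
    return [sum(column) for column in zip(*image)]
```

For a vertical stroke four pixels high, four zones spread the stroke over four columns, so its projection changes from a single peak of height 4 to four columns of height 1, which is the flattening effect the algorithm relies on.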
To illustrate the above procedure, the gradual slanting
of a vertical stroke is shown (enlarged) in Fig. 4,
while Fig. 5 shows the maximum slant of a non-slanted
word to the left and to the right. Notice that the words
produced by this technique are very natural (as can be
seen in Fig. 2). A maximum slant angle of ±45° covers
the vast majority of handwriting. Shifting each zone by
more than one pixel would increase the maximum slant
angle. However, it could cause undesirable disconnections
between adjacent zones, producing an unnatural
outcome.
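Steps 3 to 5 of the algorithm (WVD of each histogram, curve of maximum intensity, selection of the greatest peak) can be sketched in pure Python as below. Several points are assumptions rather than details given in the paper: the function names are hypothetical, the WVD is a simple discrete approximation whose lag range is truncated at the signal borders, the analytic signal is built with a naive DFT, and the curve of maximum intensity is taken to be the maximum of the WVD over frequency at each horizontal position.

```python
import cmath
import math


def analytic_signal(samples):
    """Analytic signal of a real sequence, built with a naive O(N^2) DFT."""
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            gain = 1.0  # DC and Nyquist bins are kept unchanged
        elif k < (n + 1) // 2:
            gain = 2.0  # positive frequencies are doubled
        else:
            gain = 0.0  # negative frequencies are discarded
        spectrum[k] *= gain
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def wigner_ville(z):
    """Discrete WVD: rows are positions t, columns are frequency bins k."""
    n = len(z)
    distribution = []
    for t in range(n):
        reach = min(t, n - 1 - t)  # largest lag that stays inside the signal
        row = []
        for k in range(n):
            value = sum(
                z[t + m] * z[t - m].conjugate() * cmath.exp(-2j * math.pi * k * m / n)
                for m in range(-reach, reach + 1)
            )
            row.append(value.real)  # the sum is real up to rounding error
        distribution.append(row)
    return distribution


def max_intensity_curve(histogram):
    """Per-position maximum of the WVD over frequency (assumed definition)."""
    return [max(row) for row in wigner_ville(analytic_signal(histogram))]


def select_least_slanted(histograms):
    """histograms: dict mapping a slant angle to its vertical histogram."""
    return max(histograms, key=lambda a: max(max_intensity_curve(histograms[a])))
```

The cubic cost of this naive version is acceptable only for short histograms; a practical implementation would use an FFT. The returned key identifies the artificially slanted image whose histogram shows the most intense alternations.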
Fig. 2. Vertical projection profiles of a word with various slants. For each slant, the slant angle and the number of horizontal zones are as
follows: (a) −45°, 101; (b) −30°, 31; (c) −15°, 27; (d) 0°, 1; (e) 15°, 27; (f) 30°, 31; (g) 45°, 101.
Fig. 3. Curves of maximum intensity that correspond to the histograms of Fig. 2.
Fig. 4. The gradual slanting of a vertical stroke in enlargement.
Fig. 5. The maximum slant of a word to left and right.
4. Experimental results
The presented slant removal algorithm has been tested
on a collection of word images taken from the IAM-DB
[1] and the GRUHD [2] databases comprising English
and Modern Greek unconstrained handwriting, respectively.
In more detail, more than 1500 word images were
used, taken from approximately 500 different writers
selected randomly. The word segmentation was performed
automatically, based on an OCR preprocessing
system [18].
As already mentioned, the evaluation of a slant removal
method is difficult, since the selection of the most
appropriate result very often depends on subjective judgement
(especially when dealing with variant-slanted
words). Moreover, a slant removal algorithm can
be evaluated indirectly by taking into account the improvement
it provides to an existing OCR system.
Fig. 6. Some experimental results (left: original word images, right: corrected word images).
Fig. 7. The system into which the proposed algorithm has been
incorporated.
Nevertheless, the application of our method to the
above mentioned data produced very satisfactory results.
Some examples are shown in Fig. 6. In general, it is clear
that even in the hardest cases the produced word is
considerably improved as regards its processing in
further stages (e.g., character segmentation, character
recognition, etc.).
It is worth noting that words that are already non-slanted
are not negatively affected by this algorithm,
whether the characters are connected or not. The
ascenders and descenders, if they exist, play an important
role in slant estimation, since they make the alternations of peaks
and dips in the histogram more evident.
However, the proposed method can also handle words
without any ascender or descender. Regarding
variant-slanted words, the slant of the majority of the
vertical strokes included in the word is more likely to
determine the final slant angle, since more peaks of the
histogram will be generated.
The presented algorithm has been incorporated into
the character recognition system shown in Fig. 7, which aims
at the automatic processing of document images. Apart
from slant removal, the system includes six other main
modules, namely skew angle estimation and correction,
printed-handwritten text discrimination, line segmentation,
word segmentation, character segmentation and
recognition, which stem from the implementation of already
existing as well as novel algorithms. The proposed
technique could be applied to words or directly to the
text line images resulting from line segmentation. However,
the uneven valleys of the vertical histogram, i.e. wide
valleys between words and narrow valleys between characters,
could give confusing results in some cases. For this
reason we preferred to use parts of the text lines
instead. Assuming that every page corresponds to only
one writer, a slant angle is estimated per page. The longer
and the more solid the text parts, the more accurate the
estimation procedure will be, since they include more
information and a longer histogram.
In order to select these parts, the valleys of the vertical
histogram of a text line whose width is greater than a threshold
are considered to be the boundaries between the parts.
The 10 longest of the resulting parts were selected.
One-tenth of the line height was used as the threshold. However,
any threshold small enough to sense word segment
valleys or smaller is appropriate. The resulting parts are
usually entire words or parts of words. The slant estimation
technique is applied to them, and the average of the
estimated slants is considered to be the slant of the page.
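The part selection and page-level averaging described above can be sketched as follows. The function names are hypothetical, a valley is assumed to be a run of empty (zero) histogram columns, and parts are returned as (start, end) column ranges with an exclusive end.

```python
def split_line_parts(histogram, line_height, keep=10):
    """Return the `keep` longest parts of a text line, in left-to-right order."""
    threshold = line_height / 10  # valleys wider than this separate parts
    parts = []
    start = None     # first column of the part being built
    last_ink = None  # last non-empty column seen so far
    for col, value in enumerate(histogram):
        if value > 0:
            if start is None:
                start = col
            elif col - last_ink - 1 > threshold:
                parts.append((start, last_ink + 1))  # close the previous part
                start = col
            last_ink = col
    if start is not None:
        parts.append((start, last_ink + 1))
    parts.sort(key=lambda part: part[1] - part[0], reverse=True)
    return sorted(parts[:keep])


def page_slant(part_slants):
    """Page slant as the average of the slants estimated on the parts."""
    return sum(part_slants) / len(part_slants)
```

Narrow valleys between characters stay inside a part, while wide valleys between words split the line, so each returned range is a whole word or a fragment of one.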
Fig. 8 shows a document image from IAM-DB as inserted
into the system and after slant correction.
Fig. 9 presents the dependence of the recognition accuracy on the
number of training samples per character, with and without
the application of the slant removal algorithm.
The results were obtained
from tests on 200 forms (100 IAM-DB forms and
100 GRUHD forms). For IAM-DB, the system
was trained with samples taken from the NIST [17] database,
while for GRUHD, training sets from the same database,
different from the testing sets, were used. Compared
with the initial recognition system (i.e., when no slant
correction was performed), the training data required to
achieve similar performance were reduced by more
than one-third.
The computational cost of using the proposed technique depends strongly on the size of the document.
Fig. 8. An example of IAM-DB document image (a) as inserted into the system and (b) after the slant removing.
Fig. 9. Recognition accuracy versus the amount of training
samples per character, (a) before and (b) after the incorporation
of the slant removing algorithm.
However, the whole slant removing procedure for the
document of Fig. 8 requires 29.516 s using a Pentium III
at 300 MHz while just the slant estimation part demands
22.72 s.
5. Conclusions
In this paper, we presented an algorithm for slant
removal. In contrast to current techniques, our method
is character independent, since it is based on the intense
alternations of the vertical histogram, which indicate
vertically oriented characters, rather than on detecting the
almost vertical strokes that may be included in the word.
The WVD was used to estimate the slant angle,
which can range within ±45° of the original
position.
The evaluation of our algorithm was made by both
subjective and objective means. First, the algorithm
was applied to isolated word image samples from both
English and Modern Greek databases. The extracted
results are natural and almost always improved with
respect to the original image, even in the case of variant-slanted
writing. Then the algorithm was applied to entire
document images by incorporating it into an OCR system.
The performance of the character recognition system
was increased by up to 9% for the same data, while
the training time cost was significantly reduced.
Almost any OCR system can benefit from our algorithm,
since it requires little computational cost and is
easily adapted. Cases where characters within a single
word have to be corrected by different slant angles cannot
be handled by our approach, since it is based on the
dominant angle. We are currently working on providing
more accurate results.
References
[1] U. Marti, H. Bunke, A full English sentence database for
off-line handwriting recognition, Proceedings of the Fifth
International Conference on Document Analysis and Recognition, ICDAR'99, Bangalore, 1999, pp. 705–708.
[2] E. Kavallieratou, N. Liolios, E. Koutsogiorgos,
N. Fakotakis, G. Kokkinakis, The GRUHD database of
Modern Greek Unconstrained Handwriting, LREC2000,
Athens, vol. 3, 1999, pp. 1755–1759.
[3] M. Watanabe, Y. Hamammoto, T. Yasuda, S. Tomita,
Normalization techniques of handwritten numerals for
Gabor filters, Proceedings of the International Conference
on Document Analysis and Recognition, ICDAR, IEEE,
Los Alamitos, CA, vol. 1, 1997, pp. 303–307.
[4] G. Kim, V. Govindaraju, Efficient chain-code-based image
manipulation for handwritten word recognition, Proceedings of SPIE - The International Society for Optical
Engineering, Bellingham, WA, USA, vol. 2660, 1996,
pp. 262–272.
[5] S. Knerr, E. Augustin, O. Baret, D. Price, Hidden Markov
model based word recognition and its application to legal
amount reading on French checks, Comput. Vision Image
Understanding 70 (3) (1998) 404–419.
[6] A.W. Senior, A.J. Robinson, An off-line cursive handwriting recognition system, IEEE Trans. Pattern Anal.
Mach. Intell. 20 (3) (1998) 309–321.
[7] M. Shridar, F. Kimura, Handwritten address interpretation using word recognition with and without lexicon,
Proceedings of the IEEE International Conference on
Systems, Man and Cybernetics, Piscataway, NJ,
USA, vol. 3, 1995, pp. 2341–2346.
[8] A. Vinciarelli, J. Luettin, Off-line cursive script recognition
based on continuous density HMM, Proceedings of the
Seventh International Workshop on Frontiers in Handwriting Recognition, Amsterdam, September 2000.
[9] R.M. Bozinovic, S.N. Srihari, Off-line cursive script word
recognition, IEEE Trans. PAMI 11 (1) (1989) 68–83.
[10] T.A. Claasen, W.F. Mecklenbrauker, The Wigner distribution: a tool for time-frequency signal analysis, Philips J.
Res. 35 (Parts 1–3) (1980) 217–250, 276–300, and 372–389.
[11] L. Cohen, Generalized phase-space distribution functions,
J. Math. Phys. 7 (1966) 781–786.
[12] B. Boashash, B. Lovell, L. White, Time frequency analysis
and pattern recognition using singular value decomposition of the Wigner–Ville distribution, Advanced algorithms
and architecture for signal processing, Proc. SPIE 828
(1987) 104–114.
[13] G. Cristobal, J. Bescos, J. Santamaria, Application of
Wigner distribution for image representation and analysis,
Proceedings of the IEEE Eighth International Conference on
Pattern Recognition, 1986, pp. 998–1000.
[14] K.B. Yu, S. Cheng, Signal synthesis from Wigner distribution, Proceedings of the IEEE ICASSP 85, 1985,
pp. 1037–1040.
[15] P. Boles, B. Boashash, The cross Wigner–Ville
distribution - a two dimensional analysis method for the
processing of vibroseis seismic signals, Proceedings of the
IEEE ICASSP 87, 1988, pp. 904–907.
[16] O. Kenny, B. Boashash, An optical signal processing for
time–frequency signal analysis using the Wigner–Ville
distribution, Journal of Electrical Electronic Engineering,
Australia, 1988, pp. 152–158.
[17] R. Wilkinson, J. Geist, S. Janet, P. Grother, C. Burges,
R. Creecy, B. Hammond, J. Hull, N. Larsen, T. Vogl,
C. Wilson, The First Census Optical Character
Recognition Systems Conference, NISTIR 4912, The
U.S. Bureau of Census and the National Institute of Standards and Technology, Gaithersburg, MD, 1992.
[18] E. Kavallieratou, N. Fakotakis, G. Kokkinakis, An Integrated system for Handwritten Document Image Processing, under review.
About the Author: ERGINA KAVALLIERATOU was born in Kefalonia, Greece, in 1973. She received the Diploma in Electrical and
Computer Engineering in 1996 from the Polytechnic School of the University of Patras. She received her Ph.D. in Handwritten Optical
Character Recognition and Document Image Processing from the same department. During the academic year 1997–1998 she was
a member of the Signals, Systems and Radiocommunications Laboratory of the Dept. of Telecommunications Engineering of the
Polytechnic School of Madrid, working on Image Processing. Her research interests include Optical Character Recognition, Document
Image Analysis and Image Processing.
About the Author: GEORGE KOKKINAKIS was born in Chios, Greece. He received the Diploma in Electrical Engineering
(Dipl.-Ing.) in 1961, the Doctor's Degree in Engineering (Dr.-Ing.) in 1966 and the Diploma in Engineering Economics (Dipl.-Wirt.-Ing.)
in 1967, all from the Technical University of Munich (Technische Hochschule Munchen). Since 1969 he has been with the Department of
Electrical Engineering (Electrical and Computer Engineering since 1995) at the University of Patras, where he has organized and is
directing the Wire Communications Laboratory (WCL). His current activity in research and development includes, apart from
telecommunications subjects, the analysis, synthesis, recognition and linguistic processing of the Greek language. He has published
several books on Telecommunications and Electrotechnology and over 250 technical papers, articles and reports on Speech and
Language Technology and on Telecommunications. Dr. Kokkinakis is a senior member of IEEE and a member of the Technical
Chamber of Greece (TEE), the VDE (Verein Deutscher Elektrotechniker), ISCA (International Speech Communication Association), the
EURASIP (European Association for Signal Processing), the Linguistics Society of America (LSA) and the GORS (Greek Operations
Research Society). Since 1997 he has been a member of the board of ISCA.
About the Author: DR. NIKOS FAKOTAKIS received the B.Sc. degree from the University of London (UK) in Electronics in 1978, the
M.Sc. degree in Electronics from the University of Wales (UK), and the Ph.D. degree in Speech Processing from the University of Patras,
(Greece), in 1986. From 1986 to 1992 he was lecturer in the Electrical and Computer Engineering Dept. of the University of Patras, from 1992
to 1999 Assistant Professor and since 2000 he has been Associate Professor in the area of Speech and Natural Language Processing and Head
of the Speech and Language Processing Group at the Wire Communications Laboratory. Dr. Fakotakis is author of over 100 publications in
the area of Speech and Natural Language Engineering. His current research interests include Speech Recognition/Understanding, Speaker
Recognition, Spoken Dialogue Processing, Natural Language Processing and Optical Character Recognition. Dr. Fakotakis is
a member of the Executive Board of ELSNET (European Language and Speech Network of Excellence), Editor-in-Chief of the
"European Student Journal on Language and Speech", WEB-SLS. He is also a member of IEEE, TEE, EURASIP, ISCA.