Character recognition using a biorthogonal discrete wavelet transform
George S. Kapogiannopoulos
Department of Informatics
University of Athens, Hellas (Greece)
Manos Papadakis
Hellenic Military Academy
Hellas (Greece)
ABSTRACT
We present an approach to off-line OCR (Optical Character Recognition) for handwritten or printed characters, using the Biorthogonal Discrete Wavelet Transform (BDWT) for feature extraction and classification. Our aim is to optimize character recognition methods independently of printing styles (italics, bold), writing styles and fonts used. Characters are identified with their contours and are thus characterized by their curvature function. The curvature function is used for feature extraction, while classification is accomplished by LVQ algorithms. This method achieves high recognition accuracy and font insensitivity while requiring only a small training set of characters.
Key Words: wavelets, multiresolution analysis, OCR, shape representation, curvature.
1 INTRODUCTION
In this paper we present an off-line OCR system for individual handwritten or printed characters. We combine the identification of every closed curve (of finite length) with its curvature function, with a biorthogonal multiresolution structure oriented decomposition algorithm and a neural network based on the LVQ algorithm.
What is new in our system is the combination of the ability wavelet analysis offers in the selective enhancement of either main feature characteristics or fine details with the identification of an object with the curvature function of its boundary.
Feature characteristics determine recognition accuracy. These should be insensitive to the character variations that do not affect correct recognition by humans, and sensitive to the character variations that lead to incorrect recognition by humans. Humans have the ability to recognize isolated characters with an accuracy of 97% [1]. Robustness is satisfactory when the feature characteristics enable us to recognise characters of various writing styles effectively without the need of training for every individual style. This is a top-priority requirement, especially in handwritten documents where the variation of writing styles is large even for the same person. The theory presented in the following section and the experimental evidence presented in section 5 show that curvature is an accurate feature characteristic.
On the other hand, there is considerable evidence that human vision decomposes every image into several spatially oriented frequency channels, each having approximately the same width on a logarithmic scale, and that the visual information in different frequency bands is processed separately [9]. If we take as an example human vision and the way individuals recognise various letters, we see that they first get a general idea of the shape of the letter and then look for local details that distinguish the particular letter among other similar ones. For example, the letter 'C' is similar to the letter 'G', but what distinguishes these two letters are the local details in the letter 'G'.
Fourier descriptors were used extensively in OCR because, in a way, they implement the previous idea of the human vision mechanism. Indeed, similar letters have similar low-frequency components. In other words, the low-frequency components reflect the basic shape while the high-frequency components represent the details of the character [15,10,12,4]. The only difference between human vision and Fourier descriptors is that the Fourier descriptors work globally, while human vision works both globally and locally. From 1988 onwards wavelet analysis has become very popular among scientists working in signal processing. The wavelet transform has a major advantage over the Fourier transform in that it more closely resembles human vision [9].
Our OCR system consists of 3 subsystems. The first accomplishes curvature extraction of the contour of the character. The second implements the DBWT, and the last one is a nearest neighbour classifier which gives the desired recognition.
Experimental evidence gave us high accuracy levels. Moreover, combining the present form of our system with systems using other feature characteristics, such as the Euler number, the number of holes, etc., will lead to a complete system. We plan to develop such a system into an off-line OCR system for Byzantine handwritten documents and books where all characters are segmented.
2 MATHEMATICAL PRELIMINARIES
In 1992 Cohen, Daubechies and Feauveau [3] developed the biorthogonal multiresolution structure (BMRA). It is beyond the scope of the present paper to give a complete account of their results; we restrict ourselves to a brief account of the results we use. First set $Df(t) = \sqrt{2}\, f(2t)$, where $f$ is an arbitrary square-integrable function. It is easy to see that $D$ is a unitary operator.
Consider the trigonometric polynomials $m$ and $\tilde m$ satisfying

$$m(t)\,\tilde m(t) + m(t+\pi)\,\tilde m(t+\pi) = 1 \qquad (1)$$

We can select pairs $m$ and $\tilde m$ which define compactly supported scaling functions $\varphi$ and $\tilde\varphi$, constructed as in the orthonormal case by means of the formulas

$$\hat\varphi(\xi) = (2\pi)^{-1/2} \prod_{j=1}^{\infty} m(2^{-j}\xi) \qquad (2)$$

$$\hat{\tilde\varphi}(\xi) = (2\pi)^{-1/2} \prod_{j=1}^{\infty} \tilde m(2^{-j}\xi) \qquad (3)$$
such that both $\{\varphi_{0n} : n \in \mathbb{Z}\}$ and $\{\tilde\varphi_{0n} : n \in \mathbb{Z}\}$ are Riesz bases for the subspaces they generate and satisfy the biorthogonality condition $\langle \varphi_{0n}, \tilde\varphi_{0m} \rangle = \delta_{nm}$. If we set $V_0$ for the closed linear span of $\{\varphi_{0n} : n \in \mathbb{Z}\}$ and $\tilde V_0$ for the closed linear span of $\{\tilde\varphi_{0n} : n \in \mathbb{Z}\}$, with $V_j = D^j(V_0)$ and $\tilde V_j = D^j(\tilde V_0)$, then $\{V_j\}_j$ and $\{\tilde V_j\}_j$ are multiresolution analyses. If $P_j$ is the projection onto $V_j$ along $\tilde V_j^{\perp}$, then it is a bounded operator and for every square-integrable function $f$ we have $\lim_{j\to\infty} P_j f = f$.
There exist also functions $\psi$ and $\tilde\psi$ such that $\{\psi_{0n} : n \in \mathbb{Z}\}$ and $\{\tilde\psi_{0n} : n \in \mathbb{Z}\}$ are Riesz bases for the subspaces they generate, $W_0$ and $\tilde W_0$ respectively, and satisfy the following biorthogonality conditions:

$$\langle \psi_{0n}, \tilde\psi_{0m} \rangle = \delta_{nm}, \qquad \langle \varphi_{0n}, \tilde\psi_{0m} \rangle = 0$$

We also have $V_1 = V_0 + W_0$ and $\tilde V_1 = \tilde V_0 + \tilde W_0$. Both sums are direct but not necessarily orthogonal. In addition, $W_0$ is orthogonal to $\tilde V_0$ and $\tilde W_0$ is orthogonal to $V_0$. The latter direct sums are associated with a decomposition and exact reconstruction scheme implemented by a filter bank. Indeed, assume $f \in V_1$. Then $f = f_1 + f_2$, where $f_1$ belongs to $V_0$ and $f_2$ belongs to $W_0$. If in addition $f$ is compactly supported, then it can be represented by a finite linear combination of the basis elements of $V_1$. The decomposition algorithm implemented by two filters as in figure 1 gives the coefficients of the finite linear combinations of the basis elements of $V_0$ and $W_0$ which represent $f_1$ and $f_2$ respectively. The discrete Fourier transform of filter $h_1$ is the trigonometric polynomial $e^{it}\,\tilde m(t+\pi)$, while for $h_0$ it is the trigonometric polynomial $m$ (see equation 1). The biorthogonality properties of $\{\varphi_{0n}\}$, $\{\tilde\varphi_{0n}\}$, $\{\psi_{0n}\}$ and $\{\tilde\psi_{0n}\}$ prove the decomposition and reconstruction algorithms. For details on the subject one can refer to [3] and to sections 3.12.1 and 3.12.2 of [6]. Some of the results listed here follow directly from the results in [3] and their proofs.
Figure 1: Octave-band filter bank with J stages implementing the decomposition algorithm. At each stage the input is filtered by h0 and h1 and each output is downsampled by 2; the lowpass branch feeds the next stage.
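The decomposition scheme of figure 1 can be sketched in a few lines. This is a minimal illustration, not the paper's actual transform: the orthonormal Haar pair below is a stand-in for a biorthogonal pair (h0, h1) derived from m and m~, boundaries are handled by plain zero-padding via full convolution, and all function names are ours.

```python
import math

def convolve(x, h):
    """Full linear convolution of the sequence x with the filter h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def analysis_stage(x, h0, h1):
    """One stage of the two-channel analysis bank: filter with the
    lowpass h0 and highpass h1, then keep every second sample."""
    return convolve(x, h0)[::2], convolve(x, h1)[::2]

def decompose(x, h0, h1, J):
    """J-stage octave-band decomposition: each stage is applied to the
    lowpass (approximation) output of the previous one."""
    details = []
    for _ in range(J):
        x, d = analysis_stage(x, h0, h1)
        details.append(d)
    return x, details  # coarsest approximation + detail per stage

s = 1.0 / math.sqrt(2.0)
h0 = [s, s]   # Haar lowpass (placeholder for the biorthogonal pair)
h1 = [s, -s]  # Haar highpass (placeholder)
approx, details = decompose([4.0, 6.0, 10.0, 12.0], h0, h1, 2)
```

Each stage halves the sampling rate, so J stages split the signal into J octave bands plus a coarse residual, exactly the structure the classifier later consumes.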
Curvature is a main characteristic of space curves. When a curve is moved, even without changing its shape, the equations representing the coordinate functions of the curve change. This was already known two centuries ago, and it was asked whether there exist equations representing a space curve independently of the coordinate system, equivalently independently of the position of the curve in three-dimensional space. We need the system we built to be able (at least in the abstract sense) to recognize a character by its contour, which is a closed plane curve; thus we need a coordinate-independent description of the contour of each character. This led us to use the intrinsic equations of a curve and the following theorem, which is a milestone of classical differential geometry and is due to F. Gauss.
Theorem 2.1. (Fundamental theorem for space curves) If two single-valued continuous functions $k(s)$ and $\tau(s)$, $s > 0$, are given, then there exists one and only one space curve, determined but for its position in space, for which $s$ is the arc-length (measured from an appropriate point on the curve), $k$ is the curvature function and $\tau$ is the torsion.
The equations $k = k(s)$ and $\tau = \tau(s)$ are the intrinsic equations of a space curve. Plane curves have no torsion. Applying the fundamental theorem to character contours, we obtain that these are characterized completely by their curvature function parameterised by the arc-length. Let $x$ be a plane curve parametrised by its arc-length $s$. If $\varphi(s)$ is the angle between the tangent at $x(s)$ and the positive horizontal half-axis, then the curvature $k(s)$ at $x(s)$ is given by $k(s) = \frac{d\varphi}{ds}(s)$. For a complete introduction to our differential geometry preliminaries and a proof of the fundamental theorem one can refer to the classical book of Dirk Struik [13].
3 PREPROCESSING
Each character is identied with its contour. We describe the contour extraction procedure later. Every
such contour is a closed curve with length thus by the theorem 2.1 is characterized completely by its curvature
function. This is the principal feature characteristic we use for every character. As we have already noticed it is
independent of the position of the character, thus we do not have to deal with the coordinate functions of each
character contour, its rotations and translations. An additional advantage is that we identify a two-dimensional
object with an one-parameter real valued function. This also reduces computational complexity. More precisely
we use a piecewise linear approximation of the contour of each character (training or test) from which we extract a
ne approximation of the curvature function. This procedure will be described next in detail. On the other hand
some authors14 use the 2-dimensional representation of the character and Wavelet transform for each coordinate
function. Others10,12 consider curves as subsets of the complex plane and use Fourier transform in the complex
domain.
First we approximate the contour by tracing the boundary [11] and then we code it using Freeman's chain code [5]. Next we describe the chain coding procedure.
On a rectangular grid a pixel is said to be 4-connected or 8-connected to another pixel with the same properties when that pixel is one of its nearest four or eight neighbors [7].
Figure 2: Directions corresponding to an 8-connected chain code. The eight directions are numbered 0-7 counterclockwise, with 0 pointing in the positive horizontal direction.
The chain code can be considered as a special kind of piecewise linear approximation of a curve, in which all linear segments have length 1 or $\sqrt{2}$ and all angular changes are multiples of a certain angle (45 degrees in 8-connected paths). In chain coding we encode the direction vectors between successive boundary pixels. Usually in chain coding, if we consider 8-connected paths, we employ eight directions, which can be coded in 3-bit code words. It is clear that the chain code depends on the pixel of the contour selected as starting pixel. In the case of the 8-connected chain code (figure 2), segments having code 0, 1 or 7 (or 3, 4 or 5) contribute to the character's width, whereas segments having code 1, 2 or 3 (or 5, 6 or 7) contribute to the character's height. For a certain segment of a curve described with chain code, with length $s$ pixels, the coordinate-wise differences $w_{i,s}$, $h_{i,s}$ between the starting pixel $p_{i-s}$ and the ending pixel $p_i$ of the segment are determined by taking the chain codes $c_j$ between these two points and applying the following formulas:
$$w_{i,s} = \sum_j \Delta w_j, \quad \text{where } \Delta w_j = \begin{cases} +1 & \text{if } c_j \in \{0, 1, 7\} \\ -1 & \text{if } c_j \in \{3, 4, 5\} \\ 0 & \text{otherwise} \end{cases} \qquad (4)$$

$$h_{i,s} = \sum_j \Delta h_j, \quad \text{where } \Delta h_j = \begin{cases} +1 & \text{if } c_j \in \{1, 2, 3\} \\ -1 & \text{if } c_j \in \{5, 6, 7\} \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$
The length of the arc of the contour at a given pixel is set equal to the number of pixels contained in the
contour between the starting pixel and the given one.
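Equations (4) and (5) amount to a simple accumulation over the chain codes of a segment. A small sketch (the function name is ours, not the paper's):

```python
# Coordinate-wise differences of equations (4) and (5): given the chain
# codes c_j of a segment (8-connected code of figure 2), accumulate the
# +/-1 horizontal and vertical steps between its endpoints.

def segment_differences(codes):
    """Return (w, h): x- and y-differences over a chain-code segment."""
    w = h = 0
    for c in codes:
        if c in (0, 1, 7):
            w += 1          # eastward component
        elif c in (3, 4, 5):
            w -= 1          # westward component
        if c in (1, 2, 3):
            h += 1          # northward component
        elif c in (5, 6, 7):
            h -= 1          # southward component
    return w, h

# Three steps east followed by two steps north-east:
w, h = segment_differences([0, 0, 0, 1, 1])  # -> (5, 2)
```

Codes 2 and 6 leave the width unchanged, and codes 0 and 4 leave the height unchanged, which is the implicit "0 otherwise" branch of the formulas.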
3.1 Curvature extraction and normalization
The curvature function of every closed plane curve is periodic. By segment length we mean a fixed number of consecutive pixels.
We use the chain code of a given contour to approximate the curvature function. In order to approximate the curvature of the contour, we must select a segment length over which we will evaluate the curvature function $k(s)$ defined in section 2 (figure 3). The segment length determines how smooth the curvature function is: greater segment lengths give smoother approximations of the curvature function. In the special case where the segment length is 1 pixel, the curvature changes are multiples of 45 degrees. Characters of different size, and thus contours of different length, have chain codes of different length. We must select a segment length for evaluating the curvature that optimises the removal of the noise due to chain quantization, while at the same time preserving significant fine detail and remaining scale invariant. We select $s = N/8$ as the segment length for evaluating the curvature function, where $N$ is the total number of pixels found during boundary tracing.
Figure 3: Tangent values $e_{i,s}$ and $e_{i-1,s}$ with segment size $s = 6$.
Next we describe the procedure for approximating the curvature function. Suppose we start from the point $p_{i-s}$, and the selected segment has length equal to $s$ pixels. Following the chain code from the point $p_{i-s}$ for $s$ pixels to the point $p_i$ and using formulas (4) and (5), we estimate the curvature at $p_{i-s}$ by the following equation:

$$k_{i,s} = \arctan\frac{h_{i,s}}{w_{i,s}} - \arctan\frac{h_{i-1,s}}{w_{i-1,s}}, \qquad i = 0, 1, \ldots, N-1$$

It follows easily that this procedure gives a fine approximation of the curvature function of the given contour. We call this procedure curvature extraction.
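Curvature extraction can be sketched as follows. This is an illustrative reading of the equation above, not the paper's implementation: function names are ours, and atan2 is used in place of a plain arctan of the quotient so that vertical segments (w = 0) and full quadrants are handled.

```python
import math

def curvature(chain, i, s):
    """Estimate k_{i,s} from the 8-connected chain code of a closed
    contour: the difference of the tangent angles of two consecutive
    length-s segments ending at pixels i and i-1."""
    n = len(chain)

    def diffs(start):
        # coordinate-wise differences of equations (4) and (5) over the
        # s codes following position `start` (indices wrap around).
        w = h = 0
        for j in range(start, start + s):
            c = chain[j % n]
            if c in (0, 1, 7):
                w += 1
            elif c in (3, 4, 5):
                w -= 1
            if c in (1, 2, 3):
                h += 1
            elif c in (5, 6, 7):
                h -= 1
        return w, h

    w1, h1 = diffs(i - s)      # segment ending at pixel i
    w0, h0 = diffs(i - s - 1)  # segment ending at pixel i - 1
    return math.atan2(h1, w1) - math.atan2(h0, w0)

# A straight horizontal run has zero curvature:
k = curvature([0] * 16, 8, 4)  # -> 0.0
```

On a counterclockwise-traced contour, a left turn yields a positive angle difference, matching the sign convention of $k(s) = d\varphi/ds$.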
The input vector for the DBWT (Discrete Biorthogonal Wavelet Transform) is the vector whose coordinates are the estimated values of the curvature of the contour at the corresponding pixels, given by the previous equation. Since the size of the input vector determines the high-resolution level from which the DBWT starts, we need all our input vectors to have the same size. We use input vectors of $N = 512$ coordinates for our DBWT. If after the curvature extraction we have an input vector with fewer than 512 coordinates, we create another input vector of exactly 512 coordinates by linear interpolation. We call this procedure curvature normalization.
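The curvature normalization step is a plain linear interpolation to a fixed length; a minimal sketch (the function name is ours):

```python
# Curvature normalization: resample a curvature vector of arbitrary
# length to exactly 512 coordinates by linear interpolation, so that all
# input vectors to the DBWT have the same size. Endpoints map to
# endpoints; interior samples are interpolated between neighbours.

def normalize_length(values, target=512):
    """Linearly interpolate the sequence `values` to `target` samples."""
    n = len(values)
    if n == target:
        return list(values)
    out = []
    for k in range(target):
        pos = k * (n - 1) / (target - 1)  # position in original index space
        i = int(pos)
        frac = pos - i
        if i + 1 < n:
            out.append(values[i] * (1.0 - frac) + values[i + 1] * frac)
        else:
            out.append(values[-1])
    return out

resampled = normalize_length([0.0, 0.5, 1.0, 0.5, 0.0])  # 5 -> 512 samples
```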
The description of the contour of every character via the curvature function has the significant advantages which we already mentioned. However, the selection of the starting pixel of the chain code of the contour gives shifted versions of the approximations of the curvature function. To deal with this particular difficulty we adopt a uniform selection rule for the starting pixel during chain coding: we select as starting pixel the first bottom pixel found when scanning from left to right.
3.2 Energy normalization
Many problems arise from the size of the letters and the quality of their quantization. Letters of different size with the same quality of quantization may have the same curvature function, because of the variable segment size we use to extract the curvature function. Independently of the size of the character, if the quantization is sparse we will have peaks which do not correspond to real character features. These peaks significantly affect the absolute value of the wavelet coefficients. On the other hand, if we have well-quantized characters the presence of peaks in the approximation of the curvature function is reduced; in this case peaks correspond to important character features we want to detect. Our aim is to develop a feature extraction system that is ultimately insensitive to quantization quality. The peaks of the curvature function significantly increase its total energy. Since our classifier uses the Euclidean distance, which measures the energy of the difference between two vectors representing curvature functions, we need to bring all vectors to the same energy level. This procedure is called energy normalization. We accomplish this in the following way. The energy $E$ of the curvature function of every character is given by the following equation:
$$E = \frac{1}{N} \sum_{i=0}^{N-1} |k(i)|^2$$

The values of the curvature function must be scaled so that its energy equals a fixed energy level $E_t$. This is done by multiplying each value of the curvature function by $\sqrt{E_t/E}$. This makes classification more accurate.
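The energy normalization above reduces to two short functions; a sketch under the assumption of a target level $E_t = 1$ (the names and the default are ours):

```python
import math

# Energy normalization: scale the curvature vector so that its mean
# energy equals a fixed target level Et. The flat-curvature case E = 0
# is left unscaled to avoid division by zero.

def energy(k):
    """Mean energy E = (1/N) * sum |k(i)|^2."""
    return sum(v * v for v in k) / len(k)

def normalize_energy(k, Et=1.0):
    """Multiply every value of k by sqrt(Et / E)."""
    E = energy(k)
    if E == 0.0:
        return list(k)
    factor = math.sqrt(Et / E)
    return [v * factor for v in k]
```

After this step the Euclidean distance between two vectors compares the shapes of their curvature functions rather than their overall amplitudes.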
4 FEATURE EXTRACTION AND CLASSIFICATION
The wavelet representation by the DBWT of the curvature function has the additional advantage that variations in the shape of the curve cause only minor changes in the wavelet representation. Indeed, experimental results show that the wavelet representation is robust and insensitive to noise. In addition, we need only a small training set for high recognition accuracy (see the next section). After energy normalization, the curvature function is decomposed by means of the biorthogonal wavelet transform. In our experiments we used the biorthogonal Daubechies pair (3,9). We decompose every input vector into 9 resolution levels, but for classification we use only the 5 coarsest levels (figure 4). Even if we use only the 4 coarsest levels we do not experience a significant accuracy loss.
The output vectors of the 5 coarsest levels constitute a single vector of 32 coordinates. These vectors, corresponding either to training or test characters, are the inputs to the classification subsystem, which is the last component of our system.

Figure 4: Wavelet analysis of two characters. a) Contour description. b) Curvature function. c) Curvature function interpolated to 512 points. d) Curvature function after energy normalization. e) Biorthogonal wavelet coefficients.
For classification we use an LVQ (Learning Vector Quantization) classifier [8]. LVQ is an auto-associative nearest neighbour classifier that classifies $m$ arbitrary patterns into $p$ classes using an error-correction encoding procedure.
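A minimal LVQ1 training loop can be sketched as follows. This is a generic sketch, not the paper's classifier: the Euclidean distance, the constant learning rate, the number of epochs, and all names are our assumptions, since the paper does not specify the training schedule here.

```python
# LVQ1 sketch: each class is represented by labelled prototype vectors.
# For every training sample, the nearest prototype is moved toward the
# sample if their labels agree, and away from it otherwise (the
# error-correction rule); classification is nearest-prototype.

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lvq1_train(prototypes, proto_labels, samples, sample_labels,
               lr=0.1, epochs=20):
    """Train prototypes with the LVQ1 error-correction rule."""
    protos = [list(p) for p in prototypes]
    for _ in range(epochs):
        for x, y in zip(samples, sample_labels):
            i = min(range(len(protos)), key=lambda j: dist2(protos[j], x))
            sign = 1.0 if proto_labels[i] == y else -1.0
            for d in range(len(x)):
                protos[i][d] += sign * lr * (x[d] - protos[i][d])
    return protos

def classify(protos, proto_labels, x):
    """Nearest-prototype (nearest neighbour) classification."""
    i = min(range(len(protos)), key=lambda j: dist2(protos[j], x))
    return proto_labels[i]
```

In our setting the input vectors would be the 32-coordinate wavelet feature vectors and the classes the character labels.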
5 EXPERIMENTS
For our experiments we use a sample dataset of characters provided free of charge by NIST. The dataset was collected from 49 different persons and contains 2870 distinct samples of handwritten digits (figure 5). The training set used was usually about 1/8 of the whole dataset. The training set and the test set were disjoint.
Figure 5: A random sample of the characters used.
Experiment   Training set   Test set   Accuracy in training set   Accuracy in test set
    1            317          2554             98.42%                   93.56%
    2            306          2564             98.37%                   93.71%
    3            976          1894             97.95%                   95.62%
    4            365          2505             98.36%                   93.87%
    5            362          2508             97.24%                   92.30%
    6            362          2508             96.97%                   86.84%
In experiments 1-3 we take as training set randomly selected samples from the whole dataset of the 49 persons, while in experiments 4-6 we take samples from the first 10-12 persons. The accuracy results, which are almost independent of the selection of the training set and the number of persons, show that the wavelet representation is robust, insensitive to noise and writing style, and requires only a small training set for good recognition accuracy. Experiment 6 reveals the superiority of representing the character by its curvature function compared to representing the contour as a 2-dimensional parametric curve: in this experiment we lose about 5% in accuracy if we apply the same wavelet decomposition to each coordinate function instead of the curvature function.
The feature extraction subsystem takes into account only one characteristic, the DBWT coefficients. Geometrical features like the Euler number or the number of holes of a character have been ignored. It is important to notice that such features do not characterize all elements of the dataset representing a single character. For example, a character may be drawn thick enough that no holes exist.
6 CONCLUSIONS
The proposed OCR system has the advantage of being (at least in the abstract sense) independent of character translations and rotations. However, the uniform rule for the selection of the starting pixel in the contour extraction procedure does not allow us to get the full benefit of its translation and rotation invariance properties. The curvature representation of characters, being a real-valued function, is significantly more efficient compared with 2-dimensional representations of character contours. The experimental results are very encouraging considering that the size of the database is small while the character variations are large. Wavelet analysis proved to be robust and insensitive to noise and writing style. The accuracy was very high even in cases where the training sets used were small compared to the whole dataset. In other words, the system succeeded even with unfamiliar writing styles. Wavelet analysis of the curvature function seems to capture effectively the main feature characteristics at coarser scales. This allows us to represent characters with small-sized vectors, which leads to effective classification with an LVQ-type classifier.
In our future work we will experiment with wavelet packets, which may increase the accuracy and robustness of the system. Morphological and geometrical features, e.g. the Euler number and features associated with character skeletons, will be considered as supplementary characteristics in order to improve accuracy. Neural network classification will be used at every level of the wavelet decomposition, from the coarsest scales to the finer ones; at each level the classifier will reject mismatching characters. This will result in faster and more accurate classification. We also expect very low levels of misclassification. This is of particular importance for an OCR system, because rejections are interpreted as errors, while misclassifications pass undetected [2].
ACKNOWLEDGMENT: The authors thank Professor N. Kalouptsidis (Department of Informatics, University of Athens, Greece) for his constant support, encouragement and helpful discussions during the preparation of the present paper.
7 REFERENCES
[1] Barry A. Blesser, Theodore T. Kuklinski, and Robert J. Shillman, "Empirical Tests for Feature Selection Based on a Psychological Theory of Character Recognition", Pattern Recognition, Vol. 8, pp. 77-85, 1976.
[2] Mindy Bokser, "Omnidocument Technologies", Proceedings of the IEEE, Vol. 80, No. 7, pp. 1066-1078, July 1992.
[3] A. Cohen, I. Daubechies and J.-C. Feauveau, "Biorthogonal bases of compactly supported wavelets", Comm. Pure and Appl. Mathematics, Vol. XLV, pp. 485-560, 1992.
[4] Thomas R. Crimmins, "A Complete Set of Fourier Descriptors for Two-Dimensional Shapes", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-12, No. 6, pp. 848-855, November/December 1982.
[5] H. Freeman, "Computer processing of line-drawing images", ACM Computing Surveys, Vol. 6, pp. 57-97, March 1974.
[6] M. Holschneider, Wavelets: An Analysis Tool, Oxford Science Publications, 1995.
[7] Anil K. Jain, Fundamentals of Digital Image Processing, Prentice-Hall, 1989.
[8] Teuvo Kohonen, "The self-organizing map", Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464-1480, 1990.
[9] Stephane G. Mallat, "Multifrequency Channel Decompositions of Images and Wavelet Models", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 12, December 1989.
[10] Eric Persoon and King-Sun Fu, "Shape Discrimination Using Fourier Descriptors", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-7, No. 3, pp. 170-179, March 1977.
[11] I. Pitas, Digital Image Processing Algorithms, Prentice Hall, 1992.
[12] Michael T. Y. Lai and Ching Y. Suen, "Automatic Recognition of Characters by Fourier Descriptors and Boundary Line Encodings", Pattern Recognition, Vol. 14, Nos. 1-6, pp. 383-393, 1981.
[13] D. J. Struik, Lectures on Classical Differential Geometry, Addison-Wesley, 1961.
[14] Patrick Wunsch and Andrew F. Laine, "Wavelet Descriptors for Multiresolution Recognition of Handprinted Characters", Pattern Recognition, Vol. 28, No. 8, pp. 1237-1249, 1995.
[15] Charles T. Zahn and Ralph Z. Roskies, "Fourier Descriptors for Plane Closed Curves", IEEE Transactions on Computers, Vol. C-21, No. 4, pp. 269-281, March 1972.
Mailing addresses:
George S. Kapogiannopoulos
18 Megdoba str.
GR-13122 ILION
HELLAS (GREECE)
email: [email protected]
Manos Papadakis
93 Abidou str.
GR-15771 Zografou
HELLAS (GREECE)
email: [email protected]