Character recognition using a biorthogonal discrete wavelet transform

George S. Kapogiannopoulos, Department of Informatics, University of Athens, Hellas (Greece)
Manos Papadakis, Hellenic Military Academy, Hellas (Greece)

ABSTRACT

We present an approach to off-line OCR (Optical Character Recognition) for hand-written or printed characters, using the Biorthogonal Discrete Wavelet Transform (BDWT) for feature extraction and classification. Our aim is to optimize character recognition methods independently of printing styles (italics, bold), writing styles and fonts used. Characters are identified with their contours and are thus characterized by their curvature function. The curvature function is used for feature extraction, while classification is accomplished by LVQ algorithms. This method achieves high recognition accuracy and font insensitivity while requiring only a small training set of characters.

Key Words: wavelets, multiresolution analysis, OCR, shape representation, curvature.

1 INTRODUCTION

In this paper we present an off-line OCR system for individual handwritten or printed characters. We combine the identification of every closed curve (with length) with its curvature function, a biorthogonal multiresolution structure oriented decomposition algorithm, and a neural network based on the LVQ algorithm. What is new in our system is the combination of the selective enhancement that wavelet analysis offers, of either main feature characteristics or fine details, with the identification of an object with the curvature function of its boundary. Feature characteristics determine recognition accuracy. These should be insensitive to the character variations that do not affect correct recognition by humans, and sensitive to the character variations that lead to incorrect recognition by humans.
Humans have the ability to recognize isolated characters with an accuracy of 97%.1 Robustness is satisfactory when the feature characteristics enable us to recognise characters of various writing styles effectively, without the need of training for every individual style. This is a top-priority requirement, especially for handwritten documents, where the variation of writing styles is large even for the same person. The theory presented in the following section and the experimental evidence presented in section 5 show that curvature is an accurate feature characteristic. On the other hand, there is enough evidence that human vision decomposes every image into several spatially oriented frequency channels, each having approximately the same width on a logarithmic scale, and that the visual information in different frequency bands is processed separately.9 If we take as an example human vision and the way individuals recognise various letters, we see that they first get a general idea of the shape of the letter and then look for the local details that distinguish the particular letter among similar ones. For example, letter 'C' is similar to letter 'G', but what distinguishes these two letters are the local details of letter 'G'. Fourier descriptors were used extensively in OCR because in a way they implement this idea of the human vision mechanism. Indeed, similar letters have similar low-frequency components. In other words, the low-frequency components reflect the basic shape, while the high-frequency components represent the details of the character.15,10,12,4 The only difference between human vision and Fourier descriptors is that Fourier descriptors work globally, while human vision works both globally and locally. From 1988 onwards, wavelet analysis has become very popular among scientists working in signal processing.
The wavelet transform has a major advantage over the Fourier transform, in that it resembles human vision.9 Our OCR system consists of 3 subsystems. The first accomplishes curvature extraction from the contour of the character. The second implements the DBWT, and the last one is a nearest-neighbour classifier which gives the desired recognition. Experimental evidence gave us high accuracy levels. On the other hand, combining the present form of our system with systems using other feature characteristics, such as the Euler number, the number of holes, etc., will lead to a complete system. We plan to develop such a system into an off-line OCR system for Byzantine hand-written documents and books where all characters are segmented.

2 MATHEMATICAL PRELIMINARIES

In 1992 Cohen, Daubechies and Feauveau3 developed the biorthogonal multiresolution structure (BMRA). It is beyond the purpose of the present paper to give a complete account of their results; we restrict ourselves to a brief account of the results we use. First set (Df)(t) = \sqrt{2} f(2t), where f is an arbitrary square-integrable function. It is easy to see that D is a unitary operator. Consider trigonometric polynomials m and \tilde{m} satisfying

m(\xi)\tilde{m}(\xi) + m(\xi + \pi)\tilde{m}(\xi + \pi) = 1    (1)

We can select pairs m and \tilde{m} which define compactly supported scaling functions \phi and \tilde{\phi}, constructed as in the orthonormal case by means of the formulas

\hat{\phi}(\xi) = (2\pi)^{-1/2} \prod_{j=1}^{\infty} m(2^{-j}\xi)    (2)

\hat{\tilde{\phi}}(\xi) = (2\pi)^{-1/2} \prod_{j=1}^{\infty} \tilde{m}(2^{-j}\xi)    (3)

such that both \{\phi_{0n} : n \in Z\} and \{\tilde{\phi}_{0n} : n \in Z\} are Riesz bases for the subspaces they generate and satisfy the biorthogonality condition \langle \phi_{0n}, \tilde{\phi}_{0m} \rangle = \delta_{nm}. If we set V_0 for the closed linear span of \{\phi_{0n} : n \in Z\} and \tilde{V}_0 for the closed linear span of \{\tilde{\phi}_{0n} : n \in Z\}, with V_j = D^j(V_0) and \tilde{V}_j = D^j(\tilde{V}_0), then \{V_j\}_j and \{\tilde{V}_j\}_j are multiresolution analyses. If P_j is the projection onto V_j along \tilde{V}_j^{\perp}, then it is a bounded operator and for every square-integrable function f we have \lim_{j \to \infty} P_j f = f.
There also exist functions \psi and \tilde{\psi} such that \{\psi_{0n} : n \in Z\} and \{\tilde{\psi}_{0n} : n \in Z\} are Riesz bases for the subspaces they generate, W_0 and \tilde{W}_0 respectively, and satisfy the following biorthogonality conditions:

\langle \psi_{0n}, \tilde{\psi}_{0m} \rangle = \delta_{nm},   \langle \phi_{0n}, \tilde{\psi}_{0m} \rangle = 0

We also have V_1 = V_0 + W_0 and \tilde{V}_1 = \tilde{V}_0 + \tilde{W}_0. Both sums are direct but not necessarily orthogonal. In addition, W_0 is orthogonal to \tilde{V}_0 and \tilde{W}_0 is orthogonal to V_0. The latter direct sums are associated with a decomposition and exact reconstruction scheme implemented by a filter bank. Indeed, assume f \in V_1. Then f = f_1 + f_2, where f_1 belongs to V_0 and f_2 belongs to W_0. If in addition f is compactly supported, then it can be represented by a finite linear combination of the basis elements of V_1. The decomposition algorithm, implemented by two filters as in figure 1, gives the coefficients of the finite linear combinations of the basis elements of V_0 and W_0 which represent f_1 and f_2 respectively. The discrete Fourier transform of filter h_1 is the trigonometric polynomial e^{i\xi} \tilde{m}(\xi + \pi), while for h_0 it is the trigonometric polynomial m (see equation 1). The biorthogonality properties of \{\phi_{0n} : n \in Z\}, \{\tilde{\phi}_{0n} : n \in Z\}, \{\psi_{0n} : n \in Z\} and \{\tilde{\psi}_{0n} : n \in Z\} prove the decomposition and reconstruction algorithms. For details on the subject one can refer to3 and to sections 3.12.1 and 3.12.2 of.6 Some of the results listed here follow directly from the results in3 and their proofs.

Figure 1: Octave-band filter bank with J stages (each stage filters with h_1 and h_0 and downsamples by 2) implementing the decomposition algorithm.

Curvature is a main characteristic of space curves. When a curve is moved, even without changing its shape, the equations representing the coordinate functions of the curve change. This was already known two centuries ago, and it was conjectured whether there exist equations representing a space curve independently of the coordinate system, equivalently independently of the position of the curve in three-dimensional space.
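The one-stage split of f into f_1 + f_2 can be illustrated in code. The following sketch (ours, not the paper's implementation) performs one analysis stage of the filter bank of figure 1: convolve with each filter, then downsample by 2. For brevity it uses the orthonormal Haar pair as a stand-in, since the actual biorthogonal (3,9) filter taps are not listed in the text.

```python
import math

# Sketch of one analysis stage of the filter bank in figure 1:
# convolve the input with each filter, then downsample by 2.
def analysis_stage(signal, h0, h1):
    """Return (approximation, detail) coefficients of one filter-bank stage."""
    def conv_down(x, h):
        n, m = len(x), len(h)
        full = [sum(h[k] * x[i - k] for k in range(m) if 0 <= i - k < n)
                for i in range(n + m - 1)]
        return full[::2]                     # keep every second sample
    return conv_down(signal, h0), conv_down(signal, h1)

# Orthonormal Haar pair as an illustrative stand-in for the (3,9) pair.
s = math.sqrt(2.0)
approx, detail = analysis_stage([1.0, 1.0, 2.0, 2.0], [1 / s, 1 / s], [1 / s, -1 / s])
```

Iterating the lowpass branch J times yields the octave-band structure of figure 1.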
We need the system we built to be able (at least in the abstract sense) to recognize a character by its contour, which is a closed plane curve; thus we need a coordinate-independent description of the contour of each character. This led us to use the intrinsic equations of a curve and the following theorem, which is a milestone of classical differential geometry and is due to F. Gauss.

Theorem 2.1. (Fundamental theorem for space curves) If two single-valued continuous functions k(s) and \tau(s), s > 0, are given, then there exists one and only one space curve, determined but for its position in space, for which s is the arc-length (measured from an appropriate point on the curve), k is the curvature function and \tau is the torsion.

The equations k = k(s) and \tau = \tau(s) are the intrinsic equations of a space curve. Plane curves have no torsion. Applying the fundamental theorem to character contours, we obtain that these are characterized completely by their curvature function parameterised by the arc-length. Let x be a plane curve parametrised by its arc-length s. If \varphi(s) is the angle between the tangent at x(s) and the positive horizontal half-axis, then the curvature k(s) at x(s) is given by k(s) = \frac{d\varphi}{ds}(s). For a complete introduction to our differential geometry preliminaries and a proof of the fundamental theorem one can refer to the classical book13 of Dirk Struik.

3 PREPROCESSING

Each character is identified with its contour. We describe the contour extraction procedure later. Every such contour is a closed curve with length, and thus by Theorem 2.1 it is characterized completely by its curvature function. This is the principal feature characteristic we use for every character. As we have already noticed, it is independent of the position of the character, so we do not have to deal with the coordinate functions of each character contour, its rotations and translations. An additional advantage is that we identify a two-dimensional object with a one-parameter real-valued function.
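As a quick numerical sanity check of k(s) = d\varphi/ds (our illustration, not taken from the paper): for a circle of radius r traversed at unit speed, the tangent angle grows linearly with arc length, so a finite difference of \varphi over one arc-length step recovers the constant curvature 1/r.

```python
import math

# For a circle of radius r at unit speed, phi(s) = s / r + const,
# so the finite difference d(phi)/ds recovers the curvature k = 1/r.
def circle_curvature(r, n=1000):
    ds = 2 * math.pi * r / n                 # arc-length step
    phi = [i * ds / r for i in range(n)]     # tangent angles phi(s) = s / r
    return (phi[1] - phi[0]) / ds            # dphi/ds, approximately k

print(circle_curvature(2.0))
```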
This also reduces computational complexity. More precisely, we use a piecewise linear approximation of the contour of each character (training or test), from which we extract a fine approximation of the curvature function. This procedure will be described next in detail. On the other hand, some authors14 use the 2-dimensional representation of the character and the wavelet transform of each coordinate function. Others10,12 consider curves as subsets of the complex plane and use the Fourier transform in the complex domain. First we approximate the contour by tracing the boundary11 and then we code it using Freeman's chain code.5 Next we describe the chain coding procedure. On a rectangular grid a pixel is said to be 4-connected or 8-connected when it has the same properties as one of its nearest four or eight neighbors.7

Figure 2: Directions 0-7 corresponding to an 8-connected chain code.

The chain code can be considered as a special kind of piecewise linear approximation of a curve, in which all linear segments have length 1 or \sqrt{2} and all angular changes are multiples of a certain angle (45 degrees in 8-connected paths). In chain coding we encode the direction vectors between successive boundary pixels. Usually in chain coding, if we consider 8-connected paths, we employ eight directions, which can be coded in 3-bit code words. It is clear that the chain code depends on the pixel of the contour selected as starting pixel. In the case of the 8-connected chain code (figure 2), segments having code 0,1,7 (or 3,4,5) contribute to the character's width, whereas segments having code 1,2,3 (or 5,6,7) contribute to the character's height.
For a certain segment of a curve described with chain code, with length s pixels, the coordinate-wise differences w_{i,s}, h_{i,s} between the starting pixel p_{i-s} and the ending pixel p_i of the segment are determined by taking the chain codes c_j between these two points and the following algorithm:

w_{i,s} = \sum_j \Delta w_j, where \Delta w_j = +1 if c_j \in \{0, 1, 7\}, -1 if c_j \in \{3, 4, 5\}    (4)

h_{i,s} = \sum_j \Delta h_j, where \Delta h_j = +1 if c_j \in \{1, 2, 3\}, -1 if c_j \in \{5, 6, 7\}    (5)

The length of the arc of the contour at a given pixel is set equal to the number of pixels contained in the contour between the starting pixel and the given one.

3.1 Curvature extraction and normalization

The curvature function of every closed plane curve is periodic. By segment length we mean a fixed number of consecutive pixels. We use the chain code of a given contour to approximate the curvature function. In order to approximate the curvature of the contour we must select a segment length over which we will evaluate the curvature function k(s) defined in section 2 (figure 3). The segment length determines how smooth the curvature function is. Greater segment lengths give smoother approximations of the curvature function. In the special case where the segment length is 1 pixel, the curvature changes are multiples of 45 degrees. Characters of different size, and thus contours of different length, have chain codes of different length. We must select a segment length for evaluating the curvature that optimises the removal of the noise due to chain quantization, while at the same time preserving significant fine detail and being scale invariant. We select s = N/8 as the segment length over which to evaluate the curvature function, where N is the total number of pixels found during boundary tracing.

Figure 3: Tangent values e_{i,s} and e_{i-1,s} with segment size s = 6.

Next we describe the procedure of the approximation of the curvature function.
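Equations (4) and (5) can be sketched directly in code: accumulate +1 or -1 per chain-code symbol over a segment of the 8-connected code. The function name and example codes below are our own illustration.

```python
# Sketch of the coordinate-wise differences of equations (4) and (5).
def segment_differences(chain_segment):
    """Return (w, h): horizontal and vertical displacement of a chain-code segment."""
    w = h = 0
    for c in chain_segment:
        if c in (0, 1, 7):      # codes with a rightward component
            w += 1
        elif c in (3, 4, 5):    # codes with a leftward component
            w -= 1
        if c in (1, 2, 3):      # codes with an upward component
            h += 1
        elif c in (5, 6, 7):    # codes with a downward component
            h -= 1
    return w, h

# Two steps with code 0 and one with code 1:
print(segment_differences([0, 0, 1]))   # → (3, 1)
```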
Suppose we start from point p_{i-s}, and the selected segment has length equal to s pixels. Following the chain code from point p_{i-s} for s pixels to the point p_i, and using formulas (4) and (5), we estimate the curvature k_{i,s} by the following equation:

k_{i,s} = \arctan\frac{h_{i,s}}{w_{i,s}} - \arctan\frac{h_{i-1,s}}{w_{i-1,s}},   i = 0, 1, \ldots, N-1

It follows easily that this procedure gives a fine approximation of the curvature function of the given contour. We call this procedure curvature extraction. The input vector for the DBWT (Discrete Biorthogonal Wavelet Transform) is the vector whose coordinates are the estimated values of the curvature of the contour at the corresponding pixels, given by the previous equation. Since the size of the input vector determines the high-resolution level from which the DBWT starts, we need all our input vectors to have the same size. We use input vectors of N = 512 coordinates for our DBWT. If after the curvature extraction we have an input vector with fewer than 512 coordinates, we create a new input vector of exactly 512 coordinates by linear interpolation. We call this procedure curvature normalization. The description of the contour of every character via the curvature function has the significant advantages which we already mentioned. However, the selection of the starting pixel of the chain code of the contour gives shifted versions of the approximations of the curvature function. To deal with this particular difficulty we adopt a uniform selection rule for the starting pixel during chain coding: we select as starting pixel the first bottom pixel when scanning from left to right.

3.2 Energy normalization

Many problems arise from the size of the letters and the quality of their quantization. Differently sized letters with the same quality of quantization may have the same curvature function, because of the variable segment size we use to extract the curvature function.
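A minimal sketch of curvature extraction over a closed chain code, under our own assumptions: the segment differences are accumulated as in (4)-(5), and atan2 is used in place of arctan(h/w) so that vertical segments (w = 0) are handled, a detail the paper does not spell out.

```python
import math

# Estimate k_{i,s} = angle(i) - angle(i-1) over a closed 8-connected chain
# code, where angle(i) is the direction of the segment of s codes ending at i.
def curvature(chain, s):
    n = len(chain)
    def angle(i):
        seg = [chain[(i - j) % n] for j in range(s)]   # s codes ending at pixel i
        w = sum(1 if c in (0, 1, 7) else -1 if c in (3, 4, 5) else 0 for c in seg)
        h = sum(1 if c in (1, 2, 3) else -1 if c in (5, 6, 7) else 0 for c in seg)
        return math.atan2(h, w)
    return [angle(i) - angle(i - 1) for i in range(n)]

# A straight horizontal run of codes has zero curvature everywhere.
ks = curvature([0] * 16, 2)
```

Resampling such a vector to 512 coordinates by linear interpolation then gives the normalized input of the DBWT.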
Independently of the size of the character, if the quantization is sparse we will have peaks which do not correspond to real character features. These peaks significantly affect the absolute values of the wavelet coefficients. On the other hand, if we have well-quantized characters, the presence of peaks in the approximation of the curvature function is reduced; in this case peaks correspond to important character features we want to detect. Our aim is to develop a system for feature extraction that is essentially insensitive to quantization quality. The peaks of the curvature function increase significantly the total energy of the curvature function. Since our classifier uses the Euclidean distance, which measures the energy of the difference between two vectors representing curvature functions, we need to bring all vectors to the same energy level. This procedure is called energy normalization. We accomplish it in the following way. The energy E of the curvature function of every character is given by:

E = \frac{1}{N} \sum_{i=0}^{N-1} |k(i)|^2

The values of the curvature function must be scaled in order for the curvature function's energy to be equal to the target energy level E_t. This is done by multiplying each value of the curvature function by \sqrt{E_t / E}. This makes classification more accurate.

4 FEATURE EXTRACTION AND CLASSIFICATION

The wavelet representation by DBWT of the curvature function has the additional advantage that variations in the shape of the curve cause only minor changes in the wavelet representation. Indeed, experimental results show that the wavelet representation is robust and insensitive to noise. In addition, we need only a small training set for high recognition accuracy (see next section). The curvature function, after energy normalization, is decomposed by means of the biorthogonal wavelet transform. In our experiments we used the biorthogonal Daubechies pair (3,9).
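The energy normalization step above can be sketched as follows; the target level E_t = 1.0 is an arbitrary illustrative choice, not a value taken from the paper.

```python
import math

# Energy normalization sketch: scale the curvature samples so their mean
# energy E = (1/N) * sum |k(i)|^2 equals the target level Et.
def normalize_energy(k, Et=1.0):
    E = sum(v * v for v in k) / len(k)
    scale = math.sqrt(Et / E)
    return [v * scale for v in k]

k = normalize_energy([1.0, -2.0, 3.0, 0.0])
# after scaling, the mean energy of k equals Et
```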
We decompose every input vector into 9 resolution levels, but for the classification we use only the 5 coarsest levels (figure 4). Even if we use only the 4 coarsest levels, we do not experience a significant accuracy loss. The output vectors of the 5 coarsest levels constitute a single vector of 32 coordinates. These vectors, corresponding either to training or test characters, are the inputs of the classification subsystem, which is the last component of our system.

Figure 4: Wavelet analysis of two characters. a) Contour description. b) Curvature function. c) Curvature function interpolated to 512 points. d) Curvature function after energy normalization. e) Biorthogonal wavelet coefficients.

For classification we use an LVQ (Learning Vector Quantization) classifier.8 LVQ is an auto-associative nearest neighbor classifier that classifies arbitrary m patterns into p classes using an error-correction encoding procedure.

5 EXPERIMENTS

For our experiments we use a sample dataset of characters provided for free by NIST. The dataset was taken from 49 different persons and contains 2870 distinct samples of handwritten digits (figure 5). The training set used was usually about 1/8 of the whole dataset. Training set and test set were disjoint.

Figure 5: A random sample of the characters used.

Experiment | Training set | Test set | Accuracy in training set | Accuracy in test set
1 | 317 | 2554 | 98.42% | 93.56%
2 | 306 | 2564 | 98.37% | 93.71%
3 | 976 | 1894 | 97.95% | 95.62%
4 | 365 | 2505 | 98.36% | 93.87%
5 | 362 | 2508 | 97.24% | 92.30%
6 | 362 | 2508 | 96.97% | 86.84%

In experiments 1-3 we take as training set randomly selected samples from the whole dataset of the 49 persons, while in experiments 4-6 we take samples from the first 10-12 persons. The accuracy results, which are almost independent of the selection of the training set and the number of persons, show that the wavelet representation is robust, insensitive to noise and writing style, and requires only a small training set for good recognition accuracy.
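A minimal sketch of the LVQ1 prototype update described by Kohonen8: classify by nearest prototype, then move the winning prototype toward the sample if its label is correct, and away from it otherwise. The prototypes, labels and learning rate below are illustrative choices of ours, not the paper's actual configuration.

```python
# Nearest-prototype lookup by squared Euclidean distance.
def nearest(protos, x):
    return min(range(len(protos)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(protos[j], x)))

# One LVQ1 training step: attract the winner if its label matches y, repel it if not.
def lvq1_step(protos, labels, x, y, lr=0.1):
    j = nearest(protos, x)
    sign = 1.0 if labels[j] == y else -1.0
    protos[j] = [p + sign * lr * (xi - p) for p, xi in zip(protos[j], x)]

protos = [[0.0, 0.0], [1.0, 1.0]]
labels = ['C', 'G']
lvq1_step(protos, labels, [0.2, 0.0], 'C')   # moves prototype 0 toward the sample
```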
Experiment 6 reveals the superiority of the representation of the character by its curvature function compared to the representation of the contour as a 2-dimensional parametric curve. In this experiment we lose about 5% in accuracy if we use the same wavelet decompositions for each coordinate representation instead of the curvature function. The feature extraction subsystem takes into account only one characteristic, the DBWT coefficients. Geometrical features like the Euler number, or the number of holes of a character, have been ignored. It is important to notice that such features do not characterize all elements of the dataset representing a single character. For example, a character may be thick enough that no holes exist.

6 CONCLUSIONS

The proposed OCR system has the advantage of being (at least in the abstract sense) independent of character translations and rotations. However, a uniform rule for the selection of the starting pixel for the contour extraction procedure does not allow us to get full benefit from its translation and rotation invariance properties. The curvature representation of characters, being a real-valued function, is significantly more efficient compared to 2-dimensional representations of character contours. The experimental results are very encouraging, considering that the size of the database is small while the character variations are large. Wavelet analysis proved to be robust and insensitive to noise and writing style. The accuracy was very high even in cases where the training set was small compared to the whole dataset. In other words, the system succeeded even with unfamiliar writing styles. Wavelet analysis of the curvature function seems to capture effectively the main feature characteristics at coarser scales. This allows us to represent characters with small-sized vectors, which leads to effective classification with an LVQ-type classifier. In our future work we will experiment with wavelet packets, which may increase the accuracy and robustness of the system.
Morphological and geometrical features, e.g. the Euler number and features associated with the character's skeleton, will be considered as supplementary characteristics in order to improve accuracy. Neural network classification will be used at every level of the wavelet decomposition, from the coarsest scales to finer ones. At each level the classifier will reject mismatching characters. This will result in faster and more accurate classification. We also expect very low levels of misclassification. This is of particular importance for an OCR system, because rejections are interpreted as errors, while misclassifications pass undetected.2

ACKNOWLEDGMENT: The authors thank Professor N. Kalouptsidis (Department of Informatics, University of Athens, Greece) for the constant support, encouragement and helpful discussions during the preparation of the present paper.

7 REFERENCES

[1] Barry A. Blesser, Theodore T. Kuklinski, and Robert J. Shillman, "Empirical Tests for Feature Selection Based on a Psychological Theory of Character Recognition", Pattern Recognition, Vol. 8, pp. 77-85, 1976.
[2] Mindy Bokser, "Omnidocument Technologies", Proceedings of the IEEE, Vol. 80, No. 7, pp. 1066-1078, July 1992.
[3] A. Cohen, I. Daubechies and J.-C. Feauveau, "Biorthogonal bases of compactly supported wavelets", Comm. Pure and Appl. Mathematics, Vol. XLV, pp. 485-560, 1992.
[4] Thomas R. Crimmins, "A Complete Set of Fourier Descriptors for Two-Dimensional Shapes", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-12, No. 6, pp. 848-855, November/December 1982.
[5] H. Freeman, "Computer processing of line-drawing images", ACM Computing Surveys, Vol. 6, pp. 57-97, March 1974.
[6] M. Holschneider, Wavelets. An Analysis Tool, Oxford Science Publications, 1995.
[7] Anil K. Jain, Fundamentals of Digital Image Processing, Prentice Hall, 1989.
[8] Teuvo Kohonen, "The self-organizing map", Proceedings of the IEEE, Vol. 78, No. 9, pp. 1464-1480, 1990.
[9] Stephane G.
Mallat, "Multifrequency Channel Decompositions of Images and Wavelet Models", IEEE Transactions on Acoustics, Speech and Signal Processing, Vol. 37, No. 12, December 1989.
[10] Eric Persoon and King-Sun Fu, "Shape Discrimination Using Fourier Descriptors", IEEE Transactions on Systems, Man and Cybernetics, Vol. SMC-7, No. 3, pp. 170-179, March 1977.
[11] I. Pitas, Digital Image Processing Algorithms, Prentice Hall, 1992.
[12] Michael T. Y. Lai and Ching Y. Suen, "Automatic Recognition of Characters by Fourier Descriptors and Boundary Line Encodings", Pattern Recognition, Vol. 14, Nos. 1-6, pp. 383-393, 1981.
[13] D. J. Struik, Lectures on Classical Differential Geometry, Addison-Wesley Pub. Co., 1961.
[14] Patrick Wunsch and Andrew F. Laine, "Wavelet Descriptors for Multiresolution Recognition of Handprinted Characters", Pattern Recognition, Vol. 28, No. 8, pp. 1237-1249, 1995.
[15] Charles T. Zahn and Ralph Z. Roskies, "Fourier Descriptors for Plane Closed Curves", IEEE Transactions on Computers, Vol. C-21, No. 4, pp. 269-281, March 1972.

Mailing addresses:
George S. Kapogiannopoulos, 18 Megdoba str., GR-13122 ILION, HELLAS (GREECE), email: [email protected]
Manos Papadakis, 93 Abidou str., GR-15771 Zografou, HELLAS (GREECE), email: [email protected]