Font Identification Using the Grating Cell Texture Operator

Huanfeng Ma and David Doermann
Language and Media Processing Laboratory, Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA
(Send correspondence to Huanfeng Ma. Huanfeng Ma: [email protected]; David Doermann: [email protected])

ABSTRACT

In this paper, a new feature extraction operator, the grating cell operator, is applied to analyze texture features and classify the fonts of scanned document images. The operator is compared with the isotropic Gabor filter, which has also been employed for font classification. To improve performance, a back-propagation neural network (BPNN) classifier was applied and compared with a simple weighted Euclidean distance (WED) classifier. Experimental results for five fonts in three scripts show that the grating cell operator performs better than the isotropic Gabor filter, and that the BPNN classifier provides more accurate classification results than the WED classifier.

Keywords: Grating Cell Operator, Gabor Filter, Back-propagation Neural Network (BPNN), Font Identification, Document Analysis, Optical Character Recognition (OCR)

1. INTRODUCTION

Font classification is an important challenge in document analysis. With an accurate font classifier available, an OCR system can be designed as a combination of multiple single-font recognizers, improving overall performance.1, 2 Unlike font classification approaches based on typographical features extracted by local attribute analysis,3–7 the approach proposed by Zhu et al.8, 9 performed font (including font face and font style) identification based on global texture features extracted using an isotropic Gabor filter bank. Their approach achieved high accuracy for font style identification on computer-generated binary images. However, because of the differences between computer-generated images and scanned document images, the same approach achieved only modest performance on scanned documents, and its accuracy dropped sharply when the brightness and contrast of the images varied widely. According to the analysis in this paper, one reason for this drop is that the approach captures global font attributes well but is less able to distinguish finer typographical attributes. Another reason is that the classifier is a simple weighted Euclidean distance (WED) classifier, which is not effective in a 32-dimensional feature space. Because they require no local analysis and are easily tuned to different scripts and fonts, font identification algorithms based on global texture features remain promising. To improve classification performance, we therefore employed a new texture operator, the grating cell operator, to extract texture features, and replaced the WED classifier with a back-propagation neural network (BPNN) classifier. The grating cell texture operator is also built on a Gabor filter bank, but it is constructed differently from the isotropic Gabor filter bank operator. The operator was inspired by the function of a recently discovered orientation-selective neuron type in areas V1 and V2 of the visual cortex of monkeys, called the grating cell.10, 11 Like other orientation-selective neurons, grating cells respond vigorously to a grating of bars of appropriate orientation, position and periodicity. Unlike other orientation-selective cells, however, grating cells respond weakly or not at all to single bars.
As a texture feature extraction operator, the grating cell operator was shown by Petkov and Kruizinga12, 13 to be very competitive in texture detection and segmentation. The construction of this operator is described in Section 3.2; a detailed presentation is available in Petkov and Kruizinga's papers.12, 13

The remainder of this paper is organized as follows. Section 2 describes the preprocessing. Section 3 addresses how the texture features are extracted and introduces the design of the grating cell operator. Section 4 describes the classifier design. Experimental results and conclusions are provided in Sections 5 and 6, respectively.

2. DOCUMENT IMAGE PREPROCESSING

Before the input document images are fed to the texture operator for feature extraction, the following preprocessing steps are performed.

• Deskewing. Deskewing makes the detection of the baseline of a word and the construction of text lines easier and more accurate. For feature extraction itself, however, deskewing is not strictly necessary, since Gabor-filter-based texture operators can in general capture texture features at different orientations.

• Extra word spacing removal. Words are extracted from the text zones of the input document images. The extracted words are then merged into one single text line in which the spacing between two adjacent words is set to the average spacing between two characters. The words are aligned vertically using the detected baseline of each word.

• Line spacing removal. The text line from the previous step is cut into multiple text lines of length N (N = 256 in our experiments). These text lines are merged into a text block of width N. The line spacing is set to zero, and each line has the same height as the single text line obtained in the previous step.

• Text block normalization. Finally, the text block of width N is divided into multiple text blocks of size N × N. The texture operator is applied to these normalized images.

3. FEATURE EXTRACTION

After preprocessing, each input document image has been divided into multiple images of fixed size with the extra space removed. In the texture extraction step, two texture operators are applied to the normalized images. These two operators are described below.

Figure 1. Spatial-frequency responses of the two Gabor filter banks used to extract the texture features. (a) Isotropic Gabor filter; (b) Gabor filter used for the grating cell operator.

3.1. Isotropic Gabor filter

The first texture operator is the 2D isotropic Gabor filter operator proposed by Zhu et al.,8 whose computational model is:

g_e(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma^2} \cdot \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \cdot \cos\left[\frac{2\pi}{\lambda}(x\cos\theta + y\sin\theta)\right]

g_o(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma^2} \cdot \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right) \cdot \sin\left[\frac{2\pi}{\lambda}(x\cos\theta + y\sin\theta)\right]

where g_e and g_o are the even-symmetric and odd-symmetric Gabor filters, and λ, θ and σ are the spatial wavelength, orientation and space constant of the Gaussian envelope. In our experiments, the normalized images have size 256 × 256 and four spatial wavelengths are selected: 1.56, 3.12, 6.24 and 12.48. Combining these wavelengths with four selected values of θ (0°, 45°, 90°, 135°) gives a total of 16 Gabor channels. The standard deviation σ of the Gaussian envelope is selected to satisfy σ/λ = 0.56. The spatial frequency response of this operator is shown in Figure 1(a).
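To make the channel construction concrete, the following is a minimal Python sketch of the 16-channel isotropic Gabor bank, assuming NumPy and SciPy are available. The kernel window size and the use of the complex response magnitude per channel (|G_mn| in the notation of Eq. (1) below) are our illustrative choices, not details specified in the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def gabor_pair(size, lam, theta, sigma_ratio=0.56):
    """Even- and odd-symmetric isotropic Gabor kernels for one (lambda, theta) channel."""
    sigma = sigma_ratio * lam
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma**2)
    phase = 2 * np.pi * (x * np.cos(theta) + y * np.sin(theta)) / lam
    return envelope * np.cos(phase), envelope * np.sin(phase)

def gabor_bank(wavelengths=(1.56, 3.12, 6.24, 12.48),
               orientations_deg=(0, 45, 90, 135), size=31):
    """The 4 wavelengths x 4 orientations = 16 channels of Section 3.1."""
    return [gabor_pair(size, lam, np.deg2rad(th))
            for lam in wavelengths for th in orientations_deg]

def channel_response(image, ge, go):
    """Magnitude of the complex Gabor response for one channel (image: 2D float array)."""
    return np.hypot(fftconvolve(image, ge, mode='same'),
                    fftconvolve(image, go, mode='same'))
```

Each channel's output map then contributes a mean and a standard deviation to the feature vector described in Section 3.4.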
The response output G_{mn}(x, y) of an image I(x, y) after applying the Gabor filter operator is defined as:

G_{mn}(x, y) = \iint I(s, t)\, g^{*}_{mn}(x - s, y - t)\, ds\, dt    (1)

where * indicates the complex conjugate.

3.2. Grating cell operator

The grating cell operator is also based on a 2D Gabor filter bank. The following family of 2D Gabor functions is used to model the spatial summation properties of a simple cell:

g_{\lambda,\theta,\varphi}(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma^2} \cdot \exp\left(-\frac{x'^2 + \gamma^2 y'^2}{2\sigma^2}\right) \cdot \cos\left(2\pi\frac{x'}{\lambda} + \varphi\right)    (2)

where

x' = x\cos\theta - y\sin\theta    (3)
y' = x\sin\theta + y\cos\theta    (4)

In Equations 2, 3 and 4, the standard deviation σ of the Gaussian envelope determines the size of the receptive field. The ellipticity of the receptive field is determined by the parameter γ, called the spatial aspect ratio. It has been found that the value of γ varies over a limited range (0.23, 0.92);14 in our experiments, γ has the value 0.5. As with the isotropic Gabor filter, the parameter λ determines the preferred spatial frequency (1/λ) of the receptive field function. The ratio σ/λ determines the spatial frequency bandwidth of the linear filter; we chose σ/λ = 0.56 as suggested in Petkov and Kruizinga's papers.12, 13 The parameter θ specifies the orientation, and the parameter φ (φ ∈ (−π, π]) is a phase offset that determines the symmetry of the function g(x, y). Choosing 3 values (1.56, 3.12, 6.24) for λ and 8 values (0°, 22.5°, 45°, 67.5°, 90°, 112.5°, 135°, 157.5°) for θ, the spatial frequency response of this Gabor filter bank is shown in Figure 1(b).

With the above parameterization, the response s_{λ,θ,φ}(x, y) of a simple cell modeled by a receptive field function g_{λ,θ,φ}(x, y) to an input image I(x, y) is computed as follows:

s_{\lambda,\theta,\varphi}(x, y) =
\begin{cases}
0 & \text{if } a_\lambda(x, y) = 0 \\
\chi\left( \dfrac{R\, r_{\lambda,\theta,\varphi}(x, y)/a_\lambda(x, y)}{r_{\lambda,\theta,\varphi}(x, y)/a_\lambda(x, y) + C} \right) & \text{otherwise}
\end{cases}

where χ(z) = 0 for z < 0 and χ(z) = z for z ≥ 0, R is the maximum response level, and C is the semisaturation constant. The quantities r_{λ,θ,φ}(x, y) and a_λ(x, y) are computed as follows:

r_{\lambda,\theta,\varphi}(x, y) = \iint I(s, t)\, g_{\lambda,\theta,\varphi}(x - s, y - t)\, ds\, dt

a_\lambda(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma} \iint I(s, t)\, \exp\left(-\frac{(x - s)^2 + \gamma^2 (y - t)^2}{2\sigma^2}\right) ds\, dt

A quantity q_{λ,θ}(x, y), called the activity of a grating subunit with preferred orientation θ and preferred grating periodicity λ, is computed as:

q_{\lambda,\theta}(x, y) =
\begin{cases}
1 & \text{if } \forall n:\ M_{\lambda,\theta,n}(x, y) \ge \rho\, M_{\lambda,\theta}(x, y) \\
0 & \text{if } \exists n:\ M_{\lambda,\theta,n}(x, y) < \rho\, M_{\lambda,\theta}(x, y)
\end{cases}    (5)

where ρ is a threshold parameter with a value smaller than, but near, one. The auxiliary quantities M_{λ,θ,n}(x, y) and M_{λ,θ}(x, y) are computed as:

M_{\lambda,\theta,n}(x, y) = \max\{ s_{\lambda,\theta,\varphi_n}(x', y') \}

with (x', y') satisfying the conditions

n(\lambda/2)\cos\theta \le (x' - x) < (n + 1)(\lambda/2)\cos\theta
n(\lambda/2)\sin\theta \le (y' - y) < (n + 1)(\lambda/2)\sin\theta

and with φ_n taking the values

\varphi_n =
\begin{cases}
0 & n = -3, -1, 1 \\
\pi & n = -2, 0, 2
\end{cases}

M_{\lambda,\theta}(x, y) = \max\{ M_{\lambda,\theta,n}(x, y) \mid n = -3, \ldots, 2 \}

In the next stage, as the final output of the grating cell texture operator, the response W_{λ,θ} is computed as:

W_{\lambda,\theta}(x, y) = \frac{1}{\sqrt{2\pi}\,\sigma} \iint \left( q_{\lambda,\theta}(s, t) + q_{\lambda,\theta+\pi}(s, t) \right) \cdot \exp\left(-\frac{(x - s)^2 + (y - t)^2}{2(\beta\sigma)^2}\right) ds\, dt    (6)

where β takes the value 5.

3.3. Parameter selection

We used an image containing four different Cyrillic fonts to select the parameter values for the grating cell operator. Because the parameter γ in Equation 2 has the value 0.5, we chose 8 orientations for the grating cell operator, unlike the isotropic Gabor filter bank operator, to cover all possible feature orientations. Only 3 spatial wavelengths (1.56, 3.12, 6.24) were chosen for the grating cell operator (Figure 1(b) shows the spatial-frequency response of this filter bank).
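The operator thus has three stages: the simple cell responses (Eqs. 2–4 and the normalized response above), the grating subunit activities (Eq. 5), and the Gaussian pooling into W (Eq. 6). The sketch below, in Python with NumPy and SciPy, implements only the first stage; the finite kernel window and the values of R and C are illustrative assumptions, since the paper does not fix them.

```python
import numpy as np
from scipy.signal import fftconvolve

GAMMA, SIGMA_RATIO = 0.5, 0.56   # aspect ratio and bandwidth ratio from Section 3.2

def grating_gabor(lam, theta, phi, size=31):
    """Simple cell receptive field g_{lambda,theta,phi} of Eq. (2)."""
    sigma = SIGMA_RATIO * lam
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xp = x * np.cos(theta) - y * np.sin(theta)   # x' of Eq. (3)
    yp = x * np.sin(theta) + y * np.cos(theta)   # y' of Eq. (4)
    env = np.exp(-(xp**2 + GAMMA**2 * yp**2) / (2 * sigma**2))
    return env / (np.sqrt(2 * np.pi) * sigma**2) * np.cos(2 * np.pi * xp / lam + phi)

def simple_cell_response(image, lam, theta, phi, R=1.0, C=1.0, size=31):
    """Contrast-normalized response s_{lambda,theta,phi} (image: 2D float array)."""
    sigma = SIGMA_RATIO * lam
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Local luminance a_lambda: Gaussian-weighted average of the input.
    gauss = np.exp(-(x**2 + GAMMA**2 * y**2) / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    r = fftconvolve(image, grating_gabor(lam, theta, phi, size), mode='same')
    a = fftconvolve(image, gauss, mode='same')
    ratio = np.divide(r, a, out=np.zeros_like(r), where=a > 0)
    den = ratio + C
    s = np.divide(R * ratio, den, out=np.zeros_like(ratio), where=den > 0)
    return np.maximum(s, 0.0)    # chi(.): half-wave rectification
```

The subunit activity (Eq. 5) and the final pooling (Eq. 6) follow the same pattern, scanning the s maps along each orientation for alternating-phase maxima and then smoothing the resulting binary activity map with a wide Gaussian.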
Figure 2 shows that there is no significant difference between the segmentation result using 4 spatial wavelengths and the result using 3 spatial wavelengths. Petkov and Kruizinga12, 13 suggested that the value of ρ in Equation 5 should be smaller than, but near, one (such as 0.9). In our experiments, however, we found that a value in the range [0.4, 0.7] is more appropriate for the analysis of document images, so the value of ρ was chosen as 0.5. Although we fixed the spatial wavelengths of the texture operator based on the experimental results, the wavelength values could be set dynamically, based on the resolution of the document images and the average size of the words they contain.

3.4. Feature representation

The final feature vector used for font classification is constructed from the output of the texture operator. For the output of each channel, we compute the mean μ_{mn} and the standard deviation σ_{mn}, and the feature vector is constructed as:

x = [\mu_{00}, \sigma_{00}, \mu_{01}, \sigma_{01}, \ldots, \mu_{mn}, \sigma_{mn}]

The feature vector for the isotropic Gabor filter operator is constructed from the output of Equation 1; four spatial wavelengths and four orientations provide a 32-dimensional feature vector for each input image. For the grating cell operator, the feature vector is constructed from the output of Equation 6; three spatial wavelengths and eight orientations provide a 48-dimensional feature vector for each input image. To compare the effectiveness of the texture features extracted by the two operators, Figure 3 shows some texture segmentation examples using both. The results indicate that the grating cell operator captures more of the differences between two textures than the isotropic Gabor filter operator.

Figure 2. Segmentation results for four Cyrillic fonts using the K-means clustering algorithm. (a) Image with 4 different fonts; (b) Ground truth; (c) Segmentation result using 3 spatial frequencies for the grating cell operator; (d) Segmentation result using 4 spatial frequencies for the grating cell operator.

Figure 3. Results of texture segmentation experiments using the K-means clustering algorithm. (a) Image with 2 different textures; (b) Image with 5 different textures; (c) Image with 9 different textures; (d)(e)(f) Segmentation results using the grating cell operator (K = 2, 5, 9); (g)(h)(i) Segmentation results using the isotropic Gabor filter (K = 2, 5, 9).

4. CLASSIFIER DESIGN

The first classifier used for font classification is a simple weighted Euclidean distance (WED) classifier. For each test sample, classification is based on the distance between the sample and each class. Suppose the feature vectors lie in a d-dimensional space (for the isotropic Gabor filter operator, d = 32; for the grating cell operator, d = 48), and the computed mean and standard deviation feature vectors for class i are μ^{(i)} and α^{(i)}, where i = 1 ... M and M is the number of classes. Then for each test sample x ∈ R^d, the distance between the sample and each class is computed as:

d^{(i)}(x) = \sum_{k=1}^{d} \left| \frac{x_k - \mu_k^{(i)}}{\alpha_k^{(i)}} \right|, \quad i = 1 \ldots M

and the sample is assigned to the class with the smallest distance.
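As a concrete illustration, the following Python sketch (NumPy assumed; the function names are ours) builds the interleaved mean/standard-deviation feature vector of Section 3.4 and implements the WED decision rule above. It assumes each class has nonzero per-dimension deviations.

```python
import numpy as np

def texture_feature_vector(channel_outputs):
    """Interleaved per-channel mean and standard deviation, as in Section 3.4."""
    return np.concatenate([(c.mean(), c.std()) for c in channel_outputs])

def fit_wed(features_by_class):
    """Per-class mean and deviation vectors (mu^(i), alpha^(i)) from training samples."""
    stats = []
    for feats in features_by_class:
        feats = np.asarray(feats, dtype=float)   # shape (n_samples, d)
        stats.append((feats.mean(axis=0), feats.std(axis=0)))
    return stats

def classify_wed(x, class_stats):
    """Index of the class minimizing the weighted Euclidean distance d^(i)(x)."""
    dists = [np.sum(np.abs((x - mu) / alpha)) for mu, alpha in class_stats]
    return int(np.argmin(dists))
```

With 32- or 48-dimensional features, this rule weights each dimension only by its within-class spread, which is one reason it struggles when two fonts have similar texture statistics.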
The second classifier is a three-layer, fully connected feedforward neural network trained with back-propagation weight tuning (BPNN); details of this classifier can be found in Duda et al.'s book.15 For this classifier, the input layer has the same number of units as the dimension of the input feature vector, the output layer has one unit per font class, and the hidden layer has the same number of units as the input layer. Thus the BPNN classifier constructed for the isotropic Gabor filter features has 32 input units and 32 hidden units, while the BPNN classifier constructed for the grating cell features has 48 input units and 48 hidden units.

5. EXPERIMENTAL RESULTS

To test the effectiveness of the two operators for font classification, we created an artificial Cyrillic text image with 4 blocks of different fonts. As with the images in Figure 3, the K-means clustering algorithm (K = 4) applied to the extracted texture features gives the segmentation result shown in Figure 4. The grating cell operator clearly provides a much better result than the isotropic Gabor filter operator.

Figure 4. Results of the font segmentation experiment using the K-means clustering algorithm. (a) Image with 4 different fonts; (b) Ground truth; (c) Segmentation result using the grating cell operator; (d) Segmentation result using the isotropic Gabor filter.

The proposed approach was applied to scanned documents of 3 different scripts (Latin, Greek and Cyrillic), where each script has 5 commonly used fonts: Arial (AR), Century Gothic (CG), Comic Sans MS (CSM), Courier New (CN) and Times New Roman (TNR). The classification results are as follows.

5.1. Font classification within script

This experiment tested font classification within the same script. The results for each script are shown in Tables 1, 2 and 3.

Table 1. Latin font classification results using different texture operators and different classifiers (IGF: Isotropic Gabor Filter, GC: Grating Cell).

Operator  Classifier  AR       CG       CSM      CN      TNR
IGF       WED         93.10%   87.50%   93.54%   94.00%  96.88%
IGF       BPNN        100.00%  90.62%   100.00%  94.00%  100.00%
GC        WED         93.10%   93.75%   93.54%   94.00%  96.88%
GC        BPNN        100.00%  93.75%   93.54%   94.00%  100.00%

Table 2. Greek font classification results using different texture operators and different classifiers (IGF: Isotropic Gabor Filter, GC: Grating Cell).

Operator  Classifier  AR       CG       CSM     CN      TNR
IGF       WED         70.59%   96.67%   36.67%  76.32%  96.67%
IGF       BPNN        100.00%  100.00%  93.33%  86.84%  96.67%
GC        WED         60.00%   100.00%  73.95%  67.50%  53.49%
GC        BPNN        84.21%   100.00%  98.17%  94.25%  90.31%

Table 3. Cyrillic font classification results using different texture operators and different classifiers (IGF: Isotropic Gabor Filter, GC: Grating Cell).

Operator  Classifier  AR       CG       CSM     CN      TNR
IGF       WED         76.84%   66.67%   83.19%  36.25%  29.46%
IGF       BPNN        71.58%   100.00%  98.32%  90.00%  84.50%
GC        WED         58.82%   96.67%   66.67%  84.21%  96.67%
GC        BPNN        100.00%  100.00%  96.67%  97.37%  96.67%

5.2. Style classification within font

Because of the visual differences between different styles of the same font, we believe the grating cell operator can also be applied to extract texture features for font style classification. Figure 5(a) shows an image containing three parts, each representing one style (bold, italics and normal) of the Times New Roman font of the Cyrillic script. Figure 5(c) shows the image segmentation result using the K-means clustering algorithm based on texture features extracted with the grating cell operator, and Figure 5(b) is the ground truth for the segmentation. Figure 5 demonstrates the ability of this operator to capture texture features of different font styles. Replacing the K-means clustering approach with the BPNN classifier, we applied the approach to the classification of the normal, bold and italics styles of the Cyrillic Times New Roman font. The results in Table 4 further show the effectiveness of this approach for font style classification.

Table 4. Style classification results for the Times New Roman font of Cyrillic.
Styles    Normal    Bold      Italics
Normal    100.00%   0.00%     0.00%
Bold      0.00%     100.00%   0.00%
Italics   0.00%     0.00%     100.00%

Figure 5. Results of the font style segmentation experiment using the K-means clustering algorithm. (a) Image with 3 different styles of the same font; (b) Ground truth; (c) Segmentation result using the grating cell operator.

5.3. OCR performance comparison

As mentioned in Section 1, font classification used as an OCR preprocessor can help improve OCR performance. Using a recognizer based on Zernike moments,16 we evaluated OCR accuracy on the five fonts; the results are shown in Table 5. The recognizer without font classification is trained on a mixed training set drawn from all five fonts, with a nearest-neighbor final classifier. The OCR system with font classification has five recognizers, one built for each font, again with a nearest-neighbor final classifier; the two OCR systems share the same training set. The results in Table 5 show that OCR with font classification improves performance, although the improvement is not significant. One reason is that the test documents are of such good quality that the recognition results without font identification are already good. To better measure the improvement provided by font classification, we added noise to the collection of Latin documents by photocopying the original documents many times. Table 6 shows the OCR results at two noise levels; they indicate that font identification matters more for the recognition of noisy documents than for relatively clean ones.

Table 5. OCR results for the Latin, Greek and Cyrillic scripts with and without font classification (WTFC: without font classification; WFC: with font classification).

Script    Method  AR      CG      CSM     CN      TNR
Latin     WTFC    98.13%  95.36%  96.63%  95.31%  96.33%
Latin     WFC     98.28%  96.17%  96.63%  96.67%  96.37%
Greek     WTFC    98.37%  94.86%  95.59%  93.85%  94.19%
Greek     WFC     98.48%  94.92%  96.63%  93.91%  94.22%
Cyrillic  WTFC    91.41%  79.25%  85.89%  97.48%  91.32%
Cyrillic  WFC     93.94%  89.34%  87.22%  98.35%  93.97%

Table 6. OCR results for the Latin script with added noise (WTFC: without font classification; WFC: with font classification).

Noise  Method  AR      CG      CSM     CN      TNR
1      WTFC    88.95%  91.60%  84.44%  86.12%  90.05%
1      WFC     93.76%  96.16%  93.33%  96.18%  96.23%
2      WTFC    78.88%  81.77%  81.05%  80.37%  73.52%
2      WFC     84.64%  91.60%  84.44%  89.41%  79.09%

5.4. Experimental result analysis

The results in Tables 1, 2 and 3 show that the grating cell operator captures the texture features of different fonts more accurately. According to Petkov and Kruizinga's experiments,12, 13 the grating cell operator responds only to texture and not to other image attributes, while the traditional Gabor filter operator responds both to texture and to other image attributes such as edges. Because of the characteristics of document images, the strokes of characters in different fonts often form a series of gratings with different patterns, which makes the grating cell operator more effective at extracting these pattern features. As noted in Section 1, a simple classifier (such as WED) cannot provide good performance for scanned documents; especially when two different fonts have very similar texture features, a relatively complex classifier is required.
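Before turning to the segmentation evidence in Figure 6, here is a minimal sketch of such a classifier, a network with the shape described in Section 4. The use of scikit-learn's MLPClassifier and all training hyperparameters are our assumptions; the paper specifies only the layer sizes and back-propagation training.

```python
from sklearn.neural_network import MLPClassifier

def make_bpnn(feature_dim=48):
    """Three-layer fully connected network: feature_dim inputs, an equally
    sized hidden layer, and one output unit per font class (set by fit())."""
    return MLPClassifier(hidden_layer_sizes=(feature_dim,),
                         activation='logistic',     # sigmoid units, classic BPNN
                         solver='sgd',              # plain gradient descent
                         learning_rate_init=0.01,   # illustrative choice
                         max_iter=2000, random_state=0)

# Usage: clf = make_bpnn(48); clf.fit(X_train, y_train); clf.predict(X_test)
```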
Figures 6(a) and 6(c) show an image containing five Cyrillic fonts and its segmentation using the K-means clustering algorithm. Only four of the five fonts are correctly segmented. However, after removing the center part, which represents the font Comic Sans MS (Figure 6(d)), and extracting features with the same parameters, the remaining four parts are correctly segmented by the same K-means clustering algorithm (Figure 6(f)). This implies that the incorrect segmentation was caused by the feature similarity of the different fonts. Comparing this result with the results shown in Table 3, we conclude that the BPNN classifier did improve the classification accuracy.

Figure 6. Results of font segmentation for different numbers of Cyrillic fonts using the K-means clustering algorithm. (a) Image with 5 different Cyrillic fonts; (b) Ground truth for 5 fonts; (c) Segmentation result for 5 fonts; (d) Image with 4 different Cyrillic fonts; (e) Ground truth for 4 fonts; (f) Segmentation result for 4 fonts.

In all the experiments, we assumed that each document page contains a single font or style, which obviously does not model all cases. However, the segmentation results in Figures 4 and 6 show that the approach still works for pages with multiple fonts, as long as the content in each font can be organized into a single block.

6. CONCLUSION

The results in Section 5 indicate that the grating cell operator can usually capture more of the texture differences between fonts than the isotropic Gabor filter operator, and that the BPNN classifier provides better performance than the simple WED classifier. The combination of the grating cell operator and the BPNN classifier therefore gives the highest performance. Compared with the isotropic Gabor filter operator, one drawback of the grating cell operator is its speed. If the Gabor filter banks in the isotropic Gabor filter operator and the grating cell operator have the same number of wavelengths (S) and the same number of orientations (M), the input image has size N × N, and the convolutions are computed with the FFT and inverse FFT, then the complexity of the isotropic Gabor filter operator is about O(2SMN² log N), while the complexity of the grating cell operator is about O(4SMN² log N + 2SN² log N + CSMN²), where C is a number that depends on the wavelength and is larger than 2.

7. ACKNOWLEDGMENT

The support of this research under DARPA cooperative agreement N660010028910, National Science Foundation grant EIA0130422, CASL contract ODOD.021 and DOD contract MDA90402C0406 is gratefully acknowledged. We thank Gary Gang Zi for providing the ground truth files used in the evaluation of the OCR results.

REFERENCES

1. S. Kahan, T. Pavlidis, and H. Baird, "On the recognition of printed characters of any font or size," IEEE Transactions on Pattern Analysis and Machine Intelligence 9(2), pp. 274–288, 1987.
2. H. S. Baird and G. Nagy, "A self-correcting 100-font classifier," in Proc. SPIE Symp. on Electronic Imaging: Science and Technology, (San Jose, CA), 1994.
3. S. Khoubyari and J. Hull, "Font and function word identification in document recognition," Computer Vision and Image Understanding 63(1), pp. 66–74, 1996.
4. R. Cooperman, "Producing good font attribute determination using error-prone information," The International Society for Optical Engineering Journal 3027, pp. 50–57, 1997.
5. H. Shi and T. Pavlidis, "Font recognition and contextual processing for more accurate text recognition," in The Fourth International Conference on Document Analysis and Recognition (ICDAR'97), pp. 39–44, 1997.
6. A. Zramdini and R. Ingold, "Optical font recognition using typographical features," IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), pp. 877–882, 1998.
7. A. Schreyer, P. Suda, and G. Maderlechner, "Font style detection in documents using textons," in The Third Document Analysis Systems Workshop, Assoc. for Pattern Recognition, 1998.
8. Y. Zhu, T. Tan, and Y. Wang, "Font recognition based on global texture analysis," IEEE Transactions on Pattern Analysis and Machine Intelligence 23(10), pp. 1192–1200, 2001.
9. T. Tan, "Rotation invariant texture features and their use in automatic script identification," IEEE Transactions on Pattern Analysis and Machine Intelligence 20(7), pp. 751–756, 1998.
10. R. von der Heydt, E. Peterhans, and M. Dursteler, "Grating cells in monkey visual cortex: Coding texture?," in Channels in the Visual Nervous System: Neurophysiology, Psychophysics and Models, B. Blum, ed., pp. 53–73, Freund, London, U.K., 1991.
11. R. von der Heydt, E. Peterhans, and M. Dursteler, "Periodic-pattern-selective cells in monkey visual cortex," Journal of Neuroscience 12, pp. 1416–1434, 1992.
12. N. Petkov and P. Kruizinga, "Computational models of visual neurons specialised in the detection of periodic and aperiodic oriented visual stimuli: Bar and grating cells," Biological Cybernetics 76, pp. 83–96, 1997.
13. P. Kruizinga and N. Petkov, "Nonlinear operator for oriented texture," IEEE Transactions on Image Processing 8(10), pp. 1395–1407, 1999.
14. J. Jones and A. Palmer, "An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex," Journal of Neurophysiology 58, pp. 1233–1258, 1987.
15. R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification (Second Edition), John Wiley & Sons, Inc., New York, N.Y., 2001.
16. A. Khotanzad and Y. H. Hong, "Rotation invariant image recognition using features selected via a systematic method," Pattern Recognition 23(10), pp. 1089–1101, 1990.