Dynamic Calculation and Characterization of Threshold Value in

International Conference on challenges in IT, Engineering and Technology (ICCIET’2014) July 17-18, 2014 Phuket (Thailand)
Dynamic Calculation and Characterization of
Threshold Value in Optical Character
Recognition
Ayşe Eldem, and Fatih Başçiftçi

decided how to binarize each region [3]. A learning
framework for the optimization of the binarization methods
was developed, which was designed to determine the optimal
parameter values for a document image [4]. Text segmentation
in generic displays was learnt the best binarization values for a
commercial OCR system [5]. A local adaptive thresholding
method based on a water flow model was used, in which an
image surface was considered as a three-dimensional (3-D)
terrain [6]. A new adaptive approach was presented for the
binarization and enhancement of degraded documents. In the
approach a pre-processing procedure using a low-pass Wiener
filter, a rough estimation of foreground regions, a background
surface calculation by interpolating neighboring background
intensities, a thresholding by combining the calculated
background surface with the original image while
incorporating image up-sampling and finally a post-processing
step in order to improve the quality of text regions and
preserved stroke connectivity [2]. 40 selected thresholding
methods from various categories were compared in the
context of nondestructive testing applications as well as for
document images and the comparison was based on the
combined performance measures [7].
In this study, image was first transformed into gray and
noises on the image were eliminated by median and mean
filters. Subsequently, various thresholding algorithms were
applied for transforming into black-white. Binary image
representation is the preferred format for image analysis in
most textual image techniques, and the performance of the
subsequent steps in document analysis applications depends
on the accuracy of the binarization process [8]. In this study,
the best thresholding algorithm was determined among Otsu
[9], Sauvola [10], Niblack [11] and White [12] algorithms.
Consequently, characters separation process was applied on
the resulting image.
Abstract—Optical character recognition (OCR) can be used in
some management mechanisms of state and business world to
organize documents scanned or captured by camera. Therefore, OCR
is one of the subjects rapidly evolving in the recent times. This study
investigates the procedure steps until character recognition. That’s
because, the more correctly preprocessings are applied; the better
results can be obtained in character recognition. In this study, the
algorithms providing the best results are determined by following the
procedure stages of gray scale transformation, noise removal and
image thresholding. Binarization plays an important role in character
recognition. For this reason, algorithms dynamically determine
thresholds are preferred for character recognition in this study.
Accordingly, Otsu thresholding method was determined to give the
best results in terms of both picture quality and speed. Therefore, this
method was used for character recognition. Primarily, lines were
determined for character separation and then letters were individually
obtained.
Keywords—Image Processing, OCR, Thresholding, Binarization,
Pattern Recognition.
I. INTRODUCTION
O
PTICAL Character Recognition (OCR) is the process of
converting and digitizing documents written by a tool,
machine or hand into computer language [1]. For this reason,
optical character recognition is an important step for image
processing. That’s because, it is necessary to distinguish
characters from background and properly obtain the
characters. Otherwise, characters cannot be defined correctly.
The obtained image is first transformed into gray and the
present noises are eliminated. Then, binarization is applied.
Document image binarization (threshold selection) refers to
the conversion of a gray-scale image into a binary image [2].
These are the basic steps of image processing.
A novel binarization method was developed for document
images produced by cameras. To resolve the problem, the
proposed method divided an image into several regions and
II. MATERIAL AND METHOD
The obtained optical images were first transformed into
gray. Then, noise reduction techniques were applied to
eliminate degradations (e.g. noises) on images that could be
caused by machines. Instead of thresholding by a certain gray
level, image binarization was made by thresholding methods
dynamically determining threshold values by the condition of
gray level pixels, and thus, black and white version of the
Ayşe Eldem is with the Department of Computer Technology and
Computer Programming, Technical Science Vocational High School,
Karamanoğlu Mehmetbey University, 70100, Karaman, TURKEY.
Fatih Başçiftçi is with the Department of Computer Engineering, Faculty
of Technology, Selcuk University, 42003 Konya, TURKEY.
This work is supported by Selçuk and Karamanoğlu MehmetBey
Universities Scientific Research Projects Coordinatorships, Konya, Karaman,
Turkey.
http://dx.doi.org/10.15242/IIE.E0714006
9
International Conference on challenges in IT, Engineering and Technology (ICCIET’2014) July 17-18, 2014 Phuket (Thailand)
P(T) is the histogram of the image; mf(T) and mb(T) are the
mean intensity for the whole image [17].
image was obtained. The steps of the process are shown in
figure 1.
Obtain the
image
Preprocessing
Binarization
process
White: In this method, gray value of the pixel is compared
to the mean value of its neighbors. If its value is lower than
the average, that pixel indicates the object, and otherwise, it
shows the background [12]. This thresholding method is
formulized as follows (2).
Division into
characters
Fig.1 Process steps
A. Noise Reduction Methods
Degradation (noises) occurs due to out of focus viewfinder
during photo shooting, lens quality, very bright environment,
lack of light or objective angle [13]. Mean and median filters
are generally used to reduce this kind of noises.
1
B(i, j )  
0
Niblack: In this method, threshold value is determined with
the following equation based on local mean and standard
deviation for bxb window size (3).
T (i, j)  m(i, j)  k.s(i, j)
(3)
m(i,j) and s(i,j) are mean and standard deviation,
respectively. k is accepted as -0.2. [11][14].
Median: Median filter is another way to eliminate the
noises in an image. Similarly, a window size like 3x3 is
determined in the median filter and applied to its neighbors.
Gray level values of pixels within the window size are taken
and ordered among themselves, and then the median value is
assigned as the new pixel value.
Sauvola: It is an adaptive and threshold-based binarization
method that can be seen as a generalization of Niblack’s
method [4]. In this method, pixel density at (x,y) point is
defined as the g(x,y) function getting values between 0-255.
As can be seen in the equation below (4), if g(x,y) value is
lower than threshold, it becomes 0, and otherwise, it gets 255
[10].
B. Thresholding Methods
It is one of the most important processes in separating
background and foreground in document image binarization.
Because binarization plays a key role in document processing
since its performance affects quite critically the degree of
success in a subsequent character segmentation and
recognition [2]. In this way, image is transformed into black
and white grayscale and it becomes easier for processing.
Binarization as separation of text from background regions
has formed the foundation of many learning methods, which
start with a rough estimation of the text and background
regions, and then attempt to learn their behavior, in order to
classify regions that are in the confusion interval [4].
Thresholding algorithms depend on a multitude of factors
such as the gray level distribution of the document, local
shading effects, the presence of denser, non-text components
such as photographs, the quality of the paper etc. [14].
In this study, 4 thresholding methods were used.
 0
o(x, y)= 
255
if g(x, y) < t(x, y)
otherwise
(4)
When threshold t(x,y) is determined, the mean and standard
deviation (s(x,y)) are used. R is 128 and k is a parameter
changing between [0.2, 0.5]. The equation is given below (5).
T(i, j)= m(i, j)+[1+k.(
s(i, j)
- 1)]
R
(5)
III. APPLICATION
In this study, an image is obtained from outer environment
and subject to some preprocessing. These preprocessing
include transformation to grayscale and using mean or median
filter. In pattern recognition systems, the aim of these
preprocessing is to distinguish the pattern from undesired
components in order to obtain similar key properties from
similar patterns [18]. Following preprocessing, separation into
letters is made to determine the best methods (White, Sauvola,
Niblack and Otsu) for dynamically determining the threshold
value.
In the current study, after an image is taken, it is initially
transformed into grayscale. The gray image is obtained with
(R+G+B)/3 method as an image consists of RGB color space.
The gray form of the image is shown in figure 2.
Otsu: Otsu’s method is a powerful, parameterless method
belonging to threshold-based method [15]. When the
background and foreground intensities are well separated,
Otsu’s method yields good binarized results [3]. In Otsu
thresholding method, optimum threshold is determined
through minimizing the changes for weighted total class of the
object and background pixels [9][16]. Optimum threshold is
measured with the equation below (1).
http://dx.doi.org/10.15242/IIE.E0714006
(2)
µw(x,y), is the mean gray value of the window being
processed; bias is generally taken as 2; w is the window size
(15 in general); and I(i,j) is the value of the current pixel [16].
Means: Mean filter is one of the methods used to reduce
noises in images. It can be used with neighborhood types like
3x3 and 5x5. Each pixel value is replaced with the mean
(average) value of its neighbor pixels. For instance, the new
pixel value is assigned the mean value of 9 pixels in a 3x3
mask.
Topt  arg max[ P(T ).(1- P(T )).(m f (T ) - mb (T )) 2 ]
if mw (i, j )  I (i, j )* bias
otherwise
(1)
10
International Conference on challenges in IT, Engineering and Technology (ICCIET’2014) July 17-18, 2014 Phuket (Thailand)
(a)
(c)
(b)
(d)
Fig. 2 a) Color image b) Grayscale image
Fig. 4 a) White Thresholding b) Sauvola Thresholding
c) Niblack Thresholding d) Otsu Thresholding
Subsequently, noise removal methods are applied. For this
purpose, mean [3x3] and Median [3x3] filters were used on
the figure 3, respectively. Of these filters, mean filter blurred
the image even more. Therefore, the image subject to median
filter was used in the following stages.
In the comparison of durations of 4 thresholding methods,
Otsu method was determined to give the best performance.
Time schedule is given in Table I.
TABLE I
DURATIONS OF THRESHOLDING METHODS
Method Name
White Thresholding
Sauvola Thresholding
Niblack Thresholding
Otsu Thresholding
(a)
Time (msecond)
89 ms
119 ms
384 ms
48 ms
Initially, the rows are determined for character separation as
shown in figure 5. In this process, the image is scanned from
the beginning and the initial black point is accepted as the
start of row and the last black point is determined as the end
of row.
(b)
Figure 3. a) Mean [3x3] b) Median [3x3]
The results of the methods used for binarization following
the noise removal are given in Figure 4. Accordingly, Otsu
method was determined to give the best results in binarization
process. Therefore, Otsu thresholding was preferred character
separation.
Fig. 5 Start and end of row
Subsequently, for determining the columns, the initial
character is found by advancing from the first black point to
right side until the first column of white points is reached, and
upon finishing this letter, it is taken from the image and
stored. As can be seen in Figure 6 below, initially the letter “i”
and then “m”, “a” and “g” letters in the first row are
distinguished with image scan. The same process is repeated
for the other rows. Thus, character separation process is
completed.
(a)
(b)
(a)
http://dx.doi.org/10.15242/IIE.E0714006
11
International Conference on challenges in IT, Engineering and Technology (ICCIET’2014) July 17-18, 2014 Phuket (Thailand)
[6]
[7]
(b)
[8]
[9]
[10]
(c)
[11]
[12]
(d)
[13]
Fig. 6 Letter separation process
[14]
IV. CONCLUSION
Many applications have been developed in optical character
determination. In the current study, the process steps until
character separation are investigated. For this purpose,
primarily the obtained image was transformed into grayscale
and noise removal was applied. At this stage, median filter
was determined to give better results in image quality, and
therefore, median filter was preferred. Subsequently, 4
thresholding methods were applied, respectively, and Otsu
thresholding method provided the best fit. Otsu thresholding
method was found to be better in both image quality and
performance. Using Otsu thresholding method, the start and
end of rows were determined on black and white image and
character separation process was achieved.
[15]
[16]
[17]
[18]
REFERENCES
[1]
[2]
[3]
[4]
[5]
S. Ay, and A. Varol, “Karakter Tanıma İçin Düzenli Özellik Çıkarma
İşleminin İncelenmesi ve Uygulanması”, Bilimde Modern Yöntemler
Sempozyumu (BMYS 2008), Eskişehir Osmangazi Üniversitesi,
Eskişehir, 2008, pp. 1063-1070.
B. Gatos, I. Pratikakis, and S.J. Perantonis, “Adaptive degraded
document image binarization”, Pattern Recognition, vol. 39, pp. 317 –
327, 2006.
http://dx.doi.org/10.1016/j.patcog.2005.09.010
C. Chou, W.H. Lin, and F. Chang, “A binarization method with
learning-built rules for document images produced by cameras”, Pattern
Recognition, vol. 43, pp. 1518–1530, 2010.
http://dx.doi.org/10.1016/j.patcog.2009.10.016
M. Cheriet, R.F. Moghaddam, and R. Hedjam, “A learning framework
for the optimization and automation of document binarization methods”,
Computer Vision and Image Understanding, vol. 117, pp. 269–280,
2013.
http://dx.doi.org/10.1016/j.cviu.2012.11.003
A. Fernández-Caballero, M.T. López, and J.C. Castillo, “Display text
segmentation after learning best-fitted OCR binarization parameters”,
Expert Systems with Applications, vol. 39, pp. 4032–4043, 2012.
http://dx.doi.org/10.1016/j.eswa.2011.09.162
http://dx.doi.org/10.15242/IIE.E0714006
12
I.K. Kim, D.W. Jung, and R.H. Park, “Document image binarization
based on topographic analysis using a water flow model”, Pattern
Recognition, vol. 35, pp. 265-277, 2002.
http://dx.doi.org/10.1016/S0031-3203(01)00027-9
M. Sezgin, and B. Sankur, “Survey over image thresholding techniques
and quantitative performance evaluation”, Journal of Electronic
Imaging, vol. 13(1), pp. 146–165, 2004.
http://dx.doi.org/10.1117/1.1631315
B. Bataineh, S.N.H. Sheikh Abdullah, and K. Omar, “An adaptive local
binarization method for document images based on a novel thresholding
method and dynamic windows”, Pattern Recognition Letters, vol. 32,
pp. 1805–1813, 2011.
http://dx.doi.org/10.1016/j.patrec.2011.08.001
N. Otsu, “A threshold selection method from gray-level histograms”,
IEEE Trans. Systems, Man, and Cybernetics, vol. 9(1), pp. 62–66, 1979.
http://dx.doi.org/10.1109/TSMC.1979.4310076
J. Sauvola, and M. Pietikainen, “Adaptive document image
binarization”, Pattern Recognition, vol. 33(2), pp. 225–236, 2000.
http://dx.doi.org/10.1016/S0031-3203(99)00055-2
W. Niblack, An Introduction to Digital Image Processing, Englewood
Cliffs, NJ: Prentice-Hall. 1986, pp. 115-116.
J.M. White, and G.D. Rohrer, “Image Thresholding for Optical
Character Recognition and Other Applications Requiring Character
Image Extraction”, IBM Journal of Research and Development, Vol.
27(4), pp. 400-411, 1983.
http://dx.doi.org/10.1147/rd.274.0400
Ö Taban, http://omertaban.com/tag/median-filtre/, Access Time:
01.10.2013.
B. Sankur, M. Sezgin, Image Thresholding Techniques: A Survey Over
Categories,
http://algoval.essex.ac.uk/rep/thresholdingreview.doc,
Access Time: 20.06.2013.
R. F. Moghaddam, and M. Cheriet, “AdOtsu: An adaptive and
parameterless generalization of Otsu’s method for document image
binarization”, Pattern Recognition, vol. 45, pp. 2419–2431, 2012.
http://dx.doi.org/10.1016/j.patcog.2011.12.013
M. Sezgin, “İmge Eşikleme Yöntemlerinin Başarım Değerlendirmesi ve
Tahribatsız Muayenede Kullanımı”, PhD Thesis, 2002.
P. Liao, T. Chen, and P. Chung, “A Fast Algorithm for Multilevel
Thresholding”, Journal of Information Science and Engineering, vol. 17,
pp. 713-727, 2001.
A. Şengür and İ. Türkoğlu, “Değişmez Momentlerle Türkçe Karakter
Tanıma”, Doğu Anadolu Bölgesi Araştırmaları, pp. 114-117, 2004.