International Journal of Applied Engineering Research ISSN 0973-4562 Volume 11, Number 9 (2016) pp 6591-6597
© Research India Publications. http://www.ripublication.com
Nutrition Facts Label Processing using Image Segmentation and Token
Matching based on OCR
Apoorva P
Lecturer, Department of Computer Science,
Amrita Vishwa Vidyapeetham,
Mysuru Campus, Amrita University, India.
Bindushree.M,
PG Scholar, Department of Computer Science,
Amrita Vishwa Vidyapeetham,
Mysuru Campus, Amrita University, India.
Bhamati.H.A
PG Scholar, Department of Computer Science,
Amrita Vishwa Vidyapeetham,
Mysuru Campus, Amrita University, India.
information about an object or a group of objects in order to facilitate classification. The input document may contain several lines of text that need to be split into single characters for recognition. Rows of characters in the image are detected first. Each character in a detected row is then isolated by scanning the row vertically from top to bottom: the first dark pixel found marks the left edge of a character, and a fully blank column marks its right edge. The character is cropped from the scanned image using its top, left, right, and bottom boundaries, and the crop is normalized from any pixel size to 15 x 15 pixels. The cropped 15 x 15 image is then binarized into a 15 x 15 array, with black represented as 1 and white as 0. Once this pattern is generated, pattern-based recognition, or token matching, compares the binary pattern against the existing templates. Mobile phones have become simple to use and are gaining a lot of popularity.
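The normalization and template-matching steps described above can be sketched as follows. This is a minimal NumPy sketch, not the paper's implementation: the nearest-neighbour resampling and the Hamming-distance scoring are assumptions, since the paper does not name the resampling or matching rule.

```python
import numpy as np

def to_pattern(char_img):
    """Normalize a cropped character image of any size to a 15x15
    binary pattern (1 = black, 0 = white) by nearest-neighbour sampling
    (assumed resampling rule)."""
    h, w = char_img.shape
    rows = (np.arange(15) * h) // 15
    cols = (np.arange(15) * w) // 15
    return (char_img[np.ix_(rows, cols)] > 0).astype(np.uint8)

def match_template(pattern, templates):
    """Return the label of the stored 15x15 template that differs from
    `pattern` in the fewest cells (Hamming distance)."""
    return min(templates, key=lambda label: int(np.sum(pattern != templates[label])))
```

A character image of any size is reduced to the same 15 x 15 grid, so templates and inputs are directly comparable cell by cell.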
In particular, mobile phones with high-resolution cameras make such tasks easier. A method to identify and list the nutritional facts printed on the labels of food packets is developed. It is very important for consumers today to know about their daily diet. Most of us refer to nutrition facts labels, and there is a need to maintain a log of these facts about the food we consume. The labels are captured using mobile phone cameras, and the text on the labels is extracted with the help of OCR. A method for preprocessing is also discussed: each captured image is preprocessed and segmented before it is sent to the OCR engine. To improve the OCR output, token matching and value rectification methods are used. The paper is organized as follows. A brief literature survey is presented in Section 2. The working principles of the proposed method are presented in Section 3. Details of the dataset used, the experimental settings, and the obtained results are presented in Section 4. The paper is concluded in Section 5.
Abstract
The system focuses on recognizing nutrition facts from the fact label. A combination of Otsu's method for segmentation and Tesseract for optical character recognition (OCR) is introduced in this work. It is a combined approach to identify and tabulate line items on nutrition facts labels from mobile images. At first, we apply image processing techniques including adaptive thresholding, small-region removal, eccentricity filtering, orientation detection, and rotation to locate the nutrition facts label in a given image. We then use region segmentation to isolate individual lines of the facts table. Optical character recognition is then applied to the segmented line images, and token matching is used to correct the output against a dictionary of expected words and values. This combined approach of segmentation plus OCR gives better results than OCR on unprocessed images, since mobile images are mostly poor in resolution. The proposed method achieves a better success rate in identifying each label line item.
Keywords: Image segmentation, Hough transform, Optical character recognition, Tesseract, Token matching.
Introduction
Optical Character Recognition, often abbreviated as OCR, refers to the recognition of printed or written text characters in scanned images by a computer. The main functions of OCR are to acquire the scanned image, preprocess it, extract features, and generate patterns. In the image acquisition stage the text images are obtained from a scanner or a pre-stored image file. To handle unwanted noise, or when the image is taken using a low-resolution scanning device, a preprocessor helps in smoothing characters and in handling touching characters, proportional spacing, and variable line spacing. Feature extraction is the process of getting
strokes and edges. The results show that the algorithm is robust in most cases, except for very small text characters that are not properly detected. Also, in the case of low contrast in the image, misclassifications occur in the texture segmentation. A focus-of-attention based system for text region localization has been proposed by Liu and Samarabandu in [19]. Intensity profiles and spatial variance are used to detect text regions in images. A Gaussian pyramid is created with the original image at different resolutions or scales. The text regions are detected first in the highest-resolution image and then in each successive lower-resolution image in the pyramid.
The approach used in [20] utilizes a support vector machine (SVM) classifier to segment text from non-text in an image or video frame. Initially, text is detected in multi-scale images using edge-based techniques, morphological operations, and projection profiles of the image. The detected text regions are then verified using wavelet features and SVM. The algorithm is robust with respect to variance in color and size of font as well as language.
Related Work
Some of the important papers related to nutrition label information extraction are described in this section. OCR is the mechanical or electronic conversion of images of typed, handwritten, or printed text into machine-encoded text. For better recognition it is important to provide the image in a more suitable form [2]. Image preprocessing is one of the important steps; it is carried out in Java and is done to increase the recognition accuracy of the software. The preprocessing involves a number of steps such as filtering, grayscale conversion, and binarization, after which we get the final enhanced image. Tesseract is an open-source engine for optical character recognition; it is one of the most accurate OCR engines available today, developed by Google Inc. and licensed under Apache 2.0 [3]. Character recognition is performed on the enhanced image using a two-pass approach. The second pass, known as "adaptive recognition," uses the letter shapes recognized with high confidence on the first pass to better recognize the remaining letters on the second pass. This is advantageous for unusual fonts or low-quality scans where the font is distorted. Once the words on the label are recognized, it is important to understand the meaning of the terms used on the nutrition label. The US Food and Drug Administration has published detailed instructions on how to read the nutrition contents on the label [14]. Also, a journal published by MyFitnessPal [15] gives an idea of combining the calorie value of the food a person takes with the daily exercise the person does, in order to give information on the proper diet for that person.
Several different methods have been introduced in the past for
detection and localization of text in images. These approaches
take into consideration different properties related to text in an
image such as color, intensity, connected-components, edges
etc. These properties are used to distinguish text regions from
their background and/or other regions within the image. The
algorithm proposed by Wang and Kangas in [16] is based on
color clustering. The input image is first pre-processed to
remove any noise if present. Then the image is grouped into
different color layers and a gray component. This approach
utilizes the fact that usually the color data in text characters is
different from the color data in the background. The potential
text regions are localized using connected component based
heuristics from these layers. Also an aligning and merging
analysis (AMA) method is used in which each row and
column value is analyzed [16]. The experiments conducted
show that the algorithm is robust in locating mostly Chinese
and English characters in images; some false alarms occurred
due to uneven lighting or reflection conditions in the test
images.
The text detection algorithm in [17] is also based on color continuity. In addition, it also uses multi-resolution wavelet transforms and combines low- as well as high-level image features for text region extraction. The text finder algorithm proposed in [18] is based on the frequency, orientation, and spacing of text within an image. Texture-based segmentation is used to distinguish text from its background. Further, a bottom-up 'chip generation' process is carried out which uses the spatial cohesion property of text characters. The chips are collections of pixels in the image consisting of potential text
Proposed System
The mobile phone images are processed in two phases. In the
first phase, an image is processed for extracting individual
lines of text within the nutrition facts labels and to segment
them. These segmented images are later used for word
recognition. In the second phase Tesseract is used for
recognition of words where the OCR extracts text and
numbers from the images processed in the first phase.
Image Processing
The steps of the image processing phase are shown in Figure 1.
Preprocessing and Binarization
Figure 1: Identification of nutrition facts label region.
The original image is converted to a binarized image, then small-region removal is applied, and the image is filtered through eccentricity filtering. The Hough transform is applied to the resultant image, and further filtering based on k-means together with a morphological closing operation is done to detect the lines. To remove noise before region filtering, the original color images are first preprocessed and binarized, and then small-region removal is applied. Filtering is done using 3x3 neighborhood median filtering to smooth high-frequency noise. Adaptive histogram equalization is used to identify and improve the regions that are shadowed. After the shadowed regions, bright areas caused by glare are manipulated in the original image.
16 x 16 pixel tiles are used to filter the image, where it is converted to a binary image using local adaptive thresholding. The pixels within a block are set to zero if the maximum intensity variation in that block is below 0.3. The noisy and bright regions in the image that are not part of any line are filtered out during this adaptive binarization stage. In order to preserve the pixels corresponding to the lines, the image is filtered further and any region with eccentricity less than 0.98 is removed. Thus the smaller regions in the binarized image are suppressed and the larger regions are retained. It is observed that the lines in the nutrition facts label are parallel; clusters with small deviations from the orientation of these parallel lines are deleted. In order to rotate the filtered image to its proper position, the Hough transform (HT) is applied to the filtered image.
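The tile-based adaptive binarization can be sketched as follows. This is a minimal NumPy sketch under stated assumptions: the paper only specifies the 16 x 16 tiles and the 0.3 intensity-variation cutoff, so the per-tile threshold (here the tile mean) is an assumption.

```python
import numpy as np

def adaptive_binarize(gray, tile=16, min_range=0.3):
    """Local adaptive thresholding on 16x16 tiles. Tiles whose intensity
    variation is below `min_range` (on a 0-1 scale) are treated as
    background and set to zero; other tiles are thresholded locally by
    the tile mean (an assumption, the paper does not name the rule)."""
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.uint8)
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            block = gray[i:i+tile, j:j+tile]
            if block.max() - block.min() < min_range:
                continue  # flat tile: stays all zeros
            out[i:i+tile, j:j+tile] = (block > block.mean()).astype(np.uint8)
    return out
```

Flat, low-contrast tiles (glare, background) are zeroed outright, which is what removes bright regions that are not part of any line.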
Figure 3: (a) Point p0 (b) All possible lines through p0
represented in the Hough space
The following steps are considered for detecting straight lines in an image. First, edge detection is performed, e.g. using the Canny edge detector [8]. Then the edge points are mapped to the Hough space and stored in an accumulator. The accumulator is interpreted to yield lines of infinite length; the interpretation is done by thresholding and possibly other constraints. Finally, the infinite lines are converted to finite lines, which can then be superimposed on the original image. The Hough transform itself is performed in the second step, but all steps except edge detection are covered in this section.
The Hough transform takes a binary edge map as input and attempts to locate edges placed as straight lines. The idea of the Hough transform is that every edge point in the edge map is transformed to all possible lines that could pass through that point. Figure 3 illustrates this for a single point, and Figure 4 illustrates this for two points. A typical edge map includes many points, but the principle for line detection is the same as illustrated in Figure 4 for two points. Each edge point is transformed to a line in the Hough space, and the areas where most Hough space lines intersect are interpreted as true lines in the edge map.
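The voting procedure described above can be sketched in NumPy. This is a minimal illustrative sketch, not the paper's implementation; the 1-degree angular resolution and integer rounding of r are assumptions.

```python
import numpy as np

def hough_accumulator(edge_map, n_theta=180):
    """Vote every edge point into (theta, r) space using
    r = x*cos(theta) + y*sin(theta); peaks in the accumulator
    correspond to lines in the edge map."""
    h, w = edge_map.shape
    thetas = np.deg2rad(np.arange(n_theta))
    diag = int(np.ceil(np.hypot(h, w)))          # max possible |r|
    acc = np.zeros((2 * diag, n_theta), dtype=np.int32)
    ys, xs = np.nonzero(edge_map)
    for x, y in zip(xs, ys):
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[r + diag, np.arange(n_theta)] += 1   # offset r so indices are >= 0
    return acc, diag
```

Thresholding `acc` (e.g. at 50% of its maximum, as discussed below) then yields the detected infinite lines.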
Representation of Lines in the Hough Space
Lines can be represented uniquely by two parameters. Often the form in Equation (1) is used, with parameters a and b:

y = a·x + b    (1)

This form is, however, not able to represent vertical lines. Therefore, the Hough transform uses the form in Equation (2), which can be rewritten as Equation (3) to be similar to Equation (1). The parameters θ and r are the angle of the line and the distance from the line to the origin, respectively:

r = x·cos θ + y·sin θ    (2)

y = -(cos θ / sin θ)·x + r / sin θ    (3)

All lines can be represented in this form when θ ∈ [0°, 180°] and r ∈ R (or θ ∈ [0°, 360°] and r ≥ 0). The Hough space for lines therefore has these two dimensions, θ and r, and a line is represented by a single point corresponding to a unique set of parameters (θ0, r0). The line-to-point mapping is illustrated in Figure 2.
Figure 4: (a) Points p0 and p1 (b) All possible lines through
p0 and/or p1 represented in the Hough space
Infinite lines are detected by interpretation of the accumulator once all edge points have been transformed. An example of the entire line detection process is shown in Figure 5. The most basic way to detect lines is to set some threshold for the accumulator and interpret all values above the threshold as lines. The threshold could, for instance, be 50% of the largest value in the accumulator. This approach may occasionally suffice, but in many cases additional constraints must be applied. As is evident from Figure 4, several entries in the accumulator around one true line in the edge map will have large values.
Figure 2: Line to point mapping.
An important concept for the Hough transform is the mapping of single points. The idea is that a point is mapped to all lines that can pass through that point. This yields a sine-like curve in the Hough space. The principle is illustrated for a point p0 = (40, 30) in Figure 3.
The corresponding regions of the original grayscale image were segmented and binarized with Otsu's method.
Therefore, a simple threshold has a tendency to detect several (almost identical) lines for each true line. To avoid this, a suppression neighborhood can be defined, so that two lines must be significantly different before both are detected.
The classical Hough transform detects lines given only by the parameters r and θ, with no information regarding length; thus, all detected lines are infinite. If finite lines are desired, some additional analysis must be performed to determine which areas of the image contribute to each line. Several algorithms exist for doing this. One way is to store coordinate information for all points in the accumulator and use this information to limit the lines; however, this would cause the accumulator to use much more memory. Another way is to search along the infinite lines in the edge image to find finite lines. The advantage of the Hough transform is that the pixels lying on one line need not all be contiguous. This can be very useful when trying to detect lines with short breaks in them due to noise, or when objects are partially occluded. As for the disadvantages of the Hough transform, one is that it can give misleading results when objects happen to be aligned by chance. Another is that the detected lines are infinite lines described by their (θ, r) parameter values, rather than finite lines with defined end points.
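The second strategy, searching along each infinite line in the edge image, can be sketched as follows. This is an illustrative sketch, not the paper's code; the sampling rule and the `max_gap` tolerance are assumptions.

```python
import numpy as np

def finite_segments(edge_map, theta, r, max_gap=2):
    """Walk along the infinite line (theta, r) and collect runs of
    edge pixels, allowing short gaps caused by noise breaks."""
    h, w = edge_map.shape
    pts = []
    # sample in x for near-horizontal lines, else in y, to avoid
    # dividing by a near-zero sin(theta) or cos(theta)
    if abs(np.sin(theta)) > 0.5:
        for x in range(w):
            y = int(round((r - x * np.cos(theta)) / np.sin(theta)))
            if 0 <= y < h and edge_map[y, x]:
                pts.append((x, y))
    else:
        for y in range(h):
            x = int(round((r - y * np.sin(theta)) / np.cos(theta)))
            if 0 <= x < w and edge_map[y, x]:
                pts.append((x, y))
    # split the sampled hits into segments wherever a large gap occurs
    segments, start = [], 0
    for i in range(1, len(pts) + 1):
        if i == len(pts) or abs(pts[i][0] - pts[i-1][0]) + abs(pts[i][1] - pts[i-1][1]) > max_gap:
            if i - start >= 2:
                segments.append((pts[start], pts[i - 1]))
            start = i
    return segments
```

Because gaps up to `max_gap` pixels are bridged, short noise breaks in a line do not split it, which matches the stated advantage of the Hough transform.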
Segmentation
There are different types of region-based methods, such as thresholding, region growing, and region splitting and merging [8]. Thresholding is an important technique in image segmentation applications. The basic idea of thresholding is to select an optimal gray-level threshold value for separating objects of interest in an image from the background, based on their gray-level distribution. While humans can easily differentiate an object from a complex background, automatic image thresholding is a difficult way to separate them. The gray-level histogram of an image is usually considered an efficient tool for the development of image thresholding algorithms. Thresholding creates binary images from gray-level ones by turning all pixels below some threshold to zero and all pixels above that threshold to one. If g(x, y) is a thresholded version of f(x, y) at some global threshold T, it can be defined as [4]:

g(x, y) = 1 if f(x, y) ≥ T, and 0 otherwise.

The thresholding operation is defined as:

T = M[x, y, p(x, y), f(x, y)]

In this equation, T stands for the threshold, f(x, y) is the gray value of point (x, y), and p(x, y) denotes some local property of the point, such as the average gray value of the neighborhood centered on (x, y). Based on this, there are two types of thresholding methods.
Global thresholding:
When T depends only on f(x, y) (in other words, only on gray-level values) and its value relates solely to the character of the pixels, the technique is called global thresholding.

Local thresholding:
If the threshold T depends on both f(x, y) and p(x, y), the thresholding is called local. This method divides the original image into several sub-regions and chooses a suitable threshold T for each sub-region [6].
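The two definitions above can be made concrete with a short sketch. This is an illustrative NumPy sketch; using the sub-region mean as the local property p(x, y) is one possible choice, not something the paper prescribes.

```python
import numpy as np

def global_threshold(f, T):
    """Global thresholding: g(x, y) = 1 if f(x, y) >= T, else 0."""
    return (f >= T).astype(np.uint8)

def local_threshold(f, tile, choose_T=np.mean):
    """Local thresholding: split f into tile x tile sub-regions and
    apply a separately chosen T to each (here the sub-region mean,
    one possible local property p(x, y))."""
    g = np.zeros(f.shape, dtype=np.uint8)
    h, w = f.shape
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            block = f[i:i+tile, j:j+tile]
            g[i:i+tile, j:j+tile] = (block >= choose_T(block)).astype(np.uint8)
    return g
```

The only difference between the two is whether a single T is shared by the whole image or each sub-region computes its own.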
The Otsu method is a type of global thresholding that depends only on the gray values of the image. It was proposed by Otsu in 1979 and is widely used because it is simple and effective [7]. The method requires computing a gray-level histogram before running. However, because the one-dimensional method considers only the gray-level information, it does not always give good segmentation results. For that reason a two-dimensional Otsu algorithm was proposed, which works on both the gray-level threshold of each pixel and its spatial correlation information within the neighborhood, so that satisfactory segmentation results can be obtained even on noisy images [11]. Many techniques were thus proposed to reduce the computation time while maintaining reasonable thresholding results. In [12], a fast recursive technique was proposed that can efficiently reduce computational time. Otsu's method is one of the better threshold selection methods for general real-world images with regard to uniformity and shape measures. However, it uses an exhaustive search to evaluate the criterion for maximizing the between-class variance.
Figure 5: Hough Transform
At this point, two challenges sometimes persisted: small letters clinging to the lines due to blur in the original image, and additional noise to the left and right of the detected lines. To address these issues, additional filtering was performed based on row and column intensities. In particular, any row containing too few white pixels in total was set to zeros, and any column passing through too few distinct white regions was set to zeros. In both cases, the threshold was selected by k-means clustering. Finally, we performed morphological closing with a long, thin horizontal structuring element in order to close disjointed lines in the binary mask. Based on the lines in the mask, we identified the bounding boxes of the text regions of the nutrition facts labels.
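The k-means-based row filtering described above can be sketched as follows. This is a minimal sketch, assuming a plain 1-D 2-means on the per-row white-pixel counts; the paper does not give the clustering details.

```python
import numpy as np

def kmeans2_threshold(values, iters=20):
    """1-D 2-means: return a threshold midway between the two cluster
    centers, used to separate 'few white pixels' from 'many'."""
    values = np.asarray(values, dtype=float)
    c_lo, c_hi = values.min(), values.max()
    for _ in range(iters):
        assign = np.abs(values - c_lo) <= np.abs(values - c_hi)
        if assign.all() or (~assign).all():
            break  # degenerate: everything in one cluster
        c_lo, c_hi = values[assign].mean(), values[~assign].mean()
    return (c_lo + c_hi) / 2

def filter_rows(mask):
    """Zero out rows whose white-pixel count falls in the low cluster."""
    counts = mask.sum(axis=1)
    t = kmeans2_threshold(counts)
    out = mask.copy()
    out[counts < t, :] = 0
    return out
```

The same idea applies to columns, with the count of distinct white regions per column replacing the raw pixel count.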
As the number of classes in an image increases, Otsu's method takes too much time to be practical for multilevel threshold selection [13]. Otsu's method aims at finding the optimal value for the global threshold. It is based on maximization of the inter-class variance: well-thresholded classes have well-discriminated intensity values.
The weighted within-class variance is given in Equation (4):

σw²(t) = q₁(t)·σ₁²(t) + q₂(t)·σ₂²(t)    (4)

where the class probabilities are estimated as:

q₁(t) = Σ_{i=1..t} P(i),    q₂(t) = Σ_{i=t+1..I} P(i)    (5)

and the class means are given by:

μ₁(t) = Σ_{i=1..t} i·P(i) / q₁(t),    μ₂(t) = Σ_{i=t+1..I} i·P(i) / q₂(t)    (6)

Finally, the individual class variances are:

σ₁²(t) = Σ_{i=1..t} [i − μ₁(t)]²·P(i) / q₁(t),    σ₂²(t) = Σ_{i=t+1..I} [i − μ₂(t)]²·P(i) / q₂(t)    (7)

Further noise removal was performed using orientation filtering, since binarization alone often resulted in a significant number of small noise regions. Finally, each segmented region of text was cropped (by removing all dark and all white padding on the image borders) and further divided by identifying all-white horizontal lines running through the center of the region. In most cases, the segmentation of each nutrition facts line was performed cleanly.
Now, we could actually stop here: all we need to do is run through the full range of t values [1, 256] and pick the value that minimizes σw²(t). But the relationship between the within-class and between-class variances can be exploited to generate a recursion relation that permits a much faster calculation.
• The basic idea is that the total variance does not depend on the threshold.
• For any given threshold, the total variance is the sum of the within-class variances (weighted) and the between-class variance, which is the sum of weighted squared distances between the class means and the grand mean.
Figure 6: After segmentation
Figure 6 shows, at the top, a grayscale image together with its binarization. The later steps of cropping and orientation-based noise filtering are then applied, which significantly enhance text quality.
Word Recognition
After some algebra, we can express the total variance as Equation (8):

σ² = σw²(t) + q₁(t)·[1 − q₁(t)]·[μ₁(t) − μ₂(t)]²    (8)

Since the total variance is constant and independent of t, the effect of changing the threshold is merely to move the contributions between the two terms. So, minimizing the within-class variance is the same as maximizing the between-class variance.
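A direct (non-recursive) sketch of this criterion in NumPy follows. This is a minimal illustration of Equations (5)-(8), not the paper's MATLAB implementation: it exhaustively maximizes the between-class variance q₁·q₂·(μ₁ − μ₂)² over all t.

```python
import numpy as np

def otsu_threshold(gray_levels, n_bins=256):
    """Return the t maximizing the between-class variance
    q1(t)*q2(t)*(mu1(t) - mu2(t))^2, which by Equation (8) is the same
    as minimizing the within-class variance of Equation (4)."""
    hist, _ = np.histogram(gray_levels, bins=n_bins, range=(0, n_bins))
    p = hist / hist.sum()                     # P(i): normalized histogram
    best_t, best_var = 0, -1.0
    for t in range(1, n_bins):
        q1, q2 = p[:t].sum(), p[t:].sum()     # class probabilities, Eq. (5)
        if q1 == 0 or q2 == 0:
            continue                          # one class empty: skip
        mu1 = (np.arange(t) * p[:t]).sum() / q1          # Eq. (6)
        mu2 = (np.arange(t, n_bins) * p[t:]).sum() / q2  # Eq. (6)
        var_b = q1 * q2 * (mu1 - mu2) ** 2    # between-class variance
        if var_b > best_var:
            best_t, best_var = t, var_b
    return best_t
```

The recursion in Equations (9)-(11) would let the sums for q₁ and μ₁ be updated incrementally instead of recomputed at every t.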
The nice thing about this is that we can compute the quantities recursively as we run through the range of t values:

q₁(t+1) = q₁(t) + P(t+1)    (9)

μ₁(t+1) = [q₁(t)·μ₁(t) + (t+1)·P(t+1)] / q₁(t+1)    (10)

μ₂(t+1) = [μ − q₁(t+1)·μ₁(t+1)] / [1 − q₁(t+1)]    (11)

where μ is the grand mean.
Figure 7: (a) Raw OCR output from unsegmented (top) and
segmented images. (b) The segmented images led to cleaner
text outputs.
Optical character recognition (OCR) is used to extract text and numeric values from the preprocessed images. Tesseract was used as the OCR engine because it is one of the most widely used open-source engines. However, Tesseract performs character-based rather than word-based recognition, so its output was then refined with word-based token matching to improve accuracy. Tesseract was restricted to English letters, digits, and the special characters %, ?, and '.'; this improved the ability to rectify misread characters from the OCR. Tesseract was very sensitive to image skew, clutter, and noise, especially with unsegmented block images. Figure 7 shows a comparison of Tesseract performance on block versus segmented inputs; improved performance is evident with segmented images.
The algorithm produced a 36% error rate on the full set of 50 images, and 34% if we exclude the images that failed segmentation due to confusion with barcode recognition.
Table 1: Error rates and average runtime across four test scenarios.

              Raw Image            Segmented Image
              Naive     Token      Naive     Token
              Match     Match      Match     Match
Error Rate    70%       68%        53%       36%
Avg Time      13 sec    33 sec     16 sec    38 sec

Token Matching
Since the words used on a nutrition facts label are limited (e.g., calories, cholesterol), the output from Tesseract was matched against a predefined dictionary to improve the recognition results. Two token matching methods were implemented. Naive string matching looks for exact appearances of tokens in each line. Scored matching is based on the Levenshtein distance between input lines and each token in the dictionary, normalized by the line length. Since a lower score indicates a higher confidence, this normalization ensures that shorter tokens are not disproportionately favored. Output tokens and related numeric values with a confidence score above a given threshold (60%) were considered. In the first pass, label quantities that contain numbers (e.g., '10g') are considered in naive string matching; the numbers that directly follow the matched tokens are taken without any rectification. In the second pass, the scored matching algorithm searches for tokens such as 'mg,' 'g,' or '%' that indicate the values present in a line. If both gram and percent values are given, we use the recommended daily values of each item to reconcile any conflicts between the values. Figure 8 shows token matching and value rectification results, which show further improvement.
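The two token matching passes described above can be sketched as follows. This is a minimal sketch, assuming a classic dynamic-programming Levenshtein distance and a small illustrative dictionary; the paper's confidence threshold and value rectification are not reproduced.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def best_token(line, dictionary):
    """Naive pass first: an exact substring match wins outright.
    Otherwise score each token by Levenshtein distance normalized by the
    line length, so short tokens are not unfairly favored; lower is better."""
    line = line.lower()
    for tok in dictionary:
        if tok in line:
            return tok, 0.0
    scores = {tok: levenshtein(line, tok) / max(len(line), 1) for tok in dictionary}
    return min(scores.items(), key=lambda kv: kv[1])
```

Normalizing by the line length means a short token like 'g' does not beat 'cholesterol' merely because fewer edits separate it from an arbitrary line.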
The error rate is calculated as the percentage of line items across all 50 test images that the algorithm either did not detect or detected with incorrect values. The results were also tested using another OCR engine (ABBYY FineReader 11), and the result remains the same (34% with failed-segmentation images removed). A comparison of our processing techniques is shown in Table 1. It demonstrates the effectiveness of segmentation, since it compares results for unprocessed (raw) versus segmented images. There is a large difference in error rate between the naive and scored token matching methods on the segmented images. This indicates that stronger word recognition on top of a high-quality OCR output is needed to get good results, and that the improved result also requires thorough image processing steps to segment the images for a high-quality recognizer.
Conclusion and Future Work
The experiments conducted show a decreased error rate of 36%, compared to an error rate of 68% when raw images were used with the same token matching algorithm. The image processing steps combined with the token matching method give an improvement to 34% over the raw-image, naive-matching baseline. When a different OCR engine (ABBYY FineReader 11) was used, the error rate remained the same. The results are based on a dataset of 50 images taken with an iPhone 5s. The method proves to be efficient for mobile images of nutrition facts labels and is expected to remain consistent on a larger dataset.
Our runtime was measured in MATLAB R2013a on a 2.4 GHz Intel dual-core processor. For the image segmentation step alone, the average processing time on this machine was 15-25 s per image. On better hardware, an average runtime of about 20 s per image can be expected.
The results discussed are mainly affected by the difficulty of interpreting bold and blurred text. Along with template matching, morphological operators such as eroding the bold text may improve the accuracy of the OCR. Recognizers other than token matching, such as template matching, bag of words, or multi-class support vector machines, can also be used. The main problem faced during image segmentation was images that contained barcodes and high-eccentricity clutter oriented parallel to the nutrition facts lines.
Figure 8: (a) Token matching and numeric rectification. (b) Levenshtein distance matching resulted in improved token recognition.
Experimental Results
The combined algorithm was tested on a dataset of 50 images of processed food items from a local supermarket. All images were taken using an iPhone 5. For each image, values for calories, fat, cholesterol, sodium, carbohydrates, dietary fiber, sugar, and protein were manually verified, and the performance of the algorithms was evaluated against these values.
[14] H. Jiang, C. Han, K. Fan, "A Fast Approach to Detect and Correct Skew Documents," in Proceedings of the 13th International Conference on Pattern Recognition, vol. 3, 1996, pp. 742-746.
[15] Shrinath Janvalkar, Paresh Manjrekar, Sarvesh Pawar, "Text Recognition from an Image," Int. Journal of Engineering Research and Applications, ISSN 2248-9622, Vol. 4, Issue 4 (Version 5), April 2014, pp. 149-151.
[16] Kongqiao Wang and Jari A. Kangas, "Character Location in Scene Images from Digital Camera," Pattern Recognition, March 2003.
[17] K. C. Kim, H. R. Byun, Y. J. Song, Y. W. Choi, S. Y. Chi, K. K. Kim and Y. K. Chung, "Scene Text Extraction in Natural Scene Images Using Hierarchical Feature Combining and Verification," Proceedings of the 17th International Conference on Pattern Recognition (ICPR '04), IEEE.
[18] Victor Wu, Raghavan Manmatha, and Edward M. Riseman, "TextFinder: An Automatic System to Detect and Recognize Text in Images," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 11, November 1999.
[19] Xiaoqing Liu and Jagath Samarabandu, "A Simple and Fast Text Localization Algorithm for Indoor Mobile Robot Navigation," Proceedings of SPIE-IS&T Electronic Imaging, SPIE Vol. 5672, 2005.
[20] Qixiang Ye, Qingming Huang, Wen Gao and Debin Zhao, "Fast and Robust Text Detection in Images and Video Frames," Image and Vision Computing 23, 2005.
The method discussed finds it difficult to differentiate the actual lines from lines of no interest when they are parallel; a filtering method to handle these difficulties can be considered. The proposed work does not support distortion or curvature in the image; however, it corrects small distortions in adjacent regions in a robust way.
References
[1] Hetal J. Vala, Astha Baxi, "A Review on Otsu Image Segmentation Algorithm," International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), ISSN 2278-1323, Vol. 2, Issue 2, February 2013.
[2] W. Bieniecki, S. Grabowski, and W. Rosenberg, "Image Preprocessing for Improving OCR Accuracy," in Proceedings of the International Conference on Perspective Technologies in MEMS Design, 2007, pp. 75-80.
[3] R. Smith, "An Overview of the Tesseract OCR Engine," in Proceedings of the Ninth International Conference on Document Analysis and Recognition, vol. 2, 2007, pp. 629-633.
[4] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, 2nd ed., Beijing: Publishing House of Electronics Industry, 2007.
[5] W. X. Kang, Q. Q. Yang, R. R. Liang, "The Comparative Research on Image Segmentation Algorithms," IEEE Conference on ETCS, pp. 703-707, 2009.
[6] Er. Nirpjeet Kaur and Er. Rajpreet Kaur, "A Review on Various Methods of Image Thresholding," IJCSE, 2011.
[7] Zhong Qu and Li Hang, "Research on Image Segmentation Based on the Improved Otsu Algorithm," 2010.
[8] W. X. Kang, Q. Q. Yang, R. R. Liang, "The Comparative Research on Image Segmentation Algorithms," IEEE Conference on ETCS, pp. 703-707, 2009.
[9] Z. Ningbo, W. Gang, Y. Gaobo, and D. Weiming, "A Fast 2D Otsu Thresholding Algorithm Based on Improved Histogram," in Pattern Recognition (CCPR 2009), Chinese Conference on, 2009, pp. 1-5.
[10] L. Dongju and Y. Jian, "Otsu Method and K-means," in Hybrid Intelligent Systems (HIS '09), Ninth International Conference on, vol. 1, 2009, pp. 344-349.
[11] Liu Jian-zhuang, Li Wen-qing, "The Automatic Threshold of Gray Level Pictures via Two-dimensional Otsu Method," Acta Automatica Sinica, 1993.
[12] J. Gong, L. Li, and W. Chen, "Fast Recursive Algorithms for Two-dimensional Thresholding," Pattern Recognition, vol. 31, no. 3, pp. 295-300, 1998.
[13] P. K. Sahoo, S. Soltani, A. K. C. Wong, and Y. Chen, "A Survey of Thresholding Techniques," Computer Vision, Graphics, and Image Processing, vol. 41, 1988, pp. 233-260.