IEEE TRANSACTIONS ON COMPUTERS, VOL. C-28, NO. 2, FEBRUARY 1979
Word-Level Recognition of Cursive Script
RAOUF F. H. FARAG
Abstract-Recognition of cursive script has been attempted by
discovering the letters forming each word, and consequently recognizing the word. The work done in this area has employed several
techniques to enhance recognition results; examples of these
techniques include use of topological features and use of context.
The problem solved in this research is somewhat more modest, in
the sense that the goal of the research is to recognize a small
vocabulary of key words and not to recognize every English word. A
model is introduced to recognize cursive script on a word-level basis
rather than a letter-level basis. Two recognition experiments are
conducted on a set of ten cursively written key words with a resulting
100 percent correct recognition on that training sample.
Index Terms-Cursive script, Markov chain, real-time recogni-
tion, word recognition.
I. INTRODUCTION
Recognition of cursive script has usually been approached from
several points of view. The view first proposed by Mermelstein
and Eden [1] is that, to be able to recognize handwritten cursive
script, the researcher has to separate the pattern in question
into its component letters. Using the letters forming the word,
the cursive script can be recognized. This approach is logical if the
objective is to recognize all English words, or even a large subset
of the language. Mermelstein's model is based on separating the
input real-time word into a sequence of up and down strokes, and
then transforming these strokes into letters with the aid of contextual
information and a dictionary. Ehrich and Koehler [2] illustrate
how contextual information has been effectively and efficiently
represented by use of binary digrams. In [2] the use of context is
essential since its elimination decreases correct recognition by
about 80 percent. The results of Ehrich and Koehler [2] are impressive even
though they point out that as the dictionary size increases
(approaching normal English) the contextual information, which
is so essential, becomes less effective. This is true since the majority of the words found in the dictionary are short words. Sayre [3]
uses context in the form of digrams and trigrams in conjunction
with a topological approach to the recognition of cursive script.
Sayre does not utilize real-time information but uses sentences
already written. Sayre reports a 79 percent recognition of 84
words. The test data used are 3 different phrases written a total of
20 times (by ten different authors).

Manuscript received November 28, 1977; revised January 12, 1978, June 14, 1978,
and July 21, 1978.
The author was with St. John's University, Jamaica, NY 11439. He is now with the
RCA Solid-State Technology Center, Somerville, NJ 08876.

Three of the twenty sentences are used for training, in the sense of extracting topological features. The remaining 17 sentences contain the 84 words used in
the recognition experiments. In evaluating recognition techniques
for cursive script two points should be kept in perspective. The
first is that mentioned by Ehrich et al. [2], concerning the effect of
dictionary size on contextual information. The second concerns
topological approaches, namely, whether the topological features
extracted are sample-dependent or not. These factors should help
in assessing the general applicability of proposed recognition
models. The reader is also referred to Frishkopf and Harmon [4]
which is one of the earliest papers dealing with recognition of
cursive script.
Recognition of cursive script is obviously a worthwhile endeavor for further investigation. The pursuit of a solution to this
general problem should not hinder the solution of more modest
subproblems. This correspondence deals with one such subproblem, namely, recognition of cursive script in a limited vocabulary
environment. One example of such an environment is found in the
recognition of handwritten programs. A second example is found
in shipping or manufacturing, where key words direct further
processing.¹ The basic idea utilized in this research is to recognize
cursive script on a word level. This approach should be possible
since the vocabulary under consideration is small and the number
of pattern classes is not excessive. Recognition is done without
any attempt to discover the letters forming the words, and therefore no contextual information is used.
II. THE WORD-LEVEL RECOGNITION MODEL
The model to be introduced in this section uses real-time information in much the same manner as that used by Mermelstein [1].
The handwritten cursive script is viewed as a sequence of directional strokes; these strokes are obtained as a sequence in real
time. The sampling rate is selected to correspond to an approximate stroke length of a fraction of an inch. This coding scheme can be viewed as a
transformation α defined as follows:

$$\alpha : \frac{\Delta y}{\Delta x} \to f$$

where Δy is the change of the y coordinates between two successive
points, Δx is the corresponding change of the x coordinates, and f
is a set of eight elements {0, 1, 2, 3, 4, 5, 6, 7} such that a stroke is
coded as i if

$$\frac{i\pi}{4} - \frac{\pi}{8} \le \tan^{-1}\frac{\Delta y}{\Delta x} < \frac{i\pi}{4} + \frac{\pi}{8}, \qquad i \in f.$$

Fig. 1 shows the eight directions that form the set f. Fig. 2 is an
example of writing the letter R in real time; the code generated for
that letter is also shown. This directional coding scheme is similar
to a scheme suggested by Freeman [5].
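As an illustration, the coding step might be sketched as follows. This is only a sketch: the point-list input format and the use of a full-circle arctangent (rather than the half-plane tan⁻¹ written above) are assumptions of the illustration, not details given in the correspondence.

```python
import math

def chain_code(points):
    """Map a real-time pen trace (a list of (x, y) samples) to the
    8-direction stroke code of Fig. 1: a stroke is assigned direction i
    when its angle lies within pi/8 of i * pi/4.

    The counterclockwise numbering starting at the positive x axis is
    an assumption of this sketch.
    """
    code = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # atan2 gives the full-circle angle of the stroke, in radians.
        angle = math.atan2(y1 - y0, x1 - x0)
        # Round to the nearest multiple of pi/4; reduce modulo 8 states.
        i = round(angle / (math.pi / 4)) % 8
        code.append(i)
    return code
```

Each consecutive pair of samples yields one symbol of the word's stroke code, so a whole word becomes a string over {0, ..., 7} such as the 660123457554 of Fig. 2.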
A. Learning
A word written in a nonreal-time environment contains information that may be utilized by use of topological features [3] or
other researcher-defined techniques. A word, or group of words,
written in a real-time environment contains additional information
found in the order in which the strokes are entered.

¹ Threshold Technology, Inc., Delran, NJ 08075, manufactures and markets a
voice input device specifically for that purpose. The system is used by a single user at
a time and is capable of recognizing up to 200 key words. This system is used for
machine control, inventory update, and so on.

0018-9340/79/0200-0172$00.75 © 1979 IEEE

Fig. 1. Coding scheme.
Fig. 2. Example: Letter R. Code: 660123457554.
Fig. 3. Initial stochastic row vector.

The initial stochastic row vector of Fig. 3 is M_1 = [p_0 p_1 p_2 ... p_7], a row vector representing time interval 1, where p_0 is the probability that the first stroke is a 0, p_1 is the probability that the first stroke is a 1, and so on, with

$$\sum_{i=0}^{7} p_i = 1; \qquad 0 \le p_i \le 1; \qquad i = 0, 1, 2, 3, 4, 5, 6, 7.$$

This type of real-time information has been utilized by Mermelstein and Eden [1]
for recognition of script, and by Farag and Chien [6] for
verification of handwritten signatures. This research deals with an
environment in which the real-time information is available. The
restrictions posed by such a requirement are not severe, since real-time script entry terminals are commercially available. The price
of these terminals should be comparable to the price of the hardware
needed to process script written in a nonreal-time environment.

Learning and representing the real-time information should be
done in a manner consistent with the nature of this information.
The model chosen to represent this real-time information is a
nonstationary Markov chain. Each pattern class (key word) is
represented as a collection of transition matrices, each matrix
corresponding to a particular time interval. The states of the
Markov chain are the strokes defined in Fig. 1. Since the coding
scheme proposed uses eight vectors to denote strokes, there will be
eight states, namely, {0, 1, 2, 3, 4, 5, 6, 7}. Like any nonstationary
Markov chain, a key word would be represented by a collection of
stochastic forward transition matrices. The initial vector M_1 corresponds to the probability distribution of the first stroke. In Fig.
3 one such stochastic row vector is shown.

For the jth stroke the stochastic transition matrix M_j between
the (j − 1)th stroke and the jth stroke is used. One such matrix is
shown in Fig. 4. As an example, p_34 denotes the probability of a
stroke 4 at time j, having obtained a stroke 3 at time j − 1.

Fig. 4. Stochastic transition matrix at the jth interval. (Rows: stroke at time interval j − 1; columns: stroke at time interval j.)
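As a concrete illustration, the learning step can be sketched as follows. Frequency-count (maximum likelihood) estimation of the matrix entries and the list-of-matrices data layout are assumptions of this sketch; the correspondence does not specify how the entries are estimated.

```python
def learn_word_model(samples, length):
    """Estimate a first-order nonstationary Markov chain for one key word.

    samples: list of stroke-code sequences (lists of ints 0-7) for the word.
    length:  common number of strokes per pattern class; longer samples
             are truncated, mirroring the normalization described above.
    Returns (M1, M): M1[s] = P(first stroke = s), and
    M[j - 1][a][b] = P(stroke b at time j | stroke a at time j - 1).
    """
    S = 8  # eight states, one per direction of Fig. 1
    first = [0] * S
    counts = [[[0] * S for _ in range(S)] for _ in range(length - 1)]
    for seq in samples:
        seq = seq[:length]              # truncate long patterns
        first[seq[0]] += 1
        for j in range(1, len(seq)):
            counts[j - 1][seq[j - 1]][seq[j]] += 1
    # Normalize counts into stochastic vectors/matrices.
    M1 = [c / len(samples) for c in first]
    M = []
    for mat in counts:
        rows = []
        for row in mat:
            total = sum(row)
            rows.append([c / total if total else 0.0 for c in row])
        M.append(rows)
    return M1, M
```

Each key word thus gets its own initial vector and per-interval transition matrices, one collection per pattern class.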
To illustrate the use of the first-order Markov chain, the R
shown in Fig. 2 would have the following probability associated
with it:

$$p_6^1 \, p_{66}^2 \, p_{60}^3 \, p_{01}^4 \, p_{12}^5 \, p_{23}^6 \, p_{34}^7 \, p_{45}^8 \, p_{57}^9 \, p_{75}^{10} \, p_{55}^{11} \, p_{54}^{12}. \tag{1}$$

The superscript denotes the time interval in which the stroke
occurs.

The model discussed so far is a first-order nonstationary
Markov chain; if a second-order nonstationary Markov chain were
to be used, the two symbols occurring at time intervals j − 2 and j − 1
would be considered in calculating the probabilities at time interval j. In this case the probability associated with the R shown in
Fig. 2 would be

$$p_6^1 \, p_{66}^2 \, p_{660}^3 \, p_{601}^4 \, p_{012}^5 \, p_{123}^6 \, p_{234}^7 \, p_{345}^8 \, p_{457}^9 \, p_{575}^{10} \, p_{755}^{11} \, p_{554}^{12}. \tag{2}$$

The superscript denotes the time interval in which the strokes
occur.

In order to determine whether the Markov chain is in fact
stationary or nonstationary, and what the correct order of the
Markov chain is, some statistical tests would have to be
conducted. The statistical testing involved is time consuming
and would require more attention than the recognition tests involved. Consequently, no optimization of the learning model is
performed; a nonstationary Markov chain is used for learning.
The reader interested in optimizing this learning procedure is
referred to Anderson and Goodman [7], Jermann [8], and
Forney [9].

For each key word a set of stochastic matrices is formed to
represent this key word. Since each key word is formed of a
number of strokes that may vary from one pattern class to the
next, the number of strokes representing each pattern class is
taken to be equal. For long pattern classes the excess information
found in the last part of the patterns is truncated. This normalization allows a uniform handling of patterns during classification.
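The second-order computation of (2) can be sketched in the same way. The nested-index layout used here, rather than the flattened 64 × 8 matrices discussed later in the text, is a convenience of this sketch, not the paper's storage scheme.

```python
def second_order_probability(M1, M2, M, code):
    """p(Z | W) under a second-order nonstationary chain, as in (2).

    M1[a]            = P(first stroke = a)
    M2[a][b]         = P(stroke b at time 2 | stroke a at time 1)
    M[j - 3][a][b][c] = P(stroke c at time j | strokes a, b at
                          times j - 2 and j - 1), for j >= 3.
    """
    # The first two factors come from the initial vector and the
    # first-order matrix; the rest condition on two previous strokes.
    p = M1[code[0]] * M2[code[0]][code[1]]
    for j in range(2, len(code)):
        p *= M[j - 2][code[j - 2]][code[j - 1]][code[j]]
    return p
```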
B. Classification
The classification scheme used is a maximum likelihood
classifier proposed by Lee and Fu [10]. The unknown pattern Z
is fed into the representations of all key words, the representations
being nonstationary Markov chains. By selecting the correct
entries of the matrices, the probabilities p(Z|W_i), i = 1, 2, ..., N,
are calculated in much the same way as the probability calculations of equations (1) and (2). The joint probabilities p(Z, W_i) are
obtained by multiplication with the a priori probabilities p(W_i).
The maximum selector then picks the largest joint probability; if
this choice is p(Z, W_j), then Z is classified as belonging to key word
j. (Refer to Fig. 5.)

Fig. 5. Maximum likelihood classifier.
Fig. 6. Sequential classification scheme.
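The classifier of Fig. 5 can be sketched as follows; the dictionary layout of the models (an (M_1, list-of-M_j) pair per key word) and the explicitly supplied priors are illustrative assumptions of this sketch.

```python
def sequence_probability(M1, M, code):
    """p(Z | W): multiply the entries of the nonstationary chain
    selected by the stroke sequence, as in (1)."""
    p = M1[code[0]]
    for j in range(1, len(code)):
        p *= M[j - 1][code[j - 1]][code[j]]
    return p

def classify(models, priors, code):
    """Maximum likelihood classifier of Fig. 5: pick the key word
    whose joint probability p(Z, W_i) = p(Z | W_i) p(W_i) is largest.

    models: dict mapping key word -> (M1, M) chain representation.
    priors: dict mapping key word -> a priori probability p(W_i).
    """
    best_word, best_joint = None, -1.0
    for word, (M1, M) in models.items():
        joint = sequence_probability(M1, M, code) * priors[word]
        if joint > best_joint:
            best_word, best_joint = word, joint
    return best_word
```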
III. EXPERIMENTS IN RECOGNITION OF CURSIVE SCRIPT

The experiments on cursive script were conducted using ten key
words of comparable length. Each of the ten key words was written by a different author, and each key word was written a total of
20 times. Thus, the data available were a total of 200 key words.
Two experiments were conducted on this set of key words. The
first experiment involved using a first-order nonstationary
Markov chain to represent each key word. The training sample
used is the total sample set. When this training set was classified,
196 key words were correctly classified and 4 key words were
incorrectly classified, yielding an average recognition rate of 98
percent. When the recognition experiment was repeated using a
second-order nonstationary Markov chain, the result was perfect
recognition of 100 percent. The time required to classify a key
word using either the first-order or the second-order model
averaged 0.34 s on a DEC 10 computer. The computation of
p(Z|W_i) is performed by selecting the appropriate entries from the
matrices M_j and multiplying these probabilities, as was done in (1)
and (2). The classification times for the first-order and the second-order processes are equal. The difference between the two schemes is
found in the sizes of the matrices associated with the different
chains; for a first-order chain the initial matrix is of size 1 × 8, and
each subsequent matrix is square, of size 8 × 8. For the second-order chain the initial matrix is a row matrix of size 1 × 8, the
second matrix is an 8 × 8 matrix, and each subsequent matrix is a
64 × 8 matrix. The size 64 × 8 is used since one symbol is read at
a time, so there are eight possible inputs, while the second-order
chain remembers two previous symbols, so there exist 64 such states.

IV. DOMAIN OF APPLICABILITY AND REMARKS

The classification model suggested for word-level recognition is
based on the assumption that the number of key words to be
classified is small. This limitation can be lifted by considering a
sequential classification scheme. Fig. 6 illustrates the scheme. The
key words would be sequentially subdivided using some readily
identifiable feature. An example of such a feature is length. Once
the division is finished, the word-level recognition model can be
used. In addition to subdividing a large dictionary into more
manageable divisions, this scheme can eliminate the problem of
distinguishing between two words with similar starting letters.
Using the word-level recognition model suggested in Section II,
the words class and classify may not be distinguishable. Recalling
that the model truncates longer words, the two words class
and classify may become totally indistinguishable if the "ify" in
classify is truncated. By using the sequential classification
scheme, words such as class and classify would fall in different
subdictionaries, thus eliminating the need to distinguish between
them.

The perfect recognition reported in the previous section was
based on classifying the training sample. Also, the key words were
presented in an isolated form to the recognizer. The problem faced
in this research was the collection of a large enough real-time
handwritten sample for nontraining sample classification. Segmentation of sentences into words is a major research topic that
should be an area of further investigation.
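The sequential classification scheme of Fig. 6 can be sketched as a pre-classification step. Partitioning by stroke count into fixed-width buckets is an illustrative assumption; the text names length only as one example of a readily identifiable feature.

```python
def build_subdictionaries(word_lengths, bucket_size=4):
    """Partition a dictionary of key words by a readily identifiable
    feature, here the typical stroke count, so that the word-level
    recognizer only distinguishes words within one subdictionary.

    word_lengths: dict mapping key word -> typical number of strokes.
    bucket_size:  width of each length bucket (an assumed parameter).
    """
    buckets = {}
    for word, n in word_lengths.items():
        buckets.setdefault(n // bucket_size, []).append(word)
    return buckets
```

With such a partition a short word and a much longer word sharing a prefix, like class and classify, land in different subdictionaries, so the truncation ambiguity discussed above does not arise.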
In the Introduction it was mentioned that voice processing
equipment is commercially available that recognizes isolated
words. Typical markets for such equipment are found in control
environments where voice commands may perform certain tasks.
A specific market for this voice equipment is found on shipping
docks, where voice commands route commodities to preselected
locations. In situations of high acoustical noise, or close proximity
of several system users, a voice input device may be undesirable.
In such applications or applications where excessive employee
talking is present, the proposed cursive script recognition system
should be viewed favorably.
A second application is found in processing handwritten computer programs. Consider Fortran: the key words to be
recognized are AND, BACKSPACE, COMMON, CALL, COMPLEX,
CONTINUE, DATA, DIMENSION, DO, DOUBLE PRECISION, END FILE,
END, EQUIVALENCE, EXTERNAL, FORMAT, FUNCTION, INTEGER, IF,
LOGICAL, NOT, OR, PAUSE, PRINT, PUNCH, READ, REAL, RETURN,
REWIND, STOP, SUBROUTINE, TRUE, and WRITE. A few additional
abbreviations are also allowed. This vocabulary should pose no
problem to recognition as cursive script. The syntax of Fortran
statements is well known, and recognition of the remainder of each statement can be performed using the syntactic information of Fortran.
The recognition model introduced in this paper is based on a
probabilistic classification scheme. In Farag et al. [6] distance
measures are used for verifying signatures written in real time. The
coding scheme used in [6] is identical to the coding scheme used in
this research. Similar distance measures are used by Fujimoto et
al. [11] for classification of numerals in a nonreal-time environ-
ment. Distance measures can, therefore, be investigated as alternative methods for the design of the recognition model.
The results presented in this paper illustrate that recognition of
cursive script can be done with a more modest approach. If the goal
of a researcher is to recognize English, then the approaches found
in the literature, i.e., recognizing words by discovering and recognizing their component letters, have to be followed. On the other
hand, if the application is concerned with a limited vocabulary,
then word-level recognition can be an effective and successful
technique.
ACKNOWLEDGMENT
The author would like to thank a reviewer for pointing out the
inefficiency of a computational technique previously used.
REFERENCES
[1] P. Mermelstein and M. Eden, "Experiments on computer recognition of connected handwritten words," Inform. Contr., vol. 7, pp. 255-270, 1964.
[2] R. W. Ehrich and K. J. Koehler, "Experiments in the contextual recognition of
cursive script," IEEE Trans. Comput., vol. C-24, pp. 182-194, Feb. 1975.
[3] K. M. Sayre, "Machine recognition of handwritten words: A project report,"
Pattern Recognition, vol. 5, pp. 213-228, Sept. 1973.
[4] L. S. Frishkopf and L. D. Harmon, "Machine reading of cursive script," in Proc.
4th London Symp. Information Theory, Butterworths, London, 1961.
[5] H. Freeman, "On the encoding of arbitrary geometric configurations," IRE
Trans. Electron. Comput., vol. EC-10, pp. 260-281, 1961.
[6] R. F. H. Farag and Y. T. Chien, "Online signature verification," in Proc. Int.
Conf. on Online Interactive Computing, pp. 403-424, Uxbridge, England, 1972.
[7] T. W. Anderson and L. A. Goodman, "Statistical inference about Markov
chains," Annals Math. Stat., vol. 28, pp. 89-110, 1957.
[8] W. H. Jermann, "Measurement of linearly dependent processes," Ph.D. dissertation, Univ. of Connecticut, Storrs, 1967.
[9] G. D. Forney, Jr., "The Viterbi algorithm," Proc. IEEE, vol. 61, pp. 268-278,
Mar. 1973.
[10] H. C. Lee and K. S. Fu, "Stochastic linguistics for picture recognition," Tech.
Rep. TR-EE72-17, Purdue Univ., Lafayette, IN, June 1972.
[11] Y. Fujimoto, S. Kadota, S. Hayashi, M. Yamamoto, S. Yajima, and M. Yasuda,
"Recognition of handprinted characters by nonlinear elastic matching," in Proc. 3rd Int.
Joint Conf. Pattern Recognition, Coronado, CA, Nov. 1976, pp. 113-118.