Grammatical error detection from English
utterances spoken by Japanese
Takuya Anzai∗ , Seongjun Hahm∗ , Akinori Ito∗ , Masashi Ito† and Shozo Makino‡
∗ Tohoku University, Sendai 980-8579 Miyagi Japan
E-mail: {takuya,branden65,aito}@makino.ecei.tohoku.ac.jp
† Tohoku Institute of Technology, Sendai 982-8577 Miyagi Japan
E-mail: [email protected]
‡ Tohoku Bunka Gakuen University, Sendai 981-0943 Miyagi Japan
E-mail: [email protected]
Abstract—This paper describes methods to recognize English utterances by Japanese learners as accurately as possible and to detect grammatical errors from the transcriptions of those utterances. The method is a building block for a voice-interactive Computer-Assisted Language Learning (CALL) system that enables a learner to practice conversation with a computer. A difficulty in developing such a system is that utterances made by learners contain grammatical mistakes, which an ordinary speech recognizer does not assume. To generate accurate transcriptions that include these grammatical mistakes, we employed an N-gram language model trained on generated texts. The text generation is based on grammatical error rules that reflect the tendencies of mistakes made by Japanese learners. The experimental results show that the proposed method improves recognition accuracy compared with the conventional recognition and error detection method.
I. INTRODUCTION
With the progress of globalization in recent years, the population of English learners has increased. Among the various English learning methods, Computer-Assisted Language Learning (CALL) systems are among the most promising. Most CALL systems focus on training for reading, writing and listening, and few commercial CALL systems provide training for speaking. Recently, CALL systems for speaking practice have been developed; however, these conventional systems focus on training pronunciation or intonation [1,2,3].
To improve conversation skills, learners need to practice conversation in a real dialogue. Therefore, several voice-interactive CALL systems for conversation practice have been developed [4,5].
One of the biggest problems in realizing such CALL systems is how to recognize learners' utterances correctly. In most cases, utterances spoken by a foreign-language learner contain pronunciation errors and grammatical mistakes. However, a voice-interactive CALL system must recognize a learner's utterance correctly, including its grammatical errors, in order to progress the dialogue smoothly and to give pertinent feedback. Therefore, to develop a CALL system for grammar or expression practice, we have to improve the recognition accuracy of utterances containing grammatical errors.
In this paper, we propose a method for recognizing learners' utterances and detecting grammatical errors. This work is an improvement of the recognition method based on text generation and an N-gram trained from the generated text [6]. We revised the rules for generating sentences with errors in order to improve recognition performance.
In developing the CALL system, we assume that learners first do a pre-exercise on the fundamental vocabulary and grammatical expressions used in a specific situation, and then converse with the system on those topics. The learner's utterance is recognized by the system using a speech recognizer. If the utterance contains any errors, the system points them out and demonstrates a correct utterance. In this way, learners progress through conversation practice [6].
Because learners already know the keywords from the pre-exercise before conversing with the system, we can expect them to respond to the system using the same expressions as those appearing in the pre-exercise [5]. We call a correct sentence expected to be uttered by learners "a target sentence". However, in reality, not all utterances made by a learner match the target sentences; a sentence actually uttered by a learner is referred to as "an uttered sentence". In addition, the results produced by the speech recognizer often differ from the uttered sentences because of recognition errors; we refer to a sentence obtained from the speech recognizer as "a recognized sentence".
II. RECOGNITION OF LEARNERS' UTTERANCES USING AN N-GRAM FROM GENERATED TEXT
In this section, we describe the approach by Ito et al. [6], which is our previous work. An N-gram LM is powerful and flexible with regard to unpredictable errors, but it requires a large amount of training data. To train an N-gram without a large amount of training data, we developed a method that trains the N-gram on artificial sentences generated from the target sentences. First, we prepare grammatical error rules for errors that are frequently made by Japanese learners. Then, the rules are applied to the target sentences, and a sufficient number of sentences is generated. Finally, a back-off N-gram is trained from the generated sentences.
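As a concrete illustration of this pipeline, the following minimal sketch applies a few hypothetical deletion rules to target sentences and collects bigram counts from the generated text. The rule set, the probabilities and the simple bigram counter are placeholders for illustration only; the paper does not specify the LM toolkit used to train the actual back-off N-gram.

import random
from collections import defaultdict

# Hypothetical error rules: word -> (erroneous replacement, probability).
# An empty replacement models a deletion error; the real rules are the
# corpus-based and generic rules described in Sections II and III.
ERROR_RULES = {"a": ("", 0.3), "the": ("", 0.3), "to": ("", 0.2)}

def generate_variant(target):
    """Apply the error rules word by word to one target sentence."""
    out = []
    for word in target.split():
        rule = ERROR_RULES.get(word)
        if rule and random.random() < rule[1]:
            if rule[0]:                 # substitution
                out.append(rule[0])
            # empty replacement: deletion, the word is dropped
        else:
            out.append(word)
    return " ".join(out)

def bigram_counts(sentences):
    """Collect bigram counts from the generated text; a real system would
    train a back-off N-gram with an LM toolkit from the same data."""
    counts = defaultdict(int)
    for s in sentences:
        words = ["<s>"] + s.split() + ["</s>"]
        for w1, w2 in zip(words, words[1:]):
            counts[(w1, w2)] += 1
    return counts

targets = ["i work at a car company", "i am an office worker"]
generated = [generate_variant(t) for t in targets for _ in range(1000)]
counts = bigram_counts(generated)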
TABLE I
GRAMMATICAL ERRORS IN THE SST CORPUS

Rank  Correct  Uttered  Type
1     a        ε        DEL
2     the      ε        DEL
3     ε        the      INS
4     a        the      SUB
5     to       ε        DEL
In this text generation method, two kinds of error rules are used: corpus-based error rules and generic error rules. The former are extracted from transcriptions of English utterances by native Japanese speakers. The SST (Standard Speaking Test) corpus was used as the source of these error rules [7]. The SST corpus consists of transcriptions of interviews with native Japanese speakers in English. The grammatical and lexical errors in this corpus are manually annotated with the correct expressions. Table I shows the errors most frequently observed in the corpus. The symbol ε denotes the absence of a corresponding word. The error types DEL, INS and SUB indicate a deletion error, an insertion error and a substitution error, respectively. The latter rules are manually created to cover a wider range of error types; they include insertion of articles, confusion between singular and plural, adjective comparative errors and verb tense errors.
In applying these two kinds of rules, the generic error rules were applied first, and the corpus-based rules were applied next. The probability of choosing a corpus-based rule is proportional to the frequency of the corresponding error in the SST corpus. In contrast, the probabilities of applying the generic error rules are given a priori. The final recognition performance was not greatly affected by the probabilities of the generic error rules.
In the corpus-based rules, only single-word substitution and deletion errors were used. However, errors spanning more than one word (e.g., correct: "office worker", uttered: "work man") and insertion errors also exist in the SST corpus. Therefore, error rules concerning multiple words should also be handled properly.
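The frequency-proportional choice of a corpus-based rule described above can be sketched as follows. The frequency table here contains hypothetical placeholder counts, not the actual SST statistics, and P_ERROR stands in for the output probability of error words used later in the experiments.

import random

# Hypothetical frequency table in the spirit of Table I: for a given correct
# word, each observed erroneous form is listed with a placeholder count.
CORPUS_ERRORS = {
    "a":   {"": 60, "the": 20},   # deletion, substitution by "the"
    "the": {"": 45},
    "to":  {"": 15},
}
P_ERROR = 0.2  # output probability of error words

def corrupt(word):
    """Replace `word` by an erroneous form with probability P_ERROR; among the
    candidate forms, each one is chosen proportionally to its frequency."""
    errors = CORPUS_ERRORS.get(word)
    if not errors or random.random() >= P_ERROR:
        return word
    forms, freqs = zip(*errors.items())
    return random.choices(forms, weights=freqs, k=1)[0]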
III. REVISION OF SENTENCE GENERATION RULES
A. Contraction rules
In the previous work, the generic error rules included contraction rules such as "I am → I'm" or "He's → He is". However, since a contraction is not a mistake, it is inappropriate to treat it as a grammatical error. Therefore, we generate sentences with and without contractions, and all of the generated sentences are used as new target sentences.
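A minimal sketch of this expansion is shown below; the contraction dictionary is an illustrative subset, not the full list used by the system.

# Each target sentence is expanded into variants with and without contractions,
# and every variant becomes a new target sentence (contractions are no longer
# treated as errors).
CONTRACTIONS = {"I am": "I'm", "he is": "he's", "do not": "don't"}

def contraction_variants(sentence):
    variants = {sentence}
    for full, short in CONTRACTIONS.items():
        variants |= {v.replace(full, short) for v in variants}
        variants |= {v.replace(short, full) for v in variants}
    return variants

print(sorted(contraction_variants("I am an office worker")))
# -> ['I am an office worker', "I'm an office worker"]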
B. Corpus-based rules
Ito et al. [6] did not use insertion errors as corpus-based error rules because contextual information is indispensable for applying such rules; instead, only the insertion of articles was covered by the generic error rules. In this paper, we address various insertion errors, including article insertions, by considering the POS (part of speech) of the words before and after the inserted word.
To extract the insertion error rules, it is necessary to determine the POS of every word that appears in the SST corpus.
TABLE II
INSERTION ERRORS IN THE SST CORPUS

Rank  Preceding POS                                 Insertion  Following POS
1     preposition                                   the        singular noun
2     base form verb                                the        singular noun
3     not 3rd person singular, present tense verb   the        singular noun
4     preposition or infinitive "to"                the        singular noun
5     not 3rd person singular, present tense verb   a          singular noun
TABLE III
MULTIPLE WORD SUBSTITUTIONS IN THE SST CORPUS

Rank  Correct        Uttered
1     like it        like
2     a lot of       lot of
2     a little bit   little bit
3     will have      have
3     see it         it
We estimated the POS tags of all words using Brill's tagger [8], and counted the insertion errors and the POS tags around the insertions. Table II shows the rules most frequently observed in the corpus. These results show that most of the insertion errors made by native Japanese speakers are insertions of an article before a noun; the others are conjunction or preposition insertion errors.
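The extraction of such POS-context rules could be sketched as follows. Here nltk.pos_tag (Penn Treebank tags) stands in for Brill's tagger, the aligned (correct, uttered) pair is a made-up example rather than SST data, and the naive alignment handles pure insertions only.

from collections import Counter
import nltk  # requires the 'averaged_perceptron_tagger' model data

def insertion_contexts(correct_tokens, uttered_tokens):
    """Yield (preceding POS, inserted word, following POS) for words that
    appear in the uttered sentence but not in the correct one."""
    tags = dict(nltk.pos_tag(correct_tokens))
    contexts = []
    i = 0
    for word in uttered_tokens:
        if i < len(correct_tokens) and word == correct_tokens[i]:
            i += 1                      # word matches the correct sentence
        else:                           # an inserted word
            prev_pos = tags[correct_tokens[i - 1]] if i > 0 else "<s>"
            next_pos = tags[correct_tokens[i]] if i < len(correct_tokens) else "</s>"
            contexts.append((prev_pos, word, next_pos))
    return contexts

counts = Counter()
counts.update(insertion_contexts("I go to school".split(),
                                 "I go to the school".split()))
# e.g. counts {('TO', 'the', 'NN'): 1}: "the" inserted between "to" and a noun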
Errors applied to more than one word were not used in the previous work. Table III shows the multiple-word substitutions most frequently observed in the corpus. These rules are also included in the revised error rules.
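A small sketch of applying such multiple-word substitution rules is given below: the correct phrase is replaced by the observed erroneous phrase, trying longer phrases first. A real generator would fire each rule only with some probability rather than deterministically.

MULTIWORD_RULES = {"a lot of": "lot of", "a little bit": "little bit", "like it": "like"}

def apply_multiword_rules(sentence):
    # longest-match first, so "a little bit" is tried before "a lot of"
    for correct, uttered in sorted(MULTIWORD_RULES.items(), key=lambda r: -len(r[0])):
        sentence = sentence.replace(correct, uttered)
    return sentence

print(apply_multiword_rules("I like it a lot of the time"))
# -> "I like lot of the time"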
C. Generic error rules
The generic error rules of the previous work included insertion of articles, contraction of words, singular/plural confusion of nouns, comparative forms of adjectives and misuse of verb tense. Among these, the article insertion and contraction rules are excluded from the generic error rules because they are now handled elsewhere, as described above. In addition, the following error rules concerning adjectives are added.
• The rules for generating "base form", "more + base form" (e.g. more good) and "more + comparative" (e.g. more better) from the comparative form of adjectives.
• The rules for generating "base form", "most + base form" (e.g. most good) and "most + superlative" (e.g. most biggest) from the superlative form of adjectives.
These errors were frequently observed in the SST corpus.
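The adjective rules above can be sketched as follows; the base/comparative/superlative triples are a hypothetical lookup table used only for illustration.

# From a comparative such as "better" the generator can emit "good",
# "more good" or "more better"; from a superlative such as "biggest" it can
# emit "big", "most big" or "most biggest".
ADJ_FORMS = {"better": ("good", "better"), "biggest": ("big", "biggest")}

def comparative_errors(comparative):
    base, comp = ADJ_FORMS[comparative]
    return [base, f"more {base}", f"more {comp}"]

def superlative_errors(superlative):
    base, sup = ADJ_FORMS[superlative]
    return [base, f"most {base}", f"most {sup}"]

print(comparative_errors("better"))   # ['good', 'more good', 'more better']
print(superlative_errors("biggest"))  # ['big', 'most big', 'most biggest']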
D. Thesaurus-based word substitution rules
If a learner utters a wrong word that did not appear in the SST corpus, it is impossible to recognize that word correctly because it is not included in the vocabulary of the recognizer. To mitigate this OOV (out-of-vocabulary) problem, we used WordNet [9] to reduce the number of OOV words by acquiring synonyms and hypernyms. We refer to these rules as the thesaurus-based word substitution rules.
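One way to gather such candidate substitutions, sketched under the assumption that NLTK's WordNet interface is used (the paper does not specify the access method), is shown below.

from nltk.corpus import wordnet as wn  # requires the WordNet corpus data

def related_words(word, pos=wn.NOUN):
    """Return WordNet synonyms and hypernyms of `word` as possible substitutions."""
    related = set()
    for synset in wn.synsets(word, pos=pos):
        related.update(l.name().replace("_", " ") for l in synset.lemmas())
        for hyper in synset.hypernyms():
            related.update(l.name().replace("_", " ") for l in hyper.lemmas())
    related.discard(word)
    return related

print(related_words("company"))  # a set of candidate substitution words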
E. Generation of training text using the error rules
Fig. 1 shows an overview of N-gram training using the error rules.
Fig. 1. Training of an N-gram from generated text
We first apply all of the error rules to the target sentences according to the output probability of error words. The error rules are applied word by word in the following order:
• Insertion errors considering the POS context
• Multiple-word substitution errors
• Corpus-based substitution and deletion errors
• Generic error rules or thesaurus-based word substitution rules
If a word remains unchanged after the corpus-based error rules are applied, the generic error rules or the thesaurus-based word substitution rules are applied according to the probability of applying the thesaurus-based word substitution rules.
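The following sketch illustrates this generation step under hypothetical rule tables and POS tags; the probabilities, rule contents and tagging are placeholders and not the actual system configuration.

import random

P_ERROR = 0.2          # output probability of error words (cf. Table V)
P_THESAURUS = 0.4      # probability of choosing a thesaurus rule over a generic one

INSERTION_RULES = {("IN", "NN"): "the"}                 # (prev POS, next POS) -> inserted word
MULTIWORD_RULES = {("a", "lot", "of"): ("lot", "of")}
CORPUS_RULES = {"a": [""], "the": [""]}                 # substitution/deletion
GENERIC_RULES = {"work": ["works", "worked"]}
THESAURUS_RULES = {"company": ["firm", "business"]}

def generate(words, tags):
    out, i = [], 0
    while i < len(words):
        # 1) insertion errors conditioned on the surrounding POS tags
        if i > 0 and (tags[i - 1], tags[i]) in INSERTION_RULES and random.random() < P_ERROR:
            out.append(INSERTION_RULES[(tags[i - 1], tags[i])])
        # 2) multiple-word substitution errors
        for phrase, repl in MULTIWORD_RULES.items():
            if tuple(words[i:i + len(phrase)]) == phrase and random.random() < P_ERROR:
                out.extend(repl)
                i += len(phrase)
                break
        else:
            w = words[i]
            # 3) corpus-based substitution/deletion errors
            if w in CORPUS_RULES and random.random() < P_ERROR:
                out.append(random.choice(CORPUS_RULES[w]))
            # 4) word still unchanged: try a generic or thesaurus-based rule,
            #    picking the thesaurus rules with probability P_THESAURUS
            elif random.random() < P_ERROR:
                table = THESAURUS_RULES if random.random() < P_THESAURUS else GENERIC_RULES
                out.append(random.choice(table.get(w, [w])))
            else:
                out.append(w)
            i += 1
    # empty strings produced by deletion rules are dropped here
    return " ".join(t for t in out if t)

words = "I work at a car company".split()
tags = ["PRP", "VBP", "IN", "DT", "NN", "NN"]   # hypothetical POS tags
print(generate(words, tags))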
TABLE V
EXPERIMENTAL CONDITIONS

Acoustic model                       512-mixture 5-state HMM
Acoustic feature                     MFCC, ∆MFCC, ∆∆MFCC, ∆pow, ∆∆pow
Decoder                              Julius 4.1.2
Output probability of error words    0.2
Number of generated texts            100,000 / target sentence
TABLE IV
EXAMPLE OF UTTERED SENTENCES

Uttered sentences                                    Freq.
I'm an office worker. I work at a car company.       6
I'm an office worker. I worked at a car company.     2
I'm an office worker. I work at car company.         1
I'm a worker. I work at a car company.               1
I'm a office worker. I work at a car company.        1
I'm a office worker. I worked at an car company.     1
IV. EXPERIMENT
A. Data collection
To carry out an experiment under conditions similar to those of the assumed CALL system, we collected utterances from native Japanese speakers as follows.
1) English sentences (the target sentences) containing keywords were prepared.
2) Learners were asked to memorize the target sentences. No time limit was set for the memorization task.
3) Japanese translations of the target sentences were presented to the learners, and the learners were asked to utter English translations of those sentences. Prompts for the utterances were given using a synthesized voice. In addition to the memorized sentences, the learners were also asked to translate Japanese sentences that were not included in the previously presented target sentences.
Table IV shows an example of uttered sentences corresponding to the target sentence "I'm an office worker. I work at a car company.".
B. Effect of the contraction rules
We carried out an experiment to examine the effect of the revision of the contraction rules. In this experiment, we used two sets of target sentences: one contained the original target sentences, and the other included the sentences generated by the contraction rules. The system performance was measured using the word accuracy of the recognized sentences. The test data consisted of 126 sentences spoken by 15 students, each of which included a contraction or a contractible expression. Table V shows the experimental conditions.
Word accuracy using the sentences generated by the proposed contraction rules was 84.88%, while that of the conventional method was 83.46%. The performance of the proposed method, which used multiple target sentences, was better than that of the method in which the contraction rules were included in the generic error rules.
C. Effect of revision of error rules
Next, we investigated the effect of revising the other types of error rules: the corpus-based rules, the generic rules and the thesaurus-based rules.
According to the experiment, word accuracy was not greatly affected by the revision of the corpus-based and generic rules. However, a decrease of 6 OOV words was observed when applying the revised corpus-based rules, owing to the insertion rules and the multiple-word substitution rules.
Next, we examined the effect of the thesaurus-based word substitution rules on word accuracy and the number of OOV words. The test data included 441 sentences spoken by 15 students, and the other experimental conditions were the same as those shown in Table V. Applying the thesaurus-based word substitution decreased word accuracy by 0.2 points; however, the number of OOV words decreased by 4.
D. Comparison of total performance
Next, we compared the conventional error rules with the revised error rules. The performance was compared using word accuracy as well as the recall and precision of grammatical error detection.
Fig. 2. Word accuracy of the methods with respect to error probability (conventional vs. revised rules)
Fig. 3. Number of distinct sentences with respect to error probability (conventional vs. revised rules)
The probability of applying the thesaurus-based word substitution rules was set to 0.4 when we used the revised rules. The experimental results are shown in Fig. 2, which plots the word accuracy with respect to the output probability of error words. The results show that the best performance was obtained when the revised rules were applied and the output probability of error words was 0.05. From the result shown in Fig. 2, we find that the optimum probability for applying the error rules differs between the rule sets: the optimum probability for the revised rules was smaller than that for the conventional rules. This difference seems to be caused by the fact that the proposed method employs a larger variety of error rules than the previous method. Fig. 3 shows the number of distinct sentences in the generated text. Because the variation of the sentences generated by the conventional rules was narrower than that generated by the revised rules, more errors had to be generated to obtain a sufficient number of distinct sentences when the conventional rules were used. When the output probability of error words was 0.05, the number of distinct sentences produced by the revised rules was about 200 larger than that produced by the conventional ones. However, we observed a deterioration of word accuracy, which seems to be caused by an excessively large vocabulary.
Next, we evaluated the same results in terms of the recall and precision of error detection. A detection result was regarded as a misdetection when any of the following was wrong: the word position, the recognized word, or the type of error. The relationship between the recall and precision rates is shown in Fig. 4. The results show that the revised rules produced better performance than the conventional rules.
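The misdetection criterion above can be made concrete with the following sketch, in which errors are represented as (word position, word, error type) tuples; the reference and detected sets are hypothetical examples.

def detection_scores(reference, detected):
    """A detection counts as correct only if position, word and error type
    all match a reference annotation."""
    ref, det = set(reference), set(detected)
    hits = len(ref & det)
    precision = hits / len(det) if det else 0.0
    recall = hits / len(ref) if ref else 0.0
    return precision, recall

reference = {(3, "", "DEL"), (5, "the", "INS")}
detected = {(3, "", "DEL"), (6, "a", "INS")}
print(detection_scores(reference, detected))  # (0.5, 0.5)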
Fig. 4. Recall and precision results (precision ratio vs. recall ratio, conventional and revised rules)
V. CONCLUSIONS
We have developed a method to recognize English utterances spoken by Japanese learners for the development of a voice-interactive CALL system. We prepared target sentences with and without contractions. We then applied the revised error rules, which include insertion errors and rules applied to multiple words. The text generated using the proposed method improved the recognition performance and the recall/precision of error detection. Although we could improve the performance of recognition and error detection, the absolute performance of the method is still too low for application in an actual CALL system. We need to further improve the performance of the system by using more corpus data and improving the error rules.

ACKNOWLEDGMENT
The present study was supported in part by a JSPS Grant-in-Aid for Scientific Research (B) 20320075.
REFERENCES
[1] M. Eskenazi, "Detection of foreign speakers' pronunciation errors for second language training - preliminary results", Proc. ICSLP, Philadelphia, pp. 1465-1468, 1996.
[2] S. Nakagawa, N. Nakamura and K. Mori, "A statistical method of evaluating pronunciation proficiency for English words spoken by Japanese", IEICE Trans. Inf. and Syst., E87-D(7), pp. 1917-1922, 2004.
[3] A. Ito, Y. Lim, M. Suzuki and S. Makino, "Pronunciation error detection for computer-assisted language learning system based on error rule clustering using a decision tree", Acoust. Sci. and Tech., 28(2), pp. 131-133, 2007.
[4] M. Eskenazi, "Using automatic speech processing for foreign language pronunciation tutoring: some issues and prototype", Language Learning and Technology, 2(2), pp. 62-76, 1999.
[5] O. P. Kweon, A. Ito, M. Suzuki and S. Makino, "A grammatical error detection method for dialogue-based CALL system", Journal of Natural Language Processing, 12(4), pp. 137-156, 2005.
[6] A. Ito, R. Tsutsui, M. Ito and S. Makino, "Recognition of English utterances with grammatical and lexical mistakes for dialogue-based CALL system", Proc. Interspeech, pp. 2819-2822, 2008.
[7] Y. Tono, T. Kaneko, H. Isahara, T. Saiga and E. Izumi, "A 1 million-word spoken corpus of Japanese learners of English and its implications for L2 lexicography", Proc. 2nd Asialex International Congress, pp. 257-262, 2001.
[8] E. Brill, "A simple rule-based part of speech tagger", Proc. ANLP-92, 3rd Conf. on Applied Natural Language Processing, pp. 152-155, 1992.
[9] Princeton University, "WordNet", http://wordnet.princeton.edu/