Design and Implementation of a Spell Checker for Assamese
Monisha Das
S. Borgohain
Juli Gogoi
Resource Centre for Indian Language Technology Solutions,
Indian Institute of Technology Guwahati
Email: {moni, samir, juli}@iitg.ernet.in
S. B. Nair
Dept. Computer Science and Engineering
Indian Institute of Technology Guwahati
North Guwahati – 781 039 Assam (India)
Email: [email protected]
Abstract
Spell Checkers form a vital ingredient of text
processors, character recognition systems, dictionary
search engines, language processing software and similar
tools. Though considerable work has been done in the
area for English and related
languages, the Indian
language scenario presents a relatively more complex and
uphill task. This paper describes strategies involved in the
implementation of a Spell Checker for Assamese, the
official language of the North Eastern Indian State of
Assam.
1. Introduction
Spell Checkers have become a vital ingredient of
high technology applications such as speech recognition
and generation, character or text recognition systems and
pen-based interfaces. Investigated as early as the
1960s, computer-aided spell checking techniques continue
to be one of the challenging areas in information
processing.
Though considerable work has already been carried
out in this field, especially in English and European
languages [1,2], the task of creating Spell Checkers
remains a massive exercise when we switch to a new
language. Building a Spell Checker calls for an exhaustive
study of several aspects of the language in question – the
construction of a morphological analyzer, a detailed
dictionary, rules for resolving inflexions and other
language dependent anomalies, to name a few. This
requires the intrusion of linguists into the domain of
Proceedings of the Language Engineering Conference (LEC’02)
0-7695-1885-0/02 $17.00 © 2002 IEEE
Computer Science. A large number of Spell Checker
programs [3,4,5] are currently available for a wide variety
of languages.
With a view to standardizing and building a uniform
environment for information processing in Indian
Languages, the Ministry of Information Technology of the
Government of India has set up several Resource Centres
nationwide. This paper describes the investigations
carried out by one such Centre to implement and deploy a
Spell Checker for Assamese. A multilingual computer
system, which permits users to interact with computers in
Assamese, will have far reaching consequences in a state
like Assam, where English is not spoken or understood by
the vast majority of people. The Spell Checker program for
Assamese aims at being a component of such a text and
natural language processing application.
2. Overview of spell checking applications
and spell checking techniques
A typical spell checking application presents a list of
alternatives for each misspelled word encountered in a
document. The user in turn either selects one of these
words, or decides to retain and treat the current word as a
valid one. Some checkers also allow the user to add words
to the lexicon of correct words thereby enhancing the
vocabulary. Spell checking techniques can be broadly
classified into three categories.
2.1. Non-word error detection
This involves detection of non-words, meaning words
not present in the lexicon of valid words, or misspellings.
The most commonly used techniques to detect such
errors are N-gram analysis and dictionary look-up.
The latter employs efficient dictionary
lookup/pattern-matching algorithms (such as hashing
techniques, tries, finite state automata, frequency ordered
binary search trees, etc.), dictionary partitioning schemes
and morphological processing techniques, whereas the
former often makes use of frequency counts or
probabilities of occurrence of N-grams in a large corpus of
text.
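As a rough illustration of N-gram analysis for non-word detection, the sketch below collects character bigram counts from a list of valid words and flags any word containing a bigram that was never observed. The function names, the `^`/`$` boundary markers and the `min_count` threshold are assumptions made for this example, not details taken from the system described in this paper.

```python
from collections import Counter

def bigrams(word):
    """Return character bigrams, with ^ and $ marking the word boundaries."""
    padded = "^" + word + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]

def train_bigram_counts(valid_words):
    """Count how often each bigram occurs in a corpus of valid words."""
    counts = Counter()
    for word in valid_words:
        counts.update(bigrams(word))
    return counts

def has_unseen_bigram(word, counts, min_count=1):
    """Flag a word if any of its bigrams occurs fewer than min_count times."""
    # Counter returns 0 for bigrams that were never observed.
    return any(counts[bg] < min_count for bg in bigrams(word))

counts = train_bigram_counts(["spell", "checker", "speech"])
print(has_unseen_bigram("spelq", counts))  # contains the unseen bigram "lq"
print(has_unseen_bigram("chech", counts))  # every bigram was seen, so not flagged
```

The second call shows the known weakness of the approach: a non-word built entirely from familiar bigrams passes unnoticed, which is one reason dictionary look-up is usually preferred when a lexicon is available.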
2.2. Isolated-word error correction
In this category, correction, usually in the form of
suggestion generation, is performed without taking into
account the textual or linguistic context in which the
words appear. Minimum edit distance techniques [6,7],
the Similarity key technique [8], Rule-based methods, N-gram
techniques [9], and Probabilistic and Neural Net techniques fall
into this category.
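Of the techniques listed above, minimum edit distance is the easiest to show in a few lines. The sketch below computes the restricted Damerau-Levenshtein (optimal string alignment) distance, counting insertions, deletions, substitutions and transpositions of adjacent characters. It is a generic textbook formulation with a hypothetical function name, not code from the system described here.

```python
def osa_distance(a, b):
    """Minimum number of single-character edits turning a into b.

    Allowed edits: insertion, deletion, substitution, and transposition
    of two adjacent characters (optimal string alignment variant).
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]

print(osa_distance("speling", "spelling"))  # 1 (one insertion)
print(osa_distance("teh", "the"))           # 1 (one transposition)
```

For Assamese text the same function operates on Unicode code points, so a matra or a hasanta counts as one character; whether that granularity is appropriate is a design decision this sketch leaves open.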
2.3. Context-dependent word correction
This takes care of real-word errors (errors that
result in another valid word) and non-word errors, which
have more than one potential correction. Traditional
Natural Language Processing (NLP) [10,11] and
Statistical Language Modeling (SLM) form the two main
approaches being explored.
3. Spell Checkers - The Indian Language
Context
Investigating techniques for building Spell
Checkers for European languages, especially those that
are written in the Roman script, may now be looked upon
as a redundant task. The availability of virtually free
source code for several routine spell checking chores [12]
has proved to be a boon in the rapid development of such
software. The Indian language scenario is, however, far
different from that of its foreign counterparts. While the
techniques used by Spell Checkers for English may be readily
applicable, their realization still requires much more effort.
Linguistic information is an understandable prerequisite
but the use of different fonts, conjuncts or juktaksharas –
the resultant of combining more than one character to
result in a different glyph, and the manner in which the
content is represented and stored make the problem of
evolving an integrated Spell Checker for Indian languages
a very complicated issue. A sizeable amount of work has
been carried out on realizing tools for text processing in
Indian languages of which Spell Checkers are an integral
part. I-Leap Office [13] from C-DAC is one such
environment that features text processing in various Indian
languages with spell checking for Gujarati, Hindi, Marathi
and Telugu. Isolated language Spell Checkers have,
however, evolved through efforts from both Linguists and
Computer Scientists. Indica [14], software that has made
DTP facilities available in Indian languages on Apple Macintosh and
Windows, incorporates a spell checking program. Picatype
offers a Spell Checker for Hindi and Marathi that can be
used with Pagemaker and Microsoft Word. It has 50,000
Hindi words and 40,000 Marathi words embedded with
language specific rules and allows the user to create his or
her own dictionaries. Akruti Office (X Plus Series) is a
package that supports Hindi, Marathi, Gujarati, Tamil,
Telugu, Kannada, Malayalam, Oriya, Bengali and
Gurmukhi and has a Spell Check facility. Several others
are either in the process of development or in the
prototype-testing phase.
3.1. Assamese – The Script and the Language
Basically an Indo-Aryan language, Assamese [15]
has derived its phonetic character set and behavior from
Sanskrit. It is written using the Assamese script, which is a
derivative of Bangla. Assamese is written from left to right
and top to bottom, in the same manner as English. A large
number of ligatures are possible since potentially all the
consonants can combine with one another. Vowels can
either be independent or dependent upon a consonant or a
consonant cluster. The Assamese alphabet has consonant
letters, independent vowel letters, dependent vowel signs
(matras), punctuation and numerals. The Assamese
alphabet is almost identical to the Bengali alphabet except
for the letter ৰ [ra] in Assamese, which is used in place
of র [ra] in Bengali, and the letter ৱ [wa], which is
used only in Assamese.
3.2. Grammatical features
(1) Personal markers used in various kinship terms,
connoting the age and rank of both the speaker and the listener,
form a unique feature of the languages spoken in North-Eastern
India such as Assamese, Bodo, Karbi, and Mising.
For instance, in the Assamese phrase তোমাৰ দেউতাৰা
[tumar dɛutara] (phonetic pronunciation; see footnote i),
meaning 'your father', ra is the personal
deictic or marker.
[i] The information following Assamese words or letters in this paper
comprises the phonetic pronunciation expressed in the International
Phonetic Alphabet (http://www.arts.gla.ac.uk/IPA), depicted using the
SILDoulos IPA93 font.
(2) The process of negation of verbs in Assamese is
another feature that clearly demarcates it from the rest
of its sisters among the New Indo-Aryan languages, and from
the Dravidian languages. In Assamese, ন [n] is attached to the
verb, followed by a vowel that is the exact copy of the
vowel of the first syllable of the verb, as in
নালাগে [nalage] 'do not want' (1st, 2nd, 3rd person)
নিলিখোঁ [nilikhʊ] 'will not write' (1st person)
The various negative markers in Assamese are:
না [na], নি [ni], নু [nu], নে [nɛ] and ন [nɔ].
(3) The use of plural suffixes is another feature of
Assamese. For instance, bound forms such as
হঁত [hɔt], বোৰ [bur], বিলাক [bilak], মখা [mokha],
জাক [zak] and সকল [xɔkɔl]
denote plurality and are suffixed to a noun or a pronoun.
(4) The extensive use of classifiers is another feature of
Assamese. For almost everything or every shape the
language uses a different classifier. Table 1 lists a few
such classifiers.
(5) The classifiers are also combined with all types of
nouns and numerals occurring in the language, resulting in
the following types of grammatical constructions:
এ [ɛ] + জন [zɔn] + মানুহ [manuh] (numeral + classifier + noun)
মানুহ [manuh] + এ [ɛ] + জন [zɔn] (noun + numeral + classifier)
Table 1. Some classifiers in Assamese.
Classifier | Follows
জন [zɔn] | A definite noun (masculine gender)
জনী [zɔni], গৰাকী [gɔraki] | A definite noun (feminine gender)
টো [tʊ], টা [ta], টি [ti], খন [khɔn] | A common noun, usually an object
3.3. Morphological Analysis in the Indian Context
Indian languages, like many other languages of the
world, have a relatively free word order. They also have a
rich system of case endings and post-positions
(collectively called vibhakti). The majority of grammar
frameworks are designed for English and other positional
languages. As far as morphological processing of Indian
languages is concerned, the team at the Department of
Computer Science and Engineering, Indian Institute of
Technology Kanpur has already initiated some work. They
have adopted the Paninian Grammar [16] approach for
morphological processing of languages such as Hindi,
Telugu, Kannada, Marathi, Bengali and Punjabi.
3.4. Morphological Analysis of Assamese used for
spell checking
A system of suffix stripping has been used for the
development of a Morphological Analyser for Assamese.
It is based on two kinds of knowledge. The dictionary
contains knowledge either about a stem or a word chosen
to be the reference form. The dictionary also contains
information about the syntactic category and the
grammatical features of the word. All the possible
inflections that can occur are added to the knowledge base.
The algorithm for stripping is more complex than that for a
Finite State Automaton. At the top level, it consists of
finding a rule to be triggered, matching the suffix given in
the rule with the suffix of the input word, and then
substituting the new suffix to give a base word to look up
in the dictionary. The same technique is again applied to
the prefixes to derive the root word.
For example, for the word ছোৱালীবোৰ
[sʊalibʊr], meaning 'the girls', the morphological
analyzer returns the root word ছোৱালী [sʊali], meaning
'girl'. Similarly, for the word ব'লা [bola], meaning 'let us
go', the root word ব'ল [bol], meaning 'come with me', is
returned, and for the word লগত [lɔgɔt], meaning
'accompanying', the root word লগ [lɔg], meaning
'accompany' or 'join', is returned.
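The rule-triggering loop described above can be sketched as follows. The rule table and the three-word lexicon are hypothetical and cover only the examples just given; the actual analyser's rules, its prefix handling and its dictionary format are not specified in this paper, so this is an illustration of the general suffix-stripping idea rather than the authors' implementation.

```python
import unicodedata

def nfc(text):
    """Normalize so that visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)

# Hypothetical (suffix to strip, replacement) rules, for illustration only.
SUFFIX_RULES = [(nfc(s), nfc(r)) for s, r in [
    ("বিলাক", ""),  # plural marker
    ("বোৰ", ""),    # plural marker
    ("া", ""),       # verbal ending, as in ব'লা -> ব'ল
    ("ত", ""),       # case ending, as in লগত -> লগ
]]

# Hypothetical lexicon of root words taken from the examples above.
LEXICON = {nfc(w) for w in ["ছোৱালী", "ব'ল", "লগ"]}

def strip_suffix(word, rules=SUFFIX_RULES, lexicon=LEXICON):
    """Return (root, stripped_suffix) for the first rule that yields a
    dictionary word, or None if no rule applies."""
    word = nfc(word)
    # Try longer suffixes first so a long rule is not shadowed by a short one.
    for suffix, replacement in sorted(rules, key=lambda r: -len(r[0])):
        if word.endswith(suffix) and len(word) > len(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            if candidate in lexicon:
                return candidate, suffix
    return None

print(strip_suffix("ছোৱালীবোৰ"))
print(strip_suffix("লগত"))
```

Checking each candidate against the lexicon before accepting it keeps a short rule such as the final one from stripping a letter that is really part of the root.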
3.5. Soundex encoding scheme for Assamese
The Soundex method [17,18] of suggestion
generation for Isolated-word error correction falls in the
category of Similarity key techniques. The basic idea here
is to map every word into a key so that similarly spelled
words will have identical or similar keys. Thus, when a
key is computed for a misspelled string it will provide a
link to all similarly spelled words (candidates) in the
dictionary.
The advantage of this method is that the search speed
is improved since it is not necessary to directly compare
the misspelled word with every other word in the
dictionary. However, complexities emerge in the encoding
scheme for Assamese, where matras and conjunct letters
have to be accounted for, rather than just vowels and
consonants, as in English.
A Soundex encoding scheme for Assamese has been
used which has a set of encoding rules, and up to 14
numerical codes, one code each for a particular group of
Assamese letters. This scheme has been evolved along the
lines of the Soundex encoding scheme for English, with
special application to the complexities of Assamese. This
scheme accounts for all Assamese characters, including
juktaksharas (conjunct letters) and matras. Further sub-categorization
of the juktaksharas into more groups with
code numbers of up to 23 has also been worked out,
though not used as yet.
One of the Soundex encoding rules that have been
used is as follows: The first letter of the word is retained
as the first letter of the Soundex code. If a matra is
attached to the first letter, it is also retained. Matras at all
later positions are discarded.
For example, for the word ছোৱালীবোৰ
[sʊalibʊr], meaning 'the girls', the letter ছ [s], along
with the matra ো [ʊ], is retained as the first portion
of the Soundex code: ছো [sʊ]. A few Assamese words,
along with their Soundex codes, are listed in Table 2.
3.6. Spell checking strategy for Assamese
An entire document containing Assamese text with
possible misspellings is first tokenized, and each token or
word is checked for spelling errors. If an incorrect spelling
is found, suggestions for rectifying it are generated. The
modules that perform the above tasks are:
(1) Error detection: For non-word error detection a
dictionary based approach has been used, wherein the
whole word is checked for its presence in the dictionary.
The word is simultaneously forwarded to a morphological
analyzer for morphological processing. If the word is
found in the dictionary, the word is assumed to be correct,
and the output of the morphological analyser is ignored.
(2) Suggestion generation: If the word is not found in the
dictionary, the following Isolated-word error correction
techniques are used and a list of suggestions arrived at:
a) Morphological Analysis: The morphological analyzer
processes the word and delivers the root word, along with
a list of possible valid suffixes and prefixes. Out of these
affixes, the ones that closely match the affixes of the
misspelled word are selected, and by attaching these affixes
to the root word a list of suggestions is arrived at.
b) Soundex method: The Soundex code of the misspelled
word is generated, according to the coding scheme
devised.
An approximate match is performed with the Soundex
codes of all valid words present in the dictionary. A list
of words whose Soundex codes closely
match that of the misspelled word is built.
c) Edit-distance: By applying the four editing operations
that commonly generate typographic errors, i.e. addition,
deletion, substitution and transposition of letters, another
list of suggestions is arrived at.
Using the minimum edit distance technique, the candidate
suggestions thus obtained are ranked.
4. Implementation details
For detection of misspellings, a dictionary-based
method has been used. The dictionary holds about 5000
Assamese words, and hashing provides
quick access to the data stored in it.
Tests have been conducted on sample input files
containing Assamese text with misspellings, and response
times have been found to be reasonably good for a
document with 1000 words. For suggestion generation the
Soundex code of the misspelled word is generated, and an
approximate matching is carried out with codes of the
valid words stored in the dictionary. Error detection and
correction methods have been implemented using Perl. A
detailed description of each of the modules involved in
detection and correction follows.
Table 2. Simple words and their soundex codes.
Word | Soundex code
মৰম [mɔrɔm], meaning affection | ম65
বিদ্যমান [bidyaman], meaning alive | বি355
ফঁহিয়ালেহেঁতেন [phɔhialɛhɛtɛn], meaning would have analysed | ফ794735
4.1. Non-word detection module
The dictionary-based non-word detection module
reads an entire document containing Assamese text with
misspellings, compares each word in the document with
the words present in the dictionary (lexicon of valid words),
detects words not present in the dictionary, i.e.
misspellings, and creates an output text document.
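A minimal sketch of such a detection pass is given below. It assumes Unicode-encoded input, treats any run of characters from the Bengali-Assamese Unicode block (plus the apostrophe) as a token, and uses a Python set, which is hash-based, as the lexicon. The regular expression, the function names and the tiny lexicon are assumptions for illustration; the module described in this paper was written in Perl and its tokenization rules are not given.

```python
import re
import unicodedata

# Runs of Bengali-Assamese block characters, plus the apostrophe used in
# words such as ব'ল. The danda (U+0964) lies outside this range, so it
# acts as a separator. Digits of the block are also matched.
WORD_RE = re.compile(r"[\u0980-\u09ff']+")

def nfc(text):
    """Normalize so that visually identical strings compare equal."""
    return unicodedata.normalize("NFC", text)

def tokenize(text):
    """Split running text into word tokens."""
    return WORD_RE.findall(nfc(text))

def find_nonwords(text, lexicon):
    """Return the tokens that are absent from the lexicon, in text order."""
    valid = {nfc(word) for word in lexicon}  # hash-based membership test
    return [token for token in tokenize(text) if token not in valid]

# Hypothetical three-word lexicon; the last token of the sentence is a
# misspelling of সুধিলোঁ.
lexicon = ["মই", "কথাটো", "সুধিলোঁ"]
print(find_nonwords("মই কথাটো সুধলোঁ ।", lexicon))
```

Normalizing both the text and the lexicon to the same Unicode form matters here, because some Assamese letters and vowel signs have more than one code point sequence.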
4.2. Soundex code generation module
The output of the non-word detection module
(misspelled Assamese word) is fed to the Soundex code
generation module, which computes the Soundex code of
the misspelled word according to the encoding rules
worked out. For example, for the misspelling অকলশৰিয়া
(correct word: অকলশৰীয়া [ɔkɔlɔxria], meaning alone),
the code অ24269 is generated.
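The sketch below illustrates how such a code could be computed for Unicode text. The full 14-group table is not reproduced in this paper, so the `CODES` mapping is a partial table inferred only from the worked examples (Tables 2 and 3 and the example above); treating each conjunct as a single unit, dropping letters missing from the table, and all function names are assumptions of this sketch rather than the authors' rules.

```python
import unicodedata

NUKTA = "\u09bc"   # dot that distinguishes য় from য
VIRAMA = "\u09cd"  # hasanta, joins consonants into a conjunct (juktakshara)

def nfc(text):
    return unicodedata.normalize("NFC", text)

def is_matra(ch):
    """Dependent vowel signs of the Bengali-Assamese block."""
    return "\u09be" <= ch <= "\u09cc" or ch == "\u09d7"

def is_modifier(ch):
    """Chandrabindu, anusvara and visarga."""
    return "\u0981" <= ch <= "\u0983"

# Partial table inferred from the paper's examples only (an assumption).
CODES = {nfc(k): v for k, v in {
    "ব": "1",
    "ক": "2", "শ": "2", "ছ": "2",
    "দ": "3", "ধ": "3", "ত": "3", "ন্দ": "3", "দ্য": "3",
    "ল": "4",
    "ম": "5", "ন": "5",
    "ৰ": "6",
    "হ": "7",
    "\u09af" + NUKTA: "9",  # য়
}.items()}

def split_units(word):
    """Split a word into (base, signs) units; a base may be a conjunct."""
    text, units, i = nfc(word), [], 0
    while i < len(text):
        base, i = text[i], i + 1
        # Absorb a nukta, or a virama plus the consonant it joins.
        while i < len(text) and text[i] in (NUKTA, VIRAMA):
            step = 2 if text[i] == VIRAMA and i + 1 < len(text) else 1
            base += text[i:i + step]
            i += step
        signs = ""
        while i < len(text) and (is_matra(text[i]) or is_modifier(text[i])):
            signs += text[i]
            i += 1
        units.append((base, signs))
    return units

def soundex_key(word):
    """First letter with its matra, then one digit per later coded letter."""
    units = split_units(word)
    if not units:
        return ""
    first_base, first_signs = units[0]
    key = first_base + "".join(ch for ch in first_signs if is_matra(ch))
    for base, _signs in units[1:]:
        key += CODES.get(base, "")  # matras and unlisted letters add nothing
    return key

print(soundex_key("মৰম"))
print(soundex_key("অকলশৰিয়া"))
```

With this partial table the sketch reproduces the codes quoted in the paper for these words, but words containing letters or conjuncts outside the table would need the complete scheme.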
4.3. Isolated-word error correction module
The isolated-word error correction module reads the
Soundex code for the misspelled word generated by the
above module and performs an approximate match to find
similar Soundex codes in the dictionary (each Assamese
word, along with its Soundex code, is stored in the
dictionary). The valid words corresponding to those
similar Soundex codes are returned as suggested
corrections. Hashing has been employed for fast access to
codes stored in the dictionary. The minimum edit distance
method is then employed for ranking the suggested
corrections. For example, consider the Assamese text in
Figure 1 that contains some misspelled words. The
detection module detects the misspelled words, while the
Soundex code generation module generates the appropriate
codes that aid in preparing a list of suggestions. The output
is summarized in Table 3. The suggestion generation
routine uses the Soundex codes to generate correct
alternatives for the misspelled words. Table 4 depicts the
suggestions generated. It may be noted that the correct
words also feature in these lists of suggested corrections.
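The retrieval and ranking steps can be sketched as follows. The index is a Python dictionary (a hash table) from Soundex code to valid words, populated here with two rows taken from Table 4. How "approximate" code matching is defined is not stated in this paper, so the fallback to codes within a small edit distance, the parameter names and the use of plain Levenshtein distance for ranking are assumptions of this sketch.

```python
import unicodedata

def nfc(text):
    return unicodedata.normalize("NFC", text)

def levenshtein(a, b):
    """Edit distance with insertion, deletion and substitution."""
    a, b = nfc(a), nfc(b)
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

# Soundex code -> valid words sharing that code (two rows from Table 4).
INDEX = {
    nfc("সু34"): ["সুধিলোঁ", "সুধিলা", "সুধিলি", "সুধিলে"],
    nfc("কা31"): ["কান্দিব", "কান্দিবা", "কান্দিবি"],
}

def suggest(misspelled, code, index=INDEX, max_code_distance=1, limit=5):
    """Collect words with the same or a nearby code, closest spelling first."""
    candidates = list(index.get(nfc(code), []))  # exact hit via hashing
    if not candidates:
        # Assumed fallback: scan for codes within a small edit distance.
        for stored_code, words in index.items():
            if levenshtein(code, stored_code) <= max_code_distance:
                candidates.extend(words)
    # sorted() is stable, so ties keep their dictionary order.
    ranked = sorted(candidates, key=lambda w: levenshtein(misspelled, w))
    return ranked[:limit]

print(suggest("সুধলোঁ", "সু34"))
```

For the misspelling in the example, the correct word differs by a single inserted matra and is therefore ranked first, consistent with the observation above that the correct words feature in the suggestion lists.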
5. Conclusion and Future directions
Strategies for developing a Spell Checker for
Assamese have been discussed. The checker initially goes
for a dictionary look up. A suggestion generator provides
alternatives in case the look up fails to find the word in the
dictionary. Suggestion generation uses three techniques
and ranks the suggestions before presenting them to the
user. Studies on text documents with more than 5000
words including juktaksharas have revealed the
satisfactory performance of the Spell Checker. Integration
of the Spell Checker application with the Assamese to
English online-dictionary developed at the Resource
Centre for Indian Language Technology Solutions, Indian
Institute of Technology Guwahati is also being
investigated. The on-line dictionary allows users to choose
one of the two languages, enter a word and search
for its equivalent in the other language.
]+ EõUçä$Oôç aÇWýã_gç * ]+ TöçEõ EõUçä$Oôç aÇ×Wýã_gçãcg÷TöX * ×a
Eõç³V[ý å^X _ç×Gä$K÷ * TöçEõ EÇõEÇõã» Eõçã]ç»ã_ * ]+ $Jôç×EõFX
L_ã_gç * ]+ GgçTöä$Oôç Fç×³V$K÷ã_gç * ×YPöç×F×X ]+ãÌ^+
FÇ³V×$K÷ã_gç *
]ç×_ãÌ^ Yç×X $Rôç×_ YÇ×_ã[ýç» LÝÌ^ç+ »Fç %ç»Ó QöçIø» Eõ»ç
[ýçã[ýãc÷ a]Ì^Tö ZÇõ_×X ZÇõã_ã» =Y×$Jô Yã» * ^×V ZÇõ_»
m×»Tö YçXÝ $Rô_ç Xc÷Ì^ , ZÇõ_ YÇ×_ £Eõç+ å_ã»×_ ^ç[ý,
AYçãc÷ç ZÇõ_ TöçTö XÇZÇõã_ * ]çXÇc÷» LÝ¾Xä$OôçC AãEõçãLçYç
ZÇõ_ YÇ×_» ×X×$JôXç * »'V» TöçYTö ZÇõ_ YÇ×_ £Eõç+ å^ç¾ç»
Vã» aeaç»» XçXç HçTö - YÒ×TöHçTöTö ]çXÇãc÷ç ×[ý[ýÐTö éc÷ Yã»
* VÇF - Eõrô , %aÇF , %`ç×Ü™öãÌ^ ]çXÇc÷Eõ c÷Töç` Eõã» *
åTöãX %¾ºc÷çTö aeaç»» YÒ×Tö ]çXÇc÷» å]çc÷ Eõ×] å^ç¾ç
Ø‘öç\öç×¾Eõ - LÝÌ^ç+ UEõçä$Oôç ×[ýQÍö¶‘öXç» ×X×$JôXç éc÷ Yã» * ]+
EõUçä$Oôç Zgõ×c÷Ì^ã_gç *
Figure 1. Assamese text with some misspellings
Work on integrating the Spell Checker with the
online dictionary, as a CGI program resident on the Web
Server, is being carried out.
The dictionaries, which comprise the databases,
contain information on the word, its meaning, grammatical
category, transliteration, pronunciation, antonyms and
synonyms. Additional information containing the Soundex
code of all Assamese words will help enhance the speed of
spell checking. A list of all possible Assamese bigram
sequences has been compiled, and possible updates are
being made. Code for detection of misspelled words using
a bigram lookup has also been developed. Incorporation of
a rule-base for spell checking [19,20] and accounting for
keyboard adjacency factors, are some of the features that
may be considered for enhancing this implementation. For
example, as a rule, only ণ [n] can appear after the letters
ঋ [ri], ৰ [r] and ষ [x], and not ন [n]. This means that if the
words ঋণ [rin] and তৃণ [trinɔ] are misspelled as ঋন
[rin] and তৃন [trinɔ] respectively, these words may
be detected as misspellings by the program that
implements this rule.
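A rule of this kind is simple to check mechanically. The sketch below flags any word in which the letter ন directly follows one of the triggering letters. Including the dependent vowel sign ৃ alongside the independent letter ঋ is an assumption made so that the second example word is covered, and checking only directly adjacent characters is a simplification; the names are hypothetical and no rule base has yet been implemented in the system described here.

```python
# Letters after which only the retroflex nasal is expected, per the rule
# above: ঋ, its dependent sign ৃ (an assumption of this sketch), ৰ and ষ.
TRIGGERS = {"\u098b", "\u09c3", "\u09f0", "\u09b7"}
DENTAL_NA = "\u09a8"  # ন, which the rule disallows after a trigger

def violates_na_rule(word):
    """True if ন directly follows one of the triggering letters."""
    return any(prev in TRIGGERS and cur == DENTAL_NA
               for prev, cur in zip(word, word[1:]))

for word in ["ঋণ", "ঋন", "তৃণ", "তৃন"]:
    print(word, violates_na_rule(word))
```

A check like this can flag an error even when the misspelled form would produce the same Soundex code as the correct one, which is why it is attractive as a complement to the dictionary-based detection.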
Table 3. Correcting words using the soundex algorithm.
Misspelled word | Soundex code of misspelled word | Correct word | Phonetic pronunciation of correct word / meaning
ফঁহিয়লোঁ | ফ794 | ফঁহিয়ালোঁ | [phɔhialʊ], have analyzed
সুধলোঁ | সু34 | সুধিলোঁ | [xudhilʊ], asked
সুধিলোঁহেঁতন | সু34735 | সুধিলোঁহেঁতেন | [xudhilʊhɛtɛn], would have asked
কান্দব | কা31 | কান্দিব | [kandibɔ], will cry
কামোৰলে | কা564 | কামোৰিলে | [kamʊrilʊ], bit
জললোঁ | জ44 | জলালোঁ | [zbɔlalʊ], have lit
খান্দিছলোঁ | খা324 | খান্দিছিলোঁ | [khandilʊ], had dug
খুন্দছিলোঁ | খু324 | খুন্দিছিলোঁ | [khundisilʊ], had pound
Table 4. The soundex codes and the suggestions.
Soundex code | Suggestions generated
ফ794 | ফঁহিয়ালোঁ ফঁহিয়ালি ফঁহিয়ালে ফঁহিয়ালা
সু34 | সুধিলোঁ সুধিলা সুধিলি সুধিলে
সু34735 | সুধিলোঁহেঁতেন সুধিলাহেঁতেন সুধিলিহেঁতেন
কা31 | কান্দিব কান্দিবা কান্দিবি
কা564 | কামোৰিলে কামোৰিলোঁ কামোৰিলা কামোৰিলি
জ44 | জলালোঁ জলালা জলালি জলালে
খা324 | খান্দিছিলোঁ খান্দিছিলা খান্দিছিলি খান্দিছিলে
খু324 | খুন্দিছিলোঁ খুন্দিছিলা খুন্দিছিলি খুন্দিছিলে
Acknowledgements
The authors wish to express their gratitude
to the Ministry of Information Technology, Government of
India, for the funding made available for this work, and
also to all the Investigators and Personnel involved in this
project.
References
[1] Karen Kukich, “Techniques for Automatically Correcting
Words in Text”, ACM Computing Surveys Vol. 24, No. 4 (Dec.),
1992, pp. 377-439.
[2] SPELLINK 2002. [Online].
Available: http://www.trantor.fi/Spellink_2002.htm
[3] AutoSpell-Award winning spell check products. [Online].
Available: http://www.spellchecker.com/
[4] Spell check software for web pages: spell checker SDKs for
custom written programs. [Online].
Available: http://www.spelling-software.com/
[5] SpellOnline: International spell checking for the web.
[Online]. Available: http://www.spellonline.com/
[6] F. J. Damerau, “A technique for computer detection and
correction of spelling errors”, Communications of the ACM
Vol. 7, No. 3 (Mar.), 1964, pp. 171-176.
[7] Gonzalo Navarro, “A guided tour to Approximate String
Matching”, ACM Computing Surveys Vol. 33, No. 1 (Mar.), 2001,
pp. 31-88.
[8] J. J. Pollock and A. Zamora, “Automatic spelling correction
in scientific and scholarly text”, Communications of the ACM Vol. 27, No. 4
(Apr.), 1984, pp. 358-368.
[9] J. R. Ullmann, “A binary n-gram technique for automatic
correction of substitution, deletion, insertion, and reversal errors
in words”, Computer Journal Vol. 20, No. 2, 1977, pp. 141-147.
[10] Iwanska and Shapiro, Natural Language Processing and
Knowledge Representation, Universities Press (India) Limited,
Hyderabad, India, 2001.
[11] D. Jurafsky and J. H. Martin, An Introduction to Natural
Language Processing, Computational Linguistics and Speech
Recognition, Prentice Hall Inc, New Jersey, U.S.A., 2000.
[12] CPAN Modules. [Online]. Available:
http://www.cpan.org/modules/01modules.index.html
[13] C-DAC: GIST - Products - iLEAP. [Online].
Available:
http://www.cdacindia.com/html/gist/products/ileap.asp
[14] Indian Language Software from Lingua. [Online].
Available: http://www.lingua-uk.com/1indian.htm
[15] Assamese Design Guide. [Online].
Available:
http://www.iitg.ernet.in/rcilts/newassamesedesign.pdf
[16] Akshar Bharati, Vineet Chaitanya, and Rajeev Sangal,
Natural Language Processing: A Paninian Perspective, Prentice
Hall of India Private Limited, Connaught Circus, New Delhi –
110001, (Mar.), 1999.
[17] Info on the Soundex Algorithm. [Online].
Available:
http://www.bluepoof.com/Soundex/info.html
[18] Soundex Conversion Program. [Online].
Available:
http://searches.rootsweb.com/cgibin/Genea/soundex.sh
[19] Hem Chandra Barua, Hem Kosha (The Assamese-English
Dictionary), Edited and Published by Debananda Barua, Eleventh
Edition Enlarged, 2000.
[20] Nagendranath Choudhuri, Asamiya Vyakaran, Lawyers
Book Stall, Panbazar, Guwahati.
