International Journal of Computational Linguistics and Natural Language Processing
Vol 2 Issue 7 July 2013
ISSN 2279 – 0756
Analysis of Noun, Pronoun and Adjective Morphology
for NLization of Punjabi with EUGENE
Harinder Singh, Parteek Kumar
Computer Science & Engineering Department
Thapar University, Patiala, India
Abstract—Morphological analysis of the various parts of speech is
an important step in designing a machine translation system for a
language. This paper describes the morphological analysis of
Punjabi nouns, pronouns and adjectives for developing a Universal
Networking Language (UNL) based Machine Translation (MT)
system for this language. All headwords in the UNL-to-NL
dictionary carry attributes and relation labels; by using them, the
system is able to generate a sentence very close to its natural
form. The phonetic properties of the language are handled by the
noun, adjective, pronoun and verb morphology of an NLizer such
as EUGENE (dEep-to-sUrface GENErator). This application
software makes use of inflection paradigms during the
modification process. These inflection paradigms are designed on
the basis of an analysis of Punjabi morphology; that is, EUGENE
modifies a headword into its various forms on the basis of the
gender, number, person, case and paradigm information suggested
by its attributes, relations and other semantic information. In
this paper, the categories of morphology that have been identified
for the conversion of UNL expressions to equivalent Punjabi
sentences are presented, and the implementation of noun,
adjective, pronoun and verb morphologies is described in detail.
Keywords: Universal Networking Language, EUGENE,
Inflection Paradigm, Morphology, Machine translation, UNL
expression.
I. INTRODUCTION
Morphology involves the mapping of headwords, which
are identified from the UNL expression and are present in the
UNL-to-NL dictionary, to their more natural forms.
Headwords are changed (something is added or removed)
to get the proper sense of the words in terms of gender, number,
tense, aspect and modality [1].
A. UNL Based Machine Translation System: The Framework
UNL is an interlingua for knowledge representation in
the context of machine translation. It is an electronic
language for computers to express and exchange information [2].
The three building blocks of UNL are (1) Semantic Relations, (2)
Attributes and (3) Universal Words. The UNL representation of a
sentence is expressed in the form of a semantic net called a UNL
graph. Consider the following English sentence for its UNL
representation:
‘Sukhwinder played football in the garden’
Its UNL representation shall be:
{unl}
agt(play:01.@past, Sukhwinder:02)
obj(play:01.@past, football:03)
plc(play:01.@past, garden:04)
{/unl}
In this expression, agt (agent), obj (object) and plc
(place) are the semantic relations and the words play,
Sukhwinder, football and garden are the Universal Words
(UWs). UWs are language words. UWs can be annotated with
attributes like number, tense, etc., which provide further
information about how the concept is being used in a specific
sentence.
B. UNL Attributes for Representation of Information
UNL attributes are used to describe the subjectivity
information of sentences. They store the information about what
is said from the speaker’s point of view. UNL has 87 primary
attributes (this number can be augmented by user defined ones)
to express the semantic content of a sentence [3][4]. Table 1
describes some of the UNL attributes used to represent the
knowledge extracted from an input sentence.
Table 1: UNL Attributes
Concept | UNL Attributes
Time with respect to the speaker | @past, @present, @future
Speaker’s view of reference | @generic, @def, @indef
Speaker’s view of quantity | @multal, @extra
Speaker’s view about numbers | @pl
Speaker’s view about gender | @male, @female
Speaker’s view about aspect | @habitual, @perfective and @progressive
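To make the three building blocks concrete, the following is a minimal Python sketch that splits a UNL expression of the kind shown above into relation labels, UWs and attributes. It is only an illustration: the names `Node`, `Relation`, `parse_node` and `parse_unl` are hypothetical, and the parser assumes the simple `label(parent, child)` line format used in the examples of this paper rather than the full UNL specification.

```python
import re
from typing import List, NamedTuple


class Node(NamedTuple):
    headword: str          # e.g. "play"
    node_id: str           # e.g. "01"
    attributes: List[str]  # e.g. ["@past"]


class Relation(NamedTuple):
    label: str   # e.g. "agt"
    parent: Node
    child: Node


# label(parent, child); assumes exactly one comma per relation line
_RELATION = re.compile(r"^(\w+)\((.+),\s*(.+)\)$")


def parse_node(text: str) -> Node:
    """Split 'play:01.@past' into headword, node id and attribute list."""
    head, *attrs = text.strip().split(".@")
    headword, _, node_id = head.partition(":")
    return Node(headword, node_id, ["@" + a for a in attrs])


def parse_unl(expression: str) -> List[Relation]:
    """Parse every relation line between {unl} and {/unl}."""
    relations = []
    for line in expression.splitlines():
        line = line.strip()
        if not line or line in ("{unl}", "{/unl}"):
            continue
        match = _RELATION.match(line)
        if match is None:
            raise ValueError(f"not a binary relation: {line!r}")
        label, parent, child = match.groups()
        relations.append(Relation(label, parse_node(parent), parse_node(child)))
    return relations


EXAMPLE = """{unl}
agt(play:01.@past, Sukhwinder:02)
obj(play:01.@past, football:03)
plc(play:01.@past, garden:04)
{/unl}"""

for rel in parse_unl(EXAMPLE):
    print(rel.label, rel.parent.headword, rel.parent.attributes, rel.child.headword)
```

Keeping the attributes as a list on each node mirrors the way attributes such as those in Table 1 are attached to a UW.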
C. Role of Morphology in UNL Based MT System
UNL based MT can be carried out with two tools available on
the UNDL Foundation’s website: IAN (Interactive
ANalyzer) for UNLization and EUGENE (dEep-to-sUrface
GENErator) for NLization. The process of converting a source
language (natural language) expression into a UNL expression
is referred to as UNLization, and the process of converting UNL
expressions into a target language representation is called
NLization. UNLization consists of four main
stages: parsing of the input sentence, extraction of universal
words from the bilingual dictionary, resolution of UNL relations
and generation of UNL attributes [5][6]. NLization consists
of three main stages: morphological generation of lexical words,
function word insertion and syntax planning [3]. In order to
perform morphological generation, an analysis of the morphology
of the language is essential. This paper attempts such an
analysis for the Punjabi language.
II. RELATED WORK
Analysis of Punjabi grammar has earlier been carried
out by Chander and Duni in 1964, Harjeet Singh Gill in 1986,
Harkirat Singh in 1988, Puar and Joginder Singh in 1990 and
Aditya Joshi in 2000. Their studies form the basis for the Natural
Language Processing (NLP) systems for Punjabi language.
Mandeep Singh Gill in 2008 has developed a rule based part of
speech tagger and morphological analyzer and generator for
Punjabi. With the help of Gurpreet Singh Lehal in 2008, he has
also developed a grammar checker for Punjabi language [1].
Analysis of Hindi grammar for a part of speech tagger
has been performed by Debasri Chakrabarti and Pushpak
Bhattacharyya in 2002. Tamil morphology has been analyzed by
Dhanabalan in 2002 for the development of a Tamil EnConverter
for the EnConversion of Tamil into UNL [1].
Bengali morphology has been analyzed with respect to
UNL for Bengali EnConverter by Md. Nawab Yousuf Ali, S. M.
Abdullah Al-Mamun, Jugal Krishna Das and Abu Mohammad
Nurannabi in 2008. UNL based analysis and generation of
Bengali case structure constructs have also been performed by
Kuntal Dey and Pushpak Bhattacharyya in 2005. Arabic
grammar generator has been proposed for the development of
Arabic MT System based on UNL by Magdy Nagi, Noha Adly
and Sameh Alansary in 2009. Hindi grammar has been analyzed
to create a UNL based MT system for the Hindi language. Hindi
generation rules have been created for the Hindi DeConverter by
Vijay Dwivedi in 2002 and Ajay Nalawade in 2007. The UNL
generation rules for the Hindi EnConverter have been created by
analyzing Hindi grammar by G. Giri and Leena in 2000 and
Shachi Dave and Pushpak Bhattacharyya in 2001 [1]. A review of
the relevant work on the Punjabi language indicates that UNL
related work has not yet been done for the Punjabi language.
III. MORPHOLOGY
This section explains how a headword (HW) in the
UNL expression can be converted, using the EUGENE framework,
so that the generated natural language sentence is very close to
its natural form. Three categories of morphology have been
identified for the conversion of UNL expressions to equivalent
Punjabi sentences [1]. These are:
Attribute label resolution morphology,
Relation label resolution morphology and
Noun, pronoun and adjective morphology.
A. Attribute label resolution morphology
Attribute label morphology deals with the generation of
Punjabi words on the basis of the UNL attributes attached to a
node and its grammatical attributes retrieved from the UNL-NL
dictionary. Words are changed in this phase depending on
their Gender, Number, Person, Tense, Aspect (GNPTA) and
vowel ending information. Punjabi has ten vowels, represented
as ਾ (ਆ ā), ਿ (ਇ i), ੀ (ਈ ī), ੁ (ਉ u), ੂ (ਊ ū), ੇ (ਏ ē), ੈ (ਐ
ai), ੋ (ਓ ō), ੌ (ਔ au) and ਮੁਕਤਾ muktā (ਅ a), which has no sign.
Vowels other than ਅ a (ਮੁਕਤਾ muktā) are represented by
accessory signs written around (i.e., below, above, to the right or
to the left of) the consonant signs, popularly known as
matras [7]. The attribute label morphology also deals with the
generation of articles in the target language. For example,
definite articles (which typically arise from demonstratives
meaning ‘that’) are represented in a UNL expression by the
‘@def’ attribute, which results in the generation of the word
ਉਹ uh ‘that’ in Punjabi. Similarly, indefinite articles (which
typically arise from adjectives meaning ‘one’) are represented by
the ‘@indef’ attribute, which results in the generation of the
Punjabi word ਇੱਕ ikk or of nothing, depending upon the number
of the word it qualifies [8]. This paper, however, focuses only on
the attribute label morphology of pronouns, nouns and
adjectives.
B. Relation label resolution morphology
The relation label morphology manages
prepositions in English or postpositions in Punjabi, since
English prepositions correspond to Punjabi postpositions.
These link nouns, pronouns and phrases to other parts of the
sentence. Some Punjabi postpositions are ਨੇ nē, ਨੂੰ nūṃ, ਉੱਤੇ uttē
‘over’, ਦਾ dā ‘of’, ਕੋਲੋਂ kōlōṃ ‘from’, ਨੇੜੇ nēṛē
‘near’, ਲਾਗੇ lāgē ‘near’ etc. In Punjabi, postpositions
follow the noun or pronoun, unlike English, where they precede
the noun or pronoun and are thus termed prepositions [1].
Insertion of these words in the generated output depends upon
the information encoded in the UNL relations of a given UNL
expression. In relation label morphology, most UNL relation
labels introduce postpositions (also known as function words or
case markers) between the child and the parent node during the
generation process. The generation of these words depends upon
the UNL relation and the conditions imposed on the attributes of
its parent and child nodes. A rule base has been prepared for the
generation of these words. Let us illustrate this
concept with an example English sentence given in (1.1) and its
equivalent Punjabi sentence given in (1.2). The UNL expression
for this example sentence is given in (1.3).
The boy translated the sentence from English to Punjabi. …(1.1)
ਮੁੰਡੇ ਨੇ ਅੰਗਰੇਜ਼ੀ ਤੋਂ ਪੰਜਾਬੀ ਵਿਚ ਵਾਕ ਦਾ ਅਨੁਵਾਦ ਕੀਤਾ ।
…(1.2)
muṇḍē nē aṅgrēzī tōṃ pañjābī vic vāk dā anuvād kītā.
{unl}
agt(translate:01.@past, boy:02)
src(translate:01.@past, English:03)
gol(translate:01.@past, Punjabi:04)
obj(translate:01.@past, sentence:05)
{/unl}
…(1.3)
Here, the case markers ਨੇ nē, ਤੋਂ tōṃ ‘from’, ਵਿਚ vic
‘to’ and ਦਾ dā are inserted after the morphed words due to the
presence of the UNL relations ‘agt’, ‘src’, ‘gol’ and
‘obj’, respectively, in the UNL expression given in (1.3).
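The idea of a rule base that maps relation labels to case markers can be sketched in Python as follows. This is a toy illustration, not the rule base used with EUGENE: the four markers are read off the single example (1.1)-(1.3), the condition that ‘agt’ yields ਨੇ only when the parent carries ‘@past’ is an assumption made for the sketch, and the names `RULES`, `case_marker` and `mark` are hypothetical.

```python
from typing import List, Optional, Tuple

# (relation label, attribute required on the parent node or None, case marker)
# Markers are taken from example (1.3) only; the '@past' condition is an assumption.
RULES: List[Tuple[str, Optional[str], str]] = [
    ("agt", "@past", "ਨੇ"),
    ("src", None, "ਤੋਂ"),
    ("gol", None, "ਵਿਚ"),
    ("obj", None, "ਦਾ"),
]


def case_marker(relation: str, parent_attributes: List[str]) -> str:
    """Return the case marker for a relation, or '' when no rule applies."""
    for label, required, marker in RULES:
        if label == relation and (required is None or required in parent_attributes):
            return marker
    return ""


def mark(child_word: str, relation: str, parent_attributes: List[str]) -> str:
    """Place the case marker after the child word, as postpositions follow the noun."""
    marker = case_marker(relation, parent_attributes)
    return f"{child_word} {marker}" if marker else child_word


print(mark("ਮੁੰਡੇ", "agt", ["@past"]))
print(mark("ਅੰਗਰੇਜ਼ੀ", "src", ["@past"]))
```

A real rule base would also test the attributes of the child node, as the text above notes; the sketch checks the parent only to stay short.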
C. Noun, pronoun, adjective morphology
With attribute and relation label morphology, the system
is able to generate a sentence very close to its natural form.
The phonetic properties of the language are handled by the noun,
adjective, pronoun and verb morphology of EUGENE.
In the EUGENE framework, special inflection paradigms
are written to generate sentences close to their natural form.
Inflectional paradigms are sets of rules that are used to generate
inflections out of base forms. In the dictionary, only base
forms (ਕਿਤਾਬ kitāb ‘book’) are stored, as follows:
[ਕਿਤਾਬ]{}"book"
(LEX=N,POS=NOU,NUM=SNG,GEN=FEM,PAR=M2);
…(1.4)
The inflection (ਕਿਤਾਬਾਂ kitābāṃ ‘books’) is generated
through rules. The decision about which rule is used to inflect a
particular word is made by the paradigm number (‘PAR=M2’)
given in the dictionary entry of the word. These inflections are
of the A-rule (affixation rule) [9] type. An A-rule is the
formalism used for generating affixes (prefixes, suffixes,
infixes), as follows:

prefixation
CONDITION := "ADDED" < DELETED;
suffixation
CONDITION := DELETED > "ADDED";
infixation
CONDITION := [REFERENCE] > "ADDED";
CONDITION := "ADDED" < [REFERENCE];
replacement
CONDITION := DELETED : "ADDED";
Where:
CONDITION = a tag (such as "PLR", "FEM", etc.)
or a list of tags ("FEM&PLR") that indicates when
the rule should be applied;
ADDED (between quotes) = the string to be added;
REFERENCE (between square brackets) = the reference
string (between quotes) or the position (without quotes) of
the string to be added;
DELETED = the string (between quotes) or the number of
characters (without quotes) to be deleted.
D. Steps for creating inflectional paradigms
1) Determine the inflectional categories for the
part-of-speech. The inflectional categories describe the
differences between the possible forms of the same headword.
2) The same part-of-speech may involve different
inflectional categories. In Punjabi, for instance, some nouns,
such as ਕਿਤਾਬ kitāb ‘book’, inflect only in number (SNG and
PLR); other nouns, such as ਗੱਡੀ gaddī ‘car’, inflect in number
and in gender (MCL&SNG, MCL&PLR, FEM&SNG, FEM&PLR).
3) Rules are not cumulative. Combine inflectional
categories in a single condition, because it is not possible to
apply rules sequentially. For instance, it is not possible, in
Punjabi, to write simply FEM:=0>"ੀ"; and PLR:=0>"ਆਂ";. It is
necessary to write FEM&PLR:=0>"ੀਆਂ";. This happens
because, for the time being, it is not possible to tell the machine
in which order the rules should be applied, i.e., the result could
be "ਗੱਡਆਂੀ" instead of "ਗੱਡੀਆਂ" if the number and
the gender are defined separately.
The complete inflection paradigm for one or more
words (belonging to the same inflectional categories for a
part-of-speech) looks like (1.5).
(%x,M7):=(%x,M7,+FLX(Atrbt1&Atrbt2&…:=0>"strg1"
;Atrbt3&Atrbt4&…:=2>"strg2";...;));
…(1.5)
a. Pronoun morphology
Pronouns are inflected by case, tense, number and
gender information in a sentence. It has been observed that
pronoun morphology also depends on UNL relation labels. The
pronoun morphology for the Punjabi language is presented in
this section.
1) Inflection of pronouns on the basis of case and
number.
Pronouns are inflected by case and number, e.g., the
personal pronoun ਮੈਂ maiṃ ‘I’ changes its form to ਮੈਨੂੰ
mainūṃ, ਮੈਥੋਂ maithōṃ, ‘ਮੇਰੇ ਲਈ’ ‘mērē laī’, ‘ਮੇਰੇ ਕੋਲੋਂ’ ‘mērē
kōlōṃ’, ‘ਮੇਰੇ ਲਾਗਿਓਂ’ ‘mērē lāgiōṃ’ etc. for singular
number and to ਅਸੀਂ asīṃ, ਸਾਨੂੰ sānūṃ, ਸਾਥੋਂ sāthōṃ, ‘ਸਾਡੇ ਕੋਲੋਂ’
‘sāḍē kōlōṃ’, ‘ਸਾਡੇ ਲਾਗਿਓਂ’ ‘sāḍē lāgiōṃ’ etc. for plural
number, depending on the case information of the sentence [1].
2) Inflection of pronouns on the basis of UNL relation
label and tense.
In UNL based NLization, pronouns are also inflected
on the basis of UNL relation labels and the tense information
provided by the UNL attributes of UW1 in a UNL expression
of the form rel(UW1,UW2).
In all the example UNL sentences below, UW2
‘00:01.@3.@male’ (third person male pronoun) is used and its
paradigm number is defined to be ‘PAR=M7’ in its dictionary
entry (1.3). Thus paradigm rule (2.1) will be fired to modify the
HW depending upon the attributes it has.
(%x,M7):=(%x,M7,+FLX(SNG&BEN&PAS:=1>"ਸ ਨੂੰ ";
SNG&AGT&PAS:=1>"ਸ ਨੇ";
SNG&POS&MCL&^DET:=1>"ਸ ਦਾ";
SNG&POS&FEM&^DET:=1>"ਸ ਦੀ";
PLR&POS&MCL&^DET:=1>"ਸ ਦੇ";
PLR&POS&FEM&^DET:=1>"ਸ ਦੀਆਂ";
PLR&BEN&PAS&DET:=0>"ਨਾਂ ਨੂੰ ";
PLR&PAS&DET:=0>"ਨਾਂ ਨੇ";
SNG&POS&MCL&DET:=0>"ਨਾਂ ਦਾ";
SNG&POS&FEM&DET:=0>"ਨਾਂ ਦੀ";
PLR&POS&MCL&DET:=0>"ਨਾਂ ਦੇ";
PLR&POS&FEM&DET:=0>"ਨਾਂ ਦੀਆਂ";));
…(2.1)
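The way a suffixation paradigm such as (2.1) acts on a headword can be sketched in a few lines of Python. The sketch is not EUGENE's implementation: it handles only rules of the form CONDITION := N > "ADDED", it reads a leading ‘^’ on a tag as negation (an assumption), it counts deleted characters as Unicode code points, and the helper names `suffix_rule` and `apply_paradigm` are hypothetical. Only four rules of (2.1) are transcribed.

```python
from typing import FrozenSet, List, Set, Tuple

# (required tags, forbidden tags, trailing code points to delete, string to add)
Rule = Tuple[FrozenSet[str], FrozenSet[str], int, str]


def suffix_rule(condition: str, deleted: int, added: str) -> Rule:
    """Build one suffixation rule; a leading '^' on a tag is read as negation (assumption)."""
    tags = condition.split("&")
    required = frozenset(t for t in tags if not t.startswith("^"))
    forbidden = frozenset(t[1:] for t in tags if t.startswith("^"))
    return required, forbidden, deleted, added


def apply_paradigm(headword: str, features: Set[str], paradigm: List[Rule]) -> str:
    """Apply the first rule whose condition holds; otherwise return the headword unchanged."""
    for required, forbidden, deleted, added in paradigm:
        if required <= features and not (forbidden & features):
            # Slice by length so that deleted == 0 keeps the whole headword.
            return headword[: len(headword) - deleted] + added
    return headword


# A hand-transcribed subset of paradigm (2.1)
M7 = [
    suffix_rule("SNG&BEN&PAS", 1, "ਸ ਨੂੰ"),
    suffix_rule("SNG&AGT&PAS", 1, "ਸ ਨੇ"),
    suffix_rule("SNG&POS&MCL&^DET", 1, "ਸ ਦਾ"),
    suffix_rule("PLR&POS&FEM&DET", 0, "ਨਾਂ ਦੀਆਂ"),
]

HE = "ਉਹ"  # base form of the third person pronoun
print(apply_paradigm(HE, {"SNG", "AGT", "PAS"}, M7))
print(apply_paradigm(HE, {"SNG", "POS", "MCL"}, M7))
print(apply_paradigm(HE, {"PLR", "POS", "FEM", "DET"}, M7))
```

Each condition is matched as a whole set of tags, which is one way of respecting the guideline that rules are not cumulative: exactly one rule fires per headword.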
Examples of ‘agt’, ‘ben’ and ‘pos’ relations.
Example English sentence with past tense: He ate apples.
…(2.2)
The UNL expression for this example sentence is given in
(2.3).
{unl}
agt(eat:03.@past, 00:01.@3.@male)
obj(eat:03.@past, apple:03.@pl)
{/unl}
…(2.3)
Equivalent Punjabi sentence: ਉਸ ਨੇ ਸੇਬ ਖਾਧੇ ।
…(2.4)
Transliterated Punjabi sentence: us nē seb khādhē.
Example English sentence with singular number: He gave
a book to him.
…(2.5)
The UNL expression for this example sentence is:
{unl}
agt(give:03.@past, 00:05.@3.@male)
obj(give:03.@past, book:03)
ben(give:03.@past, 00:01.@3.@male)
{/unl}
…(2.6)
Equivalent Punjabi sentence: ਉਸ ਨੇ ਉਸਨੂੰ ਕਿਤਾਬ ਦਿੱਤੀ ।
…(2.7)
Transliterated Punjabi sentence: us nē us nūṃ kitāb dittī.
Example English sentence: I ate his fruits.
…(2.8)
The UNL expression for this example sentence is:
{unl}
agt(eat:03.@past, 00:05.@1)
obj(eat:03.@past, fruit:02.@pl)
pos(fruit:02.@pl, 00:01.@3.@male)
{/unl}
…(2.9)
Equivalent Punjabi sentence: ਮੈਂ ਉਸ ਦੇ ਫ਼ਲ ਖਾਧੇ ।
…(2.10)
Transliterated Punjabi sentence: maiṃ us dē fal
khādhē.
The following table depicts the different inflections that can
be made to [ਉਹ]{}"00.@3" and [ਉਹ]{}"00.@3.@pl"
by using the same paradigm rule (2.1).
Table 2: Third person pronoun like he, she, they.
S. No. | Attributes | [ਉਹ]{}"00.@3" (male/female) transformed to | [ਉਹ]{}"00.@3.@pl" transformed to
1. | SNG,BEN,PAS,^TST | ਉਸਨੂੰ | ਉਹ
2. | SNG,PAS,TSTD | ਉਸਨੇ | ਉਹ
3. | SNG,POS,MCL,^DET | ਉਸਦਾ | ਉਹ
4. | SNG,POS,FEM,^DET | ਉਸਦੀ | ਉਹ
5. | PLR,POS,MCL,^DET | ਉਸਦੇ | ਉਹ
6. | PLR,POS,FEM,^DET | ਉਸ ਦੀਆਂ | ਉਹ
7. | PLR,BEN,^TST,DET | ਉਸ | ਉਹਨਾਂ ਨੂੰ
8. | PLR,PAS,TSTD,DET | ਉਸ | ਉਹਨਾਂ ਨੇ
9. | SNG,POS,MCL,PAS,DET | ਉਸ | ਉਹਨਾਂ ਦਾ
10. | SNG,POS,FEM,DET | ਉਸ | ਉਹਨਾਂ ਦੀ
11. | PLR,POS,MCL,DET | ਉਸ | ਉਹਨਾਂ ਦੇ
12. | PLR,POS,FEM,DET | ਉਸ | ਉਹਨਾਂ ਦੀਆਂ
Similarly, different tables can be made for inflecting
[ਇਸ]{}"00.@3" ‘is’ (third person singular pronoun, like ‘it’),
[ਇਹ]{}"00.@3.@pl" ‘eh’ (third person plural pronoun, like
‘these’), [ਤੂੰ]{}"00.@2" ‘tūṃ’ (second person singular pronoun,
like ‘you’), [ਤੁਸੀਂ]{}"00.@2.@pl" ‘tusīṃ’ (second person plural
pronoun, like ‘you’), [ਮੈਂ]{}"00.@1" ‘maiṃ’ (first person
singular pronoun, like ‘I’) and [ਅਸੀਂ]{}"00.@1.@pl" ‘asīṃ’
(first person plural pronoun, like ‘we’) by using some other
paradigms.
b. Noun morphology
Noun morphology deals with the properties of nouns to
identify their behavior in the generation process. Nouns are
analyzed on the basis of Gender, Number, Person, Case (GNPC)
and their paradigm information. Punjabi noun paradigms are
identified on the basis of their vowel ending.
Consider a relation of the type rel(UW1.@att1, UW2), where
UW1 is a noun. The morphology of the noun depends on the
relation ‘rel’, on the other UW (UW2) in the UNL expression and
on its own attributes. The concept of noun morphology is
illustrated with the example sentences given below.
Example English sentence without relation: Many other
books.
…(3.1)
The UNL expression for this example sentence is:
{unl}
book:05.@other.@multal
{/unl}
…(3.2)
Equivalent Punjabi sentence: ਕਈ ਹੋਰ ਕਿਤਾਬਾਂ ।
…(3.3)
Transliterated Punjabi sentence: kaī hōr kitābāṃ.
The dictionary entry for the noun ਕਿਤਾਬ ‘kitāb’ is given in (3.4).
[ਕਿਤਾਬ]{}"book"
(LEX=N,POS=NOU,NUM=SNG,GEN=FEM,PAR=M2);
…(3.4)
Inflection paradigm (3.5) will be fired to generate the
noun morphology for the above noun.
(%x,M2):=(%x,-M2,+FLX(SNG&FEM:=0>"";
PLR&FEM:=0>"ਾਂ";SNG&MCL:=1>"ਾ";
PLR&MCL:=1>"ੇ";));
…(3.5)
In the Punjabi output given in (3.3), the noun ਕਿਤਾਬ kitāb for
the HW ‘book:05’ is changed to ‘ਕਿਤਾਬਾਂ’ ‘kitābāṃ’, because
it has the UNL attribute ‘@multal’ in the UNL expression given
in (3.2) and the attribute ‘FEM’ in its dictionary entry (3.4).
In noun morphology, a part of the word is removed
from the end and a new phoneme is added to the end of the word
during the generation process. One group of nouns changes
form during the generation process, while the others
retain their form.
The rule base for noun morphology is given in Table 3.
Table 3: Nouns like book, car.
S. No. | Attributes | [ਕਿਤਾਬ]{}"book": letters removed | [ਕਿਤਾਬ]{}"book": modified HW | [ਗੱਡੀ]{}"car": letters removed | [ਗੱਡੀ]{}"car": modified HW
1. | SNG,FEM | - | ਕਿਤਾਬ | - | ਗੱਡੀ
2. | PLR,FEM | - | ਕਿਤਾਬਾਂ | - | ਗੱਡੀਆਂ
3. | SNG,MCL | - | - | ੀ | ਗੱਡਾ
4. | PLR,MCL | - | - | ੀ | ਗੱਡੇ
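The guideline that rules are not cumulative can be illustrated with the ‘car’ column of Table 3. The Python sketch below is only an illustration under stated assumptions: the bare stem ਗੱਡ is hypothetical and used just for the demonstration, characters are counted as Unicode code points, and `suffix` is a hypothetical helper standing in for a suffixation A-rule.

```python
def suffix(word: str, deleted: int, added: str) -> str:
    """Suffixation A-rule DELETED > "ADDED": drop trailing code points, then append."""
    return word[: len(word) - deleted] + added


STEM = "ਗੱਡ"  # hypothetical bare stem, used only for this illustration
FEM = "ੀ"     # feminine ending
PLR = "ਆਂ"    # plural ending

# Two separate rules give different results depending on the order of application.
gender_then_number = suffix(suffix(STEM, 0, FEM), 0, PLR)
number_then_gender = suffix(suffix(STEM, 0, PLR), 0, FEM)

# One combined FEM&PLR rule has a single, order-independent result.
combined = suffix(STEM, 0, FEM + PLR)

print(gender_then_number, number_then_gender, combined)

# Rows 3 and 4 of Table 3: remove the final vowel sign of the dictionary form, then add.
CAR = "ਗੱਡੀ"
print(suffix(CAR, 1, "ਾ"), suffix(CAR, 1, "ੇ"))
```

Since the order of rule application cannot be controlled, only the combined condition is guaranteed to yield the intended form.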
c. Adjective morphology
Unlike nouns, the gender and number information of
adjectives is not directly embedded in the UNL expression. Most
adjectives exhibit concordance with their head nouns, and
their heads are identified using the relation label in the UNL
expression [10]. Adjective morphology depends on the gender,
number and suffix information of the head noun. The
implementation of adjective morphology is illustrated with the
help of the UNL sentence given below.
Example English sentence with plural adjective:
Beautiful cars.
…(4.1)
The UNL expression for this example sentence is:
{unl}
mod(car:07.@pl, beautiful:05)
{/unl}
…(4.2)
Equivalent Punjabi sentence: ਸੋਹਣੀਆਂ ਗੱਡੀਆਂ ।
…(4.3)
Transliterated Punjabi sentence: sōhṇīāṃ gaddīāṃ.
Here, the UW ‘beautiful:05’ is identified as an
adjective with HW ਸੋਹਣ sōhaṇ ‘beautiful’ from the UNL-NL
dictionary, as depicted in (4.4).
[ਸੋਹਣ]{}"beautiful"
(LEX=J,POS=ADJ,NUM=SNG,PAR=M5)<pun,0,0>
…(4.4)
The paradigm (4.5) will be fired to generate the
morphology for the above example ਸੋਹਣ sōhaṇ ‘beautiful’.
(%x,M5):=(%x,-M5,+FLX(SNG&MCL:=0>"ਾ";
SNG&FEM:=0>"ੀ";PLR&MCL:=0>"ੇ";PLR:=0>"ੀਆਂ";));
…(4.5)
In the Punjabi output given in (4.3), the HW ਸੋਹਣ sōhaṇ
‘beautiful’ is changed to ਸੋਹਣੀਆਂ sōhṇīāṃ ‘beautiful’,
because UW1 ‘car’ is identified as the head noun and it has the
‘FEM’ and ‘@pl’ attributes in the UNL-NL dictionary and the
UNL expression, respectively.
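The agreement step described here, in which the adjective takes its gender from the head noun's dictionary entry and its number from the head noun's UNL attributes, can be sketched as follows. This is a minimal Python illustration and not the EUGENE grammar: `DICTIONARY` is a hypothetical stand-in for two UNL-NL dictionary entries, the four suffixes are the endings of the ਸੋਹਣ forms listed in Table 4, and the function names are assumptions.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical in-memory stand-in for two UNL-NL dictionary entries
DICTIONARY: Dict[str, Dict[str, str]] = {
    "car": {"HW": "ਗੱਡੀ", "GEN": "FEM"},
    "beautiful": {"HW": "ਸੋਹਣ", "PAR": "M5"},
}

# Suffixation rules: (condition tags, code points to delete, string to add)
PARADIGMS: Dict[str, List[Tuple[Set[str], int, str]]] = {
    "M5": [
        ({"SNG", "MCL"}, 0, "ਾ"),
        ({"SNG", "FEM"}, 0, "ੀ"),
        ({"PLR", "MCL"}, 0, "ੇ"),
        ({"PLR", "FEM"}, 0, "ੀਆਂ"),
    ],
}


def head_noun_features(noun_uw: str, noun_attributes: List[str]) -> Set[str]:
    """Gender comes from the dictionary entry, number from the UNL attributes."""
    gender = DICTIONARY[noun_uw]["GEN"]
    number = "PLR" if "@pl" in noun_attributes else "SNG"
    return {gender, number}


def inflect_adjective(adjective_uw: str, noun_uw: str, noun_attributes: List[str]) -> str:
    """Inflect the adjective so that it agrees with its head noun."""
    entry = DICTIONARY[adjective_uw]
    features = head_noun_features(noun_uw, noun_attributes)
    for condition, deleted, added in PARADIGMS[entry["PAR"]]:
        if condition <= features:
            return entry["HW"][: len(entry["HW"]) - deleted] + added
    return entry["HW"]


# mod(car:07.@pl, beautiful:05): the parent 'car' is the head noun
print(inflect_adjective("beautiful", "car", ["@pl"]))
```

In a relation such as mod(UW1, UW2) the sketch treats UW1 as the head noun, following the example in (4.2).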
The following table (Table 4) depicts the different inflections
that can be made to the adjectives [ਸੋਹਣ]{}"beautiful" and
[ਮਹਿੰਗ]{}"expensive" by using paradigm (4.5).
Table 4: The generation rule base for adjectives like beautiful and expensive etc.
S. No. | Attributes | [ਸੋਹਣ]{}"beautiful" transformed to | [ਮਹਿੰਗ]{}"expensive" transformed to
1. | SNG,FEM | ਸੋਹਣੀ | ਮਹਿੰਗੀ
2. | PLR,FEM | ਸੋਹਣੀਆਂ | ਮਹਿੰਗੀਆਂ
3. | SNG,MCL | ਸੋਹਣਾ | ਮਹਿੰਗਾ
4. | PLR,MCL | ਸੋਹਣੇ | ਮਹਿੰਗੇ
IV. RESULT AND DISCUSSION
This paper has described, with examples, how
the information provided in a UNL input sentence can be used for
morphological conversion of pronouns, nouns and adjectives to
form Punjabi words as close to their actual form as
possible using the EUGENE console [11]. About 150 sentences
of Corpus500 [12], 100 sentences of UC-A1 [13] and
135 sentences of UC-A2 [14] (given as assignments by the UNDL
Foundation) have been processed to date with a very good
percentage of accuracy. All these sentences are processed by the
same set of rules and inflection paradigms, with the dictionary
words of all these assignment sentences merged.
The UNDL Foundation’s website offers an F-measure [15]
feature, which rates the output of
NLization (the output generated using the EUGENE tool) against
the expected Punjabi natural language sentences on a scale of
0 to 1. Both files, saved in UTF-8 .txt format, can be uploaded
and the tool reports the F-measure for them. The F-measure for
the Corpus500 sentences comes out to be more than 90%, and for
the UC-A1 and UC-A2 sentences it is more than 80%. The
F-measure of UC-A1 is depicted in Figure 1.
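The paper does not state how the UNDL tool computes its score, so the following Python sketch should be read only as the generic definition of an F-measure (the harmonic mean of precision and recall), applied here to whitespace-separated tokens of a generated sentence and an expected sentence. The tokenisation and the function name `f_measure` are assumptions and may differ from what the UNDL Foundation's tool does.

```python
from collections import Counter


def f_measure(generated: str, expected: str) -> float:
    """Token-level F-measure between a generated and an expected sentence (0 to 1)."""
    gen_tokens = Counter(generated.split())
    exp_tokens = Counter(expected.split())
    # Multiset intersection: each token counts at most as often as it occurs in both.
    overlap = sum((gen_tokens & exp_tokens).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_tokens.values())
    recall = overlap / sum(exp_tokens.values())
    return 2 * precision * recall / (precision + recall)


print(f_measure("ਉਸ ਨੇ ਸੇਬ ਖਾਧੇ", "ਉਸ ਨੇ ਸੇਬ ਖਾਧੇ"))
print(f_measure("a b c d", "a b"))
```

A score of 1 means that the generated and expected sentences contain exactly the same tokens; word order is ignored by this simplified version.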
Figure 1: F-measure of 100 sentences of UC-A1.
V. CONCLUSION
While creating paradigms for words of the same
morphological category, duplicate paradigms should not be
created unless really necessary, i.e., unless there is no
existing paradigm that may be used to generate the
intended inflections. Paradigms should not be written for a
single word; paradigms are used to describe the behavior of
several words. If the behavior is irregular, i.e., it is restricted
only to a single word, it should be described as an inflectional
rule instead of an inflectional paradigm. For instance, the plural
of the English word "foot" is better generated by an inflectional
rule than by an inflectional paradigm. Inflectional rules
are not included in the grammar. They are added directly to the
dictionary entry. Compound forms should not
be included in paradigms, i.e., paradigms must deal only with
simple forms, i.e., forms that can be generated by prefixation,
infixation or suffixation.
VI. FUTURE SCOPE
The UNDL Foundation has also created other corpora, which are
supposed to cover very basic linguistic phenomena. One of them
is UC-B1 [16], which involves five stories for children:
The Hare and The Tortoise, The Bat and The Weasels,
Father and his sons, The Ants and the Grasshopper and The Man
and the Lion. These stories have sentences in the form of
paragraphs and also involve pronoun, noun, adjective and verb
morphologies. In future, these entire corpora can be processed
using the UNL framework.
REFERENCES
[1] Kumar P, Sharma R K (2012) UNL Based Machine Translation System for
Punjabi Language. PhD Dissertation, Thapar University, Patiala, India.
[2] Uchida, Hiroshi, Zhu Meiying & Tarcisio Della Senta. (1999). The UNL, A
Gift for a Millennium. Japan: Institute of Advanced Studies, The United
Nations University.
[3] Singh, Smriti, Mrugank Dalal, Vishal Vachhani, Pushpak Bhattacharyya &
Om P. Damani. 2007. Hindi Generation from Interlingua. Paper presented
at Machine Translation Summit, Copenhagen.
[4] ‘Universal Networking Language (UNL) Specifications, Version 2005’
[online] http://www.undl.org/unlsys/unl/unl2005 (Accessed 12 August
2010).
[5] Giri G, L. 2000. Semantic Net Like Knowledge Structure Generation from
Natural Languages. IIT Bombay B Tech Dissertation.
[6] Dave, Shachi, Jignashu Parikh & Pushpak Bhattacharyya. 2001. Interlingua
Based English Hindi Machine Translation and Language Divergence.
Journal of Machine Translation (JMT). 16(4): 251-304.
[7] Bahri U S, Walia P S 2003 Introductory Punjabi. Publication Bureau,
Punjabi University, Patiala.
[8] Jain K 2005 UNL to Hindi generation lexicon and morphology. M. Tech.
Thesis, IIT Bombay, Mumbai.
[9] R. Martins ‘Universal Networking Language (UNL): A-rule’ [Online]:
http://www.unlweb.net/wiki/A-rule (Accessed 10 November 2012).
[10] Hrushikesh B 2002 Towards Marathi sentence generation from Universal
Networking Language. M. Tech. Thesis, IIT Bombay, Mumbai.
[11] ‘Universal Networking Language (UNL): EUGENE Tool version 1.0.1’
[online] http://dev.undlfoundation.org/generation/index.jsp. (Accessed 24
November 2012).
[12] ‘Corpus500’ [Online] http://www.unlweb.net/wiki/Corpus500, (Accessed
19 August 2012).
[13] ‘Corpus UC-A1’ [Online] http://www.unlweb.net/wiki/UC-A1, (Accessed
29 December 2013).
[14] ‘Corpus UC-A2’ [Online] http://www.unlweb.net/wiki/UC-A2, (Accessed
10 February 2003).
[15] R. Martins ‘Universal Networking Language (UNL): F-measure’ [Online]:
http://www.unlweb.net/wiki/F-measure. (Accessed 30 November 2012).
[16] ‘Corpus UC-B1’ [Online] http://www.unlweb.net/wiki/UC-B1, (Accessed
10 February 2013).