International Journal of Computational Linguistics and Natural Language Processing, Vol 2 Issue 7, July 2013, ISSN 2279 – 0756

Analysis of Noun, Pronoun and Adjective Morphology for NLization of Punjabi with EUGENE

Harinder Singh
Computer Science & Engineering Department, Thapar University, Patiala, India
[email protected]

Parteek Kumar
Computer Science & Engineering Department, Thapar University, Patiala, India
[email protected]

Abstract: Morphological analysis of the various parts of speech is an important step in designing a machine translation system for a language. This paper describes the morphological analysis of Punjabi nouns, pronouns and adjectives for developing a Universal Networking Language (UNL) based Machine Translation (MT) system for this language. Every headword in the UNL-to-NL dictionary carries attributes and is linked by relation labels; by using them, the system is able to generate a sentence very close to its natural form. The phonetic properties of the language are handled by the noun, adjective, pronoun and verb morphology of an NLizer such as EUGENE (dEep-to-sUrface GENErator). This application software makes use of inflection paradigms during the modification process. These inflection paradigms are designed on the basis of an analysis of Punjabi morphology: EUGENE modifies a headword into its various forms on the basis of its gender, number, person, case and paradigm information, as indicated by its attributes, relations and other semantic information. This paper identifies the categories of morphology needed to convert a UNL expression into an equivalent Punjabi sentence, and describes in detail the explanation and implementation of noun, adjective, pronoun and verb morphologies.

Keywords: Universal Networking Language, EUGENE, Inflection Paradigm, Morphology, Machine translation, UNL expression.

I. INTRODUCTION

Morphology involves mapping the headwords that are identified from the UNL expression and present in the UNL-to-NL dictionary to their more natural forms. Headwords are changed (something is added or removed) to obtain the proper sense of the words in terms of gender, number, tense, aspect and modality [1].

A. UNL Based Machine Translation System: The Framework

UNL is an interlingua for knowledge representation in the context of machine translation. It is an electronic language for computers to express and exchange information [2]. The three building blocks of UNL are (1) Semantic Relations, (2) Attributes and (3) Universal Words. The UNL representation of a sentence is expressed in the form of a semantic net called a UNL graph. Consider the following English sentence and its UNL representation:

‘Sukhwinder played football in the garden’

Its UNL representation is:

{unl}
agt(play:01.@past, Sukhwinder:02)
obj(play:01.@past, football:03)
plc(play:01.@past, garden:04)
{/unl}

In this expression, agt (agent), obj (object) and plc (place) are the semantic relations, and the words play, Sukhwinder, football and garden are the Universal Words (UWs). UWs are language words. UWs can be annotated with attributes such as number and tense, which provide further information about how the concept is being used in a specific sentence.

B. UNL Attributes for Representation of Information

UNL attributes are used to describe the subjectivity information of sentences. They store information about what is said from the speaker’s point of view. UNL has 87 primary attributes (this number can be augmented by user-defined ones) to express the semantic content of a sentence [3][4]. Table 1 describes some of the UNL attributes used to represent the knowledge extracted from an input sentence.
Table 1: UNL Attributes
Concept | UNL Attributes
Time with respect to the speaker | @past, @present, @future
Speaker's view of reference | @generic, @def, @indef
Speaker's view of quantity | @multal, @extra
Speaker's view about numbers | @pl
Speaker's view about gender | @male, @female
Speaker's view about aspect | @habitual, @perfective, @progressive

C. Role of Morphology in UNL Based MT System

UNL based MT can be carried out with two tools available on the UNDL Foundation’s website: IAN (Interactive ANalyzer) for UNLization and EUGENE (dEep-to-sUrface GENErator) for NLization. The process of converting a source language (natural language) expression into a UNL expression is referred to as UNLization, and the process of converting UNL expressions into a target language representation is called NLization. UNLization consists of four main stages: parsing of the input sentence, extraction of universal words from a bilingual dictionary, resolution of UNL relations and generation of UNL attributes [5][6]. NLization consists of three main stages: morphological generation of lexical words, function word insertion and syntax planning [3]. Morphological generation requires an analysis of the morphology of the language. This paper attempts that analysis for Punjabi.

II. RELATED WORK

Analysis of Punjabi grammar has earlier been carried out by Chander and Duni in 1964, Harjeet Singh Gill in 1986, Harkirat Singh in 1988, Puar and Joginder Singh in 1990 and Aditya Joshi in 2000. Their studies form the basis for Natural Language Processing (NLP) systems for the Punjabi language. Mandeep Singh Gill in 2008 developed a rule based part of speech tagger and a morphological analyzer and generator for Punjabi. With the help of Gurpreet Singh Lehal in 2008, he also developed a grammar checker for the Punjabi language [1].

Analysis of Hindi grammar for a part of speech tagger was performed by Debasri Chakrabarti and Pushpak Bhattacharyya in 2002. Tamil morphology was analyzed by Dhanabalan in 2002 for the development of a Tamil EnConverter [1]. Bengali morphology was analyzed with respect to UNL for a Bengali EnConverter by Md. Nawab Yousuf Ali, S. M. Abdullah Al-Mamun, Jugal Krishna Das and Abu Mohammad Nurannabi in 2008. UNL based analysis and generation of Bengali case structure constructs were performed by Kuntal Dey and Pushpak Bhattacharyya in 2005. An Arabic grammar generator was proposed for the development of an Arabic MT system based on UNL by Magdy Nagi, Noha Adly and Sameh Alansary in 2009.

Hindi grammar has been analyzed to create a UNL based MT system for Hindi. Hindi generation rules for a Hindi DeConverter were created by Vijay Dwivedi in 2002 and Ajay Nalawade in 2007. The UNL generation rules for a Hindi EnConverter were created by analyzing Hindi grammar by G. Giri and Leena in 2000 and by Shachi Dave and Pushpak Bhattacharyya in 2001 [1]. The work surveyed on Punjabi also establishes that UNL related work has not been done for the Punjabi language.

III. MORPHOLOGY

This section explains how a headword (HW) in a UNL expression can be converted, using the EUGENE framework, so that the generated natural language sentence is very close to its natural form. Three categories of morphology have been identified for converting a UNL expression into an equivalent Punjabi sentence [1]:

Attribute label resolution morphology
Relation label resolution morphology
Noun, adjective and pronoun morphology

A. Attribute label resolution morphology

Attribute label morphology deals with the generation of Punjabi words on the basis of the UNL attributes attached to a node and the grammatical attributes retrieved for it from the UNL-NL dictionary. Words are changed in this phase depending on their Gender, Number, Person, Tense, Aspect (GNPTA) and vowel ending information. Punjabi has ten vowels, represented as ਾ (ਆ ā), ਿ (ਇ i), ੀ (ਈ ī), ੁ (ਉ u), ੂ (ਊ ū), ੇ (ਏ ē), ੈ (ਐ ai), ੋ (ਓ ō), ੌ (ਔ au) and ਮੁਕਤਾ muktā (ਅ a), which has no sign. Vowels other than ਅ a (ਮੁਕਤਾ muktā) are represented by accessory signs written around (i.e., below, above, to the right or to the left of) the consonant signs, popularly known as matra signs [7].

Attribute label morphology also deals with the generation of articles in the target language. For example, definite articles (which typically arise from demonstratives meaning ‘that’) are represented in a UNL expression by the ‘@def’ attribute, which results in the generation of the word ਉਹ uh ‘that’ in Punjabi. Similarly, indefinite articles (which typically arise from adjectives meaning ‘one’) are represented by the ‘@indef’ attribute, which results in the generation of the Punjabi word ਇੱਕ ikk or of nothing, depending on the number of the word it qualifies [8]. This paper, however, focuses only on the attribute label morphology of pronouns, nouns and adjectives.

B. Relation label resolution morphology

Relation label morphology manages what are prepositions in English and postpositions in Punjabi, since the two play similar roles. These link nouns, pronouns and phrases to other parts of the sentence. Some Punjabi postpositions are ਨੇ nē, ਨੂੰ nūṃ, ਉੱਤੇ uttē ‘over’, ਦਾ dā ‘of’, ਕੋਲੋਂ kōlōṃ ‘from’, ਨੇੜੇ nēṛē ‘near’, ਲਾਗੇ lāgē ‘near’, etc.
In Punjabi, postpositions follow the noun or pronoun, unlike English, where they precede it and are therefore termed prepositions [1]. Whether these words are inserted in the generated output depends on the information encoded in the UNL relations of a given UNL expression. In relation label morphology, most UNL relation labels introduce postpositions (also known as function words or case markers) between the child and the parent node during the generation process. Which word is generated depends on the UNL relation and on the conditions imposed on the attributes of its parent and child nodes. A rule base has been prepared for generating these words. The concept is illustrated with the English sentence in (1.1), its Punjabi equivalent in (1.2) and the corresponding UNL expression in (1.3).

The boy translated the sentence from English to Punjabi. …(1.1)

ਮੁੰਡੇ ਨੇ ਅੰਗਰੇਜ਼ੀ ਤੋਂ ਪੰਜਾਬੀ ਵਿਚ ਵਾਕ ਦਾ ਅਨੁਵਾਦ ਕੀਤਾ। …(1.2)
muṇḍē nē aṅgrēzī tōṃ pañjābī vic vāk dā anuvād kītā.

{unl}
agt(translate:01.@past, boy:02)
src(translate:01.@past, English:03)
gol(translate:01.@past, Punjabi:04)
obj(translate:01.@past, sentence:05)
{/unl} …(1.3)

Here, the case markers ਨੇ nē, ਤੋਂ tōṃ ‘from’, ਵਿਚ vic ‘to’ and ਦਾ dā are inserted next to the morphed words because of the UNL relations ‘agt’, ‘src’, ‘gol’ and ‘obj’, respectively, in the UNL expression given in (1.3).

C. Noun, pronoun, adjective morphology

With attribute and relation label morphology, the system is able to generate a sentence very close to its natural form. The phonetic properties of the language are handled by the noun, adjective, pronoun and verb morphology of EUGENE. In the EUGENE framework, special inflection paradigms are written to generate sentences close to their natural form. Inflectional paradigms are sets of rules that are used to generate inflected forms from base forms.
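The idea of a paradigm as a rule set can be pictured with a small sketch. This is not EUGENE code: it is a Python approximation in which each rule pairs a set of condition tags with a suffixation step (delete a number of trailing characters, then append a string). The names `Paradigm`, `M5_ADJECTIVE` and `inflect` are hypothetical, the suffixes are read off the adjective forms listed in Table 4, and treating a "character" as one Unicode code point is an assumption of this sketch.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A rule maps a set of condition tags to (characters to delete, suffix to add).
Rule = Tuple[int, str]
Paradigm = Dict[FrozenSet[str], Rule]

# Hypothetical stand-in for an adjective paradigm such as M5;
# the suffixes reproduce the forms of "beautiful" in Table 4.
M5_ADJECTIVE: Paradigm = {
    frozenset({"SNG", "MCL"}): (0, "ਾ"),
    frozenset({"SNG", "FEM"}): (0, "ੀ"),
    frozenset({"PLR", "MCL"}): (0, "ੇ"),
    frozenset({"PLR", "FEM"}): (0, "ੀਆਂ"),
}


def inflect(base: str, features: Set[str], paradigm: Paradigm) -> str:
    """Apply the first rule whose condition tags are all present in features.

    Rules are not cumulative: exactly one rule fires, so combined
    categories (e.g. PLR and FEM) must appear together in one condition.
    """
    for condition, (deleted, added) in paradigm.items():
        if condition <= features:
            # Drop `deleted` trailing code points, then append the suffix.
            return base[: len(base) - deleted] + added
    return base  # no rule matched: keep the base form unchanged
```

Under these assumptions, `inflect("ਸੋਹਣ", {"PLR", "FEM"}, M5_ADJECTIVE)` yields ਸੋਹਣੀਆਂ, and a rule with a non-zero deletion count first trims the end of the base form before the suffix is attached.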
In the dictionary, only base forms (ਕਿਤਾਬ kitāb ‘book’) are stored, as follows:

[ਕਿਤਾਬ]{}"book" (LEX=N,POS=NOU,NUM=SNG,GEN=FEM,PAR=M2); …(1.4)

The inflected form (ਕਿਤਾਬਾਂ kitābāṃ ‘books’) is generated through rules. Which rule is used to inflect a particular word is decided by the paradigm number (the ‘PAR=M2’ feature) in the dictionary entry of the word. These inflections are of the A-rule (affixation rule) type [9]. An A-rule is the formalism used for generating affixes (prefixes, suffixes, infixes), as follows:

prefixation: CONDITION := "ADDED" < DELETED;
suffixation: CONDITION := DELETED > "ADDED";
infixation: CONDITION := [REFERENCE] > "ADDED"; or CONDITION := "ADDED" < [REFERENCE];
replacement: CONDITION := DELETED : "ADDED";

Where:
CONDITION = a tag (such as "PLR", "FEM", etc.) or a list of tags ("FEM&PLR") that indicates when the rule should be applied;
ADDED (between quotes) = the string to be added;
REFERENCE (between square brackets) = the reference string (between quotes) or the position (without quotes) of the string to be added;
DELETED = the string (between quotes) or the number of characters (without quotes) to be deleted.

D. Steps for creating inflectional paradigms

1) Determine the inflectional categories for the part-of-speech. The inflectional categories describe the differences between the possible forms of the same headword.

2) The same part-of-speech may involve different inflectional categories. In Punjabi, for instance, some nouns, such as ਕਿਤਾਬ kitāb ‘book’, inflect only in number (SNG and PLR); other nouns, such as ਗੱਡੀ gaḍḍī ‘car’, inflect in number and in gender (MCL&SNG, MCL&PLR, FEM&SNG, FEM&PLR).

3) Rules are not cumulative. Inflectional categories must be combined in a single condition because rules cannot be applied sequentially. For instance, it is not possible, in Punjabi, to write simply FEM:=0>"ੀ"; and PLR:=0>"ਆਂ";. It is necessary to write FEM&PLR:=0>"ੀਆਂ";. This is because, for the time being, the machine cannot be told in which order the rules should be applied, i.e., the result could be "ਗੱਡਆਂੀ" instead of "ਗੱਡੀਆਂ" if number and gender are defined separately.

The complete inflection paradigm for one or more words (belonging to the same inflectional categories for a part-of-speech) looks like (1.5).

(%x,M7):=(%x,M7,+FLX(Atrbt1&Atrbt2&…:=0>"strg1";Atrbt3&Atrbt4&…:=2>"strg2";...;)); …(1.5)

a. Pronoun morphology

Pronouns are inflected according to the case, tense, number and gender information in a sentence. It has been observed that pronoun morphology also depends on UNL relation labels. The pronoun morphology for Punjabi is presented in this section.

1) Inflection of pronouns on the basis of case and number.

Pronouns are inflected by case and number. For example, the personal pronoun ਮੈਂ maiṃ ‘I’ changes its form to ਮੈਨੂੰ mainūṃ, ਮੈਥੋਂ maithōṃ, ‘ਮੇਰੇ ਲਈ’ ‘mērē laī’, ‘ਮੇਰੇ ਕੋਲੋਂ’ ‘mērē kōlōṃ’, ‘ਮੇਰੇ ਲਾਗਿਓਂ’ ‘mērē lāgiōṃ’, etc. for singular number, and to ਅਸੀਂ asīṃ, ਸਾਨੂੰ sānūṃ, ਸਾਥੋਂ sāthōṃ, ‘ਸਾਡੇ ਕੋਲੋਂ’ ‘sāḍē kōlōṃ’, ‘ਸਾਡੇ ਲਾਗਿਓਂ’ ‘sāḍē lāgiōṃ’, etc. for plural number, depending on the case information of the sentence [1].

2) Inflection of pronouns on the basis of UNL relation label and tense.

In UNL based NLization, pronouns are also inflected on the basis of UNL relation labels and the tense information provided by the UNL attributes of UW1 in a UNL expression of the form rel(UW1,UW2). In all the example UNL sentences below, UW2 is ‘00:01.@3.@male’ (third person male pronoun), and its paradigm number is defined to be ‘PAR=M7’ in its dictionary entry. Thus paradigm rule (2.1) will be fired to modify the HW depending on the attributes it has.

(%x,M7):=(%x,M7,+FLX(
SNG&BEN&PAS:=1>"ਸ ਨੂੰ";
SNG&AGT&PAS:=1>"ਸ ਨੇ";
SNG&POS&MCL&^DET:=1>"ਸ ਦਾ";
SNG&POS&FEM&^DET:=1>"ਸ ਦੀ";
PLR&POS&MCL&^DET:=1>"ਸ ਦੇ";
PLR&POS&FEM&^DET:=1>"ਸ ਦੀਆਂ";
PLR&BEN&PAS&DET:=0>"ਨਾਂ ਨੂੰ";
PLR&PAS&DET:=0>"ਨਾਂ ਨੇ";
SNG&POS&MCL&DET:=0>"ਨਾਂ ਦਾ";
SNG&POS&FEM&DET:=0>"ਨਾਂ ਦੀ";
PLR&POS&MCL&DET:=0>"ਨਾਂ ਦੇ";
PLR&POS&FEM&DET:=0>"ਨਾਂ ਦੀਆਂ";)); …(2.1)

Examples of ‘agt’, ‘ben’ and ‘pos’ relations.

Example English sentence with past tense:
He ate apples. …(2.2)

The UNL expression for this example sentence is given in (2.3).

{unl}
agt(eat:03.@past, 00:01.@3.@male)
obj(eat:03.@past, apple:03.@pl)
{/unl} …(2.3)

Equivalent Punjabi sentence: ਉਸ ਨੇ ਸੇਬ ਖਾਧੇ। …(2.4)
Transliterated Punjabi sentence: us nē seb khādhē.

Example English sentence with singular number:
He gave a book to him. …(2.5)

The UNL expression for this example sentence is:

{unl}
agt(give:03.@past, 00:05.@3.@male)
obj(give:03.@past, book:03)
ben(give:03.@past, 00:01.@3.@male)
{/unl} …(2.6)

Equivalent Punjabi sentence: ਉਸ ਨੇ ਉਸ ਨੂੰ ਕਿਤਾਬ ਦਿੱਤੀ। …(2.7)
Transliterated Punjabi sentence: us nē us nūṃ kitāb dittī.

Example English sentence:
I ate his fruits. …(2.8)

The UNL expression for this example sentence is:

{unl}
agt(eat:03.@past, 00:05.@1)
obj(eat:03.@past, fruit:02.@pl)
pos(fruit:02.@pl, 00:01.@3.@male)
{/unl} …(2.9)

Equivalent Punjabi sentence: ਮੈਂ ਉਸ ਦੇ ਫ਼ਲ ਖਾਧੇ। …(2.10)
Transliterated Punjabi sentence: maiṃ us dē fal khādhē.

Table 2 shows the different inflections that can be made to [ਉਹ]{}"00.@3" and [ਉਹ]{}"00.@3.@pl" by using the same paradigm rule (2.1).

Table 2: Third person pronouns like he, she, they.
S. No. | Attributes | [ਉਹ]{}"00.@3" (male/female) transformed to | [ਉਹ]{}"00.@3.@pl" transformed to
1. | SNG,BEN,PAS,^TST | ਉਸਨੂੰ | ਉਹ
2. | SNG,PAS,TSTD | ਉਸਨੇ | ਉਹ
3. | SNG,POS,MCL,^DET | ਉਸਦਾ | ਉਹ
4. | SNG,POS,FEM,^DET | ਉਸਦੀ | ਉਹ
5. | PLR,POS,MCL,^DET | ਉਸਦੇ | ਉਹ
6. | PLR,POS,FEM,^DET | ਉਸ ਦੀਆਂ | ਉਹ
7. | PLR,BEN,^TST,DET | ਉਸ | ਉਹਨਾਂ ਨੂੰ
8. | PLR,PAS,TSTD,DET | ਉਸ | ਉਹਨਾਂ ਨੇ
9. | SNG,POS,MCL,PAS,DET | ਉਸ | ਉਹਨਾਂ ਦਾ
10. | SNG,POS,FEM,DET | ਉਸ | ਉਹਨਾਂ ਦੀ
11. | PLR,POS,MCL,DET | ਉਸ | ਉਹਨਾਂ ਦੇ
12. | PLR,POS,FEM,DET | ਉਸ | ਉਹਨਾਂ ਦੀਆਂ

Similar tables can be made, using other paradigms, for inflecting [ਇਸ]{}"00.@3" ‘is’ (third person singular pronoun, like ‘it’), [ਇਹ]{}"00.@3.@pl" ‘ih’ (third person plural pronoun, like ‘these’), [ਤੂੰ]{}"00.@2" ‘tūṃ’ (second person singular pronoun, like ‘you’), [ਤੁਸੀਂ]{}"00.@2.@pl" ‘tusīṃ’ (second person plural pronoun, like ‘you’), [ਮੈਂ]{}"00.@1" ‘maiṃ’ (first person singular pronoun, like ‘I’) and [ਅਸੀਂ]{}"00.@1.@pl" ‘asīṃ’ (first person plural pronoun, like ‘we’).

b. Noun morphology

Noun morphology deals with the properties of nouns in order to identify their behavior in the generation process. Nouns are analyzed on the basis of Gender, Number, Person, Case (GNPC) and their paradigm information. Punjabi noun paradigms are identified on the basis of their vowel ending.

For a relation of the form rel(UW1.@att1, UW2), where UW1 is a noun, the morphology of the noun depends on the relation ‘rel’, on the other UW (UW2) in the UNL expression, and on the noun's own attributes. The concept of noun morphology is illustrated with the example sentences given below.

Example English sentence without relation:
Many other books. …(3.1)

The UNL expression for this example sentence is:

{unl}
book:05.@other.@multal
{/unl} …(3.2)

Equivalent Punjabi sentence: ਕਈ ਹੋਰ ਕਿਤਾਬਾਂ। …(3.3)
Transliterated Punjabi sentence: kaī hōr kitābāṃ.

The dictionary entry for the noun ਕਿਤਾਬ ‘kitāb’ is given in (3.4).

[ਕਿਤਾਬ]{}"book" (LEX=N,POS=NOU,NUM=SNG,GEN=FEM,PAR=M2); …(3.4)

Inflection paradigm (3.5) will be fired to generate the noun morphology for the above noun.
(%x,M2):=(%x,-M2,+FLX(SNG&FEM:=0>""; PLR&FEM:=0>"ਾਂ"; SNG&MCL:=1>"ਾ"; PLR&MCL:=1>"ੇ";)); …(3.5)

In the Punjabi output given in (3.3), the noun ਕਿਤਾਬ kitāb for the HW ‘book:05’ is changed to ਕਿਤਾਬਾਂ kitābāṃ, because it carries the attribute ‘@multal’ in the UNL expression given in (3.2) and the feature ‘FEM’ in its dictionary entry (3.4).

In noun morphology, a part of the word is removed from its end and a new phoneme is added to the end of the word during the generation process. One group of nouns changes form during generation, while others retain their form unchanged. The rule base for noun morphology is given in Table 3.

Table 3: Nouns like book, car.
S. No. | Attributes | [ਕਿਤਾਬ]{}"book": letters removed | [ਕਿਤਾਬ]{}"book": modified HW | [ਗੱਡੀ]{}"car": letters removed | [ਗੱਡੀ]{}"car": modified HW
1. | SNG,FEM | - | ਕਿਤਾਬ | - | ਗੱਡੀ
2. | PLR,FEM | - | ਕਿਤਾਬਾਂ | - | ਗੱਡੀਆਂ
3. | SNG,MCL | - | - | ੀ | ਗੱਡਾ
4. | PLR,MCL | - | - | ੀ | ਗੱਡੇ

c. Adjective morphology

Unlike nouns, the gender and number information of adjectives is not directly embedded in the UNL expression. Most adjectives exhibit concordance with their head nouns, and their heads are identified using the relation label in the UNL expression [10]. Adjective morphology depends on the gender, number and suffix information of the head noun. The implementation of adjective morphology is illustrated with the UNL sentence given below.

Example English sentence with plural adjective:
Beautiful cars. …(4.1)

The UNL expression for this example sentence is:

{unl}
mod(car:07.@pl, beautiful:05)
{/unl} …(4.2)

Equivalent Punjabi sentence: ਸੋਹਣੀਆਂ ਗੱਡੀਆਂ। …(4.3)
Transliterated Punjabi sentence: sōhṇīāṃ gaḍḍīāṃ.

Here, the UW ‘beautiful:05’ is identified as an adjective with HW ਸੋਹਣ sōhaṇ ‘beautiful’ from the UNL-NL dictionary, as shown in (4.4).

[ਸੋਹਣ]{}"beautiful" (LEX=J,POS=ADJ,NUM=SNG,PAR=M5)<pun,0,0>; …(4.4)

Paradigm (4.5) will be fired to generate the morphology for ਸੋਹਣ sōhaṇ ‘beautiful’.

(%x,M5):=(%x,-M5,+FLX(SNG&MCL:=0>"ਾ"; SNG&FEM:=0>"ੀ"; PLR&MCL:=0>"ੇ"; PLR&FEM:=0>"ੀਆਂ";)); …(4.5)

In the Punjabi output given in (4.3), the HW ਸੋਹਣ sōhaṇ ‘beautiful’ is changed to ਸੋਹਣੀਆਂ sōhṇīāṃ ‘beautiful’, because UW1 ‘car’ is identified as the head noun and it has the ‘FEM’ and ‘@pl’ attributes in the UNL-NL dictionary and the UNL expression, respectively. Table 4 shows the different inflections that can be made to the adjectives [ਸੋਹਣ]{}"beautiful" and [ਮਹਿੰਗ]{}"expensive" by using paradigm (4.5).

Table 4: The generation rule base for adjectives like beautiful and expensive.
S. No. | Attributes | [ਸੋਹਣ]{}"beautiful" transformed to | [ਮਹਿੰਗ]{}"expensive" transformed to
1. | SNG,FEM | ਸੋਹਣੀ | ਮਹਿੰਗੀ
2. | PLR,FEM | ਸੋਹਣੀਆਂ | ਮਹਿੰਗੀਆਂ
3. | SNG,MCL | ਸੋਹਣਾ | ਮਹਿੰਗਾ
4. | PLR,MCL | ਸੋਹਣੇ | ਮਹਿੰਗੇ

IV. RESULT AND DISCUSSION

This paper has described, with examples, how the information provided in a UNL input sentence can be used for the morphological conversion of pronouns, nouns and adjectives, so that the Punjabi words generated with the EUGENE console [12] are as close to their actual form as possible. About 150 sentences of Corpus500 [13], 100 sentences of UC-A1 [14] and 135 sentences of UC-A2 [15] (given as assignments by the UNDL Foundation) have been processed to date. All these sentences are processed with the same set of rules and inflection paradigms, using a merged dictionary that contains the words of all these assignment sentences.

The UNDL Foundation’s website provides an F-measure feature [16], which rates the output of NLization (the output generated using the EUGENE tool) against the expected Punjabi natural language sentences on a scale of 0 to 1. Both files are uploaded in UTF-8 .txt format, and the tool returns the F-measure for them. The F-measure comes out to be more than 90% for the Corpus500 sentences and more than 80% for the UC-A1 and UC-A2 sentences.
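The paper does not state how the UNDL tool computes its score, so the following is only an illustrative sketch. It assumes the common definition of the F-measure as the harmonic mean of precision and recall, computed here over whitespace-separated tokens of one generated sentence and one expected sentence; the function name `token_f_measure` is hypothetical, and the actual tool may tokenize or aggregate differently.

```python
from collections import Counter


def token_f_measure(generated: str, expected: str) -> float:
    """Harmonic mean of token precision and recall, between 0.0 and 1.0.

    Assumption: tokens are whitespace-separated and word order is ignored;
    this is not necessarily what the UNDL F-measure tool does.
    """
    gen_counts = Counter(generated.split())
    exp_counts = Counter(expected.split())
    # Multiset intersection: each expected token can be matched at most once.
    overlap = sum((gen_counts & exp_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(gen_counts.values())
    recall = overlap / sum(exp_counts.values())
    return 2 * precision * recall / (precision + recall)
```

With this definition, an output identical to the expected sentence, such as the transliterated `us nē seb khādhē` compared with itself, scores 1.0, and every missing or spurious word lowers the score.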
The F-measure of UC-A1 is depicted in Figure 1.

Figure 1: F-measure of 100 sentences of UC-A1.

V. CONCLUSION

While creating paradigms for words of the same morphological category, duplicate paradigms should not be created unless really necessary, i.e., unless no existing paradigm can be used to generate the intended inflections. Paradigms should not be written for a single word; paradigms are used to describe the behavior of several words. If the behavior is irregular, i.e., restricted to a single word, it should be described as an inflectional rule instead of an inflectional paradigm. For instance, the plural of the English word "foot" is better generated by an inflectional rule than by an inflectional paradigm. Inflectional rules are not included in the grammar. They are added directly to the word's entry in the dictionary. Compound forms should not be included in paradigms, i.e., paradigms must deal only with simple forms that can be generated by prefixation, infixation or suffixation.

VI. FUTURE SCOPE

The UNDL Foundation has also made other new corpora available, which are intended to cover very basic linguistic phenomena. One of them, UC-B1 [17], consists of five stories for children (The Hare and The Tortoise, The Bat and The Weasels, Father and his Sons, The Ants and the Grasshopper, and The Man and the Lion) whose sentences are arranged in paragraphs. These stories also involve pronoun, noun, adjective and verb morphologies. In future, these entire corpora can be processed using the UNL framework.

REFERENCES

[1] Kumar P, Sharma R K (2012) UNL Based Machine Translation System for Punjabi Language. PhD Dissertation, Thapar University, Patiala, India.
[2] Uchida, Hiroshi, Zhu Meiying & Tarcisio Della Senta. (1999). The UNL, A Gift for a Millennium. Japan: Institute of Advanced Studies, The United Nations University.
[3] Singh, Smriti, Mrugank Dalal, Vishal Vachhani, Pushpak Bhattacharyya & Om P. Damani. 2007. Hindi Generation from Interlingua. Paper presented at Machine Translation Summit, Copenhagen.
[4] ‘Universal Networking Language (UNL) Specifications, Version 2005’ [Online] http://www.undl.org/unlsys/unl/unl2005 (Accessed 12 August 2010).
[5] Giri G, L. 2000. Semantic Net Like Knowledge Structure Generation from Natural Languages. IIT Bombay B Tech Dissertation.
[6] Dave, Shachi, Jignashu Parikh & Pushpak Bhattacharyya. 2001. Interlingua Based English Hindi Machine Translation and Language Divergence. Journal of Machine Translation (JMT). 16(4): 251-304.
[7] Bahri U S, Walia P S 2003 Introductory Punjabi. Publication Bureau, Punjabi University, Patiala.
[8] Jain K 2005 UNL to Hindi generation lexicon and morphology. M. Tech. Thesis, IIT Bombay, Mumbai.
[9] R. Martins ‘Universal Networking Language (UNL): A-rule’ [Online] http://www.unlweb.net/wiki/A-rule (Accessed 10 November 2012).
[10] Hrushikesh B 2002 Towards Marathi sentence generation from Universal Networking Language. M. Tech. Thesis, IIT Bombay, Mumbai.
[11] ‘Universal Networking Language (UNL): EUGENE Tool version 1.0.1’ [Online] http://dev.undlfoundation.org/generation/index.jsp (Accessed 24 November 2012).
[12] ‘Corpus500’ [Online] http://www.unlweb.net/wiki/Corpus500 (Accessed 19 August 2012).
[13] ‘Corpus UC-A1’ [Online] http://www.unlweb.net/wiki/UC-A1 (Accessed 29 December 2013).
[14] ‘Corpus UC-A2’ [Online] http://www.unlweb.net/wiki/UC-A2 (Accessed 10 February 2003).
[15] R. Martins ‘Universal Networking Language (UNL): F-measure’ [Online] http://www.unlweb.net/wiki/F-measure (Accessed 30 November 2012).
[16] ‘Corpus UC-B1’ [Online] http://www.unlweb.net/wiki/UC-B1 (Accessed 10 February 2013).