Uppsala Persian Dependency Treebank Annotation Guidelines

Mojgan Seraji, Carina Jahani, Beáta Megyesi, Joakim Nivre
Department of Linguistics and Philology
Uppsala University
June 2013

Contents

1 Introduction
1.1 Uppsala Persian Corpus
1.1.1 Sentence Segmentation and Tokenization
1.1.2 Morphological Annotation
1.2 Data Selection for UPDT
1.3 Syntactic Annotation
2 Dependency Relations
2.1 Accusative Marker (acc)
2.2 Adjectival Complement (acomp)
2.3 Adjectival Complement in LVC (acomp-lvc)
2.4 Adverbial Clause Modifier (advcl)
2.5 Adverbial Modifier (advmod)
2.6 Adjectival Modifier (amod)
2.7 Appositional Modifier (appos)
2.8 Auxiliary (aux)
2.9 Passive Auxiliary (auxpass)
2.10 Coordination (cc)
2.11 Clausal Complement (ccomp)
2.12 Complementizer (complm)
2.13 Conjunct (conj)
2.14 Copula (cop)
2.15 Object of Comparative (cpobj)
2.16 Comparative Modifier (cprep)
2.17 Dependent (dep)
2.18 Topic Dependent (dep-top)
2.19 Vocative Dependent (dep-voc)
2.20 Determiner (det)
2.21 Direct Object (dobj)
2.22 Direct Object in LVC (dobj-lvc)
2.23 Foreign Word (fw)
2.24 Marker (mark)
2.25 Multi-Word Expression (mwe)
2.26 Negation Modifier (neg)
2.27 Noun Compound Modifier (nn)
2.28 NP as Adverbial Modifier (npadvmod)
2.29 Nominal Subject (nsubj)
2.30 Nominal Subject in LVC (nsubj-lvc)
2.31 Passive Nominal Subject (nsubjpass)
2.32 Numerical Structure (num)
2.33 Element of Compound Number (number)
2.34 Parataxis (parataxis)
2.35 Object of a Preposition (pobj)
2.36 Possession Modifier (poss)
2.37 Preconjunct (preconj)
2.38 Predeterminer (predet)
2.39 Prepositional Modifier (prep)
2.40 Prepositional Modifier in LVC (prep-lvc)
2.41 Phrasal Verb Particle (prt)
2.42 Punctuation (punct)
2.43 Quantifier Phrase Modifier (quantmod)
2.44 Relative Clause Modifier (rcmod)
2.45 Relative (rel)
2.46 Root (root)
2.47 Temporal Modifier (tmod)
2.48 Open Clause Complement (xcomp)
3 Example Sentences
3.1 Example 1
3.2 Example 2
3.3 Example 3
3.4 Example 4
A Non-Separating Whitespace
B UPDT Dependency Labels

List of Figures

3.1 Syntactic annotation of a Persian sentence illustrated with Persian words.
3.2 Syntactic annotation of the Persian sentence “If these researchers made a correct guess, their findings can be used to determine the volume and the exact thickness of the mentioned planet.”
3.3 Syntactic annotation of the Persian sentence illustrated with Persian words.
3.4 Syntactic annotation of the Persian sentence “In order to be able to understand the beautiful works of this artist and enjoy those, we should open our mind to unusual things and unfamiliar styles.”
3.5 Syntactic annotation of the Persian sentence illustrated with Persian words.
3.6 Syntactic annotation of the Persian sentence “Perhaps this reputation is well known for the series of his famous works that were done as Bronography (humorous incorporation of the name and style of his work) by the engraving technique.”
3.7 Syntactic annotation for the Persian sentence illustrated with Persian words.
3.8 Syntactic annotation for the Persian sentence “But he was always ready to change his styles in a courageous way and because of this we see a numerous diversity among his works: from simple and pleasant illustrations for children to bitter, mysterious and complex images for adults.”

List of Tables

1.1 Part-of-speech tags in UPC.
1.2 Syntactic relations in UPDT, with new relations marked.
A.1 Pronominal clitics.
A.2 Personal endings.
A.3 Copula clitics.
A.4 Verbal stems in the formation of compound words.
A.5 Adjectival and nominal suffixes.

Chapter 1
Introduction

This document describes the syntactic annotation used in the Uppsala Persian Dependency Treebank (UPDT). In this introductory chapter, we first give a brief description of the corpus on which the treebank is based, including the guidelines for sentence segmentation, tokenization, and morphological annotation. We then describe the principles of data selection and the overall approach to syntactic annotation. The dependency relations used in the annotation are defined in Chapter 2, and annotation examples are given in Chapter 3.

1.1 Uppsala Persian Corpus

The Uppsala Persian Corpus (UPC), introduced by Seraji et al. (2012), is a modified version of the Bijankhan corpus (Bijankhan, 2004) and is currently the largest freely available corpus of Persian (Farsi) with manually validated linguistic annotation. The original corpus was created from online texts of different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art.
The entire corpus consists of 2,703,265 tokens, annotated with part-of-speech tags and morpho-syntactic and partly semantic features. UPC differs from the original version by the addition of sentence segmentation and by more consistent tokenization, which make it more appropriate for syntactic annotation.

1.1.1 Sentence Segmentation and Tokenization

In UPC, sentences are separated by one of the punctuation marks ‘.’, ‘!’, ‘?’, or combinations thereof. In addition, the punctuation mark ‘:’ has been treated as a sentence separator when used to introduce a list of alternatives.

Tokenization has been made more consistent than in the original corpus by treating as separate tokens all words separated by whitespace or punctuation, except in cases where single words with internal whitespace can be identified deterministically and unambiguously. In the latter cases, listed in Appendix A, the whitespace has been replaced with a zero-width non-joiner (ZWNJ, U+200C), so that tokens in the treebank never contain internal whitespace. Clitics attached to their host words without whitespace have not been separated from their hosts; instead, they are given a special analysis in the syntactic annotation.

1.1.2 Morphological Annotation

The morphological annotation in UPC consists of atomic part-of-speech tags that encode a subset of the features found in the original Bijankhan corpus. The tag set is listed with explanations in Table 1.1.
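The segmentation and whitespace rules of Section 1.1.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure actually used to build UPC: the function names are hypothetical, the ‘:’ rule for lists of alternatives is left out because it depends on context, treating the Arabic question mark ‘؟’ like ‘?’ is an assumption, and the `units` set stands in for the deterministic cases listed in Appendix A.

```python
import re

ZWNJ = "\u200c"  # zero-width non-joiner

# A sentence ends after a run of '.', '!' or '?' (Section 1.1.1).
# Assumption: the Arabic question mark U+061F is treated like '?'.
# The ':' rule (colon introducing a list of alternatives) is not handled.
_BOUNDARY = re.compile(r"(?<=[.!?\u061f])\s+")


def split_sentences(text: str) -> list[str]:
    """Split raw text at whitespace that follows sentence-final punctuation."""
    return [s for s in _BOUNDARY.split(text.strip()) if s]


def join_multiword(tokens: list[str], units: set[tuple[str, ...]]) -> list[str]:
    """Merge known multi-part words into one token joined by ZWNJ.

    `units` is a hypothetical stand-in for the cases of Appendix A;
    the longest matching unit wins at each position.
    """
    max_len = max((len(u) for u in units), default=1)
    out: list[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            if tuple(tokens[i:i + n]) in units:
                out.append(ZWNJ.join(tokens[i:i + n]))
                i += n
                break
        else:  # no multi-part unit starts at position i
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    print(split_sentences("Hello there. How are you?! Fine."))
    # Hypothetical unit: a noun and a plural suffix written with a space.
    print(join_multiword("کتاب ها را".split(), {("کتاب", "ها")}))
```

Longest match is a design choice of this sketch; since Section 1.1.1 restricts merging to deterministic, unambiguous cases, overlapping candidates should not arise in practice.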
Tag        Description
ADJ CMPR   Comparative adjective
ADJ INO    Participle adjective
ADJ        Adjective
ADJ SUP    Superlative adjective
ADJ VOC    Vocative adjective
ADV COMP   Adverb of comparison
ADV I      Adverb of interrogation
ADV LOC    Adverb of location
ADV NEG    Adverb of negation
ADV        Adverb
ADV TIME   Adverb of time
CLITIC     Accusative marker
CON        Conjunction
DELM       Delimiter
DET        Determiner
FW         Foreign word
INT        Interjection
SYM        Symbol
N PL       Plural noun
N SING     Singular noun
N VOC      Vocative noun
NUM        Numeral
P          Preposition
PREV       Preverbal particle
PRO        Pronoun
V AUX      Auxiliary verb
V IMP      Imperative verb
V PA       Past tense verb
V PP       Past participle verb
V COP      Verb copula
V PRS      Present tense verb
V SUB      Subjunctive verb

Table 1.1: Part-of-speech tags in UPC.

1.2 Data Selection for UPDT

We extracted the first 10,000 sentences of UPC to serve as our treebank data, of which 6,000 have been annotated for the first release. The average sentence length in this sample is 25 words.

1.3 Syntactic Annotation

We use a syntactic annotation scheme based on dependency structure, where each dependency relation is labeled with a functional category indicating the grammatical function of the dependent with respect to its head. The annotation scheme is based on Stanford Typed Dependencies (STD) (de Marneffe et al., 2006), which has become a de facto standard for English. For our Persian dependency treebank, however, we have adapted the scheme wherever the syntactic relations could not be adequately analyzed using the existing categories in STD. Hence, the scheme has been extended with a number of new relations, while keeping as many relations as possible in common with the original STD scheme for English.
The dependency annotation of a sentence always forms a tree that includes all tokens of the sentence (including punctuation) and is rooted at an artificial root node prefixed to the sentence. This means that we adopt the so-called basic version of STD (with punctuation retained), as opposed to the collapsed version, in which some tokens may not correspond to nodes in the dependency structure and a single node may have more than one incoming arc. Altogether we have added 10 new relations.¹ Table 1.2 lists all relations used in the syntactic annotation of UPDT, with the new relations marked.

In general, every token in a sentence is assigned one syntactic head and one dependency label. However, for unsegmented clitics we use complex labels, where the first label indicates the main syntactic function and subsequent labels mark clitic elements. A backward slash (\) indicates that the following element is proclitic, while a forward slash (/) marks an enclitic element. Thus, the label poss/pc is assigned to a word that has the main function poss and an enclitic pc element. By contrast, the label ccomp\poss is used for (the head of) a clausal complement with a proclitic poss element. Table 1.2 lists only atomic labels. A complete list of all (simple and complex) labels attested in the treebank, with frequency information, can be found in Appendix B.

To annotate and correct the syntactic trees, we used the free tree editor TrEd.² TrEd (Hajič et al., 2001) is a fully programmable and customizable graphical user interface for tree-like structures and was used as the main annotation tool for the Prague Dependency Treebank. From TrEd we export the annotations in the CoNLL-X format (Buchholz & Marsi, 2006), which is the official distribution format of UPDT.
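Because UPDT is distributed in CoNLL-X, the tree constraints above can be checked mechanically. The Python sketch below is illustrative only and not part of the UPDT tooling: `read_conllx`, `is_tree` and `relations` are hypothetical names, the requirement that exactly one token attaches to node 0 is an assumption carried over from basic STD, and the four-token sentence is invented (English words, with labels following the copula analysis of Section 2.14). It reads the ten CoNLL-X columns (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL, PHEAD, PDEPREL), validates the HEAD column, and prints arcs in the relation(head, dependent) notation used in Chapter 2.

```python
from dataclasses import dataclass


@dataclass
class Token:
    id: int
    form: str
    head: int    # 0 denotes the artificial root node
    deprel: str  # atomic or complex label, e.g. "poss/pc"


def read_conllx(text: str) -> list[list[Token]]:
    """Read CoNLL-X: 10 tab-separated columns per token, blank line between sentences."""
    sentences: list[list[Token]] = []
    current: list[Token] = []
    for line in text.splitlines():
        if not line.strip():
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        # ID, FORM, HEAD and DEPREL are columns 1, 2, 7 and 8 (1-based).
        current.append(Token(int(cols[0]), cols[1], int(cols[6]), cols[7]))
    if current:
        sentences.append(current)
    return sentences


def is_tree(sent: list[Token]) -> bool:
    """True if the HEAD values form a tree rooted at node 0.

    Each token already has exactly one head (one HEAD column). We also
    require consecutive IDs, heads in range, exactly one dependent of
    node 0 (an assumption), and no cycles.
    """
    n = len(sent)
    if [t.id for t in sent] != list(range(1, n + 1)):
        return False
    heads = {t.id: t.head for t in sent}
    if any(not 0 <= h <= n for h in heads.values()):
        return False
    if sum(h == 0 for h in heads.values()) != 1:
        return False
    for start in heads:
        seen: set[int] = set()
        node = start
        while node != 0:  # climb towards the root
            if node in seen:
                return False  # cycle detected
            seen.add(node)
            node = heads[node]
    return True


def relations(sent: list[Token]) -> list[str]:
    """Render each arc as relation(head, dependent), with ROOT for node 0."""
    form = {0: "ROOT", **{t.id: t.form for t in sent}}
    return [f"{t.deprel}({form[t.head]}, {t.form})" for t in sent]


# Invented toy sentence; the copula depends on its complement (Section 2.14).
SAMPLE = "".join(
    "\t".join([i, w, "_", "_", "_", "_", h, rel, "_", "_"]) + "\n"
    for i, w, h, rel in [("1", "Born", "2", "nsubj"), ("2", "artist", "0", "root"),
                         ("3", "is", "2", "cop"), ("4", ".", "2", "punct")]
)

if __name__ == "__main__":
    sent = read_conllx(SAMPLE)[0]
    print(is_tree(sent), relations(sent))
```

Running the script prints True followed by the four arcs, among them cop(artist, is). Complex labels such as poss/pc pass through unchanged, since the DEPREL column stores them as single strings.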
¹ Note that we have excluded the following relations of the original STD, for which we have not found any use: abbreviation modifier (abbrev), agent (agent), attributive (attr), clausal subject (csubj), clausal passive subject (csubjpass), expletive (expl), infinitival modifier (infmod), indirect object (iobj), participial modifier (partmod), prepositional complement (pcomp), possessive modifier (possessive), and purpose clause modifier (purpcl).

² TrEd is licensed under the GNU General Public License and is available at http://ufal.mff.cuni.cz/~pajas/.

Category     Description
acc*         Accusative marker
acomp        Adjectival complement
acomp-lvc*   Adjectival complement in light verb construction
advcl        Adverbial clause modifier
advmod       Adverbial modifier
amod         Adjectival modifier
appos        Appositional modifier
aux          Auxiliary
auxpass      Passive auxiliary
cc           Coordination
ccomp        Clausal complement
complm       Complementizer
conj         Conjunct
cop          Copula
cpobj*       Object of comparative
cprep*       Comparative modifier
dep          Dependent
dep-top*     Topic dependent
dep-voc*     Vocative dependent
det          Determiner
dobj         Direct object
dobj-lvc*    Direct object in light verb construction
fw*          Foreign word
mark         Marker
mwe          Multi-word expression
neg          Negation modifier
nn           Noun compound modifier
npadvmod     Nominal adverbial modifier
nsubj        Nominal subject
nsubj-lvc*   Nominal subject in light verb construction
nsubjpass    Passive nominal subject
num          Numeric modifier
number       Element of compound number
parataxis    Parataxis
pobj         Object of a preposition
poss         Possession modifier
preconj      Preconjunct
predet       Predeterminer
prep         Prepositional modifier
prep-lvc*    Prepositional modifier in light verb construction
prt          Phrasal verb particle
punct        Punctuation
quantmod     Quantifier phrase modifier
rcmod        Relative clause modifier
rel          Relative
root         Root
tmod         Temporal modifier
xcomp        Open clausal complement

Table 1.2: Syntactic relations in UPDT, with new relations marked with an asterisk (*).
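The complex labels of Section 1.3 (for example poss/pc and ccomp\poss) combine a main function with clitic elements, and they can be decomposed mechanically. The Python sketch below splits a label at ‘/’ (enclitic) and ‘\’ (proclitic) separators. The names `ComplexLabel` and `parse_label` are hypothetical, and the sketch assumes that a label may carry several clitic elements in sequence, which Section 1.3 allows (“subsequent labels”) but does not illustrate. It deliberately does not check the parts against Table 1.2, because clitic elements such as pc are not atomic relations in that table.

```python
import re
from typing import NamedTuple


class ComplexLabel(NamedTuple):
    main: str                             # main syntactic function
    clitics: tuple[tuple[str, str], ...]  # (kind, label); kind is "proclitic" or "enclitic"


# Split at '/' or '\', keeping the separator so we know which one it was.
_SEP = re.compile(r"([/\\])")


def parse_label(label: str) -> ComplexLabel:
    """Decompose a dependency label into its main function and clitic elements."""
    parts = _SEP.split(label)  # e.g. "poss/pc" -> ["poss", "/", "pc"]
    if any(p == "" for p in parts):
        raise ValueError(f"malformed label: {label!r}")
    clitics = tuple(
        ("proclitic" if sep == "\\" else "enclitic", elem)
        for sep, elem in zip(parts[1::2], parts[2::2])
    )
    return ComplexLabel(parts[0], clitics)


if __name__ == "__main__":
    for lab in ["nsubj", "poss/pc", "ccomp\\poss"]:
        print(lab, "->", parse_label(lab))
```

For example, parse_label("poss/pc") yields the main function poss with one enclitic element pc, while parse_label("ccomp\\poss") yields ccomp with one proclitic element poss; an atomic label such as nsubj comes back with no clitic elements.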
Chapter 2
Dependency Relations

This chapter provides a systematic description of the dependency relations in UPDT. Every relation is given a definition followed by one or more examples consisting of five elements: the Persian sentence (written right to left), an English gloss, an English translation, the annotation with Persian words, and the annotation with English glosses. Annotations are in the format relation(head, dependent).

2.1 Accusative Marker (acc)

An accusative marker is a clitic attached to the direct object.¹

[Persian example]
Adolf Born the world-ez reality -râ with imagination and dream link hits.
Adolf Born links the reality world with imagination and dream.
[Persian annotation]
acc(world, -râ)

2.2 Adjectival Complement (acomp)

An adjectival complement of a verb is an adjectival phrase that functions as the complement of the verb (like an object of the verb).

[Persian example]
This idea to thought impossible reaches.
This idea seems to be impossible.
[Persian annotation]
acomp(reaches, impossible)

¹ When the direct object is definite, it is always followed by râ; when the direct object is indefinite but individuated, it may or may not be followed by râ, depending on certain conditions (Lazard, 1992).

2.3 Adjectival Complement in LVC (acomp-lvc)

An adjectival complement in a light verb construction (LVC) forms a complex lexical predicate together with the verb.²

[Persian example]
For perception-ez works-ez she/he should mind-ez oneself -râ receptive-ez things-ez unusual and methods-ez unfamiliar do.
For understanding her/his work we should open our minds to unusual things and unfamiliar methods.
[Persian annotation]
acomp-lvc(do, receptive)

2.4 Adverbial Clause Modifier (advcl)

An adverbial clause modifier is a clause modifying the verb (temporal clause, conditional clause, etc.).

[Persian example]
If this researchers correct guess hit be, findings-ez/results-ez they can for diagnosis-ez volume and thickness-ez exact-ez planet-ez aforementioned place take.
If these researchers made a correct guess, their findings/results can determine the exact volume and thickness of the aforementioned planet.
[Persian annotation]
advcl(take, hit)

2.5 Adverbial Modifier (advmod)

An adverbial modifier of a word is a (non-clausal) adverb or adverbial phrase that serves to modify the meaning of the word.

² The internal complements of LVCs need to be distinguished from ordinary complements, because the latter sometimes duplicate the function of the internal complements. An alternative would have been to analyze the internal complements using the mwe relation for multi-word expressions. However, because LVCs are so prevalent in Persian, we have chosen to distinguish them from other multi-word expressions such as compound prepositions and conjunctions.

[Persian example]
The arts-ez Born gradually in journals to publish reached.
Born’s arts were published gradually in journals.
[Persian annotation]
advmod(reached, gradually)

[Persian example]
She/he always ready was.
She/he was always ready.
[Persian annotation]
advmod(ready, always)

2.6 Adjectival Modifier (amod)

An adjectival modifier of a nominal is any adjectival phrase that serves to modify the meaning of the nominal.

[Persian example]
She/he to after-ez world-ez more-lasting in life-ez repetitive-ez people is.
She/he is after a more lasting world in people’s repetitive life.
[Persian annotation]
amod(world-ez, more-lasting)
amod(life-ez, repetitive-ez)

2.7 Appositional Modifier (appos)

An appositional modifier of a nominal is another nominal that serves to modify the first. It includes parenthesized examples.

[Persian example]
She/he in Islam Abad with Bernard Kouchner, minister-ez affairs-ez foreign-ez French, meeting did.
She/he met Bernard Kouchner, French foreign minister, in Islam Abad.
[Persian annotation]
appos(Bernard, minister)

2.8 Auxiliary (aux)

An auxiliary of a clause is a non-main verb of the clause, e.g. a modal auxiliary, or بودن (be) and داشتن (have) in a compound tense.

[Persian example]
She/he wants something -râ build that before exist not-had is.
She/he wants to create something that has not existed before.
[Persian annotation]
aux(build, wants)
aux(not-had, is)

2.9 Passive Auxiliary (auxpass)

A passive auxiliary of a clause is a non-main verb of the clause that carries the passive information.

[Persian example]
The first planet outside of system-ez solar seen became.
The first planet outside the solar system was sighted.
[Persian annotation]
auxpass(seen, became)

2.10 Coordination (cc)

A coordination is the relation between an element of a conjunct and the coordinating conjunction word of the conjunct. We take one conjunct of a conjunction, normally the first, as the head of the conjunction.

[Persian example]
Adolf Born the world-ez reality -râ with imagination and dream link hits.
Adolf Born interfuses/links the reality world with imagination and dream.
[Persian annotation]
cc(imagination, and)

[Persian example]
She/he added but government-ez Pakistan so far such solicitation -râ raise not-done is.
She/he added: but the government of Pakistan has not yet raised such a solicitation.
[Persian annotation]
cc(not-done, but)

2.11 Clausal Complement (ccomp)

A clausal complement of a verb or adjective is a dependent clause with an internal subject that functions like an object of the verb or adjective.

[Persian example]
Studies indication gives that average-ez costs in city-ez Tokyo higher than New York is.
Studies show that average costs in the city of Tokyo are higher than in New York.
[Persian annotation]
ccomp(gives, higher)

[Persian example]
Studies indication gives that this search until long-times continuation had.
Studies show that this search continued for a long time.
[Persian annotation]
ccomp(gives, had)

2.12 Complementizer (complm)

A complementizer of a clausal complement (ccomp) is the word introducing it. It is the subordinating conjunction که (that).

[Persian example]
Studies indication gives that average-ez costs in city-ez Tokyo higher than New York is.
Studies show that average costs in the city of Tokyo are higher than in New York.
[Persian annotation]
complm(higher, that)

[Persian example]
Studies indication gives that this search until long-times continuation had.
Studies show that this search continued for a long time.
[Persian annotation]
complm(had, that)

2.13 Conjunct (conj)

A conjunct is the relation between two elements connected by a coordinating conjunction, such as و (and), یا (or), etc. We treat conjunctions asymmetrically: the head of the relation is the first conjunct, and the other conjuncts depend on it via the conj relation.

[Persian example]
Adolf Born the world-ez reality -râ with imagination and dream link hits.
Adolf Born interfuses/links the reality world with imagination and dream.
[Persian annotation]
conj(imagination, dream)

2.14 Copula (cop)

A copula is the relation between the complement of a copula verb and the copula verb. We take the copula verb as a dependent of its complement, except when the complement is a prepositional phrase (second example below).

[Persian example]
Born an artist is.
Born is an artist.
[Persian annotation]
cop(artist, is)

[Persian example]
In case-ez this with officials-ez network likewise constantly for consultation in contact was.
In this case, I was also in constant contact with network officials for consultation.
[Persian annotation]
root(ROOT, was)
prep(was, in)

2.15 Object of Comparative (cpobj)

The object of a comparative is the complement of a preposition-like conjunction or adverb introducing a comparative modifier (cf. English ‘a child’ in ‘he cries like a child’).

[Persian example]
This method for extraction-ez metals-ez with value like-ez palladium under test arrangement takes.
This method is tested for the extraction of valuable metals like palladium.
[Persian annotation]
cpobj(like-ez, palladium)

2.16 Comparative Modifier (cprep)

The comparative modifier relation is used for comparative constructions that resemble prepositional phrases but are introduced by conjunctions or adverbs and can be analyzed as elliptical comparative clauses (cf. English ‘like a child’ in ‘he cries like a child’).

[Persian example]
This method for extraction-ez metals-ez valuable like-ez palladium under test arrangement takes.
This method is tested for the extraction of valuable metals like palladium.
[Persian annotation]
cprep(metals, like)

2.17 Dependent (dep)

The dependent relation is used when it is impossible to determine a more precise dependency relation between two words, or when the dependency relation is deemed too rare or insignificant to merit its own label. In the following example, the past participle verb گرفته (taken) is placed in circumposition³ to emphasize the preposition از (from) as a point of departure.

[Persian example]
... from illustrations-ez simple and pleasant for children taken, to images-ez bitter, mysterious and complex for adults.
... from simple and pleasant illustrations for children to bitter, mysterious and complex images for adults.
[Persian annotation]
dep(from, taken)

2.18 Topic Dependent (dep-top)

The topic dependent relation is used for a fronted element that introduces the topic of a sentence. It is often anaphorically related to the subject or object of the main clause.

³ Circumposition refers to a construction in which a phrase is enclosed by two adpositions, more specifically, by a preposition and a postposition.

[Persian example]
She/he her/his-thought place-ez other goes.
She/he her/his thought goes elsewhere.
[Persian annotation]
dep-top(goes, she/he)

2.19 Vocative Dependent (dep-voc)

The vocative dependent relation is used for a vocative element, usually a proper name or pronoun.

[Persian example]
Dara you should your-meal -râ finish do.
Dara, you should finish your meal.
[Persian annotation]
dep-voc(do, Dara)

2.20 Determiner (det)

A determiner is the relation between a nominal head and its determiner.

[Persian example]
If this researchers correct guess hit be ...
If these researchers made a correct guess ...
[Persian annotation]
det(researchers, this)

2.21 Direct Object (dobj)

The direct object of a verb is the nominal that is the (accusative) object of the verb.

[Persian example]
Adolf Born the-world-ez reality -râ with imagination and dream link hits.
Adolf Born interfuses/links the reality world with imagination and dream.
[Persian annotation]
dobj(hits, world)

2.22 Direct Object in LVC (dobj-lvc)

A direct object in a light verb construction (LVC) forms a complex lexical predicate together with the verb.

[Persian example]
She/he addition did ...
She/he added ...
[Persian annotation]
dobj-lvc(did, addition)

2.23 Foreign Word (fw)

Complete phrases or sentences quoted in a language other than Persian are not given an internal syntactic analysis. Instead, all the words are connected in a chain with the first word as the head, and all relations are labeled fw. (The incoming arc to the head of the chain is, however, assigned a regular syntactic relation reflecting its role in the larger sentence.)

[Persian example]
She/he said: face praying (for rain) clouds ...
She/he requested: rain from clouds with the blessings of his face ...
[Persian annotation]
fw(face, praying)

2.24 Marker (mark)

A marker of an adverbial clause modifier (advcl) is the word introducing it. It is a subordinating conjunction other than که (that), e.g., the multi-word expressions وقتی که (when), در حالی که (while), اگر که (if), etc.

[Persian example]
Ahmad while collar-ez Mahmoud -râ attached ...
While Ahmad attached Mahmoud’s collar ...
[Persian annotation]
mark(attached, while)

[Persian example]
If this researchers correct guess hit be ...
If these researchers made a correct guess ...
[Persian annotation]
mark(hit, if)

2.25 Multi-Word Expression (mwe)

The multi-word expression (modifier) relation is used for certain multi-word expressions that behave like a single function word, in particular conjunctions and prepositions. Examples include expressions meaning ‘despite’, ‘and also’, ‘in addition to’, ‘because of’ (two different expressions), ‘such as’, ‘and but’, ‘instead of’, ‘for the reason of’, ‘for the sake of’, and ‘rather than’.
The first token of the multi-word expression is treated as the head of the expression, and the subsequent elements are attached in a chain, each word depending on the immediately preceding one via the mwe relation.

[Persian example]
So thought become that this vibration to reason-ez gravity-ez a-planet in orbit been is.
It is thought that this vibration has been due to (caused by) the gravity of a planet in orbit.
[Persian annotation]
mwe(to, reason)

2.26 Negation Modifier (neg)

The negation modifier is the relation between a negation word and the word it modifies.

[Persian example]
Project-ez research no issues-ez social but cultural been.
The research project has not been a social issue but a cultural problem.
[Persian annotation]
neg(issues, no)

2.27 Noun Compound Modifier (nn)

A noun compound modifier of a nominal is any noun that serves to modify the head noun. In UPDT, this relation is also used for compound names, with the first name as the head.

[Persian example]
She/he in Islam Abad with Bernard Kouchner, minister-ez affairs-ez foreign-ez French, meeting did.
She/he met Bernard Kouchner, French foreign minister, in Islam Abad.
[Persian annotation]
nn(Bernard, Kouchner)

2.28 NP as Adverbial Modifier (npadvmod)

This relation captures various cases where something that is syntactically a noun phrase is used as an adverbial modifier in a sentence. These usages include: (i) a measure phrase, which is the relation between the head of an adjectival, adverbial or prepositional phrase and the head of a measure phrase modifying it; (ii) extent phrases that modify verbs but are not objects; (iii) financial constructions involving an adverbial noun phrase; (iv) floating reflexives; and (v) certain other absolute noun phrase constructions.
A temporal modifier (tmod) is a subclass of npadvmod that is distinguished as a separate relation.

[Persian example]
... 20 calorie more than a cup yogurt to body energy gives ...
... gives the body 20 calories more energy than a cup of yogurt ...
[Persian annotation]
npadvmod(more, calorie)

2.29 Nominal Subject (nsubj)

A nominal subject is a noun phrase that is the syntactic subject of a clause. The governor of this relation is not always a verb: when the verb is a copula, the root of the clause is the complement of the copula, which can be an adjective or a noun. (When the complement is a prepositional phrase, the copula is taken as the root of the clause.)

[Persian example]
If this researchers correct guess hit be ...
If these researchers made a correct guess ...
[Persian annotation]
nsubj(hit, researchers)

2.30 Nominal Subject in LVC (nsubj-lvc)

A nominal subject in a light verb construction (LVC) forms a complex lexical predicate together with the verb.

[Persian example]
Foot become!
Get up! / On your feet!
[Persian annotation]
nsubj-lvc(become, foot)

2.31 Passive Nominal Subject (nsubjpass)

A passive nominal subject is a noun phrase that is the syntactic subject of a passive clause.

[Persian example]
The first planet outside of system-ez solar seen became.
The first planet outside the solar system was sighted.
[Persian annotation]
nsubjpass(seen, planet)

2.32 Numerical Structure (num)

A numeric modifier of a noun is any number phrase that serves to modify the meaning of the noun.

[Persian example]
Sam 3 sheep eats.
Sam eats 3 sheep.
[Persian annotation]
num(sheep, 3)

2.33 Element of Compound Number (number)

An element of a compound number is a part of a number phrase or currency amount.

[Persian example]
She/he should 466 million dollar compensation pay do.
She/he should pay $466 million in compensation.
number($, million)

2.34 Parataxis (parataxis)
The parataxis relation (from Greek for ‘place side by side’) is a relation between the main verb of a clause and other sentential elements, such as a sentential parenthetical or a clause after a colon (:) or semicolon (;).

Mother: your-school late became.
Mother: You are late for school.
parataxis(mother, late)

2.35 Object of a Preposition (pobj)
The object of a preposition is the head of a noun phrase following the preposition. (The preposition in turn may modify a noun, verb, etc.)

She/he in Turkey life does ...
She/he lives in Turkey.
pobj(in, Turkey)

2.36 Possession Modifier (poss)
The possession modifier relation holds between a noun and its possessive determiner or genitive complement. In Persian, a noun is usually followed by a modifier or a genitive complement, with ezâfe marking on the head noun (see footnote 4). The relation poss is used when the modifier is a noun, pronoun or infinitive, except in the case of compound names, where the nn relation is used instead. (For adjectival and participial modifiers in ezâfe constructions, the amod relation is used.) In the case of lexicalized units without ezâfe, the relation is defined as mwe.

NB: The name poss is unfortunate for this relation, which covers much more than the narrow possessive relation. However, for conformance with STD for English, we have retained the label rather than renaming it to genitive modifier (genmod) or even nominal modifier (nmod), which would be more appropriate.

Footnote 4: An ezâfe (-ez) is an unstressed enclitic particle that links the elements within noun phrases, adjective phrases and prepositional phrases, indicating the semantic relation between the joined elements. It is realized as the short vowel /e/ after consonants and as /ye/ after vowels.
The orthographic realization of ezâfe occurs only in special cases: where the element ends in the vowels /u:/ or /â/, and where the element ends in the silent “h” (the silent “h”, “ه”, is pronounced /e/, as in “بنده” (servant), which is written “بنده‌ی” when it takes ezâfe).

hand-ez child
child’s hand
poss(hand-ez, child)

2.37 Preconjunct (preconjunct)
A preconjunct is the relation between the head of a coordinated phrase and a word that appears at the beginning of the phrase, bracketing a conjunction (such as either, both or neither in English).

... she/he also in the-world-ez mathematics and also in the-world-ez astronomy expertise has.
... she/he has expertise in the world of both mathematics and astronomy.
preconj(in, also)

2.38 Predeterminer (predet)
A predeterminer is the relation between a noun and a word that precedes and modifies the meaning of its determiner.

All-ez these years ...
All these years ...
predet(years, all)

2.39 Prepositional Modifier (prep)
A prepositional modifier of a verb, adjective or noun is any prepositional phrase that serves to modify the meaning of the verb, adjective, noun, or even another preposition.

She/he in Turkey life does ...
She/he lives in Turkey.
prep(does, in)

2.40 Prepositional Modifier in LVC (prep-lvc)
A prepositional modifier in a light verb construction (LVC) forms a complex lexical predicate together with the verb.

Works-ez Born in publications-ez numerous to print reached.
His works appeared (were published) in numerous publications.
prep-lvc(reached, to)

2.41 Phrasal Verb Particle (prt)
The verb particle relation holds between the verb and its particle.

To what form in will come.
How will it be?
prt(come, in)

2.42 Punctuation (punct)
This relation is used for any piece of punctuation in a clause.

The first planet outside of system-ez solar seen became .
The first planet outside the solar system was sighted.
punct(seen, .)

2.43 Quantifier Phrase Modifier (quantmod)
A quantifier modifier is an element modifying the head of a quantifier phrase. (These are modifiers in complex numeric quantifiers, not other types of ‘quantification’.)

Clock about-ez ten minute to ring-bell -râ show does.
The clock shows about ten minutes before the bell.
quantmod(ten, about)

2.44 Relative Clause Modifier (rcmod)
A relative clause modifier of a noun is a relative clause modifying the noun. The relation points from the noun to the head of the relative clause, normally a verb.

She/he things -râ that only in scope-ez imagination-ez she/he/her/his running is to illustration draw.
She/he only portrays things that are running within the scope of her/his imagination.
rcmod(things, running)

2.45 Relative (rel)
A relative of a relative clause is the relative marker “که” /ke/ that introduces it (and that cannot be analyzed as a relative pronoun).

She/he things-ez -râ that only in scope-ez imagination-ez she/he/her/his running is to illustration draw.
She/he only portrays things that are running within the scope of her/his imagination.
rel(running, that)

2.46 Root (root)
The root grammatical relation points to the root of the sentence. A fake node ‘ROOT’ is used as the governor. The ROOT node is indexed with ‘0’, since the indexation of real words in the sentence starts at 1.
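Indexing ROOT as 0 and real words from 1 is the same head-index encoding used in CoNLL-X style files, where each token records the index of its governor. As a minimal sketch (assuming a simple in-memory token list, not any particular UPDT file format), the check below verifies that a sentence forms a single-rooted tree. The names Token and is_well_formed are hypothetical, and for the Section 2.32 example only num(sheep, 3) comes from these guidelines; the other labels are illustrative assumptions.

```python
from typing import NamedTuple


class Token(NamedTuple):
    id: int       # 1-based position of the word in the sentence
    form: str     # an English gloss stands in for the Persian word form
    head: int     # id of the governor; 0 is the artificial ROOT node
    deprel: str   # UPDT dependency label


def is_well_formed(tokens: list[Token]) -> bool:
    """Return True if ids run 1..n, exactly one token is attached to ROOT (0)
    with the label root, and every token reaches ROOT without a cycle."""
    n = len(tokens)
    if [t.id for t in tokens] != list(range(1, n + 1)):
        return False
    roots = [t for t in tokens if t.head == 0]
    if len(roots) != 1 or roots[0].deprel != "root":
        return False
    heads = {t.id: t.head for t in tokens}
    for t in tokens:
        seen: set[int] = set()
        node = t.id
        while node != 0:          # follow head links up towards ROOT
            if node in seen or node not in heads:
                return False      # a cycle, or a head id outside the sentence
            seen.add(node)
            node = heads[node]
    return True


# "Sam 3 sheep eats ." (Section 2.32), glossed in Persian word order.
sam = [
    Token(1, "Sam", 4, "nsubj"),
    Token(2, "3", 3, "num"),
    Token(3, "sheep", 4, "dobj"),
    Token(4, "eats", 0, "root"),
    Token(5, ".", 4, "punct"),
]
print(is_well_formed(sam))  # True
```

A sentence with two tokens attached to ROOT, or with a head cycle, is rejected, which is why the guidelines reserve index 0 for the single artificial node.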
The root of the sentence is normally a verb, but in copula constructions it can be a noun, pronoun, adjective or adverb. The copula is taken as the root of the sentence only when its complement is a prepositional phrase (analyzed as prep).

Born never claim not-had is that creator-ez a-style particular is.
Born has never claimed that he is a creator of a particular style.
root(ROOT, not-had)

She/he an artist is.
She/he is an artist.
root(ROOT, artist)

Work-ez she/he/her/his outstanding is.
His work is outstanding.
root(ROOT, outstanding)

In parts-ez first, constantly front-ez mirror in position-ez talk doing with herself/himself is.
In the first parts, she/he is constantly talking to herself/himself in front of a mirror.
root(ROOT, is)

2.47 Temporal Modifier (tmod)
A temporal modifier of a verb, noun or adjective is a bare noun constituent that serves to modify the meaning of the constituent by specifying a time. (Other temporal modifiers are prepositional phrases and are annotated as prep.)

Mrs. Bhutto Thursday-ez last in Rawalpindi killed became.
Mrs. Bhutto was killed last Thursday in Rawalpindi.
tmod(killed, Thursday)

2.48 Open Clause Complement (xcomp)
An open clause complement (xcomp) of a verb or adjective is a clause complement without its own subject, whose reference is determined by an external subject. These complements are always non-finite.

she/he stands ready-ez to-go.
She/he stands ready to go.
xcomp(ready, to-go)

Chapter 3 Example Sentences

In this chapter, we present a more detailed study of a few sentences to show how the different relations in UPDT are used together to build a complete analysis of a Persian sentence. The sentences have been selected to illustrate as many different relations as possible, in particular those that have been added specifically for the analysis of Persian, without repeating the same analysis over and over again. For the convenience of readers who do not understand Persian, dependency trees are depicted both with Persian tokens and with English glosses.

3.1 Example 1
Figures 3.1 and 3.2 show a sentence consisting of the subordinate clause “If this researchers correct guess hit-3sg-pp be-3pl-sub” (If these researchers made a correct guess; for pp and sub, see footnotes 1 and 2) and the main clause “findings-ez they can for diagnosis-ez volume and thickness-ez exact-ez planet-ez mentioned under-ez usage place take-3sg-sub” (their findings can be used to determine the volume and the exact thickness of the mentioned planet).

The subordinate clause is an adverbial clause modifier whose root, “hit-3sg-pp”, is marked by the label advcl and governs the subordinate conjunction “if”; the nominal subject “researchers”, which in turn is modified by the determiner “this”; the adverbial modifier “correct”; the preverbal direct object “guess”, in a light verb construction with “hit-3sg-pp”; and the auxiliary verb “be-3pl-sub”.

The second part of the sentence consists of the main clause with the sentence root “take-3sg-sub”. The main clause contains the nominal subject “findings” with its genitive complement “their”, connected by an ezâfe construction; the auxiliary verb “can”; two prepositional modifiers; and the preverbal direct object “place”, in a light verb construction with “take-3sg-sub”.
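To make the walkthrough concrete, the Python sketch below encodes Example 1 as head-indexed rows, using the English glosses of Figure 3.2 in Persian word order. The attachments in the adverbial clause and the main-clause core follow the prose above; those inside the prepositional modifiers (tokens 12 to 21), including attaching “planet” to the first conjunct “volume”, are our reading of Figure 3.2 and may differ from the released treebank. The names EXAMPLE_1 and subtree are hypothetical.

```python
# (id, English gloss, head id, label); head 0 is the artificial ROOT node.
EXAMPLE_1 = [
    (1, "if", 6, "mark"),
    (2, "this", 3, "det"),
    (3, "researchers", 6, "nsubj"),
    (4, "correct", 6, "advmod"),
    (5, "guess", 6, "dobj-lvc"),
    (6, "hit", 23, "advcl"),
    (7, "be", 6, "aux"),
    (8, "،", 23, "punct"),
    (9, "findings", 23, "nsubj"),
    (10, "their", 9, "poss"),
    (11, "can", 23, "aux"),
    (12, "for", 23, "prep"),
    (13, "diagnosis", 12, "pobj"),
    (14, "volume", 13, "poss"),
    (15, "and", 14, "cc"),
    (16, "thickness", 14, "conj"),
    (17, "exact", 16, "amod"),
    (18, "planet", 14, "poss"),
    (19, "mentioned", 18, "amod"),
    (20, "under", 23, "prep"),
    (21, "usage", 20, "pobj"),
    (22, "place", 23, "dobj-lvc"),
    (23, "take", 0, "root"),
    (24, ".", 23, "punct"),
]


def subtree(tokens, head_id):
    """Return the ids of head_id and all its descendants, in sentence order."""
    children = {}
    for tid, _, head, _ in tokens:
        children.setdefault(head, []).append(tid)
    result, stack = [], [head_id]
    while stack:                      # depth-first walk over the children map
        node = stack.pop()
        result.append(node)
        stack.extend(children.get(node, []))
    return sorted(result)


gloss = {tid: form for tid, form, _, _ in EXAMPLE_1}

# The advcl subtree rooted at "hit" is exactly the subordinate clause.
print(" ".join(gloss[i] for i in subtree(EXAMPLE_1, 6)))
# if this researchers correct guess hit be

# Each light verb construction pairs a preverbal object with its verb.
lvc = [(form, gloss[head]) for _, form, head, rel in EXAMPLE_1 if rel == "dobj-lvc"]
print(lvc)  # [('guess', 'hit'), ('place', 'take')]
```

Collecting the subtree of the advcl dependent recovers the whole subordinate clause, which is a quick way to confirm that clause boundaries and attachments agree with the prose description.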
The first prepositional modifier starts with the preposition “for”, followed by the prepositional object “diagnosis” with the genitive complement “volume”, connected by an ezâfe construction. The head of the genitive complement, “volume”, is coordinated with “exact thickness” (with the adjective “exact” modifying the head “thickness”), and the entire coordinated noun phrase is modified by the genitive complement “mentioned planet” (with the adjective “mentioned” modifying the head “planet”). The second prepositional modifier, “under”, has its prepositional object “usage” as its child node. Finally, the sentence-final punctuation mark has the root of the sentence as its head, as do all punctuation marks that function as sentence separators.

Footnote 1: pp = past participle.
Footnote 2: sub = subjunctive.

[Figure 3.1: Syntactic annotation of a Persian sentence illustrated with Persian words.]
[Figure 3.2: Syntactic annotation of the Persian sentence “If these researchers made a correct guess, their findings can be used to determine the volume and the exact thickness of the mentioned planet.”]

3.2 Example 2
Figures 3.3 and 3.4 show a sentence starting with the adverbial clause modifier “for this that can-1pl-sub works-ez beautiful-ez this artist -râ perception do-1pl-sub and of those pleasure take-1pl-sub” (In order to be able to understand the beautiful works of this artist and enjoy those) and ending with the main clause “should mind-ez own -râ receptive-ez things-ez unusual and methods-ez unfamiliar do-1pl-sub” (we should open our mind to unusual things and unfamiliar styles).

[Figure 3.3: Syntactic annotation of the Persian sentence illustrated with Persian words.]
[Figure 3.4: Syntactic annotation of the Persian sentence “In order to be able to understand the beautiful works of this artist and enjoy those, we should open our mind to unusual things and unfamiliar styles.”]

The adverbial clause modifier with the head “do-1pl-sub” consists of the subordinate conjunction “for this that” (in order to), the auxiliary verb “can”, the direct object “works-ez beautiful-ez this artist -râ” (the beautiful works of this artist), the preverbal direct object “perception” in a light verb construction with “do-1pl-sub”, the coordinating conjunction “and”, and the coordinated verb phrase “of those pleasure take-1pl-sub” (enjoy those). We now take a closer look at the complex subtrees in this adverbial clause. The subordinate conjunction “for this that” (in order to) is composed of three words.
The first word, “for”, is labeled mark and placed as the head node. The remaining words are connected into a chain and annotated as a multi-word expression with the label mwe. The direct object subtree with the label dobj has “works” as its head, modified by the adjective “beautiful”, the genitive complement “artist” with its determiner “this” in an ezâfe construction, and the accusative marker -râ, which has scope over the entire noun phrase. The coordinated verb phrase has the head node “take-1pl-sub” with the label conj, governing the prepositional modifier “of” with its prepositional object “those”, and the preverbal direct object “pleasure” in a light verb construction with “take-1pl-sub”.

The main clause is rooted at the verb “do-1pl-sub”, which governs the auxiliary verb “should”, the direct object “mind-ez own -râ” (our mind; see footnote 3), and the adjectival complement “receptive-ez things-ez unusual and methods-ez unfamiliar” (receptive to unusual things and unfamiliar styles). The direct object is headed by “mind”, which is linked to its genitive complement “own” by an ezâfe construction; the accusative marker -râ is attached directly to the head noun of the direct object. The adjectival complement has the adjective “receptive” as the head of the subtree, followed by the genitive complement “things”, which is modified by the adjective “unusual” (with ezâfe) and coordinated with “and methods-ez unfamiliar” (and unfamiliar methods), where “unfamiliar” is an adjectival modifier of “methods”.

3.3 Example 3
The trees depicted in Figures 3.5 and 3.6 show a sentence composed of the copula verb “be-3sg-pres” with the adverbial modifier “maybe”, the nominal subject “fame” with its determiner “this”, and a complex prepositional modifier introduced by the preposition “to” (see footnote 4). The prepositional object of “to” is headed by “sake”, which is connected to its genitive complement “works” by an ezâfe construction with the relation poss.
The noun “works” in turn takes the determiner “series” and the adjectival modifier “famous-her/his” (in an ezâfe construction), and is modified by a relative clause. Since the node “famous-her/his” includes a pronominal clitic (see footnote 5), the relation is marked with the complex label amod/pc, indicating the main function amod together with a pronominal clitic (pc).

Footnote 3: The pronominal subject is implied in the verb, since Persian verbs inflect for person and number.
Footnote 4: Normally, the copula verb is not taken as the root of the sentence but as a dependent of the predicative complement. However, in accordance with the Stanford dependencies for English, we make an exception when the predicative complement has the form of a prepositional phrase, which is then treated as a prepositional modifier of the copula verb.

[Figure 3.5: Syntactic annotation of the Persian sentence illustrated with Persian words.]
[Figure 3.6: Syntactic annotation of the Persian sentence “Perhaps this reputation is well known for the series of his famous works that were done as Bronography (humorous incorporation of the name and style of his work) by the engraving technique.”]
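Complex labels such as amod/pc (and conj/cop in Example 4) combine a main function with an appended element. A minimal Python sketch, assuming that everything before the first “/” is the main function and the rest are appended elements, can split such labels and group them by main function. The helper names are hypothetical; labels containing a backslash, which appear in Appendix B, are not explained in this chapter and are left uninterpreted here.

```python
from collections import Counter


def split_label(label: str) -> tuple[str, list[str]]:
    """Split a complex label such as 'amod/pc' into its main function
    ('amod') and the elements appended after '/' (['pc'])."""
    main, *extras = label.split("/")
    return main, extras


def count_main_functions(labels: list[str]) -> Counter:
    """Group complex labels under their main function."""
    return Counter(split_label(label)[0] for label in labels)


print(split_label("amod/pc"))    # ('amod', ['pc'])
print(split_label("conj/cop"))   # ('conj', ['cop'])
print(split_label("dobj-lvc"))   # ('dobj-lvc', [])
print(count_main_functions(["amod", "amod/pc", "conj/cop", "poss/pc"]))
```

Note that the hyphenated LVC labels (dobj-lvc, prep-lvc) are single functions rather than complex labels, so they pass through unchanged.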
The relative clause modifier is introduced by the relativizer “ke”, which is not a relative pronoun and is therefore marked with the underspecified label rel. The root of the relative clause is the predicative complement “accomplishment”, which governs two prepositional modifiers and the copula verb “become-3sg-past-perfect-continuous”. In addition, there is a misplaced relativizer “ke” (probably the result of an editing mistake), which is analyzed as dep because the construction cannot be given a proper grammatical analysis. The first prepositional modifier inside the relative clause has “title” as its prepositional object, modified by the noun compound modifier “Bronography”, which in turn has an appositional modifier in parentheses. The appositional modifier is headed by “an-incorporation”, with the adjectival modifier “humorous” and the prepositional modifier “of name and technique-ez work-her/his”. The latter contains a prepositional object consisting of two coordinated nouns, “name” and “technique”, the second of which has a genitive complement in an ezâfe construction.

3.4 Example 4
Figures 3.7 and 3.8 show a sentence introduced by the coordinating conjunction “but” and consisting of two main clauses coordinated by “and”: “always ready been that methods herself/himself râ to way courageous change give” (he was always ready to change his styles in a courageous way) and “to sake this in among works her/his variation numerous râ witness-are: from illustrations simple and pleasant for children taken to images bitter, mysterious and complex for adults” (because of this, we see a great diversity among his works: from simple and pleasant illustrations for children to bitter, mysterious and complex images for adults). The first clause consists of the predicative complement “ready”, the root of the tree, governing the adverbial modifier “always”, the copula verb “been”, and a clausal complement of the adjective “ready”.
The clausal complement is introduced by the complementizer “ke” and headed by the verb “give”, which governs the direct object “methods herself/himself râ”, the prepositional modifier “to way courageous”, and the preverbal direct object “change” in a light verb construction with the verb “give”.

The second clause is headed by “witness” with the copula “are” attached to it, hence the complex label conj/cop. It governs two prepositional modifiers and a direct object. The first prepositional modifier consists of “to”, which takes “sake” with its determiner “this” as its object, while the second consists of the multiword preposition “in among” and the prepositional object “works” with the possessive modifier “her/his”. In addition, two more prepositional modifiers are embedded in this clause, namely “from illustrations-ez simple and pleasant for children taken” (from simple and pleasant illustrations for children) and “to images-ez bitter, mysterious and complex for adults” (to bitter, mysterious and complex images for adults). Noteworthy is the past participle “taken”, which forms a discontinuous multiword expression with the preposition “from”. The rest of this subtree has a syntactic analysis similar to those explained previously.

Footnote 5: Possessiveness is expressed here by the pronominal clitic -aŝ (3sg), which is the bound form of the full personal pronoun او (u).

[Figure 3.7: Syntactic annotation for the Persian sentence illustrated with Persian words.]
[Figure 3.8: Syntactic annotation for the Persian sentence “But he was always ready to change his styles in a courageous way and because of this we see a great diversity among his works: from simple and pleasant illustrations for children to bitter, mysterious and complex images for adults.”]

Appendix A Non-Separating Whitespace

As a general rule, whitespace is taken as a token separator in UPDT. However, in the following cases, where whitespace can deterministically and unambiguously be identified as token-internal, it has instead been replaced by a zero-width non-joiner (ZWNJ) to create a single token.

1. Nouns and the plural suffixes “-ها” /-hâ/, “-ان” /-ân/, “-یان” /-yân/, “-گان” /-gân/, “-ات” /-ât/ and “-ین” /-in/, e.g.:
   کتاب ها → کتاب‌ها (books)
   دختر ان → دختر‌ان (girls)
   دانشجو یان → دانشجو‌یان (students)
   ستار گان → ستار‌گان (stars)
   تظاهر ات → تظاهر‌ات (demonstrations)
   مسافر ین → مسافر‌ین (passengers)
2. Any noun and the indefinite clitic “-ای” /-i/ when the noun ends in the silent “h” (see footnote 1), e.g.:
   خانه ای → خانه‌ای (a house)
3. Any noun indicating a trade and the abstract suffix “-ی” /-ye/ or “-یی” /-i/, e.g.:
   زرگر ی → زرگر‌ی (goldsmith’s trade)
   نانوا یی → نانوا‌یی (bakery)
4. Any noun and the abstract suffix “-ی” /-ye/ in forming adjectives, e.g.:
   خاکستر ی → خاکستر‌ی (gray) (ashes + /-ye/ → gray)
5. Any adjective and the abstract suffix “-ی” /-ye/ in forming nouns, e.g.:
   قرمز ی → قرمز‌ی (redness) (red + /-ye/ → redness)
6. Nouns and the pronominal clitics shown in Table A.1, e.g.:
   دفتر تان → دفتر‌تان (your office)
7. Any preceding word and either the personal endings or the copula clitics in the 1st and 2nd person singular and the 1st, 2nd and 3rd person plural, shown in Table A.2 and Table A.3, e.g.:
   آمده اند → آمده‌اند (they have come) (come + /-and/ → they have come)
8. Nouns and verbal stems in compound words. The verbal stems shown in Table A.4 are usually used as the second element of a compound word and serve as derivational suffixes.
9. The suffixes shown in Table A.5 and their adjacent words in forming adjective-adverbs or adjective-nouns.
10. The negative prefix “نا-” /nâ-/ (im-, in-, un-, -less) with adjectives or verbal stems, as well as the negative prefix “بی-” /bi-/ (im-, in-, un-, -less) with adjectives, e.g.:
   (a) the negative prefix “نا-” /nâ-/ and adjectives:
       نا درست → نا‌درست (incorrect) (/nâ/ + correct → incorrect)
   (b) the negative prefix “نا-” /nâ-/ and verbal stems:
       نا شناس → نا‌شناس (unknown) (/nâ/ + know → unknown)
   (c) the negative prefix “بی-” /bi-/ and adjectives:
       بی دقت → بی‌دقت (careless) (/bi/ + care → careless)
11. Between nouns and the indefinite suffix “-ی” /-ye/ in forming indefinite nouns, e.g.:
   پسر ی → پسر‌ی (a boy) (boy + /-ye/ → a boy)
12. Between verbal stems and the suffix “-اک” /-âk/ in forming nouns, e.g.:
   خور اک → خور‌اک (feed) (eat + /-âk/ → feed)
13. Between verbal past stems and the suffix “-ار” /-âr/ in forming nouns, e.g.:
   خرید ار → خرید‌ار (buyer) (buy + /-âr/ → buyer)
14. Between verbal present stems and the suffix “-گار” /-gâr/ in forming nouns, e.g.:
   آموز گار → آموز‌گار (instructor) (instruct + /-gâr/ → instructor)
15. Between nouns and the suffix “-انه” /-âneh/ in forming adverbs, e.g.:
   مرد انه → مرد‌انه (manly) (man + /-âneh/ → manly)

Footnote 1: The silent “h” (“ه”) is a consonantal “-h” that represents a terminal “-e” and is treated as the vowel “-e” when it takes ezâfe, as in “خانه” /xâneh/ (house), which is written “خانه‌ی” /xâne-ye/ when it takes ezâfe.

Table A.1: Pronominal clitics.
  Clitic   Pronunciation   English
  م        /-m/            my
  ت        /-t/            your
  ش        /-š/            her/his
  مان      /-mân/          our
  تان      /-tân/          your (pl.)
  شان      /-šân/          their

Table A.2: Personal endings.
  Ending   Pronunciation   English
  م        /-am/           I
  ی        /-i/            you
  ∅        ∅               she/he
  یم       /-im/           we
  ید       /-id/           you (pl.)
  ند       /-and/          they

Table A.3: Copula clitics.
  Clitic   Pronunciation   English
  ام       /-am/           I
  ای       /-i/            you
  ∅        ∅               she/he
  ایم      /-im/           we
  اید      /-id/           you (pl.)
  اند      /-and/          they

Table A.4: Verbal stems in the formation of compound words.
  Pronunciation   Translation of example word
  /-âfari:ny/     dispute maker
  /-âlu:d/        sleepy
  /-âmi:z/        successful
  /-andâz/        perspective
  /-andu:d/       pitch
  /-angi:z/       wonderful
  /-âvar/         funny
  /-pâŝ/          sprinkler
  /-pazi:r/       vulnerable
  /-parâkan/      rumors
  /-pardâz/       dreamer
  /-parvar/       animal husbandry
  /-pari:ŝ/       agnosia
  /-paẑu:h/       scholar
  /-pu:ŝ/         armored
  /-peymâ/        airplanes
  /-xory/         dining
  /-xi:z/         early riser
  /-dân/          physicist
  /-resân/        injurious / ill-wisher
  /-rizân/        fall
  /-zâ/           allergen
  /-zodâ/         stress desensitization
  /-zy/           aquatic
  /-sâzy/         building
  /-su:zy/        fire
  /-sanj/         punctilious
  /-ŝekan/        outlaw
  /-ŝenâs/        geologist
  /-feŝân/        zealot
  /-konân/        laughing
  /-nevi:s/       historian
  /-yâb/          assessor

Table A.5: Adjectival and nominal suffixes.
  Pronunciation   Translation of example word
  /-sâr/          ashamed
  /-ak/           little boy
  /-gâneh/        childish
  /-gar/          unjust
  /-gy/           depressed
  /-gin/          angry
  /-mand/         rich
  /-nâk/          terrible
  /-vâr/          hopefully
  /-var/          eloquent
  /-vand/         citizen
  /-yat/          majority

Appendix B UPDT Dependency Labels

  Label            Frequency
  acc              2535
  acomp            360
  acomp-lvc        681
  acomp-lvc/pc     3
  acomp/pc         2
  advcl            655
  advcl/cop        8
  advcl/pc         2
  advmod           4157
  advmod/pc        11
  amod             9205
  amod/cop         3
  amod/pc          62
  appos            583
  appos/pc         3
  aux              2287
  auxpass          217
  cc               7657
  ccomp            4021
  ccomp/cop        55
  ccomp\cpobj      1
  ccomp\nsubj      1
  ccomp/pc         12
  ccomp/pc\cop     1
  ccomp\pobj       6
  ccomp\poss       8
  complm           2022
  conj             8629
  conj/cop         34
  conj/pc          85
  conj\pobj        2
  conj\poss        3
  cop              4426
  cop/pc           1
  cpobj            185
  cpobj/pc         2
  cprep            187
  dep              375
  dep/pc           3
  dep-top          68
  dep-top/pc       5
  dep-voc          62
  det              3929
  dobj             3723
  dobj/acc         16
  dobj-lvc         4185
  dobj-lvc/pc      17
  dobj/pc          123
  dobj/pc-lvc      2
  fw               168
  mark             733
  mwe              1773
  mwe/pc           1
  neg              105
  nn               3339
  nn/cop           1
  npadvmod         490
  nsubj            8658
  nsubj-lvc        7
  nsubjpass        146
  nsubjpass/pc     1
  nsubj/pc         189
  num              2872
  number           313
  parataxis        194
  parataxis/cop    6
  parataxis/pc     4
  pobj             16237
  pobj/cop         12
  pobj\cop         1
  pobj/pc          162
  poss             16071
  poss/acc         6
  poss/cop         44
  poss/pc          151
  preconj          49
  predet           51
  prep             15643
  prep/det         41
  prep-lvc         554
  prep/pc          49
  prep/pobj        1
  prt              102
  punct            13442
  quantmod         75
  rcmod            1408
  rcmod\amod       2
  rcmod/cop        9
  rcmod/pc         2
  rcmod\pobj       2
  rcmod\poss       2
  rel              1410
  root             5917
  root\amod        1
  root\conj        1
  root/cop         65
  root/pc          6
  root\pobj        13
  root\poss        7
  tmod             382
  xcomp            133

Bibliography

M. Bijankhan (2004). ‘The Role of the Corpus in Writing a Grammar: An Introduction to a Software’. Iranian Journal of Linguistics 19.

S. Buchholz & E. Marsi (2006). ‘CoNLL-X Shared Task on Multilingual Dependency Parsing’. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pp. 149–164.

M.-C. de Marneffe, et al. (2006). ‘Generating Typed Dependency Parses from Phrase Structure Parses’. In Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC).

J. Hajič, et al. (2001). ‘Prague Dependency Treebank: Annotation Structure and Support’. In Proceedings of the IRCS Workshop on Linguistic Databases, pp. 105–114.

G. Lazard (1992). A Grammar of Contemporary Persian. Mazda Publishers.

M. Seraji, et al. (2012). ‘A Basic Language Resource Kit for Persian’. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC).