Structural Prediction in Incremental Dependency Parsing

Niels Beuck and Wolfgang Menzel
University of Hamburg
{beuck,menzel}@informatik.uni-hamburg.de

Abstract. For dependency structures of incomplete sentences to be fully connected, nodes in addition to those corresponding to the words in the sentence prefix are necessary. We analyze a German dependency corpus to estimate the extent of such predictive structures. We also present an incremental parser based on the WCDG framework and describe how the results from the corpus study can be used to adapt an existing weighted constraint dependency grammar for complete sentences to the case of incremental parsing.

Keywords: Parsing, Incrementality, Weighted Constraints, Dependency Grammar

1 Motivation

A dependency structure of a sentence is a fully connected, single-headed, directed acyclic graph in which every node corresponds to a word of the given sentence. For incomplete sentences, as encountered during incremental parsing, achieving connectedness is often not possible when the dependency structure is limited to the nodes corresponding to words in the sentence prefix. This problem occurs for every word that is attached to a head on its right, as there will be at least one sentence prefix that contains the word but not its head. There are two possible approaches to this problem: either drop the connectedness requirement for the dependency structures of incomplete sentences (from now on called partial dependency analyses, PDAs), or allow additional nodes in the structure as needed to establish connectedness.

In the first variant, words are left unattached until a suitable head appears. While this approach to incremental dependency parsing has been used successfully in parsers like MaltParser [1], we argue that it is unsatisfactory when partial dependency structures are of interest, for several reasons:

From a psycholinguistic point of view: Humans not only have interpretations of partial sentences, but also have strong expectations about the integration of words even if their syntactic head is still missing [2].

From an application point of view: Information already contained in the perceived part of the sentence is missing from the dependency structure if words are left unconnected. In the incomplete sentence "Pick up the red", the head of 'red' is still missing, but we can already infer that the object of 'pick' is something red. This is not limited to words close to the end of the prefix, as in a noun phrase under construction. It also applies to long-distance dependencies, as in German subclauses, where the verb comes last and all its arguments would need to stay unconnected. [3] proposed explicit dependencies between the arguments, but these are unnecessary if we provide an extra node for the expected verb, allowing early attachment of the arguments.

From a parsing perspective: Parsers choose between alternative structures based on some measure of grammaticality or probability. The WCDG parser used in this work uses a set of weighted constraints to measure the grammaticality of dependency structures. To be able to reuse measures optimized for judging analyses of complete sentences, a structure similar to the final structure is required to obtain a rating similar to that of the final structure. Otherwise, unintuitive structures with no possible counterpart in a complete sentence analysis might be rated unreasonably high and lead the parser astray.
On the one hand, additional nodes in partial dependency analyses are desirable because they allow more informative structures, which leaves open the question of how to construct such structures in a parser. On the other hand, a constraint-based parser would penalize partial dependency structures even if the constraint violation could, and probably will, be remedied by further input, making it hard to fairly rate and select a suitable dependency structure for a given sentence prefix. We propose that the second problem holds the answer to the first one: the very constraint violations that unnecessarily penalize partial dependency analyses lacking structural prediction can be used as indicators for how to extend these structures with the needed additional nodes.

The remainder of this paper consists of two parts. First, we present a constraint-based incremental parsing algorithm able to make structural predictions about the upcoming input. It uses an existing weighted constraint dependency grammar and an existing transformation-based parsing algorithm, as well as a set of potential placeholders, so-called virtual nodes (VNs), for structural prediction. Second, we analyze a corpus of dependency annotations to assess the additional structure needed to keep PDAs connected and to fulfill all valency requirements. From this result we can infer the set of VNs we have to provide to our algorithm to achieve a certain coverage. In addition, we analyze to what extent the given set of weighted constraints can be used to judge such structurally predictive partial analyses and where it has to be adapted, e.g. by adding suitable exceptions.

2 Related Work

While to our knowledge this is the first strictly incremental dependency parser (strict in the sense of providing a fully connected structure for every input increment), strict incrementality has already been explored for the Tree Adjoining Grammar (TAG) formalism. [4] investigated the amount of structure needed for strictly incremental parsing by building gold standard annotations incrementally with a simulated parser. They introduced the term connection path to denote the set of nodes of the (connected) tree for increment n that are neither part of the tree for increment n − 1 nor part of the subtree contributed by the new word. While they worked with phrase structures for English and only regarded predictions needed to achieve connectedness, our work is on dependency structures for German and regards valency requirements in addition to connectedness.

PLTAG (PsychoLinguistically motivated Tree Adjoining Grammar) [5] is a TAG variant that allows strictly incremental parsing. There, connectedness is facilitated by so-called prediction trees, non-lexicalized elementary trees whose nodes later need to be instantiated by nodes from lexicalized elementary trees. Prediction trees are not anchored in the input directly, but are integrated when necessary to connect the elementary tree of the new material to the structure built so far. As the number of possible connection paths grows polynomially with the number of prediction trees contained, the parser presented in [5] restricts connection paths to one prediction tree. The PLTAG parser deals with temporary ambiguity by keeping a beam of possible derivation trees, ranked by their probability learned from a training corpus. Thus, the best structure need not be a monotonic continuation of the previously best ranked structure.
The output of the incremental WCDG parser is non-monotonic, too, but instead of working with a beam of monotonically extended structures, it works by successively transforming a single structure. In contrast to the PLTAG parser, incremental WCDG achieves state-of-the-art accuracy (see [5], p. 230, and [6]).

3 The WCDG Framework

Weighted Constraint Dependency Grammar (WCDG) [7] is a framework for dependency parsing by means of constraint optimization. The grammar consists of a set of constraints on single dependency edges or on pairs of them. So-called context-sensitive predicates allow constraints to access properties of the complete tree, e.g. to express the existence requirement for the subject of a verb. The constraints are weighted with a penalty score between 0 and 1, where 0 denotes a hard constraint. By applying all constraints of the grammar to a dependency structure, a grammaticality score for the structure can be calculated by multiplying the penalties of all constraint violations. Parsing in WCDG is thus the search for the highest-scoring dependency structure.

Every word can, in principle, be attached to any other word in the sentence, or to the generic root node, via any dependency type. The only restriction is that a word can only be attached to one head.¹ All further restrictions, like projectivity, are expressed via constraints. The number of possible edges grows quadratically with the number of words in the sentence, resulting in an exponential number of possible dependency structures. Thus, for non-trivial sentences, a complete search is usually not feasible. [7] presented a repair-based heuristic algorithm called frobbing. The algorithm starts with an arbitrary initial structure and identifies candidates for replacement by looking at the constraint violation with the highest penalty. Replacement variants for the edges violating the selected constraint are then evaluated as to whether they would resolve this conflict, and one of them is applied. Several of these repair steps might have to be chained to escape from a local score maximum.

¹ WCDG supports multiple levels, i.e. building multiple dependency structures at once, including constraints mediating between them. Yet, for a given level, a word can have only one head.

This algorithm is inherently non-incremental, as it starts with a structure for the whole sentence. An incremental processing mode can be introduced by applying the procedure to prefixes of the input and using the resulting dependency structure as the initial structure for analyzing the extended prefix. In this incremental mode, edges selected in a previous step can be replaced later on, if necessary. This can happen when, given the additional input, they violate additional constraints, or when an old constraint violation can be resolved given the new input. The algorithm guarantees no monotonicity at all, but in practice an attachment stability of around 70% has been observed [6].

4 Structural Prediction with Virtual Nodes

When parsing a sentence prefix instead of a full sentence with WCDG, several constraints might prevent the parser from choosing a prefix of the structure that would be selected for the full sentence, i.e. a structure containing only dependency edges also present in the complete structure. This is foremost due to 'fragmentation' constraints, which penalize unattached words. They force the parser to select an unsuitable attachment point that might be penalized by other constraints, but less severely than the fragmented reading.
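To make this trade-off concrete, the following sketch computes such a multiplicative score for two hypothetical readings of a prefix; the constraint names and penalty values are invented for illustration and are not taken from the actual German WCD-grammar.

```python
# Sketch of WCDG-style scoring: the score of an analysis is the product of the
# penalties of all violated constraints (0 = hard constraint, values close to
# 1 = mild preferences). Constraint names and weights are invented.

def score(violations, weights):
    """Multiply the penalties of all violated constraints."""
    s = 1.0
    for constraint in violations:
        s *= weights[constraint]
    return s

weights = {
    "fragment":       0.05,  # an unattached non-root word (strong penalty)
    "bad_attachment": 0.30,  # an implausible head was chosen
}

# Two readings of the same sentence prefix:
fragmented  = ["fragment"]         # leave the new word unattached
misattached = ["bad_attachment"]   # attach it to an implausible head

print(score(fragmented, weights))   # 0.05
print(score(misattached, weights))  # 0.3  -> preferred, although unintuitive
```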
Also, valency constraints are commonly violated in such prefix structures. There are two possible approaches to dealing with these kinds of constraints when applied to partial dependency analyses: either we prevent them from being applied to PDAs, or we add as many additional nodes and edges to the dependency structure as are needed to avoid the constraint violation. The first approach has the severe drawback of over-generation: disabling constraints (or adding exceptions to them) removes their ability to penalize ungrammatical input. In this paper we follow the second approach by providing a set of so-called virtual nodes that can be integrated into the dependency structure if a constraint violation can be avoided by doing so, but that incur no penalty if they stay unconnected. If they remain unconnected, they are not considered part of the structure.

To prevent the parser from choosing a structure that includes virtual nodes even if another structure of equal or only slightly worse score is available, we add a slight penalty to every analysis in which a virtual node is used, i.e. connected. This way, the structurally predictive reading is chosen only if necessary, but not otherwise. No further modification to the algorithm is needed: during constraint optimization the virtual nodes are treated like other input nodes. The special handling of virtual nodes (i.e., the slight penalty for connected and no penalty for unconnected virtual nodes, in contrast to the harsh penalty for other unconnected nodes) is defined solely in the constraints of the WCD-grammar.

[Fig. 1. Example for bottom-up prediction: the determiner is unattachable to any word directly, but attachable if a noun is predicted ("He buys a [noun]")]

[Fig. 2. Examples for top-down predictions of an object and of the kernel noun of a preposition ("Er sieht [virtNoun]", "he sees", and "Das Haus mit [virtNoun]", "the house with")]

We can distinguish two different scenarios leading to structurally predictive readings. The scenario seen in Figure 1 is a bottom-up prediction, where the necessity to connect a new word leads to the prediction of a suitable head. As seen in Figure 2, dependents can be predicted, too, via top-down prediction: whenever a word has a valency that cannot be satisfied by any of the other available words, e.g. a missing subject or a transitive verb missing its object, filling the valency with a VN makes the corresponding constraint violation disappear.

If we allowed some words to stay unattached, it could not be guaranteed that there is any possible continuation of the sentence in which these unattached words could be attached without changing the already established part of the structure. For fully connected, structurally predictive structures, in contrast, the absence of severe constraint violations licenses not only the integration of the otherwise unattachable words but also the acceptability of the whole structure. It anticipates a potential continuation of the sentence prefix which does not introduce any additional constraint violations. The structural prediction thus serves as a kind of "certification" for the grammaticality of all the attachments in the PDA. It has to be ensured that for every sentence prefix the grammar licenses a partial dependency analysis free of additional constraint violations. Besides providing a suitable set of virtual nodes, grammar modifications might be necessary to achieve this goal, as will be discussed in Section 6.
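The effect of the slight prediction penalty described above can be illustrated with a schematic example; the penalty values below are invented, the real ones are defined in the constraints of the grammar.

```python
# Schematic illustration of the virtual-node penalty: an unattached ordinary
# word is penalized heavily, each connected virtual node only slightly, and
# unconnected virtual nodes not at all. All penalty values are invented.

FRAGMENT_PENALTY = 0.05   # per unattached regular word
VN_PENALTY       = 0.90   # per virtual node integrated into the structure

def score(unattached_words, used_vns):
    return (FRAGMENT_PENALTY ** unattached_words) * (VN_PENALTY ** used_vns)

# Candidate PDAs for the prefix "Pick up the red":
candidates = {
    "leave 'red' unattached":                 score(unattached_words=1, used_vns=0),
    "attach 'red' below a virtual noun":      score(unattached_words=0, used_vns=1),
    "integrate two virtual nodes needlessly": score(unattached_words=0, used_vns=2),
}
print(candidates)
print("selected:", max(candidates, key=candidates.get))
# the reading with exactly one virtual node scores best (0.9)
```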
To be able to predict structural relationships for yet unseen words, the incremental frobbing algorithm needs to be provided with a finite set of virtual nodes. The larger this set, the higher the coverage will be. Additional VNs, however, introduce a quadratic increase of potential dependency edges into the search space and, as a consequence, an exponential increase in the number of possible dependency structures. For the sake of computational feasibility, the number of VNs should be kept as small as possible. This can be achieved by using general VNs, which stand for whole classes of words such as nouns or verbs. Aside from avoiding the combinatorial problems of providing too many VNs, this also avoids the danger of many variants receiving the same grammaticality score and thus being indistinguishable to the parser.

Even the order among VNs will remain unspecified. While the existing grammar is prepared to deal with underspecified values for many lexical features, attributes like the word form or the linear precedence relations among words are always assumed to be given explicitly. To allow constraints to ignore the arbitrary order/adjacency relations between VNs as well as their particular lexical instantiations, respective exceptions need to be added. Thus, the questions we have to answer are:

– Which set of virtual nodes is needed to achieve a certain coverage?
– What changes to a constraint dependency grammar for full sentences are necessary?

In the second half of the paper we present an empirical investigation to answer these questions.

5 Data Preparation

Not only do we have no annotations for sentence prefixes available, it would also be difficult to specify them uniquely, since they always depend on a hypothesis about a possible sentence continuation. Therefore, we generate them from complete sentence annotations. We want to find a monotonic sequence of prefix annotations resulting in the final annotation; the generated annotations are therefore not necessarily the most plausible ones for any given prefix. For every word in a given sentence, we generate an annotation up to that word by retaining certain nodes and dependency edges while dismissing others².

² The software to reproduce this data preparation as well as the evaluation can be found at http://www.CICLing.org/2013/data/274

5.1 Prefix Baseline Annotation

The basic variant of the procedure that generates connected PDAs is quite straightforward:

– all words inside the prefix and all dependencies between them are kept
– all words that are heads of retained words are retained themselves, including the dependency edge between head and dependent; this rule applies recursively.

[Fig. 3. Example for assuming a complete sentence vs. a fragment given the prefix "der kleine" ("the little"): "Der kleine [virtNoun] [virtVerb]" vs. "Der kleine [virtNN]"]

[Fig. 4. Example for folding: when generating a PDA for the first three words of the clause "dass er ihn gesehen haben soll" (literally: "that he him seen have should", meaning: "that he is supposed to have seen him"), all three verbs would be kept for connectedness, but are then folded into a single virtual verb: "Dass er ihn [VVPP] [VAINF] [VMFIN]" ⇒ "Dass er ihn [VVFIN]"]

Words not belonging to the prefix but nonetheless retained need to be 'virtualized' to produce generic representatives of their part-of-speech category, the form of the virtual nodes used by the incremental parsing algorithm.
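A minimal sketch of this retention procedure, on a simplified head-array representation of a gold tree, might look as follows; the data structures, helper names and the toy example are our own and are not taken from the released software.

```python
# Sketch of the baseline prefix generation (Section 5.1): keep every word of
# the prefix, recursively keep the head of every retained word, and mark
# retained words outside the prefix as candidates for virtualization.

def prefix_annotation(heads, prefix_len):
    """heads[i] is the index of word i's head (None for the root)."""
    retained = set(range(prefix_len))       # all words inside the prefix
    frontier = list(retained)
    while frontier:                         # recursive head closure
        head = heads[frontier.pop()]
        if head is not None and head not in retained:
            retained.add(head)
            frontier.append(head)
    virtual = {n for n in retained if n >= prefix_len}
    return retained, virtual

# toy tree for "er sieht den kleinen Hund" ("he sees the little dog"),
# prefix "er sieht den kleinen":
heads = [1, None, 4, 4, 1]
print(prefix_annotation(heads, prefix_len=4))
# ({0, 1, 2, 3, 4}, {4})  -> the noun 'Hund' is retained as a virtual node
```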
The procedure of virtualization removes the word form and lemma, keeps the part-of-speech information, removes all other lexical features, and blurs the original linear order among the nodes.

5.2 Prefix Annotation Variants

Three parameters of the prefix generation algorithm have been considered:

1. Assuming complete sentences or not, i.e. whether we retain all nodes needed to connect up to the root or only those needed to connect all words in the prefix to each other (see Figure 3).

2. Top-down prediction. The connectedness criterion only provides for bottom-up prediction. Additional nodes could be retained via top-down prediction, i.e. to fill obligatory valencies (see Figure 2).

3. Folding of connection paths. The connection paths extracted directly from the full sentence annotations might not represent the shortest possible connection path. This is especially true for chains of equally labeled dependencies, which correspond to nested phrase structures. Folding these structures by keeping only one of the dependencies leads to more plausible prefix annotations (see Figure 4). They correspond to the minimally recursive structures [4] investigated in their second experiment, where they anticipated parser misanalyses in their simulated parsing procedure.

These three parameters have different consequences for the parser. Unless there are clear hints for a nested structure, the minimal tree, i.e. the one with the smallest number of virtual nodes, should always be preferred. WCDG achieves this due to the prediction penalty, selecting the PDA with a minimal number of VNs among otherwise equally scored alternatives. The parser will not be able to utilize additional VNs set aside for additional nesting depth. Folding can therefore be expected to provide a better estimate of the number of VNs needed.

While an incremental dependency parsing algorithm that provides bottom-up but no top-down prediction is possible, the parsing algorithm used here has no distinct mechanisms for the two prediction modes. If respective virtual nodes are provided, they will be used to satisfy valency constraints, unless those constraints are disabled for incomplete input. Therefore, by adding top-down prediction to the prefix generation algorithm, we get a better estimate of the VNs needed by the parser. Note that folding reduces the number of VNs, while top-down prediction increases it.

In contrast to folding, where there is no choice in parser behavior, the parser can easily be switched between a full-sentence and a fragment mode. Which mode is active depends solely on whether there is a constraint penalizing words other than finite verbs as the root of the tree. Such a constraint would incite the prediction of a virtual finite verb (and possibly its subject) even if none is needed to establish connectedness. Thus, the choice between these modes affects the number of required virtual nodes. In this paper we explore only the fragment variant, as did [4] and [5]. The predictions in this mode are expected to be smaller and less speculative.

6 Experiments

To determine the set of virtual nodes needed, we generate a partial but connected structure for every prefix of every sentence in an annotated corpus. A subset of the dependency edges of the complete tree is selected using the procedure presented above. Given these structures, we can count how many additional nodes are needed to keep them connected.
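A sketch of this counting step, reusing the same head closure as the sketch in Section 5.1 so that it stays self-contained (the toy corpus below is purely illustrative):

```python
# Sketch of the counting step: for every prefix of every gold tree, determine
# how many out-of-prefix nodes the recursive head closure retains (i.e. how
# many virtual nodes the PDA needs) and tabulate the distribution.
from collections import Counter

def vns_needed(heads, prefix_len):
    retained = set(range(prefix_len))
    frontier = list(retained)
    while frontier:
        head = heads[frontier.pop()]
        if head is not None and head not in retained:
            retained.add(head)
            frontier.append(head)
    return sum(1 for n in retained if n >= prefix_len)

def vn_distribution(corpus):
    counts = Counter()
    for heads in corpus:
        for prefix_len in range(1, len(heads) + 1):
            counts[vns_needed(heads, prefix_len)] += 1
    total = sum(counts.values())
    return {k: counts[k] / total for k in sorted(counts)}

toy_corpus = [[1, None, 4, 4, 1], [1, None, 1]]
print(vn_distribution(toy_corpus))   # {0: 0.5, 1: 0.5}
```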
Also, by evaluating these PDAs with the WCD-grammar, we can determine whether they violate any constraints in addition to those violated in the complete sentence, as such additional constraint violations would prevent the prefix structure from being selected by the parser. Based on these results we can then modify the grammar accordingly.

Using 500 sentences from the Negra corpus [8], transformed to the dependency annotation scheme described in [9], and a broad-coverage WCD-grammar for German [7], we perform the following procedure:

1. Generate a set of prefix annotations from the gold standard annotation.
2. Calculate the constraint violations that appear only in prefix analyses, but not in the corresponding complete sentence.
3. Check:
   (a) whether changes to the constraints themselves are needed,
   (b) whether additional VNs corresponding to words in the complete sentence could avoid any of the additional constraint violations,
   (c) whether VNs can be removed from the connection path systematically without triggering additional constraint violations.
4. Repeat from step 1 with different variants of the PDA generation algorithm.

6.1 Prediction with Baseline Annotations

We start with the simplest variant of the prefix generation algorithm, i.e. without folding or top-down prediction. The constraint violations encountered in the prefix annotations but not in the respective complete sentence annotation can be roughly categorized as follows:

Prediction penalties: As described in Section 4, there is a small penalty for integrating VNs into a structure. The respective constraint violations are expected in the difference lists and can safely be ignored here.

Order constraints, especially projectivity: These constraint violations occur because VNs are unordered. These constraints should simply not be applied to VNs; hence, respective exceptions need to be added to them.

Unsatisfied valencies: As we did not account for valencies when deciding what to keep in the prefix structures of this iteration, all kinds of valency constraints are violated, e.g. missing subject, missing object, missing kernel noun of a preposition, missing determiner of a noun, missing conjunctions and missing conjuncts. These are prime examples for top-down prediction, i.e. these conflicts will disappear when a VN with the respective PoS and dependency type is attached. For nouns missing a determiner or verbs missing the infinitive marker 'zu', PoS variants without these valencies (NE and VVIZU, respectively) will be selected instead of top-down prediction.

Punctuation: Several constraints expect a comma or quotation marks between two words. If at least one of the words is virtual, the missing punctuation might still show up later on. Thus we add respective exceptions to these constraints.

Word form: There are constraints accessing word forms or lemmas. Many of these occurrences are exceptions for certain words. We ignore them for now, as we neither want too detailed predictions (exploding the search space of the parser) nor do we want to allow idiosyncratic constructions before the licensing word appears in the input. For constraints deemed general enough, exceptions that prevent them from being applied to VNs are added.

Table 1 shows the percentage of prefix structures containing a certain number of VNs, in total and coarsely grouped by part of speech.
We also added a line for the number of VNs in the connection path, i.e. only those VNs needed in addition to the ones already present in the previous prefix to connect the most recent word to the rest of the structure. This line is roughly comparable to the numbers presented in [4]; their headless projections in the connection path correspond to our VNs. In both cases connection paths up to length 1 cover 98% of the observed prefixes, and those up to length 2 cover over 99%. No connection paths longer than four are observed. While 82.3% of the prefixes do not require any headless projections there, 80.8% of the words can be connected without VNs here. To estimate whether this minor difference results from differences in language (English vs. German) or grammar formalism (phrase structure vs. dependency structure), we ran our evaluation on 500 sentences from the WSJ corpus, i.e. the same corpus used in [4], converted to dependency structures. The results (CP length only) are shown in the last line of Table 1 and are closer to our results, suggesting that the differences result from the grammar formalism.

Table 1. Results of experiment 1: The number of VNs needed for prefix structures, regarding only connectedness but no valencies; total VNs: number of VNs per PDA; CP length: only those VNs needed to connect the most recent word; note that the numbers in the columns do not add up: the 4 VNs of a PDA might consist of 3 virtual verbs and 1 virtual nominal.

                    0        1       2       3       4       5+
  total VNs         56.5%    30.1%   10.8%   2.4%    0.2%    0%
  verbs             68.8%    26.0%   4.6%    0.7%    <0.1%   0%
  nominals          81.2%    18.7%   0.1%    0%      0%      0%
  adjectives        97.5%    2.5%    <0.1%   0%      0%      0%
  other             99.1%    0.8%    <0.1%   0%      0%      0%
  CP length         80.8%    17.0%   2.0%    0.1%    0%      0%
  WSJ (CP length)   79.4%    17.8%   2.7%    0.2%    0%      0%

6.2 Valency Respecting Prediction

In the previous experiment, a large class of constraint violations was related to unsatisfied valencies. We therefore modified the prefix generation algorithm to keep words in the structure in the following cases:

– subjects of all finite verbs, virtual or not
– kernel nouns of prepositions
– dependents of a conjunction
– objects of non-virtual words
– the second part of a truncation like 'an- und verkaufen', if the first part has been observed
– the last part of a conjunction chain, if the first two have been observed; e.g. in the sequence 'A, B, C und D', 'und D' would be kept and 'C' skipped, if 'A, B' has been observed
– certain adverbs that fill the role of a conjunction in subclauses, if the verbal head of the subclause is already kept for other reasons.

Since the valencies of virtual words are not available to the parser, they all have to be considered optional. Therefore, no objects of virtual verbs are kept.

Table 2. Results of experiment 2: The number of VNs needed for prefix structures, regarding connectedness and valencies.

                    0        1       2       3       4       5+
  total VNs         37.4%    34.6%   20.7%   5.8%    1.3%    0.2%
  verbs             62.6%    30.8%   5.8%    0.8%    <0.1%   0%
  nominals          58.5%    37.5%   3.9%    0.1%    0%      0%
  adjectives        96.1%    3.8%    0.1%    0%      0%      0%
  conjunction       97.9%    2.1%    0%      0%      0%      0%
  adverb            99.4%    0.4%    0.2%    0%      0%      0%
  preposition       98.3%    1.7%    0%      0%      0%      0%
  other             99.2%    0.9%    0%      0%      0%      0%

6.3 Prediction with Folded Structures

The results for PDAs with folding are shown in Table 3. Compared to Table 1, the most noticeable difference is that the percentage of prefixes with two virtual verbs went down by around 3.5 percentage points, while the number for one virtual verb increased. As expected, there is no change in the zero column, as folding will not remove the last remaining virtual verb.
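To make the folding operation itself concrete, here is a hedged sketch on a simplified edge representation; the chain detection, the labels and the omission of the part-of-speech relabeling (to a finite verb) are our own simplifications and do not reproduce the exact procedure used for the corpus study.

```python
# Sketch of folding: virtual nodes that form a chain of equally labeled
# dependencies are merged into the topmost node of the chain, and their
# dependents are reattached there. Labels and representation are simplified.

def find_chain(edges, virtual, start):
    """Follow edges carrying one and the same label upward from a virtual node."""
    chain, label = [start], edges[start][1]
    while True:
        head, lab = edges[chain[-1]]
        if lab == label and head in virtual:
            chain.append(head)
        else:
            return chain

def fold(edges, virtual, chain):
    """Merge all nodes of the chain into its topmost node."""
    top = chain[-1]
    for node in chain[:-1]:
        for dep, (head, lab) in list(edges.items()):
            if head == node:
                edges[dep] = (top, lab)     # reattach dependents upward
        del edges[node]
        virtual.discard(node)
    return edges, virtual

# PDA for the prefix "dass er ihn ..." before folding (cf. Fig. 4):
virtual = {"[VVPP]", "[VAINF]", "[VMFIN]"}
edges = {"[VMFIN]": ("dass",     "KONJ"),
         "dass":    (None,       "ROOT"),
         "er":      ("[VMFIN]",  "SUBJ"),
         "ihn":     ("[VVPP]",   "OBJA"),
         "[VVPP]":  ("[VAINF]",  "AUX"),
         "[VAINF]": ("[VMFIN]",  "AUX")}
chain = find_chain(edges, virtual, "[VVPP]")   # ['[VVPP]', '[VAINF]', '[VMFIN]']
print(fold(edges, virtual, chain))
# only one virtual verb remains; 'ihn' is now attached to it directly
```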
Table 3. Results of experiment 3: The number of VNs needed for prefix structures, regarding connectedness while folding recursive verb chains.

                    0        1       2       3       4       5+
  total VNs         56.5%    33.3%   9.3%    0.9%    <0.1%   0%
  verbs             68.8%    30.2%   1.0%    0.1%    0%      0%
  nominals          81.2%    18.8%   0.1%    0%      0%      0%
  adjectives        97.5%    2.5%    <0.1%   0%      0%      0%
  other             99.1%    0.8%    <0.1%   0%      0%      0%
  CP length         81.15%   17.2%   1.6%    <0.1%   0%      0%

6.4 Prediction with Valencies and Folding

The results for the combined algorithm, folding plus valencies, can be seen in Table 4. As expected, the numbers are generally above those of Table 3 and below those of Table 2. Three VNs are sufficient to achieve a coverage of 99%. One line in the table that might surprise is that prepositions were observed as VNs (1.7%), since a preposition is the leftmost word in a PP. There are three different ways prepositions are predicted: to connect an adverb (arguably a very ambiguous and hard to predict case), to fill a verb's valency for a prepositional object, and to complete a coordination of prepositional phrases when the first PP and a conjunction like 'und' have already been observed.

The question remains which combinations of part-of-speech categories among the VNs achieve which coverage. Some possible combinations are shown in Table 5. A set consisting of two nominals and one verb, as used in [6], covers nearly 90%. Adding VNs for adjectives, conjunctions, prepositions and adverbs improves the coverage towards 99%. Whether the parser can actually benefit from this improved coverage, especially of the latter two categories, will have to be evaluated, but is beyond the scope of this paper.

Table 6 shows how VNs are distributed over the different prediction constellations. If a VN serves to connect two different words, e.g. it is both the head of a determiner and the object of a verb, the leftmost word is selected. The most common reason for prediction is predicting the kernel noun of a PP (18%), followed by predicting the subject of a finite verb (11%), predicting a clause-initial conjunction (10%) and connecting a determiner (6%).

Table 4. Results of experiment 4, the complete algorithm: The number of VNs needed for PDAs, regarding connectedness and valencies while folding nested structures.

                    0        1       2       3       4       5+
  total VNs         37.4%    36.7%   20.6%   4.3%    0.9%    0.1%
  verbs             62.6%    35.0%   2.3%    0.1%    0%      0%
  nominals          58.5%    37.5%   3.9%    0.1%    0%      0%
  adjectives        96.1%    3.8%    0.1%    0%      0%      0%
  conjunction       97.9%    2.1%    0%      0%      0%      0%
  adverb            99.4%    0.4%    0.2%    0%      0%      0%
  preposition       98.3%    1.7%    0%      0%      0%      0%
  other             99.1%    0.9%    0%      0%      0%      0%

Table 5. Percentage of sentence prefixes that can be represented with a given set of VNs. N: nominal, V: verb, A: adjective, C: conjunction, P: preposition, Av: adverb.

  VN set                        coverage (%)
  no VNs                        37.4
  2*N + V                       89.4
  2*N + V + A                   93.0
  2*N + 2*V + A                 94.7
  2*N + 2*V + A + C             96.7
  2*N + 2*V + A + C + P         98.3
  2*N + 2*V + A + C + P + Av    98.7

Table 6. An overview of common reasons why VNs were included in a PDA; bottom-up prediction in the upper part, top-down prediction in the lower part; determined for the PDAs of experiment 4.

  bottom-up: connecting a ...                   % of VNs
  determiner                                    6.0
  subject                                       5.9
  clause-initial conjunction                    5.6
  adverb                                        4.5
  accusative object                             3.8
  preposition                                   3.4

  top-down: filling the valency for ...         % of VNs
  kernel noun of a preposition                  18.0
  subject                                       11.4
  the conjunct under a conjunction              10.3
  accusative object                             5.3
  conjunction to finish an incomplete coord.    3.5
  auxiliary verb                                2.7
  predicate                                     3.4
  subject clause                                2.7
  object clause                                 1.1

6.5 Discussion

This study covers only part of the problem of grammar development: we made sure that for every prefix of every sentence there is a dependency structure scored at least as well as the dependency structure of the final sentence. So far, however, we did not consider the complementary case, namely to make sure that for the best-scored partial dependency structure of every sentence prefix there actually is a continuation that would indeed be scored as well. This cannot be examined by a corpus study, but only by inspecting actual parser behavior.

7 Conclusions

In this paper we presented a parsing system based on the WCDG framework capable of incrementally parsing into connected dependency structures.
The system uses a grammar consisting of weighted constraints as well as a set of virtual nodes. While there is an existing state-of-the-art, broad-coverage WCD-grammar for German, that grammar is optimized for complete sentences, not for sentence prefixes. We conducted a corpus study to identify what changes to the existing grammar are needed, as well as to identify sets of virtual nodes suitable to cover the observed sentence prefixes. Our experiments complement those conducted by [4] by exploring another grammar formalism and by considering valency-driven top-down prediction in addition to connectedness-driven bottom-up prediction.

References

1. Nivre, J., Hall, J., Nilsson, J., Chanev, A., Eryigit, G., Kübler, S., Marinov, S., Marsi, E.: MaltParser: A language-independent system for data-driven dependency parsing. Natural Language Engineering 13 (2007) 95–135
2. Sturt, P., Lombardo, V.: Processing coordinated structures: Incrementality and connectedness. Cognitive Science 29 (2005) 291–305
3. Bornkessel, I.: The Argument Dependency Model: A neurocognitive approach to incremental interpretation. Max Planck Institute of Cognitive Neuroscience (2002)
4. Lombardo, V., Sturt, P.: Incrementality and lexicalism: A treebank study. In: Lexical Representations in Sentence Processing. John Benjamins (2002) 137–154
5. Demberg, V.: A Broad-Coverage Model of Prediction in Human Sentence Processing. PhD thesis, The University of Edinburgh (2010)
6. Beuck, N., Köhn, A., Menzel, W.: Incremental parsing and the evaluation of partial dependency analyses. In: DepLing 2011, Proceedings of the 1st International Conference on Dependency Linguistics (2011)
7. Foth, K.A.: Hybrid Methods of Natural Language Analysis. PhD thesis, Universität Hamburg (2006)
8. Brants, T., Hendriks, R., Kramp, S., Krenn, B., Preis, C., Skut, W., Uszkoreit, H.: Das NEGRA-Annotationsschema. Negra project report, Universität des Saarlandes, Computerlinguistik, Saarbrücken, Germany (1997)
9. Daum, M., Foth, K., Menzel, W.: Automatic transformation of phrase treebanks to dependency trees. In: 4th Int. Conf. on Language Resources and Evaluation, LREC-2004 (2004)