Dependency Parsing
CMSC 723 / LING 723 / INST 725
Marine Carpuat, [email protected]
Slides credit: Joakim Nivre & Ryan McDonald

Today's Agenda
• Midterm
• P1
• Dependency grammars & parsing

Two Types of Parsing
• Dependency: focuses on relations between words
    I saw a girl with a telescope
• Constituency: focuses on identifying constituents and their recursive structure
    (S (NP (PRP I)) (VP (VBD saw) (NP (DT a) (NN girl)) (PP (IN with) (NP (DT a) (NN telescope)))))

Why Is Parsing Difficult? Ambiguity
    I saw a girl with a telescope
The PP "with a telescope" can attach to "saw" (the seeing was done through a telescope) or to "girl" (the girl has a telescope), yielding two different parses for the same sentence.

Constituency: Example
    The funicular which goes to the top of Victoria Peak is one of the longest in the world.

Dependency Grammars
• Syntactic structure = lexical items linked by binary asymmetrical relations called dependencies

Example Dependency Parse
    They hid the letter on the shelf
Compare with the constituent parse…

What's the Relation? Criteria for Dependency
• D is likely to be the dependent of head H in construction C if:
  – H can often replace C and determines the category of C
  – H gives the semantic specification of C; D specifies H
  – H is obligatory; D may be optional
  – H selects D and determines whether D is obligatory
  – The form of D depends on H (agreement)

Some intuitive dependencies…
Some trickier dependencies…
• Complex verb groups
• Subordination
• Coordination
• Prepositions
• Punctuation

Dependencies
• Typed: labels indicate the relationship between words
    nsubj(saw, I), dobj(saw, girl), det(girl, a), prep(saw, with), pobj(with, telescope), det(telescope, a)
• Untyped: only which words depend on which
    I saw a girl with a telescope

Dependency Treebanks

CoNLL File Format
• Standard format for dependencies
• Tab-separated columns; sentences separated by blank lines

    ID  Word     Base     POS  POS2  Head  Type
    1   ms.      ms.      N    NNP   2     DEP
    2   haag     haag     N    NNP   3     NP-SBJ
    3   plays    play     V    VBZ   0     ROOT
    4   elianti  elianti  N    NNP   3     NP-OBJ
    5   .        .        .    .     3     DEP

Data-Driven Dependency Parsing
• Goal: learn a good predictor of dependency graphs
• Input: sentence x
• Output: dependency graph/tree G

Parsing Strategies
• Transition-based
  – Learn to predict transitions given the input and history
  – Predict new graphs using a deterministic parsing algorithm
• Graph-based
  – Learn to predict entire graphs given the input
  – Predict new graphs using spanning tree algorithms

Shift-Reduce
• Process words one by one, left to right
• Two data structures
  – Queue (buffer) of unprocessed words
  – Stack of partially processed words
• At each point choose one action
  – shift: move one word from the queue to the stack
  – reduce left: top word on stack is head of second word
  – reduce right: second word on stack is head of top word
• Learn how to choose each action with a classifier

Shift-Reduce Example
Parsing "I saw a girl":

    Stack        Queue         Action
    (empty)      I saw a girl  shift
    I            saw a girl    shift
    I saw        a girl        reduce left (saw is head of I)
    saw          a girl        shift
    saw a        girl          shift
    saw a girl   (empty)       reduce left (girl is head of a)
    saw girl     (empty)       reduce right (saw is head of girl)
    saw          (empty)       done

Classification for Shift-Reduce
• Given a state, e.g. stack = [I, saw], queue = [a, girl]
• Which action do we choose? shift? reduce left? reduce right?
• Correct actions → correct tree
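The shift-reduce transitions above can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the action sequence is hard-coded to the oracle trace for "I saw a girl", whereas a real parser would pick each action with a classifier.

```python
# Minimal sketch of the shift / reduce-left / reduce-right transitions.
# The action sequence is hard-coded for "I saw a girl"; a real parser
# would choose each action with a classifier.

def shift_reduce(words, actions):
    """Apply transition actions; return 1-based head indices (0 = ROOT)."""
    queue = list(enumerate(words, start=1))  # (id, word) pairs
    stack = []
    heads = {}
    for action in actions:
        if action == "shift":
            stack.append(queue.pop(0))
        elif action == "left":   # top of stack is head of second word
            dep = stack.pop(-2)
            heads[dep[0]] = stack[-1][0]
        elif action == "right":  # second word on stack is head of top word
            dep = stack.pop(-1)
            heads[dep[0]] = stack[-1][0]
    # whatever remains on the stack is attached to ROOT (id 0)
    for wid, _ in stack:
        heads[wid] = 0
    return [heads[i] for i in range(1, len(words) + 1)]

# Trace from the example slide: shift, shift, reduce-left (saw <- I),
# shift, shift, reduce-left (girl <- a), reduce-right (saw -> girl)
actions = ["shift", "shift", "left", "shift", "shift", "left", "right"]
print(shift_reduce(["I", "saw", "a", "girl"], actions))  # [2, 0, 4, 2]
```

The returned list says: "I" depends on word 2 ("saw"), "saw" on ROOT, "a" on word 4 ("girl"), and "girl" on word 2, matching the trace in the example slide.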
Classification for Shift-Reduce
• We have one weight vector per action: w_s ("shift"), w_l ("reduce left"), w_r ("reduce right")
• Calculate feature functions from the queue and stack: φ(queue, stack)
• Take the dot product of weights and features to get each action's score:
    s_s = w_s · φ(queue, stack)
• Take the action with the highest score:
    s_s > s_l and s_s > s_r → do shift

Features for Shift-Reduce
• Features should generally cover at least the last two stack entries and the first queue entry:

          stack[-2]  stack[-1]  queue[0]
    Word: saw        a          girl
    POS:  VBD        DET        NN

  (stack[-2] = second-to-last, stack[-1] = last, queue[0] = first)
• Example binary features, all active (value 1) in this state:
    φ(W-2=saw, W-1=a)    φ(W-1=a, W0=girl)
    φ(W-2=saw, P-1=DET)  φ(W-1=a, P0=NN)
    φ(P-2=VBD, W-1=a)    φ(P-1=DET, W0=girl)
    φ(P-2=VBD, P-1=DET)  φ(P-1=DET, P0=NN)

Algorithm Definition
The algorithm ShiftReduce:
• takes as input:
  – weights w_s, w_l, w_r
  – a queue = [(1, word1, POS1), (2, word2, POS2), …]
• starts with a stack holding the special ROOT symbol:
  – stack = [(0, "ROOT", "ROOT")]
• processes the sentence and returns:
  – heads = [-1, head1, head2, …]

Training Shift-Reduce
• Can be trained using the perceptron algorithm
• Do parsing; if the correct action corr differs from the classifier's answer ans, update the weights, e.g.
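The classifier step on these slides can be sketched as follows. The feature names mirror the slide's W/P conjunctions, but the toy weights at the bottom are made up purely for illustration.

```python
# Sketch of the classification step: sparse binary features over the
# parser state, scored against one weight vector per action.
from collections import defaultdict

def make_feats(stack, queue):
    """Binary features over the last two stack entries and first queue entry.

    Each stack/queue entry is a (word, POS) pair; the feature names mirror
    the W-2/W-1/W0 and P-2/P-1/P0 conjunctions on the slide.
    """
    (w2, p2), (w1, p1) = stack[-2], stack[-1]
    w0, p0 = queue[0]
    return {
        f"W-2={w2},W-1={w1}": 1, f"W-1={w1},W0={w0}": 1,
        f"W-2={w2},P-1={p1}": 1, f"W-1={w1},P0={p0}": 1,
        f"P-2={p2},W-1={w1}": 1, f"P-1={p1},W0={w0}": 1,
        f"P-2={p2},P-1={p1}": 1, f"P-1={p1},P0={p0}": 1,
    }

def score(weights, feats):
    # sparse dot product: w · φ(queue, stack)
    return sum(weights[f] * v for f, v in feats.items())

def choose_action(w_s, w_l, w_r, stack, queue):
    feats = make_feats(stack, queue)
    scores = {"shift": score(w_s, feats),
              "left": score(w_l, feats),
              "right": score(w_r, feats)}
    return max(scores, key=scores.get)

# Toy weights (invented): favor shifting when a determiner is on top of
# the stack and a noun is next in the queue.
w_s = defaultdict(float, {"P-1=DET,P0=NN": 2.0})
w_l, w_r = defaultdict(float), defaultdict(float)
stack = [("saw", "VBD"), ("a", "DET")]
queue = [("girl", "NN")]
print(choose_action(w_s, w_l, w_r, stack, queue))  # prints: shift
```

Using `defaultdict(float)` for the weight vectors means unseen features score 0, which is the usual sparse-perceptron convention.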
    if ans = SHIFT and corr = LEFT:
        w_s -= φ(queue, stack)
        w_l += φ(queue, stack)

Keeping Track of the Correct Answer (Initial Attempt)
Assume we know the correct head of each stack entry:
• stack[-1].head == stack[-2].id (left is head of right) → corr = RIGHT
• stack[-2].head == stack[-1].id (right is head of left) → corr = LEFT
• else → corr = SHIFT

Problem: this is too greedy. For "go to school" (heads: go ← ROOT, to ← go, school ← to), with stack = [go, to] and queue = [school], the rule says RIGHT and removes "to" before its own dependent "school" has been attached.

Keeping Track of the Correct Answer (Revised)
Count the number of unprocessed children of each word:
• stack[-1].head == stack[-2].id (left is head of right) and stack[-1].unproc == 0 (right has no unprocessed children) → corr = RIGHT
• stack[-2].head == stack[-1].id (right is head of left) and stack[-2].unproc == 0 (left has no unprocessed children) → corr = LEFT
• else → corr = SHIFT

Shift-Reduce Training Algorithm

    ShiftReduceTrain(queue)
        initialize stack
        while |queue| > 0 or |stack| > 1:
            feats = MakeFeats(stack, queue)
            calculate ans (the classifier's action)
            calculate corr (the correct action)
            if ans != corr:
                w_ans -= feats
                w_corr += feats
            perform action according to corr

Recap
• Dependency grammars
  – What's a dependency
  – Some relations are clearer than others
  – Dependency treebanks
• An example of a transition-based parser
  – Shift-Reduce parser
  – Combined with a classifier
  – Can be trained using the perceptron algorithm
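The revised oracle and the perceptron update from the training slides can be sketched together. This is an illustrative fragment under one assumption: each stack/queue entry is represented as a dict with hypothetical id, head, and unproc fields, matching the notation on the slides.

```python
# Sketch of the revised oracle and the perceptron weight update.
# Each word is a dict with id, head (gold head id), and unproc
# (count of its still-unattached children), as on the slides.
from collections import defaultdict

def oracle_action(stack, queue):
    """Reduce only when the dependent has no unprocessed children."""
    if len(stack) >= 2:
        if stack[-1]["head"] == stack[-2]["id"] and stack[-1]["unproc"] == 0:
            return "right"   # left word is head of right word
        if stack[-2]["head"] == stack[-1]["id"] and stack[-2]["unproc"] == 0:
            return "left"    # right word is head of left word
    return "shift"

def perceptron_update(weights, ans, corr, feats):
    """weights: action name -> sparse weight vector (defaultdict)."""
    if ans != corr:
        for f, v in feats.items():
            weights[ans][f] -= v     # penalize the wrong prediction
            weights[corr][f] += v    # reward the correct action

# "go to school" (heads: go <- ROOT, to <- go, school <- to),
# stack = [go, to], queue = [school]:
stack = [{"id": 1, "head": 0, "unproc": 1},   # go (unattached child: to)
         {"id": 2, "head": 1, "unproc": 1}]   # to (unattached child: school)
queue = [{"id": 3, "head": 2, "unproc": 0}]   # school
# The greedy rule would say RIGHT here; the revised oracle waits,
# because "to" still has an unprocessed child.
print(oracle_action(stack, queue))  # prints: shift
```

Once "school" has been attached to "to" (so to.unproc drops to 0), the oracle returns the delayed RIGHT reduction, avoiding the greediness problem shown on the initial-attempt slide.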