How does sentence order affect the understandability of documents?

Joe Austerweil*
Brown Laboratory for Linguistic Information Processing (BLLIP)
*The research underlying this thesis is joint work with Micha Elsner and Eugene Charniak.

Introduction

● Classic language models do not distinguish between sentence orderings
  – n-gram models score each sentence essentially independently of its neighbors, so every ordering of the same sentences receives the same probability
  – 1. Joe really likes Gilmore Girls. 2. It is his favorite television show. 3. Rory has a crush on him.
  – vs. 1. It is his favorite television show. 2. Rory has a crush on him. 3. Joe really likes Gilmore Girls.
● Document coherence is "a property of well-written texts that... [are] easier to understand than ... random ... sentences" (Lapata and Barzilay 2005)

Our Two Tasks

● Binary Classification: decide which of two versions of a document is the original and which is a random permutation
  – 1. Joe really likes Gilmore Girls. 2. It is his favorite television show. 3. Rory has a crush on him.
  – vs. 1. It is his favorite television show. 2. Joe really likes Gilmore Girls. 3. Rory has a crush on him.
  – Application: ETS essay grading (Miltsakaki and Kukich 2004)
● Sentence Ordering: impose an ordering on a "bag of sentences"
  – ?. Rory has a crush on him. ?. Joe really likes Gilmore Girls. ?. It is his favorite television show.
  – becomes 1. Joe really likes Gilmore Girls. 2. It is his favorite television show. 3. Rory has a crush on him.
  – Application: multi-document summarization, producing one coherent text from multiple news sources (Evans et al. 2004)

Two Main Types of Models

● Local models – do adjacent sentences fit together smoothly?
  – Word Overlap and LSA (Foltz et al. 1998)
  – Naïve Entity Grid (Lapata and Barzilay 2005)
  – Relaxed Entity Grid (Elsner et al. 2007)
● Global models – does the document conform to a structured outline?
  – Hidden Markov Model (Barzilay and Lee 2004)

Word Overlap and Latent Semantic Analysis (LSA)

● Word overlap (LSA) – co-occurrence of words (meanings) in groups of adjacent sentences (a code sketch of this baseline appears after the entity grid discussion below)
● Pro:
  – Semantically similar sentences are likely to be close together in coherent texts (smooth topic transitions)
● Con:
  – Does not capture asymmetries
    ● "car" implies "wheels," but "wheels" does not imply "car"
    ● "car" as a subject vs. a syntactically unimportant "car"

Naïve Entity Grid

1: The commercial pilot, sole occupant of the airplane, was not injured.
2: The airplane was owned and operated by a private owner.
3: Visual meteorological conditions prevailed for the personal cross-country flight for which a VFR flight plan was filed.
4: The flight originated at Nuevo Laredo, Mexico, at approximately 1300.

[Figure: entity grid for the text above, recording each entity's syntactic role in each sentence]

● Entities are the heads of all noun phrases, extracted with the Charniak and Johnson (2005) parser
● Each entity's role is generated conditioned on its history of roles in the preceding sentences; a history of h = 2 works better than h = 1 (e.g., a transition from unknown to object, then from object to the next role in the grid)
● The probability of a document is the product of the probabilities of all syntactic-role transitions (S = subject, O = object, X = other, – = absent); a code sketch follows
● Transitions are also conditioned on an entity's number of occurrences (salience)
● Two assumptions:
  1. Markov independence between sentences (columns)
  2. Independence between entities (rows)
(Lapata and Barzilay 2005; Barzilay and Lapata 2005)
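To make the generative entity grid concrete, here is a minimal sketch of the role-transition scoring just described. It is illustrative only: the grid is a hand-built toy, the history length is h = 1, and the thesis model's salience conditioning and smoothing are replaced by simple add-alpha smoothing.

from collections import defaultdict
from math import log

ROLES = ["S", "O", "X", "-"]  # subject, object, other, absent

def transition_counts(grids, h=1):
    """Count role transitions (history -> next role) over training grids.
    Each grid maps an entity to its role sequence, one role per sentence."""
    counts = defaultdict(lambda: defaultdict(int))
    for grid in grids:
        for roles in grid.values():
            for i in range(h, len(roles)):
                history = tuple(roles[i - h:i])
                counts[history][roles[i]] += 1
    return counts

def grid_log_prob(grid, counts, h=1, alpha=0.1):
    """Log-probability of a document's grid: a product over entities and
    over sentence-to-sentence role transitions, with add-alpha smoothing."""
    lp = 0.0
    for roles in grid.values():
        for i in range(h, len(roles)):
            history = tuple(roles[i - h:i])
            num = counts[history][roles[i]] + alpha
            den = sum(counts[history].values()) + alpha * len(ROLES)
            lp += log(num / den)
    return lp

# Toy grid for the airplane text above, with roles assigned by hand.
train = [{"pilot":    ["S", "-", "-", "-"],
          "airplane": ["X", "O", "-", "-"],
          "flight":   ["-", "-", "X", "S"]}]
counts = transition_counts(train)
print(grid_log_prob(train[0], counts))

Under such a model the original sentence order should outscore a random permutation of the same grid, which is exactly the binary classification task.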
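A similar sketch of the word-overlap baseline from the Word Overlap and LSA slide above: score a document by the average cosine similarity between adjacent sentences' word-count vectors (substituting LSA vectors for raw counts gives the LSA variant). Again, this is an illustration, not the thesis implementation.

from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def overlap_coherence(sentences):
    """Average cosine similarity of adjacent sentence pairs."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    sims = [cosine(u, v) for u, v in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)

doc = ["Joe really likes Gilmore Girls.",
       "It is his favorite television show.",
       "Rory has a crush on him."]
print(overlap_coherence(doc))

Because cosine similarity is symmetric, this baseline cannot capture the car/wheels asymmetry noted above: "wheels" following "car" scores exactly the same as "car" following "wheels".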
Naïve Entity Grid

● Pro:
  – Based on Centering Theory
  – Coherent texts repeat important (salient) nouns
  – Entity role transitions are somewhat predictable: compare P(O | S in history) with P(O | – in history)
● Con:
  – The most likely transition is – to –, so the most likely document contains no nouns!
  – Entities are assumed independent

Relaxed Entity Grid

● Each sentence has two sets of entities, known and new
  – For now, ignore new entities
● Entities compete for role "slots":
  1: The commercial pilot, sole occupant of the airplane, was not injured.
  2: The _________ (O) was owned and operated by a private ______ (ignored).
  – The known entity "airplane" competes for the object slot; "owner" is a new entity and is ignored
● Creates a sparse distribution
  – Too many possible known-entity histories
● We use simple brute-force normalization
  – Works empirically, but is inconsistent

Local Coherence Analysis

[Figure: entity grids at very low zoom – in a coherent document, entities occur in long contiguous columns; in a randomly permuted document, the columns break into scattered fragments]

● But what if we flip the document? Or move paragraphs around? Local models, which only inspect small windows of adjacent sentences, are largely blind to such global rearrangements.

Hidden Markov Model (Global)

● Assume an underlying chain of topics generates the words of each sentence:
  – Recently, Britney Spears shaved her head because she went crazy.
  – Jon Stewart made fun of Britney Spears on the Daily Show.
● Each sentence is independent given its topic
● Each topic is independent given the previous topic
(Barzilay and Lee 2004; a minimal sketch of such a topic HMM appears after the Experiments slide below)

Global Coherence Analysis

● Pro:
  – Domain-general global coherence modeling
  – Calculates coherence from the flow of topics
  – Can impose order on a bag of sentences
● Con:
  – Either the states ignore local information or they are sparse:
    1. Recently, an actress shaved her head.
    2. The actress is Natalie Portman.
    3. She is not the first actress to shave her head on purpose.
  – Should there be a shaving_actress state? What about a shaving_singer state?
  – Local models track entity transitions...

Mixture Model Approach

● Log-linear weighted combination of models:
  – global HMM, Naïve Entity Grid, and two local word co-occurrence models
● Pro:
  – Improved performance on the sentence ordering task
● Con:
  – Models are independent of each other
  – Does not fix the HMM's state-sparsity problem
  – Training is task-specific
  – Non-generative – modularity?
(Soricut and Marcu 2006)

Our Combined Generative Model

[Figure: graphical model – a Markov chain of hidden states q_i; the state of sentence i generates its known entities, its new entities E_i, and its other words W_i]

● Formulated as a series of Dirichlet and Pitman-Yor processes
● Gibbs sampling to learn:
  – inner parameters, transition probabilities, and the number of hidden topic states
● Metropolis-Hastings sampling to learn:
  – hyperparameters
● Training is not task-specific
● Uses less information than the mixture model
  – Could it improve the mixture model's performance?

Experiments

● Airplane Corpus
  – 200 plane crash articles, avg. 11.5 sentences
● Binary Classification (local)
  – original vs. random orderings
● Sentence Ordering (global)
  – Our model uses simulated annealing to find an optimal ordering of the sentences (see the sketch below)
    ● Not scalable
  – Kendall's τ metric
    ● proportional to the minimum number of pairwise swaps
    ● ranges from -1 to 1; 0 signifies a random ordering
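As promised on the Hidden Markov Model slide, here is a minimal sketch of the global topic-HMM idea: score a document by the probability of its best topic sequence, computed with the Viterbi recursion. The two-topic setup, the bag-of-words emissions, and all numbers are toy assumptions for illustration; this is not the Barzilay and Lee (2004) content model itself.

import numpy as np

def best_topic_sequence_logprob(sent_word_ids, pi, A, B):
    """Log-probability of the best topic sequence for a document.
    pi: initial topic probabilities (K,); A: topic transitions (K, K);
    B: per-topic unigram word distributions (K, V);
    sent_word_ids: one list of word ids per sentence."""
    logB = np.log(B)
    # Bag-of-words log-likelihood of each sentence under each topic.
    sent_ll = np.array([[logB[k, ids].sum() for k in range(len(pi))]
                        for ids in sent_word_ids])
    delta = np.log(pi) + sent_ll[0]
    for t in range(1, len(sent_word_ids)):
        delta = (delta[:, None] + np.log(A)).max(axis=0) + sent_ll[t]
    return delta.max()

# Toy model: 2 topics over a 4-word vocabulary.
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2],
              [0.3, 0.7]])
B = np.array([[0.4, 0.4, 0.1, 0.1],   # topic 0 favors words 0 and 1
              [0.1, 0.1, 0.4, 0.4]])  # topic 1 favors words 2 and 3
doc = [[0, 1, 0], [1, 0], [2, 3, 2]]  # three sentences as word ids
print(best_topic_sequence_logprob(doc, pi, A, B))

Permuting the sentences changes which topic transitions A must explain, so an incoherent ordering typically receives a lower score.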
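The Experiments slide mentions a simulated annealing search over orderings. Here is a generic sketch, assuming some coherence scorer such as the entity grid log-probability sketched earlier; the proposal move (a random pair swap), the temperature schedule, and the step count are placeholder choices, not the thesis settings.

import math
import random

def anneal_order(sentences, score, steps=10000, t0=1.0, cooling=0.999):
    """Search for a high-scoring ordering by swapping random sentence pairs.
    `score` maps an ordering (a list of sentences) to a coherence score."""
    order = sentences[:]
    random.shuffle(order)
    best, best_score = order[:], score(order)
    cur_score, temp = best_score, t0
    for _ in range(steps):
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_score = score(order)
        # Always accept improvements; accept worsenings with Boltzmann probability.
        if new_score >= cur_score or random.random() < math.exp((new_score - cur_score) / temp):
            cur_score = new_score
            if cur_score > best_score:
                best, best_score = order[:], cur_score
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        temp *= cooling
    return best

Every step re-scores the whole document, which is why the slide notes that this search is not scalable.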
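Finally, a sketch of the Kendall's τ evaluation described above: τ = 1 - 2d / p, where d is the number of discordant pairs between the proposed and original orderings (equal to the minimum number of adjacent swaps needed to sort one into the other) and p = n(n-1)/2 is the total number of pairs.

def kendall_tau(predicted, original):
    """Kendall's tau between two orderings of the same sentences."""
    pos = {sent: i for i, sent in enumerate(original)}
    ranks = [pos[s] for s in predicted]
    n = len(ranks)
    discordant = sum(1 for i in range(n) for j in range(i + 1, n)
                     if ranks[i] > ranks[j])
    return 1.0 - 4.0 * discordant / (n * (n - 1))

print(kendall_tau([3, 1, 2], [1, 2, 3]))  # two discordant pairs -> tau = -1/3

A τ of 1 means the model reproduced the original order exactly; as the slide notes, 0 corresponds to a random ordering and -1 to a fully reversed one.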
Results on Test Corpus

Binary Classification: Airplane Test

  Model                             Discriminative (%)
  Barzilay and Lapata (SVM EGrid)   90
  Barzilay and Lee (HMM)            74
  Soricut and Marcu (Mixture)       n/a
  Unified (Relaxed EGrid/HMM)       94

Sentence Ordering: Airplane Test

  Model                             Kendall's τ
  Barzilay and Lapata (SVM EGrid)   n/a
  Barzilay and Lee (HMM)            0.44
  Soricut and Marcu (Mixture)       0.50
  Unified (Relaxed EGrid/HMM)       0.50

Relaxed Entity Grid Improvement

10-fold cross-validation results for different versions of our combined model on the Airplane development set:

  Model                             τ      Discr. (%)
  Generative EGrid                  0.17   81
  Relaxed EGrid                     0.02   87
  Unified (Generative EGrid/HMM)    0.39   85
  Unified (Relaxed EGrid/HMM)       0.54   96

● The Unified Relaxed EGrid gives the best performance on all tasks
● Why is the Naïve (Generative) EGrid better on τ than the Relaxed EGrid? 40% of the articles begin: "This is preliminary information, subject to change, and may contain errors. Any errors in this report will be corrected when the final report has been completed."

Acknowledgments

● Parents
● Eugene – great advisor, preacher of generative models, and thesis comments
● Mark – thesis comments
● Micha – pair programming, whiteboard work, thesis comments, and friendship
● Regina Barzilay and Mirella Lapata – comments
● BLLIP – fun research group and comments
● Friends!
● Karen T. Romer Foundation for summer funding