How does sentence order affect the understandability of documents?

Joe Austerweil*
Brown Laboratory for Linguistic Information Processing (BLLIP)
*The research underlying this thesis is joint work with Micha Elsner and Eugene Charniak.

Introduction

● Classic language models do not distinguish between sentence orderings
  – n-gram models score each sentence essentially independently of its neighbors, so every ordering of the same sentences receives the same probability
  – 1. Joe really likes Gilmore Girls. 2. It is his favorite television show. 3. Rory has a crush on him.
  – vs. 1. It is his favorite television show. 2. Rory has a crush on him. 3. Joe really likes Gilmore Girls.
● Document coherence is "a property of well-written texts that... [are] easier to understand than ... random ... sentences" (Lapata and Barzilay 2005)

Our Two Tasks

● Binary Classification: decide which of two versions of a document is the original and which is a random permutation
  – 1. Joe really likes Gilmore Girls. 2. It is his favorite television show. 3. Rory has a crush on him.
  – vs. 1. It is his favorite television show. 2. Joe really likes Gilmore Girls. 3. Rory has a crush on him.
  – Application: ETS essay grading (Miltsakaki and Kukich 2004)
● Sentence Ordering: impose an ordering on a "bag of sentences"
  – ?. Rory has a crush on him. ?. Joe really likes Gilmore Girls. ?. It is his favorite television show.
  – becomes 1. Joe really likes Gilmore Girls. 2. It is his favorite television show. 3. Rory has a crush on him.
  – Application: multi-document summarization, producing one coherent text from multiple news sources (Evans et al. 2004)

Two Main Types of Models

● Local models – do adjacent sentences fit together smoothly?
  – Word Overlap and LSA (Foltz et al. 1998)
  – Naïve Entity Grid (Lapata and Barzilay 2005)
  – Relaxed Entity Grid (Elsner et al. 2007)
● Global models – does the document conform to a structured outline?
  – Hidden Markov Model (Barzilay and Lee 2004)

Word Overlap and Latent Semantic Analysis (LSA)

● Word overlap (LSA) – co-occurrence of words (meanings) in groups of adjacent sentences (a code sketch of this baseline appears after the entity grid discussion below)
● Pro:
  – Semantically similar sentences are likely to be close together in coherent texts (smooth topic transitions)
● Con:
  – Does not capture asymmetries
    ● "car" implies "wheels," but "wheels" does not imply "car"
    ● "car" as a subject vs. a syntactically unimportant "car"

Naïve Entity Grid

1: The commercial pilot, sole occupant of the airplane, was not injured.
2: The airplane was owned and operated by a private owner.
3: Visual meteorological conditions prevailed for the personal cross-country flight for which a VFR flight plan was filed.
4: The flight originated at Nuevo Laredo, Mexico, at approximately 1300.

[Figure: entity grid for the text above, recording each entity's syntactic role in each sentence]

● Entities are the heads of all noun phrases, extracted with the Charniak and Johnson (2005) parser
● Each entity's role is generated conditioned on its history of roles in the preceding sentences; a history of h = 2 works better than h = 1 (e.g., a transition from unknown to object, then from object to the next role in the grid)
● The probability of a document is the product of the probabilities of all syntactic-role transitions (S = subject, O = object, X = other, – = absent); a code sketch follows
● Transitions are also conditioned on an entity's number of occurrences (salience)
● Two assumptions:
  1. Markov independence between sentences (columns)
  2. Independence between entities (rows)
(Lapata and Barzilay 2005; Barzilay and Lapata 2005)
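To make the generative entity grid concrete, here is a minimal sketch of the role-transition scoring just described. It is illustrative only: the grid is a hand-built toy, the history length is h = 1, and the thesis model's salience conditioning and smoothing are replaced by simple add-alpha smoothing.

from collections import defaultdict
from math import log

ROLES = ["S", "O", "X", "-"]  # subject, object, other, absent

def transition_counts(grids, h=1):
    """Count role transitions (history -> next role) over training grids.
    Each grid maps an entity to its role sequence, one role per sentence."""
    counts = defaultdict(lambda: defaultdict(int))
    for grid in grids:
        for roles in grid.values():
            for i in range(h, len(roles)):
                history = tuple(roles[i - h:i])
                counts[history][roles[i]] += 1
    return counts

def grid_log_prob(grid, counts, h=1, alpha=0.1):
    """Log-probability of a document's grid: a product over entities and
    over sentence-to-sentence role transitions, with add-alpha smoothing."""
    lp = 0.0
    for roles in grid.values():
        for i in range(h, len(roles)):
            history = tuple(roles[i - h:i])
            num = counts[history][roles[i]] + alpha
            den = sum(counts[history].values()) + alpha * len(ROLES)
            lp += log(num / den)
    return lp

# Toy grid for the airplane text above, with roles assigned by hand.
train = [{"pilot":    ["S", "-", "-", "-"],
          "airplane": ["X", "O", "-", "-"],
          "flight":   ["-", "-", "X", "S"]}]
counts = transition_counts(train)
print(grid_log_prob(train[0], counts))

Under such a model the original sentence order should outscore a random permutation of the same grid, which is exactly the binary classification task.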
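A similar sketch of the word-overlap baseline from the Word Overlap and LSA slide above: score a document by the average cosine similarity between adjacent sentences' word-count vectors (substituting LSA vectors for raw counts gives the LSA variant). Again, this is an illustration, not the thesis implementation.

from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity of two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def overlap_coherence(sentences):
    """Average cosine similarity of adjacent sentence pairs."""
    vecs = [Counter(s.lower().split()) for s in sentences]
    sims = [cosine(u, v) for u, v in zip(vecs, vecs[1:])]
    return sum(sims) / len(sims)

doc = ["Joe really likes Gilmore Girls.",
       "It is his favorite television show.",
       "Rory has a crush on him."]
print(overlap_coherence(doc))

Because cosine similarity is symmetric, this baseline cannot capture the car/wheels asymmetry noted above: "wheels" following "car" scores exactly the same as "car" following "wheels".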
Naïve Entity Grid

● Pro:
  – Based on Centering Theory
  – Coherent texts repeat important (salient) nouns
  – Entity role transitions are somewhat predictable: compare P(O | S in history) with P(O | – in history)
● Con:
  – The most likely transition is – to –, so the most likely document contains no nouns!
  – Entities are assumed independent

Relaxed Entity Grid

● Each sentence has two sets of entities, known and new
  – For now, ignore new entities
● Entities compete for role "slots":
  1: The commercial pilot, sole occupant of the airplane, was not injured.
  2: The _________ (O) was owned and operated by a private ______ (ignored).
  – The known entity "airplane" competes for the object slot; "owner" is a new entity and is ignored
● Creates a sparse distribution
  – Too many possible known-entity histories
● We use simple brute-force normalization
  – Works empirically, but is inconsistent

Local Coherence Analysis

[Figure: entity grids at very low zoom – in a coherent document, entities occur in long contiguous columns; in a randomly permuted document, the columns break into scattered fragments]

● But what if we flip the document? Or move paragraphs around? Local models, which only inspect small windows of adjacent sentences, are largely blind to such global rearrangements.

Hidden Markov Model (Global)

● Assume an underlying chain of topics generates the words of each sentence:
  – Recently, Britney Spears shaved her head because she went crazy.
  – Jon Stewart made fun of Britney Spears on the Daily Show.
● Each sentence is independent given its topic
● Each topic is independent given the previous topic
(Barzilay and Lee 2004; a minimal sketch of such a topic HMM appears after the Experiments slide below)

Global Coherence Analysis

● Pro:
  – Domain-general global coherence modeling
  – Calculates coherence from the flow of topics
  – Can impose order on a bag of sentences
● Con:
  – Either the states ignore local information or they are sparse:
    1. Recently, an actress shaved her head.
    2. The actress is Natalie Portman.
    3. She is not the first actress to shave her head on purpose.
  – Should there be a shaving_actress state? What about a shaving_singer state?
  – Local models track entity transitions...

Mixture Model Approach

● Log-linear weighted combination of models:
  – global HMM, Naïve Entity Grid, and two local word co-occurrence models
● Pro:
  – Improved performance on the sentence ordering task
● Con:
  – Models are independent of each other
  – Does not fix the HMM's state-sparsity problem
  – Training is task-specific
  – Non-generative – modularity?
(Soricut and Marcu 2006)

Our Combined Generative Model

[Figure: graphical model – a Markov chain of hidden states q_i; the state of sentence i generates its known entities, its new entities E_i, and its other words W_i]

● Formulated as a series of Dirichlet and Pitman-Yor processes
● Gibbs sampling to learn:
  – inner parameters, transition probabilities, and the number of hidden topic states
● Metropolis-Hastings sampling to learn:
  – hyperparameters
● Training is not task-specific
● Uses less information than the mixture model
  – Could it improve the mixture model's performance?

Experiments

● Airplane Corpus
  – 200 plane crash articles, avg. 11.5 sentences
● Binary Classification (local)
  – original vs. random orderings
● Sentence Ordering (global)
  – Our model uses simulated annealing to find an optimal ordering of the sentences (see the sketch below)
    ● Not scalable
  – Kendall's τ metric
    ● proportional to the minimum number of pairwise swaps
    ● ranges from -1 to 1; 0 signifies a random ordering
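As promised on the Hidden Markov Model slide, here is a minimal sketch of the global topic-HMM idea: score a document by the probability of its best topic sequence, computed with the Viterbi recursion. The two-topic setup, the bag-of-words emissions, and all numbers are toy assumptions for illustration; this is not the Barzilay and Lee (2004) content model itself.

import numpy as np

def best_topic_sequence_logprob(sent_word_ids, pi, A, B):
    """Log-probability of the best topic sequence for a document.
    pi: initial topic probabilities (K,); A: topic transitions (K, K);
    B: per-topic unigram word distributions (K, V);
    sent_word_ids: one list of word ids per sentence."""
    logB = np.log(B)
    # Bag-of-words log-likelihood of each sentence under each topic.
    sent_ll = np.array([[logB[k, ids].sum() for k in range(len(pi))]
                        for ids in sent_word_ids])
    delta = np.log(pi) + sent_ll[0]
    for t in range(1, len(sent_word_ids)):
        delta = (delta[:, None] + np.log(A)).max(axis=0) + sent_ll[t]
    return delta.max()

# Toy model: 2 topics over a 4-word vocabulary.
pi = np.array([0.5, 0.5])
A = np.array([[0.8, 0.2],
              [0.3, 0.7]])
B = np.array([[0.4, 0.4, 0.1, 0.1],   # topic 0 favors words 0 and 1
              [0.1, 0.1, 0.4, 0.4]])  # topic 1 favors words 2 and 3
doc = [[0, 1, 0], [1, 0], [2, 3, 2]]  # three sentences as word ids
print(best_topic_sequence_logprob(doc, pi, A, B))

Permuting the sentences changes which topic transitions A must explain, so an incoherent ordering typically receives a lower score.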
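The Experiments slide mentions a simulated annealing search over orderings. Here is a generic sketch, assuming some coherence scorer such as the entity grid log-probability sketched earlier; the proposal move (a random pair swap), the temperature schedule, and the step count are placeholder choices, not the thesis settings.

import math
import random

def anneal_order(sentences, score, steps=10000, t0=1.0, cooling=0.999):
    """Search for a high-scoring ordering by swapping random sentence pairs.
    `score` maps an ordering (a list of sentences) to a coherence score."""
    order = sentences[:]
    random.shuffle(order)
    best, best_score = order[:], score(order)
    cur_score, temp = best_score, t0
    for _ in range(steps):
        i, j = random.sample(range(len(order)), 2)
        order[i], order[j] = order[j], order[i]
        new_score = score(order)
        # Always accept improvements; accept worsenings with Boltzmann probability.
        if new_score >= cur_score or random.random() < math.exp((new_score - cur_score) / temp):
            cur_score = new_score
            if cur_score > best_score:
                best, best_score = order[:], cur_score
        else:
            order[i], order[j] = order[j], order[i]  # undo the swap
        temp *= cooling
    return best

Every step re-scores the whole document, which is why the slide notes that this search is not scalable.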
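Finally, a sketch of the Kendall's τ evaluation described above: τ = 1 - 2d / p, where d is the number of discordant pairs between the proposed and original orderings (equal to the minimum number of adjacent swaps needed to sort one into the other) and p = n(n-1)/2 is the total number of pairs.

def kendall_tau(predicted, original):
    """Kendall's tau between two orderings of the same sentences."""
    pos = {sent: i for i, sent in enumerate(original)}
    ranks = [pos[s] for s in predicted]
    n = len(ranks)
    discordant = sum(1 for i in range(n) for j in range(i + 1, n)
                     if ranks[i] > ranks[j])
    return 1.0 - 4.0 * discordant / (n * (n - 1))

print(kendall_tau([3, 1, 2], [1, 2, 3]))  # two discordant pairs -> tau = -1/3

A τ of 1 means the model reproduced the original order exactly; as the slide notes, 0 corresponds to a random ordering and -1 to a fully reversed one.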
Results on Test Corpus

Binary Classification: Airplane Test

  Model                             Discriminative (%)
  Barzilay and Lapata (SVM EGrid)   90
  Barzilay and Lee (HMM)            74
  Soricut and Marcu (Mixture)       n/a
  Unified (Relaxed EGrid/HMM)       94

Sentence Ordering: Airplane Test

  Model                             Kendall's τ
  Barzilay and Lapata (SVM EGrid)   n/a
  Barzilay and Lee (HMM)            0.44
  Soricut and Marcu (Mixture)       0.50
  Unified (Relaxed EGrid/HMM)       0.50

Relaxed Entity Grid Improvement

10-fold cross-validation results for different versions of our combined model on the Airplane development set:

  Model                             τ      Discr. (%)
  Generative EGrid                  0.17   81
  Relaxed EGrid                     0.02   87
  Unified (Generative EGrid/HMM)    0.39   85
  Unified (Relaxed EGrid/HMM)       0.54   96

● The Unified Relaxed EGrid gives the best performance on all tasks
● Why is the Naïve (Generative) EGrid better on τ than the Relaxed EGrid? 40% of the articles begin: "This is preliminary information, subject to change, and may contain errors. Any errors in this report will be corrected when the final report has been completed."

Acknowledgments

● Parents
● Eugene – great advisor, preacher of generative models, and thesis comments
● Mark – thesis comments
● Micha – pair programming, whiteboard work, thesis comments, and friendship
● Regina Barzilay and Mirella Lapata – comments
● BLLIP – fun research group and comments
● Friends!
● Karen T. Romer Foundation for summer funding