NLP course project Automatic headline generation

Project description
● The content of the course will include the most
fundamental parts of language processing:
○ Tokenization, sentence boundaries
○ Morphology
○ Syntax
○ Semantics
○ etc.
● For the practical project, we are going to focus on
a very concrete application scenario where you
can apply most of the material taught in the
course.
Project description
Headline generation:
Given a newswire article,
generate a very short summary
containing the main fact in the article.
Headline generation
With the possibility of commercial space travel becoming a
reality in the near future, we are facing some serious safety
hurdles before the industry “takes off”. Chief among these
issues is something called the Kessler Syndrome. Named after
a NASA scientist, it’s a theory that as the space around the
planet gets more crowded, collisions are more likely to occur,
adding even more debris orbiting the globe. If left unchecked,
the amount of junk floating around our planet could theoretically
block out the sun, let alone make for safe space travel. Swiss
scientists don’t want this to happen and have decided to do
something about it.
Headline generation
Switzerland working to clean up massive amounts of orbiting
debris
Headline generation
Switzerland Announces Plan to Clean Up Space Junk
Headline generation
Swiss Plan "Janitor Satellite" to Clean Up Space Junk
Headline generation
Swiss looks to tidy up space junk
Automatic Text Summarization
A summary is:
● "a concise representation of a document's content to enable
a reader to determine its relevance to a specific information"
(Johnson, 1995).
● "a text produced from one or more texts, that contains a
significant portion of the information in the original text(s),
and is no longer than half of the original text(s)" (Hovy, 2003).
Automatic text summarization:
● Methods and techniques for obtaining summaries of textual
documents in a fully automated way.
Headline Generation
Single-document summarization producing very short
summaries.
Examples (February 19, 2012):
Syria 'disintegrating under crippling sanctions'
NASA celebrates 50th anniversary of John Glenn's flight
Iranian warships sail into the Mediterranean
Latvians say "no" in Russian language vote
Dow nears psychological milestone: 13000
Greek cabinet backs extra austerity measures
Whitney Houston's journey 'home' ends with private burial
Headline Generation
Advantages (I):
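● The output is just one sentence, so there is little need to worry
about coherence and cohesion. Taken out of context, sentences such as
the following leave "he" and "the figure" unresolved: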
But he said President Bashar al-Assad's government would
fight to the end. Human rights groups have put the figure at
more than 7,000, while the government says at least 2,000
members of the security forces have been killed combating
"armed gangs and terrorists".
Headline Generation
Advantages (II):
● The first sentence in news articles is usually a summary.
Syria 'disintegrating under crippling sanctions'
One of Syria's leading businessmen says its economy is being crippled by
foreign sanctions and that the government is slowly disintegrating.
NASA celebrates 50th anniversary of John Glenn's flight
Veterans of NASA's Project Mercury reunited Saturday to celebrate the 50th
anniversary of John Glenn's orbital flight, visiting the old launch pad and
meeting the astronaut himself.
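The observation above suggests a simple baseline: use the first sentence of the article as the headline. The following is a minimal sketch, assuming a naive sentence splitter based on punctuation; the function name `lead_sentence` and the splitting rule are illustrative choices, not part of the course material.

```python
import re


def lead_sentence(article: str) -> str:
    """Return the first sentence of an article as a baseline headline."""
    # Collapse line breaks and repeated spaces into single spaces.
    text = " ".join(article.split())
    # Naive boundary (an assumption): ., ! or ? followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text, maxsplit=1)
    return parts[0]


article = (
    "One of Syria's leading businessmen says its economy is being crippled by\n"
    "foreign sanctions and that the government is slowly disintegrating. "
    "But he said President Bashar al-Assad's government would fight to the end."
)
print(lead_sentence(article))
```

A splitter this naive will break on abbreviations such as "U.S."; a proper sentence boundary detector would be needed for real data.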
Headline Generation
Advantages (III):
● Because we select only one sentence from one document, we do
not have to worry about removing redundancy in the summary.
Headline Generation
Main challenges:
● How to find the most relevant information in the source
document. We cannot always benefit from redundancy.
● How to generate a very short, yet grammatical summary.
● Many headlines are equally good. How to evaluate in a way that
discriminates valid variations of headlines from irrelevant
ones.
DUC and TAC competitions
● International competitions of automatic summarizers:
○ Document Understanding Conferences (DUC) from 2000
to 2007.
○ Text Analysis Conferences (TAC), from 2008.
● Each competition defines one or several tasks and provides:
○ Development data for participants to tune their
summarizers.
○ Test data (usually available the last week).
○ Manual and automatic evaluation.
DUC-2004
http://www-nlpir.nist.gov/projects/duc/guidelines/2004.html
Tasks that year:
1. Very short single-document summaries (<= 75 bytes).
(Headline Generation)
2. Multi-document summaries (<= 665 bytes).
3. Very short cross-lingual single-document summaries.
4. Cross-lingual multi-document summaries.
5. Multi-document query-focused summarization, e.g. as
an answer to [who is X?].
DUC-2004 Very Short Summaries
● A total of 500 documents.
● Generate a headline for each document (<= 75 bytes).
● Summaries over the length limit are truncated.
● Evaluation using ROUGE n-gram matching.
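Since over-long headlines are truncated, it can help to enforce the 75-byte limit yourself so that the cut does not fall in the middle of a word. The sketch below assumes UTF-8 encoding and a cut at the last complete word; the exact truncation rule applied by the organizers is not stated here, and both function names are hypothetical.

```python
def truncate_to_bytes(headline: str, limit: int = 75) -> str:
    """Cut a headline so that its UTF-8 encoding is at most `limit` bytes."""
    encoded = headline.encode("utf-8")
    if len(encoded) <= limit:
        return headline
    # errors="ignore" drops a multi-byte character split by the cut.
    return encoded[:limit].decode("utf-8", errors="ignore")


def truncate_at_word(headline: str, limit: int = 75) -> str:
    """Same limit, but back off to the last complete word."""
    cut = truncate_to_bytes(headline, limit)
    if cut == headline:
        return headline
    # The cut ended exactly at a word boundary: nothing to drop.
    if headline[len(cut)].isspace():
        return cut.rstrip()
    # Otherwise drop the partial last word (if there is an earlier space).
    head, _, _ = cut.rpartition(" ")
    return head.rstrip() if head else cut


long_headline = (
    "One of Syria's leading businessmen says its economy is being crippled "
    "by foreign sanctions and that the government is slowly disintegrating."
)
short = truncate_at_word(long_headline)
print(short, len(short.encode("utf-8")))
```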
DUC-2004 Very Short Summaries
ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
● Inspired by BLEU for Machine Translation evaluation.
● Developed by Chin-Yew Lin.
● Compares the automatic summary with a set of manual
summaries.
● Counts the number of overlapping n-grams.
● Many different variations of ROUGE metrics.
DUC-2004 Very Short Summaries
● BLEU = precision of n-grams
● ROUGE = recall of n-grams
M: n-grams in manual summaries
A: n-grams in automatic summary
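With M and A denoting the n-grams of the manual summaries and of the automatic summary, the two measures can be written as follows. This is a simplified set view that ignores repeated n-grams and extras such as BLEU's brevity penalty.

```latex
\[
\text{Precision} = \frac{|M \cap A|}{|A|},
\qquad
\text{Recall} = \frac{|M \cap A|}{|M|}
\]
```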
DUC-2004 Very Short Summaries
Example: ROUGE-1
M1: Switzerland working to clean up orbiting debris
M2: Switzerland Announces Plan to Clean Up Space Junk
A1: Swiss Plan "Janitor Satellite" to Clean Up Space Junk
Tokens in manual summaries: 15
Tokens in manual summaries which appear in the automatic
summary: 9
Recall = 9 / 15
DUC-2004 Very Short Summaries
Example: ROUGE-1
M1: Switzerland working to clean up orbiting debris
M2: Switzerland Announces Plan to Clean Up Space Junk
A2: Swiss looks to tidy up space junk
Tokens in manual summaries: 15
Tokens in manual summaries which appear in the automatic
summary: 6
Recall = 6 / 15
DUC-2004 Very Short Summaries
Example: ROUGE-1
M1: Switzerland working to clean up orbiting debris
M2: Switzerland Announces Plan to Clean Up Space Junk
A3: Space Junk Plan to Clean Up Switzerland
Tokens in manual summaries: 15
Tokens in manual summaries which appear in the automatic
summary: 11
Recall = 11 / 15, the highest of the three, although A3 reverses the
meaning: ROUGE-1 ignores word order.
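The counts in these examples can be reproduced with a short sketch of n-gram recall pooled over all manual summaries. It assumes lowercasing and a simple `\w+` tokenizer; the official ROUGE-1.5.5 script applies further processing (stemming, jackknifing, confidence intervals), so its scores on real data may differ. The function names are illustrative.

```python
import re
from collections import Counter
from typing import List, Tuple


def ngrams(text: str, n: int) -> Counter:
    """Count the lowercased word n-grams in a text."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_counts(candidate: str, references: List[str], n: int = 1) -> Tuple[int, int]:
    """Return (matched, total) reference n-grams, pooled over all references."""
    cand = ngrams(candidate, n)
    matched = total = 0
    for ref in references:
        ref_counts = ngrams(ref, n)
        # Counter & Counter keeps the minimum count per n-gram (clipping).
        matched += sum((ref_counts & cand).values())
        total += sum(ref_counts.values())
    return matched, total


manual = [
    "Switzerland working to clean up orbiting debris",
    "Switzerland Announces Plan to Clean Up Space Junk",
]
a1 = 'Swiss Plan "Janitor Satellite" to Clean Up Space Junk'
a3 = "Space Junk Plan to Clean Up Switzerland"

print(rouge_n_counts(a1, manual))  # (9, 15)
print(rouge_n_counts(a3, manual))  # (11, 15)
```

Recall is the first number divided by the second. Passing `n=2` counts bigrams instead, which makes the score sensitive to local word order.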
DUC-2004 Very Short Summaries
Other evaluation procedures:
● Extrinsic evaluation (e.g. long clicks on documents with
different titles or snippets; summaries for open-ended
answers in QA systems).
● Manual intrinsic evaluations (readability, informativeness,
coherence, overall quality, etc.)
● Many automatic metrics proposed, but ROUGE-SU4 (a
variation of ROUGE) ranks well in terms of rank correlation
to overall quality scores.
Techniques
Ideally, a summarizer would be implemented according to the
following architecture:
1. Process the input documents.
2. Understand them and transform them into a semantic representation
encoding what happened, what the entities involved were, the role of each
entity, and additional modifiers (time, location, etc.).
3. Reason over the semantic representation to decide what will
appear in the summary.
4. Generate the summary with Natural Language Generation.
But the technology is not yet ready to achieve > 95% readable,
informative summaries for every possible domain.
Techniques
Most existing summarizers use extractive technologies:
● Weight the sentences based on a combination of features:
○ Sentence position.
○ Sentence length.
○ Lexical cohesion scores (words in common with other
sentences, graph-based scoring algorithms, etc.).
○ Relevant entities mentioned in the sentence.
○ etc.
● Extract the top N sentences (N=1 for headlines) and remove
redundant sentences.
● Postprocessing. Sentence rewriting and compression.
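The extractive pipeline above can be sketched as a weighted sum of sentence features. This is a minimal illustration, not a tuned system: the three features (position, length, word overlap with other sentences), their formulas, and the default weights are all assumptions chosen for brevity, and entity features and postprocessing are left out.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: ., ! or ? followed by whitespace ends a sentence."""
    text = " ".join(text.split())
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"\w+", sentence.lower())


def rank_sentences(text: str, w_pos: float = 1.0, w_len: float = 0.5,
                   w_coh: float = 1.0) -> List[Tuple[float, str]]:
    """Score each sentence and return (score, sentence) pairs, best first."""
    sentences = split_sentences(text)
    token_sets = [set(tokenize(s)) for s in sentences]
    scored = []
    for i, (sent, words) in enumerate(zip(sentences, token_sets)):
        # Position: earlier sentences score higher.
        position = 1.0 / (i + 1)
        # Length: favor longer sentences, capped at 20 tokens.
        length = min(len(tokenize(sent)), 20) / 20.0
        # Cohesion: share of this sentence's words found in other sentences.
        others = set().union(*(t for j, t in enumerate(token_sets) if j != i))
        cohesion = len(words & others) / len(words) if words else 0.0
        scored.append((w_pos * position + w_len * length + w_coh * cohesion, sent))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


article = (
    "Chief among these issues is something called the Kessler Syndrome. "
    "Named after a NASA scientist, it's a theory that as the space around the "
    "planet gets more crowded, collisions are more likely to occur. "
    "Swiss scientists don't want this to happen and have decided to do "
    "something about it."
)
best_score, best_sentence = rank_sentences(article)[0]  # N=1 for headlines
print(best_sentence)
```

In practice the weights would be tuned on the development data against ROUGE scores.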
Techniques
DUC proceedings:
● http://duc.nist.gov/pubs.html
TAC proceedings:
● 2008: http://www.nist.gov/tac/publications/2008/papers.html
● 2009: http://www.nist.gov/tac/publications/2009/papers.html
● 2010: http://www.nist.gov/tac/publications/2010/papers.html
ACL Anthology:
● http://aclweb.org/anthology-new/
Getting started
1. Download the ROUGE evaluation script (you will need
to apply and wait for the response e-mail) from:
http://berouge.com/default.aspx
2. Download the data files at:
http://alfonseca.org/nlp-ethz.tar.gz
3. Decompress the file:
tar xzvf nlp-ethz.tar.gz
4. Look at the input documents and their format in the docs
directory:
ls nlp-ethz/docs
5. Browse the manual headlines (manual/) and the participants'
proposed headlines (peers/) in the eval directory:
ls nlp-ethz/eval/manual
ls nlp-ethz/eval/peers
Getting started
6. Run ROUGE on the data:
ROUGE/RELEASE-1.5.5/ROUGE-1.5.5.pl \
-e ROUGE/RELEASE-1.5.5/data \
-a -c 95 -b 75 \
-m -n 4 -w 1.2 t1.rouge.in > output
Organization aspects
● You can work individually or in groups of two.
● We now provide the development data on which you can
improve your systems. At the end of the term we will provide
you with the test data to verify that the systems were not
overfitting the development set. We will keep track of progress on
these sets.
● We will provide pointers to some additional off-the-shelf tools
that you can download and use, such as parsers.
● You can share the tools you develop that are useful to other
teams, e.g. to parse the input format of the files (this will be
valued in the project's final score).