NLP course project
Automatic headline generation

Project description
● The content of the course will include the most fundamental parts of language processing:
○ Tokenization, sentence boundaries
○ Morphology
○ Syntax
○ Semantics
○ etc.
● For the practical project, we are going to focus on a very concrete application scenario where you can apply most of the material taught in the course.

Headline generation: Given a newswire article, generate a very short summary containing the main fact in the article.

Headline generation
Example article:
With the possibility of commercial space travel becoming a reality in the near future, we are facing some serious safety hurdles before the industry “takes off”. Chief among these issues is something called the Kessler Syndrome. Named after a NASA scientist, it’s a theory that as the space around the planet gets more crowded, collisions are more likely to occur, adding even more debris orbiting the globe. If left unchecked, the amount of junk floating around our planet could theoretically block out the sun, let alone make for safe space travel. Swiss scientists don’t want this to happen and have decided to do something about it.

Candidate headlines for this article:
● Switzerland working to clean up massive amounts of orbiting debris
● Switzerland Announces Plan to Clean Up Space Junk
● Swiss Plan "Janitor Satellite" to Clean Up Space Junk
● Swiss looks to tidy up space junk

Automatic Text Summarization
A summary is:
● "a concise representation of a document's content to enable a reader to determine its relevance to a specific information need" (Johnson, 1995).
● "a text produced from one or more texts, that contains a significant portion of the information in the original text(s), and is no longer than half of the original text(s)" (Hovy, 2003).
Automatic text summarization:
● Methods and techniques for obtaining summaries of textual documents in a fully automated way.

Headline Generation
Single-document summarization producing very short summaries.
Examples (February 19, 2012):
● Syria 'disintegrating under crippling sanctions'
● NASA celebrates 50th anniversary of John Glenn's flight
● Iranian warships sail into the Mediterranean
● Latvians say "no" in Russian language vote
● Dow nears psychological milestone: 13000
● Greek cabinet backs extra austerity measures
● Whitney Houston's journey 'home' ends with private burial
Advantages (I):
● The output is just one sentence, so there is much less need to worry about coherence and cohesion. In a multi-sentence extract, sentences such as these lose their referents ("he", "the figure") once taken out of context:
"But he said President Bashar al-Assad's government would fight to the end. Human rights groups have put the figure at more than 7,000, while the government says at least 2,000 members of the security forces have been killed combating "armed gangs and terrorists"."

Advantages (II):
● The first sentence in news articles is usually a summary. For example (headline, then the article's first sentence):
○ Syria 'disintegrating under crippling sanctions'
  One of Syria's leading businessmen says its economy is being crippled by foreign sanctions and that the government is slowly disintegrating.
○ NASA celebrates 50th anniversary of John Glenn's flight
  Veterans of NASA's Project Mercury reunited Saturday to celebrate the 50th anniversary of John Glenn's orbital flight, visiting the old launch pad and meeting the astronaut himself.

Advantages (III):
● Because only one sentence is selected from a single document, there is no redundancy to remove from the summary.

Main challenges:
● How to find the most relevant information in the source document. With a single source, we cannot always rely on redundancy (the same fact repeated across documents) to signal relevance.
● How to generate a very short, yet grammatical, summary.
● How to evaluate: many headlines are equally good, so evaluation must discriminate valid variations of a headline from irrelevant ones.

DUC and TAC competitions
● International competitions of automatic summarizers:
○ Document Understanding Conferences (DUC), from 2000 to 2007.
○ Text Analysis Conferences (TAC), from 2008.
● Each competition defines one or several tasks and provides:
○ Development data for participants to tune their summarizers.
○ Test data (usually released during the last week).
○ Manual and automatic evaluation.
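Advantage (II) suggests a simple "lead" baseline: use the article's first sentence as its headline. The Python sketch below is one way to build it, under two assumptions of ours rather than the course's: sentences end at '.', '!' or '?' followed by whitespace (a naive rule that breaks on abbreviations such as "U.S."), and the output is cut at a word boundary to the 75-byte budget of the DUC-2004 headline task. The function names first_sentence, truncate_bytes and lead_headline are hypothetical.

```python
import re


def first_sentence(text: str) -> str:
    """Return the first sentence of `text`.

    Naive assumption: a sentence ends at '.', '!' or '?' followed by
    whitespace. Abbreviations such as "U.S." will be split wrongly.
    """
    text = text.strip()
    end = re.search(r"[.!?](?=\s)", text)
    return text[: end.end()] if end else text


def truncate_bytes(headline: str, limit: int = 75) -> str:
    """Cut `headline` to at most `limit` UTF-8 bytes, preferring a word boundary."""
    data = headline.encode("utf-8")
    if len(data) <= limit:
        return headline
    # Keep one extra byte so a space right after the limit counts as a boundary;
    # errors="ignore" drops a multi-byte character split by the cut.
    window = data[: limit + 1].decode("utf-8", errors="ignore")
    space = window.rfind(" ")
    if space > 0:
        return window[:space].rstrip()
    # No space to back off to: hard cut at the byte limit.
    return data[:limit].decode("utf-8", errors="ignore")


def lead_headline(article: str, limit: int = 75) -> str:
    """Lead baseline: first sentence of the article, truncated to `limit` bytes."""
    return truncate_bytes(first_sentence(article), limit)


if __name__ == "__main__":
    article = (
        "With the possibility of commercial space travel becoming a reality in "
        "the near future, we are facing some serious safety hurdles before the "
        'industry "takes off". Chief among these issues is something called '
        "the Kessler Syndrome."
    )
    print(lead_headline(article))
    # prints: With the possibility of commercial space travel becoming a reality in the
```

On the Kessler Syndrome article the lead sentence is about commercial space travel, while every candidate headline reports the Swiss plan mentioned only in the last sentence, so lead is a baseline to beat rather than a solution.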
DUC-2004
http://www-nlpir.nist.gov/projects/duc/guidelines/2004.html
Tasks that year:
1. Very short single-document summaries (<= 75 bytes). (Headline generation)
2. Multi-document summaries (<= 665 bytes).
3. Very short cross-lingual single-document summaries.
4. Cross-lingual multi-document summaries.
5. Multi-document query-focused summaries, e.g. answering "Who is X?".

DUC-2004 Very Short Summaries
● A total of 500 documents.
● Generate a headline for each document (<= 75 bytes).
● Summaries over the length limit are truncated.
● Evaluation using ROUGE n-gram matching.

ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
● Inspired by BLEU for Machine Translation evaluation.
● Developed by Chin-Yew Lin.
● Compares the automatic summary with a set of manual summaries.
● Counts the number of overlapping n-grams.
● Many different variations of ROUGE metrics.

With M the n-grams in the manual summaries and A the n-grams in the automatic summary:
● BLEU is based on n-gram precision: $|M \cap A| / |A|$
● ROUGE is based on n-gram recall: $|M \cap A| / |M|$

Example: ROUGE-1 (token matching ignores case)
M1: Switzerland working to clean up orbiting debris
M2: Switzerland Announces Plan to Clean Up Space Junk

A1: Swiss Plan "Janitor Satellite" to Clean Up Space Junk
Tokens in the manual summaries: 15
Tokens in the manual summaries that appear in the automatic summary: 9
Recall = 9 / 15

A2: Swiss looks to tidy up space junk
Tokens in the manual summaries: 15
Tokens in the manual summaries that appear in the automatic summary: 6
Recall = 6 / 15

A3: Space Junk Plan to Clean Up Switzerland
Tokens in the manual summaries: 15
Tokens in the manual summaries that appear in the automatic summary: 11
Recall = 11 / 15, the highest of the three, even though A3 is not a sensible headline: ROUGE-1 ignores word order.

Other evaluation procedures:
● Extrinsic evaluation (e.g. long clicks on documents shown with different titles or snippets; summaries as answers to open-ended questions in QA systems).
● Manual intrinsic evaluation (readability, informativeness, coherence, overall quality, etc.).
● Many automatic metrics have been proposed, but ROUGE-SU4 (a variant of ROUGE) ranks well in terms of rank correlation with manual overall-quality scores.

Techniques
Ideally, a summarizer would be implemented according to the following architecture:
1. Process the input documents.
2. Understand them and transform them into a semantic representation encoding what happened, which entities were involved, the role of each entity, and additional modifiers (time, location, etc.).
3. Reason over (run inference on) the semantic representation to decide what will appear in the summary.
4. Generate the summary with Natural Language Generation.
But the technology is not yet ready to produce more than 95% readable, informative summaries for every possible domain.

Most existing summarizers use extractive techniques:
● Weight the sentences based on a combination of features:
○ Sentence position.
○ Sentence length.
○ Lexical cohesion scores (words in common with other sentences, graph-based scoring algorithms, etc.).
○ Relevant entities mentioned in the sentence.
○ etc.
● Extract the top N sentences (N=1 for headlines) and remove redundant sentences.
● Postprocessing: sentence rewriting and compression.

DUC proceedings:
● http://duc.nist.gov/pubs.html
TAC proceedings:
● 2008: http://www.nist.gov/tac/publications/2008/papers.html
● 2009: http://www.nist.gov/tac/publications/2009/papers.html
● 2010: http://www.nist.gov/tac/publications/2010/papers.html
ACL Anthology:
● http://aclweb.org/anthology-new/

Getting started
1. Download the ROUGE evaluation script from http://berouge.com/default.aspx (you will need to apply and wait for the response e-mail).
2. Download the data files at: http://alfonseca.org/nlp-ethz.tar.gz
3. Decompress the file:
  gunzip -c nlp-ethz.tar.gz | tar xvf -
4. Look at the input documents in the docs directory and their format:
  ls nlp-ethz/docs
5. Browse the manual headlines (manual/) and the participants' proposed headlines (peers/) in the eval directory:
  ls nlp-ethz/eval/manual
  ls nlp-ethz/eval/peers
6. Run ROUGE on the data:
  ROUGE/RELEASE-1.5.5/ROUGE-1.5.5.pl \
    -e ROUGE/RELEASE-1.5.5/data \
    -a -c 95 -b 75 -m -n 4 -w 1.2 t1.rouge.in > output
  (-a evaluates all systems, -c 95 reports 95% confidence intervals, -b 75 uses only the first 75 bytes of each summary, -m applies Porter stemming, -n 4 computes ROUGE-1 through ROUGE-4, -w 1.2 sets the ROUGE-W weight.)

Organization aspects
● You can work individually or in 2-person groups.
● We now provide the development data on which you can improve your systems. At the end of the term we will provide the test data to verify that the systems are not overfitting the development set. We will keep track of progress on both sets.
● We will provide pointers to some additional off-the-shelf tools that you can download and use, such as parsers.
● You can share the tools you develop that are useful to other teams, e.g. a reader for the input file format (this will be valued in the project's final score).
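To check hand counts like the ROUGE-1 examples in the DUC-2004 section (M1 and M2 against A1, A2 and A3), here is a minimal Python sketch of unigram recall. It assumes lowercasing, tokenization on alphanumeric runs, and counts pooled over all manual summaries; it is not the ROUGE-1.5.5 script from step 6, and the names tokens and rouge1_counts are hypothetical. Requires Python 3.9+ for the list[str] annotations.

```python
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    """Lowercase and split into alphanumeric tokens (punctuation such as quotes is dropped)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1_counts(candidate: str, references: list[str]) -> tuple[int, int]:
    """Return (matched, total) unigram counts for ROUGE-1 recall.

    total   = number of tokens in all reference (manual) summaries
    matched = reference tokens that also occur in the candidate, clipped so a
              word is matched at most as often as it appears in the candidate
    Counts are pooled over all references. No stemming or stopword removal.
    """
    cand = Counter(tokens(candidate))
    matched = total = 0
    for ref in references:
        ref_counts = Counter(tokens(ref))
        total += sum(ref_counts.values())
        # Counter returns 0 for words absent from the candidate.
        matched += sum(min(n, cand[word]) for word, n in ref_counts.items())
    return matched, total


if __name__ == "__main__":
    manual = [
        "Switzerland working to clean up orbiting debris",
        "Switzerland Announces Plan to Clean Up Space Junk",
    ]
    automatic = [
        'Swiss Plan "Janitor Satellite" to Clean Up Space Junk',
        "Swiss looks to tidy up space junk",
        "Space Junk Plan to Clean Up Switzerland",
    ]
    for headline in automatic:
        matched, total = rouge1_counts(headline, manual)
        print(f"{matched}/{total} = {matched / total:.2f}  {headline}")
```

The official script also applies stemming (-m) and byte truncation (-b), and its way of combining scores across several manual summaries may differ from this pooled count, so use the sketch only to sanity-check small examples, not to report results.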