Semi-supervised Structured Prediction Models

Enhancing Diversity, Coverage and Balance
for Summarization through Structure Learning
Liangda Li 1, Ke Zhou 1, Gui-Rong Xue 1, Hongyuan Zha 2, Yong Yu 1
1 Department of Computer Science, Shanghai Jiao-Tong University
2 College of Computing, Georgia Institute of Technology
WWW 2009
Outline
 Introduction
 Diversity, Coverage and Balance
 Optimization Problem and Structure Learning Framework
 Experiments
 Conclusion
Introduction
Example for Search Results
Example for News Browsing
Traditional summarization approaches
 Consider the task of summarization as a binary classification problem
 0-1 loss function
 Raises serious problems: redundancy, imbalance and low recall.
Robert H. Bork, who once hoped for a Supreme Court seat, instead stood before the nation's highest court Monday.
He was there in the capacity of a lawyer _ apparently making him the first defeated Supreme Court nominee ever to argue before the justices.
Representing Citibank in a big-stakes battle, Bork argued that U.S. banks with branch offices overseas should not be
required to pay depositors after foreign governments seize or freeze those accounts.
He was branded a conservative extremist by opponents.
Bork, nominated by then-President Reagan, has said he was the victim of a campaign of lies and distortions led by
liberals.
There were no references to that fight Monday as Bork engaged in debate with the justices and opposing lawyers
over the complexities of federal banking law and related matters.
Bork was treated much the same as any attorney who appears before the justices.
They questioned him vigorously, occasionally interrupting him for clarification and elaboration.
Justice Anthony M. Kennedy, who occupies the seat that eluded Bork, directed a few questions at Bork.
Bork, 63, a former federal appeals court judge, is a fellow at the American Enterprise Institute and will begin teaching
constitutional law in the fall at the George Mason University law school in Arlington, Va.
He will be paid $25,000 a year to teach one course each semester as a part-time professor.
Bork was only the sixth man this century to be denied a Supreme Court seat by the Senate and the 26th in its history.
Diversity, Coverage and Balance
Three Key Requirements in Summarization
 Diversity: fewer redundant sentences
 Coverage: little information loss
 Balance: emphasize the various aspects of the document in a balanced way
Example
 AP890616-0192 from DUC2001
13 AP890616-0192 4826
<DOC>
<DOCNO> AP890616-0192 </DOCNO>
<FILEID>AP-NR-06-16-89 0237EST</FILEID>
<FIRST>r f PM-Milken Bjt 06-16 0734</FIRST>
<SECOND>PM-Milken, Bjt,0757</SECOND>
<HEAD>Indicted Bond Trader Quits Drexel, Sets Up Own Firm</HEAD>
<BYLINE>By STEFAN FATSIS</BYLINE>
<BYLINE>AP Business Writer</BYLINE>
<DATELINE>NEW YORK (AP) </DATELINE>
<TEXT>
Michael Milken, the fallen Drexel Burnham Lambert financier, is striking out on his own.
But what isn't clear is whether loyal ex-colleagues will follow the Pied Piper of junk bonds to his
new firm _ and how long the venture will last with Milken facing a lengthy jail term.
……
 Generally three topics (views from different perspectives):
 Michael Milken himself
 Involved people
 The company
Example for Diversity
A (Milken himself): Milken, 42, resigned Thursday after 19 years at Drexel, where he began
a wildly successful career that helped reshape corporate America in the
1980s through the pioneering use of low-grade securities called junk bonds.
B (Milken himself): Milken, who made a reported $550 million in 1987, said he is forming a
consulting firm to assist companies that want to raise money to start up,
grow or stay in business.
C (Involved people): People involved in Milken's plans for the new firm, International Capital
Assets Group, said Milken does not intend to raid Drexel's Beverly Hills,
Calif., junk bond division, which he founded and ran until his March
indictment.
 B & C are better than A & B
 Select sentences belonging to different topics
Example for Coverage
A (relevant): Milken, who made a reported $550 million in 1987, said he is forming a
consulting firm to assist companies that want to raise money to start up,
grow or stay in business.
B (irrelevant): Milken joined Drexel full-time in 1970 after graduating from the
University of Pennsylvania's Wharton School.
 A is better than B
 Select sentences that are more relevant to one of the topics.
Example for Balance
A (Milken himself): Milken, who made a reported $550 million in 1987, said he is forming a
consulting firm to assist companies that want to raise money to start up,
grow or stay in business.
B (Milken himself): But he faces $1.85 billion in forfeitures of alleged illegal profits and a
lengthy jail term if convicted on a 98-count fraud and racketeering
indictment, the government's largest securities crime prosecution to date.
C (The company): A Drexel official, speaking on condition of anonymity, said the Wall Street
giant does not anticipate conflicts with Milken's new firm because it will not
be in the brokerage business.
D (The company): “Michael Milken made many important contributions to Drexel Burnham,
and his resignation, although not unexpected, is a sad event,” Drexel stated.
 A & B & C are better than B & C & D
 For each topic, select a share of sentences proportional to its weight.
Optimization Problem and Structure Learning Framework
Problem Formulation
 Predicting a summary: y* = argmax_y f(x,y)
 Learning a model: f(x,y) = <w,ψ(x,y)>
 Joint feature representation: ψ(x,y)
 Loss function: Δ(y, y’)
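The formulation above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes ψ(x,y) is the sum of per-sentence feature vectors, that a summary y is a tuple of k sentence indices, and that Δ is the fraction of reference sentences missed. The names `joint_features`, `score`, `predict` and `loss` are hypothetical, and the brute-force argmax is only feasible for very short documents.

```python
from itertools import combinations


def joint_features(sentence_feats, y):
    """psi(x, y). ASSUMPTION: sum of the feature vectors of the selected sentences."""
    dim = len(sentence_feats[0])
    psi = [0.0] * dim
    for i in y:
        for d in range(dim):
            psi[d] += sentence_feats[i][d]
    return psi


def score(w, sentence_feats, y):
    """f(x, y) = <w, psi(x, y)>."""
    psi = joint_features(sentence_feats, y)
    return sum(wd * pd for wd, pd in zip(w, psi))


def predict(w, sentence_feats, k):
    """y* = argmax_y f(x, y), by exhaustive search over all k-sentence subsets."""
    n = len(sentence_feats)
    return max(combinations(range(n), k), key=lambda y: score(w, sentence_feats, y))


def loss(y_true, y_pred):
    """Delta(y, y'). ASSUMPTION: fraction of reference sentences missing from y'."""
    missed = set(y_true) - set(y_pred)
    return len(missed) / len(y_true)
```

For example, with three sentences described by two features each, `predict([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], 2)` selects the pair of sentences whose summed features score highest under w.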
Structural Support Vector Machines (Tsochantaridis et al., 2005)
 Large margin approach
 The parameter c controls the tradeoff between model complexity and the sum of slack variables.
 The constraints require the ground-truth summary to score higher than any other candidate summary.
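The optimization problem itself did not survive on this slide. For reference, the n-slack, margin-rescaling formulation from the cited paper (Tsochantaridis et al., 2005) has the form below. Whether the slide showed exactly this variant, including the c/n scaling, is an assumption; the symbols w, ψ, Δ and c are those of the slides, and ξ_i denotes the slack variable for training pair (x_i, y_i).

```latex
\begin{aligned}
\min_{w,\ \xi \ge 0} \quad & \frac{1}{2}\lVert w \rVert^2 + \frac{c}{n}\sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \forall i,\ \forall y \ne y_i:\quad
\langle w, \psi(x_i, y_i) \rangle - \langle w, \psi(x_i, y) \rangle \ \ge\ \Delta(y_i, y) - \xi_i
\end{aligned}
```

Each constraint asks the ground-truth summary y_i to beat a competing summary y by a margin that grows with the loss Δ(y_i, y), up to the slack ξ_i.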
Constraint for Diversity
 Diversity: little overlap
 The sum of the summary sentences' individual scores should be no more than
the overall score of the sentences taken as a whole set.
 Each sentence should focus on a different subtopic
Constraint for Coverage
 Coverage: cover all subtopics as much as possible
 Vector v: a sentence’s coverage of the subtopic set.
 Subtopic Coverage Degree
Subtopic Set
 A subtopic set T is defined for each document;
each subtopic is associated with a set of words
 cover(t, s) defines sentence s's coverage of subtopic t:
the proportion of the words in subtopic t
that also appear in sentence s.
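The definition of cover(t, s) translates directly into code. The sketch below assumes a simple lowercase, alphanumeric tokenizer (the slides do not say how words are extracted), and the subtopic word set in the usage note is made up for illustration. The function `coverage_vector` computes the vector v from the coverage constraint, one entry per subtopic.

```python
import re


def tokenize(text):
    """ASSUMPTION: words are lowercase runs of letters and digits."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def cover(subtopic_words, sentence):
    """Proportion of the words in subtopic t that also appear in sentence s."""
    words = {w.lower() for w in subtopic_words}
    if not words:
        return 0.0
    return len(words & tokenize(sentence)) / len(words)


def coverage_vector(subtopics, sentence):
    """Vector v: the sentence's coverage of each subtopic in the subtopic set."""
    return [cover(t, sentence) for t in subtopics]
```

With the hypothetical subtopic `{"consulting", "firm", "companies", "money"}` and the sentence "Milken said he is forming a consulting firm to assist companies.", three of the four subtopic words appear, so the coverage is 0.75.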
Example
 Each topic may own several subtopics; the number of subtopics indicates its
importance
 Topic: Michael Milken himself
 Subtopic: Milken’s contribution, Milken’s fall, Milken’s current situation.
 For subtopic t: Milken’s contribution
 a: Milken, who made a reported $550 million in 1987, said he is forming
a consulting firm to assist companies that want to raise money to start
up, grow or stay in business.
b: But he faces $1.85 billion in forfeitures of alleged illegal profits and a
lengthy jail term if convicted on a 98-count fraud and racketeering
indictment, the government's largest securities crime prosecution to
date.
c: “I am naturally disappointed to be forced to leave Drexel as part of
the firm's settlement with the government, but I look forward to the
opportunity of helping people build companies,” Michael Milken said in a
statement.
cover(t,a)=1; cover(t,b)=0; cover(t,c)=0.16.
Constraint for Balance
 Balance: relatively equal coverage for each subtopic
 Variation of subtopics’ coverage:
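The formula for the variation is missing from this slide, so the following is only a sketch of one plausible reading. It assumes that a summary covers a subtopic as well as its best sentence does (maximum of cover over the summary's sentences) and that "variation" means the population variance of those per-subtopic coverages. Both choices, and the function names, are assumptions rather than the authors' definition. Sentences and subtopics are given as sets of words.

```python
from statistics import pvariance


def cover(subtopic_words, sentence_words):
    """Proportion of subtopic words that also appear in the sentence."""
    return len(subtopic_words & sentence_words) / len(subtopic_words)


def subtopic_coverages(subtopics, summary):
    """ASSUMPTION: a summary covers a subtopic as well as its best sentence does."""
    return [max(cover(t, s) for s in summary) for t in subtopics]


def coverage_variation(subtopics, summary):
    """ASSUMPTION: variation = population variance of the per-subtopic coverages."""
    return pvariance(subtopic_coverages(subtopics, summary))
```

Under this reading a balanced summary has low variation: covering two subtopics half each gives 0, while covering one fully and the other not at all gives 0.25.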
Combined Optimization Problem
Structure Learning
 Independence Graphs
 Measure the similarity between sentences
 Shrink the search space
 Learning Algorithm
 Cutting plane algorithm
 Making Prediction
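The core step of a cutting plane algorithm for structural SVMs is to find, for each training pair, the most violated constraint and add it to a working set. The sketch below shows only that step. It does not re-solve the quadratic program, and it does not use independence graphs: candidates are enumerated by brute force. The additive feature map, the loss, and all function names are assumptions for illustration.

```python
from itertools import combinations


def score(w, feats, y):
    """<w, psi(x, y)>. ASSUMPTION: psi sums the selected sentences' feature vectors."""
    return sum(w[d] * feats[i][d] for i in y for d in range(len(w)))


def delta(y_true, y):
    """ASSUMPTION: loss = fraction of reference sentences missing from y."""
    return len(set(y_true) - set(y)) / len(y_true)


def most_violated(w, feats, y_true, k):
    """Loss-augmented inference: argmax_y delta(y_true, y) + <w, psi(x, y)>."""
    candidates = combinations(range(len(feats)), k)
    return max(candidates, key=lambda y: delta(y_true, y) + score(w, feats, y))


def violation(w, feats, y_true, y):
    """How far y falls short of the required margin (positive means violated)."""
    return delta(y_true, y) - (score(w, feats, y_true) - score(w, feats, y))


def update_working_set(working_set, w, feats, y_true, k, slack=0.0, eps=1e-3):
    """Add the most violated constraint if it exceeds the current slack by eps."""
    y_hat = most_violated(w, feats, y_true, k)
    if violation(w, feats, y_true, y_hat) > slack + eps and y_hat not in working_set:
        working_set.append(y_hat)
        return True
    return False
```

A full learner would alternate this step with re-optimizing w over the working set until no constraint is added for any training pair.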
Experiments
Experimental Setup
 Dataset: DUC2001
 Bigset, Docset1, Docset2
 Bigset: contains 147 document-summary pairs from the DUC2001 dataset
 Docset1, Docset2: the two main subsets of Bigset
 Evaluation Metric
 F1 Evaluation
 ROUGE Evaluation
 Comparable to F1 evaluation
 ROUGE-N-R, ROUGE-N-P, ROUGE-N-F
gram_n denotes the n-grams in document y
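The ROUGE-N formulas are not reproduced on this slide. As a reminder of the usual definitions, the sketch below computes recall, precision and F-measure from clipped n-gram overlap between a candidate summary and a single reference. It assumes pre-tokenized input and a balanced F-measure, and it is not the official ROUGE toolkit; the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams (gram_n) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """Return (ROUGE-N-R, ROUGE-N-P, ROUGE-N-F) for one candidate/reference pair."""
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())  # clipped count of matching n-grams
    recall = overlap / sum(ref.values()) if ref else 0.0
    precision = overlap / sum(cand.values()) if cand else 0.0
    if recall + precision == 0:
        return recall, precision, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return recall, precision, f_measure
```

For instance, the candidate "the cat sat" against the reference "the cat sat on the mat" matches 3 of the 6 reference unigrams and all 3 candidate unigrams, giving a ROUGE-1 recall of 0.5 and a precision of 1.0.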
Overall Performance
 Our approach performs best
 Results on the smaller data sets show the robustness of our approach
Constraint Selection
 The coverage-biased constraint makes the greatest contribution to summarization.
 The model trained with all three constraints performs best.
Conclusion
 Diversity, Coverage and Balance
 Proved to be of great importance to the summarization task.
 Structural Learning Framework
 Structural SVM
 Three constraints enforce diversity, coverage and balance separately
 Independence graphs and Cutting plane algorithm
 Experimental Results
 Our approach outperforms state-of-the-art approaches.
 The constraints improve the performance significantly.
Thank you!