talk - Minjoon Seo

Towards End-to-End Reasoning for Question Answering
Minjoon Seo
University of Washington
September 21, 2016
@ SKT-Brain
What is reasoning?
Simple Question Answering Model
What is “Hello” in French?
Bonjour.
Examples
• Most neural machine translation systems (Cho et al., 2014; Bahdanau et al., 2014)
  • Need very high hidden state size (~1000)
  • No need to query the database (context) → very fast
• Most dependency, constituency parsers (Chen et al., 2014; Klein et al., 2003)
• Sentiment classification (Socher et al., 2013)
  • Classifying whether a sentence is positive or negative
• Most neural image classification systems
  • The question is always “What is in the image?”
• Most classification systems
Simple Question Answering Model
What is “Hello” in French?
Bonjour.
Problem: parametric model has finite, pre-defined capacity.
“You can’t even fit in an entire sentence into a single small vector!”
QA Model with Context
What is “Hello” in French?
Bonjour.
Context (Knowledge Base):
English      French
Hello        Bonjour
Thank you    Merci
Examples
• WikiQA (Yang et al., 2015)
• QASent (Wang et al., 2007)
• WebQuestions (Berant et al., 2013)
• WikiAnswer (Wikia)
• Free917 (Cai and Yates, 2013)
• Many deep learning models with external memory (e.g. Memory Networks)
QA Model with Context
What does a frog eat?
Fly
Context (Knowledge Base):
Eats                   IsA
(Amphibian, insect)    (Frog, amphibian)
(insect, flower)       (Fly, insect)
Something is missing…
QA Model with Reasoning Capability
What does a frog eat?
Fly
First Order Logic:
IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)
Context (Knowledge Base):
Eats                   IsA
(Amphibian, insect)    (Frog, amphibian)
(insect, flower)       (Fly, insect)
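The rule on this slide can be applied mechanically to the four facts in the knowledge base. The following is a minimal sketch, assuming that (Amphibian, insect) and (insect, flower) belong to Eats and that (Frog, amphibian) and (Fly, insect) belong to IsA, which is how the two columns of the slide appear to read. The function names are hypothetical and chosen only for illustration.

```python
# Toy knowledge base read off the slide (the column assignment is an assumption).
is_a = {("frog", "amphibian"), ("fly", "insect")}
eats = {("amphibian", "insect"), ("insect", "flower")}


def infer_eats(is_a, eats):
    """Apply IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C) once."""
    derived = set()
    for a, b in is_a:
        for c, d in is_a:
            if (b, d) in eats:
                derived.add((a, c))
    return derived


def what_does_it_eat(subject, is_a, eats):
    """Answer 'What does <subject> eat?' from stated plus derived Eats facts."""
    facts = eats | infer_eats(is_a, eats)
    return sorted(obj for subj, obj in facts if subj == subject)


print(what_does_it_eat("frog", is_a, eats))  # prints ['fly']
```

Without the rule, no Eats fact mentions the frog at all, which is the sense in which something is missing from a pure lookup model.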
Examples
• Semantic parsing
  • GeoQA (Krishnamurthy et al., 2013; Artzi et al., 2015)
• Science questions
  • Aristo Challenge (Clark et al., 2015)
  • ProcessBank (Berant et al., 2014)
• Machine comprehension
  • MCTest (Richardson et al., 2013)
“Vague” line between factoid QA and reasoning QA
• Factoid:
  • The required information is explicit in the context
  • The model often needs to handle lexical / syntactic variations
• Reasoning:
  • The required information may not be explicit in the context
  • Need to combine multiple facts to derive the answer
• There is no clear line between the two!
If our objective is to “answer” difficult questions…
• We can try to make the machine more capable of reasoning (better model)
OR
• We can try to make more information explicit in the context (more data)
QA Model with Reasoning Capability
What does a frog eat?
Fly
“Who makes this?” “Tell me it’s not me…”
First Order Logic:
IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)
Context (Knowledge Base):
Eats                   IsA
(Amphibian, insect)    (Frog, amphibian)
(insect, flower)       (Fly, insect)
End-to-end QA Model with Reasoning Capability
What does a frog eat?
Fly
Context in natural language:
Frog is an example of amphibian.
Flies are one of the most common insects around us.
Insects are good sources of protein for amphibians.
…
Is end-to-end always feasible?
• No. End-to-end systems perform poorly if either:
  • Data is limited
  • Logic is super complicated
• But not hopeless.
[Figure: Reasoning Level vs. End-to-end-ness, with Geometry QA (2015), Stanford QA (2016), bAbI QA (2016), and Diagram QA (2016) placed on the chart]
Geometry QA
In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD?
a) 2   b) 4   c) 6   d) 8   e) 10
[Diagram: circle O with labeled points A, B, C, D, E and labeled lengths 2 and 5]
Geometry QA Model
What is the length of BD?
8
Local context: In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD.
Global context: First Order Logic
Method
• Learn to map question to logical form
• Learn to map local context to logical form
  • Text → logical form
  • Diagram → logical form
• Global context is already formal!
  • Manually defined
  • “If AB = BC, then ∠CAB = ∠ACB”
• Solver on all logical forms
  • We created a reasonable numerical solver
Mapping question / text to logical form
Text input:
In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC.
(a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17
[Diagram: triangle ABC with points D and E]
Logical form:
IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))
Difficult to directly map text to a long logical form!
Mapping question / text to logical form
Text input:
In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC.
(a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17
[Diagram: triangle ABC with points D and E]
Our method:
Over-generated literals      Text scores    Diagram scores
IsTriangle(ABC)              0.96           1.00
Parallel(AC, DE)             0.91           0.99
Parallel(AC, DB)             0.74           0.02
Equals(LengthOf(DB), 4)      0.97           n/a
Equals(LengthOf(AD), 8)      0.94           n/a
Equals(LengthOf(DE), 5)      0.94           n/a
Equals(4, LengthOf(AD))      0.31           n/a
…                            …              …
Selected subset → Logical form:
IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))
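To make the over-generate-then-select idea concrete, here is a deliberately simple sketch. The slide does not say how the subset is chosen, so the combination rule (multiply text and diagram scores, treat "n/a" as no diagram evidence) and the 0.5 threshold are assumptions made only for illustration. The scores are the ones shown on the slide.

```python
# (literal, text score, diagram score); None stands for "n/a" on the slide.
candidates = [
    ("IsTriangle(ABC)", 0.96, 1.00),
    ("Parallel(AC, DE)", 0.91, 0.99),
    ("Parallel(AC, DB)", 0.74, 0.02),
    ("Equals(LengthOf(DB), 4)", 0.97, None),
    ("Equals(LengthOf(AD), 8)", 0.94, None),
    ("Equals(LengthOf(DE), 5)", 0.94, None),
    ("Equals(4, LengthOf(AD))", 0.31, None),
]


def select_literals(candidates, threshold=0.5):
    """Keep literals whose combined score clears a (hypothetical) threshold."""
    selected = []
    for literal, text_score, diagram_score in candidates:
        # Assumed rule: multiply the scores when a diagram score exists.
        combined = text_score if diagram_score is None else text_score * diagram_score
        if combined >= threshold:
            selected.append(literal)
    return selected


selected = select_literals(candidates)
# The Find(...) literal comes from the question itself and is appended as-is.
logical_form = " ∧ ".join(selected + ["Find(LengthOf(AC))"])
print(logical_form)
```

With these numbers the sketch keeps the same five literals as the selected subset on the slide: Parallel(AC, DB) is dropped because the diagram contradicts it, and Equals(4, LengthOf(AD)) is dropped because its text score is low.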
Numerical solver
• Translate literals to numeric equations
Literal                       Equation
Equals(LengthOf(AB), d)       (A_x - B_x)^2 + (A_y - B_y)^2 - d^2 = 0
Parallel(AB, CD)              (A_x - B_x)(C_y - D_y) - (A_y - B_y)(C_x - D_x) = 0
PointLiesOnLine(B, AC)        (A_x - B_x)(B_y - C_y) - (A_y - B_y)(B_x - C_x) = 0
Perpendicular(AB, CD)         (A_x - B_x)(C_x - D_x) + (A_y - B_y)(C_y - D_y) = 0
• Find the solution to the equation system
• Use off-the-shelf numerical minimizers (Wales and Doye, 1997; Kraft, 1988)
• Numerical solver can choose not to answer question
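The literal-to-equation table can be written as residual functions whose squared sum a minimizer would drive to zero. The sketch below only evaluates that objective on a hand-picked coordinate assignment for the circle question from the Geometry QA slide; it does not implement the minimization. The coordinates, the assumption that E is where AC meets BD, and all function names are illustrative choices, not part of the original system.

```python
def length_equals(a, b, d):
    """Equals(LengthOf(AB), d): (Ax-Bx)^2 + (Ay-By)^2 - d^2."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2 - d ** 2


def parallel(a, b, c, d):
    """Parallel(AB, CD): (Ax-Bx)(Cy-Dy) - (Ay-By)(Cx-Dx)."""
    return (a[0] - b[0]) * (c[1] - d[1]) - (a[1] - b[1]) * (c[0] - d[0])


def point_lies_on_line(b, a, c):
    """PointLiesOnLine(B, AC): (Ax-Bx)(By-Cy) - (Ay-By)(Bx-Cx)."""
    return (a[0] - b[0]) * (b[1] - c[1]) - (a[1] - b[1]) * (b[0] - c[0])


def perpendicular(a, b, c, d):
    """Perpendicular(AB, CD): (Ax-Bx)(Cx-Dx) + (Ay-By)(Cy-Dy)."""
    return (a[0] - b[0]) * (c[0] - d[0]) + (a[1] - b[1]) * (c[1] - d[1])


def objective(residuals):
    """Sum of squared residuals; zero means every literal is satisfied."""
    return sum(r * r for r in residuals)


# Hand-picked coordinates (an assumption) for: radius 5, CE = 2, AC ⊥ BD.
points = {"O": (0, 0), "A": (0, -5), "C": (0, 5),
          "E": (0, 3), "B": (-4, 3), "D": (4, 3)}


def residuals(p, bd):
    """Residuals of the question's literals plus a candidate length for BD."""
    return [
        length_equals(p["O"], p["A"], 5),
        length_equals(p["O"], p["B"], 5),
        length_equals(p["O"], p["C"], 5),
        length_equals(p["O"], p["D"], 5),
        length_equals(p["C"], p["E"], 2),
        point_lies_on_line(p["O"], p["A"], p["C"]),
        point_lies_on_line(p["E"], p["A"], p["C"]),
        point_lies_on_line(p["E"], p["B"], p["D"]),
        perpendicular(p["A"], p["C"], p["B"], p["D"]),
        length_equals(p["B"], p["D"], bd),
    ]


print(objective(residuals(points, 8)))  # prints 0
print(objective(residuals(points, 6)))  # prints 784
```

Answer choice d) 8 is consistent with all literals, while choice c) 6 leaves a nonzero residual. A real solver would search over the coordinates instead of fixing them by hand.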
Dataset
• Training questions (67 questions, 121 sentences)
  • Seo et al., 2014
  • High school geometry questions
• Test questions (119 questions, 215 sentences)
  • We collected them
  • SAT (US college entrance exam) geometry questions
• We manually annotated the text parse of all questions
Results (EMNLP 2015)
[Bar chart: SAT Score (%), axis from 0 to 60, for Text only, Diagram only, Rule-based, GeoS, and Student average]
*** 0.25 penalty for incorrect answer
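The penalty matters because the solver is allowed to abstain. A small sketch of such a scoring rule follows, assuming the score is the penalized number of correct answers divided by the total number of questions; that normalization and the counts in the example are assumptions, since the slide gives only the 0.25 penalty.

```python
def sat_score(num_correct, num_incorrect, num_questions, penalty=0.25):
    """Percentage score; unanswered questions add and subtract nothing."""
    raw = num_correct - penalty * num_incorrect
    return 100.0 * raw / num_questions


# Hypothetical counts: 10 questions, 6 correct, 2 wrong, 2 left unanswered.
print(sat_score(6, 2, 10))  # prints 55.0
# Same system, but guessing wrongly on the 2 questions it could have skipped.
print(sat_score(6, 4, 10))  # prints 50.0
```

Under this rule, declining to answer is better than a wrong answer, which is why a solver that can choose not to answer is useful here.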
Limitations
• Dataset is small
• Required level of reasoning is very high
• → A lot of manual effort (annotations, rule definitions, etc.)
• → End-to-end system is simply hopeless
• Collect more data?
• Change task?
• Curriculum learning? (Do more hopeful tasks first?)
[Figure: Reasoning Level vs. End-to-end-ness, with Geometry QA (2015), Stanford QA (2016), bAbI QA (2016), and Diagram QA (2016) placed on the chart]
Diagram QA
Q: The process of water being heated by sun and becoming gas is called
A: Evaporation
Is DQA subset of VQA?
• Diagrams and real images are very different
• Diagram components are simpler than real images
• Diagram contains a lot of information in a single image
• Diagrams are few (whereas real images are almost infinitely many)
Problem
What comes before second feed?
8
Difficult to latently learn relationships
Strategy
What does a frog eat?
Fly
Diagram Graph
Diagram Parsing
Question Answering
Attention visualization
Results (ECCV 2016)
Method               Training data    Accuracy
Random (expected)    -                25.00
LSTM + CNN           VQA              29.06
LSTM + CNN           AI2D             32.90
Ours                 AI2D             38.47
Limitations
• You need a lot of prior knowledge to answer some questions!
  • E.g. “Fly is an insect”, “Frog is an amphibian”
• You can’t really call this reasoning…
  • Rather a matching algorithm
  • No complex inference involved
[Figure: Reasoning Level vs. End-to-end-ness, with Geometry QA (2015), Stanford QA (2016), bAbI QA (2016), and Diagram QA (2016) placed on the chart]
bAbI QA
• Weston et al., 2015 (Facebook)
• Synthetically generated reasoning story-question pairs
• 20 tasks, 1k questions in each task
• Each story can be as long as 200 sentences
• Requires reasoning over multiple sentences
• Should be trained end-to-end (no manual rules or external language resources)
• Passed a task if accuracy >= 95%
Task Examples
Previous work
• RNN: Tested as baseline by Weston et al. (2015)
  • Performs very poorly; hidden state is inherently unstable for long-term dependency
• Softmax attention mechanism (Sukhbaatar et al., 2015, Xiong et al., 2016)
  • Uses shared external memory with softmax attention mechanism
  • Attend on different facts over several layers
  • DMN: Combines RNN and attention mechanism
  • Problem:
    • Vanilla softmax attention cannot distinguish between similar sentences at different time steps.
    • Cannot capture time locality information.
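The time-locality problem can be seen in a few lines: content-based softmax attention without any positional signal returns the same summary no matter how the facts are ordered. The sketch below uses made-up 3-d vectors with dimensions read as (mentions Daniel, hallway, garden); the vectors and function names are assumptions for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, facts):
    """Content-based softmax attention: weighted sum of fact vectors."""
    scores = [sum(q * f for q, f in zip(query, fact)) for fact in facts]
    weights = softmax(scores)
    dim = len(facts[0])
    return [sum(w * fact[i] for w, fact in zip(weights, facts)) for i in range(dim)]


# Toy encodings (assumed): [mentions Daniel, hallway, garden].
went_to_hallway = [1.0, 1.0, 0.0]
went_to_garden = [1.0, 0.0, 1.0]
where_is_daniel = [1.0, 0.0, 0.0]

story_a = [went_to_hallway, went_to_garden]  # Daniel ends in the garden
story_b = [went_to_garden, went_to_hallway]  # Daniel ends in the hallway

print(attend(where_is_daniel, story_a))
print(attend(where_is_daniel, story_b))
```

Both stories give the same attended vector, so the model cannot tell which location is the most recent one unless extra temporal information is added.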
Query-regression networks
• Name comes from “Logic Regression” (not linear regression)
• Transforming the original query to an easier-to-answer query, in vector space
• Pure RNN-based model
  • Completely internal memory
  • Single unit recurring over time and layers (simple)
  • Although RNN, does not suffer from long-term dependency problem
  • Take full advantage of RNN’s capability to model sequential data
  • Can be considered as using “sigmoid attention”
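One way to read “sigmoid attention” is as a per-step gate in a recurrence: each fact is scored against the query independently with a sigmoid, and the gate decides whether the running state is overwritten or carried forward. The sketch below is a toy gated recurrence in that spirit, not the QRN equations: the gate form, the `sharpness` constant, the use of the fact vector itself as the candidate state, and the toy encodings are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_pass(query, facts, sharpness=10.0):
    """Sweep over facts in order; relevant facts overwrite the running state."""
    state = [0.0] * len(facts[0])
    for fact in facts:
        relevance = sum(q * f for q, f in zip(query, fact))
        # Assumed gate: close to 1 for relevant facts, close to 0 otherwise.
        gate = sigmoid(sharpness * (relevance - 0.5))
        state = [gate * f + (1.0 - gate) * s for f, s in zip(fact, state)]
    return state


# Toy encodings (assumed): [mentions Daniel, hallway, garden].
went_to_hallway = [1.0, 1.0, 0.0]
went_to_garden = [1.0, 0.0, 1.0]
where_is_daniel = [1.0, 0.0, 0.0]

ends_in_garden = gated_pass(where_is_daniel, [went_to_hallway, went_to_garden])
ends_in_hallway = gated_pass(where_is_daniel, [went_to_garden, went_to_hallway])
print(ends_in_garden)
print(ends_in_hallway)
```

Unlike a softmax over all facts, the gates are not normalized against each other, so the most recent relevant fact dominates the final state and the two orderings give different results.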
Query-regression networks
[Figure: the QRN unit unrolled over the story sentences and over layers. Story: “Sandra got the apple there.”, “Sandra dropped the apple.”, “Daniel took the apple there.”, “Sandra went to the hallway.”, “Daniel journeyed to the garden.” Question: “Where is the apple?” Intermediate queries after each sentence: “Where is Sandra?”, “Where is Sandra?”, “Where is Daniel?”, “Where is Daniel?”, “Where is Daniel?” Predicted answer: garden]
Parallelization
Results on bAbI QA 1k
Model                                                   # of Tasks Passed    Average Accuracy (%)
LSTM (Weston et al., 2015)                              0                    48.7
End-to-end Memory Networks (Sukhbaatar et al., 2015)    10                   84.8
QRN (2 layers)                                          13                   90.1
QRN (3 layers)                                          15                   88.7
Qualitative Results of QRN
Results on bAbI QA 10k*
Model                                                    # of Tasks Passed    Average Accuracy (%)
End-to-end Memory Networks (Sukhbaatar et al., 2015)     17                   95.8
Dynamic Memory Networks Improved (Xiong et al., 2016)    19                   97.2
QRN (2 layers)                                           18                   96.8
Limitations
• Okay, the reasoning process is interesting…
• But “this is a fake dataset”! (by anonymous reviewers)
[Figure: Reasoning Level vs. End-to-end-ness, with Geometry QA (2015), Stanford QA (2016), bAbI QA (2016), and Diagram QA (2016) placed on the chart]
SQuAD (Stanford QA)
• Recently released: June 2016
• 100k+ paragraph-question-answer triples
• Paragraphs from most popular articles in Wikipedia
• Answer is the subphrase of the paragraph
Stanford QA vs Other “Big” QA Datasets
• CNN / DailyMail (Hermann et al., 2015)
  • Google DeepMind
  • Document-Summary pairs from web
  • Cloze test on summary (fill in the blank)
• Children’s Book Test (Hill et al., 2015)
  • Facebook AI Research
  • Project Gutenberg: Children’s books
  • Cloze test on 21st sentence
• Takeaway: Cloze test, and crawled data
• Stanford QA is direct question, and carefully controlled (turked)
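Model: Co-Attention
[Figure: architecture, bottom to top: Word Embedding → LSTM (preprocessing) → Attention → LSTM (postprocessing) → MLP + softmax. The context “Barack Obama is the president of the U.S.” and the question “Who leads the United States?” each pass through Word Embedding, LSTM (preprocessing), and Attention. Output: two indices, 0 and 1]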
Embedding Module
• Word embedding is fragile against unseen words
• Char embedding can’t easily learn semantics of words
• Use both!
• Char embedding as proposed by Yoon Kim (2015)
[Figure: “Seattle” → word embedding; “Seattle” → character CNN + Max Pooling; the two are concatenated into the embedding vector]
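The “use both” idea amounts to concatenating a word-level lookup with a character-level vector obtained by convolving over the characters and max-pooling over positions. Below is a dependency-free sketch of that pipeline. All numbers are made up: `char_vector`, `WORD_TABLE`, `UNKNOWN`, and `FILTERS` are hypothetical stand-ins for parameters that would normally be learned.

```python
def char_vector(ch):
    """Hypothetical 2-d character embedding derived from the code point."""
    return [(ord(ch) % 5) / 5.0, (ord(ch) % 3) / 3.0]


def char_cnn_max_pool(char_vectors, filters):
    """1-D convolution over character vectors, then max pooling over time."""
    pooled = []
    for filt in filters:  # each filter is a list of `width` weight vectors
        width = len(filt)
        # Pad short words with zero vectors so at least one window exists.
        pad = [[0.0] * len(filt[0])] * max(0, width - len(char_vectors))
        padded = char_vectors + pad
        responses = []
        for start in range(len(padded) - width + 1):
            window = padded[start:start + width]
            responses.append(sum(w * x
                                 for weights, vec in zip(filt, window)
                                 for w, x in zip(weights, vec)))
        pooled.append(max(responses))
    return pooled


WORD_TABLE = {"Seattle": [0.2, -0.1, 0.4]}  # toy word embeddings
UNKNOWN = [0.0, 0.0, 0.0]                   # fallback for unseen words
FILTERS = [                                 # two toy filters of width 2
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 1.0], [1.0, -1.0]],
]


def embed(word):
    """Concatenate the word-level and character-level representations."""
    word_part = WORD_TABLE.get(word, UNKNOWN)
    char_part = char_cnn_max_pool([char_vector(c) for c in word], FILTERS)
    return word_part + char_part


print(embed("Seattle"))
print(embed("Seatle"))  # unseen spelling: word part is UNKNOWN, char part is not
```

The misspelled word falls back to the unknown word vector, yet its character part still carries signal, which is the motivation for combining the two.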
Attention Mechanism: Motivation
While Seattle’s weather is very nice in summer, its weather is very rainy in winter, making it one of the most gloomy cities in the U.S.
Q: Which city is gloomy in winter?
Attention Mechanism
• Theoretically, RNN can propagate information over a long distance through its recurrent state
• Practically, this is very difficult
  • Inherently unstable state, even using LSTM (Weston et al., 2014)
  • State size is fixed (Bahdanau et al., 2014)
• Attention provides shortcut access to distant information
• Co-Attention: question attends on context, and context attends on question. Similar in spirit to, but fundamentally different from, Lu et al. (2016).
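Attention in both directions can be sketched from a single similarity matrix: normalizing each row gives context-to-question weights, and normalizing each column gives question-to-context weights. The slides do not specify the similarity function or how the attended vectors are used, so the dot-product similarity, the toy 2-d vectors, and the function names below are assumptions, and this is a generic illustration instead of the model presented here or that of Lu et al. (2016).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bidirectional_attention(context, question):
    """Return (context-to-question, question-to-context) attention weights."""
    sim = [[dot(c, q) for q in question] for c in context]
    # Each context word distributes attention over the question words.
    c2q = [softmax(row) for row in sim]
    # Each question word distributes attention over the context words.
    q2c = [softmax([sim[i][j] for i in range(len(context))])
           for j in range(len(question))]
    return c2q, q2c


# Toy vectors (assumed): dimension 0 ~ "place", dimension 1 ~ "gloominess".
context = [[2.0, 0.0], [0.0, 1.0], [0.0, 2.0]]  # Seattle, weather, gloomy
question = [[2.0, 0.0], [0.0, 2.0]]             # city, gloomy

c2q, q2c = bidirectional_attention(context, question)
print(q2c[0])  # weights of "city" over the three context words
print(c2q[0])  # weights of "Seattle" over the two question words
```

In this toy setup the question word “city” attends mostly to “Seattle”, and “Seattle” attends mostly to “city”, which is the kind of shortcut between distant words that the motivation slide calls for.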
Results: Metric
• Each question is answered by 2-5 different people (by indicating the answer phrase in the paragraph)
• Exact Match: the answer exactly matches one of the answers
• F1 Score: harmonic mean of precision and recall
• “The actors were paid $1.5 million on average.”
• Q: Who were paid more than $1 million on average?
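Both metrics can be computed at the token level against several human answers, taking the best match. The sketch below assumes simple lowercasing and whitespace tokenization, which is cruder than a full evaluation script, and the two reference answers for the example sentence are hypothetical.

```python
from collections import Counter


def tokens(text):
    """Assumed normalization: lowercase and split on whitespace."""
    return text.lower().split()


def exact_match(prediction, references):
    """True if the prediction matches any reference token for token."""
    return any(tokens(prediction) == tokens(ref) for ref in references)


def f1(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred, ref = tokens(prediction), tokens(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, references):
    """Score against the most favorable of the human answers."""
    return max(f1(prediction, ref) for ref in references)


# Hypothetical human answers to "Who were paid more than $1 million on average?"
references = ["The actors", "actors"]

print(exact_match("the actors", references))        # prints True
print(exact_match("actors were paid", references))  # prints False
print(round(best_f1("actors were paid", references), 3))  # prints 0.5
```

The over-long prediction fails Exact Match but still earns partial F1 credit because it contains the reference token “actors”.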
Results on Dev (Sept. 20, 2016)
Model                           Exact Match (%)    F1 (%)
Baseline (June 2016)            39.0               51.0
Attention and Chunking (IBM)    48.0               64.5
Match LSTM v1 (Singapore)       54.8               68.0
Match LSTM v2 (Singapore)       59.4               70.0
Neural Chunker (IBM)            61.8               70.7
Co-Attention (Ours)             62.2               72.6
Attention Visualization
[Figure: Reasoning Level vs. End-to-end-ness, with Geometry QA (2015), Stanford QA (2016), bAbI QA (2016), and Diagram QA (2016) placed on the chart, plus the annotation “How about here?”]
Important questions
• Is fully end-to-end reasoning system feasible with reasonable amount of data? → Probably no
• How to balance between:
  • data size
  • model priors (manually defined rules, annotations, etc.)
• How to naturally incorporate model priors (which might be structured data) into the model?
Thank you!
• [email protected]
• http://seominjoon.github.io