Towards End-to-End Reasoning for Question Answering
Minjoon Seo
University of Washington
September 21, 2016 @ SKT-Brain

What is reasoning?

Simple Question Answering Model
What is “Hello” in French?
Bonjour.

Examples
• Most neural machine translation systems (Cho et al., 2014; Bahdanau et al., 2014)
  • Need very high hidden state size (~1000)
  • No need to query the database (context) → very fast
• Most dependency and constituency parsers (Chen et al., 2014; Klein et al., 2003)
• Sentiment classification (Socher et al., 2013)
  • Classifying whether a sentence is positive or negative
• Most neural image classification systems
  • The question is always “What is in the image?”
• Most classification systems

Simple Question Answering Model
What is “Hello” in French?
Bonjour.
Problem: a parametric model has finite, pre-defined capacity.
“You can’t even fit an entire sentence into a single small vector!”

QA Model with Context
What is “Hello” in French?
Bonjour.
Context (Knowledge Base):
  English     French
  Hello       Bonjour
  Thank you   Merci

Examples
• WikiQA (Yang et al., 2015)
• QASent (Wang et al., 2007)
• WebQuestions (Berant et al., 2013)
• WikiAnswer (Wikia)
• Free917 (Cai and Yates, 2013)
• Many deep learning models with external memory (e.g. Memory Networks)

QA Model with Context
What does a frog eat?
Fly
Context (Knowledge Base):
  Eats                  IsA
  (Amphibian, insect)   (Frog, amphibian)
  (insect, flower)      (Fly, insect)
Something is missing…

QA Model with Reasoning Capability
What does a frog eat?
Fly
Context (Knowledge Base):
  Eats                  IsA
  (Amphibian, insect)   (Frog, amphibian)
  (insect, flower)      (Fly, insect)
First Order Logic:
  IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)

Examples
• Semantic parsing
  • GeoQA (Krishnamurthy et al., 2013; Artzi et al., 2015)
• Science questions
  • Aristo Challenge (Clark et al., 2015)
  • ProcessBank (Berant et al., 2014)
• Machine comprehension
  • MCTest (Richardson et al., 2013)

“Vague” line between factoid QA and reasoning QA
• Factoid:
  • The required information is explicit in the context
  • The model often needs to handle lexical/syntactic variations
• Reasoning:
  • The required information may not be explicit in the context
  • Need to combine multiple facts to derive the answer
• There is no clear line between the two!

If our objective is to “answer” difficult questions…
• We can try to make the machine more capable of reasoning (better model)
OR
• We can try to make more information explicit in the context (more data)

QA Model with Reasoning Capability
What does a frog eat?
Fly
Context (Knowledge Base):
  Eats                  IsA
  (Amphibian, insect)   (Frog, amphibian)
  (insect, flower)      (Fly, insect)
First Order Logic:
  IsA(A, B) ∧ IsA(C, D) ∧ Eats(B, D) → Eats(A, C)
Who makes this? Tell me it’s not me…

End-to-end QA Model with Reasoning Capability
What does a frog eat?
Fly
Context in natural language:
  Frog is an example of amphibian.
  Flies are one of the most common insects around us.
  Insects are good sources of protein for amphibians.
  …

Is end-to-end always feasible?
• No. End-to-end systems perform poorly if either:
  • Data is limited
  • Logic is super complicated
• But not hopeless.

[Figure: tasks placed by reasoning level against end-to-end-ness: GeometryQA (2015), StanfordQA (2016), bAbI QA (2016), DiagramQA (2016)]

GeometryQA
In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD. What is the length of BD?
a) 2  b) 4  c) 6  d) 8  e) 10
[Diagram: circle O with points A, B, C, D, E; labeled lengths 2 and 5]

GeometryQA Model
What is the length of BD?
8
In the diagram at the right, circle O has a radius of 5, and CE = 2. Diameter AC is perpendicular to chord BD.
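The answer 8 can be checked by hand. The following is a short worked derivation, assuming (from the diagram, not the problem text) that E is the point where diameter AC meets chord BD and that E lies between O and C. It also uses the standard fact that a perpendicular from the center of a circle bisects the chord.

```latex
\begin{align*}
OE &= OC - CE = 5 - 2 = 3 \\
BE &= \sqrt{OB^2 - OE^2} = \sqrt{5^2 - 3^2} = 4 \\
BD &= 2 \cdot BE = 8
\end{align*}
```

This matches choice d) and shows the kind of multi-step inference the model has to reproduce from text and diagram together.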
[Figure labels: Local context, First Order Logic, Global context]

Method
• Learn to map question to logical form
• Learn to map local context to logical form
  • Text → logical form
  • Diagram → logical form
• Global context is already formal!
  • Manually defined
  • “If AB = BC, then ∠CAB = ∠ACB”
• Solver on all logical forms
  • We created a reasonable numerical solver

Mapping question/text to logical form
Text input:
  In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC.
  (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17
  [Diagram: triangle ABC with points D and E]
Logical form:
  IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))
Difficult to directly map text to a long logical form!

Mapping question/text to logical form
Text input:
  In triangle ABC, line DE is parallel with line AC, DB equals 4, AD is 8, and DE is 5. Find AC.
  (a) 9 (b) 10 (c) 12.5 (d) 15 (e) 17
  [Diagram: triangle ABC with points D and E]
Our method:
  Over-generated literals     Text scores   Diagram scores
  IsTriangle(ABC)             0.96          1.00
  Parallel(AC, DE)            0.91          0.99
  Parallel(AC, DB)            0.74          0.02
  Equals(LengthOf(DB), 4)     0.97          n/a
  Equals(LengthOf(AD), 8)     0.94          n/a
  Equals(LengthOf(DE), 5)     0.94          n/a
  Equals(4, LengthOf(AD))     0.31          n/a
  …                           …             …
Selected subset → logical form:
  IsTriangle(ABC) ∧ Parallel(AC, DE) ∧ Equals(LengthOf(DB), 4) ∧ Equals(LengthOf(AD), 8) ∧ Equals(LengthOf(DE), 5) ∧ Find(LengthOf(AC))

Numerical solver
• Translate literals to numeric equations
  Literal                     Equation
  Equals(LengthOf(AB), d)     $(A_x - B_x)^2 + (A_y - B_y)^2 - d^2 = 0$
  Parallel(AB, CD)            $(A_x - B_x)(C_y - D_y) - (A_y - B_y)(C_x - D_x) = 0$
  PointLiesOnLine(B, AC)      $(A_x - B_x)(B_y - C_y) - (A_y - B_y)(B_x - C_x) = 0$
  Perpendicular(AB, CD)       $(A_x - B_x)(C_x - D_x) + (A_y - B_y)(C_y - D_y) = 0$
• Find the solution to the equation system
• Use off-the-shelf numerical minimizers (Wales and Doye, 1997; Kraft, 1988)
• Numerical solver can choose not to answer a question

Dataset
• Training questions (67 questions, 121 sentences)
  • Seo et al., 2014
  • High school geometry questions
• Test questions (119 questions, 215 sentences)
  • We collected them
  • SAT (US college entrance exam) geometry questions
• We manually annotated the text parse of all questions

Results (EMNLP 2015)
[Bar chart: SAT score (%), y-axis 0 to 60; bars for Text only, Diagram only, Rule-based, GeoS, Student average]
*** 0.25 penalty for incorrect answer

Limitations
• Dataset is small
• Required level of reasoning is very high
• → A lot of manual effort (annotations, rule definitions, etc.)
• → End-to-end system is simply hopeless
• Collect more data?
• Change task?
• Curriculum learning? (Do more hopeful tasks first?)

[Figure: tasks placed by reasoning level against end-to-end-ness: GeometryQA (2015), StanfordQA (2016), bAbI QA (2016), DiagramQA (2016)]

DiagramQA
Q: The process of water being heated by sun and becoming gas is called
A: Evaporation

Is DQA a subset of VQA?
• Diagrams and real images are very different
• Diagram components are simpler than real images
• A diagram contains a lot of information in a single image
• Diagrams are few (whereas real images are almost infinitely many)

Problem
What comes before second feed? 8
Difficult to latently learn relationships

Strategy
What does a frog eat?
Fly
[Figure: pipeline with stages Diagram Parsing, Diagram Graph, Question Answering]

Attention visualization

Results (ECCV 2016)
  Method              Training data   Accuracy
  Random (expected)   -               25.00
  LSTM + CNN          VQA             29.06
  LSTM + CNN          AI2D            32.90
  Ours                AI2D            38.47

Limitations
• You need a lot of prior knowledge to answer some questions!
  • E.g. “Fly is an insect”, “Frog is an amphibian”
• You can’t really call this reasoning…
  • Rather a matching algorithm
  • No complex inference involved

[Figure: tasks placed by reasoning level against end-to-end-ness: GeometryQA (2015), StanfordQA (2016), bAbI QA (2016), DiagramQA (2016)]

bAbI QA
• Weston et al., 2015 (Facebook)
• Synthetically generated reasoning story-question pairs
• 20 tasks, 1k questions in each task
• Each story can be as long as 200 sentences
• Requires reasoning over multiple sentences
• Should be trained end-to-end (no manual rules or external language resources)
• A task counts as passed if accuracy >= 95%

Tasks Examples

Previous work
• RNN: tested as a baseline by Weston et al. (2015)
  • Performs very poorly; hidden state is inherently unstable for long-term dependency
• Softmax attention mechanism (Sukhbaatar et al., 2015; Xiong et al., 2016)
  • Uses shared external memory with softmax attention mechanism
  • Attends on different facts over several layers
  • DMN: combines RNN and attention mechanism
  • Problem:
    • Vanilla softmax attention cannot distinguish between similar sentences at different time steps.
    • Cannot capture time locality information.

Query-regression networks
• Name comes from “Logic Regression” (not linear regression)
• Transforming the original query to an easier-to-answer query, in vector space
• Pure RNN-based model
• Completely internal memory
• Single unit recurring over time and layers (simple)
• Although an RNN, does not suffer from the long-term dependency problem
• Takes full advantage of the RNN’s capability to model sequential data
• Can be considered as using “sigmoid attention”

Query-regression networks
[Figure: the network unrolled over a five-sentence story, with gate nodes labeled “×”, “+”, and “1 −”.
  Sentences: (1) Sandra got the apple there. (2) Sandra dropped the apple. (3) Daniel took the apple there. (4) Sandra went to the hallway. (5) Daniel journeyed to the garden.
  Question: Where is the apple?
  Query after each sentence: Where is Sandra? / Where is Sandra? / Where is Daniel? / Where is Daniel? / Where is Daniel?
  Predicted answer: garden]
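The idea of “sigmoid attention” in a recurrent unit can be sketched as follows. This is a minimal illustration, not the model from the talk: it assumes a scalar sigmoid gate per sentence and a gated update of the form state = z · candidate + (1 − z) · previous state, which is one reading of the “×”, “+”, and “1 −” nodes in the figure. The gate logits, the two-dimensional query encoding, and the function name `gated_reduce` are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_reduce(gate_logits, candidates, initial_state):
    """Run a sigmoid-gated recurrent update over a sequence.

    Each gate is computed independently of the others (no normalization
    across time steps), so a later relevant sentence overwrites an earlier one.
    """
    state = list(initial_state)
    for logit, candidate in zip(gate_logits, candidates):
        z = sigmoid(logit)
        state = [z * c + (1.0 - z) * s for c, s in zip(candidate, state)]
    return state

# Hypothetical one-hot encodings of the two reduced queries in the figure.
SANDRA = [1.0, 0.0]  # "Where is Sandra?"
DANIEL = [0.0, 1.0]  # "Where is Daniel?"

# One candidate query per sentence; the gates are hand-set (assumed, not learned):
# sentences 1-3 mention the apple (gate open), sentences 4-5 do not (gate closed).
candidates = [SANDRA, SANDRA, DANIEL, SANDRA, DANIEL]
gate_logits = [8.0, 8.0, 8.0, -8.0, -8.0]

final_state = gated_reduce(gate_logits, candidates, [0.0, 0.0])
print(final_state)  # close to DANIEL: the query has become "Where is Daniel?"
```

Because the gates are not forced to sum to one across sentences, the order of the sentences matters, which is the time-locality property that vanilla softmax attention lacks.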
Parallelization

Results on bAbI QA 1k
  Model                                                  # of Tasks Passed   Average Accuracy (%)
  LSTM (Weston et al., 2015)                             0                   48.7
  End-to-end Memory Networks (Sukhbaatar et al., 2015)   10                  84.8
  QRN (2 layers)                                         13                  90.1
  QRN (3 layers)                                         15                  88.7

Qualitative Results of QRN

Results on bAbI QA 10k*
  Model                                                  # of Tasks Passed   Average Accuracy (%)
  End-to-end Memory Networks (Sukhbaatar et al., 2015)   17                  95.8
  Dynamic Memory Networks Improved (Xiong et al., 2016)  19                  97.2
  QRN (2 layers)                                         18                  96.8

Limitations
• Okay, the reasoning process is interesting…
• But “this is a fake dataset”! (by anonymous reviewers)

[Figure: tasks placed by reasoning level against end-to-end-ness: GeometryQA (2015), StanfordQA (2016), bAbI QA (2016), DiagramQA (2016)]

SQuAD (StanfordQA)
• Recently released: June 2016
• 100k+ paragraph-question-answer triples
• Paragraphs from the most popular articles in Wikipedia
• Answer is a subphrase of the paragraph

StanfordQA vs Other “Big” QA Datasets
• CNN/DailyMail (Hermann et al., 2015)
  • Google DeepMind
  • Document-summary pairs from the web
  • Cloze test on summary (fill in the blank)
• Children’s Book Test (Hill et al., 2015)
  • Facebook AI Research
  • Project Gutenberg: children’s books
  • Cloze test on 21st sentence
• Takeaway: cloze test, and crawled data
• StanfordQA is direct question, and carefully controlled (turked)

Model: Co-Attention
[Figure: layer stack, bottom to top: Word Embedding, LSTM (preprocessing), Attention, LSTM (postprocessing), MLP + softmax. The Word Embedding and LSTM (preprocessing) layers appear once for the context and once for the question.
  Context: Barack Obama is the president of the U.S.
  Question: Who leads the United States?
  Output: two predicted indices, 0 and 1]

Embedding Module
• Word embedding is fragile against unseen words
• Char embedding can’t easily learn semantics of words
• Use both!
• Char embedding as proposed by Yoon (2015)
[Figure: the word “Seattle” is embedded by word embedding and by char-level CNN + max pooling; the two are concatenated into the embedding vector]

Attention Mechanism: Motivation
While Seattle’s weather is very nice in summer, its weather is very rainy in winter, making it one of the most gloomy cities in the U.S.
Q: Which city is gloomy in winter?
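One way to picture what this example calls for is plain dot-product soft attention, where the question scores every context position directly regardless of how far apart the words are. The sketch below is a generic illustration under stated assumptions, not the co-attention model from the talk: the token list, the two-dimensional vectors, and the query vector are made up, and `attend` is a hypothetical helper name.

```python
import math

def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Score every position against the query, then return the
    attention weights and the weighted sum of the value vectors."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    summary = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, summary

# Hypothetical 2-d vectors for a few context words (not learned embeddings).
tokens = ["Seattle", "summer", "winter", "gloomy", "cities"]
vectors = [[1.0, 0.0], [0.0, 0.2], [0.0, 1.0], [0.2, 1.2], [0.8, 0.1]]

# Hypothetical query vector standing in for "gloomy in winter".
query = [0.0, 3.0]

weights, summary = attend(query, vectors, vectors)
for token, weight in zip(tokens, weights):
    print(f"{token:8s} {weight:.3f}")
```

Each position is reached in a single step, so the distance between a word and the question does not matter, unlike information that has to survive many recurrent updates.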
Attention Mechanism
• Theoretically, an RNN can propagate information over a long distance through its recurrent state
• Practically, this is very difficult
  • Inherently unstable state, even using LSTM (Weston et al., 2014)
  • State size is fixed (Bahdanau et al., 2014)
• Attention provides shortcut access to distant information
• Co-Attention: question attends on context, and context attends on question. Similar in spirit to, but fundamentally different from, Lu et al. (2016).

Results: Metric
• Each question is answered by 2-5 different people (by indicating the answer phrase in the paragraph)
• Exact Match: the answer exactly matches one of the answers
• F1 Score: harmonic mean of precision and recall
  • “The actors were paid $1.5 million on average.”
  • Q: Who were paid more than $1 million on average?

Results on Dev (Sept. 20, 2016)
  Model                            Exact Match (%)   F1 (%)
  Baseline (June 2016)             39.0              51.0
  Attention and Chunking (IBM)     48.0              64.5
  Match LSTM v1 (Singapore)        54.8              68.0
  Match LSTM v2 (Singapore)        59.4              70.0
  Neural Chunker (IBM)             61.8              70.7
  Co-Attention (Ours)              62.2              72.6

Attention Visualization

[Figure: tasks placed by reasoning level against end-to-end-ness: GeometryQA (2015), StanfordQA (2016), bAbI QA (2016), DiagramQA (2016); annotated “How about here?”]

Important questions
• Is a fully end-to-end reasoning system feasible with a reasonable amount of data? → Probably no
• How to balance between:
  • data size
  • model priors (manually defined rules, annotations, etc.)
• How to naturally incorporate model priors (which might be structured data) into the model?

Thank you!
• [email protected]
• http://seominjoon.github.io