
Variational Autoencoders Write Poetry
(Generating Sentences from a Continuous Space)
Elsbeth Turcan and Fei-Tzin Lee
Paper by Sam Bowman, Luke Vilnis et al.
2016
Motivation
– Generative models for natural language sentences
– Machine translation
– Image captioning
– Dataset summarization
– Chatbots
– Etc.
– Want to capture high-level features of text, such as topic and style, and keep them consistent when generating text
Related work – RNNLM
– In the words of Bowman et al., “A standard RNN language model predicts each word of a sentence conditioned on the previous word and an evolving hidden state.”
– In other words, it only looks at the relationships between consecutive words, and so does not contain or observe any global features
– But what if we want global information?
Other related work
– Skip-thought
– Generate sentence codes in the style of word embeddings to predict context sentences
– Paragraph vector
– A vector representing the paragraph is incorporated into single-word embeddings
Autoencoders
– Typically composed of two RNNs
– The first RNN encodes a sentence into an intermediate vector
– The second RNN decodes the intermediate representation back into a sentence, ideally the same as the input
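As a rough picture, here is a minimal sketch of such an encoder-decoder autoencoder, assuming PyTorch; the class name and dimensions are illustrative, not the authors' configuration:

```python
# Minimal RNN autoencoder sketch (assumed PyTorch; illustrative dimensions).
import torch.nn as nn

class RNNAutoencoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tokens):                 # tokens: (batch, seq) word ids
        emb = self.embed(tokens)
        _, (h, c) = self.encoder(emb)          # (h, c) is the intermediate vector
        # Reconstruct the input conditioned on the encoder state
        # (teacher forcing: the ground-truth sentence is fed back in).
        dec_out, _ = self.decoder(emb, (h, c))
        return self.out(dec_out)               # logits over the vocabulary
```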
Variational Autoencoders (VAEs)
– Regular autoencoders learn only discrete mappings from point to point
– However, if we want to learn holistic information about the structure of sentences, we need to be able to fill sentence space better
– In a VAE, we replace the hidden vector z with a posterior probability distribution q(z|x) conditioned on the input, and sample our latent z from that distribution at each step
– We ensure that this distribution has a tractable form by enforcing its similarity to a defined prior distribution, typically some form of Gaussian
Modified loss function
– The regular autoencoder’s loss function would encourage the VAE to learn posteriors as close to discrete as possible – in other words, Gaussians that are clustered extremely tightly around their means
– In order to enforce our posterior’s similarity to a well-formed Gaussian, we introduce a KL divergence term into our loss, as below:
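Reconstructed in the paper's notation, the objective is the reconstruction term plus the KL penalty, and it upper-bounds the true negative log-likelihood:

```latex
\mathcal{L}(\theta; x)
  = -\,\mathbb{E}_{q_\theta(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
    + \mathrm{KL}\!\left(q_\theta(z \mid x) \,\|\, p(z)\right)
  \;\ge\; -\log p_\theta(x)
```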
Reparameterization trick
– In the original formulation, the encoder net encodes the sentence into a probability distribution (usually Gaussian); practically speaking, it encodes the sentence into the parameters of the distribution (i.e. µ and σ)
– However, this poses challenges for us while backpropagating: we can’t backpropagate over the jump from µ and σ to z, since it’s random
– Solution: extract the randomness from the Gaussian by reformulating z as a function of µ, σ, and another separate random variable
[Diagram of the reparameterization trick, from StackOverflow.]
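A minimal sketch of that reformulation, assuming PyTorch; sample_z is a hypothetical helper, with µ and log σ² produced by the encoder:

```python
# Reparameterization trick sketch (assumed PyTorch helper).
import torch

def sample_z(mu, log_var):
    # Pull the randomness out into eps ~ N(0, I), then shift and scale it;
    # gradients now flow through mu and sigma, and eps needs no gradient.
    eps = torch.randn_like(mu)
    sigma = torch.exp(0.5 * log_var)
    return mu + sigma * eps        # z ~ N(mu, sigma^2)
```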
Specific architecture
– Single-layer LSTM for encoder and decoder
Issues and fixes
– The decoder is too strong: without any limitations, it just doesn’t use z at all
– Fix: KL annealing
– Fix: word dropout
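A sketch of both fixes, assuming PyTorch; the schedule length and keep rate here are hypothetical, not the paper's tuned settings:

```python
# KL annealing and word dropout sketches (assumed PyTorch; hypothetical values).
import torch

def kl_weight(step, anneal_steps=10000):
    # KL annealing: ramp the KL term's weight from 0 to 1 early in training,
    # so the decoder learns to use z before the KL penalty pushes q(z|x)
    # toward the prior.  Total loss: reconstruction + kl_weight(step) * KL.
    return min(1.0, step / anneal_steps)

def word_dropout(tokens, unk_id, keep_rate=0.7):
    # Word dropout: randomly replace decoder inputs with <unk>, weakening the
    # decoder so it must rely on z rather than only the previous gold word.
    keep = torch.rand_like(tokens, dtype=torch.float) < keep_rate
    return torch.where(keep, tokens, torch.full_like(tokens, unk_id))
```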
Experiments – Language modeling
– Used the VAE to create language models on the Penn Treebank dataset, with an RNNLM as the baseline
– Task: train an LM on the training set and have it assign high probability to the held-out test set
– The RNNLM outperformed the VAE in the traditional setting
– However, when handicaps were imposed on both models (inputless decoder), the VAE was significantly better able to overcome them
Experiments – Imputing missing words
– Task: infer missing words in a sentence given some known words (imputation)
– For the RNNLM, the unknown words are placed at the end of the sentence
– Both the RNNLM and the VAE performed beam search (with VAE decoding broken into three steps) to produce the most likely words to complete a sentence; a generic sketch follows below
– Precise evaluation of these results is computationally difficult
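For intuition, a generic beam-search sketch (not the paper's exact three-step VAE decoding); log_prob_fn is an assumed callable returning next-token log-probabilities for a prefix of token ids:

```python
# Generic beam search sketch (assumed interface, illustrative only).
import heapq

def beam_search(log_prob_fn, bos_id, eos_id, beam_width=5, max_len=20):
    beams = [(0.0, [bos_id])]                  # (cumulative log-prob, tokens)
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == eos_id:              # keep finished hypotheses
                candidates.append((score, seq))
                continue
            for tok, lp in enumerate(log_prob_fn(seq)):
                candidates.append((score + lp, seq + [tok]))
        beams = heapq.nlargest(beam_width, candidates, key=lambda b: b[0])
    return max(beams, key=lambda b: b[0])[1]   # best-scoring completion
```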
Adversarial evaluation
– Instead, create an adversarial classifier, trained to distinguish real sentences from generated sentences, and score the model on how well it fools the adversary
– Adversarial error is defined as the gap between chance accuracy (50%) and the real accuracy of the adversary – ideally this error is minimized (e.g., an adversary that is 80% accurate gives an error of 0.80 − 0.50 = 0.30)
Experiments – Other
– Several other experiments in the appendix showed the VAE to be applicable to a variety of tasks
– Text classification
– Paraphrase detection
– Question classification
Analysis
– Word dropout
– Keep rate too low: sentence structure suffers
– Keep rate too high: no creativity, stifles the variation
– Effects on cost function components: [plot omitted]
Extras: sampling from the posterior and homotopies
– Sampling from the posterior: examples of sentences adjacent in sentence space
– Homotopies: linear interpolations in sentence space between the codes for two sentences
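A minimal sketch of a homotopy, assuming NumPy; decode is an assumed callable mapping a latent code back to a sentence:

```python
# Homotopy (linear interpolation) sketch between two sentence codes.
import numpy as np

def homotopy(z1, z2, decode, steps=5):
    # Decode evenly spaced points on the line between the two latent codes.
    return [decode((1.0 - t) * z1 + t * z2)
            for t in np.linspace(0.0, 1.0, steps)]
```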
Even more homotopies
Thanks for listening!
– Any questions?