Text Classification 1
Prof. Sameer Singh
CS295: STATISTICAL NLP (WINTER 2017)
January 12, 2017
Based on slides from Nathan Schneider, Noah Smith, Dan Klein and everyone else they copied from.

Text Classification 1
• Introduction to Text Classification
• Naive Bayes Classification
• Course Projects

Sentiment Analysis
Filled with horrific dialogue, laughable characters, a laughable plot, ad really no interesting stakes during this film, "Star Wars Episode I: The Phantom Menace" is not at all what I wanted from a film that is supposed to be the huge opening to the segue into the fantastic Original Trilogy. The positives include the score, the sound…

Other Examples
• Reviews of films, restaurants, products: positive vs. negative
• Amazon reviews data, IMDB reviews data
• Library-like subjects (e.g., the Dewey decimal system)
• News stories: politics vs. sports vs. business vs. technology...
• 20 newsgroup data
• Author attributes: identity, political stance, gender, age, ...
• Email: spam vs. not
• Gmail: important, promotion, updates, social media, …
• What is the reading level of a piece of text?
• Automatic graders?
• How influential will a scientific paper be?
• Advertisement recommendations…
• Will a piece of proposed legislation pass?
• Identify the presidential candidate from speeches
• Post recommendations / Fake news detection
• Can majorly influence the world!

Formal Setup
Classification
Supervised Learning
Training Algorithm

Evaluation: Contingency Table

Accuracy
Problem
• Class imbalance hurts..
• Getting one class right matters more than the other (retrieval)

Precision and Recall

>2 Classes?
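As a rough illustration of the evaluation slides, the sketch below computes accuracy, precision and recall from contingency-table counts. For more than two classes it then macro-averages (mean of the per-class scores) and micro-averages (scores computed from counts pooled over all classes). The function names, the toy labels, and the convention of returning 0 for an empty denominator are assumptions made for this example, not something taken from the lecture.

```python
from collections import Counter


def binary_metrics(gold, pred, positive):
    """Accuracy, precision and recall with one class treated as 'positive'."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Assumed convention: a score is 0.0 when its denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def averaged_precision_recall(gold, pred):
    """Macro- and micro-averaged precision and recall over all classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[g] += 1  # gold class gains a false negative
    labels = sorted(set(gold) | set(pred))

    def ratio(num, den):
        return num / den if den else 0.0

    total_tp = sum(tp.values())
    return {
        # Macro: average the per-class scores, so every class counts equally.
        "macro_precision": sum(ratio(tp[c], tp[c] + fp[c]) for c in labels) / len(labels),
        "macro_recall": sum(ratio(tp[c], tp[c] + fn[c]) for c in labels) / len(labels),
        # Micro: pool the counts first, so frequent classes dominate.
        "micro_precision": ratio(total_tp, total_tp + sum(fp.values())),
        "micro_recall": ratio(total_tp, total_tp + sum(fn.values())),
    }


# Class imbalance: always predicting "ham" gives high accuracy
# but zero recall on the rare "spam" class.
gold = ["ham"] * 9 + ["spam"]
pred = ["ham"] * 10
print(binary_metrics(gold, pred, "spam"))
```

The toy spam example is meant to show why accuracy alone can mislead when one class is rare. Macro-averaging weights every class equally, while micro-averaging weights every example equally.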
Macro-averaged Measures
Micro-averaged Measures

Statistical Significance
• McNemar’s Test, Psychometrika (1947)
• More tests in Smith book, appendix B

Classification using Joint Prob

Naïve Bayes Classifier
Two assumptions
• Word ordering does not matter (Bag of Words)
• Words are independent given category

Estimation of Parameters

Problem with Naïve Bayes

Linear Models

Naïve Bayes as a Linear Model

Group Projects
Groups for the Project
• Ideal team size is 3
• Absolute maximum of 4
• <3 if I approve (ongoing work)
Submit Four Reports
• First two reports are very short (1 page)
• Final report matters the most
How do I know it’s NLP?
• Output is any phrase or sentence, definitely!
• Input is any phrase or sentence
• Output is a sequence or structure (yes!)
• Classification: only if over words or phrases
• Output is linguistic classes/structures (yes!)

Scope of Work
Novelty
• New Task/Data
• New Method/Models
• New Application of Existing Method to Existing Task
But not too much!
• You do not have much time!
• Aim to have the whole pipeline done soon
• Keep the “scale” of the data small, sub-sample if needed
• Better to have a complete finished report than grand ideas that did not work
Reuse
• You do not have to code everything
• Exploit existing code, datasets, libraries, web services
• Do not reinvent all the wheels!

Example 1: What’s the word..
What’s the word for someone using pretentious words?
definition of a word from the dictionary → Machine Learning (LSTM) → the word itself (lexiphanic)
This can be a cool Twitter bot!
Evaluation
• Accuracy of guessing the word, using definitions from a different dictionary?
• Baselines: Google, reversedictionary.org, …

Example 2: SQuAD
Tesla was the fourth of five children. He had an older brother named Dane and three sisters, Milka, Angelina and Marica. Dane was killed in a horse-riding accident when Nikola was five. In 1861, Tesla attended the "Lower" or "Primary" School in Smiljan where he studied German, arithmetic, and religion. In 1862, the Tesla family moved to Gospić, Austrian Empire, where Tesla's father worked as a pastor. Nikola completed "Lower" or "Primary" School, followed by the "Lower Real Gymnasium" or "Normal School."
• How many siblings did Tesla have? four
• What was Tesla’s brother’s name? Dane
• What happened to Dane? killed in a horse-riding accident
https://rajpurkar.github.io/SQuAD-explorer/

Datasets and Papers
Data
• Search Kaggle, Quora, etc. for large text datasets
• See recent papers in NLP for released datasets
• Look for “shared tasks”, “challenges”, workshops
• Links to some existing datasets coming to the website soon
Papers
• NLP Conferences: ACL, EMNLP, NAACL
• ML Conferences: NIPS, ICML, ICLR, AAAI
• Data-focused venues: TREC/TAC, SemEval, CoNLL
• Workshops at these conferences: interesting directions
• More papers coming soon to the website

Writing the Pitch
Team
• Team name and members
• Single sentence description for each member
• (approximately) what they will do
• Single sentence on what makes your team diverse
Project
• Motivation and Problem Description
• Planned approach: tentative
• Evaluation: usually, most important
Appointment
• If your team has 1 or 2 members, meet me on or before January 17 (otherwise no need)
• Every group has to meet afterwards to discuss the project

Upcoming…
Homework / Project
• Homework 1 is up!
• Next lectures will continue with more details
• Sign up for the Kaggle account (@uci.edu email)
• Homework 1 is due January 26, 2017
• Project pitch is due January 23, 2017!
• Start assembling teams now! (use Piazza)
• Start looking at papers, data, etc. for ideas
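Returning to the Naïve Bayes slides earlier in the lecture: under the two stated assumptions (bag of words, and words independent given the category), the score of a category is log P(y) plus the sum over words of count(w) · log P(w | y), which is linear in the word counts. The following is a minimal sketch of that idea, not the lecture's own implementation. The function names and toy data are hypothetical, and add-alpha smoothing and skipping unseen words are assumptions of this example, since the slides' estimation formulas did not survive extraction.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, alpha=1.0):
    """Estimate log P(y) and log P(w | y) from tokenized documents.

    alpha is an add-alpha smoothing constant (an assumption of this sketch).
    """
    vocab = {w for doc in docs for w in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, y in zip(docs, labels):
        word_counts[y].update(doc)

    log_prior = {y: math.log(n / len(labels)) for y, n in class_counts.items()}
    log_lik = {}
    for y in class_counts:
        total = sum(word_counts[y].values()) + alpha * len(vocab)
        log_lik[y] = {
            w: math.log((word_counts[y][w] + alpha) / total) for w in vocab
        }
    return log_prior, log_lik


def predict(doc, log_prior, log_lik):
    """Score each class as a linear function of bag-of-words counts."""
    counts = Counter(doc)  # word order is discarded here
    scores = {}
    for y in log_prior:
        # Bias term log P(y) plus one weight log P(w | y) per word count.
        # Words never seen in training are skipped (an assumption).
        scores[y] = log_prior[y] + sum(
            c * log_lik[y][w] for w, c in counts.items() if w in log_lik[y]
        )
    return max(scores, key=scores.get), scores


# Hypothetical toy data, loosely echoing the sentiment example.
docs = [["great", "fun"], ["great", "score"],
        ["boring", "plot"], ["laughable", "plot"]]
labels = ["pos", "pos", "neg", "neg"]
log_prior, log_lik = train_naive_bayes(docs, labels)
label, scores = predict(["great", "plot", "great"], log_prior, log_lik)
print(label, scores)
```

Working in log space keeps the scores from underflowing on long documents. It also makes the linear-model view explicit: the log-likelihoods act as per-class weights and the log-prior as a bias.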