CS 175: Project in Artificial Intelligence
Lecture 9: Progress Reports, Presentations, and Evaluation Methods

Padhraic Smyth
Department of Computer Science
Bren School of Information and Computer Sciences
University of California, Irvine

Weekly Schedule

• Week of Feb 27
  – Monday: Lecture: Progress Reports, etc.
  – Wednesday: Office hours (no lecture). Progress Report due by 11pm.
• Week of Mar 6
  – Monday: Project Presentations (in class). Submit slides by 1pm.
  – Wednesday: Project Presentations (in class). Submit slides by 1pm.
• Week of Mar 13
  – Monday: Short lecture on Final Project Reports
  – Wednesday: No lecture or office hours
• Week of Mar 20
  – Final Report due to EEE by Tuesday, March 21st, 11pm

Note: no final exam; just submit your final report by Tuesday night of finals week.

Progress Reports

• Your progress report should be up to 4 pages long
  – Use the progress report template
  – Due to EEE by 11pm this Wednesday
• Reports will be graded and receive a weight of 20% of your overall grade
• Reports will primarily be graded on
  (a) clarity: are the technical ideas, experiments, etc., explained clearly?
  (b) completeness: are all sections covered?
  (c) creativity: have you added any of your own ideas to the project?
  (d) progress: how much progress has there been?
• One report per team: both students will get the same grade by default
• Tips on writing proposals (Lecture 6) are still highly relevant; see also samples of past proposals distributed via Piazza

Project Slide Presentations

• In class, Monday and Wednesday next week
• Each student or team will make one presentation
  – Approximately 4 minutes per presentation
• Time after each presentation for questions and changeover
  – So about 6 minutes x 11 projects per day = 66 minutes total
• List and order of presentations will be announced this Wednesday (by email and on Piazza)
• Slides need to be uploaded to the EEE dropbox by 1pm the day of your presentation:
  – They will be loaded onto the classroom machine; no need to use your laptop
  – Either PDF or PowerPoint is fine

Suggestions for Slide Content

• Slide 0: project name, student name(s)
• Slide 1: overview of project, e.g.,
  – Investigation of sentiment analysis algorithms for Twitter
  – Focus on classifying tweets into 3 categories, ...
  – Using the following dataset: ...
  – Using the following classifiers: ...
  – Evaluation methods: ...
  [Note: you could just show a figure here, e.g., an example of a tweet or a summary of the dataset, and make your points using the figure]
• Slide 2:
  – Some more details on your technical approach
  – Figures are good, e.g., a block diagram of your pipeline

Suggestions for Slide Content (continued)

• Slide 3:
  – Examples of initial results
• Slide 4 (optional):
  – Challenges you have encountered
• Slide 5:
  – Plans for the remainder of the quarter
  – E.g., milestones you plan to complete
  – E.g., an optional additional item you will investigate if there is time

Examples from student projects:

"It's interesting that training data with stopwords predicts test data better than training data without stopwords. The reason may be that stopwords connect sentences to form the meaning of the whole review."
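An observation like the one above is easy to test directly. Below is a minimal sketch of such a comparison, assuming texts is a list of raw review strings and labels their sentiment labels (both hypothetical names); scikit-learn's built-in English stopword list stands in for whatever list the project actually used.

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    def compare_stopword_handling(texts, labels):
        # Train the same bag-of-words classifier with and without stopword removal
        for stop in (None, "english"):
            pipe = make_pipeline(CountVectorizer(stop_words=stop), MultinomialNB())
            scores = cross_val_score(pipe, texts, labels, cv=5)
            print("stop_words=%r: mean accuracy %.3f" % (stop, scores.mean()))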
Rotten Tomatoes Movie Review Classification with Machine Learning and Natural Language Processing

                            Overall Error   False Positive Error   False Negative Error
  Logistic Regression          12.38%            16.58%                  9.54%
  Multinomial Naive Bayes      13.62%            18.55%                  9.59%

Error rate results: binary classification of positive vs. negative phrases.

• How we conduct the binary classification: We removed phrases labeled 2, which is neutral. Then we combined 0 and 1 into one class, "Negative", which we relabeled as 0, and 4 and 5 into one class, "Positive", which we relabeled as 1.
• We were surprised at how well the classifier was able to distinguish between positive and negative phrases. We are looking into creating a multi-level classifier that would first try to classify whether a phrase is neutral or not, and then determine whether a non-neutral phrase is actually positive or negative.

"aww! im so happy cuz you are happy 8D i have to go goodnight girll!! you are AMAZING! have a nice night! love ya?"
• Naïve Bayes probability: 0.93
• SVM distance: 1.75

"I hate the way pills make my stomach feel like it's boiling. This is worse than being sick. Also"
• Naïve Bayes probability: 0.97
• SVM distance: -2.10

The above are two tweets that both classifiers were very confident about. Both were mislabeled by the user: the first was incorrectly labeled by the user as negative and the second was incorrectly labeled as positive. The tweets' sentiment, however, is obviously the opposite. We have found at least a few tweets with this problem.

Twitter Trend Detection

• While many claim social media to be negative (even toxic), the top words are often positive!
• "love", "like", "good", "please", "best"
• Negative words and obscenities are further down.
• Twitter isn't as bad as you think ;)

Parsing Difficulties

• We are using the Stanford parser to generate parse trees for our PCFG. For complex sentences, the parser often finds sentences within sentences. As a result, we generate sentences with strange punctuation (periods in the middle, a comma at the end, etc.).
• Also, some complex sentences are parsed as NP (noun phrase) rather than S (sentence). As a result, some of our generated sentences are just single noun phrases, and some noun phrases become complex sentences.

[Tree diagram: a bad generated tree in which ROOT expands to S -> NP VP, the NP contains an SBAR, and the SBAR's embedded S (NP VP ... .) is expanded as if it were a top-level sentence.]

• Solution: sample the generated RHS using both the symbol and its direct parent. This lets us tell the difference between a sentence/NP at the root vs. in the middle of the tree. Similar to using a higher-order Markov chain.
• Examples of valid expansions:
  – (ROOT -> S) -> NP VP .
  – (SBAR -> S) -> NP VP
  – (ROOT -> NP) -> NP : S .
  – (S -> NP) -> DT NN
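The parent-annotation idea above can be sketched in a few lines of Python. The toy grammar below is hypothetical; a real implementation would key rule counts extracted from the parser's trees by (parent, symbol) and sample proportionally to those counts rather than uniformly.

    import random

    # Toy parent-annotated PCFG: expansions are keyed by (parent, symbol),
    # so an S under ROOT can expand differently from an S under SBAR.
    RULES = {
        ("ROOT", "S"): [["NP", "VP", "."]],
        ("S", "NP"): [["DT", "NN"]],
        ("S", "VP"): [["VBD", "NP"]],
        ("VP", "NP"): [["DT", "NN"]],
    }
    WORDS = {"DT": ["the", "a"], "NN": ["dog", "movie"],
             "VBD": ["saw", "liked"], ".": ["."]}

    def generate(symbol="S", parent="ROOT"):
        if symbol in WORDS:                               # terminal: emit a word
            return [random.choice(WORDS[symbol])]
        rhs = random.choice(RULES[(parent, symbol)])      # condition on (parent, symbol)
        out = []
        for child in rhs:
            out.extend(generate(child, parent=symbol))
        return out

    print(" ".join(generate()))                           # e.g., "the dog saw a movie ."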
Evaluating Clustering or Topic Models

Evaluating Algorithms without Ground Truth

• Ground truth = "gold standard" (e.g., human-generated class labels)
• Evaluation of unsupervised learning is difficult
  – If a reference/ground truth exists, this may be helpful, but it does not necessarily correspond with human judgement
• For some problems, like summarization, there are techniques (such as the ROUGE metric) that have been developed
• In general you will likely need to do some research and reading
  – E.g., "cluster evaluation" or "text summarization evaluation"

Finding Words that Characterize Clusters

• Say we have a clustering algorithm that produces, say, K = 20 clusters
• To summarize these clusters for human interpretation we need to find the top words for each cluster, e.g.,
  – Find all the documents for a cluster
  – List the top 10 most common words in the cluster?
  – ...but this might not be ideal. Why?

Finding Words that Characterize Clusters (continued)

• Instead we can try to find the most "discriminative" words for a cluster
  – i.e., the words that are most distinctive for this cluster versus the others
• One simple way to do this is to train a supervised classifier (such as logistic regression) to learn how to classify documents in this cluster versus all the other clusters (treated as one large class)
• The words with high positive weights (see Assignment 2) are the words that are most discriminative for this cluster
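As a concrete sketch of this cluster-versus-rest recipe, assuming X is a document-term matrix, cluster_ids holds each document's cluster assignment, and vocab maps column indices back to words (all hypothetical names):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def top_cluster_words(X, cluster_ids, vocab, cluster, k=10):
        # Label documents in this cluster 1, everything else 0 (one large "rest" class)
        y = (np.asarray(cluster_ids) == cluster).astype(int)
        clf = LogisticRegression(max_iter=1000).fit(X, y)
        weights = clf.coef_[0]                  # one weight per vocabulary term
        top = np.argsort(weights)[::-1][:k]     # indices of the largest positive weights
        return [vocab[i] for i in top]

    # Hypothetical usage: X from CountVectorizer/TfidfVectorizer,
    # vocab from the vectorizer's vocabulary, cluster_ids from, e.g., KMeans.labels_
    # print(top_cluster_words(X, kmeans.labels_, vocab, cluster=0))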
Clusters and their Top Words

• We can now find the top 5 or 10 (for example) most discriminative words that characterize each cluster
• E.g.,
  – Cluster 1: dog, cat, pet, animal, ...
  – Cluster 2: food, restaurant, menu, service, ...
  – Cluster 3: salary, income, check, tax, ...
• Can we evaluate how good these clusters are in terms of human judgement?

User Studies with Word Intrusion

Say we have 3 clusters and these are the 4 best words for each:

              Word 1   Word 2   Word 3   Word 4
  Cluster 1   Apple    Orange   Banana   Melon
  Cluster 2   Job      Salary   Office   Tax
  Cluster 3   Food     Menu     Happy    Car

Now say we introduce one random common word, in a random position, per cluster:

              Word 1   Word 2   Word 3   Word 4
  Cluster 1   Apple    Cat      Banana   Melon
  Cluster 2   Job      Salary   France   Tax
  Cluster 3   Truck    Menu     Happy    Car

We then record, for each user and cluster, whether the user detected the true intruder word (1) or not (0):

              User 1   User 2   User 3   User 4
  Cluster 1      1        1        1        1
  Cluster 2      1        0        1        1
  Cluster 3      1        0        0        0

We can use the user detections to indicate whether
  (a) the clusters generally make sense (high detection rate)
  (b) there are particular clusters that don't make sense

User Studies in General

• Can be used to compare the outputs of Algorithm A and Algorithm B
• Humans are good at indicating preferences (e.g., A is better than B or vice versa); not so good at providing numbers
• Ideally the study needs to be replicated across multiple users (since individual users may be biased, unreliable, etc.)
• For example:
  – Two summarization algorithms A and B
  – Each user is presented with output from A and output from B
    • Important that the user does not know which is which (a "blind test")
    • The original input (text) could also be provided for context
  – This can be done on paper (printouts) or with a Web form (easier for analysis)
  – Each user gets K pairs of A and B outputs and states K preferences
  – N users, K pairs -> larger values of N and K are better (e.g., N > 5, K > 10)

Example of Data from a User Study

Say 5 users each compared the outputs of A and B 20 times:

           Number of times    Number of times    Difference
           A was preferred    B was preferred    between A and B
  User 1        12                  8                  4
  User 2        15                  5                 10
  User 3         9                 11                 -2
  User 4        11                  9                  2
  User 5        12                  8                  4

Can use the Wilcoxon signed-rank test (for example) to analyze the differences statistically.
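As a quick sketch, the test can be run on the table above with scipy; wilcoxon performs the paired signed-rank test on the per-user differences:

    from scipy.stats import wilcoxon

    # Per-user preference counts from the table above
    a_wins = [12, 15, 9, 11, 12]
    b_wins = [8, 5, 11, 9, 8]

    # Null hypothesis: the per-user differences (A minus B) are centered at zero
    stat, p_value = wilcoxon(a_wins, b_wins)
    print("Wilcoxon statistic = %.1f, p-value = %.3f" % (stat, p_value))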
Evaluating Classification Algorithms

Reporting Classification Accuracy

• Always report the performance of a simple baseline, e.g.,
  – The majority classifier (see next slide)
  – A kNN classifier with K = 1
• Cross-validation is useful
  – But if you have a very large amount of data, a single randomly selected train/test split of the data should be fine

The Simple Majority Classifier

• Say we have K classes, and class k occurs with probability p(k), k = 1, ..., K
• Let k* be the most likely class, i.e., the class with the highest value of p(k)
• The "majority classifier" always predicts class k*
  – What is the accuracy of this classifier? Accuracy = p(k*)
  – This is a "dumb" classifier, but it's always good to report this number when reporting classification results
  – On some problems, if p(k*) = 0.99, it may be hard to beat the majority classifier
  – Or if there are more than 2 classes, p(k*) may be quite small (e.g., 0.3), and an accuracy of 0.5 (say) tells you that you are picking up predictive signal

Illustration of 5-fold Cross-Validation

[Figure: a table of 10 documents d1 through d10, each with a feature vector x and a class label (1, 2, 1, 2, 2, 1, 1, 2, 1, 2), partitioned into 5 folds of 2 documents each. For each fold, that fold is the test set (yellow) and the remaining documents are the training set (green).]

The average accuracy across the 5 test sets is computed and referred to as the cross-validated estimate of the classifier accuracy.

Comparing Algorithms with Cross-Validation

• Important that they are evaluated on exactly the same data
• Correct approach:
  – In cross-validation, the outer loop is over the train/test folds, and the inner loop is over the different methods
• Incorrect approach:
  – The outer loop is over the different methods, and the inner loop performs CV (with different random train/test folds for each method)
• The 2nd approach is incorrect because it introduces additional (unnecessary) variability or noise into the process of measuring the difference between the algorithms or methods
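A minimal sketch of the correct approach, assuming a feature matrix X and label array y (hypothetical names); the key point is that every method is trained and tested on identical splits:

    from sklearn.base import clone
    from sklearn.model_selection import KFold

    def compare_on_same_folds(X, y, models, n_splits=5):
        # Outer loop over folds, inner loop over methods
        scores = {name: [] for name in models}
        folds = KFold(n_splits=n_splits, shuffle=True, random_state=0)
        for train_idx, test_idx in folds.split(X):
            for name, model in models.items():
                fitted = clone(model).fit(X[train_idx], y[train_idx])
                scores[name].append(fitted.score(X[test_idx], y[test_idx]))
        return scores

    # Hypothetical usage with a document-term matrix X and a label array y:
    # from sklearn.linear_model import LogisticRegression
    # from sklearn.naive_bayes import MultinomialNB
    # compare_on_same_folds(X, y, {"logreg": LogisticRegression(), "nb": MultinomialNB()})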
Comments on Train/Test Numbers

Consider 2 different experiments (e.g., 2 different classifiers being fit to the same data), where we run 5-fold cross-validation on each and get the following results. What might we conclude?

  Classifier 2
  Fold   Train accuracy   Test accuracy
   1         82.0             71.0
   2         73.1             56.0
   3         98.9             76.0
   4         67.1             61.0
   5         84.2             68.0

  Classifier 1
  Fold   Train accuracy   Test accuracy
   1         74.0             73.8
   2         73.1             72.0
   3         75.9             73.7
   4         74.5             74.0
   5         73.9             73.2

Statistical Significance

• Say we compare Algorithm A versus B on 20 datasets/trials
  – A does better 12 times and B does better 8 times
  – What should we conclude?
• We can use statistical hypothesis testing to help us decide
  – Null hypothesis: the two methods are equally accurate (and any variation we see is just random noise)
• Can use different types of hypothesis tests, e.g.,
  – A t-test on the difference in accuracies across the 20 tests: is the mean zero or not?
  – The Wilcoxon sign test: just looks at the number of times A or B "won" a test
• Can use the same idea for user studies: how often does the user select A over B?

Going Beyond Accuracy

• To get insight into what a classifier is doing, it's often useful to dig deeper beyond accuracy numbers
• Examples:
  – Look at random examples of the documents it got right and got wrong and see if there are any obvious patterns (e.g., negation in sentiment analysis)
  – If you have a classifier that is producing probabilities, you could look at the test examples with the highest predicted probability of being in class 1 but that are actually class 0 (and vice versa)
  – With 2 or more classifiers, you could see if they tend to make errors on the same examples (look at 2x2 tables for 2 classifiers)

Confusion Matrices

                       TRUE LABELS
  PREDICTED LABELS     Class 1   Class 2   Class 3
  Class 1                24        15         0
  Class 2                 5         5         0
  Class 3                 0         1        50

Accuracy = (24 + 5 + 50) / 100 = 79%

Adding the row and column totals shows the predicted and actual number per class:

                       Class 1   Class 2   Class 3   Predicted number per class
  Class 1                24        15         0                39
  Class 2                 5         5         0                10
  Class 3                 0         1        50                51
  Actual number
  per class              29        21        50

Analyzing Class Probabilities

Class Probabilities

• Many classifiers produce class probabilities as outputs
  – E.g., logistic regression, neural networks, naïve Bayes
  – For each training or test example the classifier produces P(c | x), where x is the input vector
  – For K classes there are K class probabilities that sum to 1

Methods for the Logistic Regression Classifier in scikit-learn

[Screenshot of the LogisticRegression methods table from http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html]

Class Probabilities in scikit-learn

[Screenshot of the predict_proba documentation from http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html]

Setting up a 2-Class Problem (from 20 Newsgroups Data)

    n_categories = 2
    twenty_train, twenty_test = load_news_dataset(n_categories)

  The names of the 2 most common labels in the dataset are:
  ['rec.sport.hockey', 'soc.religion.christian']
  Dimensions of X_train_counts are (1199, 21891)
  Number of non-zero elements in X_train_counts: 157195
  Percentage of non-zero elements in X_train_counts: 0.60
  Average number of word tokens per document: 196.54
  Average number of documents per word token: 10.76

Class Probabilities from a Logistic Regression Model

    # fit a logistic regression model to the training data
    clfLR = LogisticRegression().fit(X_train_tfidf, twenty_train.target)

    # generate class label predictions on the test data
    predicted_LR = clfLR.predict(X_test_tfidf)

    # generate class probabilities on the test data
    prob_test_LR = clfLR.predict_proba(X_test_tfidf)

    >>> prob_test_LR
    array([[ 0.7041694 ,  0.2958306 ],
           [ 0.50150781,  0.49849219],
           [ 0.74678626,  0.25321374],
           ...,
           [ 0.45904319,  0.54095681],
           [ 0.09707758,  0.90292242],
           [ 0.86623216,  0.13376784]])

There are 2 class probabilities, summing to 1, for every example in the test data array.

Using Class Probabilities

• Can look at how confident a classifier is
  – E.g., which test examples for each class is it most confident on?
• Can help with analyzing what errors a classifier is making
  – E.g., find all test examples that were incorrectly classified and look at those examples that have the highest probability of the incorrect class
• Can compare classifiers
  – E.g., using scatter plots, as with the examples on the previous slides
• Can rank the test examples and analyze the rankings (next section)
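As a brief sketch of the error-analysis bullet, reusing the variable names from the logistic regression snippet above (prob_test_LR, predicted_LR, twenty_test):

    import numpy as np

    y_true = twenty_test.target
    wrong = np.where(predicted_LR != y_true)[0]        # misclassified test documents
    conf = prob_test_LR[wrong, predicted_LR[wrong]]    # probability given to the wrong prediction
    order = wrong[np.argsort(-conf)]                   # most confident mistakes first
    for i in order[:5]:
        print("doc %d: predicted %d (p=%.3f), true label %d"
              % (i, predicted_LR[i], prob_test_LR[i, predicted_LR[i]], y_true[i]))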
Ranking-based Metrics

Metrics based on Thresholding/Ranking

• Consider binary classification
• Many classifiers produce a score indicating the likelihood of class 1
  – The class probability from logistic regression or naïve Bayes
  – The distance from the boundary in SVMs
• The default threshold for this score (e.g., for class probabilities) is 0.5
• But we can vary this threshold
  – E.g., in spam email we may want to set a high threshold for declaring spam
• We can also rank our test examples by their score and measure the quality of the ranked list (relative to some threshold)

Binary Classification: Ranking Metrics

• To evaluate using a ranking metric we do the following:
  – Take a test set with N data vectors x
  – Compute a score for each item, say P(C = 1 | x), using our prediction model
  – Sort the N items from largest to smallest score
• This gives us 2 lists, each of length N
  – A list of predictions, with decreasing score values
  – A corresponding list of "ground truth" values, 0's and 1's for binary class labels
• A variety of evaluation metrics can be computed based on these 2 lists
  – Precision/recall
  – Receiver operating characteristics
  – Lift curves
  – And so on

Ranking Terminology

                         True Labels
  Model's Predictions    Positive               Negative
  Positive               TP (true positive)     FP (false positive)
  Negative               FN (false negative)    TN (true negative)

Precision = TP / (TP + FP) = ratio of correct positives predicted to total positives predicted
Recall = TP / (TP + FN) = ratio of correct positives predicted to the actual number of positives

Typically you will get high precision at low recall, and low precision at high recall.

Precision and Recall per Class

                      True Class 1   True Class 2   True Class 3
  Predicted Class 1        80             60             10
  Predicted Class 2        10             90              0
  Predicted Class 3        10             30            110

Precision for class k = fraction of documents predicted as class k that belong to class k. For class 2 above: 90 / (10 + 90 + 0) = 90%.

Recall for class k = fraction of documents that belong to class k that are predicted as class k. For class 2 above: 90 / (60 + 90 + 30) = 50%.

Binary Classification: Simple Example of Precision and Recall

• Test set with 10 items, binary labels, 5 from each class:

  Scores from the model   True label
        0.97                  1
        0.95                  1
        0.93                  0
        0.81                  1
        0.55                  0
        0.28                  1
        0.17                  0
        0.15                  1
        0.10                  0
        0.03                  0

Precision-Recall with a High Threshold

• With the threshold set just below 0.95 (the top 2 items are predicted as class 1):
  – Precision: percentage above the threshold that are actually class 1 = 2/2 = 100%
  – Recall: percentage of the total number of positive examples that are above the threshold = 2/5 = 40%

Precision-Recall with a Low Threshold

• With the threshold set just below 0.15 (the top 8 items are predicted as class 1):
  – Precision: percentage above the threshold that are actually class 1 = 5/8 = 62.5%
  – Recall: percentage of the total number of positive examples that are above the threshold = 5/5 = 100%

How are Precision and Recall used as Metrics?

• Precision@K is often used as a metric, where K is the top K items or the top K% of the sorted prediction list
  – E.g., useful for information retrieval, for search
• Precision-recall curves can be plotted by varying the threshold
  – A high threshold (e.g., 0.95) often yields high precision, low recall
  – A low threshold (e.g., 0.1) often yields low precision, high recall

Computing Precision-Recall Curves

• Rank our test documents by predicted probability of belonging to class 1
  – Sort the ranked list
  – For t = 1, ..., N (N = total number of test documents):
    • Select the top t documents on the list
    • Classify documents above as class 1, documents below as not class 1
    • Compute a precision and recall number for each t
  – This generates a set of (precision, recall) pairs we can plot as (x, y) values
  – Perfect performance: precision = 1, recall = 1
    • Requires that all the positive examples get higher scores than all the negatives
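These (precision, recall) pairs can be computed directly with scikit-learn; a quick sketch using the 10-item example from the slides above, which reproduces the two operating points just discussed:

    from sklearn.metrics import precision_recall_curve

    scores = [0.97, 0.95, 0.93, 0.81, 0.55, 0.28, 0.17, 0.15, 0.10, 0.03]
    labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]

    # precision and recall have one extra trailing entry (the recall=0 endpoint),
    # so zipping against thresholds pairs each threshold with its operating point
    precision, recall, thresholds = precision_recall_curve(labels, scores)
    for p, r, t in zip(precision, recall, thresholds):
        print("threshold %.2f: precision %.2f, recall %.2f" % (t, p, r))
    # The pairs can then be plotted, e.g., plt.plot(recall, precision) with matplotlib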
Precision-Recall Example: Reuters Data, Class Label = Crude Oil

[Figure: precision-recall curves (precision on the y-axis, recall on the x-axis, both from 0 to 1) for four classifiers: LSVM, Decision Tree, Naïve Bayes, and Rocchio. Figure from S. Dumais (1998).]

Additional Aspects of Document Classifiers

Algorithm Parameters

• Many machine learning/AI algorithms have parameters that can be "tuned"
• Example: classification algorithms
  – Logistic regression
    • Objective function = error function + λ * sum of squared weights
    • Here λ is a parameter of the method that controls the amount of regularization
  – Support vector machines
    • The parameter C controls regularization
• Default settings
  – The software you are using (e.g., from scikit-learn) may have default values for these parameters
  – Note that these are not necessarily optimal
  – You may want to experiment with different values via cross-validation

Sensitivity

• Accuracy on test data may be sensitive to the values of the parameters
  – E.g., with smaller datasets more regularization may give better results
• How can we figure out what values work best?
• Use a validation set (or cross-validation) from within your training data
  – Set the parameter to different values (above and below the default value)
  – Initially you may want to search on a log scale to get a general idea of the scale
    • E.g., try λ values of 0.001, 0.01, 0.1, 1, 10, 100
  – For each value, train and test a classifier
  – Look at (e.g., plot) the test accuracy as a function of the parameter value
  – You can then rerun on a finer scale to zero in (optional)
  – This is known as grid search

Learning Curves

[Screenshots of learning curve plots from http://scikit-learn.org/stable/modules/learning_curve.html]

Feature Selection

• Can be particularly useful for naïve Bayes models
  – Logistic regression (with regularization) and SVMs are often robust enough to not need feature selection
• Select the most common features, e.g., the most common words?
  – These are not necessarily the most discriminative words
• Better approach: select the K most discriminative features
  – Simple method:
    • Compute how well each feature does on its own, select the top K
    • E.g., use the mutual information measure (see Chapter 13, Section 5, in the Manning text)
  – Greedy method:
    • Train a classifier on each feature on its own; select the feature that does best
    • Then, from the rest, train D - 1 classifiers, adding each remaining feature one at a time
    • Pick the best
    • Continue until K features have been added
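The "simple method" is available off the shelf in scikit-learn; a minimal sketch, assuming X is a document-term matrix, y the class labels, and vocab the vectorizer's vocabulary list (hypothetical names):

    from sklearn.feature_selection import SelectKBest, mutual_info_classif

    def select_top_k_features(X, y, vocab, k=1000):
        # Score each word on its own (mutual information with the class label),
        # then keep the K best-scoring features
        selector = SelectKBest(mutual_info_classif, k=k).fit(X, y)
        kept_words = [vocab[i] for i in selector.get_support(indices=True)]
        return selector.transform(X), kept_words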
Additional Topics

• Using "document zones"
  – Weighting terms differently by zone, e.g., upweighting words in titles and abstracts
  – Separate features/term counts for different sections of a document
    • E.g., patent abstract vs. claims
• Features beyond term counts or term presence
  – Using TF-IDF for term weighting
  – Grouping terms based on prior knowledge (e.g., lists of positive and negative sentiment words)
  – Using part-of-speech information (e.g., number of adjectives)
• Semi-supervised learning
  – Probabilistic algorithms (such as Expectation-Maximization) that can learn from both labeled and unlabeled examples
  – Still more of a research topic than a practical technique
  [See Manning et al., Chapter 15, Section 15.3 for further discussion]

Advice: It's Not Always About the Algorithm...

• It is tempting in a predictive modeling project to spend a lot of time experimenting with the algorithms
  – E.g., different neural network architectures and learning heuristics
• But in practice which variables are used may be much more important than which algorithm
• So a useful strategy may be to rethink what the variables/attributes are. There might be another variable or two you can add to your problem that will increase classification accuracy by a large margin.

Experiments Comparing Different Classifiers

[Figure from Zhang and Oles, "Text categorization based on regularized linear classification methods".]

Accuracy as a Function of Data Size

• With enough data the choice of classifier may not matter much

[Figure from Banko and Brill, "Scaling to very very large corpora for natural language disambiguation", Annual Meeting of the ACL, 2001.]