CS 175, Project in Artificial Intelligence
Lecture 9: Progress Reports, Presentations, and Evaluation Methods

Padhraic Smyth
Department of Computer Science
Bren School of Information and Computer Sciences
University of California, Irvine
Weekly Schedule

Week     Monday                                   Wednesday
Feb 27   Lecture: Progress Reports, etc.          Office hours (no lecture);
                                                  Progress Report due by 11pm
Mar 6    Project Presentations (in class);        Project Presentations (in class);
         submit slides by 1pm                     submit slides by 1pm
Mar 13   Short Lecture on Final Project Reports   No lecture or office hours
Mar 20   Final Report due to EEE by Tuesday March 21st, 11pm

Note: no final exam; just submit your final report by Tuesday night of finals week
Padhraic Smyth, UC Irvine: CS 175, Winter 2017
Progress Reports
• Your progress report should be up to 4 pages long
  – Use the progress report template
  – Due to EEE by 11pm this Wednesday
• Reports will be graded and receive a weight of 20% of your overall grade
• Reports will primarily be graded on:
  (a) clarity: are the technical ideas, experiments, etc., explained clearly?
  (b) completeness: are all sections covered?
  (c) creativity: have you added any of your own ideas to the project?
  (d) progress: how much progress has there been?
• One report per team: both students will get the same grade by default
• Tips on writing proposals (Lecture 6) are still highly relevant; see also samples of past proposals distributed via Piazza
Project Slide Presentations
• In class, Monday and Wednesday next week
• Each student or team will make one presentation
  – Approximately 4 minutes per presentation
• Time after each presentation for questions and changeover
  – So about 6 minutes x 11 projects per day = 66 minutes total
• List and order of presentations will be announced this Wednesday (by email and on Piazza)
• Slides need to be uploaded to the EEE dropbox by 1pm the day of your presentation:
  – They will be loaded onto the classroom machine; no need to use your laptop
  – Either PDF or PowerPoint is fine
Suggestions for Slide Content
• Slide 0: project name, student name(s)
• Slide 1: overview of project, e.g.,
  – Investigation of sentiment analysis algorithms for Twitter
  – Focus on classifying tweets into 3 categories, …
  – Using the following dataset: …
  – Using the following classifiers: …
  – Evaluation methods: …
  [Note: you could just show a figure here, e.g., an example of a tweet or a summary of the dataset, and make your points using the figure]
• Slide 2:
  – Some more details on your technical approach
  – Figures are good, e.g., a block diagram of your pipeline
Suggestions for Slide Content (continued)
• Slide 3:
  – Examples of initial results
• Slide 4 (optional):
  – Challenges you have encountered
• Slide 5:
  – Plans for the remainder of the quarter
  – E.g., milestones you plan to complete
  – E.g., optional additional items you will investigate if there is time
It's interesting that training data with stop words predicts test data better than training data without stop words. The reason may be that stop words connect sentences to form the meaning of the whole review.
Rotten Tomatoes Movie Review Classification with Machine Learning and Natural Language Processing

                           Overall Error   False Positive Error   False Negative Error
Logistic Regression        12.38%          16.58%                 9.54%
Multinomial Naive Bayes    13.62%          18.55%                 9.59%

Figure: Error Rate Results, Binary Classification of Positive vs. Negative Phrases

• How we conduct the binary classification: we removed phrases labeled 2, which is neutral. Then we combined 0 and 1 into one class "Negative", which we relabeled as 0, and 4 and 5 into one class "Positive", which we relabeled as 1.
• We were surprised at how well the classifier was able to distinguish between positive and negative phrases. We are looking into creating a multi-level classifier that would first try to classify whether a phrase is neutral or not, and then use this to determine whether a non-neutral phrase is actually positive or negative.
"aww! im so happy cuz you are happy 8D i have to go goodnight girll!! you are AMAZING! have a nice night! love ya?"
• Naïve Bayes probability: 0.93
• SVM distance: 1.75

"I hate the way pills make my stomach feel like it's boiling. This is worse than being sick. Also"
• Naïve Bayes probability: 0.97
• SVM distance: -2.10

The above are two tweets that both classifiers were very confident about. Both were mislabeled by the user: the first was incorrectly labeled by the user as negative and the second was incorrectly labeled as positive. The tweets' sentiment, however, is obviously the opposite. We have found at least a few tweets with this problem.
Twitter Trend Detection
• While many claim social media to be negative (even toxic), the top words are often positive!
• "love", "like", "good", "please", "best"
• Negative words and obscenities are further down.
• Twitter isn't as bad as you think ;)
Parsing Difficulties
• We are using the Stanford parser to generate parse trees for our PCFG. For complex sentences, the parser often finds sentences within sentences. As a result, we generate sentences with strange punctuation (periods in the middle, a comma at the end, etc.).
• Also, some complex sentences are parsed as NP (noun phrase) rather than S (sentence). As a result, some of our generated sentences are just single noun phrases, and some noun phrases become complex sentences.

[Bad generated tree example (schematic): ROOT → S → (NP, VP, …), with an embedded SBAR → S → (NP, VP) in the middle of the tree and stray sentence-final punctuation.]

• Solution: sample the generated RHS using both the symbol and its direct parent. This lets us tell the difference between a sentence/NP at the root vs. in the middle of the tree. Similar to using a higher-order Markov chain.
• Examples of valid expansions:
  – (ROOT → S) → NP VP .
  – (SBAR → S) → NP VP
  – (ROOT → NP) → NP : S .
  – (S → NP) → DT NN
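The parent-annotated sampling idea can be sketched in a few lines of Python. The grammar below is a made-up toy subset for illustration, not the actual PCFG from this project, and each key has only one RHS option so the expansions are deterministic.

```python
import random

# RHS options keyed by (parent, symbol): the same symbol can expand
# differently at the root vs. in the middle of the tree
rules = {
    ("ROOT", "S"): [["NP", "VP", "."]],
    ("SBAR", "S"): [["NP", "VP"]],  # no sentence-final period mid-tree
    ("ROOT", "NP"): [["NP", ":", "S", "."]],
    ("S", "NP"): [["DT", "NN"]],
}

def expand(parent, symbol, rng):
    key = (parent, symbol)
    if key not in rules:
        return [symbol]  # no rule for this context: stop expanding
    rhs = rng.choice(rules[key])  # sample an RHS for this (parent, symbol) pair
    out = []
    for child in rhs:
        out.extend(expand(symbol, child, rng))
    return out

rng = random.Random(0)
print(expand("ROOT", "S", rng))   # ['DT', 'NN', 'VP', '.']
print(expand("SBAR", "S", rng))   # ['DT', 'NN', 'VP'] -- no stray period
```

Because the grammar is keyed on (parent, symbol), an S under ROOT can emit a final period while an S under SBAR does not, which is exactly the distinction described above.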
Evaluating Clustering or Topic Models
Evaluating Algorithms without Ground Truth
• Ground truth = "gold standard" (e.g., human-generated class labels)
• Evaluation of unsupervised learning is difficult
  – If reference/ground truth exists, this may be helpful, but it does not necessarily correspond with human judgement
• For some problems, like summarization, there are techniques (such as the ROUGE metric) that have been developed
• In general you will likely need to do some research and reading
  – E.g., "cluster evaluation" or "text summarization evaluation"
Finding Words that Characterize Clusters
• Say we have a clustering algorithm that produces K = 20 clusters
• To summarize these clusters for human interpretation we need to find the top words for each cluster, e.g.,
  – Find all the documents for a cluster
  – List the top 10 most common words in the cluster?
  – … but this might not be ideal. Why?
Finding Words that Characterize Clusters
• Instead we can try to find the most "discriminative" words for a cluster
  – i.e., the words that are most distinctive for this cluster versus the others
• One simple way to do this is to train a supervised classifier (such as logistic regression) to learn how to classify documents in this cluster versus all the other clusters (as one large class)
• The words with high positive weights (see Assignment 2) are the words that are most discriminative for this cluster
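A minimal sketch of this idea in scikit-learn; the vocabulary, term counts, and cluster labels below are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# made-up term-count matrix: 4 docs from cluster 0 (pets),
# 4 docs from cluster 1 (restaurants)
vocab = ["dog", "cat", "food", "menu", "the", "a"]
X = np.array([
    [3, 1, 0, 0, 2, 1],
    [2, 2, 0, 0, 1, 1],
    [4, 0, 0, 0, 2, 2],
    [1, 3, 0, 0, 1, 0],
    [0, 0, 3, 2, 2, 1],
    [0, 0, 2, 3, 1, 1],
    [0, 0, 4, 1, 2, 0],
    [0, 0, 1, 2, 1, 2],
])
cluster = np.array([0, 0, 0, 0, 1, 1, 1, 1])

def top_words_for_cluster(k, n_top=2):
    # this cluster vs. all other clusters treated as one large class
    y = (cluster == k).astype(int)
    clf = LogisticRegression().fit(X, y)
    top = np.argsort(clf.coef_[0])[::-1][:n_top]  # highest positive weights
    return [vocab[i] for i in top]

print(top_words_for_cluster(0))  # the pet words
print(top_words_for_cluster(1))  # the restaurant words
```

Note that common stop-word-like features ("the", "a") appear in both clusters, so they get near-zero weights and are not selected, which is the point of using discriminative rather than most-common words.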
Clusters and their Top Words
• We can now find the top 5 or 10 (for example) most discriminative words that characterize each cluster
• E.g.,
  – Cluster 1: dog, cat, pet, animal, …
  – Cluster 2: food, restaurant, menu, service, …
  – Cluster 3: salary, income, check, tax, …
• Can we evaluate how good these clusters are in terms of human judgement?
User Studies with Word Intrusion

Say we have 3 clusters and these are the 4 best words for each:

            Word 1   Word 2   Word 3   Word 4
Cluster 1   Apple    Orange   Banana   Melon
Cluster 2   Job      Salary   Office   Tax
Cluster 3   Food     Menu     Happy    Car
User Studies with Word Intrusion

Now say we introduce one random common word, in a random position, per cluster:

            Word 1   Word 2   Word 3   Word 4
Cluster 1   Apple    Cat      Banana   Melon
Cluster 2   Job      Salary   France   Tax
Cluster 3   Truck    Menu     Happy    Car
User Studies with Word Intrusion

Each 0/1 entry indicates whether that user detected the true intruder word:

            Word 1   Word 2   Word 3   Word 4   User 1   User 2   User 3   User 4
Cluster 1   Apple    Cat      Banana   Melon    1        1        1        1
Cluster 2   Job      Salary   France   Tax      1        0        1        1
Cluster 3   Truck    Menu     Happy    Car      1        0        0        0

We can use the user detections to indicate whether:
(a) the clusters generally make sense (high detection rate)
(b) there are particular clusters that don't make sense
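The detection table can be summarized as per-cluster and overall detection rates; a small sketch using the values shown above:

```python
# rows: clusters, columns: users; 1 = the user correctly picked the intruder word
detections = {
    "cluster 1": [1, 1, 1, 1],
    "cluster 2": [1, 0, 1, 1],
    "cluster 3": [1, 0, 0, 0],
}

# detection rate per cluster: fraction of users who found the intruder
rates = {c: sum(d) / len(d) for c, d in detections.items()}

# overall rate pooled across all clusters and users
overall = sum(map(sum, detections.values())) / sum(len(d) for d in detections.values())

print(rates)              # cluster 3 stands out as hard to interpret
print(round(overall, 2))  # 0.67
```

A high overall rate suggests the clusters generally make sense; a single low per-cluster rate (cluster 3 here) flags a cluster whose top words do not hang together.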
User Studies in General
• Can be used to compare the outputs of Algorithm A and Algorithm B
• Humans are good at indicating preferences (e.g., A is better than B, or vice versa) but not so good at providing numbers
• Ideally the study needs to be replicated across multiple users (since individual users may be biased, unreliable, etc.)
• For example:
  – Two summarization algorithms A and B
  – Each user is presented with output from A and output from B
    • Important that the user does not know which is which ("blind test")
    • Original input (text) could also be provided for context
  – This can be done on paper (printouts) or with a Web form (easier for analysis)
  – Each user gets K pairs of A and B outputs and states K preferences
  – N users, K pairs: larger values of N and K are better (e.g., N > 5, K > 10)
Example of Data from a User Study

Say 5 users each compared the outputs of A and B 20 times:

         Times A preferred   Times B preferred   Difference between A and B
User 1   12                  8                   4
User 2   15                  5                   10
User 3   9                   11                  -2
User 4   11                  9                   2
User 5   12                  8                   4

Can use the Wilcoxon signed test (for example) to analyze the differences statistically.
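As a sketch, the Wilcoxon signed-rank test can be run on the difference column using the normal approximation; with only 5 users an exact test would be preferable, so this is for illustration only (stdlib only, no scipy assumed).

```python
import math

# differences (times A preferred minus times B preferred) for the 5 users
diffs = [4, 10, -2, 2, 4]
n = len(diffs)

# rank the absolute differences, averaging ranks over ties
pairs = sorted((abs(d), d) for d in diffs)
rank_of = [0.0] * n
i = 0
while i < n:
    j = i
    while j < n and pairs[j][0] == pairs[i][0]:
        j += 1
    for k in range(i, j):
        rank_of[k] = (i + 1 + j) / 2  # average of ranks i+1 .. j
    i = j

# W+ = sum of the ranks of the positive differences
w_plus = sum(r for (_, d), r in zip(pairs, rank_of) if d > 0)

# normal approximation to the null distribution of W+
mean = n * (n + 1) / 4
sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (w_plus - mean) / sd
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
print(round(p, 3))  # well above 0.05: no significant preference with only 5 users
```

Even though 4 of the 5 users preferred A, the sample is far too small to conclude anything, which is why larger N and K are recommended above.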
Evaluating Classification Algorithms
Reporting Classification Accuracy
• Always report the performance of a simple baseline, e.g.,
  – The majority classifier (see next slide)
  – kNN classifier with K = 1
• Cross-validation is useful
  – But if you have a very large amount of data, a single randomly selected train/test split of the data should be fine
The Simple Majority Classifier
• Say we have K classes, and class k occurs with probability p(k), k = 1, …, K
• Let k* be the most likely class, i.e., the class with the highest value of p(k)
• The "majority classifier" always predicts class k*
  – What is the accuracy of this classifier? Accuracy = p(k*)
  – This is a "dumb" classifier, but it's always good to report this number when reporting classification results
  – On some problems, if p(k*) = 0.99, it may be hard to beat the majority classifier
  – Or if there are more than 2 classes, p(k*) may be quite small (e.g., 0.3) and an accuracy of 0.5 (say) tells you that you are picking up predictive signal
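A minimal sketch of computing this baseline from a list of labels (the labels below are made up):

```python
from collections import Counter

# made-up class labels for a training set
labels = ["pos", "pos", "neg", "pos", "neu", "pos", "neg", "pos", "pos", "neg"]

counts = Counter(labels)
k_star, n_star = counts.most_common(1)[0]  # the most likely class k*
baseline_accuracy = n_star / len(labels)   # accuracy of always predicting k*

print(k_star, baseline_accuracy)  # pos 0.6
```

Any classifier you report should beat this number; if it does not, it has learned nothing beyond the class frequencies.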
Illustration of 5-fold Cross-Validation

Document   Feature Vector   Class Label
d1         x                1
d2         x                2
d3         x                1
d4         x                2
d5         x                2
d6         x                1
d7         x                1
d8         x                2
d9         x                1
d10        x                2
Illustration of 5-fold Cross-Validation

Document   Feature Vector   Class Label   Fold
d1         x                1             1
d2         x                2             1
d3         x                1             2
d4         x                2             2
d5         x                2             3
d6         x                1             3
d7         x                1             4
d8         x                2             4
d9         x                1             5
d10        x                2             5

For each fold, that fold's documents form the test set and the remaining documents form the training set. The average accuracy across the 5 test sets is computed and referred to as the cross-validated estimate of the classifier accuracy.
Comparing Algorithms with Cross-Validation
• Important that they are evaluated on exactly the same data
• Correct approach:
  – In cross-validation, the outer loop is over the train-test folds, and the inner loop is over the different methods
• Incorrect approach:
  – The outer loop is over the different methods, and the inner loop performs CV (with different random train/test folds for each method)
• The second approach is incorrect because it introduces additional (unnecessary) variability or noise into the process of measuring the difference between the algorithms or methods
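A sketch of the correct approach in scikit-learn: fix the folds once, then evaluate every method on exactly the same train/test splits. The data and the two classifiers below are placeholders; any methods could be compared this way.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB

# synthetic data standing in for a real classification problem
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = (X[:, 0] + 0.5 * rng.normal(size=100) > 0).astype(int)

methods = {"logreg": LogisticRegression(), "nb": GaussianNB()}
scores = {name: [] for name in methods}

# correct approach: outer loop over the folds, inner loop over the methods,
# so every method is evaluated on exactly the same train/test splits
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    for name, clf in methods.items():
        clf.fit(X[train_idx], y[train_idx])
        scores[name].append(clf.score(X[test_idx], y[test_idx]))

for name, s in scores.items():
    print(name, np.mean(s))
```

The incorrect approach would call a CV routine separately per method without fixing the folds, so each method would see different random splits.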
Comments on Train/Test Numbers

Consider 2 different experiments (e.g., 2 different classifiers being fit to the same data); we run 5-fold cross-validation on each and get the following results. What might we conclude?

Classifier 2                              Classifier 1
Fold   Train Accuracy   Test Accuracy     Fold   Train Accuracy   Test Accuracy
1      82.0             71.0              1      74.0             73.8
2      73.1             56.0              2      73.1             72.0
3      98.9             76.0              3      75.9             73.7
4      67.1             61.0              4      74.5             74.0
5      84.2             68.0              5      73.9             73.2
Statistical Significance
• Say we compare Algorithm A versus B on 20 datasets/trials
  – A does better 12 times and B does better 8 times
  – What should we conclude?
• We can use statistical hypothesis testing to help us decide
  – Null hypothesis: the two methods are equally accurate (and any variation we see is just random noise)
• Can use different types of hypothesis tests, e.g.,
  – t-test on the difference in accuracies across the 20 tests: is the mean zero or not?
  – Wilcoxon sign test: just looks at the number of times A or B "won" a test
• Can use the same idea for user studies: how often does the user select A over B?
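For the sign test, the p-value can be computed directly from the Binomial(20, 0.5) null distribution; a stdlib-only sketch of the 12-vs-8 example above:

```python
from math import comb

n, wins_a = 20, 12  # A wins 12 of the 20 trials

# under the null hypothesis the number of A wins is Binomial(20, 0.5);
# two-sided p-value: twice the probability of 12 or more wins
p_upper = sum(comb(n, k) for k in range(wins_a, n + 1)) / 2 ** n
p = min(1.0, 2 * p_upper)
print(round(p, 3))  # 0.503: 12 wins vs 8 is very weak evidence either way
```

So a 12-to-8 split over 20 trials is entirely consistent with the null hypothesis that A and B are equally accurate.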
Going Beyond Accuracy
• To get insight into what a classifier is doing, it's often useful to dig deeper beyond accuracy numbers
• Examples:
  – Look at random examples of the documents it got right and got wrong and see if there are any obvious patterns (e.g., negation in sentiment analysis)
  – If you have a classifier that is producing probabilities, you could look at the test examples with the highest predicted probability of being in class 1 but that are actually class 0 (and vice versa)
  – With 2 or more classifiers, you could see if they tend to make errors on the same examples (look at 2x2 tables for 2 classifiers)
Confusion Matrices

                     TRUE LABELS
PREDICTED LABELS     Class 1   Class 2   Class 3
Class 1              24        15        0
Class 2              5         5         0
Class 3              0         1         50
Confusion Matrices

                     TRUE LABELS
PREDICTED LABELS     Class 1   Class 2   Class 3
Class 1              24        15        0
Class 2              5         5         0
Class 3              0         1         50

Accuracy = (24 + 5 + 50) / 100 = 79%
Confusion Matrices

                     TRUE LABELS
PREDICTED LABELS     Class 1   Class 2   Class 3   Predicted Number per Class
Class 1              24        15        0         39
Class 2              5         5         0         10
Class 3              0         1         50        51
Actual Number        29        21        50
per Class
Analyzing Class Probabilities
Class Probabilities
• Many classifiers produce class probabilities as outputs
  – E.g., logistic regression, neural networks, naïve Bayes
  – For each training or test example the classifier produces P(c | x), where x is the input vector
  – For K classes there are K class probabilities that sum to 1
Methods for the Logistic Regression Classifier in scikit-learn
From http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Class Probabilities in scikit-learn
From http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
Setting up a 2-Class Problem (from 20 Newsgroups Data)

n_categories = 2
twenty_train, twenty_test = load_news_dataset(n_categories)

The names of the 2 most common labels in the dataset are:
['rec.sport.hockey', 'soc.religion.christian']
Dimensions of X_train_counts are (1199, 21891)
Number of non-zero elements in X_train_counts: 157195
Percentage of non-zero elements in X_train_counts: 0.60
Average number of word tokens per document: 196.54
Average number of documents per word token: 10.76
Class Probabilities from a Logistic Regression Model

# fit a logistic regression model and generate the probabilities
clfLR = LogisticRegression().fit(X_train_tfidf, twenty_train.target)

# generate predictions on the test data
predicted_LR = clfLR.predict(X_test_tfidf)

# generate class probabilities on the test data
prob_test_LR = clfLR.predict_proba(X_test_tfidf)
Class Probabilities from a Logistic Regression Model

prob_test_LR
array([[ 0.7041694 ,  0.2958306 ],
       [ 0.50150781,  0.49849219],
       [ 0.74678626,  0.25321374],
       ...,
       [ 0.45904319,  0.54095681],
       [ 0.09707758,  0.90292242],
       [ 0.86623216,  0.13376784]])

Two class probabilities that sum to 1 for every example in the test data array.
[Figures: scatter plots of class probabilities on the test data, comparing classifiers.]
Using Class Probabilities
• Can look at how confident a classifier is
  – E.g., which test examples for each class is it most confident on?
• Can help with analyzing what errors a classifier is making
  – E.g., find all test examples that were incorrectly classified and look at those examples that have the highest probability of the incorrect class
• Can compare classifiers
  – E.g., using scatter plots, as with the examples on the previous slides
• Can rank the test examples and analyze rankings (next section)
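A sketch of the second bullet: given predicted class probabilities and true labels, rank the misclassified examples by the probability the model assigned to its (wrong) prediction. The arrays below are made up for illustration.

```python
import numpy as np

# made-up P(class 1 | x) for 6 test examples, with their true labels
prob_class1 = np.array([0.95, 0.40, 0.80, 0.10, 0.70, 0.30])
true_label = np.array([1, 1, 0, 0, 1, 0])

predicted = (prob_class1 >= 0.5).astype(int)
wrong = np.where(predicted != true_label)[0]  # indices of the errors

# probability the model assigned to its own (incorrect) prediction
confidence = np.where(predicted == 1, prob_class1, 1 - prob_class1)
worst_first = wrong[np.argsort(-confidence[wrong])]
print(worst_first)  # most confidently wrong examples first: [2 1]
```

Reading the documents at the top of this ranking is often the fastest way to spot systematic problems (e.g., negation, or mislabeled examples like the tweets shown earlier).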
Ranking-based Metrics
Metrics based on Thresholding/Ranking
• Consider binary classification
• Many classifiers produce a score indicating the likelihood of class 1
  – Class probability from logistic regression or naïve Bayes
  – Distance from the boundary in SVMs
• The default threshold for this score (e.g., for class probabilities) is 0.5
• But we can vary this threshold
  – E.g., in spam email we may want to set a high threshold for declaring spam
• We can also rank our test examples by their score and measure the quality of the ranked list (relative to some threshold)
Binary Classification: Ranking Metrics
• To evaluate using a ranking metric we do the following:
  – Take a test set with N data vectors x
  – Compute a score for each item, say P(C = 1 | x), using our prediction model
  – Sort the N items from largest to smallest score
• This gives us 2 lists, each of length N:
  – A list of predictions, with decreasing score values
  – A corresponding list of "ground truth" values, 0's and 1's for binary class labels
• A variety of evaluation metrics can be computed based on these 2 lists:
  – Precision/recall
  – Receiver operating characteristics
  – Lift curves
  – And so on
Ranking Terminology

                                     True Labels
                          Positive                Negative
Model's       Positive    TP (true positive)     FP (false positive)
Predictions   Negative    FN (false negative)    TN (true negative)

Precision = TP / (TP + FP) = ratio of correct positives predicted to total positives predicted
Recall = TP / (TP + FN) = ratio of correct positives predicted to the actual number of positives

Typically you will get high precision at low recall, and low precision at high recall.
Precision

Precision for class k = fraction predicted as class k that belong to class k = 90% for class 2 below

                    True Class 1   True Class 2   True Class 3
Predicted Class 1   80             60             10
Predicted Class 2   10             90             0
Predicted Class 3   10             30             110
Recall

Recall for class k = fraction that belong to class k predicted as class k = 50% for class 2 below

                    True Class 1   True Class 2   True Class 3
Predicted Class 1   80             60             10
Predicted Class 2   10             90             0
Predicted Class 3   10             30             110
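Both quantities can be read directly off the confusion matrix; a sketch using the table above (rows = predicted class, columns = true class):

```python
import numpy as np

# rows = predicted class, columns = true class (the table above)
cm = np.array([
    [80, 60, 10],
    [10, 90, 0],
    [10, 30, 110],
])

precision = cm.diagonal() / cm.sum(axis=1)  # divide by row sums (total predicted per class)
recall = cm.diagonal() / cm.sum(axis=0)     # divide by column sums (actual number per class)

print(precision)  # class 2: 90 / 100 = 0.9
print(recall)     # class 2: 90 / 180 = 0.5
```

This makes the asymmetry visible: class 2 is predicted precisely when it is predicted, but more than half of the true class 2 examples are missed.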
Binary Classification: Simple Example of Precision and Recall
• Test set with 10 items, binary labels, 5 from each class

Score from the Model   True Label
0.97                   1
0.95                   1
0.93                   0
0.81                   1
0.55                   0
0.28                   1
0.17                   0
0.15                   1
0.10                   0
0.03                   0
Precision-Recall with High Threshold
• Test set with 10 items, binary labels, 5 from each class
• Threshold set just below the top 2 scores:

Score from the Model   True Label
0.97                   1
0.95                   1
----- threshold -----
0.93                   0
0.81                   1
0.55                   0
0.28                   1
0.17                   0
0.15                   1
0.10                   0
0.03                   0

Precision: percentage above the threshold that are actually class 1 = 2/2 = 100%
Recall: percentage of the total number of positive examples that are above the threshold = 2/5 = 40%
Precision-Recall with Low Threshold
• Test set with 10 items, binary labels, 5 from each class
• Threshold set just below the top 8 scores:

Score from the Model   True Label
0.97                   1
0.95                   1
0.93                   0
0.81                   1
0.55                   0
0.28                   1
0.17                   0
0.15                   1
----- threshold -----
0.10                   0
0.03                   0

Precision: percentage above the threshold that are actually class 1 = 5/8 = 62.5%
Recall: percentage of the total number of positive examples that are above the threshold = 5/5 = 100%
How are Precision and Recall used as Metrics?
• Precision@K is often used as a metric, where K is the top K items or the top K% of the sorted prediction list
  – E.g., useful for information retrieval and search
• Precision-recall curves can be plotted by varying the threshold
  – A high threshold (e.g., 0.95) often yields high precision, low recall
  – A low threshold (e.g., 0.1) often yields low precision, high recall
Computing Precision-Recall Curves
• Rank our test documents by predicted probability of belonging to class 1
  – Sort the ranked list
  – For t = 1, …, N (N = total number of test documents):
    • Select the top t documents on the list
    • Classify documents above the cutoff as class 1, and documents below as not class 1
    • Compute a precision and recall number for each t
  – This generates a set of (precision, recall) pairs we can plot as (x, y)
  – Perfect performance: precision = 1, recall = 1
    • Requires that all the positive examples get higher scores than all negatives
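The loop above, applied to the ranked 10-item example from the earlier slides:

```python
# true labels of the 10-item example, already sorted by decreasing score
true_labels = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
n_pos = sum(true_labels)

curve = []
tp = 0
for t, label in enumerate(true_labels, start=1):
    tp += label                          # positives among the top t items
    curve.append((tp / n_pos, tp / t))   # (recall, precision) at cutoff t

print(curve[1])  # top 2: recall 0.4, precision 1.0
print(curve[7])  # top 8: recall 1.0, precision 0.625
```

The two printed points are exactly the high-threshold and low-threshold cases worked through on the earlier slides; the full list of pairs traces out the precision-recall curve.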
Precision-Recall Example: Reuters Data, Class Label = Crude Oil

[Figure: precision (y-axis, 0 to 1) plotted against recall (x-axis, 0 to 1) for four classifiers: LSVM, Decision Tree, Naïve Bayes, and Rocchio. Figure from S. Dumais (1998).]
Additional Aspects of Document Classifiers
Algorithm Parameters
• Many machine learning/AI algorithms have parameters that can be "tuned"
• Example: classification algorithms
  – Logistic regression
    • Objective function = error function + λ * sum of squared weights
    • Here λ is a parameter of the method that controls the amount of regularization
  – Support vector machines
    • The parameter C controls regularization
• Default settings
  – The software you are using (e.g., from scikit-learn) may have default values for these parameters
  – Note that these are not necessarily optimal
  – You may want to experiment with different values via cross-validation
Sensitivity
• Accuracy on test data may be sensitive to the values of the parameters
  – E.g., with smaller datasets, more regularization may give better results
• How can we figure out what values work best?
• Use a validation set (or cross-validation) from within your training data
  – Set the parameter to different values (above and below the default value)
    • Initially you may want to search on a log scale to get a general idea of the scale
    • E.g., try λ values of 0.001, 0.01, 0.1, 1, 10, 100
  – For each value, train and test a classifier
  – Look at (e.g., plot) the test accuracy as a function of the parameter value
• Can then rerun on a finer scale to zero in (optional)
  – This is known as grid search
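A sketch of this grid search with scikit-learn on synthetic data; note that in LogisticRegression the parameter C acts like 1/λ, so larger C means less regularization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# synthetic data standing in for a document-classification problem
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# log-scale grid, as suggested above
grid = [0.001, 0.01, 0.1, 1, 10, 100]
cv_acc = {C: cross_val_score(LogisticRegression(C=C), X, y, cv=5).mean() for C in grid}

best_C = max(cv_acc, key=cv_acc.get)
print(best_C, cv_acc[best_C])
```

After finding the best region on the log scale, a second pass on a finer grid around best_C is the optional zero-in step mentioned above.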
Learning Curves
From http://scikit-learn.org/stable/modules/learning_curve.html
Feature Selection
• Can be particularly useful for naïve Bayes models
  – Logistic regression (with regularization) and SVMs are often robust enough to not need feature selection
• Select the most common features, e.g., most common words?
  – These are not necessarily the most discriminative words
• Better approach: select the K most discriminative features
  – Simple method:
    • Compute how well each feature does on its own, and select the top K
    • E.g., use the mutual information measure (see Chapter 13, Section 5, in the Manning text)
  – Greedy method:
    • Train a classifier on each feature on its own; select the feature that does best
    • Then, from the rest, train D - 1 classifiers, adding each remaining feature one at a time
    • Pick the best
    • Continue until K features have been added
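The simple method can be sketched by scoring each word with the mutual information between word presence and the class label and keeping the top K. The tiny corpus below is made up; real feature selection would run this over the full vocabulary.

```python
import math

# toy corpus: each doc is (set of words present, class label)
docs = [
    ({"puck", "ice", "game"}, 0),
    ({"puck", "goal"}, 0),
    ({"ice", "goal", "game"}, 0),
    ({"church", "faith"}, 1),
    ({"faith", "game"}, 1),
    ({"church", "goal"}, 1),
]
vocab = sorted(set().union(*(w for w, _ in docs)))
N = len(docs)

def mutual_information(word):
    # MI between word presence (0/1) and class label (0/1)
    mi = 0.0
    for present in (0, 1):
        for cls in (0, 1):
            n = sum(1 for w, c in docs if (word in w) == present and c == cls)
            if n == 0:
                continue
            p_joint = n / N
            p_w = sum(1 for w, _ in docs if (word in w) == present) / N
            p_c = sum(1 for _, c in docs if c == cls) / N
            mi += p_joint * math.log2(p_joint / (p_w * p_c))
    return mi

top_k = sorted(vocab, key=mutual_information, reverse=True)[:2]
print(top_k)  # class-specific words score highest
```

Words that occur only in one class ("puck", "church") score much higher than words spread evenly across classes ("game", "goal"), which is exactly the discriminative-versus-common distinction made above.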
Additional Topics
• Using "document zones"
  – Weighting of terms in different zones, e.g., upweighting words in titles and abstracts
  – Separate features/term counts for different sections of a document
    • E.g., patent abstract vs. claims
• Features beyond term counts or term presence
  – Using TF-IDF for term weighting
  – Grouping terms based on prior knowledge (e.g., lists of positive and negative sentiment words)
  – Using part-of-speech information (e.g., number of adjectives)
• Semi-supervised learning
  – Probabilistic algorithms (such as Expectation-Maximization) that can learn from both labeled and unlabeled examples
  – Still more of a research topic than a practical technique
[See Manning et al., Chapter 15, Section 15.3 for further discussion]
Advice: It's not always about the algorithm…
• It is tempting in a predictive modeling project to spend a lot of time experimenting with the algorithms
  – E.g., different neural network architectures and learning heuristics
• But in practice which variables are used may be much more important than which algorithm
• So a useful strategy may be to rethink what the variables/attributes are. There might be another variable or two you can add to your problem that will increase classification accuracy by a large margin.
Experiments Comparing Different Classifiers
From Zhang and Oles, "Text categorization based on regularized linear classification methods" (Sec. 15.3.1)
Accuracy as a Function of Data Size

With enough data the choice of classifier may not matter much.

From Banko and Brill, "Scaling to very very large corpora for natural language disambiguation", Annual Meeting of the ACL, 2001.
Project Slide Presentations
• In class, Monday and Wednesday next week
• Each student or team will make one presentation
  – Approximately 4 minutes per presentation
• Time after each presentation for questions and changeover
  – So about 6 minutes x 11 projects per day = 66 minutes total
• List and order of presentations will be announced this Wednesday (by email and on Piazza)
• Slides need to be uploaded to the EEE dropbox by 1pm the day of your presentation:
  – They will be loaded onto the classroom machine; no need to use your laptop
  – Either PDF or PowerPoint is fine
Weekly Schedule

Week     Monday                                   Wednesday
Feb 27   Lecture: Progress Reports, etc.          Office hours (no lecture);
                                                  Progress Report due by 11pm
Mar 6    Project Presentations (in class);        Project Presentations (in class);
         submit slides by 1pm                     submit slides by 1pm
Mar 13   Short Lecture on Final Project Reports   No lecture or office hours
Mar 20   Final Report due to EEE by Tuesday March 21st, 11pm

Note: no final exam; just submit your final report by Tuesday night of finals week