Named Entity Recognition: Challenges and Proper Name Ambiguity

Named Entity Recognition
Named Entity Recognition (NER) systems identify specific types of entities, primarily proper named entities and special categories:
• Proper Names: people, organizations, locations, etc.
  Elvis Presley, IBM, Department of the Interior, Centers for Disease Control
• Dates & Times: ubiquitous and surprisingly varied
  November 9, 1997, 11/9/97, 10:29pm
• Measures: measurements with specific units
  45%, 5.3 lbs, 65 mph, $1.4 billion
• Other: application-specific stylized terms
  URLs, email addresses, phone numbers, social security numbers
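Stylized categories such as dates, times, percentages, and money amounts are often caught with simple patterns before (or alongside) a statistical tagger. A minimal sketch with illustrative, deliberately incomplete regular expressions (the patterns and function name are assumptions, not from the slides):

```python
import re

# Illustrative patterns for a few stylized entity types; real systems need
# far more variants than these.
PATTERNS = {
    "DATE":    re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),                 # 11/9/97
    "TIME":    re.compile(r"\b\d{1,2}:\d{2}\s?(?:am|pm)\b", re.I),         # 10:29pm
    "PERCENT": re.compile(r"\b\d+(?:\.\d+)?%"),                            # 45%
    "MONEY":   re.compile(r"\$\d+(?:\.\d+)?(?:\s?(?:billion|million))?"),  # $1.4 billion
}

def tag_stylized(text):
    """Return (entity_type, matched_string) pairs for stylized terms."""
    hits = []
    for etype, pat in PATTERNS.items():
        hits.extend((etype, m.group()) for m in pat.finditer(text))
    return hits

print(tag_stylized("Revenue rose 45% to $1.4 billion on 11/9/97 at 10:29pm."))
```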
Challenges
• No dictionary will contain all existing proper names.
  New names are constantly being created.
• Just finding the proper names isn't trivial.
  – the first word of every sentence is capitalized
  – proper names can include lowercase words (e.g., U of U)
• Proper names are often abbreviated or turned into acronyms.
• But not all acronyms are proper names!
  – Ex: CS, NLP, AI, OS, etc.
Proper Name Ambiguity
• Many companies, organizations, and locations are named after people!
  Companies: Ford, John Hancock, Calvin Klein, Phillip Morris
  Universities: Brigham Young, Smith, McGill, Purdue
  Sites: JFK (airport), Washington (capital, state, county, etc.)
• Acronyms can often refer to many different things
  – UT, MRI, SCI (check out the different hits on Google!)
• Many proper names can correspond to multiple types
  – April, June, Georgia, Jordan, Calvin & Hobbes
Classifier Labeling for NER
• When named entity recognition is viewed as a classification or sequential tagging problem, every word is labeled based on whether it is part of a named entity.
• The most common labeling scheme is BIO tagging, where B = Beginning, I = Inside, and O = Outside.
  Example:
  John/B Smith/I bought/O Mary/B Ann/I Jones/I a/O book/O
• A different classifier may be created for each entity type, or different labels are needed for each entity type (e.g., B-ORG and B-PER).
  The/O University/B-ORG of/I-ORG Utah/I-ORG hired/O Dr./B-PER Susan/I-PER Miller/I-PER on/O Tuesday/B-DATE .
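As a concrete illustration of the labeling step, here is a minimal sketch (the function name and span format are my own, not from the slides) that converts token-level entity spans into per-type BIO tags:

```python
def spans_to_bio(tokens, spans):
    """Convert entity spans into BIO tags.

    tokens: list of words, e.g. ["The", "University", "of", "Utah", ...]
    spans:  list of (start, end, type) with end exclusive, e.g. [(1, 4, "ORG")]
    """
    tags = ["O"] * len(tokens)
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"            # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"            # remaining tokens of the entity
    return tags

tokens = ["The", "University", "of", "Utah", "hired", "Dr.", "Susan", "Miller"]
spans = [(1, 4, "ORG"), (5, 8, "PER")]
print(list(zip(tokens, spans_to_bio(tokens, spans))))
```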
Message Understanding Conferences
• A series of Message Understanding Conferences (MUCs) were held in the 1990s, which played a major role in advancing information extraction research.
• MUC-3 through MUC-7 were competitive performance evaluations of IE systems built by different research groups.
• The MUCs established large-scale, realistic performance evaluations with formal evaluation criteria for the NLP community.
• The tasks included named entity recognition, event extraction, and coreference resolution.
• Some of the MUC datasets are still used as evaluation benchmarks.
A Maximum Entropy System for NER
• The MENERGI system is a nice example of a maximum entropy approach to NER:
  – Named Entity Recognition: A Maximum Entropy Approach Using Global Information, [Chieu & Ng, 2002]
• Good NER systems typically use a large set of local features based on properties of the target word and its neighboring words.
• Recent research has begun to also incorporate global features that capture information from across the document.

NER Types and Tagging Scheme
• MENERGI recognizes 7 types of named entities, based on the MUC-6/7 NER task definition:
  person, organization, location, date, time, money, percent
• MENERGI uses a BCEU tagging scheme:
  Begin/Continue/End: Salt/B Lake/C City/E
  Unique: Utah/U
Applying the Classifier
• One problem with NER classifiers is that they can produce inadmissible (illegal) tag sequences.
  For example, an End tag without a preceding Begin tag.
• To eliminate this problem, they defined transition probabilities between classes P(c_i | c_{i-1}) to be 1 if the sequence is admissible or 0 if it is illegal.
• P(c_i | s, D) is produced by the MaxEnt classifier.
• In total, the system has 29 classes:
  BCEU tags for each NE class (4 x 7) = 28 tags
  a special tag for Other (not a NE) = 1 tag
P(c_1, \ldots, c_n \mid s, D) = \prod_{i=1}^{n} P(c_i \mid s, D) \cdot P(c_i \mid c_{i-1})
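A minimal sketch of how these two factors combine, assuming we already have the per-token MaxEnt probabilities and a 0/1 admissibility rule for transitions (the tag names and data structures are illustrative, not MENERGI's actual implementation):

```python
import math

def admissible(prev_tag, tag):
    """0/1 transition 'probability': a Continue or End tag must follow a
    Begin or Continue tag of the same entity type; everything else is legal."""
    if tag.startswith(("C-", "E-")):
        etype = tag.split("-", 1)[1]
        return 1.0 if prev_tag in (f"B-{etype}", f"C-{etype}") else 0.0
    return 1.0

def sequence_log_prob(tags, token_probs):
    """log P(c_1..c_n | s, D) = sum_i log[ P(c_i | s, D) * P(c_i | c_{i-1}) ].

    token_probs[i] is a dict mapping each tag to its MaxEnt probability.
    Returns -inf for inadmissible sequences, which rules them out entirely.
    """
    logp = 0.0
    prev = "O"                       # treat the sequence start as Other
    for i, tag in enumerate(tags):
        trans = admissible(prev, tag)
        if trans == 0.0:
            return float("-inf")
        logp += math.log(token_probs[i][tag]) + math.log(trans)
        prev = tag
    return logp
```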
Feature Set
• The classifier uses one set of local features, which are based on properties of the target word w, the word on its left w-1, and the word on its right w+1.
• The classifier also uses a set of global features, which are extracted from instances of the same token that occur elsewhere in the document.
• Features that occur infrequently in the training set are discarded as a form of feature selection.

External Dictionaries
Several external dictionaries were created by compiling lists of locations, companies, and person names. This is very common: large lists are easy to obtain and can really help an NER system.

Summary of Local Features
• strings of the target, previous, and next words
• capitalization-based features
• is it the first word of the sentence?
• is it in WordNet? (OOV = out-of-vocabulary feature)
• presence of the target, previous, and next words in dictionaries
• is it a month, day, or number?
• is it preceded/followed by an NE class prefix/suffix term?
• 10 features that look for specific characters in the current word string
• zone of the word (headline, dateline, DD, or main text)

Target Word Character Features

  Feature          Description                                    Example
  InitCapPeriod    starts with a capital letter, ends w/ period   Mr.
  OneCap           contains only one capital letter               A
  AllCaps-Period   all capital letters and period                 CORP.
  Contain-Digit    contains a digit                               AB3747
  TwoD             2 digits                                       99
  FourD            4 digits                                       1999
  Digit-slash      digits and slash                               01/01
  Dollar           contains $                                     US$20
  Percent          contains %                                     20%
  Digit-Period     contains digit and period                      $US3.20
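A minimal sketch of a few of the character-level features in the table above, written as simple predicates (the regexes are my own, chosen only to match the examples shown):

```python
import re

def char_features(word):
    """Binary character-level features for a single token (illustrative subset)."""
    return {
        "InitCapPeriod": bool(re.fullmatch(r"[A-Z].*\.", word)),          # Mr.
        "OneCap":        bool(re.fullmatch(r"[A-Z]", word)),              # A
        "AllCapsPeriod": bool(re.fullmatch(r"[A-Z]+\.", word)),           # CORP.
        "ContainDigit":  bool(re.search(r"\d", word)),                    # AB3747
        "TwoD":          bool(re.fullmatch(r"\d{2}", word)),              # 99
        "FourD":         bool(re.fullmatch(r"\d{4}", word)),              # 1999
        "DigitSlash":    bool(re.fullmatch(r"[\d/]*\d/\d[\d/]*", word)),  # 01/01
        "Dollar":        "$" in word,                                     # US$20
        "Percent":       "%" in word,                                     # 20%
        "DigitPeriod":   bool(re.search(r"\d", word)) and "." in word,    # $US3.20
    }

for w in ["Mr.", "CORP.", "1999", "01/01", "US$20", "$US3.20"]:
    print(w, [name for name, fired in char_features(w).items() if fired])
```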
Ambiguous Contexts
Some named entities occur in ambiguous contexts that can be confusing even for human readers.
  McCann initiated a new global system.
  The CEO of McCann announced …
  The McCann family announced …
Global Features
• Traditionally, NER systems classified each word/phrase independently of other instances of the same word/phrase in other parts of the document.
• But other contexts may provide valuable clues about what type of entity it is. For example:
  – capitalization is indicative if not the first word of a sentence
    Liz Claiborne recently purchased Shoes R Us for $1.3 million.
  – some contexts contain strong prefixes/suffixes in a phrase
    She bought the shoe retailer to begin franchising it nationwide.
  – some contexts contain strong preceding/following neighbors
    The company bought the shoe retailer to expand its product line.
  – acronyms can often be aligned with their expanded phrase
Summary of Global Features
• ICOC: if another occurrence of the word appears in an unambiguous position (not the first word), is it capitalized?
• CSPP: do other occurrences of the word occur with a known named entity prefix/suffix?
• ACRO: if the word looks like an acronym, is there a capitalized sequence of words anywhere with these leading letters? If so, acronym features are assigned to the likely acronym word and the corresponding word sequence.
• SOIC: for capitalized word sequences, the longest substrings that appear elsewhere are assigned features.
• UNIQ: is the word capitalized and unique?
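As an illustration of how such a global feature can be computed, here is a minimal sketch of an ICOC-style check (my own simplification of the idea, not the paper's code): for a target token, look at its other occurrences that are not sentence-initial and see whether any of them is capitalized.

```python
def icoc_feature(target, sentences):
    """ICOC-style global feature (simplified).

    sentences: list of token lists for the whole document.
    Returns True if some non-sentence-initial occurrence of the target token
    is capitalized -- evidence that the token is a proper name.
    """
    for sent in sentences:
        for i, tok in enumerate(sent):
            if i == 0:                      # sentence-initial position is ambiguous
                continue
            if tok.lower() == target.lower() and tok[0].isupper():
                return True
    return False

doc = [["Blinker", "missed", "his", "club's", "last", "two", "games", "."],
       ["FIFA", "suspended", "Reggie", "Blinker", "indefinitely", "."]]
print(icoc_feature("Blinker", doc))   # True: capitalized in a non-initial position
```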
NER Results
Ablation studies look at the contribution of features or components individually to determine how much (if any) impact each one makes to the system as a whole.

F-measure scores:

              MUC-6    MUC-7
  Baseline    90.75    85.22
  + ICOC      91.50    86.24
  + CSPP      92.89    86.96
  + ACRO      93.04    86.99
  + SOIC      93.25    87.22
  + UNIQ      93.27    87.24
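For reference, the F-measure reported throughout these tables is the harmonic mean of precision and recall over the extracted entities. A minimal sketch of the standard computation (not specific to MENERGI):

```python
def f_measure(num_correct, num_predicted, num_gold):
    """Precision, recall, and balanced F-measure over extracted entities."""
    precision = num_correct / num_predicted if num_predicted else 0.0
    recall = num_correct / num_gold if num_gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

print(f_measure(num_correct=85, num_predicted=95, num_gold=100))
```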
Design Challenges & Misconceptions
Ratinov & Roth [CoNLL 2009] investigated several issues related to the design of NER systems.

Key design decisions in an NER system:
1. How to represent text chunks?
2. What inference algorithm to use?
3. How to model non-local dependencies?
4. How to use external knowledge resources?

Motivating Example
SOCCER – BLINKER BAN LIFTED
LONDON 1996-12-06 Dutch forward Reggie Blinker had his indefinite suspension lifted by FIFA on Friday and was set to make his Sheffield Wednesday comeback against Liverpool on Saturday. Blinker missed his club's last two games after FIFA slapped a worldwide ban on him for appearing to sign contracts for both Wednesday and Udinese while he was playing for Feyenoord.
Motivating Example
SOCCER – BLINKER[PER] BAN LIFTED
LONDON[LOC] 1996-12-06 Dutch[MISC] forward Reggie[PER] Blinker[PER] had his indefinite suspension lifted by FIFA[ORG] on Friday and was set to make his Sheffield[ORG] Wednesday[ORG] comeback against Liverpool[ORG] on Saturday[DATE]. Blinker[PER] missed his club's last two games after FIFA[ORG] slapped a worldwide ban on him for appearing to sign contracts for both Wednesday[ORG] and Udinese[ORG] while he was playing for Feyenoord[ORG].
Baseline NER System
• They created a baseline NER classifier using general, widely used features:
  – the current word
  – the word's type (all-caps, is-cap, all-digits, alphanumeric, etc.)
  – the word's prefixes and suffixes
  – a context window of the 2 previous and 2 following words
  – capitalization pattern in the context window
  – the predictions of the two previous words
  – conjunction of the context window and previous prediction
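A minimal sketch of what such a baseline feature extractor might look like (the feature names and helper function are my own; the actual system's representation differs):

```python
def word_type(w):
    """Coarse word-shape type, similar in spirit to the baseline features."""
    if w.isupper():   return "all-caps"
    if w.istitle():   return "is-cap"
    if w.isdigit():   return "all-digits"
    if w.isalnum():   return "alphanumeric"
    return "other"

def baseline_features(tokens, i, prev_preds):
    """Features for token i, given predictions already made for earlier tokens."""
    feats = {
        "word": tokens[i].lower(),
        "type": word_type(tokens[i]),
        "prefix3": tokens[i][:3].lower(),
        "suffix3": tokens[i][-3:].lower(),
        "prev_pred": prev_preds[i - 1] if i >= 1 else "START",
        "prev2_pred": prev_preds[i - 2] if i >= 2 else "START",
    }
    # context window of the 2 previous and 2 following words
    for offset in (-2, -1, 1, 2):
        j = i + offset
        feats[f"w[{offset}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "PAD"
    # conjunction of context window and previous prediction
    feats["w[-1]&prev_pred"] = f"{feats['w[-1]']}|{feats['prev_pred']}"
    return feats

toks = ["Reggie", "Blinker", "had", "his", "suspension", "lifted"]
print(baseline_features(toks, 1, prev_preds=["B-PER"]))
```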
How to Represent Text Chunks?
They compared the two most popular tagging schemes:
  – BIO: Beginning, Inside, Outside
  – BILOU: Beginning, Inside, Last (for multi-word entities), Outside, and Unit-length (for single-word entities)
  Example: Salt/B-LOC Lake/I-LOC City/L-LOC is/O located/O in/O Utah/U-LOC

F-measure scores:

           CoNLL-03    MUC-7
  BIO        89.15     85.15
  BILOU      90.57     85.62

Conclusion: BILOU tagging works better than BIO tagging for NER.
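A minimal sketch of converting BIO tags into the richer BILOU encoding (my own helper, shown only to make the scheme concrete):

```python
def bio_to_bilou(tags):
    """Rewrite a BIO tag sequence as BILOU.

    B of a single-token entity becomes U (Unit-length);
    the last I of a multi-token entity becomes L (Last).
    """
    bilou = list(tags)
    for i, tag in enumerate(tags):
        if tag == "O":
            continue
        etype = tag[2:]
        nxt = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = (nxt == "I-" + etype)     # does the same entity keep going?
        if tag.startswith("B-") and not continues:
            bilou[i] = "U-" + etype
        elif tag.startswith("I-") and not continues:
            bilou[i] = "L-" + etype
    return bilou

bio = ["B-LOC", "I-LOC", "I-LOC", "O", "O", "O", "B-LOC"]   # Salt Lake City ... Utah
print(bio_to_bilou(bio))  # ['B-LOC', 'I-LOC', 'L-LOC', 'O', 'O', 'O', 'U-LOC']
```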
What Inference Algorithm to Use?
They evaluated 3 types of decoding (inference) algorithms:
  1. Greedy left-to-right
  2. Beam search
  3. Viterbi

Results on CoNLL test data:

                    Baseline System    Final System
  Greedy                 83.29            90.57
  Beam (size=10)         83.38            90.67
  Beam (size=100)        83.38            90.67
  Viterbi                83.71            n/a

Conclusion: simple greedy decoding works quite well for NER and is much faster than Viterbi.

Non-Local Dependencies
• Non-local features often try to capture information from multiple instances of the same token in a document.
  Example: "Blinker" in one context (hard to recognize as PERSON), but "Reggie Blinker" elsewhere (easier to recognize).
• Typically, an assumption is made that all instances of the same token should have the same NER type.
• This assumption often holds, but can be violated.
  Example: "Australia" and "Bank of Australia" in the same document.
Modeling Non-Local Dependencies
Several approaches were implemented:
• Context aggregation: use context features from all instances of a token.
• Two-stage prediction: apply a baseline NER system first and use its predictions as features in a second NER classifier.
• Extended prediction history: based on the observation that entities introduced at the beginning of a document are usually easier to disambiguate than those later in a document. A feature captures the distribution of labels assigned to preceding instances of a token (e.g., L-ORG = .40, U-LOC = .60); see the sketch after this list.
Conclusion: inconsistent results across datasets! But TOGETHER they consistently improved results.
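A minimal sketch of the extended-prediction-history idea (my own simplification, not the paper's code): while decoding left to right, record the labels already assigned to each token string and expose their relative frequencies as features for later occurrences.

```python
from collections import Counter, defaultdict

class PredictionHistory:
    """Tracks labels assigned to earlier occurrences of each token string."""

    def __init__(self):
        self.history = defaultdict(Counter)

    def features(self, token):
        """Distribution of labels given to previous instances, e.g. {'hist=U-LOC': 0.6}."""
        counts = self.history[token.lower()]
        total = sum(counts.values())
        return {f"hist={label}": n / total for label, n in counts.items()} if total else {}

    def update(self, token, label):
        self.history[token.lower()][label] += 1

hist = PredictionHistory()
hist.update("Australia", "U-LOC")
hist.update("Australia", "U-LOC")
hist.update("Australia", "L-ORG")   # e.g. from "Bank of Australia"
print(hist.features("Australia"))   # {'hist=U-LOC': 0.666..., 'hist=L-ORG': 0.333...}
```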
External Knowledge
• External dictionaries and knowledge bases of names, locations, temporal expressions, etc. can be enormously helpful. They compiled 30 such resources and defined look-up features.
• They used "Brown clustering" (hierarchical word clusters) generated from unlabeled Reuters texts. Path prefixes in the cluster hierarchy were used as features (see the sketch after the results table below).
                       CoNLL-03    MUC-7     Web
  Baseline (BL)          83.65     71.28    71.41
  BL + Dictionaries      87.22     80.43    74.46
  BL + Word Clusters     86.82     79.88    72.26
  ALL                    88.55     83.23    74.44

• Conclusion: both dictionaries and word clusters are beneficial.
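A minimal sketch of how Brown-cluster path prefixes are typically turned into features (the bit-string cluster IDs below are invented for illustration; real clusters come from running Brown clustering on unlabeled text):

```python
# Hypothetical Brown cluster bit-strings: words in the same subtree share prefixes.
BROWN_CLUSTERS = {
    "liverpool": "110100111",
    "udinese":   "110100110",
    "friday":    "01101110",
    "saturday":  "01101111",
}

def cluster_prefix_features(word, prefix_lengths=(4, 6, 10)):
    """Use prefixes of the word's Brown-cluster path as features, so rare words
    can share evidence with frequent words from the same cluster subtree."""
    path = BROWN_CLUSTERS.get(word.lower())
    if path is None:
        return {}
    return {f"brown[:{k}]={path[:k]}": 1 for k in prefix_lengths}

print(cluster_prefix_features("Udinese"))
# {'brown[:4]=1101': 1, 'brown[:6]=110100': 1, 'brown[:10]=110100110': 1}
```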
Conclusions
• NER can benefit from using a larger tag set (BILOU), external resources, and features to capture non-local dependencies across a document.
• Although individual gains may be small, they can add up to produce substantial gains.
F-measure scores:

                 Stanford NER    LBJ-NER
  MUC-7              80.62         85.71
  Web                72.50         74.89
  Reuters2003        87.04         90.74