Named Entity Recognition

Named Entity Recognition (NER) systems identify specific types of entities, primarily proper named entities and special categories:
• Proper Names: people, organizations, locations, etc.
  Elvis Presley, IBM, Department of the Interior, Centers for Disease Control
• Dates & Times: ubiquitous and surprisingly varied
  November 9, 1997, 11/9/97, 10:29pm
• Measures: measurements with specific units
  45%, 5.3 lbs, 65 mph, $1.4 billion
• Other: application-specific stylized terms
  URLs, email addresses, phone numbers, social security numbers

Challenges

• No dictionary will contain all existing proper names. New names are constantly being created.
• Just finding the proper names isn't trivial.
  – the first word of every sentence is capitalized
  – proper names can include lowercase words (e.g., U of U)
• Proper names are often abbreviated or turned into acronyms.
• But not all acronyms are proper names!
  – Ex: CS, NLP, AI, OS, etc.

Proper Name Ambiguity

• Many companies, organizations, and locations are named after people!
  Companies: Ford, John Hancock, Calvin Klein, Philip Morris
  Universities: Brigham Young, Smith, McGill, Purdue
  Sites: JFK (airport), Washington (capital, state, county, etc.)
• Acronyms can often refer to many different things
  – UT, MRI, SCI (check out the different hits on Google!)
• Many proper names can correspond to multiple types
  – April, June, Georgia, Jordan, Calvin & Hobbes

Classifier Labeling for NER

• When named entity recognition is viewed as a classification or sequential tagging problem, every word is labeled based on whether it is part of a named entity.
• The most common labeling scheme is BIO tagging, where B = Beginning, I = Inside, and O = Outside.
  Example: John/B Smith/I bought/O Mary/B Ann/I Jones/I a/O book/O
• A different classifier may be created for each entity type, or different labels are needed for each entity type (e.g., B-ORG and B-PER).
  The/O University/B-ORG of/I-ORG Utah/I-ORG hired/O Dr./B-PER Susan/I-PER Miller/I-PER on/O Tuesday/B-DATE .

Message Understanding Conferences

• A series of Message Understanding Conferences (MUCs) were held in the 1990s, which played a major role in advancing information extraction research.
• MUC-3 through MUC-7 were competitive performance evaluations of IE systems built by different research groups.
• The MUCs established large-scale, realistic performance evaluations with formal evaluation criteria for the NLP community.
• The tasks included named entity recognition, event extraction, and coreference resolution.
• Some of the MUC data sets are still used as evaluation benchmarks.

A Maximum Entropy System for NER

• The MENERGI system is a nice example of a maximum entropy approach to NER:
  – Named Entity Recognition: A Maximum Entropy Approach Using Global Information [Chieu & Ng, 2002]
• Good NER systems typically use a large set of local features based on properties of the target word and its neighboring words.
• Recent research has begun to also incorporate global features that capture information from across the document.

NER Types and Tagging Scheme

• MENERGI recognizes 7 types of named entities, based on the MUC-6/7 NER task definition:
  person, organization, location, date, time, money, percent
• MENERGI uses a BCEU tagging scheme:
  Begin/Continue/End: Salt/B Lake/C City/E
  Unique: Utah/U

Applying the Classifier

• One problem with NER classifiers is that they can produce inadmissible (illegal) tag sequences.
  For example, an End tag without a preceding Begin tag.
• To eliminate this problem, they defined transition probabilities between classes P(c_i | c_{i-1}) to be 1 if the sequence is admissible and 0 if it is illegal.
• P(c_i | s, D) is produced by the MaxEnt classifier.
• In total, the system has 29 classes:
  BCEU tags for each NE class (4 x 7)   28 tags
  a special tag for Other (not a NE)     1 tag
• The probability of a tag sequence factors as:

  P(c_1, …, c_n | s, D) = ∏_{i=1}^{n} P(c_i | s, D) · P(c_i | c_{i-1})
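The admissibility constraint can be folded directly into decoding. Below is a minimal sketch of such a constrained search over the 29-tag BCEU set, assuming a hypothetical `probs(i, sentence, doc)` function standing in for the MaxEnt classifier's P(c_i | s, D); it illustrates the factorization above and is not the actual MENERGI decoder.

```python
# Illustrative constrained decoding over the 29-tag BCEU set (a sketch,
# not the MENERGI implementation).  `probs` is any hypothetical classifier
# that maps token position i to a dict {tag: P(tag | s, D)}.

NE_CLASSES = ["PER", "ORG", "LOC", "DATE", "TIME", "MONEY", "PERCENT"]
TAGS = [f"{p}-{c}" for c in NE_CLASSES for p in "BCEU"] + ["O"]   # 28 + 1 = 29

def admissible(prev, cur):
    """The 0/1 transition term P(cur | prev): once B-X or C-X opens an
    entity, only C-X or E-X of the same class may follow; otherwise any
    O, B-Y, or U-Y may start."""
    if prev[0] in "BC":                      # an entity of class prev[2:] is open
        return cur[0] in "CE" and cur[2:] == prev[2:]
    return cur == "O" or cur[0] in "BU"

def decode(sentence, doc, probs):
    """Viterbi-style search for the sequence maximizing
    prod_i P(c_i | s, D) * P(c_i | c_{i-1})."""
    best = {"<s>": (1.0, [])}                # last tag -> (score, best path)
    for i in range(len(sentence)):
        local = probs(i, sentence, doc)      # P(tag | s, D) for every tag
        new_best = {}
        for prev, (score, path) in best.items():
            for tag in TAGS:
                if not admissible(prev, tag):
                    continue                 # transition probability is 0
                s = score * local.get(tag, 0.0)
                if s >= new_best.get(tag, (-1.0, None))[0]:
                    new_best[tag] = (s, path + [tag])
        best = new_best
    # a sequence may not end with an entity still open (B-X or C-X)
    finished = {t: v for t, v in best.items() if t[0] not in "BC"}
    return max(finished.values())[1]
```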
Feature Set

• The classifier uses one set of local features, which are based on properties of the target word w, the word on its left w-1, and the word on its right w+1.
• The classifier also uses a set of global features, which are extracted from instances of the same token that occur elsewhere in the document.
• Features that occur infrequently in the training set are discarded as a form of feature selection.

External Dictionaries

Several external dictionaries were created by compiling lists of locations, companies, and person names.
This is very common – large lists are easy to obtain and can really help an NER system.

Summary of Local Features

• strings of the target, previous, and next words
• capitalization-based features
• is it the first word of the sentence?
• is it in WordNet? (OOV = out-of-vocabulary feature)
• presence of the target, previous, and next words in dictionaries
• is it a month, day, or number?
• is it preceded/followed by an NE class prefix/suffix term?
• 10 features that look for specific characters in the current word string
• zone of the word (headline, dateline, DD, or main text)

Target Word Character Features

Feature          Description                                    Example
InitCapPeriod    starts with a capital letter, ends w/ period   Mr.
OneCap           contains only one capital letter               A
AllCaps-Period   all capital letters and period                 CORP.
Contain-Digit    contains a digit                               AB3747
TwoD             2 digits                                       99
FourD            4 digits                                       1999
Digit-slash      digits and slash                               01/01
Dollar           contains $                                     US$20
Percent          contains %                                     20%
Digit-Period     contains digit and period                      $US3.20

Ambiguous Contexts

Some named entities occur in ambiguous contexts that can be confusing even for human readers.
  McCann initiated a new global system.
  The CEO of McCann announced …
  The McCann family announced …

Global Features

• Traditionally, NER systems classified each word/phrase independently of other instances of the same word/phrase in other parts of the document.
• But other contexts may provide valuable clues about what type of entity it is. For example:
  – capitalization is indicative if not the first word of a sentence
    Liz Claiborne recently purchased Shoes R Us for $1.3 million.
  – some contexts contain strong prefixes/suffixes in a phrase
    She bought the shoe retailer to begin franchising it nationwide.
  – some contexts contain strong preceding/following neighbors
    The company bought the shoe retailer to expand its product line.
  – acronyms can often be aligned with their expanded phrase

Summary of Global Features

• ICOC: if another occurrence of the word appears in an unambiguous position (not the first word), is it capitalized?
• CSPP: do other occurrences of the word occur with a known named entity prefix/suffix?
• ACRO: if the word looks like an acronym, is there a capitalized sequence of words anywhere with these leading letters? If so, acronym features are assigned to the likely acronym word and the corresponding word sequence.
• SOIC: for capitalized word sequences, the longest substrings that appear elsewhere are assigned features.
• UNIQ: is the word capitalized and unique?
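To make these document-level cues more concrete, here is a simplified sketch of ICOC- and UNIQ-style indicators. The definitions are deliberately rough approximations of the ideas above, not the exact MENERGI feature code, and the helper names are made up.

```python
# Simplified, illustrative versions of two global features; the actual
# MENERGI features handle zones and ambiguity more carefully.

def icoc(token, doc_sentences):
    """ICOC-style cue: is some other occurrence of this token capitalized
    in an unambiguous position (i.e., not sentence-initial)?"""
    for sent in doc_sentences:
        for i, w in enumerate(sent):
            if i > 0 and w.lower() == token.lower() and w[0].isupper():
                return True
    return False

def uniq(token, doc_sentences):
    """UNIQ-style cue: is the token capitalized and does it occur
    only once in the whole document?"""
    count = sum(w.lower() == token.lower()
                for sent in doc_sentences for w in sent)
    return token[0].isupper() and count == 1

doc = [["McCann", "initiated", "a", "new", "global", "system", "."],
       ["The", "CEO", "of", "McCann", "announced", "..."]]
print(icoc("McCann", doc))   # True: "McCann" appears capitalized mid-sentence
print(uniq("McCann", doc))   # False: it occurs more than once
```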
NER Results

Ablation studies look at the contribution of features or components individually to determine how much (if any) impact each one makes to the system as a whole.

F-measure scores:
            MUC-6   MUC-7
Baseline    90.75   85.22
+ICOC       91.50   86.24
+CSPP       92.89   86.96
+ACRO       93.04   86.99
+SOIC       93.25   87.22
+UNIQ       93.27   87.24

Design Challenges & Misconceptions

Ratinov & Roth [CoNLL 2009] investigated several issues related to the design of NER systems.

Key design decisions in an NER system:
1. How to represent text chunks?
2. What inference algorithm to use?
3. How to model non-local dependencies?
4. How to use external knowledge resources?

Motivating Example

SOCCER – BLINKER BAN LIFTED
LONDON 1996-12-06 Dutch forward Reggie Blinker had his indefinite suspension lifted by FIFA on Friday and was set to make his Sheffield Wednesday comeback against Liverpool on Saturday. Blinker missed his club's last two games after FIFA slapped a worldwide ban on him for appearing to sign contracts for both Wednesday and Udinese while he was playing for Feyenoord.

Motivating Example (annotated)

SOCCER – BLINKER_PER BAN LIFTED
LONDON_LOC 1996-12-06 Dutch_MISC forward Reggie_PER Blinker_PER had his indefinite suspension lifted by FIFA_ORG on Friday and was set to make his Sheffield_ORG Wednesday_ORG comeback against Liverpool_ORG on Saturday_DATE. Blinker_PER missed his club's last two games after FIFA_ORG slapped a worldwide ban on him for appearing to sign contracts for both Wednesday_ORG and Udinese_ORG while he was playing for Feyenoord_ORG.

Baseline NER System

• They created a baseline NER classifier using general, widely used features:
  – the current word
  – the word's type (all-caps, is-cap, all-digits, alphanumeric, etc.)
  – the word's prefixes and suffixes
  – a context window of the 2 previous and 2 following words
  – capitalization pattern in the context window
  – the predictions of the two previous words
  – conjunction of the context window and previous prediction

How to Represent Text Chunks?

They compared the two most popular tagging schemes:
– BIO: Beginning, Inside, Outside
– BILOU: Beginning, Inside, Last (for multi-word entities), Outside, and Unit-Length (for single-word entities)
  Salt/B-LOC Lake/I-LOC City/L-LOC is/O located/O in/O Utah/U-LOC

Results on CoNLL test data:
         CoNLL-03   MUC-7
BIO      89.15      85.15
BILOU    90.57      85.62

Conclusion: BILOU tagging works better than BIO tagging for NER.

What Inference Algorithm to Use?

They evaluated 3 types of decoding (inference) algorithms:
1. Greedy left-to-right
2. Beam search
3. Viterbi

                   Baseline System   Final System
Greedy             83.29             90.57
Beam (size = 10)   83.38             90.67
Beam (size = 100)  83.38             90.67
Viterbi            83.71             n/a

Conclusion: simple greedy decoding works quite well for NER and is much faster than Viterbi.

Non-Local Dependencies

• Non-local features often try to capture information from multiple instances of the same token in a document.
  Example: "Blinker" in one context (hard to recognize as PERSON)
           "Reggie Blinker" elsewhere (easier to recognize)
• Typically, an assumption is made that all instances of the same token should have the same NER type.
• This assumption often holds, but can be violated.
  Example: "Australia" and "Bank of Australia" in the same document.

Modeling Non-Local Dependencies

Several approaches were implemented:
• Context aggregation: use context features from all instances.
• Two-stage prediction: apply a baseline NER system first and use its predictions as features in a second NER classifier.
• Extended prediction history: based on the observation that entities introduced at the beginning of a document are usually easier to disambiguate than those later in a document. A feature captures the distribution of labels assigned to preceding instances of a token (e.g., L-ORG = .40, U-LOC = .60).

Conclusion: inconsistent results across data sets! But TOGETHER they consistently improved results.

External Knowledge

• External dictionaries and knowledge bases of names, locations, temporal expressions, etc. can be enormously helpful. They compiled 30 such resources and defined look-up features.
• They used "Brown clustering" (hierarchical word clusters) generated from unlabeled Reuters texts. Path prefixes were used as features.

                     CoNLL-03   MUC-7   Web
Baseline (BL)        83.65      71.28   71.41
BL + Dictionaries    87.22      80.43   74.46
BL + Word Clusters   86.82      79.88   72.26
ALL                  88.55      83.23   74.44

Conclusion: both dictionaries and word clusters are beneficial.
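As a rough illustration of how such resources become features, the sketch below derives gazetteer look-up features and Brown-cluster path-prefix features for a token. The gazetteer entries and bit-string cluster paths are invented toy data, and this is not the feature code used by Ratinov & Roth.

```python
# Illustrative feature extraction from external resources (toy data).
# Brown clustering assigns each word a bit-string path in a binary tree;
# prefixes of that path act as coarse-to-fine word classes.

GAZETTEERS = {
    "location": {"london", "utah", "salt lake city"},
    "person":   {"reggie blinker", "susan miller"},
}

BROWN_PATHS = {              # word -> hypothetical bit-string cluster path
    "london":    "0110",
    "liverpool": "0111",
    "friday":    "1010",
}

def lookup_features(word):
    """One binary feature per gazetteer that contains the word."""
    w = word.lower()
    return [f"in-{name}-list"
            for name, entries in GAZETTEERS.items() if w in entries]

def cluster_features(word, prefix_lengths=(2, 4)):
    """Path-prefix features: words in nearby clusters share short prefixes,
    e.g. 'london' and 'liverpool' agree on the prefix '01'."""
    path = BROWN_PATHS.get(word.lower())
    if path is None:
        return []
    return [f"brown-prefix{k}={path[:k]}"
            for k in prefix_lengths if len(path) >= k]

print(lookup_features("London"))      # ['in-location-list']
print(cluster_features("Liverpool"))  # ['brown-prefix2=01', 'brown-prefix4=0111']
```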
Conclusions

• NER can benefit from using a larger tag set (BILOU), external resources, and features to capture non-local dependencies across a document.
• Although individual gains may be small, they can add up to produce substantial gains.

               Stanford NER   LBJ-NER
MUC-7          80.62          85.71
Web            72.50          74.89
Reuters 2003   87.04          90.74