aspect extraction

Entity and Aspect Extraction
for Organizing News Comments
Radityo Eko Prasojo, Mouna Kacimi & Werner Nutt
EMCL Workshop, 11–12 February 2016

Comments in News Websites
Typically, comments are listed by date-time and reply relation.
Problem: it is difficult to follow the flow of the discussions and to understand their main points of agreement and disagreement.
Example: why is independence good/bad for the Scots? Will their economy be affected?

EMCL Workshop 2016
Prasojo, Kacimi, & Nutt - Entity and Aspect Extraction for Organizing News Comments

There is a need for organizing comments
to help users to:
(1) better understand the viewpoints related to each topic
(2) participate in discussions more easily and thus increase the chance of acquiring new viewpoints
by clustering comments containing similar discussions:
• they talk about the same entities
• they argue about the same aspects of those entities.

Contributions
• Improvements on state-of-the-art unsupervised entity extraction tools (Zemanta, NERD, AIDA YAGO)
  • Addressed issues: noise and low coverage (due to coreferences)
• Introduced aspect extraction in the news domain
  • Previously: aspects only in the product review domain (Zhang & Liu, 2014)

Entity Extraction Baseline: Unsupervised Tools
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/NATO>
<http://dbpedia.org/resource/Crimea>
<http://dbpedia.org/resource/Aircraft_Carrier>
<http://dbpedia.org/resource/Scotland>
<http://dbpedia.org/resource/USS_Ronald_Reagan>
• Improved by applying:
  • Entity filtering
  • Name normalization
  • Entity search on KB
  • Coreference resolution
Tools: DBpedia, Stanford CoreNLP

Entity Filtering
An entity is an instance of some well-defined class.
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
For each extracted entity, we query the KB for its type, e.g. rdf:type of NATO? →
Types returned for the entities in the example: ← dbo:PopulatedPlace, dbo:Ship, dbo:Location, owl:Thing
We remove entities that don't have an rdf:type other than owl:Thing and owl:Class.
Risk: lower recall

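The filtering rule above can be sketched in a few lines. This is only an illustration: the entity identifiers and the type table are hypothetical stand-ins for rdf:type lookups against the KB, which the slides do not show in detail.

```python
# Types that carry no class information on their own (as named on the slide).
GENERIC_TYPES = {"owl:Thing", "owl:Class"}


def keep_entity(types):
    """Keep an entity only if it has at least one type besides the generic ones."""
    return any(t not in GENERIC_TYPES for t in types)


def filter_entities(entity_types):
    """entity_types: dict mapping an entity id to the set of its rdf:type values."""
    return [entity for entity, types in entity_types.items() if keep_entity(types)]


# Hypothetical lookup results, not taken from a real KB.
example = {
    "ex:EntityA": {"owl:Thing", "dbo:Ship"},
    "ex:EntityB": {"owl:Thing"},
    "ex:EntityC": set(),
}
print(filter_entities(example))
```

Entities with only generic types, or with no type at all, are dropped, which is where the recall risk comes from.
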
Name Normalization
An entity may appear under non-proper names (aliases).
Example article: “Scotland Rejects Independence in Referendum”
Entities mentioned in the article body: Anders Fogh Rasmussen, United States
foaf:givenName, foaf:surname? → ← {‘Anders’, ‘Fogh’, ‘Rasmussen’}
dbpedia-owl:wikiPageRedirects of? → ← {‘US’, ‘USA’, ‘U.S.’, ‘U.S.A.’, …}
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
Risk: lower precision

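A minimal sketch of alias matching, assuming the alias sets have already been collected from the name and redirect properties. The `ALIASES` table and the function name are illustrative, and the case-sensitive whole-token match is an assumption rather than the authors' stated method.

```python
import re

# Alias sets as shown on the slide, keyed by the entity's full name.
ALIASES = {
    "Anders Fogh Rasmussen": {"Anders", "Fogh", "Rasmussen"},
    "United States": {"US", "USA", "U.S.", "U.S.A."},
}


def find_alias_mentions(text, aliases):
    """Return (alias, full name) pairs for aliases that occur as whole tokens."""
    found = []
    for full_name, names in aliases.items():
        for alias in sorted(names):
            # Lookarounds act as word boundaries that also work for "U.S."
            pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
            if re.search(pattern, text):
                found.append((alias, full_name))
    return found


comment = "Don't be afraid of Rasmussen or NATO, this is none of their business."
print(find_alias_mentions(comment, ALIASES))
```

Short aliases match many unrelated strings, so looser matching would raise coverage at the cost of precision, which is the risk named on the slide.
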
Context-Related Entity Search
Sometimes, a mapping for an alias cannot be found.
All related entities of NATO and their aliases → ← dbprop:leaderName dbpedia:Anders_Fogh_Rasmussen, {‘Anders Fogh’, ‘Rasmussen’}
“Don't be afraid of Rasmussen or NATO, …”
The unmapped mention is matched against these aliases by string comparison.
Risk: lower precision

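The string-comparison step could look roughly like the sketch below. The slides do not say which comparison is used; `difflib.SequenceMatcher` and the 0.9 threshold are assumptions chosen for illustration, and `RELATED` is a hand-written stand-in for the entities retrieved from the KB.

```python
from difflib import SequenceMatcher

# Entities related to an already-linked entity (here NATO), with their aliases.
RELATED = {
    "dbpedia:Anders_Fogh_Rasmussen": {"Anders Fogh", "Rasmussen"},
}


def search_related(mention, related, threshold=0.9):
    """Return the related entity whose alias best matches the mention, or None."""
    best_entity, best_score = None, 0.0
    for entity, aliases in related.items():
        for alias in aliases:
            score = SequenceMatcher(None, mention.lower(), alias.lower()).ratio()
            if score > best_score:
                best_entity, best_score = entity, score
    return best_entity if best_score >= threshold else None


print(search_related("Rasmussen", RELATED))
```

Lowering the threshold would link more mentions but also more wrong ones.
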
Coreference Resolution
“Don't be afraid of Rasmussen or NATO, this is none of their business. If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
Tool: Stanford CoreNLP
Risk: lower precision

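To illustrate how resolved coreferences raise entity coverage, the sketch below substitutes pronouns with the representative mention of their chain. The chain format is a simplified, hypothetical stand-in for a resolver's output, and the chain linking "he" to "Rasmussen" is assumed for the example; the slides do not show the actual output of Stanford CoreNLP.

```python
PRONOUNS = {"he", "she", "it", "they", "him", "her", "them"}


def resolve(tokens, chains):
    """Replace pronoun mentions with the representative mention of their chain.

    chains: list of (representative, [token indices of mentions]) pairs.
    """
    resolved = list(tokens)
    for representative, indices in chains:
        for i in indices:
            if tokens[i].lower() in PRONOUNS:
                resolved[i] = representative
    return resolved


tokens = "If he ends up sending the carrier".split()
chains = [("Rasmussen", [1])]  # assumed chain: "he" refers to Rasmussen
print(" ".join(resolve(tokens, chains)))
```

A wrong chain propagates a wrong entity into the comment, which is the precision risk noted above.
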
Entity Extraction – Experiment Setup
• 10 news articles that use DISQUS
• 100 comments having the highest word counts
• 5 students as entity and aspect annotators
• Annotated data as ground truth

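Given annotated ground truth, extraction quality is typically scored by precision and recall. The sketch below assumes exact-match comparison between extracted and annotated items; the matching criterion actually used in the experiments is not stated on the slides.

```python
def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a ground-truth set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical labels for one comment.
extracted = {"a", "b", "c"}
annotated = {"a", "b", "d", "e"}
print(precision_recall(extracted, annotated))
```
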
Entity Extraction – Experiment Results
[Results chart not reproduced here]

Aspects
• of product entities
  • The aspects of an entity e are the components and attributes of e. (Zhang & Liu, 2014)
• of entities in news
  • “More Scots would definitely have voted no than yes” (voting - action)
  • “Orlando Bloom is a good actor” (acting - skill)
  • “…it’s the right of Scottish People” (right - possession)
  • Other: components, attributes, and moods
An aspect is everything that is arguable about an entity.

Types of Aspects
• aspect::right: “…it’s the right of Scottish People.” (explicit)
• aspect::employer: “Tesco is large.” (implicit)
• aspect::beauty: “Scotland is a very beautiful country.” (semi-implicit)
• aspect::voting: “The Scots can vote however they want.” (semi-implicit)

Extraction of Explicit Aspects: Exploiting Dependency
Prepositional dependency
“…it’s the right of Scottish People.”
Stanford CoreNLP annotation
Extract all noun phrases that have an nmod:of, nmod:in, nmod:on, or nmod:at relation towards the entity in the sentence.

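The extraction rule can be sketched over dependency triples. The triples below are written by hand for the example sentence and only approximate what a parser would return; the function name and the (governor, relation, dependent) layout are illustrative assumptions, and the sketch returns head words rather than full noun phrases.

```python
PREP_RELATIONS = {"nmod:of", "nmod:in", "nmod:on", "nmod:at"}


def explicit_aspects(dependencies, entity_head):
    """Return governors linked to the entity's head word by a prepositional nmod.

    dependencies: iterable of (governor, relation, dependent) triples.
    """
    return [
        governor
        for governor, relation, dependent in dependencies
        if relation in PREP_RELATIONS and dependent == entity_head
    ]


# Hand-written triples for "the right of Scottish People".
deps = [
    ("right", "det", "the"),
    ("right", "nmod:of", "People"),
    ("People", "compound", "Scottish"),
]
print(explicit_aspects(deps, "People"))
```
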
Extraction of Explicit Aspects, Combined with Entity Extraction
• Specifically, the coreference resolution part
“Don't be afraid of Rasmussen or NATO, this is none of their business.”
Possessive dependency

Extraction of Implicit Aspects: Adjective-to-Aspect Mapping
Idea from (Zhang & Liu, 2014)
“Tesco is large”
In other comments, we found the following aspects of Tesco qualified as “large”:
• employer (2x)
• back office (1x)
• call center operation (1x)
We conclude: most probably, the employer aspect of Tesco was meant. The result is further improved by (1) taking into account frequent context words and (2) lexical relations of the adjective.

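The frequency-based mapping can be sketched with a counter per (entity, adjective) pair. The observation triples mirror the counts on the slide; the function names and the data layout are assumptions made for illustration.

```python
from collections import Counter


def build_mapping(observations):
    """observations: (entity, adjective, aspect) triples from explicit mentions."""
    mapping = {}
    for entity, adjective, aspect in observations:
        mapping.setdefault((entity, adjective), Counter())[aspect] += 1
    return mapping


def infer_aspect(mapping, entity, adjective):
    """Return the aspect most often qualified by the adjective, or None."""
    counts = mapping.get((entity, adjective))
    return counts.most_common(1)[0][0] if counts else None


observations = [
    ("Tesco", "large", "employer"),
    ("Tesco", "large", "back office"),
    ("Tesco", "large", "employer"),
    ("Tesco", "large", "call center operation"),
]
mapping = build_mapping(observations)
print(infer_aspect(mapping, "Tesco", "large"))
```

When two aspects are equally frequent, the count alone cannot decide, which motivates the context-word and lexical refinements.
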
Extraction of Semi-Implicit Aspects (1)
• Semi-implicit aspects: implicit aspects that don’t have a mapping.
• “Scotland is a very beautiful country.”
Stanford CoreNLP annotation
Lexical database search
We consider the following lexical relations:
attribute > pertainym > participle of verb > derivationally related form (drf) > see also
beautiful → (WordNet: attribute) → beauty
We search for a noun phrase that is connected to the entity and the word indicating the aspect using a lexical relation.

Extraction of Semi-Implicit Aspects (2)
• Generally can be used to identify aspects from a verb, adjective, or noun.
• Other examples:
  • beautiful → (WordNet: attribute) → beauty
  • instrumental → (WordNet: pertainym) → instrument
  • vote → (WordNet: derivationally related form) → voting
  • inexcusable → (WordNet: antonym) → excusable → (WordNet: similar to) → justifiable → (WordNet: drf) → justify → (WordNet: drf) → justification
• If there are multiple possible aspects for a single word, we use WordNet::Similarity to decide on the best one.

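The one-step lexical search with prioritized relations might be sketched as follows. The `LEXICON` table is a hand-written toy stand-in for WordNet that contains only the word pairs from the slides; it is not a real WordNet API, and taking the first candidate replaces the WordNet::Similarity tie-break.

```python
# Relation priority as listed on the slide, highest first.
RELATION_PRIORITY = [
    "attribute",
    "pertainym",
    "participle of verb",
    "derivationally related form",
    "see also",
]

# Toy lexicon: word -> {relation: [related nouns]}.
LEXICON = {
    "beautiful": {"attribute": ["beauty"]},
    "instrumental": {"pertainym": ["instrument"]},
    "vote": {"derivationally related form": ["voting"]},
}


def lookup_aspect(word, lexicon=LEXICON):
    """Return (aspect, relation) via the highest-priority relation, or None."""
    relations = lexicon.get(word, {})
    for relation in RELATION_PRIORITY:
        candidates = relations.get(relation)
        if candidates:
            return candidates[0], relation
    return None


print(lookup_aspect("beautiful"))
```
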
Aspect Extraction – Experiment and Results
[Results chart not reproduced here]

Visualization
RE Prasojo, M Kacimi, F Darari. IEEE InfoVis. Chicago, 25-30 October 2015.
A demo is now available at orcaestra.inf.unibz.it

Conclusion
• General contribution: a framework for organizing news comments using entity extraction and aspect extraction.
• Our entity extraction: unsupervised tools + entity filtering, name normalization, entity search, and coreference resolution.
• We extract explicit, implicit, and semi-implicit aspects using grammar analysis and lexical database search.
• Experiments show improvement on both entity and aspect extraction compared to the baseline technique.

Future Work
• Addressing limitations:
  • Concept extraction
  • Idioms and other metaphorical expressions
  • Difficult coreference (e.g. demonstrative pronouns)
  • Experiments on more, and more varied, data
• Complete the missing pieces:
  • Sentiment analysis for news comments

Acknowledgements
• European Master’s Programme in Computational Logic
• To-Know Project
• SIGIR Student Travel Grant

Thank you!

Extra slides

An entity can be…
a person, a location, an organization, or any well-defined concept such as nationalities, languages, or wars.

Entity Extraction Tasks
1. Recognition – through proper names / rigid designators (Coates-Stephens, 1992) (Thielen, 1995) (Nadeau & Sekine, 2006)
“Scotland can vote however it wants, it's the Scottish peoples right.”
2. Disambiguation – by mapping to a representation in a KB
“If he ends up sending USS Ronald Reagan aircraft carrier to the coast of Scotland, then he should have done the same to Crimea.”
<http://dbpedia.org/resource/USS_Ronald_Reagan_(CVN-76)>
<http://dbpedia.org/resource/Scotland>
<http://dbpedia.org/resource/Crimea>

Supervised vs Unsupervised Approaches to Entity Extraction
Aspect of EE | Supervised | Unsupervised
Prominent tools | Stanford NLP | Zemanta, AlchemyAPI, NERD, AIDA YAGO
Recognition ability | depends on the training set | domain-independent
Disambiguation ability | limited | provided by KBs
Running time | fast | slow

There is a need for organizing comments
to help users to:
(1) better understand the viewpoints related to each topic
(2) participate in discussions more easily and thus increase the chance of acquiring new viewpoints
Our contribution is a comment organization framework:
• Entity extraction
• Aspect extraction
• Sentiment analysis
• Visualization and organization

Extraction of Implicit Aspects: Context Words
Suppose now we need a tie breaker!
“Tesco is large and the pay is good”, context words: {large, pay, company} (company via rdf:type)
In other comments, we found the following aspects of Tesco qualified as “large”:
• employer (2x), context words: {employer, large, salary, job, people}
• back office (2x), context words: {back, office, large, headquarter, building}; {back, office, large, area, building}
• call center operation (1x)
We use the context-words difference, computed using WordNet::Similarity.
We conclude: most probably, the employer aspect of Tesco was meant.

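A sketch of the tie breaker, using the context-word sets from the slide. The word similarity here is a hand-written toy table (`RELATED_PAIRS`) standing in for WordNet::Similarity, so the scores are illustrative only and say nothing about the real measure.

```python
# Toy stand-in for a semantic similarity measure: pairs treated as related.
RELATED_PAIRS = {frozenset({"pay", "salary"}), frozenset({"company", "employer"})}


def word_similarity(w1, w2):
    """1.0 for identical words, 0.5 for pairs in the toy table, else 0.0."""
    if w1 == w2:
        return 1.0
    return 0.5 if frozenset({w1, w2}) in RELATED_PAIRS else 0.0


def context_score(comment_words, aspect_words):
    """Average, over the comment's words, of the best match among the aspect's words."""
    best = [max(word_similarity(c, a) for a in aspect_words) for c in comment_words]
    return sum(best) / len(best)


def break_tie(comment_words, candidates):
    """candidates: dict mapping an aspect to the context words seen with it."""
    return max(candidates, key=lambda aspect: context_score(comment_words, candidates[aspect]))


comment_words = {"large", "pay", "company"}
candidates = {
    "employer": {"employer", "large", "salary", "job", "people"},
    "back office": {"back", "office", "large", "headquarter", "building"},
}
print(break_tie(comment_words, candidates))
```

Exact word overlap alone would not separate the two candidates here, since both share only "large" with the comment; some notion of relatedness between "pay" and "salary" is what breaks the tie.
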
Extraction of Implicit Aspects: Experiment and Result (1)
Data: All News
Method | Precision | Recall
Adjective-to-aspect mapping | 76.87 | 26.12
With context words | 88.65 | 30.03

Extraction of Implicit Aspects: Lexical Mapping
“Tesco is large”
In addition to “large”, also count aspects qualified with similar adjectives (“huge”, “big”, …)
Similarity is measured using
1. the lexical relationships <synonym> and <similar to> in WordNet
2. WordNet::Similarity

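Counting aspects across similar adjectives can be sketched as below. The `SIMILAR` table is a hand-written stand-in for the WordNet lookups, and the observation pairs are hypothetical data invented for the example.

```python
from collections import Counter

# Toy stand-in for <synonym> / <similar to> lookups.
SIMILAR = {"large": {"huge", "big"}}


def count_aspects(observations, adjective, similar=SIMILAR):
    """Count aspects qualified by the adjective or by one of its similar adjectives.

    observations: (adjective, aspect) pairs collected for a single entity.
    """
    accepted = {adjective} | similar.get(adjective, set())
    return Counter(aspect for adj, aspect in observations if adj in accepted)


# Hypothetical observations for one entity.
observations = [
    ("large", "employer"),
    ("huge", "employer"),
    ("big", "back office"),
    ("cheap", "price"),
]
print(count_aspects(observations, "large").most_common())
```
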
Extraction of Implicit Aspects: Experiment and Result (2)
Data: All News
Method | Precision | Recall
Lexical mapping (1-step) | 87.07 | 32.72
Lexical mapping (2-steps) | 83.33 | 32.76

Extraction of Semi-Implicit Aspects (2)
• If we don’t find the aspect within 1 step of lexical search:
• “Orlando Bloom is a good actor”
Stanford CoreNLP annotation
Lexical database search
actor → (WordNet: derivationally related form) → act → (WordNet: derivationally related form) → acting
To find intermediate synsets, we consider the following lexical relations:
synonym > antonym > similar to > derivationally related form (drf) > see also
We search for the closest noun to the pseudo-aspect.

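The multi-step search can be sketched as a bounded breadth-first search over lexical relations. `GRAPH` is a hand-written toy graph containing only the actor example, with an assumed part-of-speech tag per neighbour; the relation priority from the slide is not modelled, so this shows only the "closest noun within k steps" idea.

```python
from collections import deque

# Toy graph: word -> [(relation, neighbour, part of speech of neighbour)].
GRAPH = {
    "actor": [("derivationally related form", "act", "verb")],
    "act": [("derivationally related form", "acting", "noun")],
}


def closest_noun(start, graph=GRAPH, max_steps=2):
    """Return (noun, path) for the nearest noun within max_steps, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        word, path = queue.popleft()
        if len(path) == max_steps:
            continue  # do not expand beyond the step limit
        for relation, neighbour, pos in graph.get(word, []):
            if neighbour in seen:
                continue
            new_path = path + [(relation, neighbour)]
            if pos == "noun":
                return neighbour, new_path
            seen.add(neighbour)
            queue.append((neighbour, new_path))
    return None


print(closest_noun("actor"))
```

With `max_steps=1` the same call finds nothing, which corresponds to the case where the one-step search fails.
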
Finding Aspects
1. Find words that represent the aspect: <noun>, <adjective>, or <verb>
   Examples: job, beautiful, large, acted
   Technique: grammar analysis
   Tool: Stanford CoreNLP
2. Identify the aspect
   Examples: job, beauty, employer, acting
   Technique: frequency-based mapping (implicit), lexical database search (semi-implicit)
   Tools: WordNet, DBpedia