
The JDPA Sentiment Corpus for the Automotive Domain
Jason S. Kessler (Indiana University)
Miriam Eckert, Lyndsie Clark, Nicolas Nicolov (J.D. Power and Associates)
Overview
• 335 blog posts containing opinions about cars
– 223K tokens of blog data
• Goal of the annotation project:
– Capture examples of how words interact to evaluate entities
– Encode these interactions as annotations
• Entities are physical objects evoked in the text, plus their properties
– Not just cars and car parts
– Also people, locations, organizations, times
Excerpt from the corpus
“last night was nice. sean bought me caribou
and we went to my house to watch the baseball
game …
“… yesturday i helped me mom with brians
house and then we went and looked at a kia
spectra. it looked nice, but when we got up to it,
i wasn't impressed ...”
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Motivating example (a sequence of annotated slides, building up the layers):

“John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while priced highly, had a better stereo.”

• Mention types: John → PERSON; Honda Civic, It, BMW → CAR; engine, stereo → CAR-PART; priced → CAR-FEATURE
• REFERS-TO links resolve coreference (e.g., “It” → the Honda Civic)
• TARGET links attach each sentiment expression (great, mildly disappointing, very grippy, priced highly, better) to the mention it evaluates
• PART-OF and FEATURE-OF relations connect engine, stereo, and priced to their cars
• Comparison annotations mark “had a better stereo” with MORE and LESS arguments and the DIMENSION being compared
• Composing all layers yields entity-level sentiment for each car (the slides label the cars positive and mixed)
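Read programmatically, these layers suggest a small data model. A minimal Python sketch of the annotation types above (class and field names are illustrative assumptions, not the corpus's actual file format):

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Mention:
    span: Tuple[int, int]   # character offsets into the blog post
    text: str
    semantic_type: str      # "PERSON", "CAR", "CAR-PART", "CAR-FEATURE", ...

@dataclass
class SentimentExpression:
    text: str               # e.g. "great", "mildly disappointing"
    prior_polarity: str     # "positive" | "negative" | "neutral" | "mixed"
    targets: List[Mention]  # TARGET links: the mentions being evaluated

@dataclass
class Relation:
    kind: str               # "PART-OF" | "FEATURE-OF"
    part: Mention           # e.g. the engine, or the price
    whole: Mention          # e.g. the Honda Civic

# REFERS-TO coreference links can be kept as pairs of mentions:
RefersTo = Tuple[Mention, Mention]
```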
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Entity annotations

[Figure: “John recently purchased a Civic. It had a great engine and was priced well.” Mentions: John → PERSON; Civic, It → CAR; engine → CAR-PART; priced → CAR-FEATURE. REFERS-TO links tie “It” back to the Civic.]

• >20 semantic types:
– From the ACE Entity Mention Detection Task
– Plus generic automotive types
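REFERS-TO links relate pairs of mentions; an entity is then the set of mentions connected by such links. A hypothetical union-find sketch of that grouping:

```python
def entities_from_refers_to(mentions, refers_to_links):
    """Group mentions into entities: each REFERS-TO link merges two
    mentions' clusters; the connected components are the entities."""
    parent = {m: m for m in mentions}

    def find(m):
        while parent[m] != m:
            parent[m] = parent[parent[m]]  # path halving
            m = parent[m]
        return m

    for a, b in refers_to_links:
        parent[find(a)] = find(b)          # union the two clusters

    clusters = {}
    for m in mentions:
        clusters.setdefault(find(m), []).append(m)
    return list(clusters.values())

print(entities_from_refers_to(["John", "Civic", "It"], [("Civic", "It")]))
# -> [['John'], ['Civic', 'It']]
```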
Entity-relation annotations

• Relations between entities
• Entity-level sentiment annotations
• Sentiment flows between entities through relations:
– My car has a great engine.
– Honda, known for its high standards, made my car.

[Figure: engine (CAR-PART) is PART-OF the Civic (CAR); priced (CAR-FEATURE) is FEATURE-OF the Civic. Entity-level sentiment: positive.]
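One way to read “sentiment flows through relations” is to pool the sentiment aimed at an entity's parts and features into the entity itself. A simplified one-hop sketch (the numeric scores and the combination rule are assumptions, not the scheme's formal semantics):

```python
def entity_level_sentiment(entity_mentions, expressions, relations):
    """Aggregate sentiment for one entity: collect sentiment aimed at the
    entity's own mentions and at mentions related to it by PART-OF or
    FEATURE-OF, then combine the polarities."""
    score = {"positive": 1, "negative": -1}
    reachable = set(entity_mentions)
    for kind, part, whole in relations:      # one hop; no transitive chains
        if kind in ("PART-OF", "FEATURE-OF") and whole in reachable:
            reachable.add(part)
    polarities = [score.get(prior, 0)
                  for prior, targets in expressions
                  if any(t in reachable for t in targets)]
    if not polarities:
        return "none"
    if any(p > 0 for p in polarities) and any(p < 0 for p in polarities):
        return "mixed"
    total = sum(polarities)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

# The Civic example: great engine (+), disappointing stereo (-), grippy (+)
print(entity_level_sentiment(
    {"Civic", "It"},
    [("positive", ["engine"]), ("negative", ["stereo"]), ("positive", ["It"])],
    [("PART-OF", "engine", "It"), ("PART-OF", "stereo", "It")]))  # -> mixed
```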
Entity annotations: statistics

• Inter-annotator agreement:
– Mentions: 83%
– REFERS-TO: 68%
• 61K mentions and 43K entities in the corpus
• 103 documents annotated by around 3 annotators each

[Figure: two annotators' mentions over “…Kia Rio…” count as a MATCH only when the marked spans are identical; differing spans are NOT A MATCH.]
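The match examples above suggest exact-span scoring: two annotators agree on a mention only when their spans are identical. A sketch of F1-style span agreement (the offsets are made up, and the paper's exact metric may differ):

```python
def span_agreement(spans_a, spans_b):
    """Exact-span agreement between two annotators: a mention matches
    only if both marked the identical (start, end) offsets."""
    a, b = set(spans_a), set(spans_b)
    if not a and not b:
        return 1.0
    matched = len(a & b)
    # F1 over the two annotators' mention sets.
    return 2 * matched / (len(a) + len(b))

# Both annotators mark "Kia Rio" at (10, 17) -> match;
# one marks "Kia Rio" (10, 17), the other only "Rio" (14, 17) -> no match.
print(span_agreement([(10, 17)], [(10, 17)]))  # 1.0
print(span_agreement([(10, 17)], [(14, 17)]))  # 0.0
```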
Sentiment expressions

• Evaluations, each linked to its target mentions
• Prior polarity: semantic orientation given the target
– positive, negative, neutral, mixed
• Examples:
– … a great engine → prior polarity: positive
– highly priced → prior polarity: negative
– highly spec’ed → prior polarity: positive
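Because prior polarity is defined relative to the target, the same word can be positive in one context and negative in another. A toy lookup mirroring the slide's examples (the table itself is hypothetical, not from the corpus):

```python
# Hypothetical target-conditioned table: "highly" scales price downward
# in desirability but scales equipment level upward.
PRIOR_POLARITY = {
    ("great", "engine"):    "positive",
    ("highly", "priced"):   "negative",   # a high price is bad
    ("highly", "spec'ed"):  "positive",   # a high spec level is good
}

def prior_polarity(expression, target):
    """Prior polarity is read off relative to the target being evaluated."""
    return PRIOR_POLARITY.get((expression, target), "neutral")

print(prior_polarity("highly", "priced"))   # negative
print(prior_polarity("highly", "spec'ed"))  # positive
```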
Sentiment expressions

• Occurrences in corpus: 10K
• 13% are multi-word
– like no other, get up and go
• 49% are headed by adjectives
• 22% by nouns (damage, good amount)
• 20% by verbs (likes, upset)
• 5% by adverbs (highly)
Sentiment expressions

• 75% of sentiment expression occurrences consist of words that also have non-evaluative uses in the corpus, e.g. “light”:
– …the car seemed too light to be safe…
– …vehicles in the light truck category…
• 77% of sentiment expression occurrences are positive
• Inter-annotator agreement:
– 75% spans, 66% targets, 95% prior polarity
Modifiers -> contextual polarity

• NEGATORS:
– not a good car
– not a very good car
• NEUTRALIZERS:
– if the car is good
– I hope the car is good
• INTENSIFIERS:
– UPWARD: a very good car
– DOWNWARD: a kind of good car
• COMMITTERS:
– UPWARD: I am sure the car is good
– DOWNWARD: I suspect the car is good
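These modifier classes compose with a sentiment expression's prior polarity to produce its contextual polarity. A rough numeric sketch (the magnitudes and the composition order are assumptions for illustration):

```python
def contextual_polarity(prior, modifiers):
    """Compose a prior polarity with a chain of modifiers.

    prior:     "positive" | "negative" | "neutral"
    modifiers: applied in order, e.g. ["intensifier-up", "negator"]
    """
    score = {"positive": 1.0, "negative": -1.0, "neutral": 0.0}[prior]
    for m in modifiers:
        if m == "negator":
            score = -score       # not a good car
        elif m == "neutralizer":
            score = 0.0          # if / I hope the car is good
        elif m == "intensifier-up":
            score *= 1.5         # a very good car
        elif m == "intensifier-down":
            score *= 0.5         # a kind of good car
        elif m == "committer-up":
            score *= 1.2         # I am sure the car is good
        elif m == "committer-down":
            score *= 0.8         # I suspect the car is good
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

# "not a very good car": intensify, then negate -> negative
print(contextual_polarity("positive", ["intensifier-up", "negator"]))
```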
Other annotations
• Speech events (sentiment not sourced from the author)
– John thinks the car is good.
• Comparisons:
– Car X has a better engine than car Y.
– The scheme handles a variety of cases
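A comparison like the one above can be encoded with the MORE/LESS/DIMENSION labels from the motivating example. A minimal hypothetical encoding:

```python
from dataclasses import dataclass

@dataclass
class Comparison:
    more: str       # entity evaluated as greater, e.g. "car X"
    less: str       # entity evaluated as lesser,  e.g. "car Y"
    dimension: str  # what is being compared,      e.g. "engine"

# "Car X has a better engine than car Y."
c = Comparison(more="car X", less="car Y", dimension="engine")
```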
Outline
• Motivating example
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Possible tasks
• Detecting mentions, sentiment expressions,
and modifiers
• Identifying the targets of sentiment expressions and modifiers
• Coreference resolution
• Finding part-of, feature-of, etc. relations
• Identifying errors/inconsistencies in data
Possible tasks
• Exploring how elements interact:
– Some idiot thinks this is a good car.
• Evaluating unsupervised sentiment systems or
those trained on other domains
• How do relations between entities transfer
sentiment?
– The car’s paint job is flawless but the safety record
is poor.
• A solution to one task may be useful in solving another
But wait, there’s more!
• 180 digital camera blog posts were annotated
• Total: 223,001 (automotive) + 108,593 (camera) = 331,594 tokens
Outline
• Motivating example
– Elements combine to render entity-level
sentiment
• Overview of annotation types
– Some statistics
• Potential uses of corpus
• Comparison to other resources
Other resources
• MPQA Version 2.0
– Wiebe, Wilson and Cardie (2005)
– Largely professionally written news articles
– Subjective expressions: “beliefs, emotions, sentiments, speculations, etc.”
– Attitude and contextual-sentiment annotations on subjective expressions
– Target, source annotations
– 226K tokens (JDPA: 332K)
Other resources
• Data sets provided by Bing Liu (2004, 2008)
– Customer-written consumer electronics product
reviews
– Contextual sentiment toward mention of product
– Comparison annotations
– 130K tokens (JDPA: 332K)
Thank you!
• Obtaining the corpus:
– Research and educational purposes
– [email protected]
– June 2010
– Annotation guidelines:
http://www.cs.indiana.edu/~jaskessl
• Thanks to: Prof. Michael Gasser, Prof. James
Martin, Prof. Martha Palmer, Prof. Michael
Mozer, William Headden
[Backup slides: tables of the top 20 annotations by type and of inter-annotator agreement]