The JDPA Sentiment Corpus for the Automotive Domain
Jason S. Kessler (Indiana University)
Miriam Eckert, Lyndsie Clark, Nicolas Nicolov (J.D. Power and Associates)

Overview
• 335 blog posts containing opinions about cars
  – 223K tokens of blog data
• Goal of annotation project:
  – Examples of how words interact to evaluate entities
  – Annotations encode these interactions
• Entities are invoked physical objects and their properties
  – Not just cars and car parts
  – People, locations, organizations, times

Excerpt from the corpus
"last night was nice. sean bought me caribou and we went to my house to watch the baseball game …"
"… yesturday i helped me mom with brians house and then we went and looked at a kia spectra. it looked nice, but when we got up to it, i wasn't impressed ..."

Outline
• Motivating example
• Overview of annotation types
  – Some statistics
• Potential uses of corpus
• Comparison to other resources

Motivating example
Running text: "John recently purchased a Honda Civic. It had a great engine, a mildly disappointing stereo, and was very grippy. He also considered a BMW which, while priced highly, had a better stereo."
The slides layer the annotation types onto this passage, one at a time:
• Mentions with semantic types: John (PERSON); Honda Civic, It, BMW (CAR); engine, stereo (CAR-PART); priced (CAR-FEATURE)
• REFERS-TO links coreferent mentions, e.g. It refers to Honda Civic
• TARGET links each sentiment expression to the mention it evaluates: great → engine; mildly disappointing → stereo; very grippy → It; highly → priced; better → stereo
• PART-OF and FEATURE-OF relate mentions to their wholes: engine and stereo are PART-OF the Civic; priced is FEATURE-OF the BMW; the second stereo is PART-OF the BMW
• Comparisons: better marks the BMW's stereo as MORE and the Civic's stereo as LESS along a comparison DIMENSION
• Entity-level sentiment: the layers combine into per-entity labels such as positive (the Civic) and mixed

Outline
• Motivating example
• Overview of annotation types
  – Some statistics
• Potential uses of corpus
• Comparison to other resources

Entity annotations
"John recently purchased a Civic. It had a great engine and was priced well."
• Mentions: John (PERSON); Civic, It (CAR); engine (CAR-PART); priced (CAR-FEATURE)
• REFERS-TO groups coreferent mentions into entities: It refers to Civic
• More than 20 semantic types, drawn from:
  – the ACE Entity Mention Detection Task
  – generic automotive types

Entity-relation annotations
• Relations between entities: engine (CAR-PART) is PART-OF the Civic (CAR); priced (CAR-FEATURE) is FEATURE-OF the Civic
• Entity-level sentiment annotations, e.g. entity-level sentiment toward the Civic: positive
• Sentiment flows between entities through relations:
  – My car has a great engine.
  – Honda, known for its high standards, made my car.
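To make the scheme above concrete, here is a minimal in-memory sketch of mentions, REFERS-TO entities, and PART-OF/FEATURE-OF relations. All class and field names are hypothetical illustrations; the corpus's actual distribution format may differ.

```python
from dataclasses import dataclass, field

# Hypothetical types illustrating the annotation scheme above; the
# corpus's actual distribution format may differ.

@dataclass(frozen=True)
class Mention:
    start: int          # token offset where the span begins
    end: int            # token offset just past the span's end
    semantic_type: str  # e.g. "CAR", "CAR-PART", "PERSON", "CAR-FEATURE"

@dataclass
class Entity:
    # REFERS-TO groups coreferent mentions into a single entity.
    mentions: list[Mention] = field(default_factory=list)

@dataclass(frozen=True)
class Relation:
    kind: str        # "PART-OF" or "FEATURE-OF"
    source: Mention  # e.g. the engine mention
    target: Mention  # e.g. the Civic mention

# "John recently purchased a Civic . It had a great engine"
#   0    1        2         3 4     5 6  7   8 9     10
civic = Mention(4, 5, "CAR")
it = Mention(6, 7, "CAR")
engine = Mention(10, 11, "CAR-PART")

civic_entity = Entity([civic, it])            # REFERS-TO: It -> Civic
part_of = Relation("PART-OF", engine, civic)  # engine PART-OF Civic
```

Indexing mentions by token offsets keeps REFERS-TO and PART-OF links unambiguous even when the same surface string (e.g. stereo in the motivating example) appears more than once.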
Entity annotations: statistics
• Inter-annotator agreement:
  – mentions: 83%
  – REFERS-TO: 68%
• 61K mentions and 43K entities in the corpus
• 103 documents annotated by around 3 annotators
• Mention spans must match exactly: two annotations of "Kia Rio" match only when both annotators mark the identical span; differing boundaries count as a non-match (see the span-agreement sketch after the closing slides)

Sentiment expressions
• Evaluations with target mentions
• Prior polarity: semantic orientation given the target
  – positive, negative, neutral, mixed
• "… a great engine" (prior polarity: positive)
• "highly priced" (prior polarity: negative)
• "highly spec'ed" (prior polarity: positive)

Sentiment expressions
• 10K occurrences in the corpus
• 13% are multi-word
  – like no other, get up and go
• 49% are headed by adjectives, 22% by nouns (damage, good amount), 20% by verbs (likes, upset), 5% by adverbs (highly)

Sentiment expressions
• 75% of sentiment expression occurrences have non-evaluative uses in the corpus
  – "light":
    …the car seemed too light to be safe…
    …vehicles in the light truck category…
• 77% of sentiment expression occurrences are positive
• Inter-annotator agreement: 75% spans, 66% targets, 95% prior polarity

Modifiers -> contextual polarity
• NEGATORS: not a good car; not a very good car
• NEUTRALIZERS: if the car is good; I hope the car is good
• INTENSIFIERS: a very good car (UPWARD); a kind of good car (DOWNWARD)
• COMMITTERS: I am sure the car is good (UPWARD); I suspect the car is good (DOWNWARD)
(A toy composition sketch appears after the closing slides.)

Other annotations
• Speech events (content not sourced from the author)
  – John thinks the car is good.
• Comparisons:
  – Car X has a better engine than car Y.
  – Handles a variety of cases

Outline
• Motivating example
• Overview of annotation types
  – Some statistics
• Potential uses of corpus
• Comparison to other resources

Possible tasks
• Detecting mentions, sentiment expressions, and modifiers
• Identifying targets of sentiment expressions and modifiers
• Coreference resolution
• Finding PART-OF, FEATURE-OF, etc. relations
• Identifying errors/inconsistencies in the data

Possible tasks
• Exploring how elements interact:
  – Some idiot thinks this is a good car.
• Evaluating unsupervised sentiment systems, or systems trained on other domains
• How do relations between entities transfer sentiment?
  – The car's paint job is flawless but the safety record is poor.
• A solution to one task may be useful in solving another.

But wait, there's more!
• 180 digital camera blog posts were also annotated
• Total: 223,001 + 108,593 = 331,594 tokens

Outline
• Motivating example
  – Elements combine to render entity-level sentiment
• Overview of annotation types
  – Some statistics
• Potential uses of corpus
• Comparison to other resources

Other resources
• MPQA Version 2.0, Wiebe, Wilson, and Cardie (2005)
  – Largely professionally written news articles
  – Subjective expressions: "beliefs, emotions, sentiments, speculations, etc."
  – Attitude (contextual sentiment) annotated on subjective expressions
  – Target and source annotations
  – 226K tokens (JDPA: 332K)

Other resources
• Data sets provided by Bing Liu (2004, 2008)
  – Customer-written consumer electronics product reviews
  – Contextual sentiment toward mentions of products
  – Comparison annotations
  – 130K tokens (JDPA: 332K)

Thank you!
• Obtaining the corpus:
  – For research and educational purposes
  – [email protected]
  – Available June 2010
  – Annotation guidelines: http://www.cs.indiana.edu/~jaskessl
• Thanks to: Prof. Michael Gasser, Prof. James Martin, Prof. Martha Palmer, Prof. Michael Mozer, William Headden

Backup slides
• Top 20 annotations by type
• Inter-annotator agreement
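The Modifiers slide above pairs each modifier type with its effect on contextual polarity. The following toy sketch shows one way to read that slide compositionally; the numeric scores, word lists, and function name are hypothetical illustrations, not part of the annotation scheme.

```python
# Toy composition of contextual polarity from prior polarity, following
# the modifier types on the "Modifiers -> contextual polarity" slide.
# All scores and word lists below are hypothetical illustrations.

PRIOR = {"good": 1.0}                         # prior polarity of the expression

NEGATORS = {"not"}                            # flip polarity
NEUTRALIZERS = {"if", "hope"}                 # cancel polarity
INTENSIFIERS = {"very": 1.5, "kind of": 0.5}  # UPWARD > 1 > DOWNWARD
COMMITTERS = {"sure": 1.2, "suspect": 0.8}    # UPWARD > 1 > DOWNWARD

def contextual_polarity(expression: str, modifiers: list[str]) -> float:
    """Apply modifiers, innermost first, to an expression's prior polarity."""
    score = PRIOR[expression]
    for m in modifiers:
        if m in NEGATORS:
            score = -score                # "not a good car"
        elif m in NEUTRALIZERS:
            score = 0.0                   # "if the car is good"
        elif m in INTENSIFIERS:
            score *= INTENSIFIERS[m]      # "a very good car"
        elif m in COMMITTERS:
            score *= COMMITTERS[m]        # "I am sure the car is good"
    return score

print(contextual_polarity("good", ["very", "not"]))  # "not a very good car": -1.5
print(contextual_polarity("good", ["hope"]))         # "I hope the car is good": 0.0
```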
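The agreement figures quoted earlier (83% on mentions, 75% on sentiment expression spans) rest on span matching, and the MATCH / NOT A MATCH example suggests spans must match exactly. Below is a minimal sketch of such a computation, assuming exact boundary matching and a simple union-based percentage; the paper's actual metric may differ.

```python
# Pairwise span agreement between two annotators, assuming a match
# requires identical (start, end) token offsets. This union-based
# percentage is an illustration; the paper's actual metric may differ.

def exact_span_agreement(a1: set[tuple[int, int]],
                         a2: set[tuple[int, int]]) -> float:
    """Fraction of all annotated spans marked identically by both annotators."""
    union = a1 | a2
    return len(a1 & a2) / len(union) if union else 1.0

# A1 marks "Kia Rio" as tokens (3, 5); A2 marks a different span, say
# just "Rio" as (4, 5): NOT A MATCH, despite the overlap.
print(exact_span_agreement({(3, 5)}, {(3, 5)}))  # 1.0  (MATCH)
print(exact_span_agreement({(3, 5)}, {(4, 5)}))  # 0.0  (NOT A MATCH)
```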