Opinion Observer: Analyzing and Comparing Opinions on the Web
Authors: Bing Liu, Minqing Hu, Junsheng Cheng
Paper Presentation: Asif Salekin
Introduction
• The Web is an excellent source of consumer opinions
• These opinions are useful information to both customers and product manufacturers
• Opinion Observer: a system for analyzing and comparing consumer opinions on the Web
Technical Tasks
• Identify product features
• For each feature, identify whether the opinion is positive or negative
• Review format:
– Pros
– Cons
– Detailed review
• The paper proposes a technique to identify product features from the Pros and Cons in this format
Problem Statement
• Set of products P = {P1, P2, …, Pn}
• Set of reviews for Pi: Ri = {r1, r2, …, rk}
• rj = {sj1, sj2, …, sjm} : a sequence of sentences
• A product feature f in rj is an attribute of the product that has been commented on in rj
• If f appears in rj: explicit feature
– “The battery life of this camera is too short”
• If f does not appear in rj but is implied: implicit feature
– “This camera is too large” (size)
• Opinion segment of a feature f
– A set of consecutive sentences that express a positive or negative opinion on f
– “The picture quality is good, but the battery life is short”
• Positive opinion set of a feature (Pset)
– The set of opinion segments that express positive opinions about f, drawn from all the reviews of the product
– The negative opinion set (Nset) is defined similarly
• Observation: each sentence segment contains at most one product feature. Sentence segments are separated by ‘:’, ‘,’, ‘.’, ‘;’, ‘and’, and ‘but’.
Prepare a Training Dataset
“Battery usage; included 16MB is stingy”
• Perform Part-Of-Speech (POS) tagging and remove digits (see the sketch below)
– “<N> Battery <N> usage”
– “<V> included <N> MB <V> is <Adj> stingy”
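
A minimal sketch of this preprocessing step in Python, assuming NLTK's off-the-shelf tokenizer and tagger and a hand-made mapping from Penn Treebank tags to the simplified <N>/<V>/<Adj> tags on the slides (the paper's actual tagger may differ):

import re
import nltk  # assumes the punkt and averaged_perceptron_tagger data are installed

# Map Penn Treebank tags to the slides' simplified tag set.
SIMPLE_TAG = {"NN": "<N>", "NNS": "<N>", "NNP": "<N>", "NNPS": "<N>",
              "VB": "<V>", "VBD": "<V>", "VBG": "<V>", "VBN": "<V>",
              "VBP": "<V>", "VBZ": "<V>",
              "JJ": "<Adj>", "JJR": "<Adj>", "JJS": "<Adj>"}

def tag_segment(segment):
    segment = re.sub(r"\d+", "", segment)   # remove digits: "16MB" -> "MB"
    tokens = nltk.word_tokenize(segment)
    return [(SIMPLE_TAG.get(tag, "<" + tag + ">"), word)
            for word, tag in nltk.pos_tag(tokens)]

print(tag_segment("included 16MB is stingy"))
# e.g. [('<V>', 'included'), ('<N>', 'MB'), ('<V>', 'is'), ('<Adj>', 'stingy')]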
Prepare a
Training Dataset
Associated Rule
Mining
Post-processing
Extraction of
Product Features
Feature
Refinement
Mapping to
Implicit Features
Grouping
Synonyms
Experiments
• Replace feature words with [feature]
– “<N> [feature] <N> usage”
– “<V> included <N> [feature] <V> is <Adj> stingy”
• Use 3-grams to produce shorter segments (sketched below)
– “<V> included <N> [feature] <V> is <Adj> stingy”
→ “<V> included <N> [feature] <V> is”
→ “<N> [feature] <V> is <Adj> stingy”
• Distinguish duplicate tags
– “<N1> [feature] <N2> usage”
• Perform word stemming
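
A small sketch of the 3-gram and duplicate-tag steps, operating on the (tag, word) pairs produced above; the left-to-right numbering of duplicated tags is an assumption about the paper's convention:

from collections import Counter

def three_grams(tagged):
    # Slide a window of three over longer segments; short segments stay whole.
    if len(tagged) <= 3:
        return [tagged]
    return [tagged[i:i + 3] for i in range(len(tagged) - 2)]

def number_duplicates(gram):
    # Tags occurring more than once in a segment are numbered left to right,
    # e.g. "<N> [feature] <N> usage" -> "<N1> [feature] <N2> usage".
    totals, seen, out = Counter(t for t, _ in gram), Counter(), []
    for tag, word in gram:
        if totals[tag] > 1:
            seen[tag] += 1
            tag = tag[:-1] + str(seen[tag]) + ">"
        out.append((tag, word))
    return out

seg = [("<V>", "included"), ("<N>", "[feature]"), ("<V>", "is"), ("<Adj>", "stingy")]
for g in three_grams(seg):
    print(number_duplicates(g))
# [('<V1>', 'included'), ('<N>', '[feature]'), ('<V2>', 'is')]
# [('<N>', '[feature]'), ('<V>', 'is'), ('<Adj>', 'stingy')]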
Association Rule Mining
• Association rule mining model:
• I = {i1, …, in} : a set of items
– e.g., I = {milk, bread, butter, beer}
• D : a set of transactions; each transaction consists of a subset of the items in I
• Association rule: X → Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅
– e.g., {butter, bread} → {milk}
• The rule has support s in D if s% of the transactions in D contain X ∪ Y
– e.g., support = 1/5 = 0.2 when X ∪ Y occurs in only 1 of 5 transactions
• The rule X → Y holds in D with confidence c if c% of the transactions in D that contain X also contain Y
– e.g., confidence = 0.2/0.2 = 1.0: 100% of the transactions containing butter and bread also contain milk (sketched below)
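
A quick sketch of support and confidence on a made-up five-transaction database chosen to reproduce the numbers above (the slide's actual table was not preserved):

D = [{"milk", "bread", "butter"},
     {"beer", "bread"},
     {"milk"},
     {"beer", "butter"},
     {"bread"}]

def support(itemset):
    # Fraction of transactions containing every item of the itemset.
    return sum(itemset <= t for t in D) / len(D)

def confidence(X, Y):
    # Among transactions containing X, the fraction also containing Y.
    return support(X | Y) / support(X)

X, Y = {"butter", "bread"}, {"milk"}
print(support(X | Y))    # 0.2 -> X ∪ Y occurs in 1 of the 5 transactions
print(confidence(X, Y))  # 1.0 -> every transaction with butter and bread has milk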
• The resulting human-labeled 3-gram segments are saved in a transaction file D
• Association rule mining finds all rules in the database that satisfy some minimum support and minimum confidence constraints
• Use the association mining system CBA (Liu, Hsu, and Ma 1998) to mine rules
• Use 1% as the minimum support
• No minimum confidence is used
• Some example rules:
– <N1>, <N2> → [feature]
– <V>, <N> → [feature]
– <N1> → [feature], <N2>
– <N1>, [feature] → <N2>
Post-processing
Rules:
<N1>, <N2> → [feature]
<V>, <N> → [feature]
<N1> → [feature], <N2>
<N1>, [feature] → <N2>
• Step 1: Keep only the rules that have [feature] on the RHS
– Here, only rule 1 and rule 2 are needed
• Step 2: Consider the sequence of items on the LHS (steps 1–2 are sketched below)
– e.g., “<V>, <N> → [feature]” can have variations such as “<N>, <V> → [feature]”
– Check each rule against the transaction file to find the possible sequences
– Remove the derived rules with confidence < 50%
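
A rough sketch of steps 1 and 2, under assumed representations (a mined rule as a pair of item sets, a transaction as an ordered list of items); the confidence of a derived sequence is interpreted here as its share among the transactions containing all of the rule's items:

from itertools import permutations

def keep_feature_rhs(rules):
    # Step 1: keep only rules with [feature] on the right-hand side.
    return [(lhs, rhs) for lhs, rhs in rules if "[feature]" in rhs]

def ordered_in(seq, transaction):
    # True if the items of seq appear in the transaction in this order.
    it = iter(transaction)
    return all(item in it for item in seq)

def derive_sequences(rule_items, transactions, min_conf=0.5):
    # Step 2: among transactions containing all of the rule's items, keep
    # every ordering whose share reaches min_conf.
    containing = [t for t in transactions if set(rule_items) <= set(t)]
    kept = []
    for seq in permutations(rule_items):
        n = sum(ordered_in(seq, t) for t in containing)
        if containing and n / len(containing) >= min_conf:
            kept.append(seq)
    return kept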
Rules (after step 2):
<N1>, <N2> → [feature]
<N>, <V> → [feature]
• Step 3: Generate language patterns
– Rules are converted into language patterns according to the ordering of the items found in step 2 and the location of the feature:
<N1> [feature] <N2>
<N> <V> [feature]
Extraction of Product Features
• Do POS tagging on new reviews
• The resulting patterns are used to match and identify candidate features
• Allow gaps during pattern matching (sketched below)
– <N1> [feature] <N2> can match “Animals like kind people”: “like” is a gap word, and “kind” is extracted as the feature
• If a sentence segment satisfies multiple patterns
– Choose the pattern with the highest confidence
• If no pattern applies
– Use nouns or noun phrases as features
• If a sentence segment has only a single word, e.g., “heavy” or “big”
– Use that word as the feature
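
A sketch of gap-tolerant matching, assuming a pattern is a list of slots and a segment a list of (tag, word) pairs. The matcher enumerates every possible filling; on the slide's example, one filling skips “like” as a gap word and extracts “kind” as the feature (the slide does not detail how ties between fillings are broken):

def matches(pattern, tagged, max_gap=1, start=0):
    # Yield every way of filling the slots left to right, allowing up to
    # max_gap skipped words before each slot.
    if not pattern:
        yield {}
        return
    slot, rest = pattern[0], pattern[1:]
    for gap in range(max_gap + 1):
        i = start + gap
        if i >= len(tagged):
            return
        tag, word = tagged[i]
        if slot == "[feature]" or slot.rstrip("123456789>") + ">" == tag:
            for filled in matches(rest, tagged, max_gap, i + 1):
                yield {slot: word, **filled}

tagged = [("<N>", "Animals"), ("<V>", "like"),
          ("<Adj>", "kind"), ("<N>", "people")]
for m in matches(["<N1>", "[feature]", "<N2>"], tagged):
    print(m)
# includes {'<N1>': 'Animals', '[feature]': 'kind', '<N2>': 'people'}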
Feature Refinement
Two main mistakes are made during extraction:
– Feature conflict: two or more features appear in one sentence segment
– A more likely feature is present in the sentence segment but is not extracted by any pattern
• e.g., “slight noise from speaker when not in use”:
“noise” is found to be the feature, but not “speaker”
– How to detect this? “speaker” was found as a candidate feature in other reviews, but “noise” never was
Frequent-noun strategy
• The generated product features, together with their frequency counts, are saved in a candidate feature list
• For each sentence segment, if there are two or more nouns, choose the most frequent noun in the candidate feature list
Frequent-term strategy
• For each sentence segment, simply choose the word/phrase (it does not need to be a noun) with the highest frequency in the candidate feature list
Both strategies are sketched below.
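
A minimal sketch of the two strategies, assuming the candidate feature list is a dict of {feature: frequency} counts (the counts here are illustrative):

candidate_freq = {"speaker": 12, "noise": 1, "picture": 30}

def frequent_noun(tagged_segment):
    # Among the segment's nouns, pick the most frequent candidate feature.
    nouns = [w for tag, w in tagged_segment
             if tag.startswith("<N") and w in candidate_freq]
    return max(nouns, key=candidate_freq.get, default=None)

def frequent_term(words):
    # Pick the most frequent candidate regardless of part of speech.
    known = [w for w in words if w in candidate_freq]
    return max(known, key=candidate_freq.get, default=None)

seg = [("<Adj>", "slight"), ("<N>", "noise"), ("<N>", "speaker")]
print(frequent_noun(seg))                  # 'speaker', not 'noise'
print(frequent_term([w for _, w in seg]))  # 'speaker'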
Mapping to Implicit Features
• In tagging the training data for rule mining, we also tag the mapping from candidate features to their actual features (a sketch follows)
• “<V> included <N> MB <V> is <Adj> stingy”
– Here, “MB” was tagged as the feature; it is now mapped to “memory”
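
A tiny sketch of the mapping as a lookup table learned from the tagged training data; the entries besides “MB” → “memory” are illustrative assumptions:

implicit_map = {"MB": "memory", "heavy": "weight", "big": "size"}

def map_feature(candidate):
    # Fall back to the candidate itself when no mapping was tagged.
    return implicit_map.get(candidate, candidate)

print(map_feature("MB"))    # 'memory'
print(map_feature("lens"))  # 'lens' (no mapping tagged)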
Grouping Synonyms
• Group features with similar meanings
– e.g., “photo”, “picture”, and “image” all refer to the same feature in digital camera reviews
• Employ WordNet to check whether any synonym groups/sets exist among the features (sketched below)
• Choose only the top two frequent senses of a word when finding its synonyms
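
A sketch using NLTK's WordNet interface, keeping only each word's top two senses as described above (WordNet lists a word's senses in rough order of frequency):

from nltk.corpus import wordnet as wn  # assumes the wordnet corpus is installed

def top_synonyms(word, n_senses=2):
    # Collect the lemma names of the word's top n_senses synsets.
    syns = set()
    for synset in wn.synsets(word)[:n_senses]:
        syns.update(lemma.name() for lemma in synset.lemmas())
    return syns

def same_group(w1, w2):
    return w2 in top_synonyms(w1) or w1 in top_synonyms(w2)

print(same_group("picture", "image"))  # expected True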
Experiments
• Training and test review data
– Manually tagged a large collection of reviews of 15 electronic products from epinions.com
– 10 of them are used as the training data to mine patterns; the rest are used for testing
• Evaluation measures: recall (r) and precision (p), averaged over reviews:
r = (1/n) Σi (ECi / Ci)      p = (1/n) Σi (ECi / Ei)
– n : the total number of reviews of a particular product
– ECi : the number of extracted features from review i that are correct
– Ci : the number of actual features in review i
– Ei : the number of extracted features from review i
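
A small sketch of the averaged measures above, using made-up per-review counts:

reviews = [(3, 4, 5), (2, 2, 3)]  # illustrative (EC_i, C_i, E_i) triples

n = len(reviews)
recall = sum(ec / c for ec, c, _ in reviews) / n
precision = sum(ec / e for ec, _, e in reviews) / n
print(round(recall, 3), round(precision, 3))  # 0.875 0.633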
Observations:
• The frequent-term strategy gives better results than the frequent-noun strategy
– Some features are not expressed as nouns, and the POS tagger makes mistakes
• The results for Pros are better than those for Cons
– People tend to use similar words like ‘excellent’, ‘great’, and ‘good’ in Pros; in contrast, the words people use to complain in Cons differ a lot
Thank you
&
Questions?