Joint and pipeline probabilistic models for fine

December 7, 2013
Joint and Pipeline Probabilistic Models
for Fine-Grained Sentiment Analysis
Roman Klinger and Philipp Cimiano
Semantic Computing Group, CIT-EC, Bielefeld University, Germany
SENTIRE Workshop at IEEE ICDM, Dallas, TX, USA
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
What is this talk about?
The battery life of this camera is too short.
Aspect
Aspect
subjective
The battery life of this camera is too short.
Aspect
Roman Klinger and Philipp Cimiano
Aspect
subjective
2 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Outline
1
Introduction
2
Probabilistic model for subjective term and target identification
3
Evaluation of different pipeline orders
4
Joint Model
5
Evaluation of the Pipeline vs. Joint Model
6
Summary and Discussion
Roman Klinger and Philipp Cimiano
3 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Introduction
Sentiment Analysis/Opinion Mining
Often modelled as classification or segmentation task
Fine-Grained Opinion Mining:
Involves prediction of aspect/target, subjective terms, polarity,
relations
Our previous work: Developed model to analyze:
Given subjective phrases ⇒ impact on target prediction
Given targets ⇒ impact on subjective phrase prediction
Both with perfect and realistic prior knowledge
Contribution of this paper:
Present a flexible model which takes into account
inter-dependencies
Roman Klinger and Philipp Cimiano
4 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Previous and Related Work
Extracting subjective phrases:
B. Yang et al. (2012). ``Extracting opinion expressions with
semi-Markov conditional random fields''. In: EMNLP-CoNLL
Given perfect subjective phrases, predict targets:
N. Jakob et al. (2010). ``Extracting opinion targets in a single- and
cross-domain setting with conditional random fields''. In: EMNLP
ILP approach
B. Yang et al. (2013). ``Joint Inference for Fine-grained Opinion
Extraction''. In: ACL
.
Our work:
.
Real-world setting, predict all entities
Relational structure in multiple directions
Flexible, easy to augment
.
Roman Klinger and Philipp Cimiano
5 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Outline
1
Introduction
2
Probabilistic model for subjective term and target identification
3
Evaluation of different pipeline orders
4
Joint Model
5
Evaluation of the Pipeline vs. Joint Model
6
Summary and Discussion
Roman Klinger and Philipp Cimiano
6 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Factor Graphs
A Factor Graph is a bipartite graph over factors and variables
Factor Ψi computes a scalar
over all variables
Let ⃗x be observed variables, ⃗y
output variables
Common definition:
Ψi (⃗x(
yi ) =
i ,⃗
)
exp
∑
x1
Ψ1
~x = (x1 , x2 )T
Ψ2
x2
Ψ3
θki fki (⃗xi ,⃗yi )
k
(parameters θki and sufficient statistics fki (·))
y1
y2
y3
y4
T
Probability distribution:
~y = (y1 , y2 , y3 , y4 )
1 ∏
Ψi (⃗xi ,⃗yi )
p(⃗y|⃗x) =
Z(⃗x)
i
Roman Klinger and Philipp Cimiano
6 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Templates for Factor Graphs
Probability distribution
(
)
∑
1 ∏
p(⃗y|⃗x) =
exp
θki fki (⃗xi ,⃗yi )
Z(⃗x)
k
i
A Factor Template Tj consists of
parameters θjk and statistic functions fjk
some description of variables yielding tupels (⃗xj ,⃗yj )
Roman Klinger and Philipp Cimiano
7 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Templates for Factor Graphs
T1: same value and y1
y1
B
A
x1
A
x4
T2: different value and y1
B
x3
x2
Parameters θjk , feature functions fjk are shared across tupels
1 ∏
p(⃗y|⃗x) =
Z(⃗x)
∏
Tj ∈T (⃗xi ,⃗yi )∈Ti
(
exp
∑
)
θkj fkj (⃗xi ,⃗yi )
k
Examples for descriptions:
Markov Logic Networks (Richardson et al., 2006)
Imperatively defined factor graphs (McCallum et al., 2009)
Roman Klinger and Philipp Cimiano
8 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Variable Definition
Extraction of aspects and subjective phrases as segmentation
Application of a semi-Markov-like model
Implementation in FACTORIE (McCallum et al., 2009)
The battery life of this camera is too short.
Aspect
Roman Klinger and Philipp Cimiano
Aspect
subjective
9 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Templates
The battery life of this camera is too short.
Aspect
Aspect
subjective
Single-Span-Template
lower-case string, POS, and both
Combined with IOB-like-prefixes
Sequence of POS tags
Roman Klinger and Philipp Cimiano
10 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Templates
The battery life of this camera is too short.
Aspect
Aspect
subjective
Inter-Span-Template (partially inspired by Jakob et al., 2010)
Does the target span contain the noun that is closest to the
subjective phrase?
Are there spans of both types in the sentence?
Is there a one-edge dependency relation between
subjective phrase and target?
Single-Span features only if one of those holds!
Roman Klinger and Philipp Cimiano
11 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Learning and Inference
Inference: Metropolis Hastings sampling
(a Markov Chain Monte Carlo method)
Learning: Sample Rank (Wick et al., 2011)
.
Objective Function
.
f(t) = max
g∈s
o(t, g)
|g|
− α · p(t, g) ,
t is a span, g is a gold span
o(t, g) is length of overlap
.
p(t, g) number of 'outside' tokens
Roman Klinger and Philipp Cimiano
12 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Outline
1
Introduction
2
Probabilistic model for subjective term and target identification
3
Evaluation of different pipeline orders
4
Joint Model
5
Evaluation of the Pipeline vs. Joint Model
6
Summary and Discussion
Roman Klinger and Philipp Cimiano
13 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Cameras (Kessler et al., 2010)
Given subjective terms, how good is target prediction?
Predicting subjective terms, how good is target prediction?
Given target terms, how good is subjective prediction?
Predicting targets terms, how good is subjective prediction?
1
1.00
1.00
0.8
F1
0.6
0.4
0.53
0.48
0.61
0.44
0.54
0.65
0.65
0.60
0.58
1.00
0.71
0.65
0.50
0.32
0.2
0
pred. S.
→ T.
pred. T.
→ S.
Target-F1 Partial
Subjective-F1 Partial
Roman Klinger and Philipp Cimiano
Gold S.
→ T.
Gold T.
→ S.
Jakob 2010
Target-F1
Subjective-F1
13 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Cars (Kessler et al., 2010)
1
1.00
0.8
0.69
0.64
F1
0.6
0.4
0.51
0.66
0.56
1.00
1.00
0.74
0.62
0.70
0.55
0.43
0.50
0.43
0.33
0.2
0
pred. S.
→ T.
pred. T.
→ S.
Target-F1 Partial
Subjective-F1 Partial
Roman Klinger and Philipp Cimiano
Gold S.
→ T.
Gold T.
→ S.
Jakob 2010
Target-F1
Subjective-F1
14 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Twitter (Spina et al., 2012)
1
1.00
0.8
1.00
0.67
0.60
F1
0.6
0.4
0.2
0.42 0.40
0.26
0.22
0
pred. S.
→ T.
0.41
0.32
0.40
0.13
pred. T.
→ S.
Target-F1 Partial
Subjective-F1 Partial
Roman Klinger and Philipp Cimiano
0.35
0.28
Gold S.
→ T.
Gold T.
→ S.
Target-F1
Subjective-F1
15 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Outline
1
Introduction
2
Probabilistic model for subjective term and target identification
3
Evaluation of different pipeline orders
4
Joint Model
5
Evaluation of the Pipeline vs. Joint Model
6
Summary and Discussion
Roman Klinger and Philipp Cimiano
16 / 24
Technische Fakultät · AG Semantic Computing
Universität Bielefeld
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Idea for Joint Model
Model relation explicitly
target
The battery life of this camera is too short.
Aspect
Aspect
subjective
Sentence
Features in three templates
Single Span
Inter Span
Relation
(new: similar to inter span, but measuring another variable)
Roman Klinger and Philipp Cimiano
16 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Sampler
.
Pipeline
.
Propose spans, span changes
.
.
Joint
.
Propose adding relations for each aspect-subjective pair
Propose subjective phrases
Propose aspects as targets of each subjective phrase
.
Propose span changes, removing relations
Roman Klinger and Philipp Cimiano
17 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Objective functions
.
Pipeline
.
Spans as before (f(t))
.
Relations accuracy-based
.
Joint
.
Relation:

−1 if o(su, su∗ ) = 0 or o(ta, ta∗ ) = 0
h(su, ta) =
max
(su∗ ,ta∗ )∈rel∗  1
(o(su, su∗ ) + o(ta, ta∗ )) else
2
Span:
g(t) = β f(t) +
.
∑
h(su, ta)
(su,ta)∈rel(t)
Roman Klinger and Philipp Cimiano
18 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Pipeline vs. joint
Iteration
Pipeline Model
Training
0
Relations
1
Subjectives
2
3
Aspects
Templates
Subjectives
Aspects
Joint Model
Iteration
Training/Prediction
Prediction
Single span
Subjectives
Relation
Aspects
Inter span
Relations
1
Relations
Roman Klinger and Philipp Cimiano
19 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Results
Camera
1
F1
0.8
0.6
0.4
0.2
0
Pipeline
Joint
Car
1
Aspect Partial
Subjective Partial
Relation Partial
Aspect
Subjective
Relation
F1
0.8
0.6
0.4
0.2
0
Pipeline
Roman Klinger and Philipp Cimiano
Joint
20 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Summary
Joint Modelling has positive impact
Clearly observable for aspects
Slight to moderate drop for subjective phrase and relation
Easy do adapt to other characteristics
(opinion holder, polarity, dependencies, etc.)
Roman Klinger and Philipp Cimiano
21 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Bibliography I
Jakob, N. et al. (2010). ``Extracting opinion targets in a single- and
cross-domain setting with conditional random fields''. In: EMNLP.
Kessler, J. S. et al. (2010). ``The 2010 ICWSM JDPA Sentment Corpus for the
Automotive Domain''. In: ICWSM-DWC 2010.
McCallum, A. et al. (2009). ``FACTORIE: Probabilistic Programming via
Imperatively Defined Factor Graphs''. In: NIPS.
Richardson, M. et al. (2006). ``Markov logic networks''. In: Machine
Learning 62.1-2, pp. 107–136. ISSN: 0885-6125.
Spina, D. et al. (2012). ``A Corpus for Entity Profiling in Microblog Posts''.
In: LREC Workshop on Information Access Technologies for Online
Reputation Management.
Roman Klinger and Philipp Cimiano
22 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Introduction . . Probabilistic Model . . . . . . . Evaluations 1 . . . Joint Model . . . . Evaluations 2 . Summary .
Bibliography II
Wick, M. et al. (2011). ``SampleRank: Training factor graphs with atomic
gradients''. In: ICML.
Yang, B. et al. (2012). ``Extracting opinion expressions with semi-Markov
conditional random fields''. In: EMNLP-CoNLL.
— (2013). ``Joint Inference for Fine-grained Opinion Extraction''. In: ACL.
Roman Klinger and Philipp Cimiano
23 / 24
Universität Bielefeld
Technische Fakultät · AG Semantic Computing
Thank you!