POS Induction with Distributional and Morphological Information
Using a Distance-Dependent Chinese Restaurant Process
Kairit Sirts1, Jacob Eisenstein2, Micha Elsner3, Sharon Goldwater4
1. Clustering for POS Induction

We use a distance-dependent Chinese restaurant process [1] (dd-CRP) to
incorporate distributional and morphological information for POS
induction.
• Distributional info is modelled as a Gaussian likelihood over word embeddings.
• Morphological info is modelled in the prior using a feature-based suffix similarity function.
• Learn the number of fine-grained morphosyntactic clusters.
• Learn the suffix similarity function for the dd-CRP prior.
• Tested on English: performs better than K-means and the infinite Gaussian mixture model (IGMM), and achieves state-of-the-art results for fine-grained POS induction.
2. Distributional and Morphological Data

Word embeddings represent the contextual information with continuous vectors trained using neural networks. We use the Polyglot vectors (pretrained on Wikipedia) [2]; a loading sketch follows the context examples below.

Example contexts:

    … and began copying …        … to peaceful sounds …
    … and began to …             … on peaceful terms …
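A minimal loading sketch, assuming the pickled (word list, embedding matrix) layout in which the Polyglot vectors are distributed; the file name is illustrative.

    import pickle

    # Polyglot vectors ship as a pickled (word list, embedding matrix) pair;
    # the file name here is illustrative.
    with open("polyglot-en.pkl", "rb") as fh:
        vocab, embeddings = pickle.load(fh, encoding="latin1")

    word2vec = {word: embeddings[i] for i, word in enumerate(vocab)}
    print(len(vocab), embeddings.shape)  # vocabulary size, (V, 64)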
Morphological information is represented using suffix-based feature vectors:

                 -    -d   -ed   -c   -ic   -s   -es   …
    played       1    1    1     0    0     0    0     …
    pedantic     1    0    0     1    1     0    0     …
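A minimal sketch of such suffix indicator vectors, using only the suffixes shown above; combining two words' vectors by elementwise product (shared suffixes) is an assumption about the pairwise feature function f used below.

    import numpy as np

    # Illustrative suffix inventory (only the slice shown in the table).
    SUFFIXES = ["", "d", "ed", "c", "ic", "s", "es"]

    def suffix_vector(word):
        """Binary indicators: which suffixes does the word end with?
        The empty suffix fires for every word."""
        return np.array([1.0 if word.endswith(s) else 0.0 for s in SUFFIXES])

    def f(word1, word2):
        """Pairwise feature function f(word1, word2): here, indicators of
        suffixes shared by both words (an assumed combination rule)."""
        return suffix_vector(word1) * suffix_vector(word2)

    print(suffix_vector("played"))    # [1. 1. 1. 0. 0. 0. 0.]
    print(suffix_vector("pedantic"))  # [1. 0. 0. 1. 1. 0. 0.]
    print(f("stepped", "played"))     # shares "", "d", "ed" with "played"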
3. IGMM with dd-CRP Prior

Distance-dependent Chinese restaurant process:
The dd-CRP [1] is a variant of the regular CRP in which each customer follows another customer with probability proportional to a function of the distance between the two customers:

    P(stepped → played) ∝ exp(wᵀ f(stepped, played))
    P(table → table) ∝ α
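A sketch of the follower distribution these two formulas define, reusing f and SUFFIXES from the sketch above; alpha, the weights w, and the exponent a (a = 1 here; the exponentiated variant below uses a = 5) are free parameters.

    import numpy as np

    def follower_log_probs(i, words, w, alpha, a=1.0):
        """Log-probability of customer i following each customer j:
        a * w^T f(word_i, word_j) for j != i, log(alpha) for the self-link."""
        scores = np.array([np.log(alpha) if j == i
                           else a * (w @ f(words[i], words[j]))
                           for j in range(len(words))])
        return scores - np.logaddexp.reduce(scores)  # normalize in log space

    words = ["stepped", "played", "table"]
    w = np.ones(len(SUFFIXES))  # toy weights; learned in the M-step
    print(np.exp(follower_log_probs(0, words, w, alpha=1.0)).round(3))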
Model:
We combine the dd-CRP prior with the Gaussian likelihood and model the distances using the suffix feature function f(word1, word2).

Three models using dd-CRP:
• Uniform distance function – almost identical to the IGMM
• Learned distance function – log-linear feature model
• Exponentiated – the learned prior is emphasized by an exponent parameter a = 5
Inference:
We use a variant of Monte Carlo Expectation-Maximization:
• E-step: resample the follower structure with Gibbs sampling.
• M-step: optimize the feature weights using the current follower structure (a gradient sketch follows).
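To make the M-step concrete, a sketch of one gradient-ascent update on the log-linear weights, maximizing the log-probability of the current follower structure under the prior; the learning rate is an assumption, and f is the pairwise suffix feature sketch from above.

    import numpy as np

    def m_step_update(words, followers, w, alpha, a=1.0, lr=0.1):
        """One gradient-ascent step on sum_i log P(followers[i] | w).
        followers[i] is the customer that i follows (i itself = self-link,
        whose score log(alpha) is constant in w)."""
        grad = np.zeros_like(w)
        n = len(words)
        for i, j in enumerate(followers):
            feats = np.array([np.zeros_like(w) if k == i
                              else f(words[i], words[k]) for k in range(n)])
            scores = np.array([np.log(alpha) if k == i
                               else a * (w @ feats[k]) for k in range(n)])
            probs = np.exp(scores - np.logaddexp.reduce(scores))
            observed = np.zeros_like(w) if j == i else a * feats[j]
            grad += observed - a * (probs @ feats)  # observed minus expected
        return w + lr * grad

The E-step would resample each followers[i] from the follower distribution above, multiplied by the change in the Gaussian marginal likelihood that the new link induces.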
4. Experiments on Multext-East English

Tagset:
Fine-grained tags (104) are sequences of morphological features:
• Afp – Adjective qualificative positive (metallic, pedantic)
• Ncns – Noun common neuter singular (table)
• Vmis – Verb main indicative past (stepped, played)

Baselines:
• K-means – uses only word embeddings, with K set to the gold number of clusters (sketched below)
• IGMM – uses word embeddings and learns the number of clusters
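A sketch of the K-means baseline, assuming an embedding matrix with one row per word type is already loaded; the random stand-in matrix is illustrative.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)
    embeddings = rng.normal(size=(5000, 64))  # stand-in for the Polyglot vectors [2]

    # K is fixed to the gold number of fine-grained tags.
    kmeans = KMeans(n_clusters=104, n_init=10, random_state=0)
    clusters = kmeans.fit_predict(embeddings)  # one cluster id per word type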
Evaluation:
Evaluation is type-based, which takes low-frequency words more into account. We report one-to-one accuracy (1-1) and V-measure (V-m); both are sketched below.
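A sketch of the two scores, assuming their standard definitions: 1-1 maps each induced cluster to at most one gold tag by an optimal matching (Hungarian algorithm), and V-m is the V-measure; gold and pred are per-word-type label vectors.

    import numpy as np
    from scipy.optimize import linear_sum_assignment
    from sklearn.metrics import v_measure_score

    def one_to_one_accuracy(gold, pred):
        """Fraction of word types tagged correctly under the best
        one-to-one mapping between induced clusters and gold tags."""
        gold, pred = np.asarray(gold), np.asarray(pred)
        counts = np.zeros((pred.max() + 1, gold.max() + 1))
        np.add.at(counts, (pred, gold), 1)  # cluster/tag co-occurrence counts
        rows, cols = linear_sum_assignment(counts, maximize=True)
        return counts[rows, cols].sum() / len(gold)

    gold = [0, 0, 1, 1, 2]  # toy gold tags per word type
    pred = [2, 2, 0, 0, 0]  # toy induced clusters
    print(one_to_one_accuracy(gold, pred))  # 0.8
    print(v_measure_score(gold, pred))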
Results:

    Model            K      1-1    V-m
    K-means          104    16.1   47.3
    IGMM             55.6   41.0   45.9
    dd-CRP uniform   80.4   50.5   52.9
    dd-CRP learned   89.6   50.1   55.1
    dd-CRP exp       47.2   64.0   60.3
K-means run with the same number of clusters K as each nonparametric model learned:

    Same K as        K      1-1    V-m
    IGMM             55.6   23.1   49.5
    dd-CRP uniform   80.4   18.6   48.2
    dd-CRP learned   89.6   17.6   40.0
    dd-CRP exp       47.2   25.0   50.3
References
[1] D. M. Blei and P. I. Frazier. 2011. Distance dependent Chinese restaurant processes. Journal of Machine Learning Research, 12:2461–2488.
[2] R. Al-Rfou, B. Perozzi, and S. Skiena. 2013. Polyglot: Distributed word representations for multilingual NLP. In CoNLL.
Acknowledgements
The first author was supported by the Tiger University Program of the Estonian Information Technology Foundation for Education.