POS Induction with Distributional and Morphological Information Using a Distance-Dependent Chinese Restaurant Process
Kairit Sirts, Jacob Eisenstein, Micha Elsner, Sharon Goldwater

Clustering for POS Induction

We use a distance-dependent Chinese restaurant process (dd-CRP) [1] to incorporate both distributional and morphological information into POS induction.
• Distributional information is modelled as a Gaussian likelihood over word embeddings.
• We learn the suffix similarity function for the dd-CRP prior.
• Tested on English, the model performs better than K-means and the infinite Gaussian mixture model (IGMM), and gives state-of-the-art results for fine-grained POS induction.

Experiments on Multext-East English

Tagset: the fine-grained tags (104 in total) are sequences of morphological features (e.g. Vmis, Afp, Ncns).
Evaluation is type-based, which takes low-frequency words more into account.

Model: IGMM with dd-CRP Prior

The dd-CRP is a variant of the regular CRP in which each customer follows another customer with probability proportional to the distance between the two customers.
[Figure: follower links between word types, e.g. stepped → played and a self-link table → table, with link weights given by the suffix similarity f(stepped, played).]
We combine the dd-CRP prior with the Gaussian likelihood and model the distances with the suffix feature function f(word1, word2).
Three models using dd-CRP:
• Uniform distance function → almost identical to the IGMM.
• Learned distance function → a log-linear feature model.
• Exponentiated → the learned prior is emphasized by an exponent parameter a = 5.

Inference: we use a variant of Monte Carlo Expectation-Maximization.

Distributional and Morphological Data

Word embeddings represent contextual information with continuous vectors trained using neural networks; we use the Polyglot vectors, pretrained on Wikipedia [2].
Morphological information is modelled in the prior using a feature-based suffix similarity function and is represented with suffix-based feature vectors:

word       ∅   -d   -ed   -c   -ic   -s   -es   …
played     1    1    1     0    0     0    0    …
pedantic   1    0    0     1    1     0    0    …

Tagset examples:
Vmis → Verb main indicative past (stepped, played)
Afp → Adjective qualificative positive (metallic, pedantic)
Ncns → Noun common neuter singular (table)

Baselines:
• K-means → uses only the word embeddings, with K set to the gold number of clusters.
• IGMM → uses the word embeddings and learns the number of clusters.

Evaluation:
• Learn the number of fine-grained morphosyntactic clusters.

Results:

Model            K      1-1    V-m
K-means          104    16.1   47.3
IGMM             55.6   41.0   45.9
dd-CRP uniform   80.4   50.5   52.9
dd-CRP learned   89.6   50.1   55.1
dd-CRP exp       47.2   64.0   60.3

Additional K-means results (1-1 / V-m; row labels not recoverable): 23.1/49.5, 18.6/48.2, 17.6/40.0, 25.0/50.3.

References
[1] D. M. Blei and P. I. Frazier. 2011. Distance Dependent Chinese Restaurant Processes. Journal of Machine Learning Research.
[2] R. Al-Rfou et al. 2013. Distributed Word Representations for Multilingual NLP. In CoNLL '13.
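As an illustration of the prior described above, here is a minimal Python sketch of suffix feature vectors and a log-linear similarity f(word1, word2) plugged into a dd-CRP follower prior. This is not the authors' implementation: the suffix list, the feature weights, and the self-link weight alpha are invented for the example.

```python
import math

# Illustrative suffix inventory (not the paper's full feature set).
SUFFIXES = ["d", "ed", "c", "ic", "s", "es"]

def suffix_features(word):
    # Binary features: an always-on empty-suffix feature plus one per suffix.
    return [1] + [1 if word.endswith(s) else 0 for s in SUFFIXES]

def similarity(w1, w2, weights):
    # Log-linear f(w1, w2): exponentiated weighted count of shared features.
    shared = (a * b for a, b in zip(suffix_features(w1), suffix_features(w2)))
    return math.exp(sum(w * x for w, x in zip(weights, shared)))

def follower_distribution(word, vocab, weights, alpha=1.0):
    # dd-CRP prior: word follows word2 with probability proportional to
    # f(word, word2); a self-link (starting a new table) has weight alpha.
    scores = {w2: (alpha if w2 == word else similarity(word, w2, weights))
              for w2 in vocab}
    total = sum(scores.values())
    return {w2: s / total for w2, s in scores.items()}

vocab = ["stepped", "played", "pedantic", "metallic", "table"]
weights = [0.1, 1.0, 2.0, 1.0, 2.0, 1.0, 1.0]  # one weight per feature (made up)
dist = follower_distribution("played", vocab, weights)
# "stepped" shares the -d and -ed features with "played",
# so it receives most of the prior mass.
```

Under this toy weighting, morphologically similar words dominate the follower distribution, which is what lets the prior pull same-suffix words into the same cluster.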
Example contexts illustrating the distributional information:
… and began copying …      … and began to …
… to peaceful sounds …     … on peaceful terms …

Monte Carlo EM steps:
E-step: resample the follower structure with Gibbs sampling.
M-step: optimize the feature weights using the current follower structure.

Acknowledgements
The first author was supported by the Tiger University Program of the Estonian Information Technology Foundation for Education.
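The E-step's Gibbs resampling of follower links can be sketched as follows. This is a minimal sketch, assuming a generic prior score and a partition log-likelihood; both functions here are placeholders, not the paper's actual prior or likelihood.

```python
import math
import random

def gibbs_sweep(followers, vocab, prior, loglik, rng):
    # One Gibbs sweep: resample each word's follower link from its
    # conditional, combining the dd-CRP prior score of the link with the
    # likelihood of the table partition implied by the candidate link.
    for w in vocab:
        scores = []
        for cand in vocab:
            followers[w] = cand  # tentatively link w -> cand
            scores.append(math.log(prior(w, cand)) + loglik(followers))
        m = max(scores)  # subtract the max before exponentiating, for stability
        probs = [math.exp(s - m) for s in scores]
        followers[w] = rng.choices(vocab, weights=probs)[0]
    return followers

# Toy run: the placeholder prior prefers links between words sharing a
# two-letter suffix; the placeholder likelihood is flat.
vocab = ["stepped", "played", "table"]
prior = lambda w, c: 2.0 if w != c and w[-2:] == c[-2:] else 1.0
loglik = lambda f: 0.0
links = gibbs_sweep({w: w for w in vocab}, vocab, prior, loglik, random.Random(0))
```

In the M-step, the feature weights would then be optimized (e.g. by gradient ascent) while the follower structure sampled in the E-step is held fixed.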