Chinese Journal of Electronics, Vol.23, No.1, Jan. 2014

A Part-of-speech Tagging Model Employing Word Clustering and Syntactic Parsing*

YUAN Lichi (1,2)
(1. School of Information Technology, Jiangxi University of Finance and Economics, Nanchang 330013, China)
(2. Jiangxi Key Laboratory of Data and Knowledge Engineering, Jiangxi University of Finance and Economics, Nanchang 330013, China)

Abstract — Part-of-speech tagging is a basic task in the field of natural language processing. This paper builds a POS tagger based on an improved hidden Markov model that employs word clustering and a syntactic parsing model. Firstly, in order to overcome the defects of the classical HMM, the Markov family model (MFM), a new statistical model, is introduced. Secondly, to relieve data sparseness, we propose a bottom-up hierarchical word clustering algorithm. Finally, we combine syntactic parsing with part-of-speech tagging. Tagging experiments show that the improved part-of-speech tagging model has higher performance than hidden Markov models (HMMs) under the same testing conditions: precision is enhanced from 94.642% to 97.235%.

Key words — Part-of-speech tagging, Hidden Markov model, Markov family model, Word clustering.

I. Introduction

Tagging words with their correct part of speech (singular proper noun, predeterminer, etc.) is an important precursor to further automatic natural language processing. Part-of-speech tagging is used as an early stage of linguistic text analysis in many applications, including subcategorization acquisition, text-to-speech synthesis, and corpus indexing. Two prominent approaches found in previous work are rule-based morphological analysis on the one hand, and stochastic models[1-13] such as hidden Markov models[1] (HMMs) on the other. Rule-based morphological analysis relies on hand-crafted rules to decompose input tokens into their morphological components, computing the resultant lexical category as a function of those components. Such systems incorporate the linguistic competence of their human authors, to the extent that such competence can be and is expressed in the systems' rule sets. Unfortunately, constructing a hand-crafted rule set for unrestricted input tokens of a given language is a time-consuming and labor-intensive task. Another common problem for token-wise rule-based approaches is ambiguity: in order to determine which of multiple possible analyses of a single token is the correct one, some reference to the context in which the token occurs is usually required.

The recent top-performing statistics-based approaches to Chinese POS tagging include hidden Markov models[1], maximum entropy approaches[2], and conditional random fields[3]. Stochastic tagging techniques such as hidden Markov models rely on both lexical and bigram probabilities estimated from a tagged training corpus in order to compute the most likely POS tag sequence for each sequence of input tokens. The existence of hand-tagged training corpora for many languages and the robustness of the resulting models have made stochastic taggers quite popular. Disadvantages of HMM taggers include the large amount of training data required to achieve high levels of accuracy, as well as the fact that traditional HMM tagging architectures make no clear allowance for prior linguistic knowledge.
The hidden Markov models used for tagging rest on three assumptions[1]: (1) limited horizon, (2) time invariance (stationarity), and (3) the simplifying assumption that the probability of a word depends only on its own tag. These assumptions, especially the third, are too crude. In this paper the Markov family model[14], a new statistical model, is introduced. Under the assumption that the probability of a word depends both on its own tag and on the previous word, but that its own tag and the previous word are conditionally independent given the word itself, we simplify the Markov family model and apply it successfully to part-of-speech tagging. Experimental results show that this POS tagging method based on the Markov family model greatly improves precision compared with the conventional POS tagging method based on the hidden Markov model under the same testing conditions.

The rest of the paper is organized as follows: Section II introduces the baseline POS tagging model based on hidden Markov models and proposes a POS tagging method based on an improved HMM. Section III then introduces a clustering algorithm and explores the notion that the performance of an automatically built tagger can be further improved by smoothing when the labeled training corpus is limited. Section IV further improves tagger performance by combining syntactic parsing with part-of-speech tagging.

* Manuscript Received Apr. 2012; Accepted May 2013. This work is supported by the National Natural Science Foundation of China (No.61262035); the Science and Technology Foundation of Education Department of Jiangxi Province, China (No.GJJ12271, No.GJJ12742); the Natural Science Foundation of Jiangxi Province, China (No.20122BAB201033).

II. The POS Tagging Model Based on Improved HMM

1. The POS tagging model based on HMM

For a tagset T and a finite set of words W, it is customary to define a bigram HMM part-of-speech tagger (T, A, W, B, π), where the probability functions A, B, and π are estimated from a tagged training corpus. Under such a model, part-of-speech tags are represented as states of the model, and the task of finding the most likely tag sequence t_{1,n} for an input word sequence w_{1,n} can be formulated as a search for the sequence of tags t_{1,n} = t_1, ..., t_n that maximizes the probability of the tag sequence given the word sequence w_{1,n} = w_1, ..., w_n:

$$\arg\max_{t_{1,n}} P(t_{1,n}|w_{1,n}) = \arg\max_{t_{1,n}} \frac{P(w_{1,n}|t_{1,n})P(t_{1,n})}{P(w_{1,n})} = \arg\max_{t_{1,n}} P(w_{1,n}|t_{1,n})P(t_{1,n}) \quad (1)$$

We now reduce this expression to parameters that can be estimated from the training corpus. In addition to the limited horizon assumption, we make two assumptions about words: words are independent of each other, and a word's identity depends only on its tag. Then

$$P(w_{1,n}|t_{1,n})P(t_{1,n}) = \prod_{i=1}^{n} P(w_i|t_{1,n}) \times P(t_n|t_{1,n-1}) \times P(t_{n-1}|t_{1,n-2}) \times \cdots \times P(t_2|t_1) = \prod_{i=1}^{n} P(w_i|t_i) \times P(t_n|t_{n-1}) \times P(t_{n-1}|t_{n-2}) \times \cdots \times P(t_2|t_1)$$

so that

$$\arg\max_{t_{1,n}} P(t_{1,n}|w_{1,n}) = \arg\max_{t_{1,n}} \prod_{i=1}^{n} P(w_i|t_i) \times P(t_i|t_{i-1}) \quad (2)$$

(We define P(t_1|t_0) = 1.0 to simplify our notation.)

2. The POS tagging model based on improved HMM

A major unrealistic assumption of the HMM tagging model is that successive words (observations) are independent and identically distributed within a tag (state). Under the assumption that the probability of a word depends both on its own tag and on the previous word, but that its tag and the previous word are conditionally independent given the word itself, the Markov family model has been successfully applied to part-of-speech tagging.

Let S_1 be the finite set of part-of-speech tags and S_2 the finite set of words. Suppose the word sequence {w_i}_{i≥1} and the tag sequence {t_i}_{i≥1} are Markov chains of a Markov family model; then a word's tag and its previous word are independent given the word:

$$P(w_{i-1}, t_i|w_i) = P(w_{i-1}|w_i) \cdot P(t_i|w_i) \quad (3)$$

For simplicity, also suppose that the word sequence {w_i}_{i≥1} and the tag sequence {t_i}_{i≥1} are both second-order Markov chains. Thus

$$P(w_{1,n}|t_{1,n}) = P(w_n|w_1, \cdots, w_{n-1}, t_1, \cdots, t_{n-1}, t_n) \cdot P(w_{1,n-1}|t_{1,n-1}) \quad (4)$$

$$P(w_{1,n}|t_{1,n}) = P(w_n|w_{n-1}, t_n) \cdot P(w_{1,n-1}|t_{1,n-1}) \quad (5)$$

According to the properties of the Markov family model, we have

$$P(w_n|w_{n-1}, t_n) = \frac{P(w_{n-1}, t_n|w_n) \cdot P(w_n)}{P(w_{n-1}, t_n)} \quad (6)$$

From Eq.(3), we can get

$$P(w_n|w_{n-1}, t_n) = \frac{P(t_n|w_n) \cdot P(w_{n-1}|w_n) \cdot P(w_n)}{P(t_n|w_{n-1}) \cdot P(w_{n-1})} = \frac{P(t_n|w_n) \cdot P(w_n|w_{n-1})}{P(t_n|w_{n-1})} \quad (7)$$

So

$$\arg\max_{t_{1,n}} P(t_{1,n}|w_{1,n}) = \arg\max_{t_{1,n}} P(w_1|t_1) \cdot P(t_1) \prod_{i=2}^{n} \frac{P(t_i|w_i) \cdot P(t_i|t_{i-1})}{P(t_i|w_{i-1})} \quad (8)$$

Once we have a probabilistic model, the next challenge is to find an effective algorithm for computing the maximum-probability tag sequence for a given input. The Viterbi algorithm[1] is a dynamic programming method which, for a given word sequence w_1, ..., w_n, efficiently computes the tag sequence t_1, ..., t_n that is most likely according to the model parameters. The computation proceeds as follows (see Fig.1).

comment: Given: a sentence of length n; T is the number of tags in the tag set.
comment: Initialization
    δ_1(t_j) = P(w_1|t_j) · P(t_j),  1 ≤ j ≤ T
    Ψ_1(t_j) = 0,  1 ≤ j ≤ T
comment: Induction
    for i := 1 to n − 1 step 1 do
        for all tags t_j do
            δ_{i+1}(t_j) = max_{1≤k≤T} [δ_i(t_k) × P(t_j|w_{i+1}) × P(t_j|t_k) / P(t_j|w_i)]
            Ψ_{i+1}(t_j) = argmax_{1≤k≤T} [δ_i(t_k) × P(t_j|w_{i+1}) × P(t_j|t_k) / P(t_j|w_i)]
        end
    end
comment: Termination and path readout; X_1, ..., X_n are the tags chosen for words w_1, ..., w_n
    X_n = argmax_{1≤j≤T} δ_n(t_j)
    for j := n − 1 to 1 step −1 do
        X_j = Ψ_{j+1}(X_{j+1})
    end
    P(X_1, ..., X_n) = max_{1≤j≤T} δ_n(t_j)

Fig. 1. Algorithm for tagging
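To make the recursion of Fig.1 concrete, the following Python sketch decodes according to Eq.(8) in log space. It is a minimal illustration rather than the paper's implementation: the table names (p_tag, p_word_given_tag, p_trans, p_tag_given_word, p_tag_given_prev_word) are hypothetical dictionaries assumed to be estimated from a tagged corpus, and unseen events are floored to a small constant instead of being properly smoothed (Section III replaces this crude floor with cluster-based smoothing).

```python
import math

FLOOR = 1e-12  # crude floor for unseen events; a real tagger would smooth these estimates


def mfm_viterbi(words, tags, p_tag, p_word_given_tag, p_trans,
                p_tag_given_word, p_tag_given_prev_word):
    """Viterbi decoding for the improved (Markov family) model of Eq.(8).

    Scores are kept in log space:
      delta_1(t)     = log P(w1|t) + log P(t)
      delta_{i+1}(t) = max_k [delta_i(k) + log P(t|k)]
                       + log P(t|w_{i+1}) - log P(t|w_i)
    Tables are keyed as: p_word_given_tag[(w, t)] = P(w|t),
    p_trans[(t, k)] = P(t|k), p_tag_given_word[(t, w)] = P(t|w).
    """
    def lg(table, key):
        return math.log(table.get(key, FLOOR))

    # Initialization: delta_1(t) = P(w1|t) * P(t)
    delta = {t: lg(p_word_given_tag, (words[0], t)) + lg(p_tag, t) for t in tags}
    backptr = []

    # Induction over positions 2..n
    for i in range(1, len(words)):
        new_delta, psi = {}, {}
        for t in tags:
            # The lexical ratio P(t|w_i)/P(t|w_{i-1}) does not depend on the
            # predecessor tag k, so it is added outside the maximization.
            lexical = lg(p_tag_given_word, (t, words[i])) \
                      - lg(p_tag_given_prev_word, (t, words[i - 1]))
            best_k = max(tags, key=lambda k: delta[k] + lg(p_trans, (t, k)))
            new_delta[t] = delta[best_k] + lg(p_trans, (t, best_k)) + lexical
            psi[t] = best_k
        delta = new_delta
        backptr.append(psi)

    # Termination and path readout
    tag = max(tags, key=lambda t: delta[t])
    path = [tag]
    for psi in reversed(backptr):
        tag = psi[tag]
        path.append(tag)
    return list(reversed(path))
```

Because the lexical ratio is independent of the predecessor tag, the cost per sentence stays O(nT^2), the same as standard Viterbi decoding for a bigram HMM.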
III. The POS Tagging Model Employing Word Clustering

1. Clustering algorithm based on word similarity

Word clustering[15-19] is a fundamental task in automatic language processing. A large amount of previous research has focused on how best to cluster similar words together. There are many different clustering algorithms, but they can be classified into a few basic types. Clustering algorithms produce two types of structures: hierarchical clusterings and flat (non-hierarchical) clusterings. A flat clustering simply consists of a certain number of clusters, and the relation between clusters is often undetermined. Most algorithms produce flat clusters and improve them by iterating a reallocation operation that reassigns objects. The tree of a hierarchical clustering can be produced either bottom-up, by starting with the individual objects and grouping the most similar ones, or top-down, whereby one starts with all the objects and divides them into groups so as to maximize within-group similarity. Conventional statistical clustering methods are usually based on a greedy principle.

The common metric for evaluating a clustering algorithm is the likelihood function or the perplexity of the corpus. Conventional clustering algorithms often converge to a local optimum, so a global optimum is not guaranteed, and initial choices can influence the final result. To address these problems, a definition of word similarity based on mutual information was presented, and on top of it a definition of the similarity of word sets. Experiments[20] show that the word clustering algorithm based on similarity is better than the conventional greedy clustering method in both speed and performance.

Unlike the methods above, we cluster words on the basis of the similarity[20] between words. First, we must find an appropriate word similarity metric. The common corpus-based approach to computing word similarity represents a word (or term) by the set of its word co-occurrence statistics. It relies on the assumption that the meaning of words is related to their patterns of co-occurrence with other words in the text. This assumption was proposed in early linguistic work, as expressed in Harris' distributional hypothesis: "... the meaning of entities, and the meaning of grammatical relations among them, is related to the restriction of combinations of these entities relative to other entities". The famous statement "You shall know a word by the company it keeps!" is another expression of this assumption. If two words w_1 and w_2 are similar, we may infer that they have similar mutual information with other words. We now define the similarity between words w_1 and w_2 as follows:

$$sim(w_1, w_2) = \frac{\sum_w P(w)[\min(I(w, w_1), I(w, w_2)) + \min(I(w_1, w), I(w_2, w))]}{\sum_w P(w)[\max(I(w, w_1), I(w, w_2)) + \max(I(w_1, w), I(w_2, w))]} \quad (9)$$

where I(w_i, w_j) is the pointwise mutual information between the two words w_i and w_j:

$$I(w_i, w_j) = \log \frac{p(w_i, w_j)}{p(w_i)p(w_j)} \quad (10)$$

where p(w_i) and p(w_j) are the probabilities of the events w_i and w_j (occurrences of words, in our case), and p(w_i, w_j) is the probability of the joint event (a co-occurrence pair). Furthermore, we define the left similarity and right similarity between words w_1 and w_2 as:

$$sim_L(w_1, w_2) = \frac{\sum_w P(w)\min(I(w_1, w), I(w_2, w))}{\sum_w P(w)\max(I(w_1, w), I(w_2, w))} \quad (11)$$

$$sim_R(w_1, w_2) = \frac{\sum_w P(w)\min(I(w, w_1), I(w, w_2))}{\sum_w P(w)\max(I(w, w_1), I(w, w_2))} \quad (12)$$

Based on word similarity, the similarity between clusters C_1 and C_2 may be defined as:

$$sim(C_1, C_2) = \frac{\sum_{w_i \in C_1, w_j \in C_2} C(w_i)C(w_j)\,sim(w_i, w_j)}{\sum_{w_i \in C_1} C(w_i) \sum_{w_j \in C_2} C(w_j)} \quad (13)$$

where C(w_i) and C(w_j) are the numbers of occurrences of the words w_i and w_j in the corpus. The left and right similarities between clusters may be defined similarly.

The clustering algorithm is as follows (a runnable sketch is given after the list):
(a) Compute the similarity between words.
(b) Begin with N clusters (N is the number of words in the lexicon), one for each word.
(c) Select the two clusters with the greatest similarity and create a new cluster by merging them.
(d) Compute the similarity between the new cluster and the other clusters.
(e) Check whether the termination condition is met (the greatest similarity between clusters is less than a predetermined threshold, or the desired number of clusters is reached); if yes, terminate; otherwise go to (c).
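The Python sketch below shows how steps (a)-(e) might be realized for a small lexicon. It is an illustrative, deliberately naive O(N^3) version under stated assumptions: co-occurrence statistics come from adjacent bigrams only, the bidirectional similarity of Eq.(9) is used (the left/right variants of Eqs.(11)-(12) would be analogous), the PMI of unseen pairs is taken as 0, and all names and the threshold value are ours, not the paper's.

```python
import math
from collections import Counter
from itertools import combinations


def build_stats(corpus):
    """Unigram and adjacent-bigram counts from a list of tokenized sentences."""
    uni, bi = Counter(), Counter()
    for sent in corpus:
        uni.update(sent)
        bi.update(zip(sent, sent[1:]))
    return uni, bi


def pmi(bi, uni, n):
    """Eq.(10): I(wi, wj) = log p(wi, wj) / (p(wi) p(wj)) for observed pairs."""
    return {(a, b): math.log((c / n) / ((uni[a] / n) * (uni[b] / n)))
            for (a, b), c in bi.items()}


def word_sim(w1, w2, vocab, I, p):
    """Eq.(9): P(w)-weighted min/max ratio of PMI values in both directions."""
    num = den = 0.0
    for w in vocab:
        for x, y in (((w, w1), (w, w2)), ((w1, w), (w2, w))):
            num += p[w] * min(I.get(x, 0.0), I.get(y, 0.0))
            den += p[w] * max(I.get(x, 0.0), I.get(y, 0.0))
    return num / den if den else 0.0


def cluster_sim(c1, c2, sim, count):
    """Eq.(13): count-weighted average word similarity between two clusters."""
    num = sum(count[a] * count[b] * sim[frozenset((a, b))] for a in c1 for b in c2)
    return num / (sum(count[a] for a in c1) * sum(count[b] for b in c2))


def agglomerate(vocab, sim, count, threshold=0.1, k=10):
    """Steps (b)-(e): merge the most similar pair until the best similarity
    drops below the threshold or only k clusters remain."""
    clusters = [frozenset([w]) for w in vocab]
    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: cluster_sim(clusters[ij[0]], clusters[ij[1]],
                                              sim, count))
        if cluster_sim(clusters[i], clusters[j], sim, count) < threshold:
            break
        clusters = [c for x, c in enumerate(clusters) if x not in (i, j)] \
                   + [clusters[i] | clusters[j]]
    return clusters


# Usage sketch:
#   uni, bi = build_stats(corpus); n = sum(uni.values())
#   I = pmi(bi, uni, n); p = {w: uni[w] / n for w in uni}
#   sim = {frozenset(pair): word_sim(*pair, set(uni), I, p)
#          for pair in combinations(uni, 2)}
#   clusters = agglomerate(list(uni), sim, uni)
```

Rescanning all cluster pairs in every round is what makes this cubic; keeping the pairwise similarities in a priority queue and updating only the rows touched by a merge would avoid the rescan without changing the result.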
2. The POS tagging model employing word clustering

To relieve the sparse data problem, the parameters of Eq.(8) can be estimated by a smoothing method, such as:

$$\hat{P}(t_i|w_i) = \lambda_{C_i} P(t_i|w_i) + (1 - \lambda_{C_i})P(t_i|C_i) \quad (14)$$

where the smoothed estimate \hat{P}(t_i|w_i) replaces P(t_i|w_i) in Eq.(8), C_i is the cluster that w_i belongs to, and λ_{C_i}, 0 < λ_{C_i} < 1, is a smoothing parameter.

IV. The POS Tagging Model Combined with Syntactic Parsing

Traditionally, in Chinese language processing, word segmentation, POS tagging and syntactic parsing are implemented sequentially. That is, the input Chinese sentence is first segmented into words, the segmented result is then passed to the POS tagging component, and syntactic parsing proceeds on the basis of the POS tags. Part of speech is thus an indispensable feature in syntactic parsing. However, this processing strategy has the following disadvantages: (1) the word lexicons used in POS tagging and syntactic parsing may differ; this difference is difficult to overcome and largely lowers system accuracy even when different optimal algorithms are applied to POS tagging and syntactic parsing; (2) with speed in consideration, the two-stage processing strategy is not efficient. Therefore, we apply the strategy of integrating POS tagging and syntactic parsing in a single stage.

Head-driven statistical models for natural language parsing[21] are the most representative lexicalized syntactic parsing models. To take advantage of lexicalized information, each PCFG rule in a head-driven statistical model can be lexicalized by associating a word w and a part-of-speech (POS) tag t with each nonterminal X in the tree. Owing to the introduction of lexicalized information, a serious sparse data problem is unavoidable. To relieve it, the right-hand side of each rule in the head-driven statistical model is broken into three parts: the head child, and the left and right modifiers of the head child. The first thing to note is that each internal rule in a lexicalized PCFG has the form:

$$P(ht, hw) \to L_m(lt_m, lw_m) \cdots L_1(lt_1, lw_1)\, H(ht, hw)\, R_1(rt_1, rw_1) \cdots R_n(rt_n, rw_n) \quad (15)$$

where H(ht, hw) is the head child of the rule, which inherits the head word/tag pair h from its parent P; L_m(lt_m, lw_m)···L_1(lt_1, lw_1) and R_1(rt_1, rw_1)···R_n(rt_n, rw_n) are the left and right modifiers of H(ht, hw). Either n or m may be zero, and n = m = 0 for unary rules. We extend the left and right sequences to include a terminating STOP symbol, allowing a Markov process to model the left and right sequences; thus L_{m+1} = R_{n+1} = STOP. The probability of an internal rule, Eq.(15), can then be decomposed as:

$$P_h(H|P(ht, hw)) \cdot \prod_{i=1}^{m+1} P_i(L_i(lt_i, lw_i)|H, P, h, \Delta_l(i-1)) \cdot \prod_{i=1}^{n+1} P_i(R_i(rt_i, rw_i)|H, P, h, \Delta_r(i-1)) \quad (16)$$

Here Δ_l(i−1) and Δ_r(i−1) are functions of the surface string below the previous modifiers. To integrate part-of-speech tagging with syntactic parsing, the probability of an internal rule, Eq.(15), is rewritten (exactly) using the chain rule of probabilities:

$$P_h(H|P(ht, hw)) \cdot \prod_{i=1}^{m+1} P_i(L_i(lt_i, lw_i)|L_{i-1}(lt_{i-1}, lw_{i-1}), \cdots, L_1(lt_1, lw_1), H, P, h) \cdot \prod_{i=1}^{n+1} P_i(R_i(rt_i, rw_i)|R_{i-1}(rt_{i-1}, rw_{i-1}), \cdots, R_1(rt_1, rw_1), H, P, h) \quad (17)$$

The factor P_i(R_i(rt_i, rw_i)|R_{i-1}(rt_{i-1}, rw_{i-1}), ···, R_1(rt_1, rw_1), H, P, h) in Eq.(17) may be decomposed into the product of two probabilities:

$$P_i(rt_i|rt_{i-1}, rt_{i-2}, \cdots, rt_1, rw_i) \quad (18)$$

$$P_i(rw_i|H, P, h, \Delta_r(i-1)) \quad (19)$$

and symmetrically for the left modifiers. Eq.(18) is the part-of-speech tagging probability within syntactic parsing. Again suppose that rw_i and rt_{i-1}, rt_{i-2}, ···, rt_1 are conditionally independent given rt_i; then we have:

$$P_i(rt_i|rt_{i-1}, rt_{i-2}, \cdots, rt_1, rw_i) = \frac{P_i(rt_i|rt_{i-1}, rt_{i-2}, \cdots, rt_1) \cdot P_i(rt_i|rw_i)}{P_i(rt_i)} \quad (20)$$

The factor in Eq.(20)

$$\frac{P_i(rt_i|rt_{i-1}, rt_{i-2}, \cdots, rt_1)}{P_i(rt_i)} = \frac{P_i(rt_i, rt_{i-1}, rt_{i-2}, \cdots, rt_1)}{P_i(rt_i) \cdot P_i(rt_{i-1}, rt_{i-2}, \cdots, rt_1)} \quad (21)$$

is the pointwise mutual information between rt_i and rt_{i-1}, rt_{i-2}, ···, rt_1, so the probabilistic meaning of Eq.(20) is very clear and agrees with linguistic phenomena. The probability P_i(rt_i|rt_{i-1}, rt_{i-2}, ···, rt_1) in Eq.(20) can be obtained by introducing the part-of-speech tagging model based on the part-of-speech collocation relationships between adjacent words.
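As a small illustration of how Eq.(20) combines the two knowledge sources, the sketch below scores a candidate modifier tag by adding the collocation score and the lexical score and subtracting the prior, in log space. The table names are again our own invention; in the model above, p_tag_given_hist would itself be delivered by the adjacent-word POS collocation model mentioned in the text.

```python
import math

FLOOR = 1e-12  # floor for unseen events


def modifier_tag_score(tag, history, word,
                       p_tag_given_hist, p_tag_given_word, p_tag):
    """Eq.(20) in log space:
    log P(t | history, w) = log P(t | history) + log P(t | w) - log P(t).
    The first two terms reward tags supported by the tag collocations and by
    the word itself; subtracting log P(t) removes the doubly counted prior,
    which is exactly the PMI reading of Eq.(21)."""
    return (math.log(p_tag_given_hist.get((tag, history), FLOOR))
            + math.log(p_tag_given_word.get((tag, word), FLOOR))
            - math.log(p_tag.get(tag, FLOOR)))


# The best tag for a right modifier is then, e.g.:
#   best = max(tagset,
#              key=lambda t: modifier_tag_score(t, history, word, Ph, Pw, Pt))
```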
V. Experimental Results

1. Word clustering experiments

(1) Corpus. We use annotated corpora selected from the 1998 People's Daily newspaper for training and testing. To compare the greedy clustering method based on minimum entropy with the hierarchical clustering method based on word similarity[20], we take a 2M corpus for testing and several corpora of different sizes for comparing the run time of the two clustering methods.

(2) Word clustering experimental results. The experimental results are shown in Table 1.

Table 1. Clustering experiment results of two algorithms
Clustering algorithm    Greedy algorithm    Algorithm based on similarity
Perplexity              283                 218
Accuracy rate           91.54%              95.38%

From Table 1, it can be seen that the perplexity is reduced from 283 to 218: the word clustering algorithm based on similarity has better performance than the conventional greedy clustering method.

As shown in Fig.2, the run time of the greedy clustering method increases sharply as the size of the corpus increases, and the efficiency of the hierarchical clustering method based on word similarity is obviously higher than that of the greedy clustering method. As the test size becomes larger, the advantage of the similarity-based hierarchical clustering method becomes more obvious.

Fig. 2. Run time comparison of two clustering algorithms

2. POS tagging experiments

We used an annotated corpus selected from the 1998 People's Daily newspaper for training and testing. The corpus uses 42 tags and has about 244974 tokens. Some properties of the annotated corpus are shown in Table 2.

Table 2. Some properties of the annotated corpus
tags     types    cent(%)    tokens    cent(%)
1        20048    89.720     162246    66.230
2        1934     8.655      50243     20.510
3        297      1.329      21419     8.743
4        51       0.228      9901      4.042
5        10       0.045      424       0.173
6        4        0.018      155       0.173
7        1        0.004      586       0.239
total    22345               244974

The baseline system is the part-of-speech tagging model based on the hidden Markov model; Model 1 is the POS tagging model based on the improved HMM; Model 2 is the POS tagging model integrating syntactic parsing, based on the improved HMM; Model 3 is the POS tagging model integrating syntactic parsing and word clustering, based on the improved HMM. The experimental results are shown in Table 3.
Table 3. Tagging experimental results
model       baseline    Model 1    Model 2    Model 3
accuracy    94.642%     96.214%    97.126%    97.235%

From Table 3, it can be seen that the tagging method based on the improved HMM has obviously higher performance than the conventional POS tagging method based on the hidden Markov model under the same testing conditions; precision is enhanced from 94.642% to 96.214%. The reason for this improvement may be that the conditional independence assumption of the improved HMM is more realistic than the independence assumption of the HMM. In addition, Model 3 has the highest accuracy among the four part-of-speech tagging models.

Next, we analyzed the relation between the training corpus and tagging accuracy. As shown in Fig.3, the relation between the scale of the training corpus and the improvement in tagging accuracy is not linear. In general, the larger the training corpus, the closer the estimated probability parameters are to the true language phenomena, and the higher the tagging accuracy. But once the training corpus reaches a certain size, the gains in tagging accuracy become smaller and system performance improves more and more slowly. Conversely, as the training set becomes smaller, the impact of word clustering becomes more obvious, preventing a noticeable decrease in tagging accuracy.

Fig. 3. The relation between training set size and tagging errors

VI. Conclusions

(1) The advent of the hidden Markov model (HMM) brought considerable progress to natural language processing and speech recognition technology. However, a number of unrealistic assumptions of HMMs are still regarded as obstacles to their potential effectiveness. A major one is the inherent assumption that successive observations are independent and identically distributed (IID) within a state. In order to overcome these defects of the classical HMM, the Markov family model, a new statistical model, is introduced in this paper; it overcomes the unrealistic assumptions of the HMM. Tagging experiments have verified the efficacy of the proposed model: precision is enhanced from 94.642% to 96.214%.

(2) Traditionally, in Chinese language processing, word segmentation, POS tagging and syntactic parsing are implemented sequentially. In this paper, we apply the strategy of integrating POS tagging and syntactic parsing in a single stage. By decomposing and modifying the rules of head-driven statistical parsing models, the POS tagging model integrating syntactic parsing, based on the improved HMM, further enhances precision from 96.214% to 97.126%.

(3) Cluster-based statistical language modeling is an important method for coping with sparse data. At least for the Chinese POS tagging task, word clustering can reduce the influence of the shortage of labeled corpora.

References

[1] Christopher D. Manning and Hinrich Schutze, Foundations of Statistical Natural Language Processing, The MIT Press, London, pp.136–157, 1999.
[2] K. Toutanova, D. Klein, C.D. Manning, Y. Singer, "Feature-Rich Part-of-Speech Tagging with a Cyclic Dependency Network", Proceedings of the 2003 Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Edmonton, Canada, pp.252–259, 2003.
[3] W. Jiang, Y. Guan, X.L. Wang, "Conditional Random Fields Based POS Tagging", Computer Engineering and Applications, Vol.42, No.21, pp.13–16, 2006.
[4] Jiang Tao, Yao Tianshun, Zhang Li, "Application Study of Example Based Chinese Word Segmentation and Part-of-speech Tagging Method", Journal of Chinese Computer Systems, Vol.28, No.11, pp.2090–2093, 2007. (in Chinese)
[5] Eugene Charniak, Curtis Hendrickson, Neil Jacobson and Mike Perkowitz, "Equations for Part-of-Speech Tagging", Proceedings of the Eleventh National Conference on Artificial Intelligence, Menlo Park, AAAI Press/MIT Press, pp.784–789, 1993.
[6] T. Brants, "A statistical Part-of-Speech tagger", Proceedings of the Sixth Conference on Applied Natural Language Processing (ANLP-2000), Seattle, pp.224–231, 2000.
[7] Wei Ou, Wu Jian, Sun Yufang, "Analysis and Improvement of Statistics-Based Chinese Part-of-Speech Tagging", Journal of Software, Vol.11, No.4, pp.473–480, 2000. (in Chinese)
[8] Liang Yimin, Huang De-gen, "Chinese Part-of-speech Tagging Based on Full Second-order Hidden Markov Model", Computer Engineering, Vol.31, No.10, pp.177–179, 2005. (in Chinese)
[9] Qu Gang, Lu Ru-zhan, "An Improved Part-of-Speech (POS) Tagging System", Journal of Shanghai Jiaotong University, Vol.37, No.6, pp.897–900, 2003. (in Chinese)
[10] J. Gimenez, L. Marquez, "Fast and accurate part-of-speech tagging: The SVM approach revisited", Proceedings of the International Conference on Recent Advances in Natural Language Processing (4th RANLP), Bulgaria, pp.158–165, 2003.
[11] Zhao Yan, Wang Xiao-long, Liu Bing-quan, Guan Yi, "Fusion of Clustering Trigger-Pair Features for POS Tagging Based on Maximum Entropy Model", Journal of Computer Research and Development, Vol.43, No.2, pp.268–274, 2006. (in Chinese)
[12] Xing Fu-kun, Song Rou, Luo Zhi-yong, "Symbol-and-Statistics Decoding Model and Its Application in Chinese POS Tagging", Journal of Chinese Information Processing, Vol.24, No.1, pp.20–24, 2010. (in Chinese)
[13] Zhu Cong-hui, Zhao Tie-jun, Zheng De-quan, "Joint Chinese Word Segmentation and POS Tagging System with Undirected Graphical Models", Journal of Electronics & Information Technology, Vol.32, No.3, pp.700–704, 2010. (in Chinese)
[14] Yuan Li-chi, "A speech recognition method based on improved hidden Markov model", Journal of Central South University: Natural Science, Vol.39, No.6, pp.1303–1308, 2008. (in Chinese)
[15] Takuya Matsuzaki, Yusuke Miyao, Jun'ichi Tsujii, "An Efficient Clustering Algorithm for Class-Based Language Models", Proceedings of the 7th Conference on Computational Natural Language Learning (CoNLL-2003), Edmonton, Canada, pp.119–126, 2003.
[16] Ido Dagan, "Contextual word similarity and estimation from sparse data", Computer Speech and Language, Vol.9, No.2, pp.123–152, 1995.
[17] Douglass R. Cutting, David R. Karger, Jan O. Pedersen, John W. Tukey, "Scatter/Gather: A Cluster-based Approach to Browsing Large Document Collections", Proceedings of the 15th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR'92), Copenhagen, Denmark, pp.318–329, 1992.
[18] Lillian Lee, Similarity-Based Approaches to Natural Language Processing, Ph.D. thesis, Harvard University, Cambridge, MA, pp.56–72, 1997.
[19] Yael Karov, Shimon Edelman, "Learning Similarity-Based Word Sense Disambiguation from Sparse Data", Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, pp.42–55, 1996.
[20] Yuan Li-chi, "Word Clustering Based on Similarity and Vari-Gram Language Model", Journal of Chinese Computer Systems, Vol.30, No.5, pp.912–915, 2009. (in Chinese)
[21] M. Collins, "Head-Driven Statistical Models for Natural Language Parsing", Computational Linguistics, Vol.29, No.4, pp.589–637, 2003.

YUAN Lichi was born in 1973. He received the Ph.D. degree and is an associate professor. His research mainly focuses on natural language processing. (Email: [email protected])