國立雲林科技大學 National Yunlin University of Science and Technology

Probabilistic self-organizing maps for qualitative data
Ezequiel Lopez-Rubio
Neural Networks, Vol. 23, 2010, pp. 1208–1225
Presenter: Wei-Shen Tai
2010/11/17
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.

Outline
Introduction
Basic concepts
The model
Experimental results
Conclusions
Comments

Motivation
Non-continuous data in the self-organizing map: existing approaches either re-codify the categorical data to fit continuous-valued models (1-of-k coding) or impose a distance measure on the possible values of a qualitative variable (distance hierarchy). The SOM depends heavily on the possibility of adding and subtracting input vectors, and on a proper distance measure among them.

Objective
A probability-based SOM that works without the need of any distance measure between the values of the input variables.

Chow–Liu algorithm
Obtain the maximum mutual information spanning tree, then compute the probability that an input x belongs to the tree.

Robbins–Monro algorithm
A stochastic approximation algorithm. Its goal is to find the value of some parameter τ that satisfies ζ(τ) = 0, where only a random variable Y, a noisy estimate of ζ, can be observed. The algorithm proceeds iteratively to obtain a running estimate θ(t) of the unknown parameter τ, using a suitable step size ε(t) (similar to the learning rate LR(t) in the SOM).

Map and units
Map definition: each mixture component i is associated with a unit in the map. (The slide also depicts the structure of the units.)

Self-organization
Find the best matching unit (BMU), then apply the learning method.

Initialization and summary
Initialization of the map, and a summary of the training procedure:
1. Set the initial values for all mixture components i.
2. Obtain the winner unit of an input xt and the posterior responsibilities Rti of the winner.
3. For every component i, estimate its parameters πi(t), ψijh(t) and ξijhks(t).
4. Compute the optimal spanning tree of each component.
5. If the map has converged or the maximum time step T has been reached, stop. Otherwise, go to step 2.

Experimental results
The Cars dataset, shown in three graphic results.

Quality measures
(Figures only; omitted.)

Conclusion
A probabilistic self-organizing map model that learns from qualitative data which do not allow meaningful distance measures between values.

Comments
Advantage: the proposed model can handle categorical data without any distance measure between units (neurons) and inputs during training. That is, categorical data are handled by mapping probabilities instead of 1-of-k coding or a distance hierarchy.
Drawback: the size of the weight vectors grows explosively with the number of categorical attributes and their possible values, which makes the computational processes complex as well. The model is suitable for categorical data but not for mixed-type data.
Application: categorical data in SOMs.
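As a concrete illustration of the Robbins–Monro scheme described above, the following Python sketch uses the iteration θ(t) = θ(t−1) + ε(t)·(y_t − θ(t−1)) with ε(t) = 1/t to estimate the mean of a noisy stream. The stream and the step-size schedule are illustrative assumptions for this sketch, not the paper's actual update.

```python
import random

def robbins_monro_mean(samples):
    """Running Robbins-Monro estimate theta(t) = theta(t-1) + eps(t)*(y_t - theta(t-1)),
    with eps(t) = 1/t, which drives theta toward the mean of the stream."""
    theta = 0.0
    for t, y in enumerate(samples, start=1):
        eps = 1.0 / t  # step sizes: sum eps(t) diverges, sum eps(t)^2 converges
        theta += eps * (y - theta)
    return theta

random.seed(0)
stream = [5.0 + random.gauss(0.0, 1.0) for _ in range(20000)]
estimate = robbins_monro_mean(stream)  # close to the true mean 5.0
```

With ε(t) = 1/t this update reduces to the running sample average, which is why a decaying step size (like a shrinking SOM learning rate) is enough for the noise to average out.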
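The five-step training summary above can be sketched as a minimal toy loop. This sketch assumes attribute independence within each unit, whereas the paper maintains a Chow–Liu dependence tree per mixture component (step 4, omitted here); the function name, parameters, and initialization scheme are all hypothetical.

```python
import math
import random

def train_prob_som(data, n_units=4, n_cats=3, n_attrs=2, T=200, seed=0):
    """Toy sketch of the training summary. Each unit holds one categorical
    probability table per attribute (attribute independence is assumed here;
    the paper instead keeps a Chow-Liu spanning tree per component)."""
    rng = random.Random(seed)
    # Step 1: initial values -- near-uniform probability tables, normalized.
    P = [[[1.0 / n_cats + rng.uniform(-0.01, 0.01) for _ in range(n_cats)]
          for _ in range(n_attrs)] for _ in range(n_units)]
    for i in range(n_units):
        for j in range(n_attrs):
            s = sum(P[i][j])
            P[i][j] = [p / s for p in P[i][j]]
    for t in range(1, T + 1):
        x = rng.choice(data)
        # Step 2: winner = unit that assigns the input the highest probability.
        lik = [math.prod(P[i][j][x[j]] for j in range(n_attrs))
               for i in range(n_units)]
        win = max(range(n_units), key=lambda i: lik[i])
        # Step 3: Robbins-Monro step toward the observed one-hot target.
        eps = 1.0 / (t + 1)  # decaying step size, akin to a learning rate
        for j in range(n_attrs):
            for k in range(n_cats):
                target = 1.0 if x[j] == k else 0.0
                P[win][j][k] += eps * (target - P[win][j][k])
        # Step 4 (omitted): recompute each component's optimal spanning tree.
        # Step 5: stop at convergence or after T steps (here: fixed T).
    return P

tables = train_prob_som([(0, 0)] * 100)  # one repeated categorical input
```

Because the update moves a probability table toward a one-hot target, each table stays a valid distribution, and the winning unit's table concentrates on the categories it keeps observing; no distance between category values is ever used.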