
國立雲林科技大學
National Yunlin University of Science and Technology
Probabilistic self-organizing maps for qualitative data
Ezequiel López-Rubio
Neural Networks, Vol. 23, 2010, pp. 1208–1225
Presenter: Wei-Shen Tai
2010/11/17
Intelligent Database Systems Lab
Outline
Introduction
Basic concepts
The model
Experimental results
Conclusions
Comments
Motivation
Non-continuous data in self-organizing maps
Either re-codify the categorical data to fit the existing continuous-valued models (1-of-k coding), or impose a distance measure on the possible values of a qualitative variable (distance hierarchy).
SOM depends heavily on the possibility of adding and subtracting input vectors, and on a proper distance measure among them.
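The 1-of-k re-codification mentioned above can be sketched as follows (a minimal illustration; the function name and the example categories are ours):

```python
def one_of_k(value, categories):
    """1-of-k (one-hot) coding: re-codify a categorical value as a binary
    vector so it fits continuous-valued SOM models."""
    return [1.0 if c == value else 0.0 for c in categories]

# A qualitative attribute with three possible values:
one_of_k('red', ['red', 'green', 'blue'])  # -> [1.0, 0.0, 0.0]
```

Note that the resulting vectors impose a geometry on the categories, which is exactly the assumption the paper's model avoids.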
Objective
A probability-based SOM
Without the need for any distance measure between the values of the input variables.
Chow–Liu algorithm
Obtain the maximum mutual information spanning tree.
Compute the probability of an input x under the tree.
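The tree-building step can be sketched as follows, assuming categorical data given as rows of values (function names are ours; empirical frequencies stand in for the paper's estimated probabilities, and Prim's algorithm finds the maximum-MI spanning tree, after which the tree factorizes P(x) as a product of pairwise conditionals):

```python
import itertools
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information between two discrete variables."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_tree(data):
    """Maximum mutual-information spanning tree over the columns of `data`
    (a list of rows of categorical values), grown with Prim's algorithm."""
    d = len(data[0])
    cols = [[row[j] for row in data] for j in range(d)]
    mi = {(i, j): mutual_information(cols[i], cols[j])
          for i, j in itertools.combinations(range(d), 2)}
    in_tree, edges = {0}, []
    while len(in_tree) < d:
        # Pick the highest-MI edge connecting the tree to a new variable.
        i, j = max(((a, b) for a in in_tree
                    for b in range(d) if b not in in_tree),
                   key=lambda e: mi[tuple(sorted(e))])
        edges.append((i, j))
        in_tree.add(j)
    return edges
```
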
Robbins–Monro algorithm
A stochastic approximation algorithm.
Its goal is to find the value of a parameter τ which satisfies ζ(τ) = 0, where only a random variable Y, a noisy estimate of ζ, can be observed.
The algorithm proceeds iteratively to obtain a running estimate θ(t) of the unknown parameter τ:
θ(t+1) = θ(t) − ε(t) Y(t)
where ε(t) is a suitable step size (similar to the learning rate LR(t) in SOM).
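A minimal numerical sketch of the iteration (the linear target ζ(θ) = θ − 2 and the noise level are our illustrative choices, not from the paper):

```python
import random

def robbins_monro(noisy_zeta, theta0, steps=5000):
    """Robbins-Monro iteration: theta(t+1) = theta(t) - eps(t) * Y(t),
    where Y is a noisy observation of zeta at the current theta and the
    step sizes eps(t) = 1/t satisfy the usual conditions
    (sum eps(t) = inf, sum eps(t)^2 < inf)."""
    theta = theta0
    for t in range(1, steps + 1):
        eps = 1.0 / t
        theta -= eps * noisy_zeta(theta)
    return theta

# Example: zeta(theta) = theta - 2, observed with zero-mean Gaussian noise;
# the root tau = 2 is recovered despite the noise.
random.seed(0)
estimate = robbins_monro(lambda th: (th - 2.0) + random.gauss(0.0, 0.5), 0.0)
```
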
Map and units
Map definition
Each mixture component i is associated with a unit in the map.
Structure of the units
Self-organization
Find the BMU (best matching unit)
Learning method
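As a sketch, the winner can be taken as the unit maximising the posterior π_i · P(x | unit i). The unit structure below is our simplification: independent per-attribute value probabilities, whereas the paper uses a Chow–Liu tree per unit:

```python
from math import log

def best_matching_unit(x, units):
    """Winner selection for a probabilistic SOM over categorical data:
    the BMU is the unit with the highest log-posterior
    log pi_i + log P(x | unit i).  Each unit is a dict with a prior 'pi'
    and (simplifying assumption) independent per-attribute probability
    tables 'p'."""
    def log_post(unit):
        lp = log(unit['pi'])
        for j, value in enumerate(x):
            lp += log(unit['p'][j].get(value, 1e-12))  # floor avoids log(0)
        return lp
    return max(range(len(units)), key=lambda i: log_post(units[i]))
```

No distance between attribute values is ever computed; only probabilities of the observed values enter the comparison.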
Initialization and summary
Initialization of the map
Summary
1. Set the initial values for all mixture components i.
2. Obtain the winner unit of an input xt and the posterior responsibilities Rti of the winner.
3. For every component i, estimate its parameters πi(t), ψijh(t) and ξijhks(t).
4. Compute the optimal spanning tree of each component.
5. If the map has converged or the maximum time step T has been reached, stop. Otherwise, go to step 2.
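The five steps above can be sketched as a toy training loop. Our simplifications versus the paper: attributes are treated as independent (so step 4, the spanning-tree refit, is omitted), only the winner is updated (no neighbourhood function), and convergence testing in step 5 is reduced to the step limit T:

```python
import random
from math import log

def train_categorical_som(data, n_units, T, seed=0):
    """Toy version of the five-step training summary for categorical data:
    initialise per-unit probability tables, then repeatedly pick a sample,
    find the winner by likelihood, and move the winner's tables toward the
    sample with a Robbins-Monro step."""
    random.seed(seed)
    d = len(data[0])
    values = [sorted({row[j] for row in data}) for j in range(d)]
    # Step 1: random, normalised initial probability tables for every unit.
    units = []
    for _ in range(n_units):
        tables = []
        for j in range(d):
            w = [random.random() + 0.1 for _ in values[j]]
            s = sum(w)
            tables.append({v: wi / s for v, wi in zip(values[j], w)})
        units.append(tables)
    for t in range(1, T + 1):
        x = random.choice(data)
        # Step 2: winner = unit with the highest log-likelihood of x.
        def loglik(u):
            return sum(log(u[j][x[j]]) for j in range(d))
        win = max(range(n_units), key=lambda i: loglik(units[i]))
        # Step 3: Robbins-Monro update of the winner's tables
        # (eps <= 1/2 keeps every probability strictly positive).
        eps = 1.0 / (t + 1)
        for j in range(d):
            for v in units[win][j]:
                target = 1.0 if v == x[j] else 0.0
                units[win][j][v] += eps * (target - units[win][j][v])
        # Step 5: stop when t reaches T (convergence test omitted).
    return units
```

Because each update is a convex combination, every table remains a valid probability distribution throughout training.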
Experimental results
Three graphical results for the Cars dataset.
Quality measures
Conclusion
A probabilistic self-organizing map model that learns from qualitative data, which do not allow meaningful distance measures between values.
Comments
Advantage
The proposed model can handle categorical data without a distance measure between units (neurons) and inputs during training. That is, categorical data are handled through mapped probabilities instead of 1-of-k coding or a distance hierarchy.
Drawback
The size of the weight vector grows explosively with the number of categorical attributes and their possible values, which makes the computational processes correspondingly complex.
It fits categorical data but not mixed-type data.
Application
Categorical data in SOMs.