
國立雲林科技大學
National Yunlin University of Science and Technology
General statistical inference for discrete and
mixed spaces by an approximate application
of the maximum entropy principle
Advisor: Dr. Hsu
Graduate student: Keng-Wei Chang
Authors: Lian Yan and David J. Miller
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 3, MAY 2000
Outline

- Motivation
- Objective
- Introduction
- Maximum Entropy Joint PMF
- Extensions for More General Inference Problems
- Experimental Results
- Conclusions and Possible Extensions
Motivation

- The maximum entropy (ME) joint probability mass function (pmf) is powerful and does not require expressing conditional independence relations.
- However, its huge learning complexity has severely limited the use of this approach.
Objective

- Propose an approach with quite tractable learning.
- Extend the approach to mixed discrete/continuous data.
1. Introduction

- probability mass function (pmf)
- Given the joint pmf, one can compute a posteriori probabilities:
  - for a single, fixed feature given knowledge of the remaining feature values → statistical classification
  - with some feature values missing → statistical classification with missing features
  - for any (e.g., user-specified) discrete feature dimensions given values for the other features → generalized classification
1. Introduction

- Multiple Networks Approach
- Bayesian Networks
- Maximum Entropy Models
- Advantages of the Proposed ME Method over BNs
1.1 Multiple Networks Approach

- multilayer perceptrons (MLPs), radial basis functions, support vector machines
- One would train one network for each feature.
- Example: classifying documents into multiple topics
  - one network is used to make an individual yes/no decision for the presence of each possible topic
  - → the multiple networks approach
1.1 Multiple Networks Approach

- Several potential difficulties:
  - increased learning and storage complexities
  - accuracy of inferences: the approach ignores dependencies between features
  - Example: the networks predict F1 = 1 and F2 = 1 respectively, but the joint event (F1 = 1, F2 = 1) has zero probability.
1.2 Bayesian Networks

- BNs handle missing features and capture dependencies between the multiple features.
- The joint pmf is represented explicitly as a product of conditional probabilities.
- BNs are versatile tools for inference that have a convenient, informative representation.
1.2 Bayesian Networks

- Several difficulties with BNs:
  - they require explicit conditional independence relations between features
  - optimizing over the set of possible BN structures:
    - sequential, greedy methods → may be suboptimal
    - sequential learning → where to stop to avoid overfitting?
1.3 Maximum Entropy Models

- Cheeseman proposed:
  - the maximum entropy (ME) joint pmf consistent with arbitrary lower-order probability constraints
  - powerful, allowing the joint pmf to express general dependencies between features
1.3 Maximum Entropy Models

- Several difficulties with ME:
  - learning (estimating the ME solution) is difficult:
    - Ku and Kullback proposed an iterative algorithm that satisfies one constraint at a time, but satisfying one constraint may cause violation of others
    - they only presented results for dimension N = 4 and J = 2 discrete values per feature
    - Pearl cites complexity as the main barrier to using ME
1.4 Advantages of the Proposed ME Method over BNs

- Our approach:
  - does not require explicit conditional independence relations
  - uses an effective joint optimization learning technique
2. Maximum Entropy Joint PMF

- a random feature vector F = (F_1, F_2, ..., F_N), with F_i ∈ A_i and A_i = {1, 2, 3, ..., |A_i|}
- the full discrete feature space is A_1 × A_2 × ... × A_N
2. Maximum Entropy Joint PMF

- pairwise pmfs {P[F_m, F_n], ∀ m, n ≠ m}
- constrain the joint pmf P[F] to agree with these pairwise pmfs
- the ME joint pmf consistent with these pairwise pmfs has the Gibbs form, parameterized by one Lagrange multiplier per pairwise constraint (a toy sketch follows below)
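The Gibbs-form equation itself did not survive extraction, so here is a minimal Python sketch of what the slide states, under the assumption of one multiplier table per feature pair, P[f] ∝ exp(Σ_{m<n} γ_mn(f_m, f_n)); the names `gamma` and `log_potential` are illustrative, not from the paper.

```python
# A toy sketch (not the authors' code) of the Gibbs form:
#   P[f] = (1/Z) * exp( sum_{m < n} gamma_mn(f_m, f_n) )
# Brute-force normalization over the full space is used here, which is
# exactly what becomes intractable for realistic N.
import itertools
import numpy as np

rng = np.random.default_rng(0)
card = [2, 2, 2]            # |A_i| for three illustrative discrete features
N = len(card)

# One table of Lagrange multipliers gamma_mn(f_m, f_n) per feature pair.
gamma = {(m, n): rng.normal(size=(card[m], card[n]))
         for m in range(N) for n in range(m + 1, N)}

def log_potential(f):
    """Sum of pairwise multipliers for a complete feature vector f."""
    return sum(gamma[(m, n)][f[m], f[n]]
               for m in range(N) for n in range(m + 1, N))

# Enumerate the full product space A_1 x A_2 x A_3 (only 8 points here).
space = list(itertools.product(*(range(c) for c in card)))
unnorm = np.exp([log_potential(f) for f in space])
joint_pmf = unnorm / unnorm.sum()   # the ME joint pmf for these multipliers
```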
2. Maximum Entropy Joint PMF

- Lagrange multipliers:
  - one multiplier per equality constraint on an individual pairwise probability P[F_m = f_m, F_n = f_n]
  - the joint pmf is specified by the set of Lagrange multipliers Γ = {γ(F_m = f_m, F_n = f_n), ∀ m, n ≠ m, f_m ∈ A_m, f_n ∈ A_n}
  - although these probabilities also depend on Γ, they can often be tractably computed
2. Maximum Entropy Joint PMF

- Two major difficulties:
  - the optimization requires calculating P[f] → intractable
  - the cost D requires marginalizations over the joint pmf → intractable
- → these difficulties inspired the approximate ME method
2.1 Review of the ME Formulation for Classification

- random feature vector F̃ = (F, C), C ∈ {1, 2, ..., K}
- P[F̃] still has the intractable form (1)
- → classification does not require computing P[F̃], but rather just the a posteriori probabilities
- → still not feasible!
2.1 Review of the ME Formulation for Classification

- Here we review a tractable, approximate method:
  - Joint PMF Form
  - Support Approximation
  - Lagrangian Formulation
2.1.1 Joint PMF Form

- via Bayes' rule, the joint pmf factors as P[f, c] = P[c | f] · P[f], where {P[f], f ∈ A_1 × A_2 × ... × A_N}
2.1.2 Support Approximation

- the approximation may have some effect on the accuracy of the learned model {γ(F_i = f_i, C = c)}, but it will not sacrifice our inference capability
- full feature space A_1 × A_2 × ... × A_N → a subset (the training-set support) → computationally feasible
- Example: for N = 19, the full space holds roughly 40 billion points versus about 100 training vectors → the reduction is huge (see the sketch below)
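As a rough, self-contained illustration of the reduction, the sketch below normalizes the same kind of Gibbs score over a handful of hypothetical training vectors instead of the full product space; the `support` list and the helper names are invented for illustration.

```python
# Sketch of the support approximation: normalize the Gibbs score over the
# observed training vectors only, |S| terms instead of prod_i |A_i| terms.
import numpy as np

rng = np.random.default_rng(0)
card = [2, 2, 2]
pairs = [(m, n) for m in range(3) for n in range(m + 1, 3)]
gamma = {mn: rng.normal(size=(card[mn[0]], card[mn[1]])) for mn in pairs}

def score(f):
    """Unnormalized Gibbs score exp(sum_{m<n} gamma_mn(f_m, f_n))."""
    return np.exp(sum(gamma[(m, n)][f[m], f[n]] for m, n in pairs))

# Hypothetical training support S -- in practice the distinct training vectors.
support = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
scores = np.array([score(f) for f in support])
pmf_on_support = scores / scores.sum()   # 4 terms, not 8 (nor 40 billion)
```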
2.1.3 Lagrangian Formulation

- write the support as S = {f^(m) = (f_1^(m), f_2^(m), ..., f_N^(m)), m = 1, ..., |S|}, i.e., the set of training feature vectors
- the joint entropy for F̃ can then be written over this reduced support
2.1.3 Lagrangian Formulation

- this suggests the cross entropy (the cross entropy/Kullback distance) between each measured pairwise pmf and the model's corresponding pairwise marginal as the constraint cost
2.1.3 Lagrangian Formulation

- for pairwise constraints involving the class label, P[F_k, C], an analogous cross-entropy cost is formed
2.1.3 Lagrangian Formulation

- the overall constraint cost D is formed as a sum of all the individual pairwise costs
- given D and H, one can form the Lagrangian cost function (see the sketch below)
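The slide's equations were lost in extraction, so the sketch below shows only the two ingredients it names: the entropy H of a support-restricted pmf and one KL-style pairwise cost contributing to D. How the paper weights H against D is not recoverable from the slides; the final line is purely an illustrative combination.

```python
# Toy Lagrangian ingredients: entropy H over the support and one pairwise
# KL cost; the full D would sum such costs over all constrained pairs.
import numpy as np

support = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]   # toy support set
pmf = np.array([0.4, 0.3, 0.2, 0.1])                     # toy model pmf on it

def entropy(p):
    p = p[p > 0]
    return -float(np.sum(p * np.log(p)))

def pairwise_marginal(k, l, card=(2, 2, 2)):
    """Project the support-restricted joint pmf onto the pair (F_k, F_l)."""
    marg = np.zeros((card[k], card[l]))
    for f, p in zip(support, pmf):
        marg[f[k], f[l]] += p
    return marg

def kl(target, model, eps=1e-12):
    """Cross entropy / Kullback distance between target and model pairwise pmfs."""
    return float(np.sum(target * np.log((target + eps) / (model + eps))))

H = entropy(pmf)
target_01 = np.full((2, 2), 0.25)            # measured pmf P~[F_0, F_1] (toy)
D = kl(target_01, pairwise_marginal(0, 1))   # one term of the summed cost D
lagrangian_cost = -H + 10.0 * D              # illustrative trade-off only
```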
3. Extensions for More General Inference Problems

- General statistical inference:
  - Joint PMF Representation
  - Support Approximation
  - Lagrangian Formulation
- Discussion
- Mixed Discrete and Continuous Feature Space
3.1.1 Joint PMF Representation

- for general inference, the a posteriori probabilities of any feature given the remaining ones take over the role the class posterior played in classification
- with respect to each feature F_i, the joint pmf can be written as P[f] = P[f_i | {f_j, j ≠ i}] · P[{f_j, j ≠ i}]
3.1.2 Support Approximation

- reduced joint pmf for F, defined over the support S
- for a candidate value f_i of feature F_i, there is a set S(f_i) = {f^(m) ∈ S : f_i^(m) = f_i}, the support vectors whose ith component equals f_i (see the sketch below)
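A two-line sketch of the set the slide defines, reusing the toy `support` from the earlier sketches (the helper name `S` simply mirrors the slide's notation):

```python
# S(f_i) from the slide: support vectors whose ith component equals value v.
support = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]

def S(i, v):
    return [f for f in support if f[i] == v]

print(S(2, 1))   # -> [(0, 0, 1), (0, 1, 1), (1, 1, 1)]; S(2, 0) -> [(1, 0, 0)]
```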
3.1.3 Lagrangian Formulation

- the joint entropy H can be written over the reduced support, as in the classification case
3.1.3 Lagrangian Formulation

- the pairwise pmf P_M[F_k, F_l] can be calculated in two different ways: from the joint pmf written with respect to F_k, and from the joint pmf written with respect to F_l
3.1.3 Lagrangian Formulation

- the overall constraint cost D again sums all the individual pairwise costs
3.2 Discussion

- Choice of Constraints: encode all probabilities of second order (the pairwise pmfs)
- Tractability of Learning
- Qualitative Comparison of Methods
3.3 Mixed Discrete and Continuous Feature Space

- the feature vector will be written (F, A), with discrete features F = (F_1, F_2, ..., F_{N_d}) and continuous features A = (A_1, A_2, ..., A_{N_c})
- our objective is to learn {P[c | f, a]}
3.3 Mixed Discrete and Continuous Feature Space

- given our choice of constraints, these probabilities remain tractable to compute
- decompose the joint density into the conditional density of the continuous features times the joint pmf of the discrete features
3.3 Mixed Discrete and Continuous Feature Space

- a conditional mean constraint on A_i given C = c
- for a pair of continuous features A_i, A_j, an analogous second-order constraint (see the sketch below)
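A sketch of how such constraint targets could be measured from data; the array contents and the pairing of A_0 with A_1 are invented for illustration, not taken from the paper.

```python
# Empirical targets for the continuous-feature constraints named on the slide:
# E[A_i | C = c] per class, plus a second-order moment for a pair (A_i, A_j).
import numpy as np

a = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 2.5]])  # continuous A
c = np.array([0, 0, 1, 1])                                      # class labels

for label in np.unique(c):
    rows = a[c == label]
    cond_mean = rows.mean(axis=0)                    # target for E[A_i | C = c]
    cross = float((rows[:, 0] * rows[:, 1]).mean())  # target for E[A_0 * A_1 | C = c]
    print(label, cond_mean, cross)
```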
4. Experimental Results

- Evaluation of generalized classification performance on data sets used solely for classification:
  - Mushroom, Congress, Nursery, Zoo, Hepatitis
- Generalized classification performance on data sets with multiple possible class features:
  - Solar Flare, Flag, Horse Colic
- Classification performance on data sets with mixed continuous and discrete features:
  - Credit Approval, Hepatitis, Horse Colic
4. Experimental Results

- the ME method was compared with:
  - BNs
  - decision trees (DT)
  - a powerful extension of DT: mixtures of DTs
  - multilayer perceptrons (MLPs)
4. Experimental Results

- for an arbitrary feature to be inferred, F_i, the method computes the a posteriori probabilities P[F_i = f_i | {f_j, j ≠ i}]
4. Experimental Results

- the following criteria were used to evaluate all the methods (a toy sketch of criterion (2) follows below):
  (1) misclassification rate on the test set for the data set's class label
  (2) same as (1), but with a single feature missing at random
  (3) average misclassification rate on the test set
  (4) misclassification rate on the test set, based on predicting a pair of randomly chosen features
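As a toy illustration of criterion (2), the sketch below classifies with a missing feature by summing the support-restricted pmf over every support vector consistent with the observed values; all names and data are carried over from the earlier toy sketches, not from the paper's evaluation code.

```python
# Criterion (2) in miniature: infer F_i from partial observations by
# renormalizing the support-restricted joint pmf over consistent vectors.
import numpy as np

support = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 1)]
pmf = np.array([0.4, 0.3, 0.2, 0.1])   # toy model pmf on the support

def posterior(i, observed):
    """P[F_i = v | observed dict {feature index: value}], missing ones summed out."""
    scores = {}
    for f, p in zip(support, pmf):
        if all(f[j] == v for j, v in observed.items()):
            scores[f[i]] = scores.get(f[i], 0.0) + p
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()}

# Predict F_0 with F_2 observed and F_1 missing (it is simply summed out).
print(posterior(0, {2: 1}))   # -> {0: 0.875, 1: 0.125}
```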
N.Y.U.S.T.
I.M.
Intelligent Database Systems Lab
N.Y.U.S.T.
I.M.
Intelligent Database Systems Lab
4. Experiment Results
N.Y.U.S.T.
I.M.
Intelligent Database Systems Lab
N.Y.U.S.T.
I.M.
5. Conclusions and Possible Extensions

- Regression
- Large-Scale Problems
- Model Selection: Searching for ME Constraints
- Applications
Personal opinion

- …