國立雲林科技大學
National Yunlin University of Science and Technology
General statistical inference for discrete and
mixed spaces by an approximate application
of the maximum entropy principle
Advisor: Dr. Hsu
Graduate: Keng-Wei Chang
Authors: Lian Yan and David J. Miller
IEEE TRANSACTIONS ON NEURAL NETWORKS, VOL. 11, NO. 3, MAY 2000
Intelligent Database Systems Lab
N.Y.U.S.T.
I.M.
Outline
Motivation
Objective
Introduction
Maximum Entropy Joint PMF
Extensions for More General Inference Problems
Experimental Results
Conclusions and Possible Extensions
Motivation
the maximum entropy (ME) joint probability mass function (pmf)
is powerful and does not require explicit expression of conditional independence
however, its huge learning complexity has severely limited the use of this approach
Objective
propose an approach that makes ME learning quite tractable
extend the method to mixed discrete and continuous data
1. Introduction
probability mass function (pmf)
given the joint pmf, one can compute a posteriori probabilities
for a single, fixed feature given knowledge of the remaining feature values → statistical classification
for that feature with some of the other feature values missing → classification with missing features
for any (e.g., user-specified) discrete feature dimensions given values for the other features → generalized classification (see the sketch below)
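As a minimal illustrative sketch (not the paper's method), the following Python snippet shows how a known joint pmf over discrete features supports all three inference modes by conditioning and marginalization; the toy pmf and feature indices are hypothetical.

```python
from collections import defaultdict

# Hypothetical joint pmf over three binary features (F1, F2, C); values are illustrative only.
joint_pmf = {
    (0, 0, 0): 0.30, (0, 1, 1): 0.20,
    (1, 0, 1): 0.25, (1, 1, 0): 0.15,
    (1, 1, 1): 0.10,
}

def posterior(joint_pmf, target_idx, observed):
    """P[F_target | observed features]; unobserved features are marginalized out."""
    scores = defaultdict(float)
    for outcome, p in joint_pmf.items():
        if all(outcome[i] == v for i, v in observed.items()):
            scores[outcome[target_idx]] += p
    total = sum(scores.values())
    return {val: s / total for val, s in scores.items()} if total else {}

# (1) classification: predict C (index 2) from all other features
print(posterior(joint_pmf, target_idx=2, observed={0: 1, 1: 1}))
# (2) classification with a missing feature: F2 (index 1) unobserved
print(posterior(joint_pmf, target_idx=2, observed={0: 1}))
# (3) generalized classification: infer F1 (index 0) given F2 and C
print(posterior(joint_pmf, target_idx=0, observed={1: 1, 2: 1}))
```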
1. Introduction
Multiple Networks Approach
Bayesian Networks
Maximum Entropy Models
Advantages of the Proposed ME Method over BN's
1.1 Multiple Networks Approach
multilayer perceptrons (MLP’s), radial basis
functions, support vector machines
one would train one network for each feature
example:
classifying documents into multiple topics
one network was used to make an individual
yes/no decision for presence of each possible
topic
multiple networks approach
1.1 Multiple Networks Approach
several potential difficulties
increased learning and storage complexities
accuracy of inferences: the approach ignores dependencies between features
example:
the individual networks predict F1 = 1 and F2 = 1, respectively,
but the joint event (F1 = 1, F2 = 1) has zero probability
1.2 Bayesian Networks
handle missing features and capture dependencies between the multiple features
represent the joint pmf explicitly
as a product of conditional probabilities
versatile tools for inference with a convenient, informative representation
1.2 Bayesian Networks
several difficulties with BN
conditional independence relations between features must be specified explicitly
optimizing over the set of possible BN structures is hard
sequential, greedy structure-learning methods may be suboptimal
in sequential learning it is unclear where to stop to avoid overfitting
1.3 Maximum Entropy Models
Cheeseman proposed
the maximum entropy (ME) joint pmf consistent with arbitrary lower-order probability constraints
powerful: it allows the joint pmf to express general dependencies between features
1.3 Maximum Entropy Models
several difficulties with ME
learning (estimating the ME model) is difficult
Ku and Kullback proposed an iterative algorithm that satisfies one constraint at a time, but enforcing one constraint may violate others
they only presented results for dimension N = 4 and J = 2 discrete values per feature
Pearl cites complexity as the main barrier to using ME
1.4 Advantages of the Proposed ME Method over BN's
our approach
does not require explicit conditional independence assumptions
uses an effective joint optimization learning technique
2. Maximum Entropy Joint PMF
a random feature vector $\mathbf{F} = (F_1, F_2, \ldots, F_N)$, with $F_i \in A_i$ and $A_i = \{1, 2, 3, \ldots, |A_i|\}$
the full discrete feature space is $A_1 \times A_2 \times \cdots \times A_N$
2. Maximum Entropy Joint PMF
pairwise pmfs $\{P[F_m, F_n],\ m,\ n \neq m\}$
constrain the joint pmf $P[\mathbf{F}]$ to agree with these pairwise pmfs
the ME joint pmf consistent with these pairwise pmfs has the Gibbs form,
with one Lagrange multiplier per pairwise constraint
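A sketch of the Gibbs form referred to above, written from the standard maximum-entropy result for pairwise constraints (the paper's exact indexing and normalization may differ):

$$P_\Gamma[\mathbf{F} = \mathbf{f}] \;=\; \frac{1}{Z(\Gamma)}\,\exp\Bigl(\sum_{m}\sum_{n > m} \gamma(F_m = f_m, F_n = f_n)\Bigr), \qquad Z(\Gamma) \;=\; \sum_{\mathbf{f}' \in A_1 \times \cdots \times A_N} \exp\Bigl(\sum_{m}\sum_{n > m} \gamma(F_m = f'_m, F_n = f'_n)\Bigr)$$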
2. Maximum Entropy Joint PMF
each Lagrange multiplier enforces an
equality constraint on an individual pairwise probability $P[F_m = f_m, F_n = f_n]$
the joint pmf is specified by the set of Lagrange multipliers $\{\gamma(F_m = f_m, F_n = f_n),\ m,\ n \neq m,\ f_m \in A_m,\ f_n \in A_n\}$
although these probabilities also depend on $\Gamma$, they can often be tractably computed
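For concreteness, each equality constraint fixes one pairwise marginal of the model (a sketch in the slides' notation; $\hat{P}$ denotes the pairwise pmf estimated from training data):

$$\sum_{\mathbf{f}\,:\,F_m = f_m,\;F_n = f_n} P_\Gamma[\mathbf{f}] \;=\; \hat{P}[F_m = f_m, F_n = f_n] \quad \text{for all } m,\ n \neq m,\ f_m \in A_m,\ f_n \in A_n$$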
2. Maximum Entropy Joint PMF
two major difficulties
the optimization requires calculating $P[\mathbf{f}]$ → intractable
the constraint cost D requires marginalizations over the joint pmf → intractable
this inspired the approximate ME approach
2.1 Review of the ME Formulation for Classification
the random feature vector is $\tilde{\mathbf{F}} = (\mathbf{F}, C)$, with $C \in \{1, 2, \ldots, K\}$
its ME joint pmf still has the intractable form (1)
classification does not require computing $P[\tilde{\mathbf{f}}]$,
but rather just the a posteriori probabilities $P[C = c \mid \mathbf{f}]$
still not feasible!
2.1 Review of the ME Formulation for Classification
here we review a tractable, approximate method
Joint PMF Form
Support Approximation
Lagrangian Formulation
2.1.1 Joint PMF Form
via Bayes rule, the joint pmf is written in terms of the class posterior $P[C = c \mid \mathbf{f}]$ and
$\{P[\mathbf{f}],\ \mathbf{f} \in A_1 \times A_2 \times \cdots \times A_N\}$, the pmf over the full feature space
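From this joint pmf, the class posteriors needed for classification follow directly (a generic Bayes-rule statement, not the paper's specific parameterization):

$$P[C = c \mid \mathbf{f}] \;=\; \frac{P[\mathbf{f}, C = c]}{\sum_{c' = 1}^{K} P[\mathbf{f}, C = c']}$$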
2.1.2 Support Approximation
the approximation may have some effect on the accuracy of the learned model $\{\gamma(F_i = f_i, C = c)\}$,
but it does not sacrifice our inference capability
the full feature space $A_1 \times A_2 \times \cdots \times A_N$ is replaced by a small subset,
which makes computation feasible
example:
for N = 19 the full space contains roughly 40 billion points; reducing the support to about 100 points is a huge reduction
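A minimal sketch of the support idea (illustrative only; the toy data and multiplier values are hypothetical): the unnormalized Gibbs score is evaluated only at the support vectors, so posteriors are sums over at most the support size rather than over the full product space.

```python
import math
from collections import defaultdict

# Hypothetical support: each row is a full feature vector (e.g., a training vector).
support = [(0, 1, 2), (1, 1, 0), (0, 0, 2), (1, 0, 1)]

# Hypothetical pairwise Lagrange multipliers gamma[((m, f_m), (n, f_n))]; unset pairs are 0.
gamma = defaultdict(float)
gamma[((0, 0), (1, 1))] = 0.7
gamma[((1, 1), (2, 2))] = -0.3

def gibbs_score(f):
    """Unnormalized log-probability: sum of pairwise multipliers over all feature pairs."""
    n = len(f)
    return sum(gamma[((m, f[m]), (k, f[k]))]
               for m in range(n) for k in range(m + 1, n))

def posterior_on_support(target_idx, observed):
    """P[F_target | observed], with the pmf support restricted to the listed vectors."""
    scores = defaultdict(float)
    for f in support:
        if all(f[i] == v for i, v in observed.items()):
            scores[f[target_idx]] += math.exp(gibbs_score(f))
    z = sum(scores.values())
    return {v: s / z for v, s in scores.items()} if z else {}

print(posterior_on_support(target_idx=2, observed={0: 0}))
```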
2.1.3 Lagrangian Formulation
the support is restricted to the training set, i.e., $\{\,\mathbf{f}^{(m)} = (f_1^{(m)}, f_2^{(m)}, \ldots, f_N^{(m)}),\ m = 1, \ldots, |\mathcal{T}|\,\}$
the joint entropy for $\tilde{\mathbf{F}}$ can then be written over this reduced support
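With the support restricted in this way, the joint entropy becomes a finite sum (a hedged sketch, assuming the entropy is taken over the support points jointly with the class label; the paper's exact expression may differ):

$$H \;=\; -\sum_{m = 1}^{|\mathcal{T}|} \sum_{c = 1}^{K} P[\mathbf{f}^{(m)}, C = c]\,\log P[\mathbf{f}^{(m)}, C = c]$$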
2.1.3 Lagrangian Formulation
the pairwise constraints suggest measuring agreement with
the cross entropy / Kullback distance
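One cost of the kind referred to here is the Kullback distance between the pairwise pmf estimated from training data ($\hat{P}$) and the model's pairwise pmf ($P_M$), for a given pair $(F_k, F_l)$; the direction of the divergence is an assumption:

$$D_{kl} \;=\; \sum_{f_k \in A_k} \sum_{f_l \in A_l} \hat{P}[F_k = f_k, F_l = f_l]\,\log \frac{\hat{P}[F_k = f_k, F_l = f_l]}{P_M[F_k = f_k, F_l = f_l]}$$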
2.1.3 Lagrangian Formulation
for pairwise constraints involving the class label, $P[F_k, C]$
2.1.3 Lagrangian Formulation
the overall constraint cost D is formed as the sum of all the individual pairwise costs
given D and H, we can form the Lagrangian cost function
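A hedged sketch of how the two terms combine (the exact weighting and sign conventions in the paper may differ):

$$D \;=\; \sum_{k}\sum_{l > k} D_{kl}, \qquad \mathcal{L}(\Gamma, \lambda) \;=\; -H(\Gamma) \;+\; \lambda\, D(\Gamma),$$

minimized over the Lagrange multipliers $\Gamma$.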
3. Extensions for More General Inference Problems
General Statistical Inference
Joint PMF Representation
Support Approximation
Lagrangian Formulation
Discussion
Mixed Discrete and Continuous Feature Space
3.1.1 Joint PMF Representation
the a posteriori probabilities have the following form
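For an arbitrary feature $F_i$ to be inferred, the a posteriori probabilities follow from the joint pmf as (a generic statement; the paper substitutes its support-restricted model here):

$$P[F_i = f_i \mid \{f_j,\ j \neq i\}] \;=\; \frac{P[f_1, \ldots, f_i, \ldots, f_N]}{\sum_{f_i' \in A_i} P[f_1, \ldots, f_i', \ldots, f_N]}$$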
3.1.1 Joint PMF Representation
with respect to each feature $F_i$, the joint pmf can be decomposed as
3.1.2 Support Approximation
the reduced joint pmf for $\mathbf{F}$ is used
if there is a set $S(f^{(i)}) = \{\,\mathbf{f}_m : f_m^{(i)} = f^{(i)}\,\}$ (the support vectors whose $i$th component equals $f^{(i)}$)
3.1.3 Lagrangian Formulation
the joint entropy H can be written
3.1.3 Lagrangian Formulation
the model pairwise pmf $P_M[F_k, F_l]$ can be calculated in two different ways
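A sketch of the marginalization involved, over the support-restricted joint pmf (the two routes mentioned above presumably correspond to the two per-feature decompositions):

$$P_M[F_k = f_k, F_l = f_l] \;=\; \sum_{m\,:\,f_k^{(m)} = f_k,\ f_l^{(m)} = f_l} P_M[\mathbf{f}^{(m)}]$$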
3.1.3 Lagrangian Formulation
overall constraint cost D
3.2. Discussion
Choice of Constraints
encode all second-order probabilities
Tractability of Learning
Qualitative Comparison of Methods
3.3. Mixed Discrete and Continuous Feature Space
the feature vector is written $(\mathbf{F}, \mathbf{A})$, with discrete features $\mathbf{F} = (F_1, F_2, \ldots, F_{N_d})$
and continuous features $\mathbf{A} = (A_1, A_2, \ldots, A_{N_c})$
our objective is to learn $\{P[c \mid \mathbf{f}, \mathbf{a}]\}$
3.3. Mixed Discrete and Continuous Feature Space
given our choice of constraints, the joint density decomposes as
3.3. Mixed Discrete and Continuous Feature Space
a conditional mean constraint on $A_i$ given $C = c$
a constraint involving a pair of continuous features $A_i$, $A_j$
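A conditional-mean constraint of the kind mentioned here ties the model's conditional expectation to its training-set estimate (a hedged sketch; the paper's treatment of the continuous features may use additional statistics):

$$E_M[A_i \mid C = c] \;=\; \hat{E}[A_i \mid C = c], \qquad i = 1, \ldots, N_c,\quad c = 1, \ldots, K$$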
4. Experimental Results
Evaluation covers three settings:
classification performance on data sets used solely for classification:
Mushroom, Congress, Nursery, Zoo, Hepatitis
generalized classification performance on data sets with multiple possible class features:
Solar Flare, Flag, Horse Colic
classification performance on data sets with mixed continuous and discrete features:
Credit Approval, Hepatitis, Horse Colic
4. Experimental Results
the ME method was compared with
Bayesian networks (BN)
decision trees (DT)
a powerful extension of DT: mixtures of DT's
multilayer perceptrons (MLP)
4. Experimental Results
for an arbitrary feature to be inferred, $F_i$, the method computes
the a posteriori probabilities
4. Experimental Results
use the following criteria to evaluate all the
methods
(1) misclassification rate on the test set for the data set’s
class label
(2) the same as (1), but with a single randomly chosen feature missing (see the sketch after this list)
(3) average misclassification rate on the test set
(4) misclassification rate on the test set, based on
predicting a pair of randomly chosen features
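As an illustrative sketch of criterion (2) (hypothetical helper names; `posterior_fn` is any routine with the signature of `posterior_on_support` from the earlier sketch):

```python
import random

def misclass_rate_one_missing(test_set, class_idx, posterior_fn, seed=0):
    """Criterion (2): classify each test vector with one randomly chosen
    non-class feature removed, and report the misclassification rate."""
    rng = random.Random(seed)
    errors = 0
    for f in test_set:
        feature_idxs = [i for i in range(len(f)) if i != class_idx]
        missing = rng.choice(feature_idxs)                     # drop one feature at random
        observed = {i: f[i] for i in feature_idxs if i != missing}
        post = posterior_fn(target_idx=class_idx, observed=observed)
        pred = max(post, key=post.get) if post else None       # MAP prediction
        errors += int(pred != f[class_idx])
    return errors / len(test_set)
```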
5. Conclusions and Possible Extensions
Regression
Large-Scale Problems
Model Selection: Searching for ME Constraints
Applications
Personal opinion
…