Game Theory, Maximum Generalized
Entropy, Minimum Discrepancy,
Robust Bayes and Pythagoras
Peter Grünwald
CWI Amsterdam
www.cwi.nl/~pdg
Joint work with A.P. Dawid, University College, London
Bangalore, October 2003
Overview
1. Maximum Entropy and Game Theory
2. Maximum Generalized Entropy
3. M.G.E. and Robust Bayes (Result I)
4. Minimum Discrepancy
5. Pythagoras (Result II)
6. Conclusion
CWI is the National Research Institute for Mathematics
and Computer Science in the Netherlands.
Setting

Finite (for now) sample space $\mathcal{X}$; let $\Delta(\mathcal{X})$ denote the set of all distributions over $\mathcal{X}$.

Suppose we only know that $P^* \in \Gamma$, where $\Gamma$ is a convex closed subset of $\Delta(\mathcal{X})$. We are asked to make probabilistic predictions/decisions about $X \sim P^*$.

Maximum Entropy Principle (Jaynes 1957)

According to 'MaxEnt', we should predict using the unique $\tilde{P}$ that maximizes entropy under the constraint $P \in \Gamma$:

$\tilde{P} := \arg\max_{P \in \Gamma} H(P)$

with $H$ the Shannon (for now) entropy: $H(P) = -\sum_{x \in \mathcal{X}} P(x) \log P(x)$.
Does it make any sense?
• MaxEnt applied in speech recognition,
computer vision, stock market prediction…
• …but not always clear why it would be a good
idea to use it!
• Various rationales and criticisms have been
given over time
• Topsøe (1979) offers a game-theoretic
interpretation/rationale
Basic Result

Information Inequality: If $P, Q \in \Delta(\mathcal{X})$, then

$E_P[-\log Q(X)] = H(P) + D(P \| Q) \geq H(P)$,

with equality iff $Q = P$; therefore we can write

$H(P) = \inf_{Q} E_P[-\log Q(X)]$,

so that

$\max_{P \in \Gamma} H(P) = \sup_{P \in \Gamma} \inf_{Q} E_P[-\log Q(X)] = \inf_{Q} \sup_{P \in \Gamma} E_P[-\log Q(X)]$.

??? Von Neumann 1928 ??? No: Topsøe 1979! (in great generality)
Basic Result, cont.

MaxEnt as a game between Nature and Statistician, with loss measured by 'log loss' $L(x, Q) = -\log Q(x)$.

MaxEnt = worst-case optimal strategy for Nature:

$\sup_{P \in \Gamma} \inf_{Q} E_P[-\log Q(X)]$, achieved for $P = \tilde{P}$.

Basic Result, part II

MaxEnt = worst-case optimal strategy for Statistician:

$\inf_{Q} \sup_{P \in \Gamma} E_P[-\log Q(X)]$, achieved for $Q = \tilde{P}$: surprising!

Nature has to satisfy constraint $P \in \Gamma$; Statistician can choose anything she likes.
The Clue

MaxEnt = worst-case optimal strategy for Statistician, achieved for $Q = \tilde{P}$.

• This justifies the Maximum Entropy Principle when the 'log loss' is the proper loss function to use:
– Coding, Kelly gambling

…but what if we are interested in another loss function? A similar story can still be told!
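The worst-case optimality of the MaxEnt distribution under log loss can be checked numerically. A minimal sketch, with a hypothetical sample space {0, 1, 2} and mean constraint E[X] = 0.7 (my own example, not from the slides): the MaxEnt solution is a Gibbs distribution, and its expected log loss is the same constant, equal to its entropy, against every P in the constraint set. This 'equalizer' property is exactly why it is robust.

```python
import math
import random

X = [0, 1, 2]   # hypothetical finite sample space
mu = 0.7        # Gamma = all P on X with E_P[X] = mu

def gibbs(beta):
    """Distribution proportional to exp(beta * x); returns (probs, log Z)."""
    w = [math.exp(beta * x) for x in X]
    z = sum(w)
    return [wi / z for wi in w], math.log(z)

def mean(p):
    return sum(pi * x for pi, x in zip(p, X))

# Solve for beta by bisection so the Gibbs distribution has mean mu;
# this Gibbs distribution is the MaxEnt distribution in Gamma.
lo, hi = -50.0, 50.0
for _ in range(200):
    mid = (lo + hi) / 2
    if mean(gibbs(mid)[0]) < mu:
        lo = mid
    else:
        hi = mid
beta = (lo + hi) / 2
p_tilde, logZ = gibbs(beta)
H = -sum(pi * math.log(pi) for pi in p_tilde)   # Shannon entropy of p_tilde

# Equalizer property: against p_tilde, the expected log loss of EVERY P in
# Gamma equals logZ - beta * mu, which is H(p_tilde).
random.seed(0)
for _ in range(5):
    p2 = random.uniform(0.0, mu / 2)      # parametrize Gamma: p1 = mu - 2*p2
    P = [1 - (mu - 2 * p2) - p2, mu - 2 * p2, p2]
    log_loss = -sum(Pi * math.log(qi) for Pi, qi in zip(P, p_tilde))
    assert abs(log_loss - H) < 1e-9
```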
Overview
1. Maximum Entropy and Game Theory
2. Maximum Generalized Entropy
3. M.G.E. and Robust Bayes (Result I)
4. Minimum Discrepancy
5. Pythagoras (Result II)
6. Conclusion
Game/Decision Theory

$\mathcal{A}$, $\mathcal{X}$, $\Gamma$: action space, sample space, constraint set ($\Gamma \subseteq \Delta(\mathcal{X})$).

Randomized actions: $\Delta(\mathcal{A})$ (set of distributions over $\mathcal{A}$).

Loss function $L : \mathcal{X} \times \mathcal{A} \to \mathbb{R} \cup \{\infty\}$.

Our Game! Statistician's choice: $a \in \mathcal{A}$. Nature's choice: $P \in \Gamma$. Expected loss: $E_P[L(X, a)]$.
Example: Logarithmic Loss

$\mathcal{A} = \Delta(\mathcal{X})$, $L(x, Q) = -\log Q(x)$. Here actions are formally the same as probability distributions.

Logarithmic loss is a proper scoring rule, i.e. for all $P$, $\inf_{Q} E_P[-\log Q(X)]$ is achieved for $Q = P$.

Generalized Entropy

CENTRAL DEFINITION: For an (arbitrary) loss function $L$, the '$L$-entropy of $P$' is defined by

$H_L(P) := \inf_{a \in \mathcal{A}} E_P[L(X, a)]$

(De Groot 1962; Rao 1982). $H_L$ is always concave (an infimum of linear functions) and often differentiable. Shannon entropy is the special case obtained for logarithmic loss.
Example: Brier (squared) Loss

$\mathcal{A} = \Delta(\mathcal{X})$, $L(x, Q) = \sum_{y} \left( Q(y) - \mathbf{1}_{\{y = x\}} \right)^2$.

Brier loss is a proper scoring rule; the Brier entropy is $H_{\mathrm{Brier}}(P) = 1 - \sum_{x} P(x)^2$.

Example: 0/1-Loss

$\mathcal{A} = \mathcal{X}$, $L(x, a) = \mathbf{1}_{\{a \neq x\}}$; the 0/1-entropy is $H_{0/1}(P) = 1 - \max_{x} P(x)$.
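The central definition can be sketched in a few lines of code: minimize expected loss over a grid of actions and recover the Shannon, Brier, and 0/1 entropies. The example distribution and grid resolution are my own choices, not from the slides.

```python
import math

def gen_entropy(P, actions, L):
    """Generalized entropy H_L(P) = inf over actions a of E_P[L(x, a)]."""
    return min(sum(p * L(x, a) for x, p in enumerate(P)) for a in actions)

P = [0.8, 0.2]   # example distribution on {0, 1}
# Action space for log and Brier loss: distributions Q on {0, 1}, gridded
Qs = [[q, 1 - q] for q in (i / 1000 for i in range(1, 1000))]

log_loss = lambda x, Q: -math.log(Q[x])
brier = lambda x, Q: sum((Q[y] - (1 if y == x else 0)) ** 2 for y in range(2))

H_log = gen_entropy(P, Qs, log_loss)     # recovers Shannon entropy
H_brier = gen_entropy(P, Qs, brier)      # recovers 1 - sum_x P(x)^2
# For 0/1 loss the actions are the outcomes themselves
H_01 = gen_entropy(P, [0, 1], lambda x, a: 0.0 if x == a else 1.0)

assert abs(H_log - (-sum(p * math.log(p) for p in P))) < 1e-6
assert abs(H_brier - (1 - sum(p * p for p in P))) < 1e-6
assert abs(H_01 - (1 - max(P))) < 1e-12
```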
Overview
1. Maximum Entropy and Game Theory
2. Maximum Generalized Entropy
3. M.G.E. and Robust Bayes (Result I)
4. Minimum Discrepancy
5. Pythagoras (Result II)
6. Conclusion

Main Theorem (baby version)

Assume
• $\mathcal{X}$ finite
• $\Gamma$ convex and closed
• $\mathcal{A}$ closed
• $H_L$ bounded from above

…then:

$\sup_{P \in \Gamma} H_L(P) = \inf_{\zeta \in \Delta(\mathcal{A})} \sup_{P \in \Gamma} E_P[L(X, \zeta)]$

Game has a value! The supremum is reached for some $\tilde{P} \in \Gamma$; the infimum is reached for some $\tilde{\zeta} \in \Delta(\mathcal{A})$ achieving $H_L(\tilde{P})$.
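The baby version can be illustrated numerically for Brier loss on a two-point sample space. The constraint set below (all P with P(1) between 0.6 and 0.9) is a hypothetical example of my own; the sketch checks that the maximum generalized entropy over it equals the minimax expected loss, i.e. the game has a value.

```python
# Brier loss on {0, 1}; Gamma = {P : P(1) in [0.6, 0.9]} (convex, closed).
def brier(x, Q):
    return sum((Q[y] - (1 if y == x else 0)) ** 2 for y in (0, 1))

def exp_loss(p1, Q):
    # expected Brier loss of act Q under P = (1 - p1, p1)
    return (1 - p1) * brier(0, Q) + p1 * brier(1, Q)

grid = [i / 1000 for i in range(1001)]

# Maximum generalized (Brier) entropy over Gamma
maxent = max(1 - (1 - p1) ** 2 - p1 ** 2 for p1 in grid if 0.6 <= p1 <= 0.9)

# Minimax value: expected loss is linear in P, so the sup over the convex
# set Gamma is attained at an endpoint, P(1) = 0.6 or P(1) = 0.9.
minimax = min(max(exp_loss(0.6, [1 - q, q]), exp_loss(0.9, [1 - q, q]))
              for q in grid)

assert abs(maxent - 0.48) < 1e-9
assert abs(minimax - 0.48) < 1e-9   # the game has a value: maxent = minimax
```

The minimizing act is Q = (0.4, 0.6), which is exactly the Brier-entropy maximizer in Gamma: maximum generalized entropy is robust Bayes.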
But what is new here?
• Mathematically:
– Nothing new in baby version
– In paper we present an adult version: general sample spaces, unbounded loss functions, noncompact sets of constraints…
– New proof technique
• Conceptually:
– 'maximum generalized entropy is robust Bayes'
– New view leads to new math results later
Overview
1. Maximum Entropy and Game Theory
2. Maximum Generalized Entropy
3. M.G.E. and Robust Bayes (Result I)
4. Minimum Discrepancy
5. Pythagoras (Result II)
6. Conclusion

Discrepancy (= generalized relative entropy)

For a given loss function $L$, we can define the discrepancy $D_L$ by

$D_L(P, a) := E_P[L(X, a)] - H_L(P)$.

Relative entropy (KL divergence) is the special case obtained for logarithmic loss: $D_{\log}(P, Q) = D(P \| Q)$.

Example Discrepancy: Brier score

$D_{\mathrm{Brier}}(P, Q) = \sum_{x} (P(x) - Q(x))^2$

• This is just the squared Euclidean distance!
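Both special cases of the discrepancy definition are easy to verify numerically. A short sketch on two example distributions (my own choice): for Brier loss the discrepancy comes out as the squared Euclidean distance, and for log loss as the KL divergence.

```python
import math

# D_L(P, a) = E_P[L(X, a)] - H_L(P), checked for Brier and log loss
P = [0.7, 0.2, 0.1]
Q = [0.5, 0.3, 0.2]

brier = lambda x, R: sum((R[y] - (1 if y == x else 0)) ** 2 for y in range(len(R)))

# Brier discrepancy = expected Brier loss minus Brier entropy (1 - sum P(x)^2)
exp_brier = sum(p * brier(x, Q) for x, p in enumerate(P))
D_brier = exp_brier - (1 - sum(p * p for p in P))
assert abs(D_brier - sum((p - q) ** 2 for p, q in zip(P, Q))) < 1e-12

# Log-loss discrepancy = expected log loss minus Shannon entropy = KL divergence
exp_log = sum(p * -math.log(q) for p, q in zip(P, Q))
D_log = exp_log - (-sum(p * math.log(p) for p in P))
assert abs(D_log - sum(p * math.log(p / q) for p, q in zip(P, Q))) < 1e-12
```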
Minimum Relative Entropy Principle

For a given 'prior' distribution $Q_0$ and constraint $\Gamma$, pick the distribution $\tilde{P}$ achieving

$\min_{P \in \Gamma} D(P \| Q_0)$.

• Interpretation: $\tilde{P}$ is the member of $\Gamma$ that is closest to $Q_0$, i.e. it is the projection of $Q_0$ on $\Gamma$.

Pythagorean Property

As noted by Csiszár, relative entropy behaves in some ways like squared Euclidean distance: for all priors $Q_0$ and all $P \in \Gamma$ we have

$D(P \| Q_0) \geq D(P \| \tilde{P}) + D(\tilde{P} \| Q_0)$.

Under some extra conditions we have equality. (Csiszár 1975, 1991, many others)

[Figure: the Pythagorean theorem, graphically]
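The Pythagorean property can be checked numerically. A sketch under assumptions of my own (prior, sample space, and a linear mean constraint, where the projection is an exponential tilt of the prior and the property holds with equality):

```python
import math
import random

X = [0, 1, 2]
Q0 = [0.2, 0.3, 0.5]   # hypothetical 'prior'
mu = 0.7                # Gamma = all P on X with E_P[X] = mu

def kl(P, Q):
    return sum(p * math.log(p / q) for p, q in zip(P, Q) if p > 0)

# The I-projection of Q0 on a linear (mean) constraint is an exponential
# tilt P~(x) proportional to Q0(x) * exp(beta * x); solve beta by bisection.
def tilt(beta):
    w = [q * math.exp(beta * x) for q, x in zip(Q0, X)]
    z = sum(w)
    return [wi / z for wi in w]

lo, hi = -60.0, 60.0
for _ in range(200):
    mid = (lo + hi) / 2
    if sum(p * x for p, x in zip(tilt(mid), X)) < mu:
        lo = mid
    else:
        hi = mid
P_proj = tilt((lo + hi) / 2)

# Pythagorean property, with equality for this linear constraint:
# D(P || Q0) = D(P || P~) + D(P~ || Q0) for every P in Gamma.
random.seed(1)
for _ in range(5):
    p2 = random.uniform(0.01, 0.34)      # parametrize Gamma: p1 = mu - 2*p2
    P = [1 - (mu - 2 * p2) - p2, mu - 2 * p2, p2]
    assert abs(kl(P, Q0) - (kl(P, P_proj) + kl(P_proj, Q0))) < 1e-9
```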
Overview
1. Maximum Entropy and Game Theory
2. Maximum Generalized Entropy
3. M.G.E. and Robust Bayes (Result I)
4. Minimum Discrepancy
5. Pythagoras (Result II)
6. Further Developments
Relative Games

For every loss function $L$ and reference act $e$, we can define the relative loss $L_e$ by

$L_e(x, a) := L(x, a) - L(x, e)$.

Main Theorem (Grünwald and Dawid, 2002)

For all $Q$ such that $D_L(P, Q)$ is finite for all $P \in \Gamma$: the game with relative loss $L_Q$ has a value, i.e.

$\sup_{P \in \Gamma} \inf_{a} E_P[L_Q(X, a)] = \inf_{a} \sup_{P \in \Gamma} E_P[L_Q(X, a)]$,

reached for a saddlepoint, if and only if, for all $P \in \Gamma$:

$D_L(P, Q) \geq D_L(P, \tilde{P}) + D_L(\tilde{P}, Q)$.
Conclusion

Pythagoras = Von Neumann. Who could have guessed?

In words: the Pythagorean property holds iff the minimax theorem applies to the loss function under consideration.

For example: the minimax theorem holds for squared loss; the Pythagorean property reduces to the high-school Pythagorean theorem.

• We have shown:
– Maximum (Generalized) Entropy = Robust Bayes
– Pythagoras = Von Neumann
• Three further results in the full paper:
– Relation to Bregman divergences
– Generalized exponential families
– Generalized Redundancy-Capacity (Gallager-Ryabko-Haussler) Theorem
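The Bregman divergences mentioned among the further results can be sketched in a few lines: $B_F(p, q) = F(p) - F(q) - \langle \nabla F(q), p - q \rangle$ for convex $F$. The generators below (my own standard choices, not from the slides) recover precisely the two distances discussed here, squared Euclidean distance and relative entropy.

```python
import math

def bregman(F, gradF, p, q):
    """Bregman divergence B_F(p, q) = F(p) - F(q) - <gradF(q), p - q>."""
    return F(p) - F(q) - sum(g * (pi - qi)
                             for g, pi, qi in zip(gradF(q), p, q))

p = [0.7, 0.2, 0.1]
q = [0.5, 0.3, 0.2]

# F(v) = sum v_i^2 yields the squared Euclidean distance
sq = bregman(lambda v: sum(x * x for x in v),
             lambda v: [2 * x for x in v], p, q)
assert abs(sq - sum((a - b) ** 2 for a, b in zip(p, q))) < 1e-12

# F(v) = sum v_i log v_i yields relative entropy (KL), for p, q on the simplex
kl = bregman(lambda v: sum(x * math.log(x) for x in v),
             lambda v: [math.log(x) + 1 for x in v], p, q)
assert abs(kl - sum(a * math.log(a / b) for a, b in zip(p, q))) < 1e-12
```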
How general is the Pythagorean property?

• Both squared Euclidean distance and relative entropy are examples of Bregman divergences
• Pythagoras is known to hold for such divergences

[Figure: Venn diagram relating our discrepancies and Bregman's divergences; both classes and their overlap are nonempty]

Thank you for your attention!