Probabilistic Programming
Miguel Lázaro-Gredilla
[email protected]
January 2013
Machine Learning Group
http://www.tsc.uc3m.es/~miguel/MLG/
Contents

- Towards a modern machine learning
- Probabilistic programming languages
- Infer.NET
  - The software stack: A digression
  - The modeling language
- References

Towards a modern machine learning

Classical machine learning
- Large number of tools with diverse backgrounds:
  - k-means
  - Principal Component Analysis
  - Independent Component Analysis
  - Classical Neural Networks
  - Support Vector Machines
  - Density estimation via Parzen windows
  - Recursive Least Squares
  - Least Mean Squares
  - (just an arbitrary sample, we could go on and on...)
- Pragmatic, unsystematic, non-probabilistic

Third generation machine learning
- Data sets described as instances of a probabilistic model:
  - Gaussian Process regression/classification
  - Latent Dirichlet Allocation
  - Bayes Point Machine
  - ...
- We can infer unknowns in a principled way
- Features:
  - Each tool can be expressed as a Bayesian network
  - Systematic, modular, standardized approach
  - Proposals are models, not algorithms
  - Inference is detached from the model

Updating classical machine learning (I/V)
- Should we just dump classical ML and jump on the Bayesian bandwagon?
- Most classical ML tools can be written as some type of inference on some Bayesian network
- So instead, update classical ML using a Bayesian interpretation:
  - Gain insight into the model behind the algorithm
  - See overlaps emerge
  - Use other types of inference
  - It may become obvious how to enhance them

Updating classical machine learning (II/V)
- Classical algorithm: k-means
- Bayesian model (assuming normalized data):

  p(\mathbf{x}_n \mid z_n, \{\boldsymbol{\mu}_k\}) = \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_{z_n}, v\mathbf{I})
  p(v) = \mathrm{InvGamma}(v \mid 1, 1)
  p(\boldsymbol{\mu}_k) = \mathcal{N}(\boldsymbol{\mu}_k \mid \mathbf{0}, 10\,\mathbf{I})
  p(z_n \mid \mathbf{w}) = \mathrm{Discrete}(z_n \mid \mathbf{w})
  p(\mathbf{w}) = \mathrm{Dirichlet}(\mathbf{w} \mid \mathbf{1}_{k \times 1})

- Classical model corresponds to:
  - Maximum likelihood for p(\{\mathbf{x}_n\} \mid \{z_n\}, \{\boldsymbol{\mu}_k\}), obtained by hard-EM optimization (see the derivation below)
  - Particular case of a Gaussian mixture model

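To make the correspondence concrete, here is a short derivation (added here; it was not spelled out on the slide). With v fixed and shared across components, hard-EM on this model alternates exactly the two k-means steps:

  \hat{z}_n = \arg\max_k \, \mathcal{N}(\mathbf{x}_n \mid \boldsymbol{\mu}_k, v\mathbf{I})
            = \arg\min_k \, \lVert \mathbf{x}_n - \boldsymbol{\mu}_k \rVert^2
  \hat{\boldsymbol{\mu}}_k = \frac{1}{|\{n : \hat{z}_n = k\}|} \sum_{n : \hat{z}_n = k} \mathbf{x}_n

That is, the hard E-step is the nearest-centroid assignment and the M-step is the centroid update.
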
Updating classical machine learning (III/V)
- Classical algorithm: Extended Recursive Least Squares (adaptive)
- Bayesian model:

  p(x_n \mid \mathbf{w}_n) = \mathcal{N}(x_n \mid \mathbf{w}_n^\top \mathbf{u}_n, v)
  p(\mathbf{w}_n \mid \mathbf{w}_{n-1}, \alpha, \beta) = \mathcal{N}(\mathbf{w}_n \mid (1-\beta)\,\mathbf{w}_{n-1}, \beta\alpha\mathbf{I})
  p(v) = \mathrm{InvGamma}(v \mid 1, 1)
  p(\alpha) = \mathrm{InvGamma}(\alpha \mid 1, 1)
  p(\beta/(1-\beta)) = \mathrm{InvGamma}(\beta/(1-\beta) \mid 1, 1)

- Classical model corresponds to:
  - Posterior mean for \mathbf{w}_n, for some magically selected \alpha and v
  - Particular case of the Kalman filter (see the recursion below)
- Contrived assumptions are needed to represent exponentially weighted RLS in this framework
- Hints that it might not make sense; see [KRLST]

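For completeness (an added note, not on the original slide): with \alpha, \beta, and v fixed, inference in this state-space model is the standard Kalman predict/update recursion on the posterior mean \mathbf{m}_n and covariance \mathbf{P}_n of \mathbf{w}_n:

  \mathbf{m}_{n|n-1} = (1-\beta)\,\mathbf{m}_{n-1}, \qquad
  \mathbf{P}_{n|n-1} = (1-\beta)^2\,\mathbf{P}_{n-1} + \beta\alpha\mathbf{I}
  \mathbf{k}_n = \frac{\mathbf{P}_{n|n-1}\,\mathbf{u}_n}{\mathbf{u}_n^\top \mathbf{P}_{n|n-1}\,\mathbf{u}_n + v}
  \mathbf{m}_n = \mathbf{m}_{n|n-1} + \mathbf{k}_n\,(x_n - \mathbf{u}_n^\top \mathbf{m}_{n|n-1}), \qquad
  \mathbf{P}_n = \mathbf{P}_{n|n-1} - \mathbf{k}_n\,\mathbf{u}_n^\top \mathbf{P}_{n|n-1}
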
Updating classical machine learning (IV/V)
- Classical algorithm: Principal Component Analysis
- Bayesian model:

  p(\mathbf{x}_n \mid \mathbf{y}_n, \mathbf{W}, v) = \mathcal{N}(\mathbf{x}_n \mid \mathbf{W}^\top \mathbf{y}_n, v\mathbf{I})
  p(\mathbf{y}_n) = \mathcal{N}(\mathbf{y}_n \mid \mathbf{0}, \mathbf{I})
  p([\mathbf{W}]_{d\text{-th col}}) = \mathcal{N}(\mathbf{w}_d \mid \mathbf{0}, \mathrm{diag}([\alpha_1, \ldots, \alpha_D]))
  p(v) = \mathrm{InvGamma}(v \mid 1, 1)
  p(\alpha_d) = \mathrm{InvGamma}(\alpha_d \mid 1, 1)

- Classical model corresponds to:
  - Maximum likelihood for p(\mathbf{x}_n \mid \mathbf{W}, v) when v \to 0, with \mathbf{W} the product of an orthogonal matrix and an ordered diagonal matrix (see the closed form below)
  - Restrictions on \mathbf{W} make it unique, but don't change the model
- New possibilities: what if we do maximum likelihood for p(\mathbf{x}_n \mid \{\mathbf{y}_n\}, v)?

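For reference (an added note), the closed-form maximum-likelihood solution of probabilistic PCA from Tipping and Bishop (1999), written in the slide's \mathbf{x}_n = \mathbf{W}^\top \mathbf{y}_n + \text{noise} convention, makes the v \to 0 limit explicit:

  \mathbf{W}_{\mathrm{ML}}^\top = \mathbf{U}_q\,(\boldsymbol{\Lambda}_q - v\mathbf{I})^{1/2}\,\mathbf{R}

where \mathbf{U}_q holds the top q eigenvectors of the sample covariance, \boldsymbol{\Lambda}_q the corresponding eigenvalues, and \mathbf{R} is an arbitrary orthogonal matrix. As v \to 0, the columns align with the principal directions and classical PCA is recovered.
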
Updating classical machine learning (V/V)
- Classical algorithm: Support Vector Machines
- Bayesian model:
  - See [BayesSVM]: Sollich, P. (2002). Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities. Machine Learning, 46:21-52.
  - Ingenious approach, but not very natural (it involves using three classes to solve a binary problem)

Probabilistic programming languages

Some observations
According to the previous slides:
- Most ML tools have a Bayesian model description
- The full description takes only a few lines
- The type of inference (ML, MAP, point estimates, full Bayesian posterior) is independent of the model
  - Though the tractability of each does depend on the model

In the process of creating new ML tools:
- What makes a new ML tool worthwhile is: a novel model
- What we spend most of our time working on is: making inference tractable on the new model

The idea
Probabilistic programming:
- Define a language to describe Bayesian models ("programs")
- Create a "compiler" that understands those programs and generates inference engines for them

The new workflow (emphasis is on model design):
- Spend more time thinking about the model
- Program it (just a few lines!)
- Optional: sample data from the model
- Feed data to the inference engine and assess the results
- If the model wasn't that good, do it over

Programming paradigms (I/II)
A random assortment of them:
- Imperative (Matlab)
- Procedural (C)
- Object-oriented (C++)
- Declarative (SQL)
- Functional (OCaml, F#)
- Metaprogramming (LISP)
- Domain-specific language (Spice)

Programming paradigms (II/II)
For probabilistic programming:
- A domain-specific language may be simpler for the user, but:
  - It doesn't integrate well with existing codebases
  - It doesn't interface well with DB access, plotting capabilities, etc.
- Functional languages with metaprogramming can be used to write a "guest probabilistic program" within a "host programming language" (we'll see F# examples)
- This can also be done in an imperative style, but it looks uglier (we'll see Python examples)

An incomplete list of probabilistic programming languages
- BUGS: Bayesian inference Using Gibbs Sampling
- HANSEI: extends OCaml; discrete distributions only
- Hierarchical Bayes Compiler (HBC): large-scale models and non-parametric process priors
- PyMCMC: MCMC algorithms for Python classes
- Church: extends Scheme to describe Bayesian models
- Infer.NET: provides a probabilistic language within the .NET platform

See more at http://probabilistic-programming.org

Infer.NET

The software stack: A digression

The Java case
Three big operating systems:
- Linux, OS X, Windows

...and one language to rule them all:
- Sun Microsystems designed Java to run on the JVM...
- ...and implemented the JVM to run on Linux, Mac, and Windows
- Platform independence: bliss for programmers

The .NET case
Microsoft reacted and created the JVM counterpart: the CLR.

Which languages target the CLR?
- The .NET languages: VB.NET, C#, F#, IronPython...
- Different languages with a common set of libraries, so it is easy to interface them

Which operating systems does the CLR run on?
- Microsoft released the specification and standardized it
- CLR-like implementations arose for Linux/Mac: Mono
- ...but the Windows version is always ahead: more complete, with additional tools, better tested, etc.

Microsoft tries to win in the cross-platform territory (a sweet spot for developers) while still favoring its flagship product, Windows: opposing objectives.

Infer.NET’s interoperability
- Infer.NET targets all .NET languages, with a focus on C#, F#, and IronPython
- It can be used on Mono (Mac/Linux) or the CLR (Windows); the experience is better (less buggy) on the latter
- The .NET languages have a growing set of tools for scientific computing, but they are nowhere near Matlab yet
- IronPython cannot use NumPy/SciPy/Matplotlib natively
- A new tool called Sho provides an IronPython shell with Matlab-like capabilities (but Windows-only)

The modeling language

Using Infer.NET
Infer.NET provides:
- A probabilistic modeling language embedded in other languages
  - F# allows a more natural embedding
- Compilation to three inference engines (the EP/VB contrast is made precise right after this list):
  - EP (expectation propagation): the approximation includes all non-zero probability points
  - VB (variational Bayes): the approximation avoids zero probability points
  - MCMC: Gibbs sampling; slower
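
In KL-divergence terms (an added note): VB picks q^\ast = \arg\min_q \mathrm{KL}(q \,\|\, p), which is zero-forcing, so q avoids regions where p = 0; EP locally minimizes \mathrm{KL}(p \,\|\, q), which is zero-avoiding, so q covers all regions where p has mass.
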
Let's browse the Microsoft Research examples and create a new model:
http://research.microsoft.com/en-us/um/cambridge/projects/infernet/

Two coins (F#)
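
The original slides showed each example as a code screenshot, which did not survive the conversion. In its place, here is a minimal F# sketch of the classic two-coins tutorial, assuming the Infer.NET 2.5 API (the MicrosoftResearch.Infer namespaces) and the default EP engine; treat it as illustrative, not the slide's original listing.

  // Sketch only: assumes Infer.NET 2.5 namespaces
  open MicrosoftResearch.Infer
  open MicrosoftResearch.Infer.Models

  // Two fair coins; bothHeads is a derived random variable
  let firstCoin  = Variable.Bernoulli(0.5)
  let secondCoin = Variable.Bernoulli(0.5)
  let bothHeads  = firstCoin &&& secondCoin

  let engine = InferenceEngine()

  // Prior query: P(both heads) = 0.25
  printfn "P(both heads) = %A" (engine.Infer(bothHeads))

  // Condition on evidence: the coins are not both heads
  bothHeads.ObservedValue <- false
  // Posterior over the first coin: Bernoulli(1/3)
  printfn "P(first coin heads | not both) = %A" (engine.Infer(firstCoin))
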
Two coins (IronPython)
Learning a Gaussian (F#)
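
Again a sketch rather than the original listing, under the same API assumptions and with synthetic data: priors are placed on the mean and precision of a Gaussian, 100 observed points are attached, and the posteriors are queried.

  // Sketch only: assumes Infer.NET 2.5 namespaces; data are synthetic
  open MicrosoftResearch.Infer
  open MicrosoftResearch.Infer.Models
  open MicrosoftResearch.Infer.Maths

  // Priors over the unknown mean and precision
  let mean = Variable.GaussianFromMeanAndVariance(0.0, 100.0)
  let precision = Variable.GammaFromShapeAndScale(1.0, 1.0)

  // 100 i.i.d. observations from the Gaussian being learned
  let n = Range(100)
  let x = Variable.Array<float>(n)
  x.[n] <- Variable.GaussianFromMeanAndPrecision(mean, precision).ForEach(n)

  // Synthetic data drawn from N(5, 1)
  x.ObservedValue <- Array.init 100 (fun _ -> 5.0 + Rand.Normal())

  let engine = InferenceEngine()
  printfn "mean      = %A" (engine.Infer(mean))
  printfn "precision = %A" (engine.Infer(precision))
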
Learning a Gaussian (IronPython)
Truncated Gaussian (F#)
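
A sketch under the same assumptions. F#'s comparison operators do not lift to Variable<bool>, so the tutorial's constraint x > threshold is expressed here as ConstrainPositive(x - threshold); the threshold is an observed variable, so the compiled model can be swept without recompilation.

  // Sketch only: assumes Infer.NET 2.5 namespaces
  open MicrosoftResearch.Infer
  open MicrosoftResearch.Infer.Models

  // Standard Gaussian constrained to lie above a (later observed) threshold
  let threshold = Variable.New<float>()
  let x = Variable.GaussianFromMeanAndVariance(0.0, 1.0)
  Variable.ConstrainPositive(x - threshold)

  let engine = InferenceEngine()  // EP by default, which handles the truncation

  // Sweep the threshold; the model is compiled once and reused
  for t in [0.0; 0.2; 0.4; 0.6; 0.8; 1.0] do
      threshold.ObservedValue <- t
      printfn "threshold = %g: %A" t (engine.Infer(x))
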
Truncated Gaussian (IronPython)
Gaussian Mixture (F#)
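
A sketch in the same spirit; for brevity this is a two-component 1-D mixture (the tutorial uses 2-D VectorGaussians), with VMP inference and randomly initialized assignments to break the label symmetry. The synthetic data and all names are illustrative.

  // Sketch only: assumes Infer.NET 2.5 namespaces; 1-D simplification
  open MicrosoftResearch.Infer
  open MicrosoftResearch.Infer.Models
  open MicrosoftResearch.Infer.Distributions
  open MicrosoftResearch.Infer.Maths

  // Two components with unknown means, precisions, and mixing weights
  let k = Range(2)
  let means = Variable.Array<float>(k)
  means.[k] <- Variable.GaussianFromMeanAndVariance(0.0, 100.0).ForEach(k)
  let precs = Variable.Array<float>(k)
  precs.[k] <- Variable.GammaFromShapeAndScale(1.0, 1.0).ForEach(k)
  let weights = Variable.Dirichlet(k, [| 1.0; 1.0 |])

  // Each point picks a component z.[n], then draws from that Gaussian
  let n = Range(200)
  let z = Variable.Array<int>(n)
  let x = Variable.Array<float>(n)
  using (Variable.ForEach(n)) (fun _ ->
      z.[n] <- Variable.Discrete(weights)
      using (Variable.Switch(z.[n])) (fun _ ->
          x.[n] <- Variable.GaussianFromMeanAndPrecision(means.[z.[n]], precs.[z.[n]])))

  // Synthetic data: half from N(0, 1), half from N(4, 1)
  x.ObservedValue <- Array.init 200 (fun i -> (if i % 2 = 0 then 0.0 else 4.0) + Rand.Normal())

  // Random initialization of the assignments breaks the label symmetry
  let zInit = Array.init 200 (fun _ -> Discrete.PointMass(Rand.Int(2), 2))
  z.InitialiseTo(Distribution<int>.Array(zInit))

  let engine = InferenceEngine(VariationalMessagePassing())
  printfn "means   = %A" (engine.Infer(means))
  printfn "weights = %A" (engine.Infer(weights))
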
Gaussian Mixture (IronPython)
k-means (F#)
[Demo]
Conclusions
- Probabilistic programming is in its infancy
- We might produce alternative language definitions/implementations
- We can leverage it to test new models faster
- We can build custom models on the fly

References
[BayesSVM] Sollich, P. (2002). Bayesian Methods for Support Vector Machines: Evidence and Predictive Class Probabilities. Machine Learning, 46:21-52.

[ProbProg] The probabilistic programming wiki. http://probabilistic-programming.org/wiki/Home

[InferNET] Minka, T., Winn, J., Guiver, J., and Knowles, D. (2012). Infer.NET 2.5. Microsoft Research Cambridge. http://research.microsoft.com/infernet

[KRLST] Lázaro-Gredilla, M., Van Vaerenbergh, S., and Santamaría, I. (2011). A Bayesian approach to tracking with kernel recursive least-squares. MLSP 2011.