Computational aspects of Bayesian variable selection

Nicolas Chopin
(based on joint work with Christian Schäfer and James Ridgeway)
CREST-ENSAE

Outline

- Motivation
- Adaptive MC
- Sequential MC
- Multivariate binary distributions
- Numerical illustration

Variable selection
Motivation

- Bayesian variable selection typically requires sampling binary vectors γ: γi = 1 if covariate i is included, γi = 0 if it is not.
- Sampling binary vectors raises specific difficulties. In particular, MCMC tends to behave too "locally".
- Approximate methods (e.g. the shotgun algorithm) lack a strong justification, do not necessarily work well, and cannot reliably approximate posterior expectations (e.g. computing the median model).

Variable selection
Objective

- To sample binary vectors effectively using some form of adaptive Monte Carlo.
- The key to adaptivity will be the design of a parametric family of proposal distributions (defined on a binary space).

Variable selection
Variable selection in linear models

In this talk, we focus on linear regression models; the extension to generalized linear models is discussed towards the end.

- The relationship between the observed response y ∈ R^m and the covariates Z = [z_1, ..., z_d] ∈ R^{m×d} is

  y | β, γ, σ², Z ∼ N(Z diag[γ] β, σ² I).

- The parameter γ ∈ Γ = {0,1}^d determines which covariates are included in or dropped from the linear regression model.

Variable selection
Model choice criteria

There are several ways to assign a score to each model γ:

- Bayes: using appropriate prior distributions for β, σ² and γ, the marginal posterior distribution π(γ) = π(γ | y, Z) can be computed (see the sketch below).
- Frequentist: a given criterion −π(γ) (e.g. BIC, AIC) must be minimised.

The abstract problem: we want to integrate or optimise with respect to π(γ), defined on the multivariate binary space Γ = {0,1}^d.
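To make the Bayesian score concrete, here is a minimal sketch of a pointwise evaluator of log π(γ | y, Z), up to a constant, under Zellner's g-prior on the included coefficients and p(σ²) ∝ 1/σ² — an assumed prior for illustration; the slides only require that π(γ) be computable pointwise.

```python
import numpy as np

def log_post_gamma(gamma, y, Z, g=None):
    """Log marginal posterior of model gamma, up to an additive constant,
    under Zellner's g-prior (an assumption here, not fixed by the slides)."""
    m = len(y)
    g = float(m) if g is None else g       # unit-information choice g = m
    yc = y - y.mean()                      # centred response (intercept kept apart)
    tss = yc @ yc                          # total sum of squares
    idx = np.flatnonzero(gamma)
    k = idx.size
    if k == 0:
        return 0.0                         # null model: R^2 = 0 in the formula below
    Zc = Z[:, idx] - Z[:, idx].mean(axis=0)
    beta, *_ = np.linalg.lstsq(Zc, yc, rcond=None)
    r2 = ((Zc @ beta) @ yc) / tss          # R^2 of the OLS fit on selected columns
    return 0.5*(m - 1 - k)*np.log1p(g) - 0.5*(m - 1)*np.log1p(g*(1.0 - r2))
```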

Adaptive MC
Adaptive Monte Carlo

A Monte Carlo algorithm is adaptive if it can adapt itself, automatically and sequentially, using past simulations. Adaptive algorithms typically rely on a parametric family of proposal distributions {qθ(·), θ ∈ Θ} with the following properties:

1. One can sample from qθ(·).
2. One can estimate θ from current samples.
3. The family has reasonable modelling power.

Adaptive MC
Examples of adaptive MC

Integration:
- Adaptive importance sampling
- Adaptive MCMC
- SMC (sequential Monte Carlo)

Optimisation:
- Cross-entropy method (Rubinstein, 2002)

Importance sampling
Toy example

[Figure: bar plots of the uniform distribution π0 ≡ 1/8 and of the target π over the eight binary vectors 000, ..., 111.]

Suppose we want to sample from π using the uniform π0 as importance function.

Importance sampling
Draw an independent sample

[Figure: the sampled particles, each with weight wk ∝ 1, plotted over the support of π0.]

We draw an independent sample x1, ..., xn from the importance function π0: xk ∼ π0(γ).

Importance sampling
Compute the importance weights

[Figure: the same particles, now re-weighted; bar heights proportional to π(xk).]

We compute the importance weights

  wk ∝ π(xk) / π0(xk)

for each particle xk.

Importance sampling
Simple importance sampling

Importance sampling is based on the identity

  Eπ[ϕ(γ)] = Σγ π(γ) ϕ(γ)
           = Σγ π0(γ) [π(γ)/π0(γ)] ϕ(γ)
           = Eπ0[ {π(γ)/π0(γ)} ϕ(γ) ],

but the weight function π/π0 often induces too much variability.
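A minimal numpy sketch of the corresponding self-normalised estimator on Γ = {0,1}^d with the uniform proposal; `log_pi` stands for any pointwise evaluator of log π up to a constant (e.g. the hypothetical `log_post_gamma` above).

```python
import numpy as np

rng = np.random.default_rng(0)

def simple_is(log_pi, phi, d, n=10_000):
    """Self-normalised importance sampling with the uniform proposal
    pi0 on {0,1}^d; log_pi evaluates log pi up to a constant."""
    X = rng.integers(0, 2, size=(n, d))        # x_k ~ pi0 (uniform)
    logw = np.array([log_pi(x) for x in X])    # log pi0(x_k) is constant, so drop it
    w = np.exp(logw - logw.max())              # stabilised, unnormalised weights
    w /= w.sum()
    return np.sum(w * np.array([phi(x) for x in X]))
```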

Importance sampling
The step is too big

[Figure: π0 ≡ 1/8 next to π; the two distributions are far apart.]

We add intermediary distributions to obtain a smooth evolution.
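In the toy example the intermediary distributions are the mixtures πt = (1 − λt) π0 + λt π shown on the next slide; a small sketch of this path (the mixture form is read off the slides; a geometric bridge πt ∝ π0^{1−λt} π^{λt} would be a common alternative):

```python
import numpy as np

def intermediary(pi0, pi, n_steps):
    """Linear mixture path from pi0 to pi, as in the toy example:
    pi_t = (1 - lam) * pi0 + lam * pi, with lam = t / n_steps."""
    lams = np.linspace(0.0, 1.0, n_steps + 1)
    return [(1.0 - lam) * pi0 + lam * pi for lam in lams]

# toy example on {0,1}^3: pi0 uniform, pi an arbitrary target
pi0 = np.full(8, 1 / 8)
pi = np.array([.01, .02, .05, .12, .05, .15, .20, .40])
for t, pit in enumerate(intermediary(pi0, pi, 3)):
    print(t, pit.round(3))        # pi_0, pi_1 = (2/3)pi0 + (1/3)pi, ..., pi_3 = pi
```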

Sequential importance sampling
Evolution of intermediary distributions

[Figure: four bar plots showing π0 ≡ 1/8, π1 = (2/3)π0 + (1/3)π, π2 = (1/3)π0 + (2/3)π, and π3 = π.]

Sequential importance sampling
Evolution of intermediary importance weights

[Figure: four bar plots showing the successive weight functions.] The weights are updated recursively:

  wk^[0] ∝ π0(xk)/π0(xk) ∝ 1,
  wk^[1] ∝ wk^[0] · π1(xk)/π0(xk),
  wk^[2] ∝ wk^[1] · π2(xk)/π1(xk),
  wk^[3] ∝ wk^[2] · π3(xk)/π2(xk) ∝ π(xk).

Sequential importance sampling
Evolution of the particle system

[Figure: four bar plots of the weighted particle systems (xk^[0], wk^[0])_{k=1}^n through (xk^[0], wk^[3])_{k=1}^n.]

Sequential importance sampling
Monitor the weight degeneracy

[Figure: the weighted system (xk^[0], wk^[2])_{k=1}^n; a few particles carry most of the weight.]

As the weights become rather uneven, we stop the weighting and run a resample-move step.
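Unevenness of the weights is typically monitored through the effective sample size (ESS); the slides do not state the exact criterion, but resampling when the ESS drops below, say, n/2 is standard practice. A minimal sketch:

```python
import numpy as np

def ess(w):
    """Effective sample size of weights w: ESS = 1 / sum(w_k^2) after
    normalisation; ranges from 1 (degenerate) to n (uniform weights)."""
    w = np.asarray(w, dtype=float)
    w = w / w.sum()
    return 1.0 / np.sum(w ** 2)

# e.g. trigger a resample-move step when ess(w) < len(w) / 2
```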

Sequential Monte Carlo
Resample step

[Figure: the particle system (x̂k)_{k=1}^n after resampling.]

We draw an independent sample x̂1, ..., x̂n from x1, ..., xn with P(x̂ = xk) = wk^[2]. Approximately, x̂1, ..., x̂n is distributed according to π2. Particles with low weights have vanished; the system contains multiple copies of some particles.
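A minimal multinomial resampling step, the simplest scheme consistent with P(x̂ = xk) = wk (lower-variance schemes such as systematic resampling are common substitutes):

```python
import numpy as np

rng = np.random.default_rng(1)

def resample(X, w):
    """Multinomial resampling: draw n ancestors with P(ancestor = k) = w_k,
    then reset all weights to 1/n."""
    n = len(w)
    idx = rng.choice(n, size=n, p=w / np.sum(w))
    return X[idx], np.full(n, 1.0 / n)
```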

Sequential Monte Carlo
Move step

[Figure: the particle system after the move step; duplicated particles have been diversified.]

Let κ2 be an MCMC kernel with invariant measure π2, i.e. π2 κ2 = π2. The moved particle xk ∼ κ2(x̂k, ·) is still distributed according to π2.

Sequential Monte Carlo
Remark on the asymptotic validity of SMC

Even with these additional resample-move steps, the SMC algorithm is valid, in the following sense: at the final iteration, the weighted average

  Σ_{k=1}^n wk ϕ(xk) / Σ_{k=1}^n wk

is a consistent, asymptotically normal estimator of Eπ[ϕ]:

  √n ( Σ_{k=1}^n wk ϕ(xk) / Σ_{k=1}^n wk − Eπ[ϕ] ) → N(0, V(ϕ)) in distribution

(Chopin, 2004).

Sequential Monte Carlo
Choice of suitable MCMC transition kernels

A good choice is an independent Metropolis-Hastings kernel with proposal distribution qt on Γ. We sample from κt(ξ, ·) as follows (see the sketch below):

- Generate a proposal state γ ∼ qt.
- Return γ with probability 1 ∧ [πt(γ) qt(ξ)] / [πt(ξ) qt(γ)], and ξ otherwise.

This Markov kernel leaves πt invariant. The distribution qt should be chosen close to πt.
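A minimal sketch of one such independent MH step, on log scale for numerical stability; `sample_q` and `log_q` are placeholders for the fitted proposal:

```python
import numpy as np

rng = np.random.default_rng(2)

def imh_step(xi, log_pi_t, sample_q, log_q):
    """One independent Metropolis-Hastings step targeting pi_t.
    sample_q() draws a proposal from q_t; log_q evaluates its log-pmf."""
    gamma = sample_q()
    log_alpha = (log_pi_t(gamma) + log_q(xi)) - (log_pi_t(xi) + log_q(gamma))
    if np.log(rng.uniform()) < log_alpha:
        return gamma                 # accept the proposal
    return xi                        # reject: keep the current state
```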

Sequential Monte Carlo
Choice of a proposal distribution

For the choice of the proposal distribution qt, we refer to our initial discussion: we need a parametric family {qθ(·), θ ∈ Θ} such that

1. we can sample from qθ;
2. we can estimate θt from the particle approximation of πt;
3. the model qθt is flexible enough to come close to πt.

Binary models suitable for SMC
Independent binary model

For a mean vector θ = p ∈ (0,1)^d, we define a trivial independent binary model as

  q(γ | p) = ∏_{i=1}^d b(γi | pi),   b(γ | p) = p^γ (1 − p)^{1−γ}.

The model is very easy to fit and to sample from; however, it has rather limited modelling power.
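Fitting the independent model amounts to taking (weighted) componentwise particle means, and sampling to independent Bernoulli draws; a minimal sketch (the clamp on p is an implementation detail, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)

def fit_independent(X, w, eps=1e-3):
    """Weighted particle means give the Bernoulli parameters p_i."""
    p = np.average(X, axis=0, weights=w)
    return np.clip(p, eps, 1.0 - eps)          # keep p away from 0 and 1

def sample_independent(p, n):
    """Draw n binary vectors with independent components."""
    return (rng.uniform(size=(n, p.size)) < p).astype(int)

def logpmf_independent(gamma, p):
    """Log-pmf, as needed for importance weights and MH ratios."""
    return np.sum(gamma * np.log(p) + (1 - gamma) * np.log1p(-p))
```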

Binary models suitable for SMC
Logistic regression binary model

For a set of regression coefficients β = {β_i ∈ R^i, i ∈ D}, we define the logistic binary model as

  q(γ | β) = ∏_{i=1}^d b(γi | p(γ_{0:i−1}ᵀ β_i)),   p(x) = 1 / (1 + exp(−x)),

where b(γ | p) = p^γ (1 − p)^{1−γ}, and the first component γ0 = 1 is a constant added to the binary vector.
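Sampling from the logistic model is sequential: each component is drawn from a logistic regression on the components already drawn, and the log-pmf can be accumulated along the way (as needed for the MH ratio). A sketch, where `beta[i]` is the coefficient vector of component i, of length i + 1 including the constant:

```python
import numpy as np

rng = np.random.default_rng(4)

def sample_logistic_model(beta, d):
    """Draw one vector from the logistic binary model, together with
    its log-pmf, accumulated component by component."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    gamma = np.zeros(d, dtype=int)
    logq = 0.0
    for i in range(d):
        pred = np.concatenate(([1.0], gamma[:i]))  # gamma_0 = 1, then gamma_{1:i-1}
        p = sigmoid(pred @ beta[i])                # P(gamma_i = 1 | past components)
        gamma[i] = rng.uniform() < p
        logq += np.log(p) if gamma[i] else np.log1p(-p)
    return gamma, logq
```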

Binary models suitable for SMC
Calibration of the logistic model

Let xk, 1 ≤ k ≤ n, be a particle approximation to πt.

- We estimate a logistic regression model of dimension i for each component γi (see the sketch below). This is computationally demanding.
- There are d! different logistic binary models, one per ordering of the components. However, the order does not seem to matter in practice.
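A minimal calibration sketch using scikit-learn as the fitting routine — an implementation convenience, not the authors' code; the guard against single-class columns anticipates the complete-separation issue discussed below:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_logistic_model(X, w):
    """Fit one logistic regression per component gamma_i, regressing the
    particles' i-th column on columns 0..i-1 plus a constant."""
    n, d = X.shape
    beta = []
    for i in range(d):
        Zi = np.hstack([np.ones((n, 1)), X[:, :i]])  # constant + gamma_{1:i-1}
        yi = X[:, i]
        if yi.min() == yi.max():                     # degenerate component:
            b = np.zeros(i + 1)                      # fall back to a constant fit
            b[0] = 10.0 if yi[0] == 1 else -10.0
        else:
            lr = LogisticRegression(fit_intercept=False, C=10.0)
            lr.fit(Zi, yi, sample_weight=w)
            b = lr.coef_.ravel()
        beta.append(b)
    return beta
```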

Choice of the binary model
Calibration speed vs. modelling power

                      independent model   logistic model
  calibration speed   good                poor
  modelling power     poor                good

Choice of the binary model
Calibration speed vs. high acceptance rates

At the beginning of SMC, πt is still close to the uniform distribution:
- kernels based on the independent model achieve sufficient mutation rates;
- calibration of the algorithm is fast.

Towards the end of SMC, πt is close to π and thus rather complex:
- kernels based on the independent model fail to diversify the particles efficiently, so we switch to kernels based on the logistic model;
- calibration becomes computationally much more demanding.

Choice of the binary model
Some ideas to speed up the calibration

We exclude from the logistic model the components γi whose marginal probability Pπt(γi = 1) is close to either bound of the unit interval:
- these components often suffer from complete separation;
- reducing the dimension of the logistic model speeds up the calibration procedure.

We use the current parameter θt as the starting point of the iterative fitting procedure when calibrating θt+1:
- in our experiments, the number of iterations needed for convergence drops from about 15 to 3.4 on average;
- keeping track of starting points while the dimension of the model changes is a little involved.
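A sketch of the screening rule, with an assumed tolerance δ (the slides do not give a numerical threshold): components with extreme marginals are left to the independent model, the rest enter the logistic fit.

```python
import numpy as np

def split_components(X, w, delta=0.02):
    """Separate near-degenerate components (marginals within delta of 0
    or 1) from those kept in the logistic model. delta is an assumed
    tolerance, for illustration only."""
    p = np.average(X, axis=0, weights=w)           # marginal inclusion probs
    degenerate = (p < delta) | (p > 1.0 - delta)   # model these independently
    return np.flatnonzero(~degenerate), p
```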

Numerical illustration
The Boston housing example

This famous dataset contains information collected by the U.S. Census Service concerning housing in the area of Boston, Mass. The main question is: "how much am I ready to pay for clean air?". The response y(i) is the median value of owner-occupied homes; covariates include:
- nitric oxides concentration,
- per capita crime rate,
- distances to five Boston employment centres,
- ...

There are 105 covariates in total (including cross covariates), hence 2^105 models.

Numerical illustration
Setup

We integrate a Bayesian posterior with 105 covariates. The data come from the Boston Housing dataset; we crossed all columns to increase the number of covariates.

We compare:
1. sequential Monte Carlo using 20,000 particles; convergence requires about 1.1 · 10^6 evaluations of π;
2. Markov chain Monte Carlo with a symmetric MH kernel, run for about 2.5 · 10^6 evaluations of π.

Numerical illustration
Plot of estimates and variation

- We perform 200 runs of each algorithm.
- The white boxes contain 80% of the results.
- The black boxes indicate the outliers.
- The bar indicates the median.

Numerical illustration
Comparison

[Figure: boxplots of the estimated inclusion probabilities across the 200 runs, for (1) sequential Monte Carlo and (2) Markov chain Monte Carlo.]

Numerical illustration
Summary of simulations

Advantages of SMC over plain MCMC:
- adaptation is trivial if we employ suitable binary models;
- SMC is theoretically easy to parallelise (even twice);
- SMC beats plain MCMC in terms of evaluations of π.

Numerical illustration
Extension to generalized linear models

In generalized linear models, one cannot integrate out the coefficients β. Possible options are:
1. to define a parametric family of proposals for (β, γ);
2. to use the same algorithm as before, but with p(γ | y) replaced by a Laplace approximation; note that exact results can be obtained at the final stage through an importance sampling step;
3. to use the same algorithm as before, but with p(γ | y) replaced by an unbiased estimator of this quantity, in the spirit of PMCMC (Andrieu et al., 2011) or SMC² (Chopin et al., 2012); say we use importance sampling from the Laplace (Gaussian) approximation to the true posterior (conditional on γ).
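For option 2, here is a generic sketch of the Laplace approximation to log p(y | γ) for a logistic-response GLM with a N(0, vI) prior on the included coefficients — the specific prior and the Newton mode search are assumptions for illustration; the talk's probit example works the same way.

```python
import numpy as np

def laplace_log_evidence(y, Zg, v=10.0, iters=30):
    """Laplace approximation to log p(y | gamma) for logistic regression
    on the selected columns Zg, with a N(0, v I) prior on the coefficients:
    expand the log joint to second order around its mode."""
    n, k = Zg.shape
    beta = np.zeros(k)
    for _ in range(iters):                       # Newton ascent to the mode
        p = 1.0 / (1.0 + np.exp(-Zg @ beta))
        grad = Zg.T @ (y - p) - beta / v
        H = Zg.T @ (Zg * (p * (1 - p))[:, None]) + np.eye(k) / v
        beta += np.linalg.solve(H, grad)
    p = 1.0 / (1.0 + np.exp(-Zg @ beta))
    H = Zg.T @ (Zg * (p * (1 - p))[:, None]) + np.eye(k) / v
    loglik = np.sum(y * np.log(p) + (1 - y) * np.log1p(-p))
    logprior = -0.5 * (beta @ beta) / v - 0.5 * k * np.log(2 * np.pi * v)
    _, logdetH = np.linalg.slogdet(H)            # H = minus the Hessian at the mode
    return loglik + logprior + 0.5 * k * np.log(2 * np.pi) - 0.5 * logdetH
```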

Numerical illustration
Numerical illustration for a probit model

[Figure: estimated inclusion probabilities (0 to 1) for the 95 covariates DF_1, ..., DF_91, OXY_DIS, OXY_X, OXY_Y, OXY_Z of a probit model, for (1) Markov chain Monte Carlo and (2) sequential Monte Carlo (Laplace, no correction).]

Numerical illustration
Numerical illustration for a probit model (2)

[Figure: estimated inclusion probabilities for the same 95 covariates, for (1) sequential Monte Carlo (Laplace, with correction) and (2) SMC². Plot metadata: full-IS criterion, dim 95, 5 runs.]

Binary vector models

A little more on binary models (if time permits)...

Binary vector models
Latent variable models

For a vector µ ∈ R^d and a correlation matrix Σ ∈ R^{d×d}, we define the normal binary distribution as

  q(γ | µ, Σ) = P( 1_{(0,∞)}(vi) = γi, i ∈ D ),   v ∼ N(µ, Σ).

✗ We cannot evaluate the mass function q(γ | µ, Σ). The model is thus difficult to employ in an importance sampling context.
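Sampling from this model is nonetheless trivial; a minimal sketch, which also shows why the pmf is intractable (it is a d-dimensional Gaussian orthant probability):

```python
import numpy as np

rng = np.random.default_rng(5)

def sample_normal_binary(mu, Sigma, n):
    """Draw binary vectors by thresholding multivariate normals:
    gamma_i = 1 iff v_i > 0, with v ~ N(mu, Sigma). Evaluating the pmf
    of gamma would require a d-dimensional orthant probability, which
    is why the model is hard to use as an IS proposal."""
    V = rng.multivariate_normal(mu, Sigma, size=n)
    return (V > 0.0).astype(int)
```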

Binary vector models
Calibration of the normal model

Let xk, 1 ≤ k ≤ n, be a particle approximation to πt.

- Compute the sample moments sij = (1/n) Σ_{k=1}^n xk,i xk,j.
- Adjust µi and σij such that Φ1(µi) = sii and Φ2(µi, µj; σij) = sij, where Φ1 and Φ2 denote the univariate and bivariate normal distribution functions (see the sketch below).

✗ The locally adjusted matrix Σ might not be positive definite!
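A sketch of this moment matching using scipy: µi is recovered in closed form by inverting Φ1, and each σij by bisection on the bivariate normal probability. The library choice and the fallback for incompatible moments are assumptions here.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def calibrate_normal_model(X, w):
    """Moment-match the normal binary model to weighted particles:
    Phi1(mu_i) = s_ii fixes mu; each sigma_ij solves
    Phi2(mu_i, mu_j; sigma_ij) = s_ij by bisection."""
    n, d = X.shape
    S = (X * w[:, None]).T @ X / np.sum(w)          # cross-moments s_ij
    mu = norm.ppf(np.clip(np.diag(S), 1e-4, 1 - 1e-4))
    Sigma = np.eye(d)
    for i in range(d):
        for j in range(i):
            f = lambda r: multivariate_normal.cdf(
                [mu[i], mu[j]], mean=[0.0, 0.0],
                cov=[[1.0, r], [r, 1.0]]) - S[i, j]
            try:                                    # no root if s_ij is
                Sigma[i, j] = Sigma[j, i] = brentq(f, -0.999, 0.999)
            except ValueError:                      # incompatible with the model;
                pass                                # keep sigma_ij = 0
    return mu, Sigma    # NB: the adjusted Sigma may fail to be positive definite
```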

Binary vector models
Additive models

Consider additive models based on

  π(γ) = Σ_{I⊆D} cI ∏_{i∈I} γi.

✗ The quadratic binary model

  q(γ | A) = a0 + γᵀ A γ

is mostly not non-negative when fitted to data.
✗ If A is positive definite, the mean of q(γ | A) is typically around (1/2, ..., 1/2) and the correlation is low.

Binary vector models
Multiplicative models

Consider multiplicative models based on

  π(γ) = exp( Σ_{I⊆D} cI ∏_{i∈I} γi ).

✗ The quadratic exponential (log-linear) binary model

  q(γ | A) = exp(a0 + γᵀ A γ)

does not permit calculation of the marginal distributions.
✗ We cannot compute the decomposition q(γ) = q(γ1) ∏_{i=2}^d q(γi | γ_{1:i−1}), which is necessary for sampling from q(γ).

Conclusion
Other adaptive algorithms that may benefit from this work

- Adaptive MCMC.
- The cross-entropy method (for optimising π): in that case, there is no need to evaluate the proposal density pointwise. See the paper for some examples.

Conclusion
Conclusion

- In adaptive MC, keep in mind the three requirements for a parametric family of proposal distributions: (1) easy to sample from; (2) easy to fit; (3) reasonable modelling power.
- Sampling in {0,1}^d is really more difficult than sampling in R^d.
- We think we have covered most of the reasonable options for constructing a parametric family of proposal distributions for binary vectors.

Conclusion
References

- Schäfer, C. and Chopin, N. (2012). Adaptive Monte Carlo on multivariate binary sampling spaces. Statistics and Computing (in press).
- On-going work with James Ridgeway on generalized linear models.