
Conditions for Posterior Contraction in the
Sparse Normal Means Problem
S.L. van der Pas, J.-B. Salomond and J. Schmidt-Hieber
Bayes Club, April 22, 2016
Sparsity
Sparsity
Image courtesy of The Palomar Observatory
Sparsity
Detection of supernovae transients.
Clements N, Sarkar SK, Guo W. (2011) Astronomical transient detection using grouped p-values and controlling the
false discovery rate. Statistical Challenges in Modern Astronomy V. Eds. Feigelson ED, Babu GJ, Springer-Verlag.
Sparsity
Background pixel from the i-th night: N(µ_i, σ_i²).
Source j: strength θ_j.
Hypotheses:
H_0: θ_j = 0 vs. H_1: θ_j > 0.
Sparsity
Breast cancer gene expression data.
Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M. (2004) Sparse graphical models for exploring gene
expression data. Journal of Multivariate Analysis 90, 196-212.
Sparsity
Wavelet coefficients.
From James S. Walker, Wavelet-based Image Compression.
The sparse normal means problem

Nearly black vector θ ∈ ℓ_0[p_n]: at most p_n nonzeroes (signals).

Observe

Y_i = θ_i + ε_i, i = 1, ..., n,

where the ε_i are i.i.d. N(0, 1).

Assume: p_n → ∞ and p_n/n → 0 as n → ∞.
θ = (0, 0, 0, nonzero, 0, 0, 0, 0, ..., 0, 0, nonzero, 0, ..., 0)ᵀ
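As a small illustration (my own sketch, not from the talk), simulating one draw from this model in Python; the settings p_n = 10 and signal strength 5√(2 log n) match the simulations shown later:

    import numpy as np

    rng = np.random.default_rng(0)

    n, p_n = 1000, 10                     # p_n nonzeroes out of n
    signal = 5 * np.sqrt(2 * np.log(n))   # signal strength used in the talk's simulations

    theta = np.zeros(n)                   # nearly black vector
    theta[:p_n] = signal                  # place the p_n signals

    y = theta + rng.standard_normal(n)    # Y_i = theta_i + eps_i, eps_i i.i.d. N(0, 1)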
Minimax risk
As n, p_n → ∞:

inf_{θ̂ ∈ R^n} sup_{θ ∈ ℓ_0[p_n]} E_θ ‖θ̂ − θ‖² = 2 p_n log(n/p_n) (1 + o(1)),

where ‖·‖ denotes the ℓ_2 norm.

[Donoho, Johnstone, Hoch and Stern (1992)]
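As a quick numerical sanity check (my own sketch, not from the talk), soft thresholding at √(2 log(n/p_n)) attains this benchmark up to a modest constant:

    import numpy as np

    rng = np.random.default_rng(1)
    n, p_n = 100_000, 100

    theta = np.zeros(n)
    theta[:p_n] = 10.0                                         # strong signals
    y = theta + rng.standard_normal(n)

    lam = np.sqrt(2 * np.log(n / p_n))                         # near-minimax threshold
    theta_hat = np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)  # soft thresholding

    print(np.sum((theta_hat - theta) ** 2))                    # squared error loss
    print(2 * p_n * np.log(n / p_n))                           # minimax benchmark (same order)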
Goal
Assumption: there is some true θ0 generating the data.
Goal: recovery and optimal posterior contraction.
The horseshoe
Introduced by Carvalho, Polson and Scott (2010).
θ_i | λ_i, τ ~ N(0, τ²λ_i²), λ_i ~ C⁺(0, 1), i = 1, ..., n.

[Figure: the horseshoe prior density p(θ) for τ = 0.05 and τ = 1.]
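A minimal sketch (my own, assuming nothing beyond the hierarchy above) of sampling from the horseshoe prior, using that the absolute value of a standard Cauchy variable is half-Cauchy:

    import numpy as np

    def sample_horseshoe_prior(n, tau, rng):
        # lambda_i ~ C+(0, 1), theta_i | lambda_i, tau ~ N(0, tau^2 lambda_i^2)
        lam = np.abs(rng.standard_cauchy(n))
        return tau * lam * rng.standard_normal(n)

    rng = np.random.default_rng(2)
    theta = sample_horseshoe_prior(10_000, tau=0.05, rng=rng)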
The horseshoe works really well
- Great performance in many simulation studies.
- The posterior mean achieves the minimax rate. [vdP, Kleijn, van der Vaart (2014)]
- The posterior contracts at the minimax rate. [vdP, Kleijn, van der Vaart (2014)]

Caveat: τ can be at most of order (p_n/n)√(log(n/p_n)).
Why does the horseshoe work so well?
- Pole at zero.
- Heavy tails.

The balancing act leads to a thresholding effect.

[Figure: the horseshoe posterior mean as a function of the observation y.]
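To make the thresholding effect concrete, here is a quadrature sketch of my own (not code from the talk). Writing the horseshoe as the scale mixture θ | σ² ~ N(0, σ²) with mixing density π (given on the next slide), and y | θ ~ N(θ, 1), the posterior mean is E[θ | y] = y · E[σ²/(1 + σ²) | y]:

    import numpy as np
    from scipy import integrate, stats

    def pi_horseshoe(u, tau):
        # Mixing density of sigma_i^2 implied by the horseshoe prior.
        return 1.0 / (np.pi * tau * np.sqrt(u) * (1.0 + u / tau**2))

    def posterior_mean(y, tau):
        # Marginally y | u ~ N(0, 1 + u) for u = sigma^2, so average u/(1+u) against it.
        lik = lambda u: stats.norm.pdf(y, 0.0, np.sqrt(1.0 + u)) * pi_horseshoe(u, tau)
        num = integrate.quad(lambda u: u / (1.0 + u) * lik(u), 0.0, np.inf)[0]
        den = integrate.quad(lik, 0.0, np.inf)[0]
        return y * num / den

    for y in [0.5, 1.0, 2.0, 4.0, 6.0]:
        print(y, round(posterior_mean(y, tau=0.05), 3))  # near 0 for small y, near y for large y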
Scale mixtures of normals
θ_i | σ_i² ~ N(0, σ_i²), σ_i² ~ π(σ_i²), i = 1, ..., n,

where π : [0, ∞) → [0, ∞) is a density on the positive reals.

Examples:

- Horseshoe:

  π(u) = 1 / (πτ √u (1 + u/τ²)).

- Normal-gamma:

  π(u) = (β^τ / Γ(τ)) u^(τ−1) e^(−βu).
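As a sketch (my own), both example densities in Python with a numerical check that each integrates to one; note that the normal-gamma mixing density is exactly a Gamma(τ, β) density, so scipy's gamma distribution can stand in for it:

    import numpy as np
    from scipy import integrate, stats

    def pi_horseshoe(u, tau):
        return 1.0 / (np.pi * tau * np.sqrt(u) * (1.0 + u / tau**2))

    def pi_normal_gamma(u, tau, beta):
        # Gamma density with shape tau and rate beta.
        return stats.gamma.pdf(u, a=tau, scale=1.0 / beta)

    print(integrate.quad(pi_horseshoe, 0, np.inf, args=(0.05,))[0])        # ~ 1.0
    print(integrate.quad(pi_normal_gamma, 0, np.inf, args=(0.5, 1.0))[0])  # ~ 1.0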
Unrealistic assumption
We assume that pn , the number of nonzeroes, is known.
Result
Under three conditions on π(·):

- The posterior contracts at the minimax rate.
- The posterior mean achieves the minimax rate.
Conditions - examples
Global-local scale mixtures of normals:

θ_i | σ_i², τ² ~ N(0, σ_i²τ²), σ_i² ~ π̃(σ_i²), i = 1, ..., n,

with

π̃(u) = K u^(−(1+a)) L(u),

where K > 0 is a constant and L : (0, ∞) → (0, ∞) is a non-constant, slowly varying function. Examples:

- Horseshoe.
- Normal-exponential-gamma.
- Three parameter beta normal mixtures.
- Generalized double Pareto.
- Inverse gamma.
- Half-t.

Ghosh and Chakrabarti (2015): results assuming a ∈ [1/2, 1). Our conditions: a ≥ 1/2, with

τ^(2a) ≤ (p_n/n)√(log(n/p_n))   for a ∈ [1/2, 1),
τ² ≤ p_n/n                      for a = 1,
τ² ≤ (p_n/n)√(log(n/p_n))       for a > 1.
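As an illustration of the polynomial-tail requirement (my own sketch): the horseshoe's π̃(u) = 1/(π√u(1 + u)) behaves like a constant times u^(−3/2) in the tail, so a log-log slope estimate should return roughly −(1 + a) with a = 1/2:

    import numpy as np

    def pi_tilde_horseshoe(u):
        return 1.0 / (np.pi * np.sqrt(u) * (1.0 + u))

    u = np.logspace(2, 6, 50)                                      # deep in the tail
    slope = np.polyfit(np.log(u), np.log(pi_tilde_horseshoe(u)), 1)[0]
    print(slope)                                                   # ~ -1.5 = -(1 + a), a = 1/2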
Conditions - more examples
- Inverse-Gaussian. [Caron and Doucet (2008)]
  (p_n/n)^K ≲ τ ≲ (p_n/n)√(log(n/p_n)) for some K > 1.
- Horseshoe+. [Bhadra, Datta, Polson and Willard (2015)]
  (p_n/n)^K ≲ τ ≲ (p_n/n)(log(n/p_n))^(−1/2) for some K > 1.
- Normal-gamma. [Griffin and Brown (2005), Caron and Doucet (2008)]
  (p_n/n)^K ≲ τ ≲ (p_n/n)√(log(n/p_n)) for some K.
- Spike-and-slab Lasso. [Rockova (2015)]
  (p_n/n)^K ≲ ω ≲ (p_n/n)√(log(n/p_n)) ≤ 1/2 for some K > 1, and τ = (p_n/n)^α with α ≥ 1.
If it looks like a horseshoe and shrinks like a
horseshoe...
[Figure: mixing densities π(σ_i²) and marginal prior densities p(θ_i) for the horseshoe, inverse-Gaussian and normal-gamma priors, each for τ = 0.05 and τ = 1.]
Conditions - summary
Write s_n = (p_n/n) log(n/p_n).
Conditions - nonzero means
Write s_n = (p_n/n) log(n/p_n).
Condition for the nonzero means
There exist constants b′, C′ > 0, b, K ≥ 0 and u* ≥ 1 such that π(u) = L_n(u) e^(−bu) for a uniformly regular varying function L_n, and

C′ π(u) ≥ (p_n/n)^K e^(−b′u) for all u ≥ u*.

- The tail of π may decay exponentially fast.
- The n-dependence of π should behave roughly as a power of p_n/n.
Condition for the nonzero means - special case
For global-local scale mixtures of normals:

θ_i | σ_i², τ² ~ N(0, σ_i²τ²), σ_i² ~ π̃(σ_i²), i = 1, ..., n,

the following is sufficient:

- π̃ is uniformly regular varying and does not depend on n.
- τ = (p_n/n)^α for α ≥ 0.

For example, for the horseshoe,

π̃(u) = 1 / (π √u (1 + u)),

we have a^(−3/2) ≤ π̃(au)/π̃(u) ≤ 2a^(−1/2)(1 + a)^(−1) for all a ∈ [1, 2] and u ≥ 1.
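A quick numerical confirmation of these bounds (my own sketch, not from the talk):

    import numpy as np

    def pi_tilde(u):
        return 1.0 / (np.pi * np.sqrt(u) * (1.0 + u))

    a = np.linspace(1.0, 2.0, 21)[:, None]    # a in [1, 2]
    u = np.logspace(0.01, 6.0, 200)[None, :]  # u >= 1

    ratio = pi_tilde(a * u) / pi_tilde(u)     # pi~(au) / pi~(u)
    lower = a ** (-1.5)
    upper = 2.0 * a ** (-0.5) / (1.0 + a)

    print(np.all(ratio >= lower), np.all(ratio <= upper))  # True True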
Conditions - zero means
Write s_n = (p_n/n) log(n/p_n).
Conditions for the zero means
Multiple choice! Choose

- Condition A;
- Condition B; or
- Conditions 2 and 3.
Conditions A and B
s_n = (p_n/n) log(n/p_n)

Condition A: There exists a constant C such that

π(u) ≤ C (p_n/n) √(log(n/p_n)) u^(−3/2) for all u ≥ s_n.

Condition B: There exists a constant C such that

∫_{s_n}^{∞} π(u) du ≤ C p_n/n.

Nearly all mass is contained in the shrinking interval [0, s_n].
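As an illustration (my own sketch): for the horseshoe mixing density with the largest allowed τ = (p_n/n)√(log(n/p_n)), the elementary bound 1 + u/τ² ≥ u/τ² gives π(u) ≤ (τ/π) u^(−3/2), i.e. Condition A with C = 1/π, which the grid check below confirms:

    import numpy as np

    n, p_n = 10_000, 10
    s_n = (p_n / n) * np.log(n / p_n)
    tau = (p_n / n) * np.sqrt(np.log(n / p_n))   # largest tau allowed for the horseshoe

    def pi_horseshoe(u, tau):
        return 1.0 / (np.pi * tau * np.sqrt(u) * (1.0 + u / tau**2))

    u = np.logspace(np.log10(s_n), 8, 400)       # grid of u >= s_n
    bound = tau * u ** (-1.5) / np.pi            # C (p_n/n) sqrt(log(n/p_n)) u^(-3/2), C = 1/pi
    print(np.all(pi_horseshoe(u, tau) <= bound)) # True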
Conditions 2 and 3
Condition 2: There is a constant C > 0 such that ∫_{0}^{1} π(u) du ≥ C.

The prior π keeps an amount of mass bounded away from zero on [0, 1].

Condition 3: Write b_n = √(log(n/p_n)). There is a constant C > 0 such that

∫_{s_n}^{1} u π(u) du + ∫_{1}^{b_n²} min(u, b_n³/√u) π(u) du + b_n ∫_{b_n²}^{∞} u^(−1/2) π(u) du ≤ C s_n.

The decay of π away from a neighborhood of zero should be fast.
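A numerical check of Conditions 2 and 3 for the horseshoe (my own sketch, using the integrals as reconstructed above and τ = (p_n/n)√(log(n/p_n))):

    import numpy as np
    from scipy import integrate

    n, p_n = 10_000, 10
    s_n = (p_n / n) * np.log(n / p_n)
    b_n = np.sqrt(np.log(n / p_n))
    tau = (p_n / n) * np.sqrt(np.log(n / p_n))

    def pi_hs(u):
        return 1.0 / (np.pi * tau * np.sqrt(u) * (1.0 + u / tau**2))

    cond2 = integrate.quad(pi_hs, 0.0, 1.0)[0]    # should be bounded away from 0
    t1 = integrate.quad(lambda u: u * pi_hs(u), s_n, 1.0)[0]
    t2 = integrate.quad(lambda u: min(u, b_n**3 / np.sqrt(u)) * pi_hs(u), 1.0, b_n**2)[0]
    t3 = b_n * integrate.quad(lambda u: pi_hs(u) / np.sqrt(u), b_n**2, np.inf)[0]

    print(cond2)              # ~ 1, so Condition 2 holds
    print(t1 + t2 + t3, s_n)  # left-hand side of Condition 3 vs. s_n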
Conditions - summary
Write s_n = (p_n/n) log(n/p_n).
Are the conditions sharp?
Condition 3(κ_n): There is a constant C > 0 such that

∫_{κ_n}^{1} u π(u) du + ∫_{1}^{b_n²} min(u, b_n³/√u) π(u) du + b_n ∫_{b_n²}^{∞} u^(−1/2) π(u) du ≤ C s_n.

We cannot relax Condition 3 to ∫_{s_n}^{1} u π(u) du ≲ t_n for t_n ≫ s_n.

Theorem: For any positive sequence (κ_n)_n tending to zero, there exists a prior π satisfying Conditions 2 and 3(κ_n), and a positive sequence (M_n)_n tending to infinity, such that

E_{θ_0=0} Π(θ : ‖θ‖₂² ≤ M_n log n | Y^n) → 0, as n → ∞.
Simulations
The simulation results are not surprising
[Figure: MISE for p_n = 10, all means, comparing Lasso (λ = 2n/log(n)), Lasso (λ = 1), GC (a = 0.1), GC (a = 0.4), normal-gamma and horseshoe.]

p_n = 10, nonzero means equal to 5√(2 log n).
The simulation results are not surprising
[Figure: MISE for the nonzero means.]

p_n = 10, nonzero means equal to 5√(2 log n).
The simulation results are not surprising
[Figure: MISE for the zero means, n ranging from 100 to 1000.]

p_n = 10, nonzero means equal to 5√(2 log n).
What’s next?
- Adaptivity
- Model selection
- Credible sets

[Figure: marginal 95% credible sets, empirical Bayes with the MMLE.]
Conclusion
The horseshoe is not special.
... or is it?