Conditions for Posterior Contraction in the Sparse Normal Means Problem
S.L. van der Pas, J.-B. Salomond and J. Schmidt-Hieber
Bayes Club, April 22, 2016

[1/36] Conclusion
The horseshoe is not special.

[2/36] Sparsity

[3/36] Sparsity
[Image courtesy of The Palomar Observatory]

[4/36] Sparsity
Detection of supernova transients.
Clements N, Sarkar SK, Guo W. (2011). Astronomical transient detection using grouped p-values and controlling the false discovery rate. In: Statistical Challenges in Modern Astronomy V (Eds. Feigelson ED, Babu GJ), Springer-Verlag.

[5/36] Sparsity
Background pixel from the i-th night: N(μ_i, σ_i²).
Source j: strength θ_j.
Hypotheses:
  H0: θ_j = 0
  H1: θ_j > 0

[6/36] Sparsity
Breast cancer gene expression data.
Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M. (2004). Sparse graphical models for exploring gene expression data. Journal of Multivariate Analysis 90, 196-212.

[7/36] Sparsity
Wavelet coefficients.
From James S. Walker, Wavelet-based Image Compression.

[8/36] The sparse normal means problem
Nearly black vector θ ∈ ℓ0[p_n]: at most p_n nonzeroes (signals).
Observe
  Y_i = θ_i + ε_i,  i = 1, ..., n,  where ε_i ~ N(0, 1), i.i.d.
Assume: p_n → ∞ and p_n/n → 0 as n → ∞.
[Figure: the vector θ, mostly zeroes with a few nonzero entries.]

[9/36] Minimax risk
As n, p_n → ∞:
  inf_{θ̂ ∈ R^n} sup_{θ ∈ ℓ0[p_n]} E_θ ‖θ̂ − θ‖² = 2 p_n log(n/p_n) (1 + o(1)),
where ‖·‖ denotes the ℓ2 norm.
[Donoho, Johnstone, Hoch and Stern (1992)]

[10/36] Goal
Assumption: there is some true θ_0 generating the data.
Goal: recovery and optimal posterior contraction.

[11/36] The horseshoe
Introduced by Carvalho, Polson and Scott (2010):
  θ_i | λ_i, τ ~ N(0, τ² λ_i²),  λ_i ~ C⁺(0, 1),  i = 1, ..., n.
[Figure: horseshoe prior density of θ for τ = 0.05 and τ = 1.]

[12/36] The horseshoe works really well
- Great performance in many simulation studies.
- Posterior mean achieves the minimax rate. [vdP, Kleijn, van der Vaart (2014)]
- Posterior contracts at the minimax rate.
[vdP, Kleijn, van der Vaart (2014)]
Caveat: τ can be at most of order (p_n/n)√(log(n/p_n)).

[13/36] Why does the horseshoe work so well?
- Pole at zero.
- Heavy tails.
The balancing act leads to a thresholding effect.
[Figure: illustration of the thresholding effect as a function of the observation y.]

[14/36] Scale mixtures of normals
  θ_i | σ_i² ~ N(0, σ_i²),  σ_i² ~ π(σ_i²),  i = 1, ..., n,
where π : [0, ∞) → [0, ∞) is a density on the positive reals.
Examples:
- Horseshoe: π(u) = 1 / (πτ √u (1 + u/τ²)).
- Normal-gamma: π(u) = (β^τ / Γ(τ)) u^{τ−1} e^{−βu}.

[15/36] Unrealistic assumption
We assume that p_n, the number of nonzeroes, is known.

[16/36] Result
Under three conditions on π(·):
- The posterior contracts at the minimax rate.
- The posterior mean achieves the minimax rate.

[17/36] Conditions - examples
Global-local scale mixtures of normals:
  θ_i | σ_i², τ² ~ N(0, σ_i² τ²),  σ_i² ~ π̃(σ_i²),  i = 1, ..., n,
with π̃(u) = K u^{−(1+a)} L(u), where K > 0 is a constant and L : (0, ∞) → (0, ∞) is a non-constant, slowly varying function. Examples:
- Horseshoe.
- Normal-exponential-gamma.
- Three parameter beta normal mixtures.
- Generalized double Pareto.
- Inverse gamma.
- Half-t.
Ghosh and Chakrabarti (2015): results assuming a ∈ [1/2, 1).
Our conditions: a ≥ 1/2, with
  a ∈ [1/2, 1):  τ^{2a} ≤ (p_n/n)√(log(n/p_n))
  a = 1:         τ² ≤ p_n/n
  a > 1:         τ² ≤ (p_n/n)√(log(n/p_n))

[18/36] Conditions - more examples
- Inverse-Gaussian [Caron and Doucet (2008)]:
  (p_n/n)^K ≲ τ ≲ (p_n/n)√(log(n/p_n)) for some K > 1.
- Horseshoe+ [Bhadra, Datta, Polson and Willard (2015)]:
  (p_n/n)^K ≲ τ ≲ (p_n/n)(log(n/p_n))^{−1/2} for some K > 1.
- Normal-gamma [Griffin and Brown (2005), Caron and Doucet (2008)]:
  (p_n/n)^K ≲ τ ≲ (p_n/n)√(log(n/p_n)) for some K.
- Spike-and-slab Lasso [Rockova (2015)]:
  (p_n/n)^K ≲ ω ≲ (p_n/n)√(log(n/p_n)), ω ≤ 1/2, for some K > 1, and τ = (p_n/n)^α with α ≥ 1.

[19/36] If it looks like a horseshoe and shrinks like a horseshoe...
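As a numerical aside (a sketch, not part of the talk): the horseshoe mixing density from slide 14 is exactly the density of σ_i² = (τλ_i)² with λ_i ~ C⁺(0, 1), and the substitution u = τ²v gives the closed-form CDF (2/π) arctan(√u/τ). The snippet below checks both facts by Monte Carlo; the function names are my own, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)

def horseshoe_mixing_density(u, tau):
    """Slide 14: pi(u) = 1 / (pi * tau * sqrt(u) * (1 + u / tau^2))."""
    return 1.0 / (np.pi * tau * np.sqrt(u) * (1.0 + u / tau**2))

def horseshoe_mixing_cdf(u, tau):
    """Closed form via u = tau^2 * v: (2/pi) * arctan(sqrt(u) / tau)."""
    return (2.0 / np.pi) * np.arctan(np.sqrt(u) / tau)

tau = 0.05
# sigma_i^2 = (tau * lambda_i)^2 with lambda_i ~ C+(0, 1) has exactly this density:
lam = np.abs(rng.standard_cauchy(size=200_000))
u_samples = (tau * lam) ** 2

# The CDF at tau^2 is (2/pi) * arctan(1) = 1/2, so tau^2 is the median.
print(np.mean(u_samples <= tau**2))        # close to 0.5
print(horseshoe_mixing_cdf(tau**2, tau))   # exactly 0.5
```

The heavy 1/u^{3/2} tail of this density (visible in the Monte Carlo draws) is what produces the horseshoe's heavy-tailed marginal prior on θ_i.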
[Figure: mixing densities π(σ_i²) (top row) and marginal prior densities p(θ_i) (bottom row) for τ = 0.05 and τ = 1. Horseshoe, Inverse-Gaussian and normal-gamma.]

[20/36] Conditions - summary
Write s_n = (p_n/n)√(log(n/p_n)).

[21/36] Conditions - nonzero means
Write s_n = (p_n/n)√(log(n/p_n)).

[22/36] Condition for the nonzero means
There exist constants b′, C′ > 0, b, K ≥ 0 and u* ≥ 1 such that π(u) = L_n(u) e^{−bu} for a uniformly regular varying function L_n, and
  π(u) ≥ C′ (p_n/n)^K e^{−b′u}  for all u ≥ u*.
- The tail of π may decay exponentially fast.
- The n-dependence of π should behave roughly as a power of p_n/n.

[23/36] Condition for the nonzero means - special case
For global-local scale mixtures of normals,
  θ_i | σ_i², τ² ~ N(0, σ_i² τ²),  σ_i² ~ π̃(σ_i²),  i = 1, ..., n,
the following is sufficient:
- π̃ is uniformly regular varying and does not depend on n.
- τ = (p_n/n)^α for α ≥ 0.
For example, for the horseshoe,
  π̃(u) = (1/π) · 1/(√u (1 + u)),
we have
  a^{−3/2} ≤ π̃(au)/π̃(u) ≤ 2 a^{−1/2} (1 + a)^{−1}  for all a ∈ [1, 2] and u ≥ 1.

[24/36] Conditions - zero means
Write s_n = (p_n/n)√(log(n/p_n)).

[25/36] Conditions for the zero means
Multiple choice! Choose
- Condition A;
- Condition B; or
- Conditions 2 and 3.

[26/36] Conditions A and B
Recall s_n = (p_n/n)√(log(n/p_n)).
Condition A: There exists a constant C such that
  π(u) ≤ C (p_n/n) √(log(n/p_n)) u^{−3/2}  for all u ≥ s_n.
Condition B: There exists a constant C such that
  ∫_{s_n}^∞ π(u) du ≤ C p_n/n.
Nearly all mass is contained in the shrinking interval [0, s_n].

[27/36] Conditions 2 and 3
Condition 2: There is a constant C > 0 such that ∫_0^1 π(u) du ≥ C.
The prior π puts a fixed positive amount of mass on [0, 1].
Condition 3: Write b_n = √(log(n/p_n)). There is a constant C > 0 such that
  ∫_{s_n}^1 u π(u) du + ∫_1^∞ min(u, b_n³/√u) π(u) du + b_n ∫_1^{b_n²} π(u)/√u du ≤ C s_n.
The decay of π away from a neighborhood of zero should be fast.
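The regular-variation bounds claimed on slide 23 for the horseshoe can be checked numerically. The sketch below (my own sanity check, not part of the talk; NumPy assumed) evaluates π̃(au)/π̃(u) on a grid of a ∈ [1, 2] and u ≥ 1 and verifies a^{−3/2} ≤ π̃(au)/π̃(u) ≤ 2a^{−1/2}(1+a)^{−1}:

```python
import numpy as np

def hs_pi_tilde(u):
    """Slide 23: pi~(u) = (1/pi) * u^(-1/2) * (1 + u)^(-1)."""
    return 1.0 / (np.pi * np.sqrt(u) * (1.0 + u))

a_grid = np.linspace(1.0, 2.0, 101)
# u from 1 up to 1e6 to probe the tail behaviour as well:
u_grid = np.concatenate([np.linspace(1.0, 100.0, 400), np.logspace(2.0, 6.0, 100)])

for a in a_grid:
    ratio = hs_pi_tilde(a * u_grid) / hs_pi_tilde(u_grid)
    # lower bound a^(-3/2) is approached as u -> infinity,
    # upper bound 2 a^(-1/2) (1+a)^(-1) is attained at u = 1
    assert np.all(ratio >= a ** -1.5 - 1e-12)
    assert np.all(ratio <= 2.0 * a ** -0.5 / (1.0 + a) + 1e-12)

print("regular-variation bounds verified on the grid")
```

The two bounds follow because (1+u)/(1+au) is decreasing in u for a ≥ 1, with value 2/(1+a) at u = 1 and limit 1/a as u → ∞.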
[28/36] Conditions - summary
Write s_n = (p_n/n)√(log(n/p_n)).

[29/36] Are the conditions sharp?
Condition 3(κ_n): There is a constant C > 0 such that
  κ_n ∫_{s_n}^1 u π(u) du + ∫_1^∞ min(u, b_n³/√u) π(u) du + b_n ∫_1^{b_n²} π(u)/√u du ≤ C s_n.
We cannot relax ∫_{s_n}^1 u π(u) du ≲ s_n to ∫_{s_n}^1 u π(u) du ≲ t_n for t_n ≫ s_n.
Theorem: For any positive sequence (κ_n)_n tending to zero, there exists a prior π satisfying Conditions 2 and 3(κ_n), and a positive sequence (M_n)_n tending to infinity, such that
  E_{θ_0=0} Π(θ : ‖θ‖₂² ≤ M_n p_n log(n/p_n) | Y^n) → 0,  n → ∞.

[30/36] Simulations

[31/36] The simulation results are not surprising
[Figure: MISE for p_n = 10, all means; Lasso (λ = 2n/log(n)), Lasso (λ = 1), GC (a = 0.1), GC (a = 0.4), normal-gamma and horseshoe.]
p_n = 10, nonzero means equal to 5√(2 log n).

[32/36] The simulation results are not surprising
[Figure: MISE for the nonzero means.]
p_n = 10, nonzero means equal to 5√(2 log n).

[33/36] The simulation results are not surprising
[Figure: MISE for the zero means, as a function of n (100 to 1000).]
p_n = 10, nonzero means equal to 5√(2 log n).

[34/36] What's next?
- Adaptivity
- Model selection
- Credible sets
[Figure: marginal 95% credible sets, empirical Bayes with MMLE.]

[35/36] Conclusion
The horseshoe is not special.

[36/36] ... or is it?
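As a closing numerical aside (my own sketch, not the talk's actual simulation study): the simulation setup on slides 30-33 and the minimax benchmark 2 p_n log(n/p_n) from slide 9 can be illustrated with a simple non-Bayesian baseline, soft thresholding at √(2 log(n/p_n)). All names below are hypothetical; NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(42)

n, p_n = 1000, 10
theta = np.zeros(n)
theta[:p_n] = 5.0 * np.sqrt(2.0 * np.log(n))  # nonzero means as in the slides
y = theta + rng.standard_normal(n)            # Y_i = theta_i + eps_i

# Soft thresholding at t = sqrt(2 log(n / p_n)), a classical sparse estimator:
t = np.sqrt(2.0 * np.log(n / p_n))
theta_hat = np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

sq_err = np.sum((theta_hat - theta) ** 2)
benchmark = 2.0 * p_n * np.log(n / p_n)       # minimax rate from slide 9
print(sq_err / benchmark)                     # a modest constant, close to 1
```

Most of the squared error comes from the p_n signal coordinates (each shrunk by roughly t, contributing about 1 + t² apiece), while nearly all of the n − p_n zero coordinates are estimated exactly as zero, which is why the total lands within a small factor of the minimax benchmark.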