On using Sufficient Statistics in (Power) Expected Posterior Priors for Bayesian Model Comparison

Dimitris Fouskakis, Department of Mathematics, School of Applied Mathematical and Physical Sciences, National Technical University of Athens, Athens, Greece; e-mail: [email protected].

Joint work with:
Ioannis Ntzoufras, Department of Statistics, Athens University of Economics and Business, Athens, Greece; e-mail: [email protected]
& Luis Pericchi, Department of Mathematics, University of Puerto Rico, San Juan, PR, USA; e-mail: [email protected]

Presentation is available at: www.math.ntua.gr/∼fouskakis/Presentations/Cancun/Presentation_Cancun.pdf

July 2014: Twelfth World Meeting of ISBA, Cancun, Mexico

Synopsis

1. Bayesian Model Comparison
2. Expected-Posterior Prior (EPP)
3. Power-Expected-Posterior Prior (PEP)
4. EPP & PEP using Sufficient Statistics
5. Example 1: Normal Mean Hypothesis Testing
6. Example 2: Normal Linear Regression
7. An Alternative Approach
8. Discussion

Bayesian Model Comparison

Within the Bayesian framework the comparison between models M0 and M1 is evaluated via the Posterior Odds (PO)

  PO_{M0,M1} ≡ f(M0|y) / f(M1|y) = [f(y|M0) / f(y|M1)] × [π(M0) / π(M1)] = BF_{M0,M1} × O_{M0,M1},   (1)

which is a function of the Bayes Factor BF_{M0,M1} and the Prior Odds O_{M0,M1}. In the above, f(y|M) is the marginal likelihood under model M and π(M) is the prior probability of model M. The marginal likelihood is given by

  f(y|M) = ∫ f(y|θ, M) π(θ|M) dθ,   (2)

where f(y|θ, M) is the likelihood under model M with parameters θ and π(θ|M) is the prior distribution of the model parameters given model M.

Expected-Posterior Priors (EPP)

Pérez & Berger (2002, Biometrika) developed priors for use in model comparison through the device of imaginary training samples.
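As a small numerical illustration of (1)–(2), the sketch below estimates a marginal likelihood by plain Monte Carlo over the prior and forms the resulting Bayes factor and posterior odds. The models, the proper N(0, 1) prior, and the data here are illustrative assumptions made for the sketch; they are not the priors used later in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (an assumption for this sketch): y_i ~ N(mu, 1), n = 20.
# M0: mu = 0;  M1: mu unknown with a proper N(0, 1) prior, for simplicity.
y = rng.normal(0.2, 1.0, size=20)
n = len(y)

def log_lik(y, mu):
    """Normal log-likelihood with unit variance."""
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.sum((y - mu) ** 2)

# Marginal likelihood of M1 via Monte Carlo over the prior (eq. 2):
# f(y|M1) = E_{pi(mu|M1)}[ f(y|mu, M1) ], averaged on the log scale for stability.
mus = rng.normal(0.0, 1.0, size=100_000)
ll = -0.5 * n * np.log(2 * np.pi) - 0.5 * ((y[None, :] - mus[:, None]) ** 2).sum(axis=1)
log_m1 = np.logaddexp.reduce(ll) - np.log(len(mus))

log_m0 = log_lik(y, 0.0)               # M0 has no free mean parameter
log_BF01 = log_m0 - log_m1             # Bayes factor BF_{M0,M1}
PO01 = np.exp(log_BF01) * (0.5 / 0.5)  # posterior odds with equal prior odds (eq. 1)
```

For this conjugate toy setup the marginal likelihood of M1 is also available in closed form, which makes the Monte Carlo estimate easy to sanity-check.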
They defined the expected-posterior prior (EPP) as the posterior distribution of the parameter vector of the model under consideration, averaged over all possible imaginary samples y* = (y*_1, …, y*_{n*})^T coming from a “suitable” predictive distribution m*(y*). Hence the EPP for the parameters of any model Mℓ is

  π_ℓ^{EPP}(θ_ℓ) = ∫ π_ℓ^N(θ_ℓ|y*) m*(y*) dy*,   (3)

where π_ℓ^N(θ_ℓ|y*) is the posterior of θ_ℓ for model Mℓ using a baseline prior π_ℓ^N(θ_ℓ) and data y*.

Features of EPP

• The EPP naturally arises as the posterior distribution averaged over all possible samples coming from a predictive measure, which is usually the prior predictive of a baseline model. In nested cases the baseline model is usually the simplest model; in this case the EPP coincides with the Intrinsic Prior.
• The EPP is a method to make priors compatible across models, through their dependence on a common marginal data distribution.
• One of the advantages of using EPPs is that impropriety of the baseline priors causes no indeterminacy. Impropriety in m* also causes no indeterminacy, because m* is common to the EPPs of all models.
• Usually we choose the smallest n* for which the posterior is proper; this is the minimal training sample size.
• Main issue: in variable selection problems, the specification of X*_ℓ.
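Definition (3) suggests a direct sampling scheme for the EPP: draw an imaginary sample y* from m*, then draw θ from the baseline posterior given y*. A minimal sketch for a normal mean with known variance, a flat baseline prior on μ, and m* taken as the prior predictive of the null model — all simplifying assumptions chosen so the resulting EPP has a known closed form:

```python
import numpy as np

rng = np.random.default_rng(1)
n_star, N = 5, 200_000   # training sample size and number of EPP draws

# Baseline model M0: y ~ N(0, 1), sigma known; m*(y*) is its sampling distribution.
# Model M1: y ~ N(mu, 1) with flat baseline prior pi^N(mu) ∝ 1, so that
# pi^N(mu | y*) = N(mean(y*), 1/n_star).
y_star = rng.normal(0.0, 1.0, size=(N, n_star))       # imaginary samples from m*
mu_draws = rng.normal(y_star.mean(axis=1), 1.0 / np.sqrt(n_star))

# Averaging the baseline posteriors over y*, as in (3), gives the EPP;
# in this toy case it is N(0, 2/n_star).
```

The empirical mean and variance of `mu_draws` match the N(0, 2/n*) form, which is the known-σ analogue of the intrinsic prior for this testing problem.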
Power-Expected-Posterior (PEP) Priors

Starting from the EPP,

  π_ℓ^{EPP}(θ_ℓ) = ∫ π_ℓ^N(θ_ℓ|y*) m*(y*) dy*,

we substitute the likelihood terms with density-normalized power-likelihoods (as in Fouskakis, Ntzoufras & Draper, 2014), giving

  π_ℓ^{PEP}(θ_ℓ; δ) = ∫ π_ℓ^N(θ_ℓ|y*, δ) m*(y*; δ) dy*,

with

  f(y*|β_ℓ, σ², Mℓ; X*_ℓ, δ) ∝ f(y*|β_ℓ, σ², Mℓ; X*_ℓ)^{1/δ} = f_{N_{n*}}(y*; X*_ℓ β_ℓ, δσ² I_{n*}).

We can set δ = n* and n* = n, and therefore X*_ℓ = X_ℓ; in this way we also dispense with the selection of the training samples.

Features of PEP

The PEP-prior method amalgamates ideas from Intrinsic Priors, EPPs, Unit Information Priors and Power Priors, to unify ideas of non-data objective priors. Problems that PEP aims to solve (while keeping the advantages of Intrinsic Priors and EPPs):

• Dependence on the training sample size.
• Lack of robustness with respect to sample irregularities.
• Excessive weight of the prior when the number of parameters is close to the number of data points.

PEP solves these while still being a fully objective method: choose δ = n and X*_ℓ = X_ℓ.

Sensitivity analysis on imaginary sample size

Figure 1: Posterior marginal inclusion probabilities, for n* values from 17 to n = 50, with the PEP prior methodology (simulated example for a variable selection problem in a normal linear model). [Figure: inclusion probabilities of X1, X5, X6, X7, X10, X11 and X13 plotted against the imaginary/training sample size n*.]

Sensitivity analysis on imaginary sample size (cont.)

Figure 2: Boxplots of the posterior distributions of the regression coefficients.
For each coefficient, the left-hand boxplot summarizes the EPP results and the right-hand boxplot displays the PEP posteriors; solid lines in both identify the MLEs. We used the first 20 observations from the simulated data-set and a randomly selected training sample of size n* = 17. [Figure: boxplots for β1–β15 against the regression coefficients.]

Features of PEP (cont.)

• In Fouskakis, Ntzoufras & Draper, 2014 (Bayesian Analysis, forthcoming) we illustrated the robustness of PEP with respect to the training sample size,
• and also that PEP is not overly informative when the number of parameters is close to the number of data points.
• However, the intrinsic priors implied by the PEP method, and more generally the theoretical properties of PEP, were until now still unexplored.

Features of PEP (cont.)

In this talk we:

1. Derive the PEP priors for important models, including general linear Gaussian models.
   – For normal models the PEP prior can be expressed as a mixture of g-priors.
   – The PEP priors have attractive characteristics, such as being centered around the null model (Savage’s continuity condition) and not concentrating (point masses) around the null as the training sample size is allowed to grow.

Features of PEP (cont.)

2. Prove the equivalence between EPP & PEP based on individual training samples and EPP & PEP based on sufficient statistics.
   – Practical advantage: sufficient statistics are far cheaper to generate than large training samples.
   – Tentative research idea: use different versions of the sampling distributions of the sufficient statistics to construct the EPP & PEP priors.

Features of PEP (cont.)

3.
Prove that PEP induces a model selection method that:
   i) for fixed p and fixed n, is free of information inconsistency (this is NOT the case for g-priors);
   ii) for fixed p and growing n, is consistent.

EPP & PEP using Sufficient Statistics

• The idea is to express the EPP and PEP priors as an average over all possible sets of sufficient statistics based on imaginary data coming from the baseline (simplest) model, instead of over all possible sets of imaginary data.
• This can result in a great reduction of the dimensionality of the problem. It is beneficial especially in PEP, where the dimension is n, and also in cases where the prior and the posterior are not available in closed form.
• Assumptions:
  – n > d_ℓ for all ℓ.
  – M0 ⊂ Mℓ for all ℓ. Hence the minimal sufficient statistic of each model under consideration is always a sufficient statistic of M0.

For the Regular Exponential Family

Suppose that the likelihood (under model Mℓ) has the following form:

  f(y*|θ_ℓ, Mℓ) = h_ℓ(y*_1) ⋯ h_ℓ(y*_δ) c_ℓ[θ_ℓ]^δ exp( Σ_{i=1}^{k_ℓ} w_{ℓi}(θ_ℓ) Σ_{j=1}^{δ} t_{ℓi}(y*_j) ).

Then T*_ℓ = (T_{ℓ1}(y*), …, T_{ℓk_ℓ}(y*))^T, with T_{ℓi}(y*) = Σ_{j=1}^{δ} t_{ℓi}(y*_j), i = 1, …, k_ℓ, is a sufficient statistic (of dimension k_ℓ) for θ_ℓ under model Mℓ. If the set {(w_{ℓ1}(θ_ℓ), …, w_{ℓk_ℓ}(θ_ℓ)): all θ_ℓ}, for all ℓ, contains an open set in R^{k_ℓ}, the distribution of T*_ℓ has the form

  f_{T*_ℓ}(u*_1, …, u*_{k_ℓ}|θ_ℓ, M*) = H_ℓ(u*_1, …, u*_{k_ℓ}) c_ℓ[θ_ℓ]^δ exp( Σ_{i=1}^{k_ℓ} w_{ℓi}(θ_ℓ) u*_i ).   (4)

In the above we assume that the imaginary data have been generated from a model M* (that is why we condition on M* in the formula).

Details (cont.)
  π_ℓ^{EPPSS}(θ_ℓ) = ∫ π_ℓ^N(θ_ℓ|T*_ℓ) G_ℓ(T*_ℓ|M*) dT*_ℓ
  π_ℓ^{PEPSS}(θ_ℓ|δ) = ∫ π_ℓ^N(θ_ℓ|T*_ℓ, δ) G_ℓ(T*_ℓ|M*, δ) dT*_ℓ

With the above definition we still have π_0^{EPPSS}(θ_0) = π_0^{PEPSS}(θ_0|δ) = π_0^N(θ_0).

Specification of the hyper-prior G_ℓ:

  G_ℓ(T*_ℓ|M* = M0) = ∫ g_ℓ(T*_ℓ|θ_0, M0) π_0^N(θ_0) dθ_0.

We consider g_ℓ to be the actual sampling distribution of T*_ℓ for model Mℓ, given that the imaginary data have been generated from model M0 (i.e. M* = M0), i.e.

  g_ℓ(u*_1, …, u*_{k_ℓ}|θ_0, M0) = f_{T*_ℓ}(u*_1, …, u*_{k_ℓ}|θ_0, M0).

Theorem (omitting details)

Let Y* = (Y*_1, …, Y*_δ)^T be independent and identically distributed random variables. Let us further consider T*_ℓ = (T_{ℓ1}(Y*), …, T_{ℓk_ℓ}(Y*))^T to be a (minimal) sufficient statistic of dimension k_ℓ < n for the parameter vector θ_ℓ under model Mℓ. We assume that the reference model M0 is nested in Mℓ, and therefore T*_ℓ is also a sufficient statistic for θ_0 under model M0. If g_0(T*_ℓ|θ_0) is the sampling distribution of T*_ℓ under the reference model M0, then

  π_ℓ^{EPPSS}(θ_ℓ) = ∫ π_ℓ^N(θ_ℓ|T*_ℓ) m_0(T*_ℓ) dT*_ℓ
                   = ∫ π_ℓ^N(θ_ℓ|T*_ℓ) [ ∫ g_0(T*_ℓ|θ_0) π_0^N(θ_0) dθ_0 ] dT*_ℓ
                   = ∫ π_ℓ^N(θ_ℓ|Y*) m_0(Y*) dY* = π_ℓ^{EPP}(θ_ℓ).

Remark: The proof of the above theorem is general and holds for any likelihood function; therefore it holds for PEP as well.

Example 1: Normal Mean Hypothesis Testing

Let y = (y1, …, yn)^T be a random sample from Normal(μ, σ²). We would like to test the hypothesis H0: μ = 0 versus H1: μ ≠ 0. Hence the two competing models are M0: Normal(0, σ²) versus M1: Normal(μ, σ²) (μ ≠ 0). The baseline (reference) priors under the two models are given by π_0^N(σ) ∝ σ^{-1} and π_1^N(μ, σ) ∝ σ^{-2}. Let y* = (y*_1, …, y*_δ)^T be a training (imaginary) sample of size δ ≤ n.
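The theorem can be illustrated by simulation in a stripped-down version of the normal setting just introduced — taking σ known and a flat baseline prior on μ are simplifying assumptions made only for this sketch. Generating μ through full imaginary samples y*, or only through the sampling distribution of the sufficient statistic T = ȳ*, yields draws from the same EPP:

```python
import numpy as np

rng = np.random.default_rng(4)
delta, N = 5, 500_000    # imaginary-sample size and number of EPP draws

# Route 1: full imaginary samples. y* ~ m0 (here M0: N(0, 1), sigma known),
# then mu ~ pi^N(mu | y*) = N(mean(y*), 1/delta) under a flat baseline prior on mu.
y_star = rng.normal(0.0, 1.0, size=(N, delta))
mu_full = rng.normal(y_star.mean(axis=1), 1.0 / np.sqrt(delta))

# Route 2: sufficient statistic only. Under M0, T = mean(y*) ~ N(0, 1/delta),
# and the baseline posterior given T is pi^N(mu | T) = N(T, 1/delta).
T = rng.normal(0.0, 1.0 / np.sqrt(delta), size=N)
mu_ss = rng.normal(T, 1.0 / np.sqrt(delta))

# Both routes target the same EPP; in this toy case it is N(0, 2/delta).
```

The two sets of draws have matching means and variances, which is exactly the content of the theorem: averaging the baseline posterior over imaginary samples, or over the sampling distribution of their sufficient statistic, produces the same prior — at a fraction of the simulation cost when δ is large.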
It is well known that the sufficient statistic for (μ, σ) under M1 is

  T = (T1, T2) = ( Ȳ, Σ_{i=1}^{δ} (Y*_i − Ȳ)² ).

For the following analysis we use M0 as the reference model.

Example 1: EPP using sufficient statistics

The distribution of T is

  T1 ~ Normal(μ, σ²/δ) and T2 ~ Gamma((δ−1)/2, 1/(2σ²)),   (5)

with T1 and T2 independent. The EPP using the distribution of the sufficient statistics is given by

  π_1^{EPPSS}(μ, σ) = π_1^N(μ, σ) E_{T1*,T2*|μ,σ}[ m_0^N(T1*, T2*) / m_1^N(T1*, T2*) ]   (Intrinsic Prior equation)
                    = σ^{-1} ∫_0^1 f_N(μ; 0, σ²/(δt)) f_B(t; (δ−1)/2, δ/2) dt.

Example 1: PEP using sufficient statistics

Under the PEP approach we consider the normalized version of the power likelihood, resulting in

  f(y*|μ, σ, δ) = f_N(y*; μ1_δ, δσ² I_δ).

The distribution of T now is

  T1 ~ Normal(μ, σ²) and T2 ~ Gamma((δ−1)/2, 1/(2δσ²)),   (6)

with T1 and T2 independent. Then the PEP using the distribution of the sufficient statistics is given by

  π_1^{PEPSS}(μ, σ) = π_1^N(μ, σ) E_{T1*,T2*|μ,σ}[ m_0^N(T1*, T2*) / m_1^N(T1*, T2*) ]   (Intrinsic Prior equation)
                    = σ^{-1} ∫_0^1 f_N(μ; 0, σ²/t) f_B(t; (δ−1)/2, δ/2) dt.

Example 1: Comparison

  Method | π_1(μ, σ) | E[μ|σ] | V[μ|σ]
  EPP | σ^{-1} ∫_0^1 f_N(μ; 0, σ²/(δt)) f_B(t; (δ−1)/2, δ/2) dt | 0 | (σ²/δ)(2δ−3)/(δ−3) → 0 as δ → ∞
  PEP | σ^{-1} ∫_0^1 f_N(μ; 0, σ²/t) f_B(t; (δ−1)/2, δ/2) dt | 0 | σ²(2δ−3)/(δ−3) → 2σ² as δ → ∞

Example 2: Normal Linear Regression

Let y = (y1, …, yn)^T be a random sample. We would like to compare the models:

  M0: Normal(y|X0 β0, σ0² I), with baseline prior π_0^N(β0, σ0) ∝ σ0^{−(1+k0)}, versus
  M1: Normal(y|X1 β1, σ1² I), with baseline prior π_1^N(β1, σ1) ∝ σ1^{−(1+k1)},

where X0 is an (n × k0) matrix, X1 is an (n × k1) matrix, k0 < k1, and M0 is nested in M1. Let y* = (y*_1, …, y*_δ)^T be a training (imaginary) sample of size δ ≤ n, and let X*_0 and X*_1 denote the corresponding design matrices. It is well known that the sufficient statistic for (β1, σ1) under M1 is

  T = ( β̂1, RSS1 ),

i.e. the least-squares estimate and the residual sum of squares. For the following analysis we use M0 as the reference model.

Example 2: EPP & PEP using sufficient statistics

  Method | π_1(β1, σ1)
  EPP | σ1^{−(k0+1)} ∫_0^1 f_N(β_e; 0, (σ1²/t) V) f_B(t; (δ+k0−k1)/2, (δ−k0)/2) dt
  PEP | σ1^{−(k0+1)} ∫_0^1 f_N(β_e; 0, (δσ1²/t) V) f_B(t; (δ+k0−k1)/2, (δ−k0)/2) dt

In the above table we have β1 = (β0^T, β_e^T)^T, X*_1 = [X*_0 | X*_e] and

  V^{−1} = X*_e^T ( I_δ − X*_0 (X*_0^T X*_0)^{−1} X*_0^T ) X*_e.

Example 2: EPP & PEP using sufficient statistics (cont.)

  Method | E[β_e|β0, σ1] | V[β_e|β0, σ1]
  EPP | 0 | (2δ−3)/(δ+k0−3) σ1² V → 0 as δ → ∞
  PEP | 0 | δ(2δ−3)/(δ+k0−3) σ1² V → 2σ1² A^{−1} as δ → ∞

where we assume that A = lim_{δ→∞} (1/δ) V^{−1} is a positive semi-definite matrix.

PEP & Information Consistency

For any model Mi, if {y_m, m = 1, 2, …} is a sequence of data vectors of fixed size such that, as m → ∞,

  Λ_{i0}(y_m) = sup_{a,β_i} f_i(y_m|a, β_i) / sup_a f_0(y_m|a) → ∞,

then BF_{i0}(y_m) → ∞.

PEP & Model Selection Consistency

For fixed p and growing n, the PEP method is consistent; i.e. BF_{jT}^{PEP} → 0 as n → ∞, for any Mj ≠ MT, where MT is the true data-generating regression model.

An Alternative Approach

In this case we consider the part of the likelihood (or of the power-likelihood, under the PEP approach) of y*_1, …
, y*_δ that depends on both T*_ℓ and θ_ℓ, and we normalize it with respect to T*_ℓ in order to form a pdf for T*_ℓ. Therefore

  g_ℓ(u*_1, …, u*_{k_ℓ}|θ_ℓ, M0) = exp( Σ_{i=1}^{k_ℓ} w_{ℓi}(θ_ℓ) u*_i ) / ∫ exp( Σ_{i=1}^{k_ℓ} w_{ℓi}(θ_ℓ) u*_i ) du*_1 … du*_{k_ℓ}.

Again, in the above we assume that the imaginary data have been generated from the model M0 (i.e. M* = M0). Furthermore, under this approach the EPP (or PEP) under model M0 is again equal to the reference prior π_0^N(θ_0).

Example 1 (cont.)

  Method | π_1(μ, σ) | E[μ|σ] | V[μ|σ]
  EPP 1 | σ^{-1} ∫_0^1 f_N(μ; 0, σ²/(δt)) f_B(t; (δ−1)/2, δ/2) dt | 0 | (σ²/δ)(2δ−3)/(δ−3)
  EPP 2 | σ^{-1} ∫_0^1 f_N(μ; 0, σ²/(δt)) f_B(t; 1, 3/2) dt | 0 | ∞ (for δ fixed); 0 (for δ → ∞)
  PEP 1 | σ^{-1} ∫_0^1 f_N(μ; 0, σ²/t) f_B(t; (δ−1)/2, δ/2) dt | 0 | σ²(2δ−3)/(δ−3)
  PEP 2 | σ^{-1} ∫_0^1 f_N(μ; 0, σ²/t) f_B(t; 1, 3/2) dt | 0 | ∞ (for any δ)

Here methods “1” use the sampling distribution of the sufficient statistics and methods “2” use the alternative (normalized-likelihood) approach.

Example 2 (cont.)

  Method | π_1(β1, σ1)
  EPP 1 | σ1^{−(k0+1)} ∫_0^1 f_N(β_e; 0, (σ1²/t) V) f_B(t; (δ+k0−k1)/2, (δ−k0)/2) dt
  EPP 2 | σ1^{−(k0+1)} ∫_0^1 f_N(β_e; 0, (σ1²/t) V) f_B(t; (k0+2)/2, (k1−k0+2)/2) dt
  PEP 1 | σ1^{−(k0+1)} ∫_0^1 f_N(β_e; 0, (δσ1²/t) V) f_B(t; (δ+k0−k1)/2, (δ−k0)/2) dt
  PEP 2 | σ1^{−(k0+1)} ∫_0^1 f_N(β_e; 0, (δσ1²/t) V) f_B(t; (k0+2)/2, (k1−k0+2)/2) dt

In the above table we have β1 = (β0^T, β_e^T)^T, X*_1 = [X*_0 | X*_e] and

  V^{−1} = X*_e^T ( I_δ − X*_0 (X*_0^T X*_0)^{−1} X*_0^T ) X*_e.

Example 2 (cont.)

  Method | E[β_e|β0, σ1] | V[β_e|β0, σ1]
  EPP 1 | 0 | (2δ−3)/(δ+k0−3) σ1² V → 0 as δ → ∞
  EPP 2 | 0 | ((k1+2)/k0) σ1² V → 0 as δ → ∞ (k0 ≠ 0)
  PEP 1 | 0 | δ(2δ−3)/(δ+k0−3) σ1² V → 2σ1² A^{−1} as δ → ∞
  PEP 2 | 0 | δ((k1+2)/k0) σ1² V → ((k1+2)/k0) σ1² A^{−1} as δ → ∞ (k0 ≠ 0)

where we assume that A = lim_{δ→∞} (1/δ) V^{−1} is a positive semi-definite matrix.
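The variance columns of the Example 1 tables can be checked numerically. Under the “PEP 1” mixture, μ|t ~ N(0, σ²/t) with t ~ Beta((δ−1)/2, δ/2), so V[μ|σ] = σ² E[1/t] = σ²(2δ−3)/(δ−3); under the alternative “PEP 2” mixing density Beta(1, 3/2), E[1/t] diverges, which is the ∞ entry. A sketch with arbitrary illustrative values of σ and δ:

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, delta, N = 1.5, 20, 2_000_000

# "PEP 1": t ~ Beta((delta-1)/2, delta/2), then mu | t ~ N(0, sigma^2 / t).
t = rng.beta((delta - 1) / 2.0, delta / 2.0, size=N)
mu = rng.normal(0.0, sigma / np.sqrt(t))
V_mc = mu.var()
V_table = sigma ** 2 * (2 * delta - 3) / (delta - 3)   # table entry for PEP 1

# "PEP 2": mixing density Beta(1, 3/2), i.e. f_B(t) = 1.5 * sqrt(1 - t).
# The prior variance is proportional to E[1/t]; truncated Riemann sums show the
# expectation growing without bound (roughly like 1.5 * log(1/eps)) as eps -> 0.
def trunc_E_inv_t(eps, m=2_000_000):
    s = np.linspace(eps, 1.0, m)
    return np.sum(1.5 * np.sqrt(1.0 - s) / s) * (s[1] - s[0])

vals = [trunc_E_inv_t(e) for e in (1e-2, 1e-4, 1e-6)]
```

The Monte Carlo variance agrees with the closed-form table entry, while the truncated integrals keep increasing, matching the infinite prior variance reported for the alternative approach at any fixed δ.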
Discussion

• PEP priors have some attractive characteristics, such as being centered around the null model and not concentrating (point masses) around the null as the training sample size is allowed to grow.
• We proved the equivalence between EPP & PEP based on individual training samples and EPP & PEP based on sufficient statistics.
• Alternative approach: use of a different sampling distribution of the sufficient statistic to construct the PEP priors. Results are promising, but a formal justification is still missing.
• PEP induces a model selection method that:
  i) for fixed p and fixed n, is free of information inconsistency;
  ii) for fixed p and growing n, is consistent;
  iii) for fixed p, is robust with respect to the training sample size.

Thank You Mexico!