SELECTED TOPICS ON BOOTSTRAP (OR: BACK TO STATISTICAL THEORY)
Dana Kaner, M.Sc. Seminar in Statistics, June 2017

APPROXIMATE LIKELIHOODS - INTRODUCTION
- Suppose that we have data $x = (x_1, \ldots, x_n)$, $n$ i.i.d. observations from $F$ (non-parametric).
- Our statistic $\hat\theta = t(\hat F)$ estimates the parameter of interest $\theta = t(F)$.
- We seek an approximate likelihood function for $\theta$.
- Why?
  - The likelihood can combine information from multiple independent experiments.
  - In order to construct (along with the prior distribution) a Bayesian posterior distribution for inference.

APPROXIMATE LIKELIHOODS - INTRODUCTION
- First, we'll look at a parametric likelihood of $\theta$: $L(\theta) = \prod_{i=1}^n f(x_i \mid \theta)$.
- In most situations $f(x \mid \cdot)$ depends on additional "nuisance" parameters $\nu$ besides $\theta$.
- How to get rid of $\nu$ once we have $f(x \mid \theta, \nu)$? This is our main objective.
- The profile likelihood: $L_{prof}(\theta) = L(\theta, \hat\nu_\theta)$.
- The marginal likelihood: $L_{marg}(\theta) = f_v(v \mid \theta)$, where $v = v(x)$ is a function whose density $f_v(v \mid \theta)$ involves only $\theta$.
- The problem: we don't always know $f(x \mid \theta, \nu)$ → approximate likelihoods.

THE EMPIRICAL LIKELIHOOD
- The nonparametric likelihood for $F$: $L(F) = \prod_{i=1}^n F(\{x_i\})$, where $F(\{x_i\})$ is the probability of $x_i$ under $F$.
- The profile likelihood for $\theta$: $L_{prof}(\theta) = \sup_{F : t(F) = \theta} L(F)$.
- Computation difficulties → restrict the problem to $F_w$, a discrete distribution with probability mass $w_i$ on $x_i$, $i = 1, \ldots, n$.
- The empirical likelihood: $L_{emp}(\theta) = \sup_{w : t(F_w) = \theta} \prod_{i=1}^n w_i$ → $n$ parameters. This is the probability of obtaining our sample under $F_w$.

THE EMPIRICAL LIKELIHOOD - EXAMPLE
- Test scores data set $X$ (a random 22 students out of 88, $p = 5$).
- $\theta$ = estimate of the maximal eigenvalue of the covariance matrix.
- Yellow curve: based on the normal approximation.
- $\hat\theta \sim N(\theta, \hat\sigma^2_{BS})$, so the yellow curve is $L_{app}(\theta) = \exp\!\big(-(\theta - \hat\theta)^2 / (2\hat\sigma^2_{BS})\big)$.
- Green curve: the empirical likelihood, $L_{emp}(\theta) = \sup_{w : t(F_w) = \theta} \prod_{i=1}^n w_i$.

[Figure: both likelihood curves plotted against $\theta$ over roughly 200-1200, rescaled to a maximum of 1.]

THE EMPIRICAL LIKELIHOOD - GOOD TO KNOW
- Computation is difficult (the example is an approximation).
- Attractive properties:
  - It is possible to show that in suitable smooth problems, the likelihood ratio derived from the empirical likelihood satisfies $-2 \log\big(L_{emp}(\theta) / L_{emp}(\hat\theta)\big) \sim \chi^2_1$.
  - Works well with transformations ($\theta \to g(\theta)$).
  - Simple extension to multiple parameters of interest.

APPROXIMATE PIVOT METHODS
- Assume $\hat\theta - \theta = W \sim H$, where $\theta$ is the real value under $F$ and $W$ is a pivotal quantity (its distribution does not depend on any unknown parameters) → we can estimate $H$ for the single value $\theta = \hat\theta$ and then infer its value for all $\theta$. How can we estimate $H$?
- Let $\hat\theta^* - \hat\theta = W^* \sim \hat H$.
- Usually $H$ cannot be given in closed form; it is estimated from $B$ BS samples.

APPROXIMATE PIVOT METHODS
- From $x = (x_1, \ldots, x_n)$ drawn from $\hat F$ ($\hat\theta$ computed based on $X$), draw BS samples $X^*_1, \ldots, X^*_B$ and compute $W^*_b = \hat\theta^*_b - \hat\theta$, $b = 1, \ldots, B$.
- $\hat H$: the empirical distribution of $W^*_1, \ldots, W^*_B$.

APPROXIMATE PIVOT METHODS
- Let $\hat h(w)$ be the kernel density estimate of the density $h$, based on $W^*_1, \ldots, W^*_B$:
  $\hat h(w) = \frac{1}{Bs} \sum_{b=1}^B k\!\Big(\frac{w - W^*_b}{s}\Big)$.
- The approximate marginal likelihood $L(\theta)$:
  $L(\theta) \approx f(\hat\theta \mid \theta) = \hat h(\hat\theta - \theta) = \frac{1}{Bs} \sum_{b=1}^B k\!\Big(\frac{(\hat\theta - \theta) - W^*_b}{s}\Big) = \frac{1}{Bs} \sum_{b=1}^B k\!\Big(\frac{2\hat\theta - \theta - \hat\theta^*_b}{s}\Big)$.

APPROXIMATE PIVOT METHODS - EXAMPLE
- Blue curve: approximate likelihood using $B = 100$ and a Gaussian kernel with a manually chosen window width $s$.
- Yellow curve: $L_{app}(\theta) = \exp\!\big(-(\theta - \hat\theta)^2 / (2\hat\sigma^2_{BS})\big)$.

[Figure: both curves plotted against $\theta$ over roughly 200-1200, rescaled to a maximum of 1.]

BOOTSTRAP PARTIAL LIKELIHOOD
- Similarly to the approximate pivot methods, the goal is to estimate a marginal likelihood $L(\theta) = f(\hat\theta \mid \theta)$.
- The difference: instead of relying on the pivot assumption, this method computes the estimated density $f(\hat\theta \mid \theta)$ based on nested BS samples.
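The pivot-based approximate likelihood of the preceding slides is easy to prototype. The following is a minimal Python sketch (not from the original slides; the data, the statistic, and the rule-of-thumb bandwidth are illustrative assumptions): bootstrap the pivot $W^* = \hat\theta^* - \hat\theta$, then evaluate the kernel estimate $\hat h(\hat\theta - \theta)$ on a grid.

```python
import numpy as np

rng = np.random.default_rng(0)

def pivot_likelihood(x, stat, B=200, s=None):
    """Approximate likelihood L(theta) ~ h_hat(theta_hat - theta), where
    h_hat is a Gaussian-kernel density estimate built from the bootstrap
    pivots W*_b = theta*_b - theta_hat."""
    n = len(x)
    theta_hat = stat(x)
    w = np.array([stat(rng.choice(x, size=n, replace=True)) - theta_hat
                  for _ in range(B)])
    if s is None:
        s = 1.06 * w.std() * B ** (-1 / 5)  # rule-of-thumb bandwidth (assumption)

    def L(theta):
        u = (theta_hat - theta - w) / s     # ((theta_hat - theta) - W*_b) / s
        return np.exp(-0.5 * u ** 2).sum() / (B * s * np.sqrt(2 * np.pi))

    return theta_hat, L

x = rng.exponential(scale=10.0, size=30)    # illustrative sample
theta_hat, L = pivot_likelihood(x, np.mean)
grid = np.linspace(theta_hat - 10, theta_hat + 10, 101)
vals = np.array([L(t) for t in grid])
vals /= vals.max()                          # rescale so the maximum is 1, as in the plots
```

Since the bootstrap pivots are centered near zero, the rescaled curve peaks close to $\hat\theta$, mirroring the blue curve in the example.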
BOOTSTRAP PARTIAL LIKELIHOOD
- From $x = (x_1, \ldots, x_n)$ drawn from $\hat F$ ($\hat\theta$ computed based on $X$), draw first-level BS samples $X^*_1, \ldots, X^*_{B_1}$ with estimates $\hat\theta^*_1, \ldots, \hat\theta^*_{B_1}$. From each $X^*_b$, draw $B_2$ second-level BS samples with estimates $\hat\theta^{**}_{b,1}, \ldots, \hat\theta^{**}_{b,B_2}$, and set
  $\hat f(t \mid \hat\theta^*_b) = \frac{1}{B_2 s} \sum_{j=1}^{B_2} k\!\Big(\frac{t - \hat\theta^{**}_{b,j}}{s}\Big)$.
- Now, for $t = \hat\theta$, $\hat f(\hat\theta \mid \hat\theta^*_b)$ provides an estimate of $L(\theta)$ at $\theta = \hat\theta^*_b$.
- For a smooth estimate $\tilde p(\hat\theta \mid \theta)$: we apply a scatterplot smoother to the pairs $\big(\hat\theta^*_b,\, \hat f(\hat\theta \mid \hat\theta^*_b)\big)_{b=1}^{B_1}$.

BOOTSTRAP PARTIAL LIKELIHOOD - EXAMPLE
- Red curve: bootstrap partial likelihood, $B_1 = B_2 = 40$ (overall 1600 BS samples).
- Yellow curve: $L_{app}(\theta) = \exp\!\big(-(\theta - \hat\theta)^2 / (2\hat\sigma^2_{BS})\big)$.
- The window size $s$ of the kernel function and the scatterplot smoother were chosen manually.

[Figure: both curves plotted against $\theta$ over roughly 200-1200.]

SUMMARY
- Warning: it is important to note that, in general, none of the described methods produces a true likelihood, i.e., a function proportional to the probability of a fixed event in the sample space.
- Green curve: the empirical likelihood.
- Yellow curve: $L_{app}(\theta) = \exp\!\big(-(\theta - \hat\theta)^2 / (2\hat\sigma^2_{BS})\big)$.
- Blue curve: approximate pivot likelihood.
- Red curve: bootstrap partial likelihood.

[Figure: all four curves plotted against $\theta$ over roughly 200-1200.]

PARAMETRIC AND NON-PARAMETRIC INFERENCE
- The objective of this section: study the relationship of bootstrap and jackknife methodologies to more traditional parametric approaches.
- We will focus on estimation of the variance of a statistic $\hat\theta$.
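As a baseline for the comparisons in this section, the nonparametric bootstrap variance estimate $\mathrm{var}_{\hat F}\, t(\hat F^*)$ can be sketched in a few lines (illustrative Python; the sample and the statistic are assumptions, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(1)

def boot_var(x, stat, B=2000):
    """Nonparametric bootstrap estimate of Var(t(F_hat)): resample x with
    replacement, recompute the statistic, and take the sample variance of
    the B replicates."""
    n = len(x)
    reps = np.array([stat(rng.choice(x, size=n, replace=True))
                     for _ in range(B)])
    return reps.var(ddof=1)

x = rng.normal(loc=5.0, scale=2.0, size=50)  # illustrative sample
v_boot = boot_var(x, np.mean)
v_formula = x.var(ddof=1) / len(x)           # classical estimate for the mean
```

For the mean, the bootstrap and the classical $s^2/n$ estimate agree up to Monte Carlo noise and a factor of $(n-1)/n$.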
[Diagram - variance-estimation methods. "Exact": Nonparametric Bootstrap, Parametric Bootstrap. Approximate: Jackknife, Infinitesimal Jackknife, Nonparametric delta (nonparametric side); Fisher information, Parametric delta (parametric side).]

INFLUENCE FUNCTIONS
- The influence function: $U(x, F) = \lim_{\epsilon \to 0} \frac{t\big((1-\epsilon)F + \epsilon\,\delta_x\big) - t(F)}{\epsilon}$.
- For $t(F) = E_F X$:
  $U(x, F) = \lim_{\epsilon \to 0} \frac{\int y\, d\big((1-\epsilon)F(y) + \epsilon\,\delta_x(y)\big) - \int y\, dF(y)}{\epsilon} = \lim_{\epsilon \to 0} \frac{-\epsilon \int y\, dF(y) + \epsilon \int y\, d\delta_x(y)}{\epsilon} = x - E_F X$.
- For $t(F) = \mathrm{median}_F(X)$:
  $U(x, F) = \frac{\mathrm{sign}\big(x - \mathrm{median}_F(X)\big)}{2 f\big(\mathrm{median}_F(X)\big)}$.

INFLUENCE FUNCTIONS
- Consider a plug-in estimate $\hat\theta = t(\hat F)$ of $\theta = t(F)$ and the expansion:
  $t(\hat F) = t(F) + \frac{1}{n}\sum_{i=1}^n U(x_i, F) + O_p\!\big(\tfrac{1}{n}\big)$.
- Consider $\hat F^*$, the empirical distribution corresponding to the BS sample $x^*$.
- $\mathrm{var}_{\hat F}\, t(\hat F^*)$ — the BS estimate of $\mathrm{Var}(t(\hat F))$ — satisfies
  $\mathrm{var}_F\, t(\hat F) \approx \frac{1}{n}\, \mathrm{var}_F\, U(X, F) = \frac{1}{n}\, E_F\, U^2(X, F)$, since $E_F\, U(X, F) = 0$.

JK, IJ & NONPARAMETRIC BS
- Suppose that $F = \hat F$, and in $U(x, \hat F)$ replace the limit $\epsilon \to 0$ by the finite difference at $\epsilon = -\frac{1}{n-1}$.
- We obtain something close to the JK variance estimate:
  $\mathrm{var}_{JK}\, t(\hat F) = \frac{n-1}{n} \sum_{i=1}^n \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2$.
- Now, suppose that $F = \hat F$ and $U(x_i, \hat F)$ is the exact derivative. The infinitesimal jackknife estimate:
  $\mathrm{var}_{IJ}\, t(\hat F) = \frac{1}{n}\, \hat E_{\hat F}\, U^2(X, \hat F) = \frac{1}{n^2} \sum_{i=1}^n U^2(x_i, \hat F)$.

JK, IJ & NONPARAMETRIC BS
- If $t(\hat F^*) = t(\hat F) + \frac{1}{n}\sum_{i=1}^n U(x^*_i, \hat F)$ (a linear statistic), then
  $\mathrm{var}_{\hat F}\, t(\hat F^*) = \frac{1}{n}\, \mathrm{var}_{\hat F}\, U(X, \hat F) = \frac{1}{n^2}\sum_{i=1}^n U^2(x_i, \hat F) = \mathrm{var}_{IJ}\, t(\hat F) \approx \frac{n-1}{n}\, \mathrm{var}_{JK}\, t(\hat F)$.

[Variance-estimation methods diagram repeated.]

THE PARAMETRIC BOOTSTRAP FOR MLE
- Consider a sample $x$ drawn from $F$, a known distribution function with parameter $\theta$. We would like to estimate a function of $\theta$ in the real world.
- First, we compute the maximum likelihood estimator $\hat\theta$.
- We draw $B$ bootstrap samples $x^*$ of size $n$ from the density $f_{\hat\theta}(x)$.
- We then calculate the maximum likelihood estimate $\hat\theta^*_b$ for each BS sample.
- We use the empirical distribution of $\{\hat\theta^*_1, \ldots, \hat\theta^*_B\}$ to estimate qualities of $\hat\theta$. For instance, the sample variance estimates $\mathrm{var}_F\, \hat\theta$.

THE PARAMETRIC BOOTSTRAP - EXAMPLE
- Normal model: $x \sim N(\theta, \sigma^2)$; $n = 9$; interested in the variance of the mean.
- 500 parametric BS samples of size 9 from $N(\bar x, \hat\sigma^2)$.
- Histogram of the 500 sample means $\bar x^*_1, \ldots, \bar x^*_{500}$.
- Superimposed on the histogram is the density $N(\bar x, \hat\sigma^2 n^{-1})$.
- Exponential model: the BS sample variance of $\bar x^*_1, \ldots, \bar x^*_{500}$ is 183.7, while $\hat\sigma^2 n^{-1} = 177.7$.

[Figure: histograms for the normal and exponential models with the superimposed densities.]

PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
- The score function: $S(\theta, x) = \frac{\partial \ell(\theta, x)}{\partial \theta} = \frac{\partial \log L(\theta, x)}{\partial \theta} = \sum_{i=1}^n \frac{\partial \log f(\theta, x_i)}{\partial \theta}$.
- The Fisher information: $i(\theta) = -E\Big[\frac{\partial^2 \ell(\theta, x)}{\partial \theta^2}\Big] = -E\Big[\sum_{i=1}^n \frac{\partial^2 \log f(\theta, x_i)}{\partial \theta^2}\Big]$.
- By the asymptotic distribution of the MLE, under certain conditions (regularity of $f$, $\theta$ in the interior of the parameter space): $\hat\theta_n \mathrel{\dot\sim} N\big(\theta,\, i(\theta)^{-1}\big)$.
- This suggests that the sampling distribution of $\hat\theta$ can be approximated by $N\big(\hat\theta,\, i(\hat\theta)^{-1}\big)$.

PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
- Suppose we have a vector of parameters $\eta$ and we want to conduct inference for a real-valued function $\theta = h(\eta)$. Let $\eta_0$ be the true value of $\eta$.
- Assuming $\hat\eta$ is the MLE, $\hat\theta = h(\hat\eta)$ is the MLE for $\theta$.
- Denote the parametric family of distribution functions for $x$ by $F_\eta$, and $F = F_{\eta_0}$.
- $S(\eta; x) = \Big(\frac{\partial \ell(\eta; x)}{\partial \eta_1}, \ldots, \frac{\partial \ell(\eta; x)}{\partial \eta_p}\Big)^T$; $i(\eta)$ is the matrix with entries $i(\eta)_{jk} = -E_F\Big[\frac{\partial^2 \ell(\eta; x)}{\partial \eta_j \partial \eta_k}\Big]$; $\nabla h(\eta) = \Big(\frac{\partial h(\eta)}{\partial \eta_1}, \ldots, \frac{\partial h(\eta)}{\partial \eta_p}\Big)^T$.

PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
- By the chain rule, $i\big(h(\eta)\big)^{-1} = \nabla h(\eta)^T\, i(\eta)^{-1}\, \nabla h(\eta)$.
- It can be shown that $\hat\theta_n = h(\hat\eta) \mathrel{\dot\sim} N\big(h(\eta_0),\ \nabla h(\eta_0)^T\, i(\eta_0)^{-1}\, \nabla h(\eta_0)\big)$,
- where $\eta_0$ can be replaced with $\hat\eta$, a consistent estimator.
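The parametric-bootstrap example above (normal model, $n = 9$) can be reproduced in outline. This sketch simulates its own data, so the slide's numbers 183.7 and 177.7 will not be matched exactly; the location and scale of the hypothetical sample are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Parametric bootstrap for the mean of a normal model, n = 9.
n = 9
x = rng.normal(loc=50.0, scale=20.0, size=n)  # hypothetical observed sample
mu_hat, sigma2_hat = x.mean(), x.var()         # MLEs under N(mu, sigma^2)

B = 500
# Draw B parametric BS samples from N(mu_hat, sigma2_hat) and keep their means.
boot_means = np.array([rng.normal(mu_hat, np.sqrt(sigma2_hat), size=n).mean()
                       for _ in range(B)])

v_boot = boot_means.var(ddof=1)  # parametric BS estimate of Var(x_bar)
v_theory = sigma2_hat / n        # variance of the superimposed N(mu_hat, sigma2_hat/n)
```

As on the slide, the bootstrap value and the plug-in formula $\hat\sigma^2 n^{-1}$ differ only by Monte Carlo noise.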
- A property of M-estimators: $U(x, F) = -\Big(\int \frac{\partial u(y, \theta)}{\partial \theta}\Big|_{\theta = t(F)}\, dF(y)\Big)^{-1} u\big(x, t(F)\big)$.
- For the MLE this gives $U(x, F_\eta) = i_1(\eta)^{-1} S_1(\eta; x)$, and for $\theta = h(\eta)$:
  $U(x, F_\eta) = \nabla h(\eta)^T\, i_1(\eta)^{-1}\, S_1(\eta; x)$,
  where $S_1$ and $i_1$ denote the score and the Fisher information of a single observation.

PARAMETRIC ML & BS INFERENCE, JACKKNIFE
- Now, using $\mathrm{var}_F\, t(\hat F) \approx \frac{1}{n} E_F\big(U^2(X, F)\big)$, we can see that:
  $\mathrm{var}\, t(\hat F) \approx \frac{1}{n}\, E_F\big[\nabla h(\eta)^T i_1(\eta)^{-1} S_1(\eta; X)\, S_1(\eta; X)^T i_1(\eta)^{-1} \nabla h(\eta)\big]$
  $= \frac{1}{n}\, \nabla h(\eta)^T i_1(\eta)^{-1}\, E_F\big[S_1(\eta; X) S_1(\eta; X)^T\big]\, i_1(\eta)^{-1} \nabla h(\eta)$.
- Since $E_F\big[S_1(\eta; X) S_1(\eta; X)^T\big] = i_1(\eta)$, this equals
  $\frac{1}{n}\, \nabla h(\eta)^T i_1(\eta)^{-1} \nabla h(\eta) = \nabla h(\eta)^T\, i(\eta)^{-1}\, \nabla h(\eta) = i\big(h(\eta)\big)^{-1}$,
  where $i(\eta) = n\, i_1(\eta)$ is the full-sample information.

PARAMETRIC ML, PARAMETRIC BS, IJ
- Therefore, we can conclude that for a statistic $t(\hat F_\eta)$, with $F = F_\eta$:
  $\mathrm{var}_{IJ}\, t(\hat F_\eta) = \frac{1}{n}\, \hat E_{\hat F_\eta}\big[U^2(X, \hat F_\eta)\big] = \nabla h(\hat\eta)^T\, i(\hat\eta)^{-1}\, \nabla h(\hat\eta) = i\big(h(\hat\eta)\big)^{-1}$.
- Furthermore, if $t(\hat F_\eta)$ is a linear statistic, then the IJ and the inverse Fisher information both agree with the parametric BS estimate of the variance of $t(\hat F_\eta)$.

[Variance-estimation methods diagram repeated.]

THE DELTA METHOD
- Assume $\bar Q_n$ obeys a CLT, i.e., $\sqrt n\,(\bar Q_n - \mu) \xrightarrow{D} N\big(0, \Sigma_F(\mu)\big)$.
- We will assume that $Q: \mathbb{R}^p \to \mathbb{R}^A$ consists of means: $\bar Q = \Big(\frac{1}{n}\sum_{i=1}^n Q_1(x_i), \ldots, \frac{1}{n}\sum_{i=1}^n Q_A(x_i)\Big)$.
- By the CLT, $\sqrt n\,(\bar Q - \mu) \xrightarrow{D} N\big(0, \Sigma_{Q,F}(\mu)\big)$.
- Consider $h: \mathbb{R}^A \to \mathbb{R}^1$ with $\nabla h \neq 0$.
- Then $\sqrt n\,\big(h(\bar Q) - h(\mu)\big) \xrightarrow{D} N\big(0,\ \nabla h(\mu)^T\, \Sigma_{Q,F}(\mu)\, \nabla h(\mu)\big)$.
- In other words, we can estimate $\mathrm{var}_F\, h(\bar Q)$ by $\frac{1}{n}\, \nabla h(\hat\mu)^T\, \Sigma_{Q,F}(\hat\mu)\, \nabla h(\hat\mu)$.
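For $h(\mu) = \mu$ the delta-method estimate reduces to $\Sigma/n$ for whichever estimate of $\Sigma = \mathrm{var}(x)$ is plugged in. A quick numerical comparison of the nonparametric and parametric (exponential-model) versions, on assumed data:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(scale=4.0, size=200)  # illustrative exponential sample
n = len(x)

# h(mu) = mu, so grad h = 1 and the delta-method estimate is Sigma_hat / n.

# Nonparametric delta: plug-in variance from F_hat.
v_nd = x.var() / n          # equals (1/n^2) * sum((x_i - x_bar)^2)

# Parametric delta under Exp(lambda): lambda_hat = 1/x_bar, and
# Sigma_{F_hat} = 1/lambda_hat^2 = x_bar^2.
v_pd = x.mean() ** 2 / n
```

Under a correctly specified exponential model the two agree asymptotically, since $\mathrm{var}(x) = 1/\lambda^2$; in any finite sample they differ by sampling noise.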
THE DELTA METHOD
- First, we'll look at the expansion: $h(\bar Q) \approx h(\mu) + \nabla h(\mu)^T (\bar Q - \mu)$.
- This implies:
  $\mathrm{var}\, h(\bar Q) \approx \mathrm{var}\big(h(\mu) + \nabla h(\mu)^T(\bar Q - \mu)\big) = \nabla h(\mu)^T\, \mathrm{Cov}(\bar Q)\, \nabla h(\mu) = \frac{1}{n}\, \nabla h(\mu)^T\, \Sigma_{Q,F}(\mu)\, \nabla h(\mu)$.

THE DELTA METHOD
- The nonparametric delta method ($F \to \hat F$): replace the first and second moments with the plug-in estimates created from $\hat F$:
  $\mathrm{var}_{ND}\big(h(\bar Q)\big) = \frac{1}{n}\, \nabla h_{\hat F}^T\, \Sigma_{Q, \hat F}\, \nabla h_{\hat F}$.
- The parametric delta method ($F \to F_{\hat\eta}$):
  $\mathrm{var}_{PD}\big(h(\bar Q)\big) = \frac{1}{n}\, \nabla h_{F_{\hat\eta}}^T\, \Sigma_{Q, F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}$.

THE DELTA METHOD FOR THE MEAN - EXAMPLE
- $\hat\theta(x_1, \ldots, x_n) = \frac{1}{n}\sum_{i=1}^n x_i$, $h(\mu) = \mu$, so $\nabla h_F = 1$. CLT → $\Sigma_F = \mathrm{var}(x)$.
- Nonparametric: the plug-in estimate of variance, $\Sigma_{\hat F} = \frac{1}{n}\sum_{i=1}^n (x_i - \bar x)^2$, giving
  $\mathrm{Var}_{ND}\big(h(\bar Q)\big) = \frac{1}{n}\, \nabla h_{\hat F}^T\, \Sigma_{Q,\hat F}\, \nabla h_{\hat F} = \frac{1}{n^2}\sum_{i=1}^n (x_i - \bar x)^2$.
- Parametric delta: if we assume $x$ is exponential, $\hat\lambda = 1/\bar x$ and $\Sigma_{F_{\hat\eta}} = \sigma^2_{F_{\hat\eta}} = \frac{1}{\hat\lambda^2} = \bar x^2$, giving
  $\mathrm{Var}_{PD}\big(h(\bar Q)\big) = \frac{\bar x^2}{n}$.

THE DELTA METHOD'S CONNECTIONS
- If $t(\hat F) = \bar Q$ [or a function of means], then: $\mathrm{var}_{IJ}\, t(\hat F) = \mathrm{var}_{ND}\, t(\hat F)$.
- For a parametric model: $\mathrm{var}_{IJ}\, t(\hat F_\eta) = \mathrm{var}_{PD}\, t(\hat F_\eta) = i\big(h(\hat\eta)\big)^{-1}$.

[Variance-estimation methods diagram repeated.]

THE EXPONENTIAL FAMILY
- A random variable $X$ is said to have a density in the exponential family if
  $f_\eta(x) = h_0(x)\, e^{q(x)^T \eta - \psi(\eta)}$,
  where $q$ is a vector of $A$ sufficient statistics, $\eta$ is the vector of natural parameters, and the support does not depend on $\eta$.
- $E_\eta\, q(X) = \psi'(\eta)$, $\mathrm{Var}_\eta\, q(X) = \psi''(\eta)$.
- Example: $x \sim \exp(\lambda)$, $f(x) = \lambda e^{-\lambda x} = 1 \cdot e^{\lambda \cdot (-x) - (-\ln \lambda)}$, so $q(x) = -x$, $\eta = \lambda$, $\psi(\eta) = -\ln \eta$.
  → $E\, q(X) = \psi'(\eta) = -\frac{1}{\eta} = E[-x] = -\frac{1}{\lambda}$.
  → $\mathrm{var}\, q(X) = \psi''(\eta) = \frac{1}{\eta^2} = \frac{1}{\lambda^2} = \mathrm{var}(x)$.

PROOF - THE PARAMETRIC CASE
- Now, say we have $n$ observations $x_1, \ldots, x_n$ from the exponential family.
- We can conclude that their joint distribution also belongs to the exponential family, with natural parameter $\eta$ and sufficient statistics $\bar Q = \Big(\frac{1}{n}\sum_{i=1}^n q_1(x_i), \ldots, \frac{1}{n}\sum_{i=1}^n q_A(x_i)\Big)$.
- Therefore, $f_\eta(x) = \prod_{i=1}^n h_0(x_i) \cdot e^{n(\bar Q^T \eta - \psi(\eta))}$.
- $E_\eta\, \bar Q = \psi'(\eta)$, $\mathrm{Var}_\eta\, \bar Q = \frac{1}{n}\, \psi''(\eta)$.
- $\hat\eta$ satisfies the set of equations $\bar Q = \psi'(\hat\eta)$, i.e., $\bar Q = E_{\hat\eta}\, \bar Q$; this is the MLE of $F_\eta$.

PROOF - THE PARAMETRIC CASE
- The solution to $\bar Q = \psi'(\eta)$ is $\hat\eta = \psi'^{-1}(\bar Q)$.
- The parametric delta method begins with $\bar Q$ having variance $\frac{1}{n}\, \psi''(\hat\eta)$ and applies $\hat\eta = \psi'^{-1}(\bar Q)$.
- Let $K$ be the matrix of derivatives of $\psi'^{-1}(\cdot)$; by the inverse function theorem, $K = \psi''(\hat\eta)^{-1}$.
- $\mathrm{var}_{PD}\, t(\hat F_\eta) = \nabla h(\hat\eta)^T K^T\, \frac{\psi''(\hat\eta)}{n}\, K\, \nabla h(\hat\eta) = \frac{1}{n}\, \nabla h(\hat\eta)^T\, \psi''(\hat\eta)^{-1}\, \nabla h(\hat\eta)$.
- Therefore, $\mathrm{var}_{PD}\, t(\hat F_\eta) = \nabla h(\hat\eta)^T\, i(\hat\eta)^{-1}\, \nabla h(\hat\eta)$.

PROOF - THE PARAMETRIC CASE
- On the other hand, $\ell(\eta) = n\big(\bar Q^T \eta - \psi(\eta)\big) + \mathrm{const}$, so $\ell''(\eta) = -n\, \psi''(\eta)$ and $i(\eta) = n\, \psi''(\eta) = n^2\, \mathrm{var}(\bar Q)$.
- Hence $i(\hat\eta)^{-1} = \frac{1}{n}\, \psi''(\hat\eta)^{-1}$, and $\nabla h(\hat\eta)^T\, i(\hat\eta)^{-1}\, \nabla h(\hat\eta) = \frac{1}{n}\, \nabla h(\hat\eta)^T\, \psi''(\hat\eta)^{-1}\, \nabla h(\hat\eta)$, matching the parametric delta method.

SUMMARY - ESTIMATION OF VAR($\hat\theta$)
- Nonparametric Bootstrap: $\mathrm{var}_{\hat F}\, t(\hat F^*)$
- Jackknife: $\frac{n-1}{n} \sum_{i=1}^n \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2$
- Infinitesimal Jackknife: $\frac{1}{n^2} \sum_{i=1}^n U^2(x_i, \hat F)$
- Nonparametric delta: $\frac{1}{n}\, \nabla h_{\hat F}^T\, \Sigma_{Q,\hat F}\, \nabla h_{\hat F}$
- Parametric Bootstrap: $\mathrm{var}_{F_{\hat\eta}}\, h(\bar Q^*)$
- Fisher information: $\nabla h(\hat\eta)^T\, i(\hat\eta)^{-1}\, \nabla h(\hat\eta) = i\big(h(\hat\eta)\big)^{-1}$
- Parametric delta: $\frac{1}{n}\, \nabla h_{F_{\hat\eta}}^T\, \Sigma_{Q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}$
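As a closing numerical check of the table, the inverse Fisher information and the parametric bootstrap can be compared for the exponential model: $i(\lambda) = n/\lambda^2$, so the information-based variance estimate for $\hat\lambda = 1/\bar x$ is $\hat\lambda^2/n$. The sketch below is illustrative (sample, seed, and $B$ are arbitrary choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(4)

n = 100
x = rng.exponential(scale=2.0, size=n)  # hypothetical sample, true lambda = 0.5
lam_hat = 1.0 / x.mean()                 # MLE of lambda

# Inverse total Fisher information: i(lambda) = n / lambda^2 for Exp(lambda).
v_fisher = lam_hat ** 2 / n

# Parametric bootstrap: resample from Exp(lam_hat) and recompute the MLE.
B = 2000
lam_stars = np.array([1.0 / rng.exponential(scale=1.0 / lam_hat, size=n).mean()
                      for _ in range(B)])
v_boot = lam_stars.var(ddof=1)
```

For moderate $n$ the two estimates differ only by Monte Carlo noise and an $O(1/n)$ bias term, consistent with the equivalences summarized above.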