
SELECTED TOPICS ON BOOTSTRAP
(OR: BACK TO STATISTICAL THEORY)
DANA KANER
M.SC. SEMINAR IN STATISTICS, JUNE 2017
APPROXIMATE LIKELIHOODS - INTRODUCTION
• Suppose we have data $X = (x_1, \ldots, x_n)$: $n$ i.i.d. observations from $F$ (nonparametric).
• Our statistic $\hat\theta = t(\hat F)$ estimates the parameter of interest $\theta = t(F)$.
• We seek an approximate likelihood function for $\theta$.
• Why?
  • The likelihood can combine information from multiple independent experiments.
  • In order to construct (along with the prior distribution) a Bayesian posterior distribution for inference.
APPROXIMATE LIKELIHOODS - INTRODUCTION
• First, we'll look at a parametric likelihood of $\theta$: $L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$.
• In most situations, $p(x \mid \cdot)$ depends on additional "nuisance" parameters $\lambda$ besides $\theta$.
• How to get rid of $\lambda$ once we have $p(x \mid \theta, \lambda)$? This is our main objective.
• The profile likelihood: $L_{pro}(\theta) = L(\theta, \hat\lambda_\theta)$, where $\hat\lambda_\theta$ maximizes the likelihood for fixed $\theta$.
• The marginal likelihood: $L_{mar}(\theta) = q_v(v \mid \theta)$, where $v = v(x)$ is a function whose density $q_v(v \mid \theta)$ involves only $\theta$ → taking $v = \hat\theta$ gives $q_v(v \mid \theta) = p(\hat\theta \mid \theta)$.
• The problem: we don't always know $p(x \mid \theta, \lambda)$ → approximate likelihoods.
THE EMPIRICAL LIKELIHOOD
• The nonparametric likelihood for $F$: $L(F) = \prod_{i=1}^{n} F(\{x_i\})$, where $F(\{x_i\})$ is the probability of $x_i$ under $F$.
• The profile likelihood for $\theta$: $L_{pro}(\theta) = \sup_{F : t(F) = \theta} L(F)$
• Computational difficulties → restricting the problem to $F_w$, a discrete distribution with probability mass $w_i$ on $x_i$, $i = 1, \ldots, n$.
• The empirical likelihood: $L_{emp}(\theta) = \sup_{w : t(F_w) = \theta} \prod_{i=1}^{n} w_i$ → $n$ parameters.
  ($\prod_{i=1}^{n} w_i$ is the probability of obtaining our sample under $F_w$; a numerical sketch follows.)
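To make the optimization concrete, here is a minimal numerical sketch (my addition, not from the slides) for the simplest case $t(F) = E_F[X]$, using the standard Lagrange-multiplier form of the solution; the function name and the use of scipy's brentq root finder are implementation choices.

```python
import numpy as np
from scipy.optimize import brentq

def emp_loglik_mean(theta, x):
    """Log empirical likelihood for the mean:
    sup over w with sum(w) = 1 and sum(w * x) = theta of sum(log(w))."""
    n = len(x)
    d = x - theta
    if d.max() <= 0 or d.min() >= 0:
        return -np.inf                     # theta outside the data range
    # Lagrange condition: sum(d_i / (1 + lam * d_i)) = 0,
    # with 1 + lam * d_i > 0 for all i.
    g = lambda lam: np.sum(d / (1 + lam * d))
    eps = 1e-8
    lam = brentq(g, -1 / d.max() + eps, -1 / d.min() - eps)
    w = 1 / (n * (1 + lam * d))            # optimal weights w_i
    return np.sum(np.log(w))
```

At $\theta = \bar x$ the weights are $w_i = 1/n$, so the likelihood is maximized there; plotting $\exp\{\ell_{emp}(\theta) - \ell_{emp}(\bar x)\}$ over a grid gives a scaled curve like the ones in the examples below.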
THE EMPIRICAL LIKELIHOOD - EXAMPLE
• Test scores data set $X$ (a random 22 of the 88 students, $p = 5$ scores each).
• $\hat\theta$ = estimate of the maximum eigenvalue of the covariance matrix.
• Yellow curve, based on the normal approximation $\hat\theta \sim N(\theta, \hat\sigma^2_{BS})$:
  $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
• Green curve, the empirical likelihood:
  $L_{emp}(\theta) = \sup_{w : t(F_w) = \theta} \prod_{i=1}^{n} w_i$
[Figure: both likelihood curves, scaled to maximum 1, plotted against $\theta$ over roughly 200-1200.]
THE EMPIRICAL LIKELIHOOD - GOOD TO KNOW
• Computation is difficult (the example above is an approximation).
• Attractive properties:
  • It is possible to show that in suitably smooth problems, the likelihood ratio derived from the empirical likelihood satisfies $-2 \log \frac{L_{emp}(\theta)}{L_{emp}(\hat\theta)} \sim \chi^2_1$.
  • It works well with transformations ($\theta \to g(\theta)$).
  • It extends simply to multiple parameters of interest.
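To illustrate the $\chi^2_1$ calibration, a rough sketch (again my addition, building on the `emp_loglik_mean` sketch above) that reads off a 95% confidence set for the mean by thresholding the log-likelihood ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=10.0, size=22)          # illustrative data
grid = np.linspace(x.min() + 1e-3, x.max() - 1e-3, 400)
# -2 log [L_emp(theta) / L_emp(theta_hat)] ~ chi^2_1, so the 95% set is
# {theta : -2 * logR(theta) <= 3.84}.
logR = np.array([emp_loglik_mean(t, x) for t in grid]) - emp_loglik_mean(x.mean(), x)
ci = grid[-2 * logR <= 3.84]
print(ci.min(), ci.max())
```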
APPROXIMATE PIVOT METHODS
๏‚ง Assume ๐œŒ โ‰” ๐œƒ โˆ’ ๐œƒ~๐ป, where ๐œƒ is the real value under F, ๐‡ is a pivotal quantity
(does not depend in any unknown parameters) ๏ƒ  we can estimate ๐ป for a single
value ๐œƒ = ๐œƒ and then infer its value for all ๐œƒ.
How can we estimate ๐ป?
๏‚ง Let ๐œƒ โˆ— โˆ’ ๐œƒ~๐ป.
๏‚ง Usually ๐ป can not be given in a close form, it is estimated by B BS samples
APPROXIMATE PIVOT METHODS
• From $X = (x_1, \ldots, x_n)$ drawn from $F$, compute $\hat\theta$.
• Draw bootstrap samples $X_1^*, \ldots, X_B^*$; from each, compute $\hat\theta_b^*$ and $\rho_b^* = \hat\theta_b^* - \hat\theta$.
• $\hat H$: the empirical distribution of $\rho_1^*, \ldots, \rho_B^*$.
APPROXIMATE PIVOT METHODS
๏‚ง Let โ„Ž(๐œŒ) be the kernel density estimate of the distribution h, based on ๐œŒ1โˆ— , โ€ฆ , ๐œŒ๐ตโˆ— :
1
โ„Ž ๐œŒ =
๐ตโ‹…๐‘ 
๐ต
๐‘=1
๐œŒ โˆ’ ๐œŒ๐‘โˆ—
๐‘˜(
)
๐‘ 
๏‚ง The approximate marginal ๐ฟ ๐œƒ :
๐ฟ ๐œƒ โ‰ˆ ๐‘ ๐œƒ| ๐œƒ = โ„Ž ๐œƒ โˆ’ ๐œƒ โ‰” โ„Ž ๐œŒ
1
=
๐ตโ‹…๐‘ 
๐ต
๐‘=1
(๐œƒ โˆ’ ๐œƒ) โˆ’
๐‘˜
๐‘ 
(๐œƒ๐‘โˆ— โˆ’๐œƒ)
1
=
๐ตโ‹…๐‘ 
๐ต
๐‘=1
2๐œƒ โˆ’ ๐œƒ โˆ’ ๐œƒ๐‘โˆ—
๐‘˜
๐‘ 
APPROXIMATE PIVOT METHODS - EXAMPLE
• Blue curve: approximate pivot likelihood using $B = 100$ and a Gaussian kernel with manually chosen window width $s$.
• Yellow curve: $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
[Figure: both likelihood curves against $\theta$ over roughly 200-1200.]
BOOTSTRAP PARTIAL LIKELIHOOD
• Similarly to the approximate pivot methods, the goal is to estimate a marginal likelihood $L(\theta) = p(\hat\theta \mid \theta)$.
• The difference: instead of making the pivot assumption, this method computes the estimated probability $p(\hat\theta \mid \theta)$ from nested bootstrap samples.
BOOTSTRAP PARTIAL LIKELIHOOD
• From $X = (x_1, \ldots, x_n)$ drawn from $F$, compute $\hat\theta$.
• Draw first-level bootstrap samples $X_1^*, \ldots, X_{B_1}^*$ with estimates $\hat\theta_1^*, \ldots, \hat\theta_{B_1}^*$.
• From each $X_b^*$, draw second-level samples $X_{b,1}^{**}, \ldots, X_{b,B_2}^{**}$ with estimates $\hat\theta_{b,1}^{**}, \ldots, \hat\theta_{b,B_2}^{**}$, and form the kernel estimate
  $\hat p(t \mid \hat\theta_b^*) = \frac{1}{B_2 \cdot s} \sum_{j=1}^{B_2} k\!\left(\frac{t - \hat\theta_{b,j}^{**}}{s}\right)$
• Now, for $t = \hat\theta$, $\hat p(\hat\theta \mid \hat\theta_b^*)$ provides an estimate of $L(\theta)$ for $\theta = \hat\theta_b^*$.
• For a smooth estimate of $p(\hat\theta \mid \theta)$: apply a scatterplot smoother to the pairs $\{(\hat\theta_b^*, \hat p(\hat\theta \mid \hat\theta_b^*))\}_{b=1}^{B_1}$. (A numerical sketch follows.)
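A sketch of the nested resampling (my construction following the diagram above; the per-sample bandwidth rule is an assumption). It returns the $B_1$ scatter points, which can then be fed to any scatterplot smoother:

```python
import numpy as np

def partial_likelihood_points(x, t, B1=40, B2=40, seed=0):
    """Nested bootstrap: pairs (theta*_b, p_hat(theta_hat | theta*_b))."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = t(x)
    k = lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    pts = []
    for _ in range(B1):
        xb = x[rng.integers(0, n, n)]      # first-level sample X*_b
        inner = np.array([t(xb[rng.integers(0, n, n)]) for _ in range(B2)])
        s = 1.06 * inner.std() * B2 ** (-1 / 5)   # per-sample bandwidth
        p = k((theta_hat - inner) / s).sum() / (B2 * s)  # p_hat(theta_hat | theta*_b)
        pts.append((t(xb), p))
    return np.array(pts)   # smooth these points over theta to get L(theta)
```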
BOOTSTRAP PARTIAL LIKELIHOOD - EXAMPLE
• Red curve: bootstrap partial likelihood, $B_1 = B_2 = 40$ (overall 1600 bootstrap samples). The window size $s$ of the kernel function and the scatterplot smoother were chosen manually.
• Yellow curve: $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
[Figure: both likelihood curves against $\theta$ over roughly 200-1200.]
SUMMARY
• Warning: it is important to note that, in general, none of the described methods produces a true likelihood, i.e., a function that is proportional to the probability of a fixed event in the sample space.
• Green curve: the empirical likelihood.
• Yellow curve: $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
• Blue curve: the approximate pivot likelihood.
• Red curve: the bootstrap partial likelihood.
[Figure: all four likelihood curves against $\theta$ over roughly 200-1200.]
PARAMETRIC AND NON-PARAMETRIC INFERENCE
• The objective of this section: study the relationship of bootstrap and jackknife methodologies to more traditional parametric approaches.
• We will focus on estimating the variance of a statistic $\hat\theta$.
                  "Exact"       Approximate
  Nonparametric   Bootstrap     Jackknife, Infinitesimal Jackknife, Nonparametric delta
  Parametric      Bootstrap     Fisher information, Parametric delta
INFLUENCE FUNCTIONS
๐’• ๐Ÿโˆ’๐ ๐‘ญ+๐๐œน๐’™ โˆ’๐’•(๐‘ญ)
๐
๐โ†’โˆž
๏‚ง The influence function: ๐‘ผ ๐’™, ๐‘ญ = ๐ฅ๐ข๐ฆ
๏‚ง ๐‘ก ๐น = ๐ธ๐น ๐‘‹ :
๏‚ง ๐‘ˆ ๐‘ฅ, ๐น = lim
๐‘ฅ 1โˆ’๐œ– ๐‘“(๐‘ฅ)+๐œ–๐›ฟ๐‘ฅ ๐‘‘๐‘ฅโˆ’ ๐‘ฅ๐‘“(๐‘ฅ)๐‘‘๐‘ฅ
๐œ–
๐œ–โ†’โˆž
๏‚ง lim
โˆ’๐œ–
๐‘ฅ๐‘“(๐‘ฅ)๐‘‘๐‘ฅ+๐œ–
๐œ–โ†’โˆž
๐‘ฅ๐›ฟ๐‘ฅ ๐‘‘๐‘ฅ
๐œ–
= โˆ’๐ธ๐น ๐‘‹ + ๐‘ฅ
๏‚ง ๐‘ก ๐น = ๐‘š๐‘’๐‘‘๐‘–๐‘Ž๐‘›๐น ๐‘‹ :
๏‚ง ๐‘ˆ ๐‘ฅ, ๐น =
๐‘ ๐‘–๐‘”๐‘›(๐‘ฅโˆ’๐‘š๐‘’๐‘‘๐‘–๐‘Ž๐‘› ๐‘‹ )
2๐‘“(๐‘š๐‘’๐‘‘๐‘–๐‘Ž๐‘› ๐‘‹ )
=
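The defining limit can also be checked numerically with a small $\epsilon$; a minimal sketch (my addition), written for statistics that accept observation weights:

```python
import numpy as np

def influence(t, x, i, eps=1e-6):
    """Finite-epsilon influence of observation x_i on a weighted plug-in t."""
    n = len(x)
    w = np.full(n, 1 / n)                  # weights representing F_hat
    w_mix = (1 - eps) * w                  # (1 - eps) * F_hat + eps * delta_{x_i}
    w_mix[i] += eps
    return (t(x, w_mix) - t(x, w)) / eps

# Example: for the mean, t = lambda x, w: np.sum(w * x),
# and influence(t, x, i) is approximately x[i] - x.mean().
```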
INFLUENCE FUNCTIONS
• Consider a plug-in estimate $\hat\theta = t(\hat F)$ of $\theta = t(F)$ and the expansion:
  $t(\hat F) = t(F) + \frac{1}{n} \sum_{i=1}^{n} U(x_i, F) + O_p\!\left(\frac{1}{n}\right)$
• Consider $\hat F^*$, the empirical distribution corresponding to the bootstrap sample $x^*$.
• $var_{\hat F}\big(t(\hat F^*)\big) \cong var_F\big(t(\hat F)\big) \cong \frac{1}{n} var_F\big(U(x, F)\big) = \frac{1}{n} E_F\big[U^2(x, F)\big]$, since $E_F\big[U(x, F)\big] = 0$.
  (Here $var_{\hat F}\big(t(\hat F^*)\big)$ is the bootstrap estimate of $var\big(t(\hat F)\big)$.)
JK, IJ & NONPARAMETRIC BS
• Suppose that $F = \hat F$, and in $U(x, \hat F)$ take $\epsilon = -\frac{1}{n-1}$ (so $\epsilon \not\to 0$).
• We obtain something close to the jackknife variance estimate:
  $\left(\frac{n-1}{n}\right)^2 \sum_{i=1}^{n} \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2 \cong \frac{n-1}{n} \sum_{i=1}^{n} \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2 = var_{JK}\big(t(\hat F)\big)$,
  where $\hat F_{(i)}$ omits $x_i$ and $t(\hat F_{(\cdot)}) = \frac{1}{n} \sum_{i=1}^{n} t(\hat F_{(i)})$.
• Now, suppose that $F = \hat F$ and $\epsilon \to 0$ in $U(x, \hat F)$.
• The infinitesimal jackknife estimate:
  $var_{IJ}\big(t(\hat F)\big) = \frac{1}{n} E_{\hat F}\big[U^2(x, \hat F)\big] = \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} U^2(x_i, \hat F)$
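For comparison with the formulas above, a direct implementation of the jackknife estimate (my sketch); the infinitesimal jackknife can be computed from empirical influence values, e.g. obtained from the `influence` sketch earlier with $F = \hat F$:

```python
import numpy as np

def var_jackknife(t, x):
    """Jackknife variance estimate of t(F_hat)."""
    n = len(x)
    t_i = np.array([t(np.delete(x, i)) for i in range(n)])   # t(F_hat_(i))
    return (n - 1) / n * np.sum((t_i - t_i.mean()) ** 2)

def var_infinitesimal_jackknife(U):
    """IJ estimate from empirical influence values U_i = U(x_i, F_hat)."""
    n = len(U)
    return np.sum(U ** 2) / n ** 2
```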
JK, IJ & NONPARAMETRIC BS
• If $t(\hat F) = t(F) + \frac{1}{n} \sum_{i=1}^{n} U(x_i, F)$ (a linear statistic), then:
  $var_{\hat F}\big(t(\hat F^*)\big) \cong var_F\big(t(\hat F)\big) \cong \frac{1}{n} var_F\big(U(x, F)\big) = \frac{1}{n} E_F\big[U^2(x, F)\big]$
  → $var_{IJ}\big(t(\hat F)\big) = \frac{1}{n^2} \sum_{i=1}^{n} U^2(x_i, \hat F) \cong \frac{n-1}{n}\, var_{JK}\big(t(\hat F)\big)$

                  "Exact"       Approximate
  Nonparametric   Bootstrap     Jackknife, Infinitesimal Jackknife, Nonparametric delta
  Parametric      Bootstrap     Fisher information, Parametric delta
THE PARAMETRIC BOOTSTRAP FOR MLE
• Consider a sample $X$ drawn from $F$, a known distribution function with parameter $\theta$. We would like to estimate a function of $\theta$ in the real world.
• First, we compute the maximum likelihood estimator $\hat\theta$.
• We draw $B$ bootstrap samples $X^*$ of size $n$ from the density $f_{\hat\theta}(x)$.
• We then calculate the maximum likelihood estimate $\hat\theta_b^*$ for each bootstrap sample.
• We use the empirical distribution of $\{\hat\theta_1^*, \ldots, \hat\theta_B^*\}$ to estimate properties of $\hat\theta$. For instance, the sample variance estimates $var_F(\hat\theta)$.
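A minimal sketch of the recipe for a normal model (my choice of model; any parametric family with a computable MLE works the same way):

```python
import numpy as np

def parametric_bootstrap_var(x, B=500, seed=0):
    """Parametric bootstrap estimate of var(theta_hat), normal model, theta = mean."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu_hat, sigma_hat = x.mean(), x.std()  # MLEs under N(mu, sigma^2)
    # B samples of size n from f_{theta_hat}, re-estimating the MLE on each
    boot = np.array([rng.normal(mu_hat, sigma_hat, n).mean() for _ in range(B)])
    return boot.var()                      # sample variance of theta*_1..theta*_B
```

For this model the Fisher-information answer is $\hat\sigma^2 / n$, which the bootstrap value approaches as $B$ grows; the example on the next slide makes the same comparison.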
THE PARAMETRIC BOOTSTRAP - EXAMPLE
๏‚ง ๐‘‹~๐‘
๐œ‡, ๐œŽ 2
Normal model
; ๐‘› = 9 ; interested in the variance of the mean.
50
๏‚ง 500 parametric BS samples of size 9 from ๐‘(๐œƒ, ๐‘ ๐‘’ 2 ).
โˆ—
๏‚ง Histogram of the 500 values ๐œƒ1 , โ€ฆ , ๐œƒ500
โˆ—
100
0
of the samples mean.
๏‚ง Superimposed on the histogram is the density ๐‘ ๐œƒ, ๐‘– ๐œƒ
โˆ’1
.
20
40
60
80
100
Exponential model
300
200
โˆ—
โˆ—
๏‚ง BS sample variance of ๐œƒ1 , โ€ฆ , ๐œƒ500 =183.7, and ๐‘– ๐œƒ
โˆ’1
=177.7 .100
0
20 40 60 80 100 140
PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
• The score function: $\dot l(\theta, x) = \frac{\partial l(\theta, x)}{\partial \theta} = \frac{\partial \log L(\theta, x)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \log L(\theta, x_i)}{\partial \theta}$
• The Fisher information: $i(\theta) = -E\!\left[\frac{\partial^2 l(\theta, x)}{\partial \theta^2}\right] = -E\!\left[\sum_{i=1}^{n} \frac{\partial^2 \log L(\theta, x_i)}{\partial \theta^2}\right]$
• By the asymptotic distribution of the MLE, under certain conditions (regularity of $f$, $\theta$ in the interior of the parameter space): $\hat\theta_n := \hat\theta \sim N\big(\theta, i(\theta)^{-1}\big)$.
• This suggests that the sampling distribution of $\hat\theta$ can be approximated by $N\big(\hat\theta, i(\hat\theta)^{-1}\big)$.
PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
• Suppose we have a vector of parameters $\eta$ and we want to conduct inference for a real-valued function $\theta = h(\eta)$. Let $\eta_0$ be the true value of $\eta$.
• With $\hat\eta$ the MLE of $\eta$, $\hat\theta = h(\hat\eta)$ is the MLE for $\theta$.
• Denote the parametric family of distribution functions for $x$ by $F_\eta$, and $F = F_{\eta_0}$.
• $\dot l(\eta; x) = \begin{pmatrix} \frac{\partial l(\eta; X)}{\partial \eta_1} \\ \vdots \\ \frac{\partial l(\eta; X)}{\partial \eta_p} \end{pmatrix}$, $\quad i(\eta) = \begin{pmatrix} -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_1^2}\right] & -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_1 \partial \eta_2}\right] & \cdots \\ -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_1 \partial \eta_2}\right] & \ddots & \vdots \\ \cdots & \cdots & -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_p^2}\right] \end{pmatrix}$, $\quad \dot h(\eta) = \begin{pmatrix} \frac{\partial h(\eta)}{\partial \eta_1} \\ \vdots \\ \frac{\partial h(\eta)}{\partial \eta_p} \end{pmatrix}$.
PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
• By the chain rule, $i\big(h(\eta)\big)^{-1} = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta)$.
• It can be shown that $\hat\theta_n := \hat\theta = h(\hat\eta) \sim N\big(h(\eta_0),\ \dot h(\eta_0)^T\, i(\eta_0)^{-1}\, \dot h(\eta_0)\big)$,
• where $\eta_0$ can be replaced with $\hat\eta$, a consistent estimator.
• A property of M-estimators: $U(x, F) = -\left(\int \frac{\partial \psi(x, \theta)}{\partial \theta}\Big|_{t(F)}\, dF(x)\right)^{-1} \cdot \psi\big(x, t(F)\big)$
  → $U(x, F_\theta) = i(\theta)^{-1} \cdot n \cdot \dot l(\theta; x)$
  → $U(x, F_\eta) = n \cdot \dot h(\eta)^T\, i(\eta)^{-1}\, \dot l(\eta; x)$
PARAMETRIC ML & BS INFERENCE, JACKKNIFE
๏‚ง ๐‘ผ ๐’™, ๐‘ญ = ๐’ โ‹… ๐’‰ ๐œผ ๐‘ป ๐’Š ๐œผ
๏‚ง Now, using ๐‘ฃ๐‘Ž๐‘Ÿ๐น (๐‘ก ๐น )โ‰ˆ
๐‘‡๐‘–
1
๐‘›
๐ธ๐น (๐‘ˆ 2 ๐‘ฅ, ๐น ), we can see that:
โ‰ˆ ๐ธ๐น [๐‘› โ‹… โ„Ž ๐œ‚ ๐‘‡ ๐‘– ๐œ‚
๏‚ง ๐’—๐’‚๐’“๐‘ญ ๐’• ๐‘ญ
๏‚ง ๐‘›โ‹…โ„Ž ๐œ‚
โˆ’๐Ÿ ๐’(๐œผ; ๐’™)
๐œ‚
โˆ’1 ๐ธ [๐‘™(๐œ‚; ๐‘ฅ)๐‘™
๐น
๏‚ง Since ๐‘› โ‹… ๐ธ๐น ๐‘™ ๐œ‚; ๐‘ฅ ๐‘™ ๐œ‚; ๐‘ฅ
๐‘‡
โˆ’1 ๐‘™
๐œ‚; ๐‘ฅ ๐‘™ ๐œ‚; ๐‘ฅ ๐‘‡ ๐‘– ๐œ‚
๐œ‚; ๐‘ฅ ๐‘‡ ]๐‘– ๐œ‚
โˆ’1
โˆ’1 โ„Ž
โ„Ž ๐œ‚ = โ„Ž ๐œ‚ ๐‘‡๐‘– ๐œ‚
= ๐ธ๐น ๐‘™ ๐œ‚; ๐‘‹ ๐‘™ ๐œ‚; ๐‘‹
๐‘‡
=๐‘– ๐œ‚
๐œ‚ ]=
โˆ’1 โ„Ž
๐œ‚ = ๐’Š(๐’‰ ๐œผ ) โˆ’๐Ÿ
PARAMETRIC ML, PARAMETRIC BS, IJ
• Therefore, we can conclude that for a statistic $t(F_\eta)$, with $F = F_\eta$:
  $var_{IJ}\big(t(F_\eta)\big) = \frac{1}{n} E_{F_\eta}\big[U^2(x, F_\eta)\big] = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta) = i\big(h(\eta)\big)^{-1}$
• Furthermore, if $t(F_\eta)$ is a linear statistic, then the IJ and the inverse Fisher information both agree with the parametric bootstrap estimate of variance for $t(F_\eta)$.

                  "Exact"       Approximate
  Nonparametric   Bootstrap     Jackknife, Infinitesimal Jackknife, Nonparametric delta
  Parametric      Bootstrap     Fisher information, Parametric delta
THE DELTA METHOD
๏‚ง Assume ๐œ‚๐‘›
๐ถ๐ด๐‘
๐œ‚, i. e.
๐ท
๐‘› ๐œ‚๐‘› โˆ’ ๐œ‚ โ†’ ๐‘ 0, ฮฃ๐น ๐œ‚ .
๏‚ง We will assume that ๐œ‚๐‘› : โ„๐‘ โ†’ โ„๐ด consists of means: ๐‘„1 , โ€ฆ , ๐‘„๐ด ,
๐ท
๏‚ง By the CLT, ๐‘› ๐‘„ โˆ’ ๐œ‡ โ†’ ๐‘ 0, ฮฃ๐‘ž ๐œ‚
1
๐‘›
๐‘›
1 ๐‘„๐‘Ž (๐‘‹๐‘– ) .
โ‰” N 0, ฮฃ๐‘ž,๐น ๐œ‚ .
๏‚ง Consider โ„Ž ๐œ‚ : โ„๐ด โ†’ โ„1 , ๐›ปโ„Ž โ‰  0.
๏‚ง Then
๐‘ซ
๐’(๐’‰(๐œผ๐’ ) โˆ’ ๐’‰(๐œผ)) โ†’ ๐‘ต(๐ŸŽ, ๐œต๐’‰ ๐œผ ๐‘ป ๐œฎ๐’’,๐‘ญ (๐œผ)๐œต๐’‰(๐œผ)).
๏‚ง In other words, we can estimate ๐’—๐’‚๐’“๐‘ญ ๐’‰(๐œผ ) by
๐œต๐’‰ ๐œผ ๐‘ป ๐œฎ๐’’,๐‘ญ (๐œผ))๐œต๐’‰(๐œผ)
๐’
.
THE DELTA METHOD
• First, we'll look at the expansion:
  $h(\hat\eta) \approx h(\eta) + \nabla h(\eta)^T (\hat\eta - \eta)$
• This implies:
  $var\big(h(\hat\eta)\big) \approx var\big(h(\eta) + \nabla h(\eta)^T (\hat\eta - \eta)\big) = \nabla h(\eta)^T\, Cov(\hat\eta)\, \nabla h(\eta) = \nabla h(\eta)^T\, \frac{\Sigma_{q,F}(\eta)}{n}\, \nabla h(\eta)$
THE DELTA METHOD
• The nonparametric delta method ($F \to \hat F$): replace the first and second moments with the plug-in estimates created from $\hat F$,
  $var_{ND}\big(h(\hat\eta)\big) = \frac{\nabla h_{\hat F}^T\, \Sigma_{q,\hat F}\, \nabla h_{\hat F}}{n}$
• The parametric delta method ($F \to F_{\hat\eta}$):
  $var_{PD}\big(h(\hat\eta)\big) = \frac{\nabla h_{F_{\hat\eta}}^T\, \Sigma_{q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}}{n}$
THE DELTA METHOD FOR THE MEAN - EXAMPLE
• $\hat\theta(x_1, \ldots, x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i$, $h(\eta) = \eta \Rightarrow \nabla h_F = 1$. CLT → $\Sigma_F = var(x)$.
• Nonparametric: with the plug-in estimate of variance $\Sigma_{\hat F} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2$,
  $var_{ND}\big(h(\hat\eta)\big) = \frac{\nabla h_{\hat F}^T\, \Sigma_{q,\hat F}\, \nabla h_{\hat F}}{n} = \frac{1}{n^2} \sum_{i=1}^{n} (x_i - \bar x)^2$
• Parametric delta: if we assume $x$ is exponential, $\hat\eta = \bar x$ and $\Sigma_{F_{\hat\eta}} = \sigma^2(F_{\hat\eta}) = \bar x^2$, so
  $var_{PD}\big(h(\hat\eta)\big) = \frac{\nabla h_{F_{\hat\eta}}^T\, \Sigma_{q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}}{n} = \frac{\bar x^2}{n}$
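A quick numeric cross-check of these answers for the mean (my sketch; the exponential data and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=10.0, size=9)
n = len(x)

var_nd = np.sum((x - x.mean()) ** 2) / n ** 2    # nonparametric delta (= IJ)
var_pd = x.mean() ** 2 / n                        # parametric delta, exponential
boot = np.array([x[rng.integers(0, n, n)].mean() for _ in range(5000)])
print(var_nd, var_pd, boot.var())                 # all should be comparable
```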
THE DELTA METHODโ€™S CONNECTIONS
๏‚ง If ๐‘ก ๐น = ๐œ‚๐‘›
๐ถ๐ด๐‘
๐œ‚ [or a function of means], then:
๏‚ง ๐’—๐’‚๐’“๐‘ฐ๐‘ฑ ๐’• ๐‘ญ = ๐’—๐’‚๐’“๐‘ต๐‘ซ ๐’• ๐‘ญ .
๏‚ง For a parametric model:
โ€œExactโ€
Nonparametric
Bootstrap
Approximate
Jackknife
Infinitesimal Jackknife
Nonparametric delta
๏‚ง ๐’—๐’‚๐’“๐‘ฐ๐‘ฑ ๐’• ๐‘ญ๐œผ = ๐’—๐’‚๐’“๐‘ท๐‘ซ ๐’• ๐‘ญ๐œผ = ๐’Š(๐’‰ ๐œผ ) โˆ’๐Ÿ
Parametric Bootstrap
Fisher information
Parametric delta
THE EXPONENTIAL FAMILY
• A random variable $X$ is said to have a density in the exponential family if
  $f_\eta(x) = h_0(x)\, e^{\eta^T q(x) - \psi(\eta)}$,
  where $q$ is a vector of $A$ sufficient statistics, $\eta$ is the vector of natural parameters, and the support does not depend on $\eta$.
• $E\big[q(X)\big] = \psi'(\eta)$, $\quad var\big(q(X)\big) = \psi''(\eta)$.
• Example: $x \sim \exp(\lambda)$, $f(x) = \lambda e^{-\lambda x} = 1 \cdot e^{\lambda \cdot (-x)} \cdot e^{-(-\ln \lambda)}$, so $q(x) = -x$, $\eta = \lambda$, $\psi(\eta) = -\ln \lambda$.
  → $E\big[q(X)\big] = E[-x] = -\frac{1}{\lambda} = \frac{\partial(-\ln \lambda)}{\partial \lambda}$
  → $var\big(q(X)\big) = \frac{1}{\lambda^2} = \frac{\partial^2(-\ln \lambda)}{\partial \lambda^2}$
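A simulation check of the two identities for the exponential example (my sketch):

```python
import numpy as np

lam = 2.0
rng = np.random.default_rng(0)
q = -rng.exponential(scale=1 / lam, size=200_000)   # q(x) = -x under exp(lam)
print(q.mean(), -1 / lam)      # E[q(X)]  = psi'(lam)  = -1/lam
print(q.var(), 1 / lam ** 2)   # var(q(X)) = psi''(lam) = 1/lam^2
```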
PROOF - THE PARAMETRIC CASE
• Now, say we have $n$ observations $x_1, \ldots, x_n$ from the exponential family.
• We can conclude that their joint distribution also belongs to the exponential family, with natural parameter $\eta$ and with sufficient statistics $\bar Q = \left(\frac{1}{n} \sum_{i=1}^{n} q_1(x_i),\ \ldots,\ \frac{1}{n} \sum_{i=1}^{n} q_A(x_i)\right)$.
• Therefore, $f_\eta(\bar Q) = h_1(\bar q)\, e^{n(\eta^T \bar q - \psi(\eta))}$.
• $E\big[\bar Q(X)\big] = \psi'(\eta)$, $\quad var\big(\bar Q(X)\big) = \frac{\psi''(\eta)}{n}$.
• $\hat\eta$ satisfies the set of equations $\bar q = \psi'(\hat\eta)$, i.e. $\bar q = E\big[\bar Q(X)\big]$ under $F_{\hat\eta}$.
PROOF - THE PARAMETRIC CASE
• The solution to $\bar q = \psi'(\hat\eta)$ is $\hat\eta = \psi'^{-1}(\bar q)$.
• The parametric delta method begins with $\bar Q$ having variance $\frac{\psi''(\eta)}{n}$, and applies $\hat\theta = h\big(\psi'^{-1}(\bar q)\big)$.
• Let $K$ be the matrix of derivatives of $\psi'^{-1}(\cdot)$.
• $var_{PD}\big(h(F_{\hat\eta})\big) = \dot h(\eta)^T K^T\, var(\bar Q)\, K\, \dot h(\eta) = \frac{\dot h(\eta)^T K^T\, \psi''(\eta)\, K\, \dot h(\eta)}{n} = \frac{\dot h(\eta)^T\, \psi''(\eta)^{-1}\, \dot h(\eta)}{n}$
• Therefore, $var_{PD}\big(t(F_{\hat\eta})\big) = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta) = i\big(h(\eta)\big)^{-1}$.
PROOF - THE PARAMETRIC CASE
• On the other hand,
  $i(\eta) = n \cdot \psi''(\eta) = n^2\, var(\bar Q)$
  $i(\theta)^{-1} = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta) = \frac{\dot h(\eta)^T\, \psi''(\eta)^{-1}\, \dot h(\eta)}{n}$
SUMMARY - ESTIMATION OF VAR($\hat\theta$)

  Method                     Estimate of $var(\hat\theta)$
  Nonparametric Bootstrap    $var_{\hat F}\big(t(\hat F^*)\big)$
  Jackknife                  $\frac{n-1}{n} \sum_{i=1}^{n} \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2$
  Infinitesimal Jackknife    $\left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} U^2(x_i, \hat F)$
  Nonparametric delta        $\frac{\nabla h_{\hat F}^T\, \Sigma_{q,\hat F}\, \nabla h_{\hat F}}{n}$
  Parametric BS              $var_{F_{\hat\eta}}\big(h(\hat\eta)^*\big)$
  Fisher information         $\nabla h(\hat\eta)^T\, i(\hat\eta)^{-1}\, \nabla h(\hat\eta) = i\big(h(\hat\eta)\big)^{-1}$
  Parametric delta           $\frac{\nabla h_{F_{\hat\eta}}^T\, \Sigma_{q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}}{n}$