
SELECTED TOPICS ON BOOTSTRAP
(OR: BACK TO STATISTICAL THEORY)
DANA KANER
M.SC. SEMINAR IN STATISTICS, JUNE 2017
APPROXIMATE LIKELIHOODS - INTRODUCTION
• Suppose we have data $X = (x_1, \ldots, x_n)$: $n$ i.i.d. observations from $F$ (nonparametric).
• Our statistic $\hat\theta = t(\hat F)$ estimates the parameter of interest $\theta = t(F)$.
• We seek an approximate likelihood function for $\theta$.
• Why?
  • The likelihood can combine information from multiple independent experiments.
  • In order to construct (along with the prior distribution) a Bayesian posterior distribution for inference.
APPROXIMATE LIKELIHOODS - INTRODUCTION
• First, we'll look at a parametric likelihood of $\theta$: $L(\theta) = \prod_{i=1}^{n} p(x_i \mid \theta)$.
• In most situations, $p(x \mid \cdot)$ depends on additional "nuisance" parameters $\lambda$ besides $\theta$.
• How to get rid of $\lambda$ once we have $p(x \mid \theta, \lambda)$? This is our main objective.
• The profile likelihood: $L_{pro}(\theta) = L(\theta, \hat\lambda_\theta)$, where $\hat\lambda_\theta$ maximizes the likelihood for fixed $\theta$.
• The marginal likelihood: $L_{mar}(\theta) = q_v(v \mid \theta)$, where $v = v(x)$ is a function whose density $q_v(v \mid \theta)$ involves only $\theta$ → taking $v = \hat\theta$ gives $q_v(v \mid \theta) = p(\hat\theta \mid \theta)$.
• The problem: we don't always know $p(x \mid \theta, \lambda)$ → approximate likelihoods.
THE EMPIRICAL LIKELIHOOD
• The nonparametric likelihood for $F$: $L(F) = \prod_{i=1}^{n} F(\{x_i\})$, where $F(\{x_i\})$ is the probability of $x_i$ under $F$.
• The profile likelihood for $\theta$: $L_{pro}(\theta) = \sup_{F : t(F) = \theta} L(F)$
• Computational difficulties → restricting the problem to $F_w$, a discrete distribution with probability mass $w_i$ on $x_i$, $i = 1, \ldots, n$.
• The empirical likelihood: $L_{emp}(\theta) = \sup_{w : t(F_w) = \theta} \prod_{i=1}^{n} w_i$ → $n$ parameters.
  ($\prod_{i=1}^{n} w_i$ is the probability of obtaining our sample under $F_w$; a numerical sketch follows.)
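To make the optimization concrete, here is a minimal numerical sketch (my addition, not from the slides) for the simplest case $t(F) = E_F[X]$, using the standard Lagrange-multiplier form of the solution; the function name and the use of scipy's brentq root finder are implementation choices.

```python
import numpy as np
from scipy.optimize import brentq

def emp_loglik_mean(theta, x):
    """Log empirical likelihood for the mean:
    sup over w with sum(w) = 1 and sum(w * x) = theta of sum(log(w))."""
    n = len(x)
    d = x - theta
    if d.max() <= 0 or d.min() >= 0:
        return -np.inf                     # theta outside the data range
    # Lagrange condition: sum(d_i / (1 + lam * d_i)) = 0,
    # with 1 + lam * d_i > 0 for all i.
    g = lambda lam: np.sum(d / (1 + lam * d))
    eps = 1e-8
    lam = brentq(g, -1 / d.max() + eps, -1 / d.min() - eps)
    w = 1 / (n * (1 + lam * d))            # optimal weights w_i
    return np.sum(np.log(w))
```

At $\theta = \bar x$ the weights are $w_i = 1/n$, so the likelihood is maximized there; plotting $\exp\{\ell_{emp}(\theta) - \ell_{emp}(\bar x)\}$ over a grid gives a scaled curve like the ones in the examples below.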
THE EMPIRICAL LIKELIHOOD - EXAMPLE
• Test scores data set $X$ (a random 22 of the 88 students, $p = 5$ scores each).
• $\hat\theta$ = estimate of the maximum eigenvalue of the covariance matrix.
• Yellow curve, based on the normal approximation $\hat\theta \sim N(\theta, \hat\sigma^2_{BS})$:
  $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
• Green curve, the empirical likelihood:
  $L_{emp}(\theta) = \sup_{w : t(F_w) = \theta} \prod_{i=1}^{n} w_i$
[Figure: both likelihood curves, scaled to maximum 1, plotted against $\theta$ over roughly 200-1200.]
THE EMPIRICAL LIKELIHOOD - GOOD TO KNOW
• Computation is difficult (the example above is an approximation).
• Attractive properties:
  • It is possible to show that in suitably smooth problems, the likelihood ratio derived from the empirical likelihood satisfies $-2 \log \frac{L_{emp}(\theta)}{L_{emp}(\hat\theta)} \sim \chi^2_1$.
  • It works well with transformations ($\theta \to g(\theta)$).
  • It extends simply to multiple parameters of interest.
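To illustrate the $\chi^2_1$ calibration, a rough sketch (again my addition, building on the `emp_loglik_mean` sketch above) that reads off a 95% confidence set for the mean by thresholding the log-likelihood ratio:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=10.0, size=22)          # illustrative data
grid = np.linspace(x.min() + 1e-3, x.max() - 1e-3, 400)
# -2 log [L_emp(theta) / L_emp(theta_hat)] ~ chi^2_1, so the 95% set is
# {theta : -2 * logR(theta) <= 3.84}.
logR = np.array([emp_loglik_mean(t, x) for t in grid]) - emp_loglik_mean(x.mean(), x)
ci = grid[-2 * logR <= 3.84]
print(ci.min(), ci.max())
```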
APPROXIMATE PIVOT METHODS
๏‚ง Assume ๐œŒ โ‰” ๐œƒ โˆ’ ๐œƒ~๐ป, where ๐œƒ is the real value under F, ๐‡ is a pivotal quantity
(does not depend in any unknown parameters) ๏ƒ  we can estimate ๐ป for a single
value ๐œƒ = ๐œƒ and then infer its value for all ๐œƒ.
How can we estimate ๐ป?
๏‚ง Let ๐œƒ โˆ— โˆ’ ๐œƒ~๐ป.
๏‚ง Usually ๐ป can not be given in a close form, it is estimated by B BS samples
APPROXIMATE PIVOT METHODS
• From $X = (x_1, \ldots, x_n)$ drawn from $F$, compute $\hat\theta$.
• Draw bootstrap samples $X_1^*, \ldots, X_B^*$; from each, compute $\hat\theta_b^*$ and $\rho_b^* = \hat\theta_b^* - \hat\theta$.
• $\hat H$: the empirical distribution of $\rho_1^*, \ldots, \rho_B^*$.
APPROXIMATE PIVOT METHODS
๏‚ง Let โ„Ž(๐œŒ) be the kernel density estimate of the distribution h, based on ๐œŒ1โˆ— , โ€ฆ , ๐œŒ๐ตโˆ— :
1
โ„Ž ๐œŒ =
๐ตโ‹…๐‘ 
๐ต
๐‘=1
๐œŒ โˆ’ ๐œŒ๐‘โˆ—
๐‘˜(
)
๐‘ 
๏‚ง The approximate marginal ๐ฟ ๐œƒ :
๐ฟ ๐œƒ โ‰ˆ ๐‘ ๐œƒ| ๐œƒ = โ„Ž ๐œƒ โˆ’ ๐œƒ โ‰” โ„Ž ๐œŒ
1
=
๐ตโ‹…๐‘ 
๐ต
๐‘=1
(๐œƒ โˆ’ ๐œƒ) โˆ’
๐‘˜
๐‘ 
(๐œƒ๐‘โˆ— โˆ’๐œƒ)
1
=
๐ตโ‹…๐‘ 
๐ต
๐‘=1
2๐œƒ โˆ’ ๐œƒ โˆ’ ๐œƒ๐‘โˆ—
๐‘˜
๐‘ 
APPROXIMATE PIVOT METHODS - EXAMPLE
• Blue curve: approximate pivot likelihood using $B = 100$ and a Gaussian kernel with manually chosen window width $s$.
• Yellow curve: $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
[Figure: both likelihood curves against $\theta$ over roughly 200-1200.]
BOOTSTRAP PARTIAL LIKELIHOOD
• Similarly to the approximate pivot methods, the goal is to estimate a marginal likelihood $L(\theta) = p(\hat\theta \mid \theta)$.
• The difference: instead of making the pivot assumption, this method computes the estimated probability $p(\hat\theta \mid \theta)$ from nested bootstrap samples.
BOOTSTRAP PARTIAL LIKELIHOOD
• From $X = (x_1, \ldots, x_n)$ drawn from $F$, compute $\hat\theta$.
• Draw first-level bootstrap samples $X_1^*, \ldots, X_{B_1}^*$ with estimates $\hat\theta_1^*, \ldots, \hat\theta_{B_1}^*$.
• From each $X_b^*$, draw second-level samples $X_{b,1}^{**}, \ldots, X_{b,B_2}^{**}$ with estimates $\hat\theta_{b,1}^{**}, \ldots, \hat\theta_{b,B_2}^{**}$, and form the kernel estimate
  $\hat p(t \mid \hat\theta_b^*) = \frac{1}{B_2 \cdot s} \sum_{j=1}^{B_2} k\!\left(\frac{t - \hat\theta_{b,j}^{**}}{s}\right)$
• Now, for $t = \hat\theta$, $\hat p(\hat\theta \mid \hat\theta_b^*)$ provides an estimate of $L(\theta)$ for $\theta = \hat\theta_b^*$.
• For a smooth estimate of $p(\hat\theta \mid \theta)$: apply a scatterplot smoother to the pairs $\{(\hat\theta_b^*, \hat p(\hat\theta \mid \hat\theta_b^*))\}_{b=1}^{B_1}$. (A numerical sketch follows.)
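A sketch of the nested resampling (my construction following the diagram above; the per-sample bandwidth rule is an assumption). It returns the $B_1$ scatter points, which can then be fed to any scatterplot smoother:

```python
import numpy as np

def partial_likelihood_points(x, t, B1=40, B2=40, seed=0):
    """Nested bootstrap: pairs (theta*_b, p_hat(theta_hat | theta*_b))."""
    rng = np.random.default_rng(seed)
    n = len(x)
    theta_hat = t(x)
    k = lambda u: np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)   # Gaussian kernel
    pts = []
    for _ in range(B1):
        xb = x[rng.integers(0, n, n)]      # first-level sample X*_b
        inner = np.array([t(xb[rng.integers(0, n, n)]) for _ in range(B2)])
        s = 1.06 * inner.std() * B2 ** (-1 / 5)   # per-sample bandwidth
        p = k((theta_hat - inner) / s).sum() / (B2 * s)  # p_hat(theta_hat | theta*_b)
        pts.append((t(xb), p))
    return np.array(pts)   # smooth these points over theta to get L(theta)
```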
BOOTSTRAP PARTIAL LIKELIHOOD - EXAMPLE
• Red curve: bootstrap partial likelihood, $B_1 = B_2 = 40$ (overall 1600 bootstrap samples). The window size $s$ of the kernel function and the scatterplot smoother were chosen manually.
• Yellow curve: $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
[Figure: both likelihood curves against $\theta$ over roughly 200-1200.]
SUMMARY
• Warning: it is important to note that, in general, none of the described methods produces a true likelihood, i.e., a function that is proportional to the probability of a fixed event in the sample space.
• Green curve: the empirical likelihood.
• Yellow curve: $L_{nor}(\theta) = e^{-\frac{(\theta - \hat\theta)^2}{2 \hat\sigma^2_{BS}}}$
• Blue curve: the approximate pivot likelihood.
• Red curve: the bootstrap partial likelihood.
[Figure: all four likelihood curves against $\theta$ over roughly 200-1200.]
PARAMETRIC AND NON-PARAMETRIC INFERENCE
• The objective of this section: study the relationship of bootstrap and jackknife methodologies to more traditional parametric approaches.
• We will focus on estimating the variance of a statistic $\hat\theta$.
                  "Exact"       Approximate
  Nonparametric   Bootstrap     Jackknife, Infinitesimal Jackknife, Nonparametric delta
  Parametric      Bootstrap     Fisher information, Parametric delta
INFLUENCE FUNCTIONS
๐’• ๐Ÿโˆ’๐ ๐‘ญ+๐๐œน๐’™ โˆ’๐’•(๐‘ญ)
๐
๐โ†’โˆž
๏‚ง The influence function: ๐‘ผ ๐’™, ๐‘ญ = ๐ฅ๐ข๐ฆ
๏‚ง ๐‘ก ๐น = ๐ธ๐น ๐‘‹ :
๏‚ง ๐‘ˆ ๐‘ฅ, ๐น = lim
๐‘ฅ 1โˆ’๐œ– ๐‘“(๐‘ฅ)+๐œ–๐›ฟ๐‘ฅ ๐‘‘๐‘ฅโˆ’ ๐‘ฅ๐‘“(๐‘ฅ)๐‘‘๐‘ฅ
๐œ–
๐œ–โ†’โˆž
๏‚ง lim
โˆ’๐œ–
๐‘ฅ๐‘“(๐‘ฅ)๐‘‘๐‘ฅ+๐œ–
๐œ–โ†’โˆž
๐‘ฅ๐›ฟ๐‘ฅ ๐‘‘๐‘ฅ
๐œ–
= โˆ’๐ธ๐น ๐‘‹ + ๐‘ฅ
๏‚ง ๐‘ก ๐น = ๐‘š๐‘’๐‘‘๐‘–๐‘Ž๐‘›๐น ๐‘‹ :
๏‚ง ๐‘ˆ ๐‘ฅ, ๐น =
๐‘ ๐‘–๐‘”๐‘›(๐‘ฅโˆ’๐‘š๐‘’๐‘‘๐‘–๐‘Ž๐‘› ๐‘‹ )
2๐‘“(๐‘š๐‘’๐‘‘๐‘–๐‘Ž๐‘› ๐‘‹ )
=
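The defining limit can also be checked numerically with a small $\epsilon$; a minimal sketch (my addition), written for statistics that accept observation weights:

```python
import numpy as np

def influence(t, x, i, eps=1e-6):
    """Finite-epsilon influence of observation x_i on a weighted plug-in t."""
    n = len(x)
    w = np.full(n, 1 / n)                  # weights representing F_hat
    w_mix = (1 - eps) * w                  # (1 - eps) * F_hat + eps * delta_{x_i}
    w_mix[i] += eps
    return (t(x, w_mix) - t(x, w)) / eps

# Example: for the mean, t = lambda x, w: np.sum(w * x),
# and influence(t, x, i) is approximately x[i] - x.mean().
```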
INFLUENCE FUNCTIONS
• Consider a plug-in estimate $\hat\theta = t(\hat F)$ of $\theta = t(F)$ and the expansion:
  $t(\hat F) = t(F) + \frac{1}{n} \sum_{i=1}^{n} U(x_i, F) + O_p\!\left(\frac{1}{n}\right)$
• Consider $\hat F^*$, the empirical distribution corresponding to the bootstrap sample $x^*$.
• $var_{\hat F}\big(t(\hat F^*)\big) \cong var_F\big(t(\hat F)\big) \cong \frac{1}{n} var_F\big(U(x, F)\big) = \frac{1}{n} E_F\big[U^2(x, F)\big]$, since $E_F\big[U(x, F)\big] = 0$.
  (Here $var_{\hat F}\big(t(\hat F^*)\big)$ is the bootstrap estimate of $var\big(t(\hat F)\big)$.)
JK, IJ & NONPARAMETRIC BS
• Suppose that $F = \hat F$, and in $U(x, \hat F)$ take $\epsilon = -\frac{1}{n-1}$ (so $\epsilon \not\to 0$).
• We obtain something close to the jackknife variance estimate:
  $\left(\frac{n-1}{n}\right)^2 \sum_{i=1}^{n} \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2 \cong \frac{n-1}{n} \sum_{i=1}^{n} \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2 = var_{JK}\big(t(\hat F)\big)$,
  where $\hat F_{(i)}$ omits $x_i$ and $t(\hat F_{(\cdot)}) = \frac{1}{n} \sum_{i=1}^{n} t(\hat F_{(i)})$.
• Now, suppose that $F = \hat F$ and $\epsilon \to 0$ in $U(x, \hat F)$.
• The infinitesimal jackknife estimate:
  $var_{IJ}\big(t(\hat F)\big) = \frac{1}{n} E_{\hat F}\big[U^2(x, \hat F)\big] = \left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} U^2(x_i, \hat F)$
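For comparison with the formulas above, a direct implementation of the jackknife estimate (my sketch); the infinitesimal jackknife can be computed from empirical influence values, e.g. obtained from the `influence` sketch earlier with $F = \hat F$:

```python
import numpy as np

def var_jackknife(t, x):
    """Jackknife variance estimate of t(F_hat)."""
    n = len(x)
    t_i = np.array([t(np.delete(x, i)) for i in range(n)])   # t(F_hat_(i))
    return (n - 1) / n * np.sum((t_i - t_i.mean()) ** 2)

def var_infinitesimal_jackknife(U):
    """IJ estimate from empirical influence values U_i = U(x_i, F_hat)."""
    n = len(U)
    return np.sum(U ** 2) / n ** 2
```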
JK, IJ & NONPARAMETRIC BS
• If $t(\hat F) = t(F) + \frac{1}{n} \sum_{i=1}^{n} U(x_i, F)$ (a linear statistic), then:
  $var_{\hat F}\big(t(\hat F^*)\big) \cong var_F\big(t(\hat F)\big) \cong \frac{1}{n} var_F\big(U(x, F)\big) = \frac{1}{n} E_F\big[U^2(x, F)\big]$
  → $var_{IJ}\big(t(\hat F)\big) = \frac{1}{n^2} \sum_{i=1}^{n} U^2(x_i, \hat F) \cong \frac{n-1}{n}\, var_{JK}\big(t(\hat F)\big)$

                  "Exact"       Approximate
  Nonparametric   Bootstrap     Jackknife, Infinitesimal Jackknife, Nonparametric delta
  Parametric      Bootstrap     Fisher information, Parametric delta
THE PARAMETRIC BOOTSTRAP FOR MLE
• Consider a sample $X$ drawn from $F$, a known distribution function with parameter $\theta$. We would like to estimate a function of $\theta$ in the real world.
• First, we compute the maximum likelihood estimator $\hat\theta$.
• We draw $B$ bootstrap samples $X^*$ of size $n$ from the density $f_{\hat\theta}(x)$.
• We then calculate the maximum likelihood estimate $\hat\theta_b^*$ for each bootstrap sample.
• We use the empirical distribution of $\{\hat\theta_1^*, \ldots, \hat\theta_B^*\}$ to estimate properties of $\hat\theta$. For instance, the sample variance estimates $var_F(\hat\theta)$.
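A minimal sketch of the recipe for a normal model (my choice of model; any parametric family with a computable MLE works the same way):

```python
import numpy as np

def parametric_bootstrap_var(x, B=500, seed=0):
    """Parametric bootstrap estimate of var(theta_hat), normal model, theta = mean."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mu_hat, sigma_hat = x.mean(), x.std()  # MLEs under N(mu, sigma^2)
    # B samples of size n from f_{theta_hat}, re-estimating the MLE on each
    boot = np.array([rng.normal(mu_hat, sigma_hat, n).mean() for _ in range(B)])
    return boot.var()                      # sample variance of theta*_1..theta*_B
```

For this model the Fisher-information answer is $\hat\sigma^2 / n$, which the bootstrap value approaches as $B$ grows; the example on the next slide makes the same comparison.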
THE PARAMETRIC BOOTSTRAP - EXAMPLE
๏‚ง ๐‘‹~๐‘
๐œ‡, ๐œŽ 2
Normal model
; ๐‘› = 9 ; interested in the variance of the mean.
50
๏‚ง 500 parametric BS samples of size 9 from ๐‘(๐œƒ, ๐‘ ๐‘’ 2 ).
โˆ—
๏‚ง Histogram of the 500 values ๐œƒ1 , โ€ฆ , ๐œƒ500
โˆ—
100
0
of the samples mean.
๏‚ง Superimposed on the histogram is the density ๐‘ ๐œƒ, ๐‘– ๐œƒ
โˆ’1
.
20
40
60
80
100
Exponential model
300
200
โˆ—
โˆ—
๏‚ง BS sample variance of ๐œƒ1 , โ€ฆ , ๐œƒ500 =183.7, and ๐‘– ๐œƒ
โˆ’1
=177.7 .100
0
20 40 60 80 100 140
PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
• The score function: $\dot l(\theta, x) = \frac{\partial l(\theta, x)}{\partial \theta} = \frac{\partial \log L(\theta, x)}{\partial \theta} = \sum_{i=1}^{n} \frac{\partial \log L(\theta, x_i)}{\partial \theta}$
• The Fisher information: $i(\theta) = -E\!\left[\frac{\partial^2 l(\theta, x)}{\partial \theta^2}\right] = -E\!\left[\sum_{i=1}^{n} \frac{\partial^2 \log L(\theta, x_i)}{\partial \theta^2}\right]$
• By the asymptotic distribution of the MLE, under certain conditions (regularity of $f$, $\theta$ in the interior of the parameter space): $\hat\theta_n := \hat\theta \sim N\big(\theta, i(\theta)^{-1}\big)$.
• This suggests that the sampling distribution of $\hat\theta$ can be approximated by $N\big(\hat\theta, i(\hat\theta)^{-1}\big)$.
PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
• Suppose we have a vector of parameters $\eta$ and we want to conduct inference for a real-valued function $\theta = h(\eta)$. Let $\eta_0$ be the true value of $\eta$.
• With $\hat\eta$ the MLE of $\eta$, $\hat\theta = h(\hat\eta)$ is the MLE for $\theta$.
• Denote the parametric family of distribution functions for $x$ by $F_\eta$, and $F = F_{\eta_0}$.
• $\dot l(\eta; x) = \begin{pmatrix} \frac{\partial l(\eta; X)}{\partial \eta_1} \\ \vdots \\ \frac{\partial l(\eta; X)}{\partial \eta_p} \end{pmatrix}$, $\quad i(\eta) = \begin{pmatrix} -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_1^2}\right] & -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_1 \partial \eta_2}\right] & \cdots \\ -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_1 \partial \eta_2}\right] & \ddots & \vdots \\ \cdots & \cdots & -E_F\!\left[\frac{\partial^2 l(\eta; X)}{\partial \eta_p^2}\right] \end{pmatrix}$, $\quad \dot h(\eta) = \begin{pmatrix} \frac{\partial h(\eta)}{\partial \eta_1} \\ \vdots \\ \frac{\partial h(\eta)}{\partial \eta_p} \end{pmatrix}$.
PARAMETRIC MAX. LIKELIHOOD INFERENCE - THE BASICS
• By the chain rule, $i\big(h(\eta)\big)^{-1} = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta)$.
• It can be shown that $\hat\theta_n := \hat\theta = h(\hat\eta) \sim N\big(h(\eta_0),\ \dot h(\eta_0)^T\, i(\eta_0)^{-1}\, \dot h(\eta_0)\big)$,
• where $\eta_0$ can be replaced with $\hat\eta$, a consistent estimator.
• A property of M-estimators: $U(x, F) = -\left(\int \frac{\partial \psi(x, \theta)}{\partial \theta}\Big|_{t(F)}\, dF(x)\right)^{-1} \cdot \psi\big(x, t(F)\big)$
  → $U(x, F_\theta) = i(\theta)^{-1} \cdot n \cdot \dot l(\theta; x)$
  → $U(x, F_\eta) = n \cdot \dot h(\eta)^T\, i(\eta)^{-1}\, \dot l(\eta; x)$
PARAMETRIC ML & BS INFERENCE, JACKKNIFE
๏‚ง ๐‘ผ ๐’™, ๐‘ญ = ๐’ โ‹… ๐’‰ ๐œผ ๐‘ป ๐’Š ๐œผ
๏‚ง Now, using ๐‘ฃ๐‘Ž๐‘Ÿ๐น (๐‘ก ๐น )โ‰ˆ
๐‘‡๐‘–
1
๐‘›
๐ธ๐น (๐‘ˆ 2 ๐‘ฅ, ๐น ), we can see that:
โ‰ˆ ๐ธ๐น [๐‘› โ‹… โ„Ž ๐œ‚ ๐‘‡ ๐‘– ๐œ‚
๏‚ง ๐’—๐’‚๐’“๐‘ญ ๐’• ๐‘ญ
๏‚ง ๐‘›โ‹…โ„Ž ๐œ‚
โˆ’๐Ÿ ๐’(๐œผ; ๐’™)
๐œ‚
โˆ’1 ๐ธ [๐‘™(๐œ‚; ๐‘ฅ)๐‘™
๐น
๏‚ง Since ๐‘› โ‹… ๐ธ๐น ๐‘™ ๐œ‚; ๐‘ฅ ๐‘™ ๐œ‚; ๐‘ฅ
๐‘‡
โˆ’1 ๐‘™
๐œ‚; ๐‘ฅ ๐‘™ ๐œ‚; ๐‘ฅ ๐‘‡ ๐‘– ๐œ‚
๐œ‚; ๐‘ฅ ๐‘‡ ]๐‘– ๐œ‚
โˆ’1
โˆ’1 โ„Ž
โ„Ž ๐œ‚ = โ„Ž ๐œ‚ ๐‘‡๐‘– ๐œ‚
= ๐ธ๐น ๐‘™ ๐œ‚; ๐‘‹ ๐‘™ ๐œ‚; ๐‘‹
๐‘‡
=๐‘– ๐œ‚
๐œ‚ ]=
โˆ’1 โ„Ž
๐œ‚ = ๐’Š(๐’‰ ๐œผ ) โˆ’๐Ÿ
PARAMETRIC ML, PARAMETRIC BS, IJ
• Therefore, we can conclude that for a statistic $t(F_\eta)$, with $F = F_\eta$:
  $var_{IJ}\big(t(F_\eta)\big) = \frac{1}{n} E_{F_\eta}\big[U^2(x, F_\eta)\big] = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta) = i\big(h(\eta)\big)^{-1}$
• Furthermore, if $t(F_\eta)$ is a linear statistic, then the IJ and the inverse Fisher information both agree with the parametric bootstrap estimate of variance for $t(F_\eta)$.

                  "Exact"       Approximate
  Nonparametric   Bootstrap     Jackknife, Infinitesimal Jackknife, Nonparametric delta
  Parametric      Bootstrap     Fisher information, Parametric delta
THE DELTA METHOD
๏‚ง Assume ๐œ‚๐‘›
๐ถ๐ด๐‘
๐œ‚, i. e.
๐ท
๐‘› ๐œ‚๐‘› โˆ’ ๐œ‚ โ†’ ๐‘ 0, ฮฃ๐น ๐œ‚ .
๏‚ง We will assume that ๐œ‚๐‘› : โ„๐‘ โ†’ โ„๐ด consists of means: ๐‘„1 , โ€ฆ , ๐‘„๐ด ,
๐ท
๏‚ง By the CLT, ๐‘› ๐‘„ โˆ’ ๐œ‡ โ†’ ๐‘ 0, ฮฃ๐‘ž ๐œ‚
1
๐‘›
๐‘›
1 ๐‘„๐‘Ž (๐‘‹๐‘– ) .
โ‰” N 0, ฮฃ๐‘ž,๐น ๐œ‚ .
๏‚ง Consider โ„Ž ๐œ‚ : โ„๐ด โ†’ โ„1 , ๐›ปโ„Ž โ‰  0.
๏‚ง Then
๐‘ซ
๐’(๐’‰(๐œผ๐’ ) โˆ’ ๐’‰(๐œผ)) โ†’ ๐‘ต(๐ŸŽ, ๐œต๐’‰ ๐œผ ๐‘ป ๐œฎ๐’’,๐‘ญ (๐œผ)๐œต๐’‰(๐œผ)).
๏‚ง In other words, we can estimate ๐’—๐’‚๐’“๐‘ญ ๐’‰(๐œผ ) by
๐œต๐’‰ ๐œผ ๐‘ป ๐œฎ๐’’,๐‘ญ (๐œผ))๐œต๐’‰(๐œผ)
๐’
.
THE DELTA METHOD
• First, we'll look at the expansion:
  $h(\hat\eta) \approx h(\eta) + \nabla h(\eta)^T (\hat\eta - \eta)$
• This implies:
  $var\big(h(\hat\eta)\big) \approx var\big(h(\eta) + \nabla h(\eta)^T (\hat\eta - \eta)\big) = \nabla h(\eta)^T\, Cov(\hat\eta)\, \nabla h(\eta) = \nabla h(\eta)^T\, \frac{\Sigma_{q,F}(\eta)}{n}\, \nabla h(\eta)$
THE DELTA METHOD
• The nonparametric delta method ($F \to \hat F$): replace the first and second moments with the plug-in estimates created from $\hat F$,
  $var_{ND}\big(h(\hat\eta)\big) = \frac{\nabla h_{\hat F}^T\, \Sigma_{q,\hat F}\, \nabla h_{\hat F}}{n}$
• The parametric delta method ($F \to F_{\hat\eta}$):
  $var_{PD}\big(h(\hat\eta)\big) = \frac{\nabla h_{F_{\hat\eta}}^T\, \Sigma_{q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}}{n}$
THE DELTA METHOD FOR THE MEAN - EXAMPLE
• $\hat\theta(x_1, \ldots, x_n) = \frac{1}{n} \sum_{i=1}^{n} x_i$, $h(\eta) = \eta \Rightarrow \nabla h_F = 1$. CLT → $\Sigma_F = var(x)$.
• Nonparametric: with the plug-in estimate of variance $\Sigma_{\hat F} = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar x)^2$,
  $var_{ND}\big(h(\hat\eta)\big) = \frac{\nabla h_{\hat F}^T\, \Sigma_{q,\hat F}\, \nabla h_{\hat F}}{n} = \frac{1}{n^2} \sum_{i=1}^{n} (x_i - \bar x)^2$
• Parametric delta: if we assume $x$ is exponential, $\hat\eta = \bar x$ and $\Sigma_{F_{\hat\eta}} = \sigma^2(F_{\hat\eta}) = \bar x^2$, so
  $var_{PD}\big(h(\hat\eta)\big) = \frac{\nabla h_{F_{\hat\eta}}^T\, \Sigma_{q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}}{n} = \frac{\bar x^2}{n}$
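A quick numeric cross-check of these answers for the mean (my sketch; the exponential data and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=10.0, size=9)
n = len(x)

var_nd = np.sum((x - x.mean()) ** 2) / n ** 2    # nonparametric delta (= IJ)
var_pd = x.mean() ** 2 / n                        # parametric delta, exponential
boot = np.array([x[rng.integers(0, n, n)].mean() for _ in range(5000)])
print(var_nd, var_pd, boot.var())                 # all should be comparable
```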
THE DELTA METHODโ€™S CONNECTIONS
๏‚ง If ๐‘ก ๐น = ๐œ‚๐‘›
๐ถ๐ด๐‘
๐œ‚ [or a function of means], then:
๏‚ง ๐’—๐’‚๐’“๐‘ฐ๐‘ฑ ๐’• ๐‘ญ = ๐’—๐’‚๐’“๐‘ต๐‘ซ ๐’• ๐‘ญ .
๏‚ง For a parametric model:
โ€œExactโ€
Nonparametric
Bootstrap
Approximate
Jackknife
Infinitesimal Jackknife
Nonparametric delta
๏‚ง ๐’—๐’‚๐’“๐‘ฐ๐‘ฑ ๐’• ๐‘ญ๐œผ = ๐’—๐’‚๐’“๐‘ท๐‘ซ ๐’• ๐‘ญ๐œผ = ๐’Š(๐’‰ ๐œผ ) โˆ’๐Ÿ
Parametric Bootstrap
Fisher information
Parametric delta
THE EXPONENTIAL FAMILY
• A random variable $X$ is said to have a density in the exponential family if
  $f_\eta(x) = h_0(x)\, e^{\eta^T q(x) - \psi(\eta)}$,
  where $q$ is a vector of $A$ sufficient statistics, $\eta$ is the vector of natural parameters, and the support does not depend on $\eta$.
• $E\big[q(X)\big] = \psi'(\eta)$, $\quad var\big(q(X)\big) = \psi''(\eta)$.
• Example: $x \sim \exp(\lambda)$, $f(x) = \lambda e^{-\lambda x} = 1 \cdot e^{\lambda \cdot (-x)} \cdot e^{-(-\ln \lambda)}$, so $q(x) = -x$, $\eta = \lambda$, $\psi(\eta) = -\ln \lambda$.
  → $E\big[q(X)\big] = E[-x] = -\frac{1}{\lambda} = \frac{\partial(-\ln \lambda)}{\partial \lambda}$
  → $var\big(q(X)\big) = \frac{1}{\lambda^2} = \frac{\partial^2(-\ln \lambda)}{\partial \lambda^2}$
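A simulation check of the two identities for the exponential example (my sketch):

```python
import numpy as np

lam = 2.0
rng = np.random.default_rng(0)
q = -rng.exponential(scale=1 / lam, size=200_000)   # q(x) = -x under exp(lam)
print(q.mean(), -1 / lam)      # E[q(X)]  = psi'(lam)  = -1/lam
print(q.var(), 1 / lam ** 2)   # var(q(X)) = psi''(lam) = 1/lam^2
```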
PROOF - THE PARAMETRIC CASE
• Now, say we have $n$ observations $x_1, \ldots, x_n$ from the exponential family.
• We can conclude that their joint distribution also belongs to the exponential family, with natural parameter $\eta$ and with sufficient statistics $\bar Q = \left(\frac{1}{n} \sum_{i=1}^{n} q_1(x_i),\ \ldots,\ \frac{1}{n} \sum_{i=1}^{n} q_A(x_i)\right)$.
• Therefore, $f_\eta(\bar Q) = h_1(\bar q)\, e^{n(\eta^T \bar q - \psi(\eta))}$.
• $E\big[\bar Q(X)\big] = \psi'(\eta)$, $\quad var\big(\bar Q(X)\big) = \frac{\psi''(\eta)}{n}$.
• $\hat\eta$ satisfies the set of equations $\bar q = \psi'(\hat\eta)$, i.e. $\bar q = E\big[\bar Q(X)\big]$ under $F_{\hat\eta}$.
PROOF - THE PARAMETRIC CASE
• The solution to $\bar q = \psi'(\hat\eta)$ is $\hat\eta = \psi'^{-1}(\bar q)$.
• The parametric delta method begins with $\bar Q$ having variance $\frac{\psi''(\eta)}{n}$, and applies $\hat\theta = h\big(\psi'^{-1}(\bar q)\big)$.
• Let $K$ be the matrix of derivatives of $\psi'^{-1}(\cdot)$.
• $var_{PD}\big(h(F_{\hat\eta})\big) = \dot h(\eta)^T K^T\, var(\bar Q)\, K\, \dot h(\eta) = \frac{\dot h(\eta)^T K^T\, \psi''(\eta)\, K\, \dot h(\eta)}{n} = \frac{\dot h(\eta)^T\, \psi''(\eta)^{-1}\, \dot h(\eta)}{n}$
• Therefore, $var_{PD}\big(t(F_{\hat\eta})\big) = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta) = i\big(h(\eta)\big)^{-1}$.
PROOF - THE PARAMETRIC CASE
• On the other hand,
  $i(\eta) = n \cdot \psi''(\eta) = n^2\, var(\bar Q)$
  $i(\theta)^{-1} = \dot h(\eta)^T\, i(\eta)^{-1}\, \dot h(\eta) = \frac{\dot h(\eta)^T\, \psi''(\eta)^{-1}\, \dot h(\eta)}{n}$
SUMMARY - ESTIMATION OF VAR($\hat\theta$)

  Method                     Estimate of $var(\hat\theta)$
  Nonparametric Bootstrap    $var_{\hat F}\big(t(\hat F^*)\big)$
  Jackknife                  $\frac{n-1}{n} \sum_{i=1}^{n} \big(t(\hat F_{(i)}) - t(\hat F_{(\cdot)})\big)^2$
  Infinitesimal Jackknife    $\left(\frac{1}{n}\right)^2 \sum_{i=1}^{n} U^2(x_i, \hat F)$
  Nonparametric delta        $\frac{\nabla h_{\hat F}^T\, \Sigma_{q,\hat F}\, \nabla h_{\hat F}}{n}$
  Parametric BS              $var_{F_{\hat\eta}}\big(h(\hat\eta)^*\big)$
  Fisher information         $\nabla h(\hat\eta)^T\, i(\hat\eta)^{-1}\, \nabla h(\hat\eta) = i\big(h(\hat\eta)\big)^{-1}$
  Parametric delta           $\frac{\nabla h_{F_{\hat\eta}}^T\, \Sigma_{q,F_{\hat\eta}}\, \nabla h_{F_{\hat\eta}}}{n}$