Presentation on Importance Sampling and Radon-Nikodym Derivatives (PDF)

Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Importance Sampling and
Radon-Nikodym Derivatives
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
1 / 33
Steven R. Dunbar
Outline
Importance
Sampling and
Radon-Nikodym
Derivatives
1
Sampling with respect to 2 distributions
2
Rare Event Simulation
3
Absolute Continuity
4
Examples and Applications
5
Statistical Interpretation
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
2 / 33
More than one way to evaluate a statistic
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
A statistic for X with pdf u(x) is
Z
A = Eu [F (X)] = F (x)u(x) dx
Suppose v(x) is another probability density such that
Rare Event
Simulation
L(x) =
Absolute
Continuity
Examples and
Applications
is well-defined.
Statistical
Interpretation
Then
u(x)
v(x)
Z
A = Ev [F (X)L(X)] =
3 / 33
F (x)
u(x)
v(x) dx
v(x)
Likelihood ratio
Importance
Sampling and
Radon-Nikodym
Derivatives
The likelihood ratio is
Steven R.
Dunbar
L(x) =
Sampling with
respect to 2
distributions
Rare Event
Simulation
u(x)
v(x)
so
A = Eu [F (X)] = Ev [F (X)L(X)]
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
4 / 33
To get A we can either:
take samples with density u(x) calculate with F ; or
take samples with density v(x), calculate with F · L.
Good choice of likelihood ratio
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
5 / 33
How should we seek v(x) and corresponding L(x)?
Reduction of variance is a good criterion: Want
Varv [F (X)L(X)] = Ev (F (X)L(X))2 − A2
less than
Varu [F (X)] = Eu (F (X))2 − A2
Example: Large values for a normal r.v.
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Suppose X ∼ N (0, 1) and we wish to evaluate P [X > 4]
R code for probability:
1 - pnorm(4)
Absolute
Continuity
The output
Examples and
Applications
[1] 3.167124e-05
Statistical
Interpretation
6 / 33
Sampling
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Code for sampling
sample <- rnorm(10^5)
sum( sample > 4 )
sample[ which(sample > 4) ]
Rare Event
Simulation
The output
Absolute
Continuity
[1] 6
[1] 4.110055 4.970314 4.026803 4.133050 4.014239
Examples and
Applications
Statistical
Interpretation
7 / 33
Most of the sample doesn’t count, "wasted effort".
The samples with X > 4 are only a little larger than 4.
A better choice of sampling PDF
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
8 / 33
Draw samples from N (4, 1) instead. Put most of the
weight in sampling near 4.
Likelihood ratio is
2
L(x) =
u(x)
ke−x /2
2
= −(x−4)2 /2 = e4 /2 e−4x .
v(x)
ke
New sampling
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
The v-method, with sample draw according to the new
pdf:
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
9 / 33
Z
∞
L(X)v(x) dx ≈
PN (0,1) [X > 4] =
4
1 X
L(vk )
N v >4
k
=
e
42 /2
N
X
vk >4
e−4vk
New sampling example
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
10 / 33
Code for sampling
vsample <- rnorm(10^5,4,1)
usevsample <- vsample[which(vsample > 4)]
Pv <- exp(4^2/2)*sum(exp(-4*usevsample))/10^5
Pv
The output
[1] 3.17203e-05
Intuition behind Importance Sampling
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
11 / 33
About half the samples are counted, but they are
counted with a small weight.
A lot of small weight hits give a lower variance estimator
than a few large weight hits.
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
12 / 33
Variance for the naive sampling:
Z
2
e−x /2
P = PN (0,1) [X > 4] = 1[X>4] (x) √
dx
2π
Varu [F (X)] = Eu (F (X))2 − P 2
Z
2
e−x /2
= (1[X>4] (x))2 √
dx
2π
= P − P2
≈P
Variance is same order as P . (The standard deviation
will be relatively large compared to P .)
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Naive Sample Variance
P <- sum((sample > 4)^2)/10^5
P-P^2
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
13 / 33
Results
[1] 5.99964e-05
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
14 / 33
2 /2
Importance sampling: L(x) = e4
e−4x
Variance for importance sampling:
Varv [F (X)] = Eu (F (X)L(X))2 − P 2
Z
−(x−4)2 /2
42 /2 −4x 2 e
√
dx −P 2
= (1[X>4] (x)e e )
2π
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Importance Sample Variance
secmomv <- sum((exp(4^2/2)*exp(-4*usevsample))^2
secmomv - Pv^2
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
15 / 33
Results
[1] 4.550243e-09
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
16 / 33
What is best normal distribution N (b, 1) for importance
sampling of P = PN (0,1) [X > 4]?
After some calculation with L(x):
2
Varv 1[X>4] = eb (1 − Φ(b + 4)) − P 2
( Φ(x) is the cdf of the N (0, 1) distribution. )
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Unfortunately, (1 − Φ(b + 4)) is 0 to precision-levels for
3 ≤ b ≤ 5 leading to severe numerical error in
2
eb (1 − Φ(b + 4)) − P 2 .
Use standard asymptotic estimates for the normal cdf
tail:
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
17 / 33
√
1
2
2
e−(b+4) /2 − e−(b+5) /2 ≤
2π(b + 5)
1 − Φ(x)
1
2
≤√
e−(b+4) /2
2π(b + 4)
Digression: Reduction of Variance
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
Using the estimates
with L(x) provides algebraic simplification which reduces
numerical error.
Using a N (b, 1) with b ≈ 4.1 gives minimum variance.
18 / 33
New Measures from Old
Importance
Sampling and
Radon-Nikodym
Derivatives
If Q is a probability, can define a new probability measure
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
19 / 33
dP = L(x) dQ
provided
L(x) ≥ 0 (almost surely w.r.t. Q).
R
L(x) dQ = 1.
Ω
Then
EP [F (X)] = EQ [F (X)L(X)]
Relation between measures
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Ask the reverse question: Given measures P and Q, is
there an L(x) relating them?
Necessary condition: Q[B] = 0 =⇒ P[B] = 0 because
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
20 / 33
P[B] = EP [1B (X)] = EQ [1B (X)L(X)] = 0
“Impossible under Q is also impossible under P.”
Absolute Continuity
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
21 / 33
If Q[B] = 0 =⇒ P[B] = 0 we say P is absolutely
continuous w.r.t. Q and write P Q.
“Impossible under Q is also impossible under P.”
Radon-Nikodym Theorem
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
The R-N Theorem says the necessary condition of
absolute continuity is also sufficient:
Radon-Nikodym Theorem
P and Q are probability measures on common
σ-algebra F; and
P is absolutely continuous w.r.t. Q
then there is a function
Examples and
Applications
L(x) =
Statistical
Interpretation
dP
.
dQ
so that
EP [F (X)] = EQ [F (X)L(X)] .
22 / 33
Completely Singular
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
23 / 33
If there is a B with P[B] = 0 and Q[B] = 1 we say Q is
completely singular w.r.t. P.
Note: P[B C ] = 1 and Q[B C ] = 0 so then P is completely
singular w.r.t. Q and we write
P⊥Q
Simple Example 1
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
24 / 33
X ∼ U ([0, 1]) with probability P
Y ∼ N (0, 1) with probability Q
Then
P is absolutely continuous w.r.t. Q (
Q[B] = 0 =⇒ P[B] = 0 )
Q is not absolutely continuous w.r.t. P
( P[1, 2] = 0 but Q[1, 2] ≈ 0.136 )
Simple Example 2
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
25 / 33
X ∈ R2 , X ∼ N ((0, 0), I2,2 ) with probability P
Y = X/kXk, Y ∼ U (S 1 ) with probability Q
Then
P[S 1 ] = 0 and Q[S 1 ] = 1
P ⊥ Q.
Sophisticated Example
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
26 / 33
X ∼ U ([0, 1]) with probability P
Y has the Cantor distribution on [0, 1], with
probability Q
Then
Q has a continuous c.d.f. (Devil’s staircase), but
the p.d.f. does not exist in any easy sense.
P ⊥ Q.
This example shows that absolute continuity and
completely singular are extensions of the “calculus”
definitions for functions from real analysis.
Example of an R-N derivative
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
27 / 33
Previous sampling example:
2
X ∼ N (0, 1), with probability P, p.d.f. e−x /2
2
Y ∼ N (4, 1), with probability Q, p.d.f e−(x−4) /2
dP
2
= e4 /2 e−4x
dQ
Bigger example of an R-N derivative
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
X ∈ Rn , X ∼ N (0, Cn,n ), H = C −1 .
Y ∈ Rn , Y ∼ N (µ, Cn,n ), H = C −1 .
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
28 / 33
u(x) = kn e−x
T Hx/2
v(x) = kn e−(x
T −µ)H(x−µ)/2
T Hµ/2
= kn e−µ
T Hx
eµ
e−x
T Hx/2
Continuation, Bigger Example
Importance
Sampling and
Radon-Nikodym
Derivatives
Let
Steven R.
Dunbar
µT H = ν T
Hµ = ν
µ = Cν
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
29 / 33
so
dP
T
T
= eν µ/2 e−ν x
dQ
Deciding the Distribution
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
30 / 33
Have a single sample point: X. Is:
X ∼ P (null hypothesis ); or
X ∼ Q (alternate hypothesis)
Test: Criterion is set B:
If x ∈ B say Q
If x ∈ B C say P
Types of errors
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
31 / 33
Type I error is saying Q when truth is P (reject null
and accept alternative when null is true)
Type II error is saying P when truth is Q (accept
null and reject alternative when null is false)
(Usually we believe) Type I errors are worse than Type II
Confidence in the test
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
32 / 33
The confidence in the procedure is P B C = 1 − P [B],
probability of accepting null, when the null is true.
The power in the procedure is Q [B] of rejecting null,
when the null is false.
If Q ⊥ P, there is a test with 100% confidence and 100%
power.
Conversely, If there is a test with 100% confidence and
100% power, then Q ⊥ P,
Neyman-Pearson Lemma
Importance
Sampling and
Radon-Nikodym
Derivatives
Steven R.
Dunbar
Sampling with
respect to 2
distributions
Rare Event
Simulation
Absolute
Continuity
Examples and
Applications
Statistical
Interpretation
33 / 33
A test is efficient if there is no way to increase the
confidence without decreasing the power.
Neyman-Pearson Lemma
If B is efficient, then there is L0 such that
dQ
B= x :
> L0
dP