IMPORTANCE SAMPLING
Importance Sampling Background: let $x = (x_1, \ldots, x_n)$,
\[ \theta = E[h(X)] = \int h(x) f(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} h(X_i) = \bar{\Theta}, \]
if $X_i \sim F(X)$, and $F(x)$ is the cdf for $f(x)$. For many problems,
$F(x)$ is difficult to sample from and/or $Var(h)$ is large.
• If a related, easily sampled pdf $g(x)$ is available, could use
\[ \theta = E_g\left[\frac{h(X)f(X)}{g(X)}\right] = \int \frac{h(x)f(x)}{g(x)}\,g(x)\,dx \approx \frac{1}{N}\sum_{i=1}^{N} \frac{h(X_i)f(X_i)}{g(X_i)}, \]
with $X_i \sim G(X)$, for associated cdf $G(X)$.
• Importance sampling: if $Var\left(\frac{h(x)f(x)}{g(x)}\right)$ is small, $g(x)$
samples are concentrated where $h(x)f(x)$ is “important”.
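A short worked equation makes the variance statement concrete. This is a sketch that assumes the $X_i$ are independent draws from $g$ and that $g(x) > 0$ wherever $h(x)f(x) \neq 0$:

```latex
\[
Var_g\!\left(\frac{1}{N}\sum_{i=1}^{N}\frac{h(X_i)f(X_i)}{g(X_i)}\right)
  = \frac{1}{N}\left(\int \frac{h(x)^2 f(x)^2}{g(x)}\,dx - \theta^2\right).
\]
```

When $h(x)f(x) \geq 0$, the choice $g = hf/\theta$ makes the bracket zero. That $g$ is not usable in practice because it requires $\theta$, but it suggests choosing $g$ roughly proportional to $hf$.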
IMPORTANCE SAMPLING CONT.
• Importance Sampling Example: $\theta = \int_0^1 e^{x^2}\,dx$.
Try $g(x) = e^x$; so $\theta = \int_0^1 e^{x^2-x}\,e^x\,dx$.
To find $G(x)$, note $\int_0^1 e^x\,dx = e - 1$, so
\[ G(x) = \frac{1}{e-1}\int_0^x e^t\,dt = (e^x - 1)/(e - 1); \]
then using $X_i = \ln(1 + (e-1)U_i)$,
\[ \theta = (e-1)\int_0^1 e^{x^2-x}\,\frac{e^x}{e-1}\,dx \approx \frac{e-1}{N}\sum_{i=1}^{N} e^{X_i^2 - X_i}. \]
N = 10000; U = rand(1,N); Y = exp(U.^2);
disp( [mean(Y) 2*std(Y)/sqrt(N)]) % simple MC
    1.4672    0.009463
e = exp(1); X = log(1+(e-1)*U);
T = (e-1)*exp(X.*(X-1));
disp( [mean(T) 2*std(T)/sqrt(N)]) % importance
    1.4628    0.0022348
Error is reduced to ≈ 1/4 of the simple MC error.
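As a check on the two estimates above, the following sketch compares them with a quadrature reference. It assumes `integral` is available (Matlab provides it, and Octave has a function of the same name); the names `theta_ref`, `err_mc`, and `err_is` are illustrative and not part of the original slides.

```matlab
% Reference value for theta = int_0^1 exp(x^2) dx by adaptive quadrature.
theta_ref = integral(@(x) exp(x.^2), 0, 1);

N = 10000; U = rand(1,N);
Y = exp(U.^2);                       % simple MC samples, U ~ Uniform(0,1)
e = exp(1); X = log(1+(e-1)*U);      % inverse-cdf samples from G
T = (e-1)*exp(X.^2 - X);             % importance sampling samples
err_mc = 2*std(Y)/sqrt(N);           % two standard errors, simple MC
err_is = 2*std(T)/sqrt(N);           % two standard errors, importance
disp([theta_ref mean(Y) mean(T)])
disp([err_mc err_is err_is/err_mc])
```

The last number printed is the error ratio, which can be compared with the ≈ 1/4 quoted above.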
[Figure: exp(x), exp(x^2), and 1 + x^2 + x^4/2 plotted for x from 0 to 1.]
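Alternative: $g(x) = 1 + x^2$, $G(x)$?
\[ \theta = \frac{4}{3}\int_0^1 \frac{e^{x^2}}{1+x^2}\,\frac{3(1+x^2)}{4}\,dx \approx \frac{4}{3N}\sum_{i=1}^{N} \frac{e^{X_i^2}}{1+X_i^2}, \]
with $X_i \sim G(x) = \frac{3}{4}x + \frac{1}{4}x^3$.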
N = 10000; U = rand(1,N); I = rand(1,N)<3/4;
X = I.*U + (1-I).*U.^(1/3);
T = 4*exp(X.^2)./(3*(1+X.^2));
disp( [mean(T) 2*std(T)/sqrt(N)]) % importance
    1.4627    0.0028178
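The code above samples from $G(x) = \frac{3}{4}x + \frac{1}{4}x^3$ by composition: with probability 3/4 it takes $X = U$ (cdf $x$), otherwise $X = U^{1/3}$ (cdf $x^3$). A minimal sketch that checks the sampler against the target cdf at a few points; the grid `xs` and the name `Femp` are arbitrary illustrative choices.

```matlab
N = 100000; U = rand(1,N); I = rand(1,N) < 3/4;
X = I.*U + (1-I).*U.^(1/3);          % composition sample from G
xs = [0.25 0.5 0.75];                % check points (arbitrary)
G = 3*xs/4 + xs.^3/4;                % target cdf values
Femp = zeros(size(xs));
for k = 1:numel(xs)
  Femp(k) = mean(X <= xs(k));        % empirical cdf at xs(k)
end
disp([xs; G; Femp])                  % rows: points, target, empirical
```

The second and third rows should agree to within sampling error, roughly 0.002 to 0.003 for this N.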
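Better $g(x) = 1 + x^2 + x^4/2$, $G(x) = \frac{30}{43}x + \frac{10}{43}x^3 + \frac{3}{43}x^5$;
\[ \theta = \frac{43}{30}\int_0^1 \frac{e^{x^2}}{1+x^2+x^4/2}\,\frac{30(1+x^2+x^4/2)}{43}\,dx \approx \frac{43}{30N}\sum_{i=1}^{N} \frac{e^{X_i^2}}{1+X_i^2+X_i^4/2}, \]
with $X_i \sim \frac{30}{43}x + \frac{10}{43}x^3 + \frac{3}{43}x^5$.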
N = 100000; U = rand(1,N); V = rand(1,N);
I = V<30/43; J = V>40/43;
X = I.*U + (1-I).*(1-J).*U.^(1/3)+ J.*U.^(1/5);
T = 43*exp(X.^2)./(30*(1+X.^2+X.^4/2));
disp( [mean(T) 2*std(T)/sqrt(N)]) % importance
    1.4623    0.00072228
• Importance Sampling Example: $\theta = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,z^3 e^z\,dz$.
N = 100000; Z = randn(1,N); Y = Z.^3.*exp(Z);
disp( [mean(Y) 2*std(Y)/sqrt(N)]) % simple MC
    6.5418    0.33644
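Try $g(z) = \frac{1}{\sqrt{2\pi}}e^{-(z-1)^2/2} = \frac{1}{\sqrt{2\pi}}e^{-w^2/2}$, with $w = z - 1$; so
\[ \theta = e^{1/2}\int_{-\infty}^{\infty} z^3 g(z)\,dz = \frac{e^{1/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (w+1)^3 e^{-w^2/2}\,dw. \]
Simulation uses $\theta \approx \frac{e^{1/2}}{N}\sum_{i=1}^{N}(Z_i+1)^3$, with $Z_i \sim Normal(0, 1)$.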
N = 100000; Z = randn(1,N); Y = exp(1/2)*(Z+1).^3;
disp( [mean(Y) 2*std(Y)/sqrt(N)]) % importance
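    6.6338    0.081528
Error is reduced to ≈ 1/4 of the simple MC error. Note (odd powers of $w$ integrate to zero)
\[ \theta = \frac{e^{1/2}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} (3w^2+1)\,e^{-w^2/2}\,dw = 4e^{1/2} \approx 6.5948851. \]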
• Higher dimensional problems: often
$f(x) \approx g(x) = g_1(x_1)g_2(x_2)\cdots g_n(x_n)$,
so samples are from a sequence of 1-d samples.
2-d example: $\theta = \int_0^1\int_0^1 e^{(x_1+x_2)^2}\,dx$; if $g(x) = e^{x_1}e^{x_2}$,
$\theta = \int_0^1\int_0^1 e^{(x_1+x_2)^2 - x_1 - x_2}\,e^{x_1+x_2}\,dx$.
After scaling, with $X_{ij} = \ln(1 + (e-1)U_{ij})$,
\[ \theta = (e-1)^2\int_0^1\int_0^1 e^{(x_1+x_2)^2 - x_1 - x_2}\,\frac{e^{x_1+x_2}}{(e-1)^2}\,dx \approx \frac{(e-1)^2}{N}\sum_{i=1}^{N} e^{(X_{1i}+X_{2i})^2 - X_{1i} - X_{2i}}. \]
N = 10000; U = rand(2,N); T = exp(sum(U).^2);
disp( [mean(T) 2*std(T)/sqrt(N)]) % simple MC
    4.9204    0.12261
e = exp(1); X = log(1+(e-1)*U);
T = (e-1)^2*exp(sum(X).^2-sum(X));
disp( [mean(T) 2*std(T)/sqrt(N)])
    4.8863    0.065169
Better $g(x) = e^{2x_1}e^{2x_2}$, with $g(1, 1) = f(1, 1)$? Then
$G_i(x) = \frac{e^{2x}-1}{e^2-1}$, $X_{ij} = \ln(1 + (e^2-1)U_{ij})/2$, and
\[ \theta = \frac{(e^2-1)^2}{4}\int_0^1\int_0^1 e^{(x_1+x_2)^2 - 2(x_1+x_2)}\,\frac{4e^{2(x_1+x_2)}}{(e^2-1)^2}\,dx, \]
e = exp(1); X = log(1+(e^2-1)*U)/2;
T = (e^2-1)^2*exp(sum(X).^2-2*sum(X))/4;
disp( [mean(T) 2*std(T)/sqrt(N)])
    4.9008    0.0082436
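A deterministic reference value helps judge the three 2-d estimates above. The sketch below assumes `integral2` and `integral` are available, as in current Matlab and Octave releases; `theta2_ref` and `theta2_1d` are illustrative names.

```matlab
% Reference value for theta = int_0^1 int_0^1 exp((x1+x2)^2) dx1 dx2.
theta2_ref = integral2(@(x1,x2) exp((x1+x2).^2), 0, 1, 0, 1);
disp(theta2_ref)

% Cross-check by reducing to one dimension: s = x1 + x2 has the
% triangular density s on [0,1] and 2-s on [1,2].
theta2_1d = integral(@(s) s.*exp(s.^2), 0, 1) + ...
            integral(@(s) (2-s).*exp(s.^2), 1, 2);
disp(theta2_1d)
```

Both numbers should agree closely, and each Monte Carlo mean above should lie within about two standard errors of them most of the time.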
Better $g(x) = 1 + (x_1 + x_2)^2$?
Tilted Densities $g(x)$: given pdf $f(x)$,
let $M(t) = \int e^{tx} f(x)\,dx$ (the moment generating function).
The tilted density for $f(x)$ is $f_t(x) = \frac{e^{tx}f(x)}{M(t)}$.
• Examples
– Exponential densities: if $f(x) = \lambda e^{-\lambda x}$, $x \in [0, \infty)$,
$f_t(x) = (\lambda - t)e^{-(\lambda - t)x}$, $t < \lambda$.
– Bernoulli pmf's: $f(x) = p^x(1-p)^{1-x}$, $x = 0, 1$.
$M(t) = E_f[e^{tx}] = e^t p + (1-p)$, so
\[ f_t(x) = \frac{e^{tx}p^x(1-p)^{1-x}}{e^t p + (1-p)} = \left(\frac{e^t p}{e^t p + (1-p)}\right)^{x}\left(\frac{1-p}{e^t p + (1-p)}\right)^{1-x}, \]
a Bernoulli RV with $p_t = \frac{e^t p}{e^t p + (1-p)}$.
So $f/f_t = \frac{e^t p + (1-p)}{e^{tx}} = e^{-tx}(e^t p + (1-p))$.
Generalization: if $f(x)$ is a Binomial$(n, p)$ pmf, $f_t(x)$ is
Binomial$(n, p_t)$, with $M(t) = (e^t p + 1 - p)^n$.
– Normal densities: if $f(x) = \frac{e^{-x^2/2}}{\sqrt{2\pi}}$, $x \in (-\infty, \infty)$,
\[ e^{tx}f(x) = \frac{e^{xt}e^{-x^2/2}}{\sqrt{2\pi}} = \frac{e^{-(x-t)^2/2}\,e^{t^2/2}}{\sqrt{2\pi}}, \]
so $f_t(x) = \frac{e^{-(x-t)^2/2}}{\sqrt{2\pi}}$, a Normal$(t, 1)$ pdf, with $M(t) = e^{t^2/2}$.
Generalization: if $f(x)$ is a Normal$(\mu, \sigma^2)$ pdf, then
$f_t(x)$ is a Normal$(\mu + \sigma^2 t, \sigma^2)$ pdf.
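For the exponential case the tilted density follows in two lines. The derivation below is a worked version of that example, using only the definitions of $M(t)$ and $f_t(x)$ given above:

```latex
\[
M(t) = \int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx = \frac{\lambda}{\lambda - t}, \quad t < \lambda,
\]
\[
f_t(x) = \frac{e^{tx}\,\lambda e^{-\lambda x}}{M(t)} = (\lambda - t)\,e^{-(\lambda - t)x},
\qquad
\frac{f(x)}{f_t(x)} = \frac{\lambda}{\lambda - t}\,e^{-tx}.
\]
```

So tilting an exponential density with rate $\lambda$ gives an exponential density with rate $\lambda - t$ and mean $1/(\lambda - t)$.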
• Choosing $t$: pick $t$ with small $Var\left(\frac{h(x)f(x)}{f_t(x)}\right)$.
Text heuristic for exponentials and Bernoullis:
if $h = I\{\sum X_i > a\}$, choose $t = t^*$ with $E_{t^*}[\sum X_i] \approx a$.
• Examples:
1. Bernoulli RV Examples: if the $X_i$'s are independent
Bernoulli$(p_i)$ RV's and $\theta = E[I\{\sum_{i=1}^n X_i > a\}] = E[I\{S > a\}]$,
\[ \hat{\theta} = I\{S > a\}\,e^{-tS}\prod_{i=1}^{n}(e^t p_i + (1-p_i)), \quad \text{with} \quad E_t\Big[\sum_{i=1}^n X_i\Big] = \sum_{i=1}^n \frac{e^t p_i}{e^t p_i + (1-p_i)}. \]
Example with $n = 20$, $p_i = .4$, $a = 16$; choose $t$ so that
$E_t[S] = 20\,\frac{.4e^t}{.4e^t + .6} = 16$, with solution $e^{t^*} = 6$;
then $p_{t^*} = .8$, $e^{t^*}p + (1-p) = 3$, and estimator is
\[ \hat{\theta} = I\{\textstyle\sum X_i > a\}\,6^{-S}\,3^{20} = 3^{20-S}\,I\{\textstyle\sum X_i > a\}/2^{S}. \]
Matlab
N = 100000; p = .4; n = 20;
I = sum( rand(n,N) < p ) > 16; % Simple MC
disp([mean(I) 2*std(I)/sqrt(N)])
    6e-05    4.8989e-05
S = sum( rand(n,N) < .8 ); % importance
I = 3.^(20-S).*( S > 16 )./2.^S;
disp([mean(I) 2*std(I)/sqrt(N)])
    4.7575e-05    5.1608e-07
N = 10000000; p = .4; n = 20;
I = sum( rand(n,N) < p ) > 16; % Simple MC
disp([mean(I) 2*std(I)/sqrt(N)])
    4.82e-05    4.3908e-06
Note: $\theta = \sum_{i=17}^{20}\binom{20}{i}(.4)^i(.6)^{20-i} \approx 4.7345 \times 10^{-5}$.
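The tilt value and the exact tail probability quoted above can be reproduced numerically. This is a sketch, not the slides' own code: it assumes `fzero` with a bracketing interval, and the bracket `[0 5]` and the names `Et`, `tstar`, `pt`, `Mt`, and `theta_exact` are illustrative choices.

```matlab
n = 20; p = 0.4; a = 16;
% Tilted mean E_t[S] = n*p*e^t/(p*e^t + 1 - p); solve E_t[S] = a for t.
Et = @(t) n*p*exp(t)./(p*exp(t) + 1 - p);
tstar = fzero(@(t) Et(t) - a, [0 5]);      % bracket [0,5] is an assumption
pt = p*exp(tstar)/(p*exp(tstar) + 1 - p);  % tilted success probability
Mt = p*exp(tstar) + 1 - p;                 % per-variable factor e^t p + 1 - p
disp([exp(tstar) pt Mt])                   % compare with 6, .8, 3 above

% Exact tail probability P(S > 16) for comparison with the estimates.
i = 17:20;
C = arrayfun(@(k) nchoosek(n,k), i);       % binomial coefficients
theta_exact = sum(C .* p.^i .* (1-p).^(n-i));
disp(theta_exact)
```

For unequal $p_i$ the same root-finding approach would apply with `Et` replaced by the sum of the individual tilted means.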
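2. Exponential RV Example:
if $X_i \sim Exp(\frac{1}{i+2})$, $i = 1, \ldots, 4$, $S(X) = \sum_{i=1}^{4} X_i$, find
\[ \theta = \int_0^\infty \cdots \int_0^\infty h(x)\,\frac{e^{-\sum_{i=1}^{4}\frac{x_i}{i+2}}}{3\cdot 4\cdot 5\cdot 6}\,dx, \]
with $h(x) = S(x)I\{S(x) > 62\}$.
Raw simulation uses $X_{ij} \sim Exp(\frac{1}{i+2})$, to estimate
\[ \theta \approx \frac{1}{N}\sum_{j=1}^{N} h(X_j). \]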
Matlab
N = 100000; U = rand(4,N);
X = -diag([3:6])*log(1-U);
S = sum(X); h = S.*( S > 62 );
disp( [mean(h) 2*std(h)/sqrt(N)])
    0.066974    0.013647
Note: to find E[S|S > 62], divide by E[I(S > 62)]
E = h/mean((S>62));
disp([mean(E) 2*std(E)/sqrt(N)])
    66.948    15.237
For tilted density, use common tilt parameter $t$,
so that $X_i \sim Exp(1/(i+2) - t)$,
\[ \theta = \int_{[0,\infty)^4} h(x)\,e^{-tS(x)}\,\frac{\prod_{i=1}^{4}\frac{i+2}{1-(i+2)t}}{\prod_{i=1}^{4}(i+2)}\;\frac{e^{-\sum_{i=1}^{4} x_i(\frac{1}{i+2} - t)}}{\prod_{i=1}^{4}\frac{i+2}{1-(i+2)t}}\,dx \approx \frac{C}{N}\sum_{j=1}^{N} h(X_j)\,e^{-tS(X_j)}, \quad \text{with } C = \prod_{i=1}^{4}\frac{1}{1-(i+2)t}. \]
Text estimates “good” $t = .14$, by approximately solving
\[ \sum_{i=1}^{4} E_t[X_i] = \frac{3}{1-3t} + \frac{4}{1-4t} + \frac{5}{1-5t} + \frac{6}{1-6t} = 62. \]
But “guess and check” with Matlab finds “better” $t \approx .136$.
Matlab tests:
t = .14; Cd = 1./(1-[3:6]*t); C = prod(Cd);
St = -([3:6].*Cd)*log(1-U);
ht = C*St.*( St > 62 ).*exp(-t*St);
disp([mean(ht) 2*std(ht)/sqrt(N)])
    0.063281    0.0010059
t = .136; Cd = 1./(1-[3:6]*t); C = prod(Cd);
St = -([3:6].*Cd)*log(1-U);
ht = C*St.*( St > 62 ).*exp(-t*St);
disp([mean(ht) 2*std(ht)/sqrt(N)])
    0.06201    0.00099896
E = ht/mean(C*(St > 62).*exp(-t*St));
disp([mean(E) 2*std(E)/sqrt(N)])% Expected Value
    68.215    1.0931
Note smaller standard errors compared to raw sampling.
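The "guess and check" step for $t$ can also be done with a root finder. A minimal sketch, assuming `fzero` with a bracket that stays below $1/6$ so that every tilted rate remains positive; the names `m` and `EtS` and the bracket `[0 0.16]` are illustrative.

```matlab
m = 3:6;                                  % untilted means of X_1..X_4 (i + 2)
a = 62;
EtS = @(t) sum(m./(1 - m*t));             % sum of tilted means, needs t < 1/6
t = fzero(@(t) EtS(t) - a, [0 0.16]);     % bracket below 1/6 (assumption)
disp([t EtS(t)])
```

The root can be compared with the hand-tuned value $t \approx .136$ used in the tests above.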
• Tilting for Normal Densities: if $f(x) = \frac{1}{\sqrt{2\pi}}e^{-(x-\mu)^2/2}$,
tilted density $f_t(x) = \frac{f(x)e^{xt}}{M(t)}$ is a shifted normal.
Example: $\theta = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-z^2/2}\,z^3 e^z\,dz$. Pick $t$ to match point
(mode) where integrand $a(z) = Ke^{-(z-1)^2/2}z^3$ is max.
[Figure: exp(x − x^2/2) x^3/sqrt(2π) plotted for x from −4 to 4.]
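Or, pick $t$ to match mean at $t = 2.5$.
For either case:
\[ \theta = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(z-t)^2/2}\,z^3 e^{z - zt + t^2/2}\,dz = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-w^2/2}\,(w+t)^3 e^{(w+t)(1-t) + t^2/2}\,dw, \]
using $z = w + t$.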
Matlab tests:
N = 100000; W = randn(1,N); t = (1+sqrt(13))/2;
Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp( [mean(Y) 2*std(Y)/sqrt(N)]) % mode tilt
    6.6408    0.035346
t = 2.5; Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp( [mean(Y) 2*std(Y)/sqrt(N)]) % mean tilt
    6.6571    0.037741
Compare:
t = 1; Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp( [mean(Y) 2*std(Y)/sqrt(N)]) % tilt = 1
    6.5775    0.079949
t=0; Y = (W+t).^3.*exp((W+t)*(1-t)+t^2/2);
disp( [mean(Y) 2*std(Y)/sqrt(N) ]) % t=0, raw MC
    6.3272    0.29958
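The two tilt values used in the tests above can be derived by hand. This worked step uses the integrand $a(z) \propto z^3 e^{-(z-1)^2/2}$ from the example (for $z > 0$) and the standard normal moments $E[W^2] = 1$, $E[W^4] = 3$:

```latex
\[
\frac{d}{dz}\ln a(z) = \frac{3}{z} - (z - 1) = 0
\;\Longrightarrow\; z^2 - z - 3 = 0
\;\Longrightarrow\; t_{\text{mode}} = \frac{1 + \sqrt{13}}{2} \approx 2.3028,
\]
\[
t_{\text{mean}} = \frac{E[Z \cdot Z^3 e^{Z}]}{E[Z^3 e^{Z}]}
 = \frac{e^{1/2}\,E[(W+1)^4]}{e^{1/2}\,E[(W+1)^3]}
 = \frac{3 + 6 + 1}{3 + 1} = 2.5 .
\]
```

Here $Z$ and $W$ are both Normal(0, 1), and the substitution $z = w + 1$ is the same one used to obtain $\theta = 4e^{1/2}$.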
• Tilting for Multidimensional Normal Density Problems:
use vector $t$. Choice of $t$? Try to make $Var\left(\frac{h(x)f(x)}{f(x-t)}\right)$ small:
a) choose point $t$ where $h(x)f(x)$ is maximum (mode), or
b) choose $t = E[xh(x)]/E[h(x)]$ (mean).
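Asian Option example: this has $S_m = S_{m-1}\,e^{(r - \frac{\sigma^2}{2})\delta + \sigma\sqrt{\delta}Z_m}$,
with $\delta = T/M$, $Z_m \sim Normal(0, 1)$ and expected profit
\[ \begin{aligned}
\theta &= E\Big[e^{-rT}\max\Big(\frac{1}{M}\sum_{i=1}^{M} S_i(Z) - K,\, 0\Big)\Big] \\
 &= e^{-rT}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} \max\Big(\frac{1}{M}\sum_{i=1}^{M} S_i(z) - K,\, 0\Big)\,\frac{e^{-\sum_{i=1}^{M} z_i^2/2}}{(\sqrt{2\pi})^{M}}\,dz \\
 &= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(z)\,\frac{e^{-\sum_{i=1}^{M} z_i^2/2}}{(\sqrt{2\pi})^{M}}\,dz,
\end{aligned} \]
with $h(z) = e^{-rT}\max\big(\frac{1}{M}\sum_{i=1}^{M} S_i(z) - K,\, 0\big)$.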
For method a), find $t$ to maximize $h(z)\,e^{-\sum_{i=1}^{M} z_i^2/2}$.
For method b), $t$ can be estimated from data.
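Given $t$, use
\[ \begin{aligned}
\theta &= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(z)\,\frac{e^{-\sum_{i=1}^{M} z_i^2/2}}{e^{-\sum_{i=1}^{M}(z_i - t_i)^2/2}}\;\frac{e^{-\sum_{i=1}^{M}(z_i - t_i)^2/2}}{(\sqrt{2\pi})^{M}}\,dz \\
 &= \int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} h(y + t)\,\frac{e^{-\sum_{i=1}^{M}(y_i + t_i)^2/2}}{e^{-\sum_{i=1}^{M} y_i^2/2}}\;\frac{e^{-\sum_{i=1}^{M} y_i^2/2}}{(\sqrt{2\pi})^{M}}\,dy.
\end{aligned} \]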
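The density ratio in the last integral simplifies, which is worth writing out because it is the weight applied to each sample. This is a worked step under the same definitions, with $y = z - t$:

```latex
\[
\frac{e^{-\sum_{i=1}^{M}(y_i + t_i)^2/2}}{e^{-\sum_{i=1}^{M} y_i^2/2}}
 = \exp\Big(-\sum_{i=1}^{M} t_i y_i - \frac{1}{2}\sum_{i=1}^{M} t_i^2\Big),
\qquad\text{so}\qquad
\theta = E\Big[h(Y + t)\,e^{-t^{T}Y - t^{T}t/2}\Big],\quad Y \sim Normal(0, I_M).
\]
```

Computing the weight as $\exp\big(\sum_i (y_i^2 - z_i^2)/2\big)$ with $z = y + t$ gives the same value.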
Example with M = 16,
S0 = K = 50, T = 1, r = .05, σ = .1.
Matlab test using method b):
M = 16; S0 = 50; K = 50; T = 1; dlt = T/M;
r = 0.05; s = 0.1; rd = ( r - s^2/2 )*dlt;
N = 10000; z = randn(M,N); % Simple MC
S = S0*exp(cumsum(rd + s*sqrt(dlt)*z));
h = exp(-r*T)*max( mean(S)-K, 0 );
disp([mean(h) var(h) 2*std(h)/sqrt(N)])
    1.9465    4.825    0.043932
t = z*h'/sum(h); % Approx. Mean Tilt Vector
y = z; z = y + t*ones(1,N);
S = S0*exp(cumsum(rd + s*sqrt(dlt)*z));
h = exp(-r*T)*max( mean(S)-K, 0 );
ht = h.*exp(sum(y.*y-z.*z)/2); % Importance
disp([mean(ht) var(ht) 2*std(ht)/sqrt(N)])
    1.9136    0.66366    0.016293
Notice variance reduction from tilted sampling.
Using method a) with Matlab “fminsearch” to find t:
Sf = @(z)S0*exp(cumsum(rd+s*sqrt(dlt)*z));
hf = @(z)exp(-z'*z/2)*max(mean(Sf(z))-K,0);
t = fminsearch(@(z)-hf(z), ones(M,1) ); % Tilt t
z = y + t*ones(1,N); % y still holds the unshifted Normal(0,1) samples
S = S0*exp(cumsum(rd + s*sqrt(dlt)*z));
h = exp(-r*T)*max( mean(S)-K, 0 );
ht = h.*exp(sum(y.*y-z.*z)/2); % Importance
disp([mean(ht) var(ht) 2*std(ht)/sqrt(N)])
    1.8868    0.35124    0.011853