Lecture Notes -- Weeks 7-8

MTH739U/P: Topics in Scientific Computing
Week 7, Autumn 2016

5 Monte Carlo methods
Monte Carlo sampling is one of the most important and useful methods in scientific computing. It is used as a tool in many disciplines, from biology through financial mathematics to physics. The term Monte Carlo was coined at Los Alamos National Laboratory, famous for its work on the atomic bomb. In its essence, it is a method for estimating an expected value using the sample mean of a function of simulated random variables. In other words, it is a computational way of estimating the integral of a function using random variables, even when the integral cannot be obtained analytically. This is extremely useful, since many quantities of interest can be cast in the form of expected values.
5.1 Computing the area of a pond (Hit-Miss method)
Imagine that you want to buy a little plot of land whose map looks like the one shown
in Fig. 1. The lot is more or less a 100ft by 100ft square in shape, large enough to build
a family home with a marvellous garden, and has been put on the market at a very
competitive price. The only inconvenience is that, as clearly stated in the advert, there
is a “tiny” pond in the middle of the plot. You are convinced that the presence of a small
pond would not be a massive problem in itself – indeed, it could be seen as a small plus – but the seller has not declared the actual extent of that “tiny pond”. And every time you ask the estate agent, they play the matter down, restating that it is just “tiny”, that 10,000 square feet is a large plot, that the neighbourhood is fantastic, that house prices are still rising in this area, and that the new Crossrail station will be no farther than 300 yards away. In other words, you have the impression that nobody wants to tell you how “tiny” that “tiny pond” in the middle of the lot really is. However, you don’t want to pay for more land than you will actually be able to use, so you definitely would like to have a precise estimate of how large that pond is before putting in an offer.
If you can walk all around the lot (let us assume that its total area is A), it is relatively straightforward to obtain a rough estimate of the area I of the pond. The
method works as follows. You can throw a certain number N of little pebbles inside the
lot, and count the number K of them which end up in the pond. You can tell that a
pebble has landed on water by the distinctive “plonk” sound. Now, you can approximate
the probability of one of the pebbles falling in the pond as
\tilde{p} = \frac{K}{N}
i.e., as the fraction of the N thrown pebbles for which you actually heard a “plonk”.
However, the real probability of a randomly-thrown pebble falling on water is equal to

p = \frac{I}{A}
Figure 1: The hit-miss Monte Carlo method can be used to estimate the unknown area
I of a “tiny” pond inside a square plot of land of area A. We throw a certain number N
of pebbles at random into the lot of land, and count how many of them “plonk” into the
pond.
that is, to the ratio between the actual area of the pond and the total area of the plot.
If we equate the two probabilities we get:
\tilde{p} = \frac{K}{N} \simeq p = \frac{I}{A} \quad \Longrightarrow \quad I \simeq A \, \frac{K}{N} \qquad (1)
which states that the actual area I of the pond can be approximated as the area of
the plot A multiplied by the fraction of the N pebbles which landed on water. This is
probably the simplest Monte Carlo algorithm you can devise, and goes under the name
of Hit-Miss method.
The idea behind the Hit-Miss method is very simple: you try to approximate the probability of an event by using a relatively small number N of random experiments (the throw of a pebble in our example). Each experiment results either in a “hit” (the pebble falls on water and produces a distinctive “plonk”) or a “miss” (the pebble lands on solid ground), so that the fraction of hits is a plausible estimate of the hitting probability. Obviously, the estimate becomes exact in the limit N → ∞, but in practice a relatively small number of experiments will provide a reasonably good first approximation in most cases.
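As an illustration, the following sketch simulates the pebble experiment for a hypothetical circular pond of radius 30 ft centred in a 100 ft by 100 ft plot (so the true area is π · 30² ≈ 2827 square feet); the pond shape, radius and plot size are assumptions made purely for the example.

%% Sketch of the pebble experiment for a hypothetical circular pond
%% of radius 30 ft centred in a 100 ft x 100 ft plot.
N = 10000;                                   % number of pebbles
x = rand(N, 1) * 100;                        % pebble positions, uniform in the plot
y = rand(N, 1) * 100;
plonk = (x - 50).^2 + (y - 50).^2 < 30^2;    % did we hear a "plonk"?
K = sum(plonk);                              % number of hits
I = 100 * 100 * K / N                        % estimated pond area, as in Eq. (1)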
The Hit-Miss method can be used to efficiently compute approximations of integrals
of functions which are otherwise tricky to estimate, as we show in the following section.
5.1.1 Implementation of the Hit-Miss Monte Carlo method
Let us consider the function:
f(x) = \sin^2\left(\frac{1}{x(3 - x)}\right), \qquad x \in [0, 3] \qquad (2)
whose plot is reported in Fig. 2. It is evident from the figure that f(x) is pretty smooth in the middle but oscillates wildly towards the edges of the interval [0, 3]. The reason is that the argument of the sine diverges when x → 0 and when x → 3, which results in increasingly rapid oscillations.

Figure 2: A plot of the function f(x) = \sin^2\left(\frac{1}{x(3 - x)}\right) for x \in [0, 3]. The integral I = \int_0^3 f(x)\,dx is, by definition, the area between the x-axis and f(x), and is indicated in grey. We can use the Hit-Miss Monte Carlo method to compute an estimate of I by sampling N points at random in the rectangle [0, 3] × [0, 1] and counting how many of those points fall in the grey area.

However, we are assured that the integral of the function:
I = \int_0^3 \sin^2\left(\frac{1}{x(3 - x)}\right) dx \qquad (3)
that is, the area of the plane region between the x-axis and the curve f(x) (which is indicated in grey in Fig. 2), must be finite, since it is completely contained in a rectangle of area A = 3. And in fact, the value of that integral is ≈ 1.19776514 (exact to the 8th decimal digit).
This situation is entirely analogous to the case of the little pond we considered before. In
this case, we can obtain an estimate of the integral I by sampling N points uniformly
at random in the rectangle [0, 3] × [0, 1], and counting how many of them fall inside the
grey area. Notice that a point (x, y) is inside the grey area if y < f (x). A possible
Octave/Matlab implementation of the hit-miss method to compute the integral I reads:
function I = weird_hitmiss(N)
    %% The area of the sample space is equal to 3
    A = 3;
    %% This is the function of which we want to compute the integral
    f = @(t) (sin(1.0 ./ (t .* (3 - t)))).^2;
    %% Sample N abscissas uniformly in [0,3]
    x = rand(N, 1) * 3;
    %% Sample N ordinates uniformly in [0,1]
    y = rand(N, 1);
    %% Count only those points whose y is < f(x)
    k = 0;
    for i = 1:N
        if y(i) < f(x(i))
            k = k + 1;
        end
    end
    %% Get the "Hit-Miss" Monte Carlo estimate
    I = A * k / N;
end
The Hit-Miss method (as any Monte Carlo method, indeed) relies on random sampling, so each time we run the function weird_hitmiss() we might get a different result. In fact, if we run the function weird_hitmiss(10000) we obtain an estimate between 1.19 and 1.20, which is a quite rough approximation of the correct value. If we increase the number of points, e.g. by considering N = 10^5, we usually obtain a value between 1.193 and 1.199, which is already exact to the second decimal digit. But what can we say about the convergence rate of those approximations? How many samples do we need in order to get a given absolute error?
5.1.2 Hit-Miss method: mean and variance
The result of the Hit-Miss method is a random variable, for which we can calculate the expected value and the variance. If we draw N points uniformly at random from a certain event space Ω, whose volume is equal to A, and for which the probability of a success or “hit” is p = I/A (where I is, for instance, the area under the curve we are interested in), then the probability of having exactly K hits is given by the Binomial distribution:

P(K) = \binom{N}{K} p^K (1 - p)^{N - K} \qquad (4)

Notice that the expected value of the number of hits is:

E[K] = \sum_{j=0}^{N} j \binom{N}{j} p^j (1 - p)^{N - j} = N p

and the corresponding variance is:

Var[K] = E[K^2] - (E[K])^2 = N p (1 - p)
If we consider the random variable I_HM, that is, the Hit-Miss Monte Carlo estimate of the integral

I = \int_\Omega f(x)\,dx \simeq I_{HM} = A \, \frac{K}{N}

where N is the number of samples and K is the number of hits, we can write:

E[I_{HM}] = E\left[A \frac{K}{N}\right] = \frac{A}{N} E[K] = \frac{A}{N} N p = A p = A \frac{I}{A} = I,
which means that the expected value of the random variable I_HM is actually the value of the integral I we want to approximate. This guarantees that our estimate I_HM will fluctuate around I. We can also calculate the variance (or mean squared error, MSE) of I_HM:

Var[I_{HM}] = Var\left[A \frac{K}{N}\right] = \frac{A^2}{N^2} Var[K] = \frac{A^2}{N^2} N p (1 - p) = \frac{A^2}{N} \frac{I}{A}\left(1 - \frac{I}{A}\right) = \frac{I(A - I)}{N}

Notice that the variance of I_HM decreases with N, meaning that, if we use a larger number of points to compute our Hit-Miss Monte Carlo estimate, the random variable I_HM that we obtain will be more tightly distributed around the expected value I. In other words, as N increases the estimate becomes more and more accurate.
A more interesting quantity is the so-called standard deviation σ_HM of the Monte Carlo estimate, which is just the square root of the variance:

\sigma_{HM} = \sqrt{Var[I_{HM}]}

and is proportional to the actual absolute error of the estimate. For the Hit-Miss method we have:

\sigma_{HM} = \sqrt{Var[I_{HM}]} = \sqrt{\frac{I(A - I)}{N}} \qquad (5)
which indicates that the absolute error of the estimate decreases as the inverse square root of the number of samples used to compute it, i.e. as 1/√N. This is a quite poor rate of convergence: in fact, if we want to increase the accuracy of our estimate by a factor of 10 (i.e., if we want the estimate to be accurate to one additional decimal digit), we need to use 100 times more samples. As we will see in the following, this 1/√N decrease of the error is not specific to the Hit-Miss method, but is instead a common feature of all Monte Carlo methods.
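As a rough worked example, Eq. (5) can be inverted to estimate how many samples are needed for a given target error. For the integral in Eq. (3) we have I ≈ 1.198 and A = 3, so if we want a standard error of σ_HM = 10^{-3} we need

N = \frac{I(A - I)}{\sigma_{HM}^2} \approx \frac{1.198 \times 1.802}{10^{-6}} \approx 2.2 \times 10^6

samples, i.e. roughly two million “pebbles” just to bring the error down to the order of the third decimal digit.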
In Fig. 3 we report the value of σ_HM for the estimate of the integral in Eq. (3) as a function of N. Notice that σ_HM actually decreases as 1/√N when N increases.
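A curve like the one in Fig. 3 can be produced empirically with a minimal sketch like the one below (assuming the function weird_hitmiss() defined above is available): we simply repeat the estimate M times for a given N and take the standard deviation of the results. The function name hitmiss_error() and the choice of M are ours, for illustration only.

function sigma = hitmiss_error(N, M)
    %% Empirical standard deviation of the Hit-Miss estimate:
    %% repeat the estimate M times with N samples each and
    %% measure the spread of the results.
    est = zeros(M, 1);
    for j = 1:M
        est(j) = weird_hitmiss(N);
    end
    sigma = std(est);
end

For instance, hitmiss_error(10000, 100) should return a value close to the theoretical prediction of Eq. (5), √(I(A − I)/N) ≈ 0.015 for N = 10000.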
It is easy to realise that the performance of the Hit-Miss Monte Carlo method depends a lot on the size of the integral we want to compute relative to the area A of the rectangle within which we sample the N random points. The diligent reader will have noticed that the Hit-Miss method is just a rearrangement of Acceptance-Rejection sampling, for which the rejection ratio is proportional to the difference between the area A under the candidate function g(x) (which in this case is just a uniform distribution, multiplied by an appropriate constant) and the area I under the function f(x) from which we want to draw random samples. And in fact this is evident from the expression of the standard error σ_HM in Eq. (5), which indeed grows with (A − I).
5.2 Mean-Value method
The Hit-Miss method is relatively simple to implement if we know the functional form of f(x), but it normally provides pretty poor estimates, unless we use a very large number of points. Let us consider now a function f : R → R over the interval [a, b]. By definition, the average (expected value) of the function f(x) in [a, b] is given by:

E[f(x)]_{[a,b]} = \langle f(x) \rangle_{[a,b]} = \frac{1}{b - a} \int_a^b f(x)\,dx \qquad (6)
Figure 3: The standard deviation of the Hit-Miss Monte Carlo estimate of the integral in Eq. (3) decreases as the inverse square root of the number of points used to construct the estimate. The blue dots indicate the standard deviation of the error computed on a single estimate for each value of N, while the red line is the theoretical value of the standard error as per Eq. (5).
If we want to compute the integral:
I = \int_a^b f(x)\,dx
it is worth noticing that, by using Eq. (6), we can write:
I = (b - a) \langle f(x) \rangle_{[a,b]} \qquad (7)
This means that the integral I is equal to the product of the average of f(x) in [a, b] and the size of the interval itself. The only problem is that we don't know the value of \langle f(x) \rangle_{[a,b]}. However, by the very definition of mean value, we can sample N points {x_1, x_2, ..., x_N} uniformly at random in [a, b], compute f(x_i) for all i = 1, 2, ..., N, and approximate the quantity \langle f(x) \rangle_{[a,b]} as:
\langle f(x) \rangle_{[a,b]} \simeq \frac{1}{N} \sum_{i=1}^{N} f(x_i) \qquad (8)
If we insert the approximation of Eq. (8) into Eq. (7), we obtain:

I \simeq \frac{(b - a)}{N} \sum_{i=1}^{N} f(x_i) \qquad (9)
where {x1 , x2 , . . . , xN } are N uniformly sampled points in the interval [a, b]. This means
that, in practice, we can obtain an approximation of the integral of a function f (x) in
the interval [a, b] by computing f(x) at a finite number of (randomly sampled) points in [a, b]. This is the Mean-Value Monte Carlo method, which can be summarised as follows.
If we want to compute the integral

I = \int_a^b f(x)\,dx
1. We sample N points {x1 , x2 , . . . , xN } uniformly at random in [a, b]
2. We compute an estimate of the average value of f(x) in the interval [a, b]:

   \langle \tilde{f}(x) \rangle = \frac{1}{N} \sum_{i=1}^{N} f(x_i)
3. The Mean-Value Monte Carlo estimate of the integral I is obtained as:

   I \simeq I_{MV} = (b - a) \langle \tilde{f}(x) \rangle \qquad (10)

5.2.1 Implementation of the Mean-Value Monte Carlo method
Let us consider again the integral in Eq. (3):
I = \int_0^3 \sin^2\left(\frac{1}{x(3 - x)}\right) dx
and assume that we want to compute an estimate I_MV using the Mean-Value Monte Carlo
method. A possible Octave/Matlab implementation would look as follows:
function I = weird_meanvalue(N)
    a = 0;
    b = 3;
    %% We sample N points uniformly distributed in [a,b]
    x = rand(1, N) * (b - a) + a;
    %% This is the function of which we want to compute the integral
    f = @(t) (sin(1.0 ./ (t .* (3 - t)))).^2;
    %% And this is the Mean-Value Monte Carlo estimate
    I = (b - a) / N * sum(f(x));
end
Notice that the function weird_meanvalue() is conceptually very simple, since we just need to sample N points uniformly at random in the integration interval, compute f(x) at each of those points, and compute the average of the result. In this case, if we use N = 10000 points, we get an estimate of the integral between 1.185 and 1.210, but if we go to N = 10^5 we get a value of I_MV between 1.194 and 1.199, which is a far better approximation of the real value.
5.2.2 Mean-Value Monte Carlo method: mean and variance
As in the case of the Hit-Miss method, the result of the Mean-Value Monte Carlo method
is a random variable, for which we can compute the expected value and the variance. In
particular, for the expected value we have:
E[I_{MV}] = E\left[\frac{(b - a)}{N} \sum_{i=1}^{N} f(x_i)\right] = \frac{(b - a)}{N} \sum_{i=1}^{N} E[f(x_i)] = \frac{(b - a)}{N} N \langle f(x) \rangle_{[a,b]} = (b - a) \langle f(x) \rangle_{[a,b]} \equiv I \qquad (11)
which means that the expected value of the Mean-Value Monte Carlo estimate of the
integral I is equal to I. In other words, the random variable I_MV will fluctuate around
I. For the variance we have:
"
#
" N
#
N
X
(b − a) X
(b − a)2
V ar[IM V ] = V ar
f (xi ) =
V ar
f (xi ) =
N
N2
i=i
i=1
=
N
(b − a)2
(b − a)2 X
V
ar[f
(x
)]
=
N · V ar[f (x)] =
=
i
N 2 i=1
N2
(b − a)2
V ar[f (x)]
N
However, in general we don't know the value of Var[f(x)] (pretty much like we don't know the value of \langle f(x) \rangle, otherwise we would not have needed a Monte Carlo estimate of the integral I in the first place...). Hence, we can only use an estimate of the actual value of Var[f(x)], namely the quantity called the sample variance, which is defined as:
\widetilde{Var}[f(x)] = \frac{1}{N - 1} \sum_{i=1}^{N} \left( f(x_i) - \langle f \rangle \right)^2 \simeq \frac{1}{N - 1} \sum_{i=1}^{N} \left( f(x_i) - \langle \tilde{f}(x) \rangle \right)^2
where \langle \tilde{f}(x) \rangle is the sample average we used in Eq. (10) to compute the Mean-Value Monte Carlo estimate. Finally we get:
Var[I_{MV}] = \frac{(b - a)^2}{N} Var[f(x)] \simeq \frac{(b - a)^2}{N} \widetilde{Var}[f(x)] = \frac{(b - a)^2}{N(N - 1)} \sum_{i=1}^{N} \left( f(x_i) - \langle \tilde{f}(x) \rangle \right)^2 \qquad (12)
Consequently, the standard error of the Mean-Value Monte Carlo estimate I_MV reads:

\sigma_{MV} = \sqrt{Var[I_{MV}]} = (b - a) \sqrt{\frac{Var[f(x)]}{N}} \simeq (b - a) \sqrt{\frac{\widetilde{Var}[f(x)]}{N}} = \frac{(b - a)}{\sqrt{N}} \sqrt{\frac{1}{N - 1} \sum_{i=1}^{N} \left( f(x_i) - \langle \tilde{f}(x) \rangle \right)^2} \qquad (13)
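As an illustration of Eq. (13), the sketch below (an assumed variant of the weird_meanvalue() function above, here called weird_meanvalue_err() purely for the example) returns both the Mean-Value estimate and its estimated standard error, computed from the sample variance of the evaluated points.

function [I, sigma] = weird_meanvalue_err(N)
    a = 0;
    b = 3;
    %% The integrand of Eq. (3)
    f = @(t) (sin(1.0 ./ (t .* (3 - t)))).^2;
    %% Sample N points uniformly in [a,b] and evaluate f there
    x = rand(1, N) * (b - a) + a;
    fx = f(x);
    %% Mean-Value estimate, Eq. (10)
    I = (b - a) * mean(fx);
    %% Estimated standard error, Eq. (13); var() uses the 1/(N-1) normalisation
    sigma = (b - a) * sqrt(var(fx) / N);
end

The returned sigma can be compared directly with the empirical spread of repeated runs, as in Fig. 4.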
Figure 4: Comparison of the standard error of the Hit-Miss and Mean-Value Monte Carlo
methods. The theoretical value of the standard deviation of the Mean-Value Monte Carlo
method is usually smaller than that of the Hit-Miss method.
Again, the standard error scales as 1/√N, meaning that in order to obtain an approximation with an extra decimal digit we need to use a number of points 100 times larger. However, the Mean-Value method works much better than the Hit-Miss method in practice, the main reason being that the corresponding variance is normally much smaller.
As an example, we report in Fig. 4 the standard error of the Hit-Miss method and of
the Mean-Value method as a function of N when we use them to compute (again) the
integral:
I = \int_0^3 \sin^2\left(\frac{1}{x(3 - x)}\right) dx
Notice that the standard error obtained with the Mean-Value method is normally smaller
than that of the Hit-Miss method.
5.3 Monte Carlo integrals in many dimensions
The usefulness of Monte Carlo methods is not really evident in the case of integrals of functions of one variable. Indeed, there exist many integration schemes which allow us to compute integrals of functions of one variable with very high accuracy, and those algorithms normally have a pretty fast convergence rate as a function of the number of required function evaluations N. For instance, Simpson's integration rule guarantees an error of order N^{-4}, but in general it works well only if the integrand does not vary wildly in the integration interval. As a result, one would normally prefer to use a traditional integration scheme (like Simpson's rule or the Gaussian Quadrature method) whenever possible, and to resort to Monte Carlo estimates in all those cases where those traditional methods would fail, as in the case of Eq. (3).

However, the real power of Monte Carlo methods becomes evident when one has to compute integrals of functions of many variables. In those cases, all the standard integration techniques, which are based on dividing the integration domain into a suitable grid of points, would require an impractically large number of computations. In those cases, the only way to obtain a (normally good) approximation is by using a Monte Carlo integration method.
The usual choice is the generalisation of the Mean-Value method. Let us assume that
we want to compute the integral:
I = \int_\Omega f(x)\,dx, \qquad x \in \Omega \subset \mathbb{R}^n \qquad (14)
Now, following a reasoning entirely analogous to the one which led us to the Mean-Value method, it is possible to show that the integral in Eq. (14) can be approximated as:

I = V \langle f(x) \rangle_\Omega \simeq \frac{V}{N} \sum_{i=1}^{N} f(x_i) \qquad (15)
where {x1 , x2 , . . . , xN } are N points uniformly sampled in the domain Ω, and V is the
volume of Ω. Notice that, in the particular case in which the domain Ω is an interval, we
re-obtain Eq. (10), since the volume of an interval is equal to its length. If the domain
Ω is 2-dimensional, then its volume will be an area. If Ω is a 3-dimensional domain, its
volume will be exactly what we normally mean by the word “volume”, and so on.
Equation (15) is very useful (and widely used) in practice to compute estimates of integrals in any dimension. The good news is that, as for the one-dimensional Mean-Value Monte Carlo method, the error of the estimate will decrease as 1/√N, meaning that the accuracy does not depend dramatically on the dimensionality of the problem.
5.3.1 Example of Monte Carlo integrals in 2 dimensions
The implementation of the Mean-Value Monte Carlo method in many dimensions is pretty
straightforward. Let us assume that we want to compute the integral:
I = \int_{-2}^{1} dx \int_{-1}^{3} dy \, \frac{x^2 - 2y^2}{x^2 + 3y^2 + 2}
A possible Octave/Matlab function to do that is:
function I = double_integral(N)
    %% This is the function we want to integrate
    f = @(x,y) (x.^2 - 2 * y.^2) ./ (x.^2 + 3 * y.^2 + 2);
    %% These are the integration intervals
    xl = -2;
    xu = 1;
    yl = -1;
    yu = 3;
    %% We sample N points uniformly at random in [-2,1]x[-1,3]
    x = rand(1, N) * (xu - xl) + xl;
    y = rand(1, N) * (yu - yl) + yl;
    %% We compute the volume of the integration domain...
    V = (xu - xl) * (yu - yl);
    %% ...and the Mean-Value Monte Carlo estimate
    I = V / N * sum(f(x, y));
    %% The real value of this integral is -2.312781571802398
end

Figure 5: The Monte Carlo method can be used to estimate the area of the unit circle (in grey) by sampling points uniformly at random in the square [−1, 1] × [−1, 1]. Incidentally, this provides a simple (but quite slow) algorithm to approximate π.
Notice that the code of the function double_integral() is pretty straightforward.
We start by sampling N points uniformly at random in the rectangle [−2, 1] × [−1, 3],
which is done by sampling N values of the x coordinate uniformly in [−2, 1] and N
(independent) values of the y coordinate uniformly in [−1, 3]. Then we just compute the
integrand at those points, divide by N , and multiply by the volume of the integration
domain, which in this case is just the area of the rectangle [−2, 1] × [−1, 3] and is equal to
3 × 4 = 12. By running the function double_integral(10000) one obtains estimates distributed between −2.25 and −2.40. However, by using N = 10^6 points (which requires only a fraction of a second), we already get a result between −2.308 and −2.315. If we use N = 10^8 points (which requires just a few seconds on a modern computer), we obtain a result between −2.3123 and −2.3129, which is exact up to the third decimal digit. Many
of the standard deterministic integration schemes would require a much larger number of
computations to provide the same level of accuracy.
5.3.2 Example: use Monte Carlo integration to approximate π
Here we show how a multi-dimensional Mean-Value Monte Carlo method can be used to
approximate the value of π = 3.14159265 . . ..
Let us consider the unit circle shown in Fig. 5. The area of the circle is represented
by the integral:

I = \int_{-1}^{1} dx \int_{-1}^{1} f(x, y)\,dy \qquad (16)

where

f(x, y) = \begin{cases} 1 & \text{if } x^2 + y^2 < 1 \\ 0 & \text{otherwise} \end{cases}
Now, if we sample N points uniformly in the square [−1, 1] × [−1, 1], the integral in
Eq. (16) can be approximated using the Mean-Value Monte Carlo method as:
I \simeq I_{MV} = \frac{V}{N} \sum_{i=1}^{N} f(x_i, y_i) = \frac{4}{N} \sum_{i=1}^{N} f(x_i, y_i)

since the volume of the sampling space is equal to 4. However, the circle has radius equal to 1, so that its area is equal to 1^2 \cdot \pi = \pi. Hence we have:

I = \pi \simeq \frac{4}{N} \sum_{i=1}^{N} f(x_i, y_i)
This means that a (rough) approximation of π can be obtained by using the Mean-Value
Monte Carlo method to approximate the integral in Eq. (16). A possible Octave/Matlab
implementation of a function to approximate π is thus:
function my_pi = approx_pi(N)
    %% We sample N points uniformly at random in
    %% the square [-1,1]x[-1,1]...
    x = rand(N, 1) * 2 - 1;
    y = rand(N, 1) * 2 - 1;
    %% ...and then we use the builtin function "find"
    %% to determine which of those points are inside the
    %% circle, and put their number in the Mean-Value
    %% Monte Carlo formula
    my_pi = 4.0 / N * length(find(x.^2 + y.^2 < 1));
end
Notice that in the function approx_pi() we have used a combination of the built-in
Octave/Matlab functions find() and length() to find which (and how many) points fall
inside the circle, and we have plugged the result in the Mean-Value Monte Carlo formula.
We should be honest in saying that this is by far one of the slowest existing methods to
compute the digits of π, but it is nevertheless one of the simplest to implement.
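Note that, since f(x, y) only takes the values 0 and 1, this estimator coincides with a Hit-Miss estimate with A = 4 and I = π, so Eq. (5) predicts a standard error of

\sigma = \sqrt{\frac{\pi(4 - \pi)}{N}} \approx \frac{1.64}{\sqrt{N}}

which means, for instance, that roughly N ≈ 2.7 × 10^6 samples are needed to bring the standard error down to 10^{-3}. This is why the method is so slow at producing digits of π.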
5.4 Importance Sampling
We have seen that the Mean-Value Monte Carlo method can be successfully used to compute approximations of integrals in many dimensions. However, so far we have only
Figure 6: When trying to use the standard Mean-Value method to compute an estimate for the integral of the function f(x) = \frac{x^{-1/2}}{e^x + 1}, we would get wild fluctuations, depending on how many of the uniform samples fall close to 0. Importance Sampling can solve this problem by substantially reducing the variance of the estimate.
restricted our examples to integrals of bounded functions, i.e. to functions which don’t
have divergences in the integration domain. But what happens if the integrand is not
bounded? Let us consider for instance the integral:
I = \int_0^1 \frac{x^{-1/2}}{e^x + 1}\,dx \qquad (17)
and some Octave/Matlab code to compute it using the Mean-Value Monte Carlo method:
function I = strange_integral(N)
%% This is the function we want to integrate
f = @(x) x.^(-1/2)./ (exp(x) + 1);
%% We sample N points uniformly in [0,1]
x = rand(N,1);
%% And we use the Mean-Value formula:
I = 1/N * sum(f(x));
end
If we try out this code using N = 1000 points we will notice that the result fluctuates wildly between 0.75 and 1.1. If we go up to N = 10000 we obtain values in the range [0.80, 0.90], which is better but not yet close to the real value of the integral (which is finite and equal to 0.838932960013, exact to the 12th decimal digit). The problem is that, even when we go up to N = 10^7, the value of the estimate still fluctuates between 0.837 and 0.8401, meaning that only the first decimal digit has been correctly approximated. But in the other examples we have seen so far, N = 10^7 was normally sufficient to get an estimate exact to the third or even the fourth decimal digit. What is wrong this time?
In order to understand what is happening, we have reported in Fig. 6 the plot of the function f(x) = \frac{x^{-1/2}}{e^x + 1}. Notice that the function is in general pretty smooth. However, it
is evident that f (x) has a divergence for x → 0, i.e. limx→0 f (x) = +∞. As a result, the
value of the Monte Carlo estimate obtained by using N points will depend enormously on
how many of those points are “too close” to x = 0. It might well be that, in a particular
simulation, we don’t get that many points close to zero, and the result will be a gross
underestimation of the integral. Conversely, if the number of points close to zero is large,
we will obtain a gross overestimation. If only we could remove that divergence....
In these cases, the Importance Sampling Monte Carlo method comes to the rescue.
The idea is simple. Imagine that we want to compute the usual integral:
I = \int_a^b f(x)\,dx
and assume that p(x) is a probability density function over the interval [a, b] such that
p(x) > 0 ∀x ∈ [a, b]. Then we can write:
I = \int_a^b f(x)\,dx = \int_a^b f(x) \frac{p(x)}{p(x)}\,dx = \int_a^b \frac{f(x)}{p(x)} \, p(x)\,dx = E\left[\frac{f(x)}{p(x)}\right]_{p(x),[a,b]}
This means that, since p(x) is a probability density function, the integral I is equal to the expected value of f(x)/p(x) over the interval [a, b], computed using the probability density function p(x). This result is the main idea behind the Importance Sampling Monte Carlo method, which consists in replacing the expected value E[f(x)/p(x)] with a Monte Carlo estimate based on N points drawn in [a, b] according to the probability density function p(x). The steps of the Importance Sampling method are as follows.

Imagine we want to compute the integral:
Imagine we want to compute the integral:
I = \int_a^b f(x)\,dx
1. Consider a probability density function p(x) over the interval [a, b], such that p(x) >
0 ∀x ∈ [a, b]
2. Draw N random points {x1 , x2 , . . . , xN } from the probability density function p(x)
over the interval [a, b]
3. Compute the approximation:

   I \simeq I_{IS} = \left\langle \frac{f(x)}{p(x)} \right\rangle_{p(x),[a,b]} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)} \qquad (18)
It is easy to show that the expected value of the Importance Sampling estimate I_IS is equal to the desired integral I, that the associated variance is proportional to \frac{1}{N} Var_{p(x)}\left[\frac{f(x)}{p(x)}\right], and that the standard error is:

\sigma_{IS} \propto \sqrt{\frac{Var_{p(x)}\left[\frac{f(x)}{p(x)}\right]}{N}} \qquad (19)
Let us now use Importance Sampling to get a Monte Carlo estimate of the integral:

I = \int_0^1 \frac{x^{-1/2}}{e^x + 1}\,dx

Since the numerator x^{-1/2} is responsible for the divergence of the integrand at x = 0, a good strategy is to choose p(x) = C x^{-1/2}, where C is a constant to be determined appropriately so that p(x) is a probability density function in [0, 1]. Then, we sample N points {x_1, x_2, ..., x_N} from p(x), and compute the estimate as

I \simeq \frac{1}{N} \sum_{i=1}^{N} \frac{1}{C (e^{x_i} + 1)} \qquad (20)
We start by imposing that p(x) = C x^{-1/2} is a probability density function in [0, 1]:

\int_0^1 C x^{-1/2}\,dx = 1 \quad \Longrightarrow \quad C = \frac{1}{2}
Now we need a way to sample from the probability density function p(x) = \frac{1}{2} x^{-1/2}, x \in [0, 1], for instance by using the inverse function method. We just need to compute the cumulative distribution function associated to p(x):

F(x) = \int_0^x p(t)\,dt = x^{1/2}
and invert it:
x = F^{-1}(u) = u^2 \qquad (21)
Then, we sample N points {u1 , u2 , . . . , uN } uniformly at random in [0, 1], use Eq. (21)
to obtain {x1 , x2 , . . . , xN }, which are random samples following the probability distribution function p(x), and then we compute:
I \simeq I_{IS} = \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)} = \frac{1}{N} \sum_{i=1}^{N} \frac{x_i^{-1/2}}{e^{x_i} + 1} \cdot \frac{2}{x_i^{-1/2}} = \frac{1}{N} \sum_{i=1}^{N} \frac{2}{e^{x_i} + 1} \qquad (22)
A possible implementation of an Octave/Matlab function which computes this estimate is as follows:
function I = importance_sampling(N)
    %% Sample N points uniformly in [0,1]
    u = rand(N, 1);
    %% Use the inverse function method to
    %% obtain samples from p(x) = 1/2 * x^(-1/2)
    x = u .* u;
    %% Compute the Importance Sampling estimate, Eq. (22)
    I = 1.0 / N * sum(2 ./ (exp(x) + 1));
end
The good news is that the estimate obtained by the function importance_sampling() is exact to the second decimal digit using just N = 10000 samples, and with N = 10^7 samples the accuracy improves to the fourth decimal digit. In this case, Importance Sampling has produced an enormous decrease of the error of the estimate.
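As a quick check (a minimal sketch, assuming the functions strange_integral() and importance_sampling() defined above are available), one can repeat both estimators several times with the same N and compare the spread of the results; the number of repetitions M is an arbitrary choice made for the example.

%% Compare the spread of the plain Mean-Value estimate and of the
%% Importance Sampling estimate of Eq. (17) over M independent runs.
N = 1e5;
M = 100;
I_mv = zeros(M, 1);
I_is = zeros(M, 1);
for j = 1:M
    I_mv(j) = strange_integral(N);
    I_is(j) = importance_sampling(N);
end
fprintf('Mean-Value:          %f +/- %f\n', mean(I_mv), std(I_mv));
fprintf('Importance Sampling: %f +/- %f\n', mean(I_is), std(I_is));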
Additional References
More information about Monte Carlo integration can be found in the following references:
1) M. E. J. Newman, Computational Physics (2013), Chapter 10: Random processes and Monte Carlo methods. QMUL Library Link
2) W. H. Press et al., Numerical Recipes: The Art of Scientific Computing (Cambridge University Press, 2007), Chapter 7: Random numbers. QMUL Library Link