J. Appl. Prob. 31,797-815 (1994)
Printed in Israel
© Applied Probability Trust 1994
REGENERATIVE RARE EVENTS SIMULATION
VIA LIKELIHOOD RATIOS
SØREN ASMUSSEN,* Aalborg University
REUVEN Y. RUBINSTEIN,** Technion - Israel Institute of Technology
CHIA-LI WANG,*** University of California, Berkeley
Abstract
In this paper we obtain some new theoretical and numerical results on estimation of small steady-state probabilities in regenerative queueing models by using the likelihood ratio (score function) method, which is based on a change of the probability measure. For simple GI/G/1 queues, this amounts to simulating the regenerative cycles by a suitable change of the interarrival and service time distribution, typically corresponding to a reference traffic intensity $\rho_0$ which is $< 1$ but larger than the given one $\rho$. For the M/M/1 queue, the resulting gain of efficiency is calculated explicitly and shown to be considerable. Simulation results are presented indicating that similar conclusions hold for gradient estimates and in more general queueing models like queueing networks.
QUEUEING THEORY; QUEUEING NETWORKS; REGENERATIVE QUEUEING MODELS
AMS 1991 SUBJECT CLASSIFICATION: PRIMARY 60K25
1. Introduction
Let $\{L_t\}$ be a stochastic process, like the waiting time process of a GI/G/1 queue or the sojourn time process of a queueing network, and let $\mathcal{A}$ be a rare event associated with the system, say with $\mathbb{P}(\mathcal{A})$ of the order of $10^{-4}$ or less. For example, $\mathcal{A}$ could be the event $\mathcal{A}_1 = \{\max_{t \le \tau} L_t > x\}$ that the waiting time of a customer in the busy cycle exceeds some large $x$, or the event $\mathcal{A}_2 = \{L > x\}$ that the steady-state waiting time of a customer exceeds $x$ (here $L$ is the limit in distribution of $L_t$). For such cases the crude Monte Carlo method (CMC) runs into difficulties because the relative error
$$\frac{\sqrt{\operatorname{Var} I(\mathcal{A})}}{\mathbb{P}(\mathcal{A})} \approx \frac{1}{\sqrt{\mathbb{P}(\mathcal{A})}}$$
($I(\mathcal{A})$ is the indicator of the event $\mathcal{A}$) increases as $\mathbb{P}(\mathcal{A})$ becomes small, and therefore enormous sample sizes may be required.

Received 21 July 1992; revision received 21 June 1993.
* Postal address: Institute of Electronic Systems, Aalborg University, Fr. Bajersvej 7, DK-9220 Aalborg, Denmark.
** Postal address: Faculty of Industrial Engineering and Management, Technion - Israel Institute of Technology, Haifa 32000, Israel.
Research supported by the Technion V.P.R. Fund - B.R.L. Bloomfield Industrial Management R.F.
*** Postal address: Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA 94720, USA.
Downloaded from https:/www.cambridge.org/core. IP address: 88.99.165.207, on 12 Jul 2017 at 18:18:42, subject to the Cambridge Core terms of use, available at
https:/www.cambridge.org/core/terms. https://doi.org/10.1017/S0021900200045356
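The growth of the CMC relative error can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes $n$ independent replications of the indicator $I(\mathcal{A})$ (customers within a regenerative simulation are not independent), and the function names are hypothetical, not taken from the paper.

```python
import math

def cmc_relative_error(p: float, n: int) -> float:
    """Relative error sqrt(Var I(A) / n) / P(A) of the CMC sample mean.

    Assumes n independent replications of the indicator I(A), so that
    Var I(A) = p * (1 - p) with p = P(A).
    """
    return math.sqrt(p * (1.0 - p) / n) / p

def cmc_sample_size(p: float, target: float) -> int:
    """Number of independent replications giving relative error <= target."""
    return math.ceil((1.0 - p) / (p * target ** 2))

if __name__ == "__main__":
    for p in (1e-2, 1e-4, 1e-6):
        print(p, cmc_relative_error(p, 1), cmc_sample_size(p, 0.1))
```

For a single replication the relative error is close to $1/\sqrt{\mathbb{P}(\mathcal{A})}$, and the sample size needed for a fixed relative error grows like $1/\mathbb{P}(\mathcal{A})$.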
The typical approach of rare events simulation is to use importance sampling (IS), see e.g. Glynn and Iglehart (1989). In IS, one performs the same experiment, but with a different governing probability measure. According to IS theory, the optimal change of measure is given by $\mathbb{P}(\cdot \mid \mathcal{A})$. In general, however, such a probability measure is infeasible since $\mathbb{P}(\mathcal{A})$ is unknown. For the GI/G/1 example, $\mathbb{P}(\cdot \mid \mathcal{A}_1)$ can, however, be described asymptotically (Theorem 5.1 of Asmussen (1982)) as the governing probability measure for a GI/G/1 queue with interarrival time and service time distributions derived from the given ones by an exponential family transformation (for example in M/M/1, this amounts to switching the service rate and arrival rate). This approach generalizes to Markov-modulated models (see Asmussen (1989) for an early example), but for more complex queueing networks or, say, manufacturing or reliability models, the asymptotic form of $\mathbb{P}(\cdot \mid \mathcal{A}_1)$ is usually not available; even in the rare cases where $\mathbb{P}(\cdot \mid \mathcal{A}_1)$ can be described, the structure may be much more complicated than for the underlying measure $\mathbb{P}$ (e.g. the Markov property does not necessarily carry over, see for example Goyal et al. (1992)). For some relevant references, see Bucklew (1990), Bucklew et al. (1990), Cottrell et al. (1983), Frater et al. (1989) and Parekh and Walrand (1988).
It may therefore be appealing to perform a naive change of measure by simply choosing a probability measure similar to the given one, in the sense of changing only the parameters of the measure (say the traffic intensity) in order to make the rare event more likely. The purpose of this paper is to show that such a parametric change of measure typically leads to very substantial variance reduction compared with the CMC method. More specifically, we present explicit calculations for the variance of the regenerative likelihood ratio (LR) estimators under such a parametric change of measure for the M/M/1 queue, show very substantial variance reduction and illustrate empirically that this also seems to be the case for rather broad queueing network settings.
The setting in which we work is that of rare events like $\mathcal{A}_2$ in the steady state, which has been comparatively less studied. In this case the CMC method does not apply directly, so one needs to find a representation of $\mathbb{P}(L > x)$ involving quantities which can be estimated from a finite simulation run. For the GI/G/1 queue, one such representation is $\mathbb{P}(L > x) = \mathbb{P}(w(x) < \infty)$, where $w(x)$ is the first passage time of an associated random walk to level $x$. Siegmund (1976) showed how to estimate this efficiently by the same exponential family transformation as above (the drift changes from being negative to being positive, so $w(x) < \infty$ almost surely with respect to the changed measure). This method, too, seems restricted in its generality. We shall therefore concentrate on the regenerative representation
$$\mathbb{P}(L > x) = \frac{\mathbb{E}\sum_{t=1}^{\tau} I_{\{L_t > x\}}}{\mathbb{E}\tau} \tag{1.1}$$
which holds for a large class of stochastic models. Here $\tau$ is the generic cycle (the standard choice for the GI/G/1 queue is the busy cycle), which is assumed to be aperiodic with finite mean. For convenience we only consider the discrete time case.
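As an illustration of the regenerative representation (1.1), the following minimal sketch estimates $\mathbb{P}(L > x)$ for an M/M/1 queue under the original measure. It assumes the waiting times follow the Lindley recursion and that a cycle ends when a customer finds the system empty; the function name and the parameters `beta` (arrival rate) and `delta` (service rate) are illustrative choices. It is only practical when the event is not rare.

```python
import random

def regenerative_tail_estimate(beta, delta, x, n_cycles, seed=0):
    """Ratio estimator of P(L > x) based on (1.1) for the M/M/1 queue."""
    rng = random.Random(seed)
    hits = 0       # sum over cycles of sum_{t=1}^{tau} I{L_t > x}
    customers = 0  # sum over cycles of tau
    for _ in range(n_cycles):
        wait = 0.0  # the first customer of a busy cycle does not wait
        while True:
            customers += 1
            if wait > x:
                hits += 1
            # Lindley recursion: add a service time, subtract an interarrival time.
            wait += rng.expovariate(delta) - rng.expovariate(beta)
            if wait <= 0.0:  # next customer finds the system empty: cycle ends
                break
    return hits / customers

if __name__ == "__main__":
    # For M/M/1 the exact value is rho * exp(-(delta - beta) * x), about 0.184 here.
    print(regenerative_tail_estimate(1.0, 2.0, 1.0, 50_000, seed=1))
```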
We assume that the output process $\{L_t : t > 0\}$ has the form $L_t = L_t(\boldsymbol{Y}_t)$, where $\boldsymbol{Y}_t = (Y_1, Y_2, \ldots, Y_t)$ and $Y_1, Y_2, \ldots$ is the input sequence of independent identically distributed (i.i.d.) $m$-dimensional random vectors generated from a density $f(y, v)$ depending on the $n$-dimensional parameter vector $v$. Then $L$ is well-defined, and we let $\ell(v) = \ell(v, x) = \mathbb{P}_v(L > x)$. Further, we assume absolute continuity of $f(y, v_0)$ with respect to $f(y, v)$. In this case (1.1) can be written as
$$\ell(v) = \frac{\mathbb{E}_{v_0}\sum_{t=1}^{\tau} I_t W_t}{\mathbb{E}_{v_0}\sum_{t=1}^{\tau} W_t} \tag{1.2}$$
where $I_t = I_{\{L_t > x\}}$ and the LR up to time $t$ is
$$W_t = \prod_{i=1}^{t} \frac{f(Y_i, v)}{f(Y_i, v_0)}$$
(e.g. Glynn and Iglehart (1989) and Rubinstein and Shapiro (1993)).
The idea of the LR method for estimating rare events in this setting is to generate a sample $Y_t$, $t = 1, 2, \ldots$ from $f(y, v_0)$ rather than from $f(y, v)$, simulate $N$ regenerative cycles $Y_{11}, \ldots, Y_{\tau_1 1}, \ldots, Y_{1N}, \ldots, Y_{\tau_N N}$ and use
$$\bar\ell_N(v) = \frac{\sum_{i=1}^{N}\sum_{t=1}^{\tau_i} I_{ti} W_{ti}}{\sum_{i=1}^{N}\sum_{t=1}^{\tau_i} W_{ti}} \tag{1.3}$$
as a consistent estimator of $\ell(v)$. Where it is more convenient, we use $\bar\ell_N(v, v_0, x)$ instead of $\bar\ell_N(v)$ when referring to $\ell(v) = \mathbb{E}_v I_{\{L > x\}}$, and we denote
$$\sigma^2(v, v_0, x) = \lim_{N \to \infty} N \operatorname{Var}_{v_0} \bar\ell_N(v, v_0, x).$$
We shall show that typically one can choose $v_0$ such that $\mathbb{P}_{v_0}(L > x) \gg \mathbb{P}_v(L > x)$ (that is, under $f(y, v_0)$ the event $\{L > x\}$ is not rare), and that the LR estimator $\bar\ell_N(v, v_0, x)$ so defined leads to a very substantial variance reduction. We define the efficiency of the LR estimator $\bar\ell_N(v, v_0, x)$ relative to its standard counterpart $\bar\ell_N(v, v, x)$ as
$$e(v, v_0, x) = \frac{\sigma^2(v, v_0, x)\,\mathbb{E}_{v_0}\tau}{\sigma^2(v, v, x)\,\mathbb{E}_v\tau}. \tag{1.4}$$
Note that the LR estimator is the more efficient of the two (variance reduction is obtained) if $e(v, v_0, x) < 1$, provided $v$, $v_0$ and $x$ are fixed. Note also that (1.4) is a measure of the efficiency given in terms of variance/customer rather than variance/cycle.
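A minimal sketch of the estimator (1.3) for the M/M/1 queue may make the construction concrete. It is written under stated assumptions rather than as the authors' implementation: only the service rate is changed (from `delta` to a reference value `delta0`), the arrival rate `beta` is kept, and the likelihood ratio is accumulated on the log scale. All names are illustrative.

```python
import math
import random

def lr_tail_estimate(beta, delta, delta0, x, n_cycles, seed=0):
    """Estimator (1.3) of P_delta(L > x), simulating services at rate delta0."""
    if delta0 <= beta:
        raise ValueError("reference queue must be stable: need delta0 > beta")
    rng = random.Random(seed)
    log_ratio = math.log(delta / delta0)
    num = 0.0  # sum of I_t * W_t over all simulated customers
    den = 0.0  # sum of W_t over all simulated customers
    for _ in range(n_cycles):
        wait = 0.0   # waiting time L_t of the current customer
        log_w = 0.0  # log of the likelihood ratio W_t
        while True:
            service = rng.expovariate(delta0)
            # One more factor f(U, delta) / f(U, delta0) for exponential densities.
            log_w += log_ratio - (delta - delta0) * service
            w = math.exp(log_w)
            den += w
            if wait > x:
                num += w
            wait += service - rng.expovariate(beta)
            if wait <= 0.0:  # cycle ends
                break
    return num / den

if __name__ == "__main__":
    # The exact M/M/1 value is 0.5 * exp(-3), roughly 0.0249.
    print(lr_tail_estimate(1.0, 2.0, 1.5, 3.0, 40_000, seed=1))
```

Setting `delta0 = delta` makes every weight equal to 1 and recovers the standard regenerative estimator $\bar\ell_N(v, v, x)$.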
Our main theoretical result, given in Section 2, is an explicit formula for $\sigma^2(v, v_0, x)$ in the special case of the M/M/1 queue where one may identify $v$ with the traffic intensity $\rho$. The derivation is an extension of a parallel analysis of Asmussen and Rubinstein (1993) for estimating $\mathbb{E}L$, but involves substantially more calculation. In Section 3, we use this
formula to discuss how to choose the reference traffic intensity $\rho_0$ efficiently. Based upon numerical calculations, tables and figures, our finding is that a 'good' $\rho_0$ should be moderately larger than $\rho$. In Section 4 we consider open queueing networks and show numerically that the optimal vector of traffic intensities $v_0^*$, that is, the solution of the program
$$\min_{v_0 \in V} e(v, v_0, x), \tag{1.5}$$
can be approximated rather well if we minimize the variance of the likelihood ratio estimators with respect to the parameters of the bottleneck queue alone, provided such a bottleneck queue exists.
2. The M/M/1 queue
Let $\{L_t\}$ be the waiting time sequence in the M/M/1 queue with arrival rate $\beta$ and service rate $\delta$. By an easy scaling argument, it follows that for the sake of varying the parameters of the model, we can think of $\beta$ as fixed and only vary $\delta$. Thus, we can identify $v$ with $\delta$ or, equivalently, with the traffic intensity $\rho = \beta/\delta$.
Our main result is the following.
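Theorem 2.1. Let $X = \sum_{t=1}^{\tau} W_t I_{\{L_t > x\}}$ and $Y = \sum_{t=1}^{\tau} W_t$. Then
$$\sigma^2(\rho, \rho_0, x) = \frac{\operatorname{Var}_{\delta_0} X - 2\ell \operatorname{Cov}_{\delta_0}(X, Y) + \ell^2 \operatorname{Var}_{\delta_0} Y}{(\mathbb{E}_{\delta_0} Y)^2}, \qquad \ell = \mathbb{P}_\delta(L > x), \tag{2.1}$$
where
$$\operatorname{Var}_{\delta_0} X = \frac{z p^* \rho^*}{1 - p^* \rho^*} \exp(-\delta^*(1 - p^*\rho^*)x) \left\{ 1 + 2\mathbb{E}_\delta\tau \left( 1 + \mathbb{P}_\delta(L > x) + \frac{\rho^*}{1 - p^*\rho^*} \right) - \frac{2\rho(1+\rho)}{\exp((\delta - \beta)x) - \rho^2} \right\} - (\mathbb{E}_\delta\tau)^2\, \mathbb{P}_\delta(L > x)^2,$$
$$\operatorname{Cov}_{\delta_0}(X, Y) = \frac{z p^* \rho^*}{1 - p^* \rho^*} \exp(-\delta^*(1 - p^*\rho^*)x) \left\{ 1 + \mathbb{E}_\delta\tau \left[ \beta x + 2 + \mathbb{P}_\delta(L > x) + \frac{2\rho^*}{1 - p^*\rho^*} \right] - \frac{\rho(1+\rho)}{\exp((\delta - \beta)x) - \rho^2} \right\}$$
$$\qquad + \frac{z\,\mathbb{E}_\delta\tau\,\mathbb{P}_\delta(L > x)}{1 - \rho} \left\{ \frac{\beta p^*}{\delta^* - \delta + \beta - \beta p^*} \left( 1 - \exp(-(\delta^* - \delta + \beta - \beta p^*)x) \right) - \frac{\rho}{1 - p^*\rho^*} \right\} - (\mathbb{E}_\delta\tau)^2\, \mathbb{P}_\delta(L > x).$$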
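We first state and prove some preliminary results that will be useful in the proof of the theorem.
Lemma 2.1. Let $\mathcal{F}_t = \sigma(L_1, \ldots, L_t)$. Then
$$\mathbb{E}\Big( \sum_{s=t+1}^{\tau} I_{\{L_s > x\}} \,\Big|\, \mathcal{F}_t \Big) = \begin{cases} \mathbb{E}\tau \{ \beta(L_t - x) + 1 + \mathbb{P}(L > x) \} - \dfrac{\rho(1+\rho)}{\exp((\delta - \beta)x) - \rho^2} \equiv g(L_t), & \text{if } L_t > x; \\[2ex] \mathbb{E}\tau\, \mathbb{P}(L > x)\, \dfrac{\exp((\delta - \beta)L_t) - \rho}{1 - \rho} \equiv h(L_t), & \text{if } L_t < x. \end{cases}$$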
Proof. Assume first $L_t < x$. We can write $L_t = \sum_{s=1}^{t} Y_s$, $t \le \tau$, where $Y_i = V_i - T_i$ with $V_i \sim \exp(\delta)$ and $T_i \sim \exp(\beta)$. There exists a unique $\theta_0 \ne 0$ such that $\mathbb{E}(\exp(\theta_0 Y_1)) = 1$, and from Wald's identity we then get $\mathbb{E}(\exp(\theta_0 L_N)) = \exp(\theta_0 y)$, where $N = \inf\{n : L_n \notin (0, x)\}$ and $y = L_0$. This implies
$$\exp(\theta_0 y) = \mathbb{E}(\exp(\theta_0 L_N) \mid L_N \le 0)\, \mathbb{P}_y(L_N \le 0) + \mathbb{E}(\exp(\theta_0 L_N) \mid L_N \ge x)\, \mathbb{P}_y(L_N \ge x).$$
Furthermore, because the overshoot is also exponentially distributed,
$$\mathbb{E}(\exp(\theta_0 L_N) \mid L_N \le 0) = \mathbb{E}(\exp(-\theta_0 T)) = \frac{\beta}{\beta + \theta_0},$$
$$\mathbb{E}(\exp(\theta_0 L_N) \mid L_N \ge x) = \mathbb{E}(\exp(\theta_0 (x + V))) = \exp(\theta_0 x)\, \frac{\delta}{\delta - \theta_0}.$$
Hence
$$\mathbb{P}_y(L_N \ge x) = \frac{\exp(\theta_0 y) - \dfrac{\beta}{\beta + \theta_0}}{\exp(\theta_0 x)\, \dfrac{\delta}{\delta - \theta_0} - \dfrac{\beta}{\beta + \theta_0}}.$$
Since $V$ and $T$ are independent,
$$1 = \mathbb{E}(\exp(\theta_0 Y_1)) = \mathbb{E}(\exp(\theta_0 V))\, \mathbb{E}(\exp(-\theta_0 T)) = \frac{\delta}{\delta - \theta_0} \cdot \frac{\beta}{\beta + \theta_0}.$$
Performing some algebraic manipulation yields $\theta_0 = \delta - \beta$ and
$$\mathbb{P}_y(L_N \ge x) = \frac{\rho \exp((\delta - \beta)y) - \rho^2}{\exp((\delta - \beta)x) - \rho^2}. \tag{2.2}$$
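Now, letting $a$ be the expected number of customers $s \le \tau$ with $L_s > x$ (counted once level $x$ has been exceeded), we can express $\mathbb{E}(\sum_{s=t+1}^{\tau} I_{\{L_s > x\}} \mid L_t = y)$ as $a\, \mathbb{P}_y(L_N \ge x)$. Hence
$$\mathbb{E}\Big( \sum_{s=t+1}^{\tau} I_{\{L_s > x\}} \,\Big|\, L_t < x \Big) = \mathbb{P}_{L_t}(L_N \ge x)\, a = \mathbb{P}_{L_t}(L_N \ge x)\, \frac{\mathbb{E}\big( \sum_{s=1}^{\tau} I_{\{L_s \ge x\}} \mid L_1 = 0 \big)}{\mathbb{P}_0(L_N \ge x)} = \mathbb{P}_{L_t}(L_N \ge x)\, \frac{\mathbb{E}\tau\, \mathbb{P}(L > x)}{\mathbb{P}_0(L_N \ge x)}. \tag{2.3}$$
When $y = 0$, we obtain from (2.2) that
$$\mathbb{P}_0(L_N \ge x) = \frac{\rho(1 - \rho)}{\exp((\delta - \beta)x) - \rho^2}. \tag{2.4}$$
Thus, by plugging (2.2) and (2.4) into (2.3) we have shown the result for $L_t < x$.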
Assume next that $L_t > x$. Let $B_1, B_2, \ldots$ be the durations of the consecutive periods that $\{L_t\}$ is above level $x$ within one busy cycle. Then
$$\mathbb{E}(B_1 + B_2 + \cdots) = \mathbb{E} \sum_{t=1}^{\tau} I_{\{L_t > x\}} = \mathbb{E}\tau\, \mathbb{P}(L > x).$$
Therefore, given $L_t > x$, we can write $\mathbb{E}(\sum_{s=t+1}^{\tau} I_{\{L_s > x\}})$ as $\mathbb{E}(B_1^* + B_2^* + \cdots)$, where from the memoryless property the distribution of $B_i^*$ is only different from that of $B_i$ for $i = 1$. Trivially
$$\mathbb{E}\Big( \sum_{s=t+1}^{\tau} I_{\{L_s > x\}} \,\Big|\, L_t > x \Big) = \mathbb{E}(B_1^*) + \mathbb{E}(B_1 + B_2 + \cdots) - \mathbb{E}(B_1).$$
We can further apply Wald's identity to obtain
$$\mathbb{E}(B_1^*) = \mathbb{E}\Big( \sum_{s=1}^{\tau} 1 \,\Big|\, L_0 = L_t - x \Big) = \frac{L_t - x + (1/\beta)}{(1/\beta) - (1/\delta)} = \mathbb{E}\tau\, [\beta(L_t - x) + 1] = f(L_t, x), \tag{2.5}$$
$$\mathbb{E}(B_1) = \mathbb{P}_0(L_N > x) \int_x^\infty \delta \exp(-\delta(y - x))\, f(y, x)\, dy = \frac{\rho(1 + \rho)}{\exp((\delta - \beta)x) - \rho^2}.$$
Hence,
$$\mathbb{E}(B_1^* + B_2 + \cdots) = \mathbb{E}\tau \{ \beta(L_t - x) + 1 + \mathbb{P}(L > x) \} - \frac{\rho(1 + \rho)}{\exp((\delta - \beta)x) - \rho^2}. \tag{2.6}$$
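The two-sided exit probability (2.2), and its special case (2.4), can be checked numerically. The sketch below is an illustrative check, not part of the paper: it evaluates the closed form and compares it with a direct simulation of the random walk with steps $V - T$, always taking at least one step so that the start $y = 0$ is covered. Function names are hypothetical.

```python
import math
import random

def upcrossing_prob(y, x, beta, delta):
    """Closed form (2.2): the walk started at y leaves (0, x) through x."""
    rho = beta / delta
    theta0 = delta - beta
    return (rho * math.exp(theta0 * y) - rho ** 2) / (math.exp(theta0 * x) - rho ** 2)

def upcrossing_prob_mc(y, x, beta, delta, n_paths, seed=0):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    up = 0
    for _ in range(n_paths):
        level = y
        while True:
            # One step V - T with V ~ exp(delta), T ~ exp(beta).
            level += rng.expovariate(delta) - rng.expovariate(beta)
            if not (0.0 < level < x):
                break
        if level >= x:
            up += 1
    return up / n_paths

if __name__ == "__main__":
    print(upcrossing_prob(1.0, 3.0, 1.0, 2.0))
    print(upcrossing_prob_mc(1.0, 3.0, 1.0, 2.0, 50_000, seed=1))
```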
Define the rank $R_t$ of a customer as his index within the cycle, $R_1 = 0, \ldots, R_\tau = \tau - 1$, $R_{\tau+1} = 0, \ldots$.
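Lemma 2.2. Define the probability measure $\mathbb{P}_z$ by $d\mathbb{P}_z/d\mathbb{P} = z^R/\mathbb{E}z^R$. Then $\mathbb{P}_z(L > 0) = p\rho$, and the conditional distribution of $L$ given $L > 0$ is $\exp(\delta(1 - p\rho))$.
Proof. It is shown in Asmussen and Rubinstein (1993) (by specializing results of Shalmon and Kaplan (1984)) that
$$\mathbb{E}[z^R \exp(-sL)] = (1 - \rho) \left( 1 + \frac{\beta p}{s - \delta(p\rho - 1)} \right).$$
Letting $s = 0$, we get
$$\mathbb{E}z^R = \frac{1 - \rho}{1 - p\rho}. \tag{2.7}$$
By integration by parts, we have
$$\frac{\mathbb{E}z^R - \mathbb{E}[z^R \exp(-sL)]}{s} = \frac{p\rho(1 - \rho)}{1 - p\rho} \cdot \frac{1}{s + \delta(1 - p\rho)}.$$
Since the left-hand side is the Laplace transform of $\mathbb{E}z^R I_{\{L > x\}}$, it follows that
$$\mathbb{E}z^R I_{\{L > x\}} = \frac{p\rho(1 - \rho)}{1 - p\rho}\, e^{-\delta(1 - p\rho)x}, \qquad x \ge 0. \tag{2.8}$$
In particular, $\mathbb{P}_z(L > 0) = \mathbb{E}z^R I_{\{L > 0\}} / \mathbb{E}z^R = p\rho$.
For the second statement, we get
$$\mathbb{E}_z e^{-sL} = \frac{1}{\mathbb{E}z^R}\, \mathbb{E}z^R e^{-sL} = \frac{1}{\mathbb{E}z^R}\, (1 - \rho) \left( 1 + \frac{\beta p}{s - \delta(p\rho - 1)} \right) = (1 - p\rho) + p\rho\, \frac{\delta(1 - p\rho)}{s + \delta(1 - p\rho)} = \mathbb{P}_z(L = 0) + \mathbb{P}_z(L > 0)\, \frac{\delta(1 - p\rho)}{s + \delta(1 - p\rho)}.$$
From this the result is obvious.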
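Formula (2.7) can be checked numerically. The sketch below takes $p = p(z)$ to be the root used in the numerical example of Remark 2.1, $p = (1 + \rho - \sqrt{(1+\rho)^2 - 4\rho z})/(2\rho)$ (applied here with $\rho$ in place of $\rho^*$), and compares the closed form with a regenerative simulation of $\mathbb{E}z^R = \mathbb{E}\sum_{t=1}^{\tau} z^{t-1} / \mathbb{E}\tau$. The function names are illustrative assumptions.

```python
import math
import random

def p_root(rho, z):
    """Root p = p(z) of rho * p**2 - (1 + rho) * p + z = 0 (smaller root)."""
    disc = (1.0 + rho) ** 2 - 4.0 * rho * z
    if disc < 0.0:
        raise ValueError("no real root p(z) for this z")
    return (1.0 + rho - math.sqrt(disc)) / (2.0 * rho)

def ez_rank(rho, z):
    """Closed form (2.7): E z^R = (1 - rho) / (1 - p * rho)."""
    return (1.0 - rho) / (1.0 - p_root(rho, z) * rho)

def ez_rank_mc(beta, delta, z, n_cycles, seed=0):
    """Regenerative Monte Carlo value of E z^R for the M/M/1 queue."""
    rng = random.Random(seed)
    total, customers = 0.0, 0
    for _ in range(n_cycles):
        wait, weight = 0.0, 1.0  # weight = z ** rank, rank 0 for the first customer
        while True:
            total += weight
            customers += 1
            weight *= z
            wait += rng.expovariate(delta) - rng.expovariate(beta)
            if wait <= 0.0:
                break
    return total / customers

if __name__ == "__main__":
    print(ez_rank(0.5, 0.9), ez_rank_mc(1.0, 2.0, 0.9, 50_000, seed=2))
```

Since $p$ solves $\rho p^2 - (1+\rho)p + z = 0$, one has $1 - z = (1 - p)(1 - p\rho)$, which gives an equivalent way of writing (2.7).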
We are now ready to prove the theorem.
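Proof of Theorem 2.1. Note first that (2.1) is a well-known formula for ratio estimators (see for example Asmussen and Rubinstein (1993)).
To calculate $\operatorname{Var}_{\delta_0} Y$, we present the second moment of $Y$ as
$$\mathbb{E}_{\delta_0} Y^2 = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 + 2\, \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t \sum_{s=t+1}^{\tau} W_s.$$
Here
$$W_t^2 = \left( \frac{\delta^*}{\delta_0} \right)^t \exp(-(\delta^* - \delta_0)(U_1 + \cdots + U_t))\, z^t = W_{t;\delta^*,\delta_0}\, z^t,$$
where $W_{t;\delta^*,\delta_0}$ is the LR of $\delta^*$ against $\delta_0$ up to time $t$ and
$$\delta^* = 2\delta - \delta_0, \qquad z = \frac{\delta^2}{\delta^* \delta_0}.$$
Then
$$\mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_{t;\delta^*,\delta_0}\, z^t = \mathbb{E}_{\delta^*} \sum_{t=1}^{\tau} z^t = z\, \mathbb{E}_{\delta^*} \sum_{t=1}^{\tau} z^{R_t}, \tag{2.9}$$
$$\mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t \sum_{s=t+1}^{\tau} W_s = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 (\beta L_t + 1)\, \mathbb{E}_\delta\tau \quad \text{(from (2.5))}$$
$$= z\, \mathbb{E}_{\delta^*}\tau\, \mathbb{E}_{\delta^*} (\beta L + 1) z^R\, \mathbb{E}_\delta\tau = z\, \mathbb{E}_{\delta^*}\tau\, \mathbb{E}_\delta\tau\, (\beta\, \mathbb{E}_{\delta^*} L z^R + \mathbb{E}_{\delta^*} z^R), \tag{2.10}$$
using (2.7) and
$$\mathbb{E}z^R L = -\frac{d(\mathbb{E}z^R e^{-sL})}{ds} \bigg|_{s=0} = \frac{(1 - \rho) p\rho}{\delta(1 - p\rho)^2}. \tag{2.11}$$
Therefore, by noting that $\mathbb{E}_{\delta_0} Y = \mathbb{E}_\delta\tau$ and collecting terms as (2.11), (2.7), (2.9) and (2.10), the results follow immediately.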
To calculate $\operatorname{Var}_{\delta_0} X$, we need the following calculation based upon Lemma 2.2:
$$\mathbb{E}z^R L I_{\{L > x\}} = \mathbb{E}z^R\, \mathbb{E}_z L I_{\{L > x\}} = \mathbb{E}z^R \int_x^\infty \mathbb{P}_z(L > 0)\, y\, \delta(1 - p\rho) \exp(-\delta(1 - p\rho)y)\, dy$$
$$= p\rho(1 - \rho) \int_x^\infty \delta y \exp(-\delta(1 - p\rho)y)\, dy = \frac{p\rho(1 - \rho)}{1 - p\rho} \left( x + \frac{1}{\delta(1 - p\rho)} \right) \exp(-\delta(1 - p\rho)x). \tag{2.12}$$
As above,
$$\mathbb{E}_{\delta_0} X^2 = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 I_{\{L_t > x\}} + 2\, \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t I_{\{L_t > x\}} \sum_{s=t+1}^{\tau} W_s I_{\{L_s > x\}}$$
where
$$\mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 I_{\{L_t > x\}} = z\, \mathbb{E}_{\delta^*}\tau\, \mathbb{E}_{\delta^*} z^R I_{\{L > x\}}, \tag{2.13}$$
and
$$\mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t I_{\{L_t > x\}} \sum_{s=t+1}^{\tau} W_s I_{\{L_s > x\}} = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 I_{\{L_t > x\}}\, g(L_t). \tag{2.14}$$
The rest is then only a matter of algebraic manipulation of Lemma 2.1, (2.12), (2.13) and (2.14).
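To calculate $\operatorname{Cov}_{\delta_0}(X, Y)$, we need the following result:
$$\mathbb{E}_{\delta^*} z^R I_{\{L < x\}} \exp((\delta - \beta)L) = \mathbb{E}_{\delta^*} z^R\, \mathbb{E}_z I_{\{L < x\}} \exp((\delta - \beta)L)$$
$$= \mathbb{E}_{\delta^*} z^R \int_0^x \exp((\delta - \beta)L)\, \mathbb{P}_z(L > 0)\, \delta^*(1 - p^*\rho^*) \exp(-\delta^*(1 - p^*\rho^*)L)\, dL$$
$$= p^*\rho^*(1 - \rho^*) \int_0^x \delta^* \exp(-(\delta^* - \delta + \beta - \beta p^*)L)\, dL$$
$$= \frac{\beta p^*(1 - \rho^*)}{\delta^* - \delta + \beta - \beta p^*} \left( 1 - \exp(-(\delta^* - \delta + \beta - \beta p^*)x) \right). \tag{2.15}$$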
To finish the proof of the theorem note that $\mathbb{E}_{\delta_0} XY$ can be written as
$$\mathbb{E}_{\delta_0} XY = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t I_{\{L_t > x\}} \sum_{t=1}^{\tau} W_t = \mathbb{E}_{\delta_0} \Big( \sum_{t=1}^{\tau} W_t^2 I_{\{L_t > x\}} + \sum_{t=1}^{\tau} W_t I_{\{L_t > x\}} \sum_{s=t+1}^{\tau} W_s + \sum_{t=1}^{\tau} W_t \sum_{s=t+1}^{\tau} W_s I_{\{L_s > x\}} \Big), \tag{2.16}$$
where
$$\mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t I_{\{L_t > x\}} \sum_{s=t+1}^{\tau} W_s = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 I_{\{L_t > x\}} (\beta L_t + 1)\, \mathbb{E}_\delta\tau \tag{2.17}$$
and
$$\mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t \sum_{s=t+1}^{\tau} W_s I_{\{L_s > x\}} = \mathbb{E}_{\delta_0} \sum_{t=1}^{\tau} W_t^2 \{ g(L_t) I_{\{L_t > x\}} + h(L_t) I_{\{L_t < x\}} \} = z\, \mathbb{E}_{\delta^*}\tau\, \mathbb{E}_{\delta^*} z^R \{ g(L) I_{\{L > x\}} + h(L) I_{\{L < x\}} \}. \tag{2.18}$$
Note that the first term in the braces of (2.18) was already calculated in $\operatorname{Var}_{\delta_0} X$. For the second term we have
$$z\, \mathbb{E}_{\delta^*}\tau\, \mathbb{E}_{\delta^*} z^R I_{\{L < x\}}\, h(L). \tag{2.19}$$
The result is then obtained by substituting (2.15) into (2.19) and following the same calculation as before.
Remark 2.1. The figures and tables in the next section are based upon the exact expressions given by Theorem 2.1. However, by rough approximations one can derive a simplified expression
$$e(\rho, \rho_0, x) \approx \frac{z p^*\rho^* (1 - \rho)^3}{\rho(3 - 2\rho + \rho^2)(1 - p^*\rho^*)} \left[ 1 + \frac{2(1 - p^*\rho^* + \rho^*)}{(1 - \rho)(1 - p^*\rho^*)} \right] \exp((-\delta^* + \beta p^* + \delta - \beta)x), \tag{2.20}$$
which we found to be rather precise in the numerical examples we studied. This follows since for large $x$, $\mathbb{P}_\delta(L > x)$ is very small, so that $\mathbb{E}_{\delta_0} X = \mathbb{E}_\delta\tau\, \mathbb{P}_\delta(L > x)$ approaches 0. Thus, it readily follows from (2.1) that the major contribution to $\sigma^2(\rho, \rho_0, x)$ is from $\operatorname{Var}_{\delta_0} X / (\mathbb{E}_{\delta_0} Y)^2$, that is
$$\sigma^2(\rho, \rho_0, x) \approx \operatorname{Var}_{\delta_0} X / (\mathbb{E}_{\delta_0} Y)^2 \approx \frac{z p^*\rho^* (1 - \rho)^2}{1 - p^*\rho^*} \exp(-\delta^*(1 - p^*\rho^*)x) \left\{ 1 + 2\mathbb{E}_\delta\tau \left( 1 + \frac{\rho^*}{1 - p^*\rho^*} \right) \right\}.$$
Similarly, when $\rho_0 = \rho$, $z = p = 1$ and
$$\sigma^2(\rho, \rho, x) \approx \operatorname{Var}_\delta X / (\mathbb{E}_\delta Y)^2 \approx \frac{\rho(1 - \rho)^2}{1 - \rho} \exp(-(\delta - \beta)x) \left\{ 1 + 2\mathbb{E}_\delta\tau \left( 1 + \frac{\rho}{1 - \rho} \right) \right\}.$$
Approximating $\mathbb{E}_{\delta_0}\tau / \mathbb{E}_\delta\tau$ by 1, (2.20) follows.
It is suggested that one can get more variance reduction by employing larger $\delta^*$, i.e. smaller $\delta_0$. If we let $\delta_0$ be a weighted combination of $\beta$ and $\delta$, namely,
$$\delta_0 = \beta + (\delta - \beta) \exp(-0.05x),$$
and plug it into (2.20), though it is rather empirical, we can still have a rough approximation of the magnitude of the relative efficiency $e$.
For example, let $\delta = 2$, $\beta = 1$, and $x = 11$, which corresponds to $\mathbb{P}_\delta(L > x) \approx 10^{-5}$. We obtain
$$\delta_0 = 1 + e^{-0.55} = 1.58, \quad \delta^* = 2.42, \quad \rho^* = 0.41, \quad z = 1.05$$
and
$$p^* = \frac{1 + \rho^* - \sqrt{(1 + \rho^*)^2 - 4\rho^* z}}{2\rho^*} \approx 1.08.$$
This yields $e(\rho, \rho_0, x) \approx 0.018$. That is, using $\delta_0 = 1.58$, we obtain a variance reduction of approximately 60 times.
If we take $x = 27$ instead of $x = 11$, which corresponds to $\mathbb{P}_\delta(L > x) \approx 10^{-12}$, we obtain $\delta_0 = 1 + e^{-1.35} = 1.26$. This yields $e(\rho, \rho_0, x) \approx 5 \times 10^{-6}$.
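The arithmetic of this example can be reproduced with a short script. The sketch below simply evaluates the rough approximation (2.20) with $\delta^* = 2\delta - \delta_0$, $z = \delta^2/(\delta^*\delta_0)$, $\rho^* = \beta/\delta^*$ and the root $p^*$ given above; the function names are illustrative, and the code is a transcription of the formulas, not of the exact expressions in Theorem 2.1.

```python
import math

def approx_efficiency(beta, delta, delta0, x):
    """Rough approximation (2.20) of e(rho, rho0, x) for the M/M/1 queue."""
    rho = beta / delta
    delta_star = 2.0 * delta - delta0
    rho_star = beta / delta_star
    z = delta ** 2 / (delta_star * delta0)
    disc = (1.0 + rho_star) ** 2 - 4.0 * rho_star * z
    p_star = (1.0 + rho_star - math.sqrt(disc)) / (2.0 * rho_star)
    q = 1.0 - p_star * rho_star
    prefactor = z * p_star * rho_star * (1.0 - rho) ** 3 / (
        rho * (3.0 - 2.0 * rho + rho ** 2) * q)
    bracket = 1.0 + 2.0 * (q + rho_star) / ((1.0 - rho) * q)
    exponent = (-delta_star + beta * p_star + delta - beta) * x
    return prefactor * bracket * math.exp(exponent)

def empirical_delta0(beta, delta, x):
    """Weighted combination of beta and delta suggested in the text."""
    return beta + (delta - beta) * math.exp(-0.05 * x)

if __name__ == "__main__":
    d0 = empirical_delta0(1.0, 2.0, 11.0)
    print(d0, approx_efficiency(1.0, 2.0, d0, 11.0))  # about 1.58 and 0.018
```

With $\delta_0 = \delta$ the expression reduces to 1, as it should for the standard estimator.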
3. How to estimate rare events for simple queues
We first consider the M/M/1 queue in the notation of Section 2. Based on Theorem 2.1, Table 3.1 displays the optimal value of the reference parameter $\rho_0^*$ and the optimal relative efficiency $e^*$ as functions of the probability $\mathbb{P}_\rho(L > x)$ for $\rho$ = 0.3, 0.6 and 0.9.
It is seen that indeed a substantial variance reduction is obtained, and that the optimal $\rho_0$ is moderately larger than $\rho$. This can be understood in terms of importance sampling as follows: when estimating a rare event probability $\ell(\rho)$, the main contributions typically come from the cycles which are much longer than average, and when choosing a reference parameter $\rho_0 > \rho$ greater weight is assigned to long cycles. The reason that the optimal $\rho_0$ is not substantially larger than $\rho$ is that too large a $\rho_0$ will blow up the variance of the estimate of $\ell(\rho)$.
Figures 3.1-3.3 display the relative efficiency $e(\rho, \rho_0, x)$ as a function of $\rho_0$ for $\rho$ = 0.3, 0.6, 0.9 and $\mathbb{P}_\rho(L > x) = 10^{-6}$.
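The qualitative behaviour described here (an interior optimum $\rho_0^*$ between $\rho$ and 1) can be explored with a simple grid scan. The sketch below is an assumption-laden illustration: it minimizes the rough approximation (2.20) from Remark 2.1 rather than the exact expressions of Theorem 2.1 on which Table 3.1 is based, so its minimizer need not coincide with the tabulated $\rho_0^*$. The arrival rate is fixed at $\beta = 1$ and all names are hypothetical.

```python
import math

def approx_efficiency(rho, rho0, x, beta=1.0):
    """Approximation (2.20) of e(rho, rho0, x); inf if no real root p* exists."""
    delta, delta0 = beta / rho, beta / rho0
    delta_star = 2.0 * delta - delta0
    rho_star = beta / delta_star
    z = delta ** 2 / (delta_star * delta0)
    disc = (1.0 + rho_star) ** 2 - 4.0 * rho_star * z
    if disc < 0.0:
        return math.inf
    p_star = (1.0 + rho_star - math.sqrt(disc)) / (2.0 * rho_star)
    q = 1.0 - p_star * rho_star
    prefactor = z * p_star * rho_star * (1.0 - rho) ** 3 / (
        rho * (3.0 - 2.0 * rho + rho ** 2) * q)
    bracket = 1.0 + 2.0 * (q + rho_star) / ((1.0 - rho) * q)
    return prefactor * bracket * math.exp(
        (-delta_star + beta * p_star + delta - beta) * x)

def scan_rho0(rho, x, steps=200):
    """Grid search over rho0 in (rho, 1); returns (smallest e, its rho0)."""
    grid = [rho + (1.0 - rho) * k / steps for k in range(1, steps)]
    return min((approx_efficiency(rho, r0, x), r0) for r0 in grid)

if __name__ == "__main__":
    print(scan_rho0(0.5, 11.0))
```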
TABLE 3.1
The optimal value $\rho_0^*$ of the reference parameter $\rho_0$ and the optimal relative efficiency $e^*$ as functions of $\ell(v) = \mathbb{P}_v(L > x)$ for $\rho$ = 0.3, 0.6 and 0.9

$\rho$    $\mathbb{P}_v(L > x)$    $\rho_0^*$    $e^*$
0.3       $10^{-1}$                0.395         7.22E-01
0.3       $10^{-3}$                0.52          9.51E-02
0.3       $10^{-5}$                0.56          9.82E-03
0.3       $10^{-7}$                0.58          9.66E-04
0.3       $10^{-9}$                0.591         9.30E-05
0.3       $10^{-11}$               0.600         8.84E-06
0.3       $10^{-13}$               0.606         8.35E-07
0.3       $10^{-15}$               0.610         7.86E-08
0.6       $10^{-1}$                0.711         5.78E-01
0.6       $10^{-3}$                0.777         9.84E-02
0.6       $10^{-5}$                0.795         1.57E-02
0.6       $10^{-7}$                0.804         2.46E-03
0.6       $10^{-9}$                0.811         3.81E-04
0.6       $10^{-11}$               0.815         5.84E-05
0.6       $10^{-13}$               0.818         8.91E-06
0.6       $10^{-15}$               0.820         1.36E-06
0.9       $10^{-1}$                0.937         5.36E-01
0.9       $10^{-3}$                0.951         1.22E-01
0.9       $10^{-5}$                0.955         2.58E-02
0.9       $10^{-7}$                0.958         5.34E-03
0.9       $10^{-9}$                0.959         1.09E-03
0.9       $10^{-11}$               0.960         2.22E-04
0.9       $10^{-13}$               0.9605        4.50E-05
0.9       $10^{-15}$               0.961         9.09E-06
Figure 3.1. The relative efficiency $e(\rho, \rho_0, x)$ as a function of $\rho_0$ for $\rho = 0.3$ and $\mathbb{P}_\rho(L > x) = 10^{-6}$
Figure 3.2. The relative efficiency $e(\rho, \rho_0, x)$ as a function of $\rho_0$ for $\rho = 0.6$ and $\mathbb{P}_\rho(L > x) = 10^{-6}$
Figure 3.3. The relative efficiency $e(\rho, \rho_0, x)$ as a function of $\rho_0$ for $\rho = 0.9$ and $\mathbb{P}_\rho(L > x) = 10^{-6}$
It can be seen from the above figures that for fixed $\rho$ there exists a rather broad interval where variance reduction is achieved, that is $e(\rho, \rho_0, x) < 1$. The shape of the variance reduction region $\Omega = \{(\rho, \rho_0) : e(\rho, \rho_0) < 1\}$ and the optimal value $\rho_0^*$ for different values of $e$ are depicted in Figure 3.4. It follows from Figure 3.4 that $\rho_0^*$ is located inside the region $\Omega$, which in turn is quite large, provided $e$ is not very small, say $e < 10^{-2}$. This means that for fixed $e$ we have good performance over a reasonably large neighborhood.
Although the M/M/1 setup is quite specialized, we shall now show (via simulation) that the above results shed some light on and provide basic insight into more complicated queueing models and problems, in particular how to choose a 'good' set of reference parameters and on sensitivity analysis. We let $\nabla\ell(v)$ denote the derivative of $\ell(v)$ with respect to $v$, write $\nabla^0\ell(v) = \ell(v)$ and $\nabla^k\bar\ell_N(\rho, \rho_0, x)$ for the estimator of $\nabla^k\ell(v)$ (for the relevant LR identities and expressions, see Rubinstein and Shapiro (1993)).
Table 3.2 displays point estimators $\nabla^k\bar\ell_N(\rho, \rho_0, x)$, $k = 0, 1$, the associated sample variance denoted $\hat\sigma^2(\rho, \rho_0, k, x)$, $k = 0, 1$, along with the point estimator $\bar{\mathbb{P}}_{\rho_0}(L > x)$ of $\ell(\rho_0) = \mathbb{P}_{\rho_0}(L > x)$ as functions of $\rho_0 = \beta/\delta_0$ for the M/M/1 queue with $\rho = 1/\delta = 0.3$ ($\beta = 1$) and $x = 4$, while simulating $N = 4 \times 10^6$ customers.
Figure 3.4. The variance reduction region $\Omega$ and the optimal value $\rho_0^*(\rho)$ for different values of $e$ ($e = 1$, $10^{-1}$, $10^{-2}$, $10^{-3}$) and $\mathbb{P}(L > x) = 10^{-6}$
TABLE 3.2
$\nabla^k\bar\ell_N(\rho, \rho_0, x)$, $k = 0, 1$ and $\hat\sigma^2(\rho, \rho_0, k, x)$, $k = 0, 1$ as functions of $\rho_0$ for the M/M/1 queue with $\rho = 0.3$ and $x = 4$

$\rho_0$   $\bar{\mathbb{P}}_{\rho_0}(L > 4)$   $\bar\ell_N(\rho, \rho_0, x)$   $\hat\sigma^2(\rho, \rho_0, 0, x)$   $\nabla\bar\ell_N(\rho, \rho_0, x)$   $\hat\sigma^2(\rho, \rho_0, 1, x)$
0.3        0.00       0.00       0.00        0.00        0.00
0.4        0.003      2.3E-5     4.2E-12     -0.09E-4    5.2E-12
0.5        0.01       2.7E-5     3.7E-13     -1.1E-4     3.2E-12
0.55       0.021      2.5E-5     2.5E-13     -1.1E-4     3.1E-12
0.58       0.030      2.6E-4     2.3E-13*    -1.0E-4     3.0E-12*
0.625      0.043      2.5E-5     2.4E-13     -1.1E-4     3.2E-12
0.715      0.16       2.6E-5     4.1E-13     -1.1E-4     4.4E-12
0.8        0.29       2.6E-5     4.9E-13     -1.1E-4     6.2E-11
0.9        0.48       2.5E-5     6.4E-12     -1.0E-4     7.8E-11
Table 3.3 displays data similar to that of Table 3.2 for the M/G/1 queue with $\rho = 0.6$ and $x = 12$, while simulating $N = 4 \times 10^6$ customers. We assumed that the interarrival rate equals 1 and that the service time distribution is gamma (denoted $\Gamma(\beta, \lambda)$); we chose $(\beta, \lambda) = (2, 3.333)$, took the scale parameter $\lambda_0 = v_0$ as reference parameter and as before simulated $4 \times 10^6$ customers.
Based on the results of Table 3.3, Figure 3.5 depicts $\bar\ell_N(\rho, \rho_0, x)$ along with the 95% (relative) confidence intervals CI as a function of $\rho_0$.
TABLE 3.3
$\nabla^k\bar\ell_N(\rho, \rho_0, x)$, $k = 0, 1$ and $\hat\sigma^2(\rho, \rho_0, k, x)$, $k = 0, 1$ as functions of $\rho_0$ for the M/G/1 queue with $\rho = 0.6$ and $x = 12$

$\rho_0$   $\bar{\mathbb{P}}_{\rho_0}(L > 12)$   $\bar\ell_N(\rho, \rho_0, x)$   $\hat\sigma^2(\rho, \rho_0, 0, x)$   $\nabla\bar\ell_N(\rho, \rho_0, x)$   $\hat\sigma^2(\rho, \rho_0, 1, x)$
0.60       0.00       0.00       0.00        0.00        0.00
0.65       0.001      3.7E-6     3.6E-13     -3.87E-5    6.8E-11
0.70       0.002      6.4E-6     3.3E-13     -8.6E-5     4.7E-11
0.75       0.003      6.8E-6     2.1E-13     -9.0E-5     4.9E-11
0.77       0.005      6.9E-6     1.1E-13     -9.1E-5     3.9E-11
0.80       0.013      6.7E-6     3.6E-14*    -9.0E-5     5.2E-12*
0.82       0.023      7.1E-6     5.6E-14     -9.5E-5     1.0E-11
0.85       0.045      7.5E-6     1.4E-13     -9.4E-5     1.4E-11
0.90       0.15       6.5E-6     2.2E-13     -8.9E-5     2.1E-11
Figure 3.5. Performance of the estimator $\bar\ell_N(\rho, \rho_0, x)$ (solid line) as a function of $\rho_0$ for the M/G/1 queue with $\rho = 0.6$ and $x = 12$. The dotted lines indicate 95% confidence intervals.
The results of the tables are self-explanatory. It is important to note that under the original p.d.f. $f(y, v)$, all estimators in Tables 3.2 and 3.3 result in zero. Similar tables were obtained while using as reference parameters either the interarrival rate or service rate, or both.
We also found from our extensive simulation studies with simple queues, like GI/G/1, that:
1. Under the original p.d.f. $f(y, v)$, the standard regenerative estimator $\bar\ell_N(v, v, x)$ typically underestimates the underlying quantity $\mathbb{P}_\rho(L > x)$, provided $\mathbb{P}_\rho(L > x) \le 10^{-2}$, and when estimating rare events, say when $\mathbb{P}_\rho(L > x) \le 10^{-4}$, the estimator $\bar\ell_N(v, v, x)$ is useless (in all cases $\bar\ell_N(v, v, x)$ resulted in zero, provided the sample size $N \le 4 \times 10^6$).
2. In order to be on the safe side, it is desirable to choose the reference traffic intensity $\rho_0$ moderately larger than $\rho_0^*$, where $\rho_0^*$ by itself must be chosen moderately larger than $\rho$.
We also found that a 'good' reference traffic intensity $\rho_0$ for estimating rare events can serve simultaneously as a 'good' reference traffic intensity for estimating standard performance parameters, like the waiting time. For example, using $\rho_0^* = 0.80$ as a reference traffic intensity for estimating a rare event in the M/G/1 queue with $\rho = 0.6$ (see Table 3.3), we can, at the same time, improve the efficiency (accuracy) of its standard estimators. For more details see Rubinstein and Shapiro (1992).
4. How to estimate rare events for queueing networks
We now discuss how to choose a 'good' vector of reference parameters $v_0$ for open non-Markovian queueing networks with the output process $L_t$ typically like the sojourn time and the total number of customers in the network. As in Section 3 we show empirically that under the probability measure $f(y, v_0)$ the variance of the rare event estimator is substantially smaller than the variance of the estimator under the original probability measure $f(y, v)$.
We first present simulation results for the following two simple queueing models: two queues in tandem and two queues with a feedback on the second queue.
Note that in both queueing models one of the nodes is more congested (works under heavier traffic) than the other. Using common terminology, we shall call the congested queue the bottleneck queue. As we shall see below, the bottleneck queue plays a crucial role when choosing a 'good' (optimal) vector of reference parameters.
Table 4.1 presents point estimators $\bar\ell_N(\rho, \rho_0, x)$ of $\ell(\rho, \rho_0, x) = \mathbb{P}_\rho(L > x)$, the associated sample variance denoted $\hat\sigma^2(\rho, \rho_0, x)$ along with the point estimators $\bar{\mathbb{P}}_{\rho_0}(L > x)$ of $\ell(\rho_0, x) = \mathbb{P}_{\rho_0}(L > x)$ as functions of $\lambda_{02}$ and $\lambda_{03}$ for two GI/G/1 queues in tandem with $\rho_1 = 0.2$ and $\rho_2 = 0.3$. We assume that $x = 3.5$ and that the interarrival and two service time distributions are gamma (denoted $\Gamma_i(\beta_i, \lambda_i)$, $i = 1, 2, 3$). We chose $(\beta_1, \lambda_1) = (4, 1)$; $(\beta_2, \lambda_2) = (4, 3.333)$; and $(\beta_3, \lambda_3) = (4, 1.667)$, took $v_0 = \lambda_0 = (\lambda_{02}, \lambda_{03})$ as the reference vector and simulated $4 \times 10^6$ customers. Note that $\hat\sigma^2(\rho_0^*)$ and $\hat\sigma^2(\rho_0^\circ)$ correspond to the optimal reference vector $\rho_0^* = (\rho_{01}^*, \rho_{02}^*)$ and to $\rho_0^\circ = (\rho_1, \rho_{02}^\circ)$, respectively. Here $\rho_{02}^\circ$ corresponds to the optimal solution of (1.5) with respect to $\rho_{02}$ alone with $\rho_1$ fixed, namely $\rho_1 = 0.2$.
Table 4.2 presents data similar to those of Table 4.1 for the two-node queueing model with feedback on the second queue. We chose the feedback probability $p = 0.3$; assumed that the interarrival and two service time distributions are gamma ($\Gamma_i(\beta_i, \lambda_i)$, $i = 1, 2, 3$); chose $(\beta_1, \lambda_1) = (2, 2)$; $(\beta_2, \lambda_2) = (2, 6.666)$; $(\beta_3, \lambda_3) = (2, 6.666)$, took $v_0 = \lambda_0 = (\lambda_{02}, \lambda_{03})$ as the reference vector, and simulated $N = 4 \times 10^6$ customers with $x = 5$ and $x = 6$, respectively.
It follows from these tables that the optimal vector of traffic intensities $\rho_0^* = (\rho_{01}^*, \rho_{02}^*)$ and the minimal value $\sigma^2(\rho, \rho_0^*, x)$ are essentially insensitive to the parameters of the non-bottleneck queue, in the sense that both $\rho_0^* = (\rho_{01}^*, \rho_{02}^*)$ and $\sigma^2(\rho, \rho_0^*, x)$ can be approximated rather well if we minimize the objective function $\sigma^2(\rho, \rho_0, x)$ with respect to the reference traffic intensity of the bottleneck queue alone.
TABLE 4.1
$\bar\ell_N(\rho, \rho_0, x)$ and $\hat\sigma^2(\rho, \rho_0, x)$ as functions of $\lambda_{02}$ and $\lambda_{03}$ for two GI/G/1 queues in tandem with $\rho_1 = 0.2$ and $\rho_2 = 0.3$

$\rho_{01}$   $\rho_{02}$   $\bar{\mathbb{P}}_{\rho_0}(L > 3.5)$   $\bar\ell_N(\rho, \rho_0, x)$   $\hat\sigma^2(\rho, \rho_0, x)$
0.2       0.6       0.03      7.7E-5     2.7E-11
0.2       0.66      0.08      7.3E-5     2.4E-11
0.2       0.71      0.13      7.5E-5     2.3E-11°
0.2       0.77      0.22      7.1E-5     7.6E-11
0.2       0.85      0.35      7.2E-5     9.2E-11
0.2       0.9       0.56      6.9E-5     2.7E-10
0.22      0.6       0.03      7.6E-5     2.6E-11
0.22      0.66      0.08      7.3E-5     2.2E-11*
0.22      0.71      0.14      7.5E-5     2.3E-11
0.22      0.77      0.23      7.2E-5     7.8E-11
0.22      0.85      0.36      7.2E-5     9.4E-11
0.22      0.9       0.58      7.0E-5     2.9E-10
0.24      0.6       0.03      7.5E-5     2.6E-11
0.24      0.66      0.09      7.2E-5     2.4E-11
0.24      0.71      0.15      7.4E-5     2.5E-11
0.24      0.77      0.24      7.2E-5     7.8E-11
0.24      0.81      0.37      7.1E-5     9.5E-11
0.24      0.9       0.59      6.8E-5     3.7E-10
Consider, for example, the results of Table 4.1. We have, respectively,
$\rho_0^* = (\rho_{01}^*, \rho_{02}^*) = (0.22, 0.66)$
and
$\rho_0^{\circ} = (\rho_{01}, \rho_{02}^{\circ}) = (0.2, 0.71)$.
Note again that $\sigma^2(\rho_0^*)$ corresponds to the optimal reference vector $\rho_0^* = (\rho_{01}^*, \rho_{02}^*)$, while
$\sigma^2(\rho_0^{\circ})$ corresponds to $\rho_0^{\circ} = (\rho_{01}, \rho_{02}^{\circ})$, that is, where $\rho_{01} = \rho_1$ is fixed, namely $\rho_1 = 0.2$. It
follows from the comparison of the above $\sigma^2(\rho_0^*)$ and $\sigma^2(\rho_0^{\circ})$ that $\sigma^2(\rho_0^{\circ})$ approximates
$\sigma^2(\rho_0^*)$ rather well, and thus we can adopt $\rho_0^{\circ}$ as a 'good' vector of reference parameters.
Our extensive simulation studies support this phenomenon also for general open r-node
queueing networks.
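As a worked check of this comparison, using only the entries of Table 4.1 marked there (the unconstrained optimum at $(0.22, 0.66)$ and the optimum with the first traffic intensity held at $\rho_1 = 0.2$, attained at $(0.2, 0.71)$; the symbols $\rho_0^*$ and $\rho_0^{\circ}$ for these two vectors are our reading of the notation):

```latex
\frac{\sigma^2(\rho_0^{\circ})}{\sigma^2(\rho_0^{*})}
  = \frac{2.3 \times 10^{-11}}{2.2 \times 10^{-11}}
  \approx 1.05
```

so in this example, optimizing over the bottleneck queue alone increases the tabulated variance by roughly 5%.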
Note, however, that it seems crucial that we consider a performance measure (like the
mean sojourn time) which is sensitive to the parameters of the bottleneck queue. Thus,
for performance measures like the mean sojourn time at a non-bottleneck node the story
may well be quite different.
Note also that we do not claim to present a universal recipe for how to perform the
change of measure in all possible settings. For simple GI/G/1 queues the crux is
obviously to make the service time distribution B stochastically larger (or at least more
heavy-tailed) and/or the interarrival distribution A stochastically smaller, but even this
can be done in many ways, e.g. by scaling A and/or B or by exponential family
transformations of the form
$$A_{v_1}(dx) = \frac{\exp(v_1 x)}{\int_0^\infty \exp(v_1 y)\, A(dy)}\, A(dx), \qquad B_{v_2}(dx) = \frac{\exp(v_2 x)}{\int_0^\infty \exp(v_2 y)\, B(dy)}\, B(dx)$$
for A and/or B; in that setting, one would have $v_1 < 0$, $v_2 > 0$, and a rather standard
choice would be to use the exponential family transformation for both A and B with
$v_1 = -v_2$ (cf. Asmussen (1987), Chapter XII). For networks, one could similarly use
scaling or exponential family transformations for the service time distributions, but
there are other potential solutions (for instance, one could also try to change the routing
probabilities).

TABLE 4.2
$\bar{\ell}_N(\rho, \rho_0, x)$ and $\sigma^2(\rho, \rho_0, x)$ as functions of $\lambda_{02}$ and $\lambda_{03}$ for two GI/G/1 queues
with feedback on the second queue

x = 5
$\lambda_{02}$  $\lambda_{03}$  $\mathbb{P}(L > 5)$  $\bar{\ell}_N(\rho_0, x)$  $\sigma^2(\rho, \rho_0, x)$
6.66            4.5             0.008                8.4E-5                      1.7E-10
6.66            4.2             0.016                6.8E-5                      8.7E-11
6.66            4.0             0.026                6.6E-5                      7.7E-11°
6.66            3.8             0.045                6.6E-5                      2.1E-10
6.66            3.6             0.078                6.8E-5                      4.3E-10
6.66            3.4             0.140                5.1E-5                      6.2E-10
6.66            3.2             0.270                4.7E-5                      7.2E-10

6.5             4.5             0.008                8.2E-5                      1.9E-10
6.5             4.2             0.017                7.1E-5                      8.5E-11
6.5             4.0             0.027                6.9E-5                      7.5E-11*
6.5             3.8             0.047                6.8E-5                      1.3E-10
6.5             3.6             0.081                6.6E-5                      3.2E-10
6.5             3.4             0.146                5.4E-5                      5.7E-10
6.5             3.2             0.282                4.9E-5                      7.1E-10

6.4             4.5             0.008                8.5E-5                      2.4E-10
6.4             4.2             0.019                6.9E-5                      8.9E-11
6.4             4.0             0.028                6.6E-5                      8.2E-11
6.4             3.8             0.049                6.7E-5                      9.8E-11
6.4             3.6             0.084                6.7E-5                      2.4E-10
6.4             3.4             0.151                5.7E-5                      4.8E-10
6.4             3.2             0.293                5.1E-5                      6.3E-10

x = 6
$\lambda_{02}$  $\lambda_{03}$  $\mathbb{P}(L > 6)$  $\bar{\ell}_N(\rho_0, x)$  $\sigma^2(\rho, \rho_0, x)$
6.66            4.5             0.005                2.8E-5                      4.5E-11
6.66            4.2             0.011                1.9E-5                      3.3E-11
6.66            4.0             0.017                2.1E-5                      1.8E-11
6.66            3.8             0.029                2.0E-5                      4.6E-12°
6.66            3.6             0.055                5.4E-5                      8.7E-12
6.66            3.4             0.112                8.3E-5                      1.1E-11
6.66            3.2             0.220                1.5E-6                      1.8E-11

6.5             4.5             0.005                2.6E-5                      4.2E-11
6.5             4.2             0.012                1.9E-5                      3.1E-11
6.5             4.0             0.019                2.3E-5                      1.7E-11
6.5             3.8             0.032                2.1E-5                      4.4E-12*
6.5             3.6             0.059                5.5E-5                      6.4E-12
6.5             3.4             0.120                8.6E-5                      9.2E-12
6.5             3.2             0.231                1.4E-6                      1.9E-11

6.4             4.5             0.005                2.4E-5                      4.7E-11
6.4             4.2             0.013                2.0E-5                      3.2E-11
6.4             4.0             0.021                2.0E-5                      1.7E-11
6.4             3.8             0.035                3.2E-5                      5.8E-12
6.4             3.6             0.063                4.7E-5                      8.4E-12
6.4             3.4             0.127                9.1E-5                      1.3E-11
6.4             3.2             0.240                1.6E-6                      2.4E-12
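For the gamma distributions used in the experiments of this section, the exponential family transformation of $A$ and $B$ discussed above has a closed form: tilting a gamma distribution with shape $\beta$ and rate $\lambda$ by $\exp(vx)$ gives a gamma distribution with the same shape and rate $\lambda - v$, provided $v < \lambda$. The sketch below illustrates this under that assumption; the function names and the tilting values $v_1 = -0.5$, $v_2 = 0.5$ are hypothetical and are not taken from the paper.

```python
import math


def gamma_pdf(x, shape, rate):
    """Density of the gamma distribution with the given shape and rate."""
    return rate ** shape * x ** (shape - 1) * math.exp(-rate * x) / math.gamma(shape)


def tilt_gamma(shape, rate, v):
    """Parameters of Gamma(shape, rate) after tilting by exp(v * x)."""
    if v >= rate:
        raise ValueError("the normalizing integral is finite only for v < rate")
    return shape, rate - v


def likelihood_ratio(x, shape, rate, v):
    """Original density over tilted density at x: M(v) * exp(-v * x)."""
    mgf = (rate / (rate - v)) ** shape  # M(v), the integral of exp(v*y) A(dy)
    return mgf * math.exp(-v * x)


if __name__ == "__main__":
    # Hypothetical illustration with v1 = -v2 (values are assumptions).
    a_shape, a_rate = tilt_gamma(2, 2.0, -0.5)    # interarrival: v1 < 0
    b_shape, b_rate = tilt_gamma(2, 6.666, 0.5)   # service: v2 > 0
    print("interarrival mean:", 2 / 2.0, "->", a_shape / a_rate)
    print("service mean:", 2 / 6.666, "->", b_shape / b_rate)
```

A negative $v_1$ shortens the interarrival times and a positive $v_2$ lengthens the service times, so both changes push the reference traffic intensity upward.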
If the cycles of a queueing network are prohibitively long for the regenerative method
to apply, we can estimate a rare event by using truncated estimators (see Rubinstein and
Shapiro (1992)). This method also applies to a process $\{L_t\}$ which is stationary and
ergodic rather than regenerative.
References
ASMUSSEN, S. (1982) Conditioned limit theorems relating a random walk to its associate, with
applications to risk reserve processes and the GI/G/1 queue. Adv. Appl. Prob. 14, 143-170.
ASMUSSEN, S. (1987) Applied Probability and Queues. Wiley, New York.
ASMUSSEN, S. (1989) Risk theory in a Markovian environment. Scand. Actuarial J., 69-100.
ASMUSSEN, S. AND RUBINSTEIN, R. Y. (1993) Response surface estimation and sensitivity analysis
via efficient change of measure. Stoch. Models 9, 313-339.
BUCKLEW, J. A. (1990) Large Deviation Techniques in Decision, Simulation, and Estimation. Wiley,
New York.
BUCKLEW, J. A., NEY, P. AND SADOWSKY, J. S. (1990) Monte Carlo simulation and large deviations
theory for uniformly recurrent Markov chains. J. Appl. Prob. 27, 44-59.
COTTRELL, M., FORT, J. C. AND MALGOUYRES, C. (1983) Large deviations and rare events in the
study of stochastic algorithms. IEEE Trans. Autom. Control. 28, 907-920.
FRATER, M. R., LENNON, T. M. AND ANDERSON, B. D. O. (1989) Optimally efficient estimation of
the statistics of rare events in queueing networks. Manuscript, Australian National University.
GLYNN, P. W. AND IGLEHART, D. L. (1989) Importance sampling in stochastic simulations.
Management Sci. 35, 1367-1392.
GOYAL, A., SHAHABUDDIN, P., HEIDELBERGER, P., NICOLA, V. F. AND GLYNN, P. W. (1992) A
unified framework for simulating Markovian models of highly dependable systems. IEEE Trans.
Computers 41, 36-51.
PAREKH, S. AND WALRAND, J. (1988) Quick simulation of excessive backlogs in networks of queues.
In Stochastic Differential Systems, Stochastic Control Theory and Applications, IMA Volume 10, ed. W.
Fleming and P. L. Lions, pp. 439-472. Springer-Verlag, New York.
RUBINSTEIN, R. Y. AND SHAPIRO, A. (1993) Discrete Event Systems: Sensitivity Analysis and
Stochastic Optimization via the Score Function Method. Wiley, New York.
SHALMON, M. AND KAPLAN, M. A. (1984) A tandem network of queues with deterministic service
and intermediate arrivals. Operat. Res. 32, 753-773.
SHALMON, M. AND RUBINSTEIN, R. Y. (1990) The variance of the regenerative estimators with
special references to sensitivity analysis of queueing systems with Poisson arrivals. First International
Conference on Operations Research in Telecommunications, Boca Raton, Florida.
SIEGMUND, D. (1976) Importance sampling in the Monte Carlo study of sequential tests. Ann.
Statist. 4, 673-684.