A New Approach to Estimate the Stationary Distribution

Bing-shiung Chuang, Instructor, Department of Mathematics and Information Education,
National Taipei University of Education, Taiwan
ABSTRACT
To solve different problems by Monte Carlo methods, it is necessary to simulate random variables with given distributions. In this study we simulate an irreducible and aperiodic discrete Markov chain. We aim at obtaining exact samples from the stationary distribution instead of approximate samples. Furthermore, we establish some estimators to provide alternatives in the estimation of the stationary distribution $\pi$ using exact samples.
Keywords: Algorithm, Backward Coupling, CFTP (coupling from the past), Exact Sampling, Forward
Coupling, Markov Chains
INTRODUCTION
Today the development of computers and of simulation methods is very fast. Many calculations of mathematical expectations for stochastic processes are now carried out by numerical methods on computers. Probability theory is widely applied in the recent evolution of science. In economics, finance, traffic and transport, signal and image processing, genetics, and modeling in earth science (such as wind, waves, or earthquakes), the standard models make use of random variables, stochastic processes, and sometimes ideas from stochastic analysis. In particular, the Markov model is often used because the outcomes of successive trials are allowed to depend on each other in such a way that each trial depends only on its immediate predecessor.
The Markov model is a simple and useful model for analyzing practical problems. A Markov process allows us to model the uncertainty in many real-world systems that evolve dynamically in time. The basic concepts of a Markov process are those of the states of a system and state transitions. The theory of Markov chains makes use of the theory of so-called stochastic matrices, which has been fully studied and discussed (Asmussen, 1987; Bremaud, 1998; Feller, 1968; Meyn & Tweedie, 1993; Sheu, 2003; Wang, 1998; Wolf, 1989).
In the past decades, the Markov chain Monte Carlo method has been a popular and active area of research in many fields because it is so widely used (Gilks, Richardson, & Spiegelhalter, 1996; Madras (ed.), 1998). This method works since, if $P^n(x, A)$ denotes the $n$-step transition law of the chain, then under appropriate irreducibility and aperiodicity assumptions $P^n(x, \cdot)$ converges to $\pi$ in the sense that
$$\lim_{n \to \infty} \| P^n(x, \cdot) - \pi(\cdot) \| = 0$$
for each $x \in \Sigma$, where $\| \cdot \|$ denotes the total variation norm. An approximate sample from $\pi$ can thus be taken as the value of the chain after a suitably chosen (large) number of iterations. How close the sample is to $\pi$ in a distributional sense depends on the convergence rate of the chain to stationarity.
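As an illustration of this convergence, the total-variation distance of $P^n(x, \cdot)$ from $\pi$ can be computed directly for a small chain. The 3-state transition matrix below is a hypothetical example, not taken from the paper:

```python
import numpy as np

# A small irreducible, aperiodic chain on 3 states (illustrative only).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: normalized left eigenvector of P for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

def tv_to_stationarity(P, pi, n):
    """Total-variation distance of P^n(x, .) from pi, maximized over x."""
    Pn = np.linalg.matrix_power(P, n)
    return 0.5 * np.abs(Pn - pi).sum(axis=1).max()

for n in (1, 5, 20):
    print(n, tv_to_stationarity(P, pi, n))
```

The printed distances shrink geometrically with $n$, which is exactly why "run long enough and stop" gives only approximate samples: one must know how large $n$ has to be.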
The main problem is to decide how many steps the chain needs before it enters equilibrium. This question, which has been with us for many years, is known as "the initial transient problem" (Asmussen, Glynn, & Thorisson, 1992).

The Journal of Human Resource and Adult Learning, Vol. 9, Num. 1, June 2013

A solution is to estimate the rate of convergence, but a tight upper bound is usually difficult to obtain. In 1996, a completely different approach using a backward coupling, called coupling from the past (CFTP), was derived by Propp and Wilson (Propp & Wilson, 1996; Wilson, 1996, 2000a, 2000b). The algorithm can generate exact samples from the stationary distribution of any finite Markov chain without any knowledge of the rate of convergence. However, due to the nature of backward coupling, efforts to improve its efficiency have made limited progress.
On the other hand, the counterpart of backward coupling, forward coupling, is a simple and straightforward sampling technique. It is easy to understand but, unfortunately, is known to be biased for generating exact samples. We refer to Lindvall (1992) and Thorisson (2000) for background on coupling. Here, we find a new and more efficient algorithm, based on a modified forward coupling, which can also generate exact samples. We introduce this method in section 2.

Section 3 suggests a way in which we can use this new method to create tours. These tours are independent and identically distributed, and we show how to use them to construct estimators of means of functionals with respect to the target distribution. Finally, we discuss the relations among the various kinds of cycles.
A NEW ALGORITHM
The relevant background can be found in studies done by Chuang (2013), Propp & Wilson (1996), and Wilson (1996). Here we only describe the main results. Our method departs from CFTP, and the goal is to derive a more efficient exact sampling technique. We first connect the successful generation of exact samples by CFTP to the situation when the forward coupling couples. This connection leads to a decomposition of the stationary distribution, which can be implemented as an efficient algorithm by running the forward coupling twice.
Firstly, we construct a probability space $(\Omega, \mathcal{F}, P)$, where without loss of generality $\Omega = [0,1]^{\mathbb{Z}}$ is an infinite product space and $P$ is Lebesgue measure, so that there is an independent and identically distributed sequence $\{u_n\}_{n=-\infty}^{\infty}$ of uniform $U[0,1]$ random variables; there is then a measurable function $f : \Sigma \times [0,1] \to \Sigma$, where $\Sigma$ is the state space of the Markov chain, such that $X$ satisfies the recursion
$$X_n = f(X_{n-1}, u_n), \quad n = 1, 2, \ldots$$
We also use $X_s^t(x)$ to denote the state of $X$ at time $t$ given that it is started in $x$ at time $s$, with $s < t$. The construction of such a Markov chain can be found in studies done by Borokov (1998), Borokov & Foss (1992), and Meyn & Tweedie (1993).
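The update-function construction above can be sketched concretely. For a finite chain, one common choice (an assumption here, not prescribed by the paper) is to let $f(x, u)$ invert the CDF of row $x$ of the transition matrix:

```python
import numpy as np

# A sketch of the update-function representation X_n = f(X_{n-1}, u_n).
# The 3-state transition matrix P is an illustrative assumption.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

def f(x, u):
    """Next state from state x, driven by a single uniform u in [0, 1)."""
    # searchsorted inverts the cumulative distribution of row x of P.
    return int(np.searchsorted(np.cumsum(P[x]), u, side="right"))

rng = np.random.default_rng(0)
x = 0
for _ in range(10):
    x = f(x, rng.uniform())   # X_n = f(X_{n-1}, u_n)
print("state after 10 steps:", x)
```

The key point for coupling is that $f$ lets a single sequence of uniforms drive paths from *every* starting state simultaneously.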
The process of the forward coupling is composed of paths from every state in $\Sigma$ starting at time 0. Driven by the same sequence of uniform numbers, all paths will coalesce sooner or later. The first coalescence time $\tau_1$ is called the first forward coupling time,
$$\tau_1 = \inf\{t : X_0^t(x) = X_0^t(y) \text{ for all } x, y \in \Sigma\},$$
and the second coalescence time $\tau_2$ is called the second forward coupling time,
$$\tau_2 = \inf\{t : X_{\tau_1}^t(x) = X_{\tau_1}^t(y) \text{ for all } x, y \in \Sigma\}.$$
Then we have the following algorithm:
Algorithm 2.1
Step 1. Use $\{u_i\}_{i=1}^{\tau_1}$ to determine $\tau_1$.
Step 2. Use $\{u_i\}_{i=\tau_1+1}^{\tau_2}$ to determine $\tau_2$.
Step 3. If $\tau_2 > \tau_1$, then we have $\kappa = \tau_2 - \tau_1$ samples; otherwise, we go to Step 1.
The algorithm is proved to be able to generate exact samples in Chuang (2013). In the next section, we will use these exact samples to estimate the stationary distribution.
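Using the same inverse-CDF update, Algorithm 2.1 can be sketched as follows. The 3-state chain is again a hypothetical example, and the exactness of the resulting samples rests on the proof in Chuang (2013), not on anything verified here:

```python
import numpy as np

# A minimal sketch of Algorithm 2.1 (forward coupling run twice).
# The chain P and the update f(x, u) are illustrative assumptions.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])
STATES = tuple(range(len(P)))

def f(x, u):
    return int(np.searchsorted(np.cumsum(P[x]), u, side="right"))

def tour(rng):
    """One tour of samples X_{tau1}, ..., X_{tau2 - 1}."""
    # Step 1: run a path from every state with shared uniforms until they
    # all coalesce; the common state at that moment is X_{tau1}.
    paths = list(STATES)
    while len(set(paths)) > 1:
        u = rng.uniform()
        paths = [f(s, u) for s in paths]
    x = paths[0]                      # X_{tau1}
    # Step 2: restart a path from every state at time tau1 and follow the
    # chain itself (started at x) until the second coalescence at tau2.
    samples = [x]
    paths = list(STATES)
    while len(set(paths)) > 1:
        u = rng.uniform()
        paths = [f(s, u) for s in paths]
        x = f(x, u)                   # the chain X, driven by the same uniforms
        samples.append(x)
    # samples holds X_{tau1}, ..., X_{tau2}; the tour excludes X_{tau2}.
    # (Step 3's retry never fires here: tau2 > tau1 whenever |Sigma| > 1.)
    return samples[:-1]

rng = np.random.default_rng(0)
t1 = tour(rng)
print("tour of length", len(t1), ":", t1)
```

Because each tour consumes a fresh stretch of uniforms, repeated calls produce independent, identically distributed tours.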
ESTIMATORS BY USING THE EXACT TOURS
If we run Algorithm 2.1 $N$ times, then we obtain $N$ tours, denoted by $\{C_k\}_{k=1}^{N}$, as follows:
$$C_1 = \{X_{\tau_1^{(1)}}, X_{\tau_1^{(1)}+1}, \ldots, X_{\tau_2^{(1)}-1}\};$$
$$C_2 = \{X_{\tau_1^{(2)}}, X_{\tau_1^{(2)}+1}, \ldots, X_{\tau_2^{(2)}-1}\};$$
$$\vdots$$
$$C_N = \{X_{\tau_1^{(N)}}, X_{\tau_1^{(N)}+1}, \ldots, X_{\tau_2^{(N)}-1}\}.$$
In the $k$th tour, we have $\kappa_k = \tau_2^{(k)} - \tau_1^{(k)}$ samples, labeled $X_{\tau_1^{(k)}}, X_{\tau_1^{(k)}+1}, \ldots, X_{\tau_2^{(k)}-1}$, for $k = 1, 2, \ldots, N$.
Thus, one might hope that for a functional $g : \Sigma \to \mathbb{R}$ we could estimate the expected value of $g$ with respect to $\pi(\cdot)$, i.e., $\pi(g) = \int g(x)\,\pi(dx)$. By the law of large numbers, we have the first estimator:
$$\hat{\pi}(g) = \frac{1}{M} \sum_{i=1}^{N} \sum_{j=0}^{\kappa_i - 1} g(X_{\tau_1^{(i)}+j}), \quad \text{where } M = \sum_{i=1}^{N} \sum_{j=0}^{\kappa_i - 1} 1 = \sum_{i=1}^{N} \kappa_i.$$
This estimator is clearly unbiased since each sample is exact.
The second estimator can be established by the regeneration property; we consider the ratio estimator
$$\tilde{\pi}(g) = \frac{E\left[\sum_{i=\tau_1}^{\tau_2 - 1} g(X_i)\right]}{E[\kappa]}.$$
The third estimator estimates each coefficient $\alpha_i$ and distribution $\nu_i$, $i = 0, 1, 2, \ldots, n$, where $\alpha_i = P(\kappa \geq i)$ and $\nu_i(j) = P(X_{\tau_1 + i} = j)$. By the decomposition in Chuang (2013), we have
$$\pi(j) = \sum_{i=0}^{\infty} \alpha_i \nu_i(j).$$
Since $\alpha_i$ is decreasing in $i$, the error generated by truncating the sum at $n$ can be neglected if $n$ is large enough. Furthermore, we can use other simulation methods to improve the results, such as the batch-means method, the jackknife estimator, and bootstrapping techniques. Those discussions can be seen in Ripley (1987), Shedler (1993), and Wang (1998) for details.
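Since the tours are independent and identically distributed, resampling whole tours gives a simple bootstrap standard error for the ratio estimator. This is only a sketch of one of the variance-reduction ideas mentioned above, and the tours are again hypothetical placeholders:

```python
import numpy as np

# A sketch of the ratio estimator over tours, with a bootstrap standard
# error obtained by resampling whole tours (valid because tours are i.i.d.).
tours = [[0, 2, 1], [1, 1], [2, 0, 0, 1], [1, 2]]   # hypothetical tours
g = lambda x: float(x)

def ratio_estimate(tours):
    num = sum(sum(g(x) for x in t) for t in tours)  # total of g over each tour
    den = sum(len(t) for t in tours)                # total of the lengths kappa_i
    return num / den

rng = np.random.default_rng(0)
boot = []
for _ in range(1000):
    idx = rng.integers(len(tours), size=len(tours)) # resample tours with replacement
    boot.append(ratio_estimate([tours[i] for i in idx]))
print(ratio_estimate(tours), np.std(boot))
```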
FURTHER DISCUSSION
Murdoch and Rosenthal (2000) propose a method of obtaining tours related to the forward coupling. They run a chain forward in time from time 0 until all paths have coalesced, obtaining a sample $x_0$ at that time; then they run it forward again until all states coalesce again, using the path from $x_0$ as one tour. That cycle length is much longer than ours in section 2. Although both methods use the forward coupling technique twice, our cycles are independent, identically distributed, and exact realizations of the decomposition of $\pi$, which are our advantages. For finite cycles, the ratio estimator generated by plain forward coupling is biased.
We can see another decomposition of $\pi$ in the following:
$$\pi(\cdot) = \pi(0) \left[ \sum_{k=1}^{\infty} {}_{0}P^{k}(0, \cdot) \right],$$
where ${}_{0}P^{k}(0, \cdot)$ denotes the usual taboo probability avoiding the state 0. This formula appears in Meyn & Tweedie (1993), but no corresponding algorithm has been proposed until now. The relations between any two kinds of cycles deserve study. So far, the cycle generated by our algorithm is the only exact one, and it deserves more attention and further discoveries of its applications.
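This decomposition can be checked numerically for a small chain. Summing the taboo kernel ${}_{0}P^{k}(0, \cdot)$ over $k \geq 1$ amounts to the matrix-geometric series $e_0 (I - Q)^{-1} P$, where $Q$ is $P$ with the column of state 0 zeroed so that intermediate visits to 0 are forbidden; the 3-state chain is a hypothetical example:

```python
import numpy as np

# Numerical check of pi(.) = pi(0) * sum_{k>=1} 0P^k(0, .) on a toy chain.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution via the left eigenvector for eigenvalue 1.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()

# Taboo kernel: zeroing column 0 of P forbids intermediate visits to 0,
# so sum_{k>=1} 0P^k(0, .) = e_0 (I - Q)^(-1) P (Q is substochastic, so
# the geometric series converges).
Q = P.copy()
Q[:, 0] = 0.0
e0 = np.zeros(len(P)); e0[0] = 1.0
taboo_sum = e0 @ np.linalg.inv(np.eye(len(P)) - Q) @ P

print(pi)
print(pi[0] * taboo_sum)   # should reproduce pi
```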
REFERENCES
Asmussen, S. (1987). Applied probability and queues. New York: John Wiley & Sons.
Asmussen, S., Glynn, P. W., & Thorisson, H. (1992). Stationarity detection in the initial transient problem. ACM Transactions on Modeling and Computer Simulation, 2, 130-157.
Borokov, A. A. (1998). Ergodicity and Stability of stochastic process. New York: John Wiley & Sons.
Borokov, A. A., & Foss, S. G. (1992). Stochastically recursive sequences and their generalizations. Siberian Advances in Mathematics, 2, 16-81.
Bremaud, P. (1998). Markov chains: Gibbs fields, Monte Carlo simulation and queues. New York: Springer.
Chensov, N. N. (1967). Pseudo random numbers for modeling Markov chains. Zh. Vychisl. Mat. mat. Fiz, 7(3), 632-643.
Chuang, B. (2013). Exact sampling by a modified forward coupling. (Unpublished manuscripts)
Feller, W. (1968). An introduction to probability theory and its applications, v.1, (3rd ed.). New York: John Wiley & Sons.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1996). Markov chain Monte Carlo in practice. London: Chapman & Hall.
Kalashnikov, V. (1994). Topics on regenerative processes. Boca Raton: CRC Press.
Kingman, J. F. C. (1972). Regenerative phenomena. New York: John Wiley & Sons.
Lindvall, T. (1992). Lectures on the coupling method. New York: John Wiley & Sons.
Madras, N. (Ed.). (1998). Monte Carlo methods: Proceedings of the Workshop on Monte Carlo Methods held at the Fields Institute for Research in Mathematical Sciences, October 25-29, 1998, Toronto, Ontario.
Meyn, S. P., & Tweedie, R. L. (1993). Markov chains and stochastic stability. London: Springer-Verlag.
Murdoch, D. J., & Rosenthal, J. S. (2000). Efficient use of exact samples. Statistics and Computing, 10(3), 237-243.
Propp, J. G. & Wilson, D. B. (1996). Exact sampling with coupled Markov chains and applications to statistical mechanics. Random
Structures and Algorithms, 9, 223-252.
Ripley, B. D. (1987). Stochastic simulation. New York: John Wiley & Sons.
Ross, S. M. (1997). Introduction to probability models. New York: Academic Press.
Shedler, G. S. (1993). Regenerative stochastic simulation. Boston: Academic Press.
Sheu, S. J. (2003). Lecture note. Department of Mathematics. Taipei: Academia Sinica.
Sigman, K., & Wolff, R. W. (1993). A review of regenerative processes. SIAM Review, 36.
Thorisson, H. (2000). Coupling, stationarity, and regeneration. New York: Springer.
Wang, C. L. (1998). Lecture note in Simulation Study. Hualien: National Donghwa University.
Wilson, D. B. (1996). Exact sampling with Markov chains. (Unpublished doctoral dissertation, Massachusetts Institute of Technology).
Wilson, D. B. (2000a). How to couple from the past using a read-once source of randomness. Random Structures and Algorithms, 16, 85-113.
Wilson, D. B. (2000b). How to get a perfectly random sample from a generic Markov chain and generate a random spanning tree of a directed graph. Random Structures and Algorithms, 16, 114-143.
Wolf, R. W. (1989). Stochastic modeling and the theory of queues. Englewood Cliffs, NJ: Prentice Hall.