Random Walks on Finite Groups and Rapidly Mixing

S ÉMINAIRE DE PROBABILITÉS (S TRASBOURG )
DAVID J. A LDOUS
Random walks on finite groups and rapidly
mixing Markov chains
Séminaire de probabilités (Strasbourg), tome 17 (1983), p. 243-297.
<http://www.numdam.org/item?id=SPS_1983__17__243_0>
© Springer-Verlag, Berlin Heidelberg New York, 1983, tous droits réservés.
L’accès aux archives du séminaire de probabilités (Strasbourg) (http://www-irma.
u-strasbg.fr/irma/semproba/index.shtml), implique l’accord avec les conditions générales d’utilisation (http://www.numdam.org/legal.php). Toute utilisation commerciale ou impression systématique est constitutive d’une infraction pénale. Toute
copie ou impression de ce fichier doit contenir la présente mention de copyright.
Article numérisé dans le cadre du programme
Numérisation de documents anciens mathématiques
http://www.numdam.org/
RANDOM WALKS ON FINITE GROUPS AND RAPIDLY MIXING MARKOV CHAINS
by
David Aldous*
Department of Statistics
University of California at Berkeley
Introduction
1.
This paper is
which
are
useful in
random walks
We
are
study
expository
our
results
are
the type of
our
problems
how
long
question.
stationary distribution.
defined at
(3.3).
but
Consider the
we
to answer this
we
propose that the concept "time until
stationary" should
be formalized
in tractable form, it is seldom
exactly, but often
T
can
be estimated
possible
can
be
interpreted
kind needed to make
the
#G,
on a
group
G,
we
several
call the
In Section 4
parameter
get
T
One
card-shuffling,
we
a
particular
illustrate
card-shuffling schemes.
"rapid mixing" property:
this is the property that
the size of the group.
to
the number of random shuffles of
deck well-shuffled.
Some chains have what
random walk
to
a new
as
coupling technique by analysing
(B)
a
by the coupling technique.
situation where these problems arise naturally is in random
T
by
Since it is seldom possible to express distributions
t
where
using
shall argue in Section 3 that this
chain at time
a
consider
does it take until the distribution is close
Instead,
the distribution is close to
of
we
particular
says that under mild conditions the distribution
asymptotic theory,
the wrong
answers
and in
purpose is to make them well-known!
stationary distribution? One might try
classical
probabilistic techniques
types of problems.
two
(imprecise) question:
to the
some
perhaps slightly novel, the mathematical
Markov chain converges to the
a
T,
Although
mostly easy and known:
(A) Elementary theory
of
account of
studying certain finite Markov chains,
finite groups.
on
and the form of
ideas
an
When this property
T
for
is small compared
holds, probabilistic
techniques give simple yet widely-applicable estimates
for
a
hitting
’
time
*
Research supported by National Science Foundation Grant MCS80-02698.
244
distributions.
(7.1) (7.18)
These
discussed in Section 7.
are
is that for
rapidly mixing random walk with uniform initial
a
distribution, the first hitting time
with
exponentially distributed
parameter which
be
can
interpreted
single
R#G.
Here
the
as
on
state is
(6.4),
defined at
R,
This result, and its
to
more
for
analogue
is
a
rapidly
complicated problems
hitting times
sets of states, and
arbitrary
approximately
number of visits to the
mean
partial extensions
has
involving hitting times
on a
mean
initial state in the short term.
mixing Markov chains,
The fundamental result
from
arbitrary initial distributions.
This paper is about
approximations,
for finite Markov chains there
butions at time
matrix.
However,
e.g., 52! in the
matrices
are
and
t
case
studied
using
stationarity, and
away the
a
Instead,
our
particular
Finally,
apply
the
less
some
(1982)
be
can
groups
studies convergence
hitting times, using this theory.
So
we
are, so to
speak, throwing
naturally our’results applied
general properties,
are
not
a
can
to
be obtained
study is feasible.
such
apparent
as
exponential
from ad hoc
analyses
cases.
we
should
out two limitations of
point
the Markov chain results it is
examples
on
precise information than
hitting times, which
distribution, at least
walk
Diaconis
studies
structure.
give
results reveal
for
large,
of the familiar
analogue
analytic study of that random walk, if such
approximations
of
the
only the Markov property;
random walk
random walks
practice,
case.
(1981)
Letac
special random walk
particular
from the
use
for distri-
Exact results in terms of 52! x52!
representation theory,
group
Our arguments
expressions
illuminating.
Fourier theory in the real-valued
to
since
reader:
where the state space is
case
card-shuffling.
of
exact
course
puzzle the
time distributions in terms of the transition
and sometimes in
principle,
In
hitting
of
have in mind the
we
seldom
are
which may
usually
approximately:
is that then the
rapid mixing property
on
necessary to know the
one reason
for
our
concentrating
hitting time results depend
"high-dimensional"
To
techniques.
stationary distribution is uniform.
which
characteristic of complicated
our
stationary
on
random
Second,
seems
processes, rather than the
245
for which
elementary one-dimensional examples of Markov chains,
give
2.
our
techniques
useful information.
no
Notation
The
general
case we
shall consider is that of
irreducible Markov process
p..(t)
be the transition
=
there exists
(2.1 )
X
=
{i,j,k,...}.
and let
’
classical theory
pi ~j(t) ~~(j)
(2.2)
G
Q(i,j),
qi
~
#.
j~
probabilities. By
unique stationary distribution
a
continuous-time
finite state space
on a
be the transition rates,
Q(I ,j ), j # I ,
Let
(Xt)
a
ir,
and
as
= j)--~~r(j)
a.s.
as
t-~~ ,
t
timeSit:
where
Xs =j)
amount of time before time
The
the
spent in state
t
results hold for
same
analogue
is the random variable
=
of
(2.1)
we
need
a
setting;
discrete-time chain
a
(X )
a
corresponding continuous-time
=
P(i,j), j # i.
In fact
Xt XN t
Nt
=
Let
TA (resp.
from
some
to
study convergence to stationarity in the continuous-time
aperiodic setting with
no
changes.
Given
(2.4)
except that for
(X ),
provided X is aperiodic.
the results hold in the discrete-time
essential
define
j.
discrete-time chain
as
we
the
aperiodicity:
(2.3)
In Section 3
measuring
TA)
where
is
with transition matrix
(X*)
process
a
Poisson
counting
process of rate 1.
be the first
hitting
time of
X
initial distribution.
Then
TA
(resp. X*)
by (2.4),
=
we can
with transition rates
explicitly
represent
we can
P(i,j)
on a
as
set
A
and it is easy
A
see
(2.5)
In Section 7
1
we
as
TA ~ ~ ,
°
study hitting time distributions for continuous-time
processes;
246
by (2.5)
results extend to discrete-time chains.
our
realise that
be used for
even
the results in Section 7
periodic discrete-time chains by
since it is only
be
though
important
to
rapid mixing, they
the observation
may
(2.5) above,
corresponding continuous-time
that the
required
use
It is
process
rapid mixing.
We shall illustrate
walks
on
tion
®.
p
be
a
probability
a
The discrete-time random walk
of random
G.
associated with
p
is the process
independent with distribution
Xn ® ~n+1 , where (~n) are
Equivalently, Xn is the Markov chain
Xn+1
case
such that
G
measure on
G
on
special
group structure, under the opera-
support(p) generates
(2.6)
the
discussing
Suppose G has
finite groups.
Let
results by
our
=
p.
with transition matrix of the
special
form
P(iJ)
=
TI(i)
time random walk
are
=
As at
1/#G.
(2.4)
discrete-time random walks.
to state when
"symmetry" properties
first
hitting time
dependent
When
on
i
on
there is
we
specialized
corresponding continuous-
of the random walk.
i
For
example,
=
the random walk
case
become
the
ETITi’
mean
a
not
case we
shall
assume
1 .
assigns probability
We shall avoid occasional
general Markov
natural
more
case.
qi
need only change time scale by
general results
stationary distribution, is clearly
from the
stating the specializations in
p
our
to the random walk case, because of the
in the random walk
This is automatic if
a
usually remain with the
The results in the
(2.7)
we
stationary distribution is the uniform
and it is for this process that
(Xt),
stated, although in the examples
simpler
.
The
By (2.6) the chain is irreducible.
distribution
u(i 1 ® J) .
zero
to the
identity; otherwise
constant factor to attain
(2.7).
uninteresting complications by assuming
247
(2.8)
~12
max i 03C0(i)
which in the random walk
case
,
is merely the
assumption that
G
is not the
trivial group.
We should make
explicit
our
definition of
hitting
times:
TA = min{t >0:
as
distinct from the first return times
T~ - min{t>0:
(2.9)
Elementary theory gives
(2.10)
where
using the convention a/bc = a/(bc).
we are
For sequences
of reals,
(a ), (b )
a -
Finally,
G
bn
bn
means
lim
means
lim sup
1 ;
an/bn ~ .
the total variation distance between two
.
probability
measures on
is
=
1
max
The time to
In the
.
’
"-
AGG
3.
=
approach stationarity
general Markov
(3.1)
case
write
di(t) =
-~rN
for the total variation distance between the
distribution at time
t
stationary distribution
for the process started at
d(t)
=
max
i
d.(t) .
’
i.
Let
and the
248
Note that in the random walk
so
d(t)
In
di(t).
=
the
general
does not
d.(t)
case
depend
i,
on
by symmetry,
elementary limit theorem (2.1) implies
d ( t ) --~ 0
as
Moreover, classical Perron-Frobenius theory gives
d(t) -
(In
is the
discrete time, a
eigenvalues
of the transition
speed of convergence
some
as
to
d(t)
natural to
of
d(t)
use
makes
a
matrix.)
stationarity.
Thus a
looks
fairly abrupt switch
(3.3)
T
where the constant
the
from
near
1/2e
is used
=
min{t:
=
T(1/2e)
merely
at
for
a
time
asymptotic
examples
qualitatively
to express the idea of "the time taken to
T(e)
a
our
the time of this switch rather than the
(3.2)
by
describes the
However, in
Informally, think of this switch occurring
it
.
largest absolute value, excepting 1, of the
mixing random walks the function d(t)
That is,
C>0, 0~1 .
1 to
of
rapidly
like
near
0.
It seems
asymptotic behaviour
approach uniformity".
T.
Formally, define
algebraic convenience; replacing
different constant would merely alter other numerical constants in
sequel.
249
d(t)
The idea that
considering
an
"abrupt switch"
For
sequence of processes.
a
shuffling scheme
to
In
we can
some
makes
examples
an
be formalized
can
in
example,
applying
we
will get functions
(and
we
believe it holds rather
that there exist constants
for each 0e1 .
as
aN
.
In other words, the scaled total variation distance function
converges to the
generally)
such that
aN
(3.4)
particular
dN(t), TN(~).
N-card deck
prove
a
by
for
step function
t 1 1.
An
dN(t/aN)
example
is shown in
Fig. 3.24.
The next lemma
gives
probably be traced back
(3.5)
LEMMA.
(a)
(b)
03C1(t) = max 03C1i,j(t) .
= ~Pi(Xt ~.) - Pj(Xt ~.)~ ;
is
p
submultiplicative :
is
Assertion
(a)
the
proof is
the
following fact,
more
LEMMA.
Let
p(S+t) p(s)p(t).
decreasing.
variation distance.
(3.6)
to Doeblin.
2d(t).
(c) d(t)
Proof.
can
Define
03C1i,j(t)
Then
elementary properties of d(t), which
some
follows from the
triangle inequality
The other assertions
can
be
for the total
proved algebraically, but
transparent when coupling ideas
are
used.
The
key idea is
whose proof is easy.
have dis tributions
Z1, Z2
vl, v2.
Then
°
Conversely, given vl, v2,
we can
such that
construct
IIvl-v211 = P(Zl ~ Z2) ;
Zn
To prove
(b),
fix
has distribution
i., i
,
,
s, t.
vn ( n =1, 2 ) .
Construct
(Z1,Z2)
such that
250
Z"
has the distribution of
s
Then
on
the sets
A.
s
1’2
on
the sets
Aj,k =
given
X~=i~ ;
(s).
such that
construct
=
Z1s+t =
And
X~
Z2s+t
{Z1s = j, Z2s = k}
(j ~ k)
(Z1s+t,Z2s+t)
construct
such
that
~.t~-~j~ = ~’t~ ’ ~~t~-!’j~ = ’kCt~-) ~
~~t~~t!~,k)=~j~
~~
’
~
Z~ .
Now
and
Z~.)
(resp.
pd) .
has the distribution of
X~
given
Xo
=
i~
(resp.
i~),
so
by (3.7)
ip(t)p(s) .
To prove
(c),
use
the
construction except for
same
and
stationary distribution
having
=
giving
d~ (s).
Z~ ~
the
Then
d,
This result is useful because it shows that
cular time
t..
gives
an
into
an
upper bound
on
p
at
a
parti-
upper bound for later times:
p(t)
Translating this
an
i
expression involving
t
explicitly,
p(t)i(p(tQ))~~0-~ .
In
the
particular
the definition
following bound, which
we
(3.3)
shall
of
use
T
makes
extensively.
and
we
obtain
251
(3.8)
d(t) exp(1 - t/T) , t ~
COROLLARY.
(a)
REMARKS.
We are here
results for continuous time; the
stating
results hold in the discrete time
(b)
Note that the
processes is
simple
a
0.
aperiodic
same
case.
rate of convergence in finite state
exponential
consequence of the basic limit theorem
Perron-Frobenius theory is only needed if
one
wants
(2.1).
for the
expression
an
The
asymptotic exponent.
(c) Corollary
3.8
be
can
rephrased
as
T(E) T(1
As mentioned in the
exact
expressions
for
0
T(e)
and
some
A C G
for which the
side may be
right
is
ent(p) =
distribution n
on
G
has
on
conveniently estimated.
general method which gives effortless,
though usually rather weak, lower bounds.
p
We shall
I
We should however mention another
distribution
r(E:).
get useful
inequality
d(t) ’_
for
or
to
The basic way to get lower bounds
the obvious
use
d(t)
and hence for
p..(t),
is to
.
Introduction, it is seldom possible
instead discuss how to get bounds.
d(t)
1
e
ent(Tr)
Recall that the entropy of
a
log u(i).
In
particular, the uniform
log #G.
We
quote two straightforward
=
lemmas.
LEMMA.
let
p
LEMMA.
(X )
Let
be the discrete-time random walk associated with
be the distribution
If
v
of
is a distribution
lhen
Xn.
G
on
n
such that
ent(v) > (1-e) log #G.
From these lemmas
(3.9)
we
immediately
>
for discrete-time random walks.
obtain the lower bounds
;
03C4(~) ~ (1-~) log #G ent( )
e
then
u,
and
252
Our next topic is the
method of
later to
coupling method, which
upper bounds on
getting
T.
time distributions
hitting
we
is
widely-applicable
a
We remark that for the
applications
need
on
only
upper bounds
and
T;
often rather crude upper bounds will suffice.
be
(Xt)
Let
construct
a
(3.10)
Z~
pair
Markov process.
a
of processes
(resp.
Z2)
Z1 - Z2
T
(Zl,Z2)
a
i, j.
Suppose
coupling, and
T
where
coupling
By Lemma 3.6
time.
P(T’~>t) .
°
Thus from estimates of the tails of the distributions of
we
have constructed
couplingsfor
each
pair i, j.
where
T
~
~
because
=
small
as
possible.
now
as
outline the strategy
is
conceptually simpler
we
have
call
f
function
a
a
Then
distance
we
shall
in
for which the
constructing couplings.
to discuss the discrete-time case first.
GXG-~{0,1,2,...}
f:
use
couplings
function.
such that
f(i,j)
Suppose that for each pair
= 0
(i,j)
joint distribution
(3.14)
Suppose
get good estimates of the time taken for the process
to
seek to construct
We
we
1 ,J
we
time is
expectations.
times
max
approach stationarity,
coupling
coupling
p(t)
by (3.12)
To summarize:
a
A crude way is to take
get estimates for d(t).
(3.13)
to
(resp. j);
inf{t:
=
a
i
given
{t > T} ,
on
T~’J)
(=
X
as
(3.12)
can
we can
such that
is distributed
(3.11)
Call
Fix states
6..
~
=
JC(V)
=
such that
Pd,’); C(M)
=
P(j,-);
V=W if
i=j.
It
Suppose
iff
i = j:
there is
253
Then
(Z1,Z2)
construct the bivariate Markov process
we can
such that
P((Z1n+1,Z2n+1)~.|(Z1n,Z2n) = (i,j)) = 03B8i,j
.
This is
plainly
a
Think of the process
coupling.
the distance between the two processes; the
T
All
we
our
need
possible choices
since
our
the distribution
possible
(and
process
aim is to make
(V,M)
to arrange that
strict decrease.
T)
hence
To
f(V,W) f(i,j)
coupling
is just
estimating
probabilities
setting,
is
positive probability
some
specified, estimating
the time for the
merely replace
we
(3.15) 03A3 i,j(k,l) Q(i,k) ; 03A3 i,j(k,l)
=
=
walk
associated with
of the
p*
Let
the
joint transition
such that
i,j(k,l)
rates
Q(j,l) ;i,(k,)
p*(j)
coupling
integer-valued
=
=
Q(i,k) .
’
We should mention the useful trick of time-reversal.
p.
the
of
need not be Markov.
Dn
by joint transition
the random walk associated with
there
course
and indeed it is often
with
Note, however, that
In the continuous-time
Of
6...
coupling,
decrease it is natural to choose
Ef(V,M),
to minimize
the
specify
joint distributions with prescribed
Dn
Once the
to hit 0.
Dn
for these
measuring
Dn = 0) .
only specify the "one-step" distributions
marginals:
time
min(n:
as
time is
coupling
will be of this Markovian form.
couplings
will be many
a
=
D =
u(j-1).
is
(X )
Suppose
Then the random
is called the time-reversed process, because
easily-established properties
(a)
(b)
when
and
XD
X~
are
given
the uniform
distribution,
(X~,X~,...,XK) ~ (X K ,X K-1 ,...,X ) . 0
°
The next lemma shows that when
random walk with its
(3.16)
LEMMA.
a random walk
Le t
Xn
estimating d(n)
time-reversal, if this
d(n)
(resp.
fresp. d*( n ) )
is
we
more
may
replace the original
convenient to work with.
be the total variation
the time-reversed walk X*n)
.
Then
function for
d(n)
=
d*(n).
°
254
i
Writing
Proof.
for the
identity
of
G,
I
- ~ ~ Pj_1 (Xn = i ) -1/#G~I
=
n
j
J
by the
= i ) -1/#G~I
re-ordering
= 03A3|Pi(X*n = j) - 1/#G|
it may
course
the
same as
the
sum
by (a)
d*(n) .
=
Of
random wal k property
original
happen that p
~*,
=
call such
process:
general continuous-time Markov setting,
a
a
so
the reversed process is the
random walk reversible.
In the
process is reversible if it
satisfies the equivalent conditions
(3.17)
~(~)p>>j(t)
Although
we
lose the
processes do have
opportunity
some
X~
=
i
and consider
the
can
E.S
times
trick, reversible
necessarily possessed by
not
all such
is
equivalent
to consider the random walk with
such that
S
over
be shown that T
following
(3.18)
stopping
our
instance, another way to formalize the concept
approach stationarity" is
be the infimum of
It
For
of
taking advantage
regularity properties
non-reversible processes.
of "the time to
of
=
stopping times,
to
is uniform; let
XS
and
let T
=
min
Ti
Ti.
for reversible processes, in
r
sense.
PROPOSITION.
There exist constants
C1, C2
such that
T C2T
for all reversible Markov processes.
This and other results
on
reversible processes
The rest of this section is devoted to
an
exact
analytic expression
for
d(t)
are
one
which
given
in Aldous
(1982a).
example, in which there is
can
be
compared with coupling
estimates.
(3.19)
EXAMPLE.
Random
on
the N-dimensionaZ cube.
The vertices of the
255
unit cube in
dimensions
N
0’s and 1’s, and form
There is
a
a
be labelled
can
G under
group
(0,...,0,1,0,...,0)
The random walk associated with
the cube, which
chosen
jumps from
uniformly
a
at random.
0
=
of
(il,...,iN)
Write
with
1
p(j)
=
componentwise addition modulo 2.
f(i,j) =
natural distance function
0 = (0,...,0),
i
N-tuples
as
1
at coordinate
r,
"simple random
walk"
~ r ~ N ,
otherwise.
is the natural
p
vertex to
of the
one
vertices
neighboring
The discrete-time random walk is
shall consider the continuous-time process,
though similar
hold for the discrete-time random walk modified to become
on
periodic:
we
results would
aperiodic by
putting
We
i, j;
c
for
describe
now
let
L
1/(N+1)
=
1/(N+l)
1
and let
C
.
as
ll.. i ®
(interpret cL+1
(if L = 1 )
as
as
Q(n,n-2)
It is
coupling
time
T
is
T* =
c~
1 N
1
C.
c E
=
coupling,
on
C.
i.e. the Markov process with tran-
plain that the distance
(2 n N), Q(1,0)
n/N
Fix
follows.
1/N ,
=
the Markov process
=
T.
c1).
be the associated
sition rates
evolves
®
A..(i
Let
upper bound for
be the set of coordinates
=
(if L>1)
an
=
Define
which jc ~ i.
N
r
coupling, which will give
a
f(i,j)
=
=
process
{O,l,...,N} with transition
=
2/N.
rates
It is not hard to deduce that the
stochastically dominated by
T*1 + T*3 + T*5 + ... + T*M ;
Dt =
the
M=
sum
N
(N odd) ,
N-1
(N even),
256
where the summands
N/m.
mean
independent exponential
are
To estimate the tail of the distribution of
ET*
var(T*)
=
N(1 +1/3+1/5+... +1/M) ~
T*
we
having -
calculate
~! 1og(N)
+(1/3)~ + ... +(1/M)~) -
=
~-,
So
P(T*>aN 1og(N))
d(aN
2014~0
So
T* m
random variables,
by Chebyshev’s inequality.
as
conclude
we
r(e)
Me shall
now
~ 1og(N) ;
show how to get
the continuous-time random walk
easy to
on
an
X.
exact
{0,1} with transition
1
e
.
analytic formula
componentwise
that the component processes
verify
processes
0
1
Q(0,’!)
(X’..,X’j.
as
Write
It is
independent Markov
are
rates
d(t).
for
Q(1,0)
=
=
1/N.
So the
component processes have transition probabilities
Po(X~=0) =~{-!+exp(-2t/N)} , Po~"~ =~{1-exp(-2t/N)} .
So the transition
probabilities for the random walk
(3.20)
=
Thus
we
are
2’~{1
L
=
f(j,0) .
obtain the formula
(3.21)
d(t)
=
2~ ~ N
L=0
.
’-
Elementary but tedious calculus shows
(3.22)
lim
N
"
-
1og(N))
=
=
1
,
t
0 ,
t
1
>
1
,
and hence
(3.23)
r(e) -
~ 1og(N) ,
0
e
1.
257
Thus
that the upper bound for
we see
gives
this
the correct order of
=
3.24 shows computer-c,alculated
8, 32, 128, 512,
REMARK.
Our
use
tion is to the
we
magnitude, though
stationary distribution
used another
a
T
large
may
indicator, say entropy?
certain
for
graphs
of total variation distance to
Xt
Thus
N.
of
d( t /
In this
for
(3.22).
measure
seem an
log N)
how close
a
distribu-
arbitrary choice.
example the entropy
What if
~(t)
has the form
~(t)
to
not the correct constant, in
to illustrate the convergence in
of the distribution of
for
coupling technique
example.
Figure
N
derived by the
T
CPN(e)
=
Ncp(t/N)
does not exhibit the
So it is hard to
see
how to define
in terms of entropy; and it is not clear that the
mations of Section 7 would be valid under
some
"abrupt switch"
a
parameter analogous
hitting
definition of
which used entropy rather than total variation distance.
of
time
approxi-
"rapid mixing"
258
4.
Card-shuffling models
Imagine
be described
in
deck of
a
by
permutation
a
position
where
the bottom.
cards, labelled 1 to N.
N
So the card in
position j is labelled
position
on
the group
o
is
picked according
GN
of
to the distribution
of the card labelled
Then
=
Let
new
let
and
deck
i
after
is the random walk
(i.e.
is to estimate
Xn(i)
for the
associated with
GN
Imagine starting
GN.
position i).
in
at
T
(3.3)
measuring
as
T
for
with
to
infinity.
we
imagine
We shall
two
coupled processes.
motion of
a
Markov chain
~r, a,
6
a
As in section 3
Xn
the number of
precisely,
To describe
shall
we
the number of cards
each Er
N
tends
couplings,
specify dependent
having distribution
~.
is the transition matrix for
of
One way of
particular card: this
on
as
More
say, and then
of the two decks,
random shuffles
The joint distribution
p.
get upper bounds by coupling.
decks, in states
u.
Our purpose in this section
specific shuffles
some
try to find the asymptotic behaviour of TN
getting
lower bounds is to consider the
motion
Yn
=
i fixed, forms the
Xn(i),
{1,...,N} with transition matrix
P(j,k)
(4.1)
distance function for
y,
we
b
=
the stationary distribution is uniform.
(4.2)
shuffle in which
random
Write
shuffles needed to get the deck well-shuffled.
the
that the
be the total variation distance between the distribution of
Think of the parameter
N
independent such random shuffles.
on
i
position
probability distribution
A
the group
on
with the card labelled
indicating
a,
p.
n
be the uniform distribution
n
d(n)
n.
permutation
being
shuffle of
A
permutations describes the
position
(Xn(i))
a
position 0(1).
i has moved to
p
Xn
by
i
the card labelled
is the top of the deck and
position 1
the deck may also be described
card at
{1,...,N},
of
’n-
The state of the deck may
Writing
have the obvious
d(n) >
dy
for the total variation
inequality
dy(n) .
We shall need three famous results from elementary
probability theory,
259
which
describe.
we now
Given two decks, say
by the
match occurs.whenever one
a
labelled card in both decks.
same
Let
{i :
the number of matches between decks in states
(Feller (1968)
(4.3)
p.
and
-n-
Then
a.
107)
CARD-MATCHING LEMMA.
Note that
f(7r,a)
distance function
Second, let
X
For
uniform
(1)
on
the number of unmatched cards, is
=
on
GN.
Rn
be the number of distinct cards obtained in
uniform random draws with replacement from the deck.
where
#{C1,...,Cn},
=
Ci
are
Lj
=
be the number of draws needed to
Feller
(1968)
1.1.d. uniform
min{n:
1im
{1,...,N}.
on
we
LEMMA.
If 0
have
1
1
a
1-a
the number of the first draw
U
BIRTHDAY LEMMA.
Diaconis
(4.6)
and
for
now
on
mi n {n :
=
U/N 1/2 -+ V
describe and
cards.
Then from
in
and
if j
=
zn
j(N) satisfies
In
probabi Zi ty.
analyse
which
Cn = C i
V,
obtain
we
for
where
some
some
i
some
0
V
examples.
U
previously-drawn
be
card:
n } .
~.
Several of these
are
in
(1982).
EXAMPLE.
j .
N
to random".
"Top
it in
replacing
1
Let
Rn =N-j}
Third, consider again random draws with replacement, and let
We
n
That is,
get all but some j
then
00,
for fixed j
(4.5)
natural1
a
pp. 225 and 239
(4.4) COUPON-COLLECTORS
0
be
N
as
~n
position is occupied
a
random
define the
Here
we
shuffle
position in the deck.
permutation 03C0j
by
by removing the top card
For
a
formal
description,
260
’~~ ( i ) = i -1 ,
i
=
rr ,
Then the random shuffle is
for
i j
i
>
j.
uniform
on
,
J
{1,...,N}.
We shall
prove
T(E) N N loq(N) ;
(4.7)
To
analyse this example it is convenient
as
discussed in section 3.
That is,
top".
of the deck.
To construct
to the
top.
uniformly at random, and moved
not
destroyed,
so
a
are
La
coupling time
once
top
label
a
the card labelled
Now matches,
completely matched.
at the time
least once, the decks
The
coupling.
move
to the
Choose
two decks.
coupling, consider
a
Plainly this is
at which the decks are
.
to use the time-reversed process,
uniformly from {1,...,N} and in each deck
C
.
Here, the time-reversed process is "random to
card is chosen
a
0e1
C
is the time
T
created,
are
at which each label has been chosen at
So
completely matched.
By the Coupon-Collectors Lemma,
d(aN
(4.8)
To
get the lower bound, consider the
the bottom j
cards have
1f-l(N-j+l) 1f-l(N-j+2)
Let
L. be the number of
then
chosen. If
L.
> n
shuffles have
never
relative order with
=
a >
as
set
increasing labels:
of states
A. J
we
start with
shuffles until all but some j
the bottom j
cards after
been chosen to be moved,
increasing labels.
So
so
Using the Coupon-Collectors Lemma,
P(Xn E Aj)
P(L. J >n) - l/j! .
we
find
for which
n
>
a new
deck.
labels have been
"random to
remain in their
1/ j ! ,
d(n) - >
~r
that is,
Suppose
...
1.
top"
original
Since
261
d(aN
This and
(4.8)
In the
1
example above the coupling is
very
trivial but where
direct argument
a
that the
top" shuffling is uniform.
to
minor modification for which the
a
And in fact the
simple.
using coupling, by observing
already-chosen cards in "random
order of the
l.
a
(4.7).
establish
upper bound could be obtained without
But here is
N--~~ ;
as
coupling argument
(1982)
Diaconis
hard.
seems
is
equally
records that
Borel proposed this shuffle.
(4.9)
EXAMPLE.
to
random, bottom
to random".
Here
we
get
T(e) -
log(N) ;
N
0
1,
e
(for
the obvious modifications of the arguments above
using
alternate between
we
the top and the bottom card to be removed and replaced at random.
picking
Again
"Top
consider the set
of states for which
the lower bound,
successive cards have
some j
increasing labels).
(4.10)
EXAMPLE.
possibility
permutation,
of
doing nothing.
and 03C0j
the
random shuffle
Formally, let
for
uniform
J
at random
pick
periodicity,
Tr
be the
and
permutation transposing j
(4.11)
for constants
we
To eliminate
adjacent cards, and transpose them.
the
Here
"Transposing neighbours".
Then the
We shall prove
Tj
Cl, C2.
We need first
under this shuffle.
some
results about the motion
(Yn)
This motion is the Markov chain
transitions
=
=
P(j,j)
=
1 - 2/N
P(1,1 )
=
P(N,N)
=
1/N
1 -1/N .
on
of
a
of
also allow
we
identity
j+1.
{0,...,N-1}.
on
pair
a
single
{1,...,N}
card
with
262
This is
symmetric random walk with reflecting boundaries.
a
straightforward exercise
in weak convergence
to show
theory
normalised, this converges weakly to Brownian motion
reflecting
Bt
It is
a
that, suitably
[0,1]
on
with
boundaries:
N-1Y[2t/N3]
~ Bt .
The first assertion of the lemma below is
now
immediate, and the second is
not hard.
(4.12)
LEMMA.
at the
top reaches the
Let
be the number
S..
of shuffles until the card initially
position (i.e.
S 1 /N3 2 V,
Let
52
position
be the number of
shuffles
reaches the bottom.
the middle).
V
where
~hen
0.
>
until the card in
an
arbitrary initial
Then there exist constants
K,
S
>
0,
such
that
s O. N>1.
Suppose
Yo
1,
=
and write
the distribution of
dy(n)
for the total variation distance between
and uniform.
Yn
Then
_[N/2J) - CNl2JlNII
~
and
1 > n) -1 2
so
by the first assertion
of Lemma 4.12.
than
get the lower bound
1/2e,
and
so we
To get the upper bound, suppose
the
following
(a)
two
Matches
we can
the
a
right
is greater
aN .
T >
produce
a
coupling
(Xl,X2)
with
properties.
are
not
X~(i) X2(i)
=
(b)
For small
A card in
one
That is, if
destroyed.
for
That is, if
X~(i) X~(i)
=
then
m > n.
deck cannot jump
(resp. )
over
X2(i)
the
same
then
card in the other deck.
X~(i) ~ (resp. ) X2(i)
263
for
Given such
0.
n >
the
coupling,
a
time is
coupling
time until the cards labelled
i
T
=
matched.
are
But
the number of shuffles for the card labelled i
(in
deck
the deck where this card is
where
Ti’
max
by (b)
is the
Ti
S2,
have
we
to reach the bottom of the
initially higher).
So
log(N))
NKe-03B2C log(N)
- -~
and then
S
coupling satisfying (a)
a
Let
~r, o.
position j
are
be the set of j
S
matched
C
>
1/P,
to
(b),
and
consider two decks in
such that neither the cards in
the cards in
nor
and add
{jl,...,jL}
as
provided
CN3log(N).
T~
To exhibit
states
0
by Lemma 3.10
position j+1
S.
Let
J
matched.
are
be uniform
List
on
{0,...,N-1} and define J* by
J* = J
=
The
coupling
is
is immediate.
one
deck:
REMARKS.
(b)
in this
(4.13)
(a)
were
same
=
jkES (interpreting
a
as
way in which
applied
(b)
jo).
to the first deck
coupling, because J*
only
label
shuffle 03C0J
is uniform.
as
the card at
so
This shuffle generates
a
The lower bound obtained
and 03C0J*
Property
could fail is if the
to both decks when the card at
coupling is designed
position
this cannot
j+1
same
position j
in the other
happen.
reversible random walk.
by considering entropy (3.9) gives T >
CN
example, which is rather crude.
EXAMPLE.
"Random transpos itions".
randomly chosen pair
pair
J
This is
deck had the
and the
if
And the
transposition 03C0j
in
JfS
produced by applying
to the second deck.
(a)
if
of cards.
to be identical.
permutation transposing
To avoid
For the formal
jl
and
j2.
Here
we
shuffle by
periodicity,
description, let
we
transposing
again allow the
7r..
Jl,J2
Then the shuffle is
be the
,
where
a
’
264
and
J1
J2
independent, uniform
are
~N
(4.14)
CN~ ;
T~
Diaconis and Shahshahani
(1981)
use
We shall prove
{1,...,N}.
on
for
some
constant C.
representation techniques
group
analyse this shuffle.
From their results one can obtain the
(4.15)
T(E:) -
To describe the
~I 2 1og(N) ;
coupling,
0
1
E
note that the random shuffle may be described
(independent, uniform),
J
at random
and then transpose the card labelled
C
with the card at
label
a
and
C
two decks in states
a
77, a,
pick
Plainly this is
described above.
of the decks after this shuffle.
position
new
a
Then
J
were
(Yl,Y2)
Y1(C) Y2(C) J.
let
=
=
C
matched,
were
matched, in the decks
match has been created,
so
Now the chance that the event in
coupled process, and
coupled process.
coupling time
N
0.
T
we see
the cards at
one
=
where
Let
(Z1,Z2)
be the
the number of unmatched cards in the
on
P(iJ-1) =
So the
Thus
is
By (a) and (b), the process
Dn
be the states
M(Yl,Y2) > M(~,a) + 1;
remains the same, M(Yl,Y2)
(a) happens
D =
by the Markov process
nor
Given
as
then ,at least
fr, a,
is the number of unmatched cards.
=
J.
position
and shuffle each deck
J
coupling:
otherwise the number of matches
(b)
to
and
C
if neither the cards labelled
(a)
precise result
.
position
pick
as:
to
Dn
is
stochastically dominated
~0,1,...,N} with transition matrix
(ilN)2 ;
P(i,i)
=
is at most the first passage time
T* of
D~
from
So
(N/i)2 CN2 ,
i=1
and
(3.13) gives
To
say).
the upper bound in
get the lower bound, suppose
Let
L.
(4.14).
we
start with
a new
be the number of shuffles needed until
deck
the j
(state ~D,
last card has
265
been
By the Coupon-Collectors Lemma, recalling that
picked.
picked
P(L_>aN1og(N))-~1 ; a~.
.
be the set of states
A. J
if
and the
Lemma
Card-Matching
This establishes the lower bound in
(b)
(a)
give
--~
1 ;
~- .
a
.
(4.14).
This shuffle also is reversible.
For this
example the lower bound (3.9) obtained
considerations is
(4.17)
EXAMPLE.
T >
We
"Uniform riffle".
into two roughly equal piles,
7T
piles into
one.
from entropy
CN.
model the riffle shuffle,
want to
now
which is the way card-players actually shuffle cards:
two
GN
on
>j)
d(aN 1og(N))
REMARKS.
Then
- > j.
where X is uniform
P(XnEAj) P(L. J > n) -
->
(4.16)
#{i:
for which
-n-
So
d(n) >
and
are
each shuffle,
on
(4.16)
Let
two cards
taking
If the top
pile
one
by cutting the deck
pile in each hand, and merging the
has
cards, this gives
L
a
permutation
such that
(4.18)
Call
a
shuffle
where
shuffle
~r(2)
shuffle
can
b(j)
came
satisfying (4.18)
alternatively
(resp. 1)
0
=
from the top
~r(1)
=
7T(i)
=
7r(L+1)
=
~r(L+2)
and
...
for
L
some
be described
by
a
a
...
~(N) .
riffle shuffle.
0-1 valued sequence
indicates that the card at position j
(resp. bottom) pile:
formally,
min{j: b(j) =0}
b(j) =0} ,
i
L
=
#{j: b(j) =0}
min{j: b(j)=l}
L+11
i N .
Such
a
(b(1),...,b(N)),
after the
266
random riffle shuffle
To model
a
the set
R
description, this
terms of the second
P(B(i) = 1)
This process has been
(1982)),
Diaconis
probability
some
The easiest way is to take
of riffles.
independent,
specify
we
P(B(i) = 0) = 1 2.
=
whose
technique
shall
we
3 2log2N ,
(4.19)
on
p
In
R.
on
(B(1),...,B(N))
to be
Call this the uniform riffle.
by Reeds (1982) (see also
in detail
investigated
uniform
p
take
means we
measure
use to
0
prove
1
e
.
In actual riffle shuffles, successive cards tend to come from alternate
piles:
(1982), Epstein (1977)
Diaconis
see
(B(i),
model would be to take
P(0,1)
matrix
P(1,0)
=
e,
=
A
for discussion.
more
with transition
~i~N) to be Markov,
say (Epstein suggests e
1
8/9).
=
The only
result known for this model is the lower bound given by entropy
fixed
(3.9):
for
e,
N--~~ ,
as
&#x26;(e) =
where
But the
argument
general
6,
C03B8log2N
shall
we
write
example
(B1(c):
cards with
another
0
pile
of the second
written
on
on
as
pile.
(1,0),
followed by
reverse
shuffles let
on
does not extend
be described
Bl(c),
one
analyse
as
follows.
where
of the
pile consisting
B2(c);
this will
(Bl,B2) _
=
reverse
produce
(0,0),
followed by
(0,1),
(c)
them; and place the first pile
doing this
now
D
can
the number
c
before; form
Imagine
top which have
(8 = ~) 2
them, in their original order, thereby leaving
of cards with 1 written
independent numbers
cards
independent
are
fixed) .
for which it is easier to
This reversed shuffle
the card labelled
on
e
conjecture
reasonable upper bound is known.
no
the time-reversed process.
c
(e,
for the uniform riffle
use
for which
N-+oo
as
The uniform riffle is another
For each
It is natural to
1og28 -(1-e)log2(1-6).
-e
T(e) -
to
realistic
a
deck with
followed by
(1,1).
n 2m-1B (C),
again
shuffle
a
a
and then
with
sequence of
sequence with
Continuing,
on
after
n
top
267
(D (c):
the random variables
(4.20a)
{0, ... ,2n-1 }
on
>
the order of the deck is such that
(4.20b)
with identical values of
We shall
d(n).
distance
this
now use
(Bm(c)).
Let
distinct.
d(n)
But the
1 - P(Fn).
which
We turn
the number of
now
for
to the lower bound.
T n
using
the
same
are
by (b).
Fn,
on
P(Fn)-~11
for
2
log2N.
a
deck in state
Tr
So
when
d(a
Hence
For
crude upper bound.
a
(Dn(c):
Lemma shows that
N/(2n)1/2~0,
way that
the total variation
on
shuffle to each
satisfies
T
and cards
relative order.
original
get bounds
to
the crude upper bound
gives
in their
are
reverse
Birthday
increasing,
coupling argument
a
coupling time
Then the
a
apply the
is
Dn
be the event that the numbers
Fn
in such
Dn
description
We first present
Consider two decks, and
independent, uniform
are
2,
a >
let
be
adjacent pairs of cards with increasing labels:
e(-rr) = #{j:
where
first X
odd)}
is the indicator function of
a. J
uniform
are
independent,
(4.21)
Now
,
imagine starting
leaving
T(e) N > log2N.
(4.22)
UN
=
and
Ee(X)
with
X.
However,
Let
a
Since
D
From this and
be
some
reverse
2n
has at most
(4.21)
we can
~I 1 2-a , ,
shuffles,
distinct values,
immediately get
delicate analysis will improve this
more
independent, uniform
i n},
n
If
of the
on
var(UM) -
Birthday
~1,...,M}.
M -
then
EUM -
(resp.
6(X) N/2 .
straightforward variation
(Ci))
#{n N : Cn = C; for
var
deck, and performing
slightly
a
even
easily get
(N-1)/2 ;
a new
e(Xn) > N - 2n.
We first quote
LEMMA.
we
=
the deck in state
(b) implies
bound.
Then the random variables
GN.
on
Consider
~I 1 2-a
.
Na for
Lemma.
Let
some
a >
1
268
Let
03B1 > 3/2
Recall
and
j
Xn
then EVN ~ 0.
is the state of the deck after
(random)
be the
Jn
set of
after the shuffles have the
j+1
Then, conditional
(i)
on
same
value of
Let
positions
D:
Jn,
we can
even
(resp. odd)} are independent.
calculate
(N-1)/2 + ~#J ;
(4.23)
by (a) the distribution of
in Lemma 4.22, for
M
=
2n.
#Jn
is the
Lemma 4.22
same as
the distribution of
UN
So, putting
for
some’!a?.
gives
~ S=2-a1 .
*
.
using (4.23)
(4.24)
E6(X ) _
(N-1)/2 + v Nl/2 ;
Chebyshev’s inequality applied
to
(4.21)
where
and
var
so
d (n) 1 ,
We shall
2n > Na.
now
Let
-~
-11
(4.19).
return to the upper bound.
Fix a
X
N/2 .
0
the lower bound in
giving
8(X )
(4.24) gives
> (N-1)/2+~v Nl/2)
p(e(X ) > (N-1 )/2 +~ N1/2)
p(6(X)
so
shuffles.
D n (X n 1(j))=D n (X n 1(j+1))} .
the random variables
From this
and
reverse
M -- Na. for
;
(n)
So
n
for which the cards at
positions j
Jn = {j:
Now
and
If
some
> 3, 2
n
be the state of the deck, described at
=
1
+[a
(4.20),
log2N],
after
269
n
shuffles
reverse
(D~(c)
1 c N)
=
with
starting
define
a
random
the numbers
Dn(c)
they become
(2,2,8,15,15,15),
{4,5,6}.
partition A.
Denote
a
(15,2,8,15,15,2),
are
the set of
Using
r
then when put in
elements in the
Thus if
Dn-
increasing order,
{1,2}, {3},
partition
and let
{A-.,A?,...},
=
values of
common
be the
lAir
partition A.
Let
be
P
partitions consisting only of singletons and consecutive pairs.
Lemma 4.22
E~A~ 2~ NZ°~’
(4.25)
(4.26)
1
by conditioning
as
as
the set of distinct values taken
on
by
(Dn(c):
1 ~c N),
obtain
(4.27)
for AEP the
Now for
A*
of the shuffled deck into
and this defines the
partition by A
number of sets with exactly
we
The random variables
consisting of the positions of cards with
sets
And
deck.
a new
let
m > 0
probability P(Ar= A)
Wl,...,Wm
be i.i.d. uniform
be the collection of sets
disjoint,
extend A*m
{1,...,N}
as
distributed
to
a
singletons.
uniformly
over
depends only
{1,...,N-1}, and
on
1 j m.
on
If these sets
let
are
partition by including the remaining elements
A~
Given that
the
is such
partitions A ~
a
partition, it is plainly
with
P
of
=
So
m.
by
(4.27)
(4.28)
A E P)
=
t
Now let
~r
be
a
state of the
number of successive pairs with
A
=
Fix
{Al,A2,...}
y, S
(4.29)
such that
LEMMA.
P(A~ is
y
>>2 > S
increasing
some
>
a
partition)
A
E
P .
P(A,m = A) ,
deck, and
is consistent with
is
~r
2-a,
before let
as
labels.
if
y+S
1
is
increasing
~r)
be the
partition
a
1.
partition consistent
(1 2)m{1 - 03C8(03B8(03C0) - N/2 N03B3,m N03B2,N)}
Say
e(7r)
>
on
each
A..
J
270
where
Given that the
PROOF.
7T’
that
for
M.
from the
is
increasing
>
"
some
can
are
disjoint and
e(’rr)-3(i-1)
at least
and
on
choices
{W..M.+1}
disjoint
So
partition
product
tends to 1
where
n {03B8(03C0)-3(i-1) N-1}
=
x
03B8(03C0) - N/2.
N~
x2014~0,
as
express the distribution of
using description (4.20),
ir) >
consistent with
(1" 2)m1=1 {1 + 2xN03B3-1 - 6iN-1} ,
Calculus shows the
~
increasing
previous pairs.
is
are
each, there
on
’IT-1
which have
P(A*
Me
pairs
X
by conditioning
n
the
on
partition
as
N.P(X,=.)=~=A)(2!) !AL ’(3.) )AL ’"~(A consistent with.)
!A!.
1(A consistent with ’IT)
A~P)
=P(A~ 03AP(|A2=m|a~P)2m03AP(=A| 2=m,
.
(4.30)
~ P( ) |A 2 = m ~P)(1-03C8 B (03C )-N/2 03B ,m N 2 )
by (4.28) and
Lemma 4.29.
By (.4.21)
such that
we can
we can
such that the set
find
satisfies
(4.26)
to
So the total variation distance
(4.19).
’IT)
X..+(1
of states
(4.30)
d(n)
’IT
#FN/N! ~ 1.
we
Applying
obtain
where
N!P(X=-rr)~1-B.,
d(n) ~
F..
such that
find
these observations and
tion satisfies
consistent with
,
|03B8(03C0) - N/2 N03B3| ~ ~N
By (4.25)
(A
between
-M~-)2014~0,
X
n
~N~~’
and the uniform distribu-
establishing the
upper bound in
271
(4.31)
"Overhand
EXAMPLE.
for which
no
upper bound for
good
the deck is divided into
reversed.
To make
represent the
P(Vi = 1)
=
0 ;
J1 Then
and let
1/K,
Ji1
=
represents the
B.
V~
only result known is
=
min{j
ith
7T(j) =
The
VN
=
1.
right
they
are
inequality T - >
CK.
0
be
independent,
Ji j Ji+1 ~ .
j
proof
we
shall
by
are
at most
we measure
and
Then
n.
N,
its
is
Yn
K
=
N2/3,
One would like to
CKn1/2,
means.
Note that for
our
of
Yn
position from
21/2K.
CN2/3.
On each shuffle,
and this leads to the
2/K,
is
T >
a
particular card
the top for
Markov process
a
a
It
even
n
~1,...,N}
on
random walk whose increments
can
be shown that
and this leads to the other
Y
has
n
inequality.
conjecture that for any "reasonable" way of shuffling
is at most polynomial in
T
for which
initially adjacent.
approximately
and standard deviation
small compared to
constant C .
some
Second, consider the motion
cards,
5.
which
separated is
standard deviation at most
.REMARK.
{j:
lower bound, whose
following
two cards
which, away from 1
mean
N)
(Vi:
block, and the random shuffle is:
max(K,(N/K) ) ;
and from the bottom for odd
have
Let
(N-Ji+1 ) + (j-Ji ) ~I
the
shuffles, where
n
is where
shuffling
parameter which will
a
Ji-1: Vj =1} ;
>
side is minimized
First, consider
after
random shuffle
-
T > C
the chance
a
Let
merely indicate.
Note that the
Overhand
be
K N/2
2
of the blocks.
length
mean
is known.
T
of
example
an
number of blocks, and the order of the blocks is
a
model, let
a
Here is
shuffle".
But it is not clear what "reasonable"
N.
applications
to
hitting times,
we
only
need
T
N!
Rapidly mixing Markov chains
In this section
informally
the
we
mention
a
few Markov chain
"rapid mixing" property.
examples, and discuss
272
(5.1)
EXAMPLE.
"Ehrenfest
which is the Markov process
-
Think of
~0,1,...,N} with transition
on
Yt
Q(i,i+1)
the other box;
on
Yt
Now
we can
uniformly
a
Yt
(Example 3.19),
the N-dimensional cube
a
where
as
and
rates
Poisson
(rate 1)
at random and transferred to
describes the number of balls in
represent
version,
i/N .
=
balls distributed among two boxes, with
N
t.
Q(i,i-1)
1 - ilN ,
=
process of selections of balls chosen
time
We discuss the continuous-time
model".
urn
particular box
at
is the random walk
Xt
In fact,
f(i1,...,iN) =
the random walk describes the process of balls in boxes where the balls
labelled
in box
1,...,N,
i
Y
(il,...,iN)
indicates that ball
that the
stationary distribution
is
r
.
representation
for
X,
we see
(N,~-)
is the Binomial
Y
as
=
r
From this
for
i
and state
are
distribution.
d(t)
And
is the
-n-
for
same
so
T(~) ~
(5.2)
~I 4
log(N)
by (3.23).
(5.3)
EXAMPLE.
"Random subsets".
the set of all subsets
subset
B
{1,2,...,N} with
of
=
~
=0
;
stationary distribution is uniform
construct
(5.5)
a
T
coupling argument similar
CN
EXAMPLE.
= H)
#B
Formally, consider the B-valued process
Q(B,B’)
(5.4)
N > 3,
N,
=
and let
Consider
M.
be
B
a
random
evolving by elements being deleted and replaced by outside
B
elements.
The
M
1
Let
=
1/2,
log(1 + min(M,N-M))
in
"Sequences
1 =T)
=
=
with transition rates
M-11
other B’ ~ B.
on
B.
The reader may like to
Example 3.19
to that of
as
coin-tossing".
1/2,
Xt
for
Let
some
(~i)
representing repeated
to show
C.
constant
be
independent,
tosses of
a
fair
273
chain
the process
N >1
For fixed
coin.
Xn
d(n)
(5.6)
1 -
=
(~)N n ,
’
0 n
,
n ~ N.
=0
(5.7)
"Random walk in
EXAMPLE.
the random walk
N
on
by boundaries.
a
let
G
Markov
N
We want to consider
d-dimensional box".
the d-dimensional
Formally,
a
stationary distribution is uniform and
For this chain the
on
(~n+1’’.’’~n+N) is
=
restricted to
integers
{i
=
=
a
(il,...,id) : 0 ir
box of side
N}
and consider
the Markov chain with transition matrix
P(i,j)
=
1/(2d+1)
for
=
0
for
1;
other j ~ i;
P(i,i) = 1 - ~ P(i,j) .
.# .
(We
use
1/(2d+1)
instead of
stationary distribution
1/2d
to avoid
is uniform, and
using
periodicity problems.)
The
the Central Limit Theorem
we
see
(5.8)
(5.9)
EXAMPLE.
cube obtained
by choosing
of the
one
We
now
introduce
time random walk,
a
possible rotations
to estimate
to "solve"
could be used to construct
d.
One may consider the random walk
of the 27
interesting
algorithms
for fixed
as
cube".
It would be
step.
one
CdN2
T -
(i.e.
coupling.
informally
the
Rubic’s
at random at each
for this random walk.
T
reach
on
a
specific
But this
seems
state
of)
Perhaps
the cube
difficult.
"rapid mixing" proeprty.
For
a
discrete-
this is the property
(5.10)
T
is small compared to
#G .
The intuitive idea here is that the distribution of the chain
approaches
stationarity
while
only
a
small proportion of states have been visited.
the general discrete-time chain,
stationary distribution,
and
so
we measure
"proportion
formulate the rapid
of states"
mixing property
using
as
For
the
274
(5.11)
is small
T
For continuous-time processes
transitions
general Markov
In the
(5.12)
(2.7)
(5.12)
same as
is at most
T
in the
in
polynomial
d
d-dimensional box for
to
case we
For
1
=
N.
An
exception is
N/(1-a):
to
of
sequel
to
a
=
Q(i,i+1)
1 ;
A
a~(1-~)/(1-~N+1).
=
the passage time from
approaching
curious paradox:
three
so
possible explanation
or more
we
#G
=
N!
that the familiar
rapid ’mixing
on
N
to
Very roughly,
must
which is of order
at
N
must pass
the stationary distribution.
rapid mixing property, which
the
is that
recall that
dimensions.
T
0,
get approximations for hitting times,
transience" property, and
only in
1,
1 ;
= a
complicated high-dimensional processes rather
processes.
where
queue process
server
put it another way, the process starting
We thus have
in the
as
most states before
through
=
rates
and stationary distribution
order
see
do not have the
instance, consider the single
Q(iJ-1)
same
q.1
the random walk in the
Indeed, it is easy to
2.
or
{0,’!,...,N}, with transition
be of the
normalize to make
card-shuffling examples,
examples of l-dimensional Markov processes
property.
i.
state
(5.10).
particularly noticeable
It is
leaving
examples mentioned have this rapid mixing property.
Almost all the
but
compared
that in the random walk
is the
is the rate of
rapid mixing property becomes
the
is small
T
Recall
then
case
must take into account the rate at which
we
L Q(i,j)
q.’ = j1i
Recall
occur.
to
compared
than
seems
This analogy is
characteristic
simple one-dimensional
rapid mixing
mean zero
we use
is
a
kind of "local
random walks
pursued
a
are
transient
little in the
next section.
6.
The
mean
occupation function
In this section
we
discuss the
mean
occupation function
Ri(t),
which
275
plays
a
a
For
major role in the behaviour of rapidly mixing Markov processes.
and
(X )
Markov process
a
state
i
define
t
R~(t) = 0 p..(s)ds
(6.1)
time
X
In the next
a
before time
1
at state
spends
paragraph
we
describe
amount of
indicating the
is the random variable
time(s t:
where
t.
the rest of the section contains lemmas
rapidly mixing process:
in
Ri(t)
informally the behaviour of
formalizing
these assertions.
That is,
least
initially tends
R.(t)
the
1/q.,
roughly like
looks
R.(t)
The function
length
mean
of the initial
starts to level off, and remains
t
large compared
define
a
to
parameter
Interpret
Ri
as
T
R;
the
essentially
mean
the
to
probability
length
i
at
think of
R.
as
sojourn in
i
of time
i.
approximately
(6.2).
It then
constant over the interval of
(6.5).
(mean
Ri(t)
on
So
we can
this interval.
number of visits, in the
in the short term.
transience, and then
of return to
value which must be at
value of
approximate
recall that in the infinite state space
equivalent
a
but small compared to
as
case) spent
random walk
to increase to
For another
interpretation,
setting the condition
=
1/qi(1-pi),
where
Analogously, in the rapid mixing
where
p*
is the
is
p.
is the
case we
may
probability
276
i
of return to
0
to
then
We
is close to
Ri
and
Ri
case
in the short term
Ri(t)
are
(6.17).
particular, if
quantities R and R(t)
First,
t0 e-qis ds
(6.2)Ri(t)~
not
dependent
- > e-qis,
Pi’
q-1i(1-e-qit)
=
is close
note that in the random walk
Finally,
1/q..
start the formalities.
now
In
i.
on
so
.
d(s)
Second, by integrating the inequality
we
get, for
ti i t2’
~ Ri (t2) - Ri (tl ) - (t2-tl )~r(i ) ~I
(6.3)
d(s)ds
2. T
exp(1-tl/T)
by (3.8).
So the limit
(6.4)
exists and is finite.
This
treatment of Markov process
lim
=
R.~
t-+oo’
quantity
theory;
become clear in the next section.
knowing
p..(t),
by (6.3)
we see
occurs
in the traditional
one reason
To
for its
compute
which is rarely available
Ri
analytic
significance
will
directly would require
explicitly
in
practice.
But
(6.5)
large compared
If
t
is
if
t
is small
small
compared
fying both
informal
compared
to
to
T
then the second term
to
i
in the
1/q.;
of
R.1
general;
(6.7)
on
the
find
right is
t
satis-
R..
This is the
given earlier.
Specifically,
from
(6.5)
for
T(1 - log
and in the random walk
~R- R(T*)I
case we can
right is small;
approximates
R.(t)
(6.6)
in
the
then the first term
rapidly mixing
these conditions and then
description
on
case
2T*/#G ;
for
T*
=
T(1
+
log #G) .
we
get
277
In Section 8
shall
we
examples where
see
Our informal discussion earlier
R.
this idea.
To state such
~(x) -~ 0
if
asserted to be
does not
result
we
on
the
x --~0,
as
vanishing
depend
a
introduce
in the next section.
extensively
vanishing
that for
suggested
should not be much smaller than
processes,
used
is estimated in this way.
R
is
Call
and
a
Lemma 6.8 formalizes
1/q..
a
rapidly mixing
notational device, to be
~(x) > 0,
function
adopt the convention that
x >
0
function
a
"universal" function, that is to say the function
a
particular
process under consideration.
The symbol
~ will denote different functions in different assertions.
(6.8)
LEMMA.
Specializing
Ri
+ q.T))},
R ~
Results like this could
sequences of processes.
X
some
vanishing &#x26;.
to the random walk case,
(6.9)
Let
for
1 -V~(#Glog(1+T)) .
equivalently
For
be processes
on
be formulated
instance,
as
Lemma 6.8 is
state spaces
G;
limit theorems for
equivalent
let
in
E
to:
Gn;
suppose
q~
then
Rnin
>
.
Both formulations have the
If
7r(i)
same
is small compared to
is not much less than
The formulation
interpretation:
1/q.r log(1+qiT)
then
R.
1/q..
involving vanishing
functions
seems
to convey this idea more
directly.
PROOF OF LEMMA 6.8.
By (6.2) and (6.5)
(6.10)
We want to evaluate this at
a
time
t0
which is
large compared
to
T
log(1+q.T)
278
but small
To do so, define
to
compared
(6.11)
a
=
log(1+qi03C4)
t0 = 03B1-1/203C4
Note that
~ e-qit - 1 2
d(t)
definition of
= 03B1 /2/qi03C0(i) .
by assumption (2.8),
so
the
gives
T
= -log(1 2 + 1 2e)
(6.12)
Evaluating (6.10)
t
at
=
>
0 .
to’
(6.13)
.
Now each of the functions
1og(1+c))
03B11/2
(6.14)
sup
y>0
is
a
vanishing
function of
We remark that for
and the result follows.
a,
non-rapidly mixing processes
there is the weaker
lower bound
Ri ~ 1 2qi
(6.15)
which cannot be
processes,
Ri
improved:
see
Of
Section 7.
course
(6.15)
reversible process the function
is
Keilson
(1979)).
non-rapidly mixing
can
be
improved.
So
=
~
(6.16)
t0 {e 1
-03C0(i)}ds ,
In
a
decreasing (in fact, completely
t
R.i
We
meaning described earlier.
does not have the intuitive
also remark that for reversible processes
monotone:
for
where
e 1
(Xt
=
Tr(i)
reversible).
279
Let
F.,
be the distribution function of the first return to
F. (t)
(6.17))
and
T*
=
Pi(T+i ~ t) .
=
~Riqi(1 - Fi(T )) - 1~ ~(1_F. T* ))
LEMMA.
rapidly mixing
1/qi(1-Fi(T*)),
discussed informally earlier.
(6.18)
a
as
preliminary,
LEMMA,
Let
X -
processes
i.
we can
approximate
Ri
by
need
we
qiRi(t) ~ 1 1-Fi(t)
(a)
(b)
PROOF.
~here ~ is vanishing
T(1 -log
In other words, for
As
i:
Let
[U ,V )
(1-e n)
nth
be the
1
1-{Fi(t)}n+1 1-Fi(t),
n _ > 1.
sojourn interval
at
i.
Then
~t)
= 03A3 q-1iP(Un~t)
~ q-1i 03A3
P(Um-Um-1~t; 1m~n)
qi1n>1L
- q lfl - F.(t)} _ l,
=
~
To prove
giving (a).
(b),
n+1
~
where
Z
has exponential
E(Z An) L
~ t, 1 r m)
(1) distribution,
=
( 1_e_n ) ~ n+1 {Fi(t)} m_1
m=1
PROOF OF LEMMA 6.17.
By Lemma 6.18(a) and (6.6),
1
.
,
giving (b).
280
giving
one
and let
side of the
n
be the
inequality.
integer part
For the
other, write
1
=
a
of
a 1/2(1-F~ (’r*) ) 1 - a~(q~d))-’’ .
.
Note
n >
Lemma
6.18(b) gives
for
vanishing
some
(1-e n)(1 -
(1 -
(6.19)
>
1 ()
{~(a)}’1
the fact that
using
t - n(T* + 1/q.),
Setting
by (6.5)
61 + 82,
81
82
for
vanishing ~.
some
Finally,
say, where
=
a
=
exp(-nT*/T)
q.T*e
°
which with
(6.19)
Lemma 6.17
establishes the lower bound in Lemma 6.17.
implies
that if the process started at
return in the short term, then
lemmas in this section
q*
=
LEMMA.
max
J~i
PROOF.
q.R.
i
rates into
1 +,~(a),
Set
where
t - a 1/2 Tlog(1+q.T),
i
a
to
Our final two
The first is
from other states
are
all small.
and
=
and
so
is at most
q*,
we
have
6.18(a),
1
(1_al/2)-1
)
(1_q*t2)-11 ~ (l-a
q.R.(t
i
i2
) (l-q
And
1/q..
unlikely
q ~’
0 ..1
Since the rate of return to
Lemma
should be about
is
upper bounds in this situation.
give
applicable if the transition
(6.20)
Ri
i
by (6.5)
W a ..
i 1 ++ 03C8(03B1)
~
q*t. By
281
t2/T)
qiT
_ al/2
+
~,(a) ,
The final lemma is applicable when there is
such that
(6.21)
LEMMA.
Suppose
tends to increase away from
f(Xt,i)
f
Let
be
a
distance function
is a constant such that for each
c
c a
k:
0
Fix
i.
Xo
on
=
G.
f
i.
Let
0
1.
s
j # i,
Ik~j
Then
PROOF.
distance function
a
(1-s)/c.
t
Consider the process
f(X(tAT.),i)
°
’
The definition of
c
ensures
that
But
s.
This
Hence
7.
F.(t)
Hitting
Mean
s+ct,
is
Yt
a
supermartingale.
f(X(t^T.),i) = 0
and the result follows from Lemma
so
6.18(a).
times
hitting
times
tractable results in all
matrix results; Kemperman
techniques.
EiTj,
processes.
approximation which
We first
cases.
and
more
generally hitting distributions,
give
single
seems
single
an
method which
array of classical
give approximations which
Keilson
(1979) gives
applicable
yields
a
are
analytic
applicable
to
different style of
to different classes of processes.
two well-known exact
state from the
no
Kemeny and Snell (1959) give elementary
(1961) presents
Our purpose is to
rapidly mixing
hitting
{T~ t},
for j ~ i,
ies
have been studied for many years, but there is
a
on
So
results, which
concern
the
stationary initial distribution.
case
of
282
PROPOSITION.
(7.1 )
In the random walk case,
(7.2)
R#G.
=
PROPOSITION.
Proposition
7.1 is useful because it shows
estimating
Ri..
in
may be hard.
practice
First,
renewal theory.
if you
E(money paid
(7.3)
Z(t)
be
an
interval)/E(duration
per
LEMMA.
(Vn,Wn),
Let
n
lim
=
(b) Suppose
In case
=
be
l,
> 1,
w/v
are
Z(~ 1 Vi) _ ~ 1 Wi’
EW2n
EVZn
sup
~,
>
~,
EW1 =
v,
w,
then
and there exist constants
w
v,
(a),
the strong law of
Then
lim inf
for aZZ n,
>
-
n-103A3Vi ~ v ,
Wn
large numbers says that
w/v
a.s.
Vn
v ,
(b)
In case
the result follows
PROOF OF PROPOSITION J.1.
we can use
Theorem
inf
Wn
>w ,
>
0,
let
the strong law for
3.3.1)
to show that a.s.
~
0 ,
easily.
Fix
U1 -
,lim
a.s.
Vn+1-Vn ~ 0 ,
=
square-integrable martingales (Stout (1974)
again
=
-
and the result follows easily.
and
EVl
a.s.
Q(Vm ,Wm; m n).
lim sup
Let
n
i.i.d. and
n
~ -
random variables.
positive
n
such that
w
where
Vn
sup
~
interval).
of
process such that
(a) If (Vn,Wn),
PROOF.
n >
~ ~
increasing
v,
Informally,
lemma about reward renewal processes.
income per unit time should be
long-term average
your
Fi(y)
give "probabilistic" proofs, quoting
We shall
a
estimating
random amounts of money after random time intervals, then
paid
are
is less useful, because
Proposition 7.2
by
estimate
we can
i,
tl
p(.)
min{t:
i}
.
=
1 E .)
and let
283
Yn
Let
be the block of
X
the interval
over
YS X~ +S ,
0
=
(Yn),
The blocks
1,
n >
are
i.i.d.
=
Un+1 _ Un
Wn
=
t i me ( s :
=
time(s:
Z(t)
s
So
Vn
that is,
Ul
°
6.3(a)
to
i)
Un+1,
t, Xs
s
Lemma
apply
we can
=
i)
and the lemma shows
(7.4)
lim
Now
EV1 -
into
(7.4)
EV1/EW1
=
t1Z(t) _
R.(t ),
and
a.s.
03C0(i).
Substituting
rearranging,
(7.5)>
EpTi - {Ri (tl ) - 03C0(i)t1 }/03C0(i)
.
Letti ng
tl ~
~,
we
have
~ 0,
so
E03C1Ti
and the resul t
~ E03C0Ti,
follows.
PROOF OF PROPOSITION 7.2.
Let
SY(t)
Y(t)
Then
Y(t)
has distribution
~ P (T. E . )
But
=
time of
i.
Let
nth
return to i
min{Sn-t: Sn > t} .
(TiE .), where
the
are
E.),
epochs
of
a
and for such
a
process
Y(t) -~-~ Y ,
P(YEdy)
=
0,
pt
=
E .).
So
renewal process with inter-renewal
have
where
SO
as
d i s tributi on
we
=
XO
=
Pi(T>
The result follows from
We can deduce
a
(2.10).
useful lower bound.
(Karlin
and Taylor
(1915))
(7.6) COROLLARY. Fix i. Then E_πT_i ≥ {2 q_i π(i)}^{-1}.

PROOF. Consider the class C of distributions on [0,∞) which have a decreasing density f(t) with f(0) = c > 0. The distribution in C with minimal mean is plainly the distribution uniform on [0, c^{-1}], which has mean (2c)^{-1}. So every distribution in C has mean at least (2c)^{-1}, and the result follows from Proposition 7.2.

In view of Proposition 7.1, the Corollary is equivalent to
(7.7)    R_i ≥ 1/2q_i .
Inequalities (7.6) and (7.7) cannot be improved: consider the cyclic motion Q(0,1) = Q(1,2) = ... = Q(N−1,N) = Q(N,0) = 1, for which R_i is essentially 1/2q_i by Lemma 6.8. Of course, in the random walk case R_i = R.
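Numerically, the cyclic motion does sit right at the bound (7.7). A small sketch (again via the deviation-matrix identity; the chain size n = 50 is an arbitrary choice):

```python
import numpy as np

# R_i for the cyclic motion Q(0,1) = ... = Q(N,0) = 1, via the deviation
# matrix: R_0 = (n-1)/2n, essentially the lower bound 1/(2 q_i) of (7.7).
n = 50
Q = -np.eye(n)
Q[np.arange(n), (np.arange(n) + 1) % n] = 1.0
pi = np.full(n, 1.0 / n)
Pi = np.outer(np.ones(n), pi)
D = np.linalg.inv(Pi - Q) - Pi
print(D[0, 0], 1 / 2)                    # here q_i = 1, so the bound is 0.5
```

The printed value is (n−1)/2n ≈ 0.49, essentially the bound.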
We now start the rapidly mixing approximation results. The first says that for rapidly mixing processes the exact value of the mean hitting time on a state from the stationary initial distribution is an approximate upper bound for the mean hitting time from an arbitrary initial distribution.

(7.8) PROPOSITION. There is a vanishing function ψ such that for random walks, for any state i and any initial distribution ν,
    E_νT_i ≤ R#G{1 + ψ(τ/#G)} .
In other words, when τ is small compared to #G, the mean hitting time on a state from any other state cannot be much more than E_πT_i = R#G.

We need the following lemma.

(7.9) LEMMA. Fix t ≥ 0. Then
    max_i E_iT_A ≤ (t + E_πT_A)/(1 − d(t)) .
PROOF. Write ρ_i(·) = P_i(X_t ∈ ·). Since ‖ρ_i − π‖ ≤ d(t),
(7.10)    |E_{ρ_i}T_A − E_πT_A| ≤ d(t)·max_j E_jT_A .
But obviously
(7.11)    E_iT_A ≤ t + E_{ρ_i}T_A ,
so, substituting (7.10) into (7.11),
    E_iT_A ≤ t + E_πT_A + d(t)·max_j E_jT_A .
Maximizing over i and rearranging gives the result.

PROOF OF PROPOSITION 7.8. By Lemma 7.9,
    E_νT_i ≤ max_j E_jT_i ≤ (t + E_πT_i)/(1 − d(t)) ,  t > 0 .
By Proposition 7.1 and Corollary 7.6, E_πT_i = R#G ≥ ½#G. Evaluating the right side at a time t which is large compared to τ but small compared to #G, and using the bound on d(t) from (3.8), we see that the right side is at most R#G{1 + ψ(τ/#G)} for some vanishing ψ.

Consider, for fixed i, how the mean hitting times E_jT_i vary with j. Proposition 7.1 says that the π-average of these times equals E_πT_i; Proposition 7.8 says that each of these times is not much more than E_πT_i. Together these facts imply that the times are almost constant: E_jT_i must be approximately equal to E_πT_i for "most" j. It is straightforward to formalize and prove such a result: let us just state the random walk case.

(7.12) COROLLARY. There is a vanishing function ψ such that for random walks
    #{j: |E_jT_i/R#G − 1| > ε} ≤ ε#G ,  where ε = {ψ(τ/#G)}^{1/2} .

So rapidly mixing random walks have the property that E_jT_i is almost constant over most j. It can be shown that for reversible processes this property is actually equivalent to rapid mixing; see Aldous (1982a).
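To see (7.12) concretely, one can compute every E_jT_i exactly for a random walk on a small group. The sketch below does this for the continuous-time walk on the cube Z_2^d with d = 8 (an arbitrary small choice): the hitting times are confined to a narrow band around their π-average E_πT_0.

```python
import numpy as np

# All mean hitting times E_j T_0 for the rate-1 continuous-time walk on the
# cube Z_2^d (states encoded as bitmasks).  Illustrates (7.12): the times
# cluster tightly around their pi-average E_pi T_0.
d = 8
n = 2 ** d
Q = -np.eye(n)
for x in range(n):
    for c in range(d):
        Q[x, x ^ (1 << c)] = 1.0 / d     # jump to a uniform neighbour

h = np.zeros(n)                          # h_j = E_j T_0, with h_0 = 0
h[1:] = np.linalg.solve(Q[1:, 1:], -np.ones(n - 1))
print(h[1:].min(), h[1:].max())          # narrow band ...
print(h.mean())                          # ... around E_pi T_0 (pi is uniform)
```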
Of course one cannot expect to have E_jT_i approximately equal to E_πT_i for all j, since there will often be states j such that the process started at j is likely to hit i quickly.

We now consider the time to hit subsets of states, rather than single states. Here even approximations for the mean time to hit a subset from the stationary initial distribution are hard to find: let us give some lower bounds.

(7.13) PROPOSITION. Suppose q_i ≤ 1 for all i. Then
(a)    E_πT_A ≥ (2π(A))^{-1} − 3/2 ;
(b)    E_πT_A ≥ {min_{i∈A} R_i / π(A)}·{1 − ψ(π(A)·τ·log(1+τ))} , where ψ is vanishing.

PROOF. (a) By (2.5) it suffices to prove this for the discrete-time chain. For any N ≥ 1, since P_π(T_A < n) ≤ Σ_{m=0}^{n-1} P_π(X_m ∈ A) = nπ(A),
    E_πT_A ≥ Σ_{n=1}^{N} P_π(T_A ≥ n) ≥ Σ_{n=1}^{N} (1 − nπ(A)) = N − ½N(N+1)π(A) .
Evaluating at N = ⌊π(A)^{-1}⌋,
    E_πT_A ≥ {π(A)^{-1} − 1} − ½·π(A)^{-1}·{π(A)^{-1} + 1}·π(A) = (2π(A))^{-1} − 3/2 ,
giving (a).

The proof of (b) is similar to the proofs of Lemma 6.8 and Proposition 7.1. Analogously to the latter proof, fix t_1 and set
    U_1 = min{t: X_t ∈ A}
    U_n = min{t ≥ U_{n-1} + t_1: X_t ∈ A} ,  n ≥ 2
    V_n = U_{n+1} − U_n
    W_n = time{s: U_n ≤ s ≤ U_{n+1}, X_s ∈ A}
    Z(t) = time{s: U_1 ≤ s ≤ t, X_s ∈ A}
    F_n = σ(X_s: s ≤ U_{n+1})
    ρ_i(·) = P_i(X_{t_1} ∈ ·) .
Then
    E(V_n | F_{n-1}) ≤ t_1 + max_i E_{ρ_i}T_A = v(t_1) , say ;
    E(W_n | F_{n-1}) ≥ min_{i∈A} R_i(t_1) = w(t_1) , say ;
also W_n ≤ t_1, and sup_n EV_n^2 < ∞ because the state space is finite. So we can apply Lemma 7.3(b) to obtain
    lim inf t^{-1}Z(t) ≥ w(t_1)/v(t_1)  a.s.
But lim t^{-1}Z(t) = π(A) a.s., so v(t_1) ≥ w(t_1)/π(A). Estimating max_i E_{ρ_i}T_A in terms of E_πT_A by Lemma 7.9 and rearranging,
(7.14)    E_πT_A ≥ w(t_1)/π(A) − t_1 − d(t_1)·{t_1 + E_πT_A}/{1 − d(t_1)} .
We want to evaluate this at a time t_1 large compared to τ log(1+τ) but small compared to π(A)^{-1}. So set
    a = π(A)·τ·log(1+τ) ,   t_1 = a^{-1/2}·τ·log(1+τ) = a^{1/2}/π(A) .
Then, by (6.5) and (7.7),
(7.15)    w(t_1) ≥ {min_{i∈A} R_i}·{1 − ψ(a)} ;
and by (3.8), since t_1/{τ log(1+τ)} = a^{-1/2} → ∞ as a → 0,
(7.16)    d(t_1) ≤ ψ(a) .
Putting this, (7.16) and (7.15) into (7.14) gives the result.
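Both lower bounds are easy to inspect numerically. The sketch below reuses the cube walk of the earlier sketch with an arbitrarily chosen 8-element subset A; it prints E_πT_A, the bound of part (b) (omitting the vanishing correction factor), and the bound of part (a).

```python
import numpy as np

# E_pi T_A versus the lower bounds of Proposition 7.13, for the rate-1 walk
# on Z_2^d and an arbitrary 8-element subset A.
d = 8
n = 2 ** d
Q = -np.eye(n)
for x in range(n):
    for c in range(d):
        Q[x, x ^ (1 << c)] = 1.0 / d

rng = np.random.default_rng(2)
A = rng.choice(n, size=8, replace=False)
B = np.setdiff1d(np.arange(n), A)        # complement of A

h = np.zeros(n)                          # h_j = E_j T_A
h[B] = np.linalg.solve(Q[np.ix_(B, B)], -np.ones(B.size))
piA = A.size / n                         # pi is uniform on the cube

Pi = np.full((n, n), 1.0 / n)
D = np.linalg.inv(Pi - Q) - Pi           # deviation matrix: R_i = D[i, i]

print(h.mean())                          # E_pi T_A
print(D[A, A].min() / piA)               # bound (b), without the 1 - psi factor
print(1 / (2 * piA) - 1.5)               # bound (a)
```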
In the rapidly mixing random walk case, Proposition 7.13 gives an approximate lower bound of R#G/#A for E_πT_A.
If the subset A is "sparse", in the sense that, starting at one element of A, the random walk is unlikely to hit any different element of A in the short term, then this lower bound is approximately the correct value of E_πT_A. Such results can be proved by the techniques of Proposition 7.13; but since the hypotheses are hard to check in practice, we shall merely state one form of this idea.

(7.17) PROPOSITION. There is a vanishing function ψ such that for random walks
    |E_πT_A·#A/(R#G) − 1| ≤ ψ(b_A + (τ/#G)(#A + 2 log #G)) ,
where b_A measures the chance that the walk, started at one element of A, hits some different element of A in the short term.

We shall now discuss the distribution of T_A. At first sight, the difficulty of estimating the mean suggests that one could say little about the distribution in general. But it turns out that, from the stationary initial distribution, the hitting time distribution on A is approximately exponential, provided π(A) is sufficiently small.

(7.18) PROPOSITION. There is a vanishing function ψ such that
    sup_t |P_π(T_A > t) − exp(−t/E_πT_A)| ≤ ψ(τ/E_πT_A) .

In other words, the distribution is approximately exponential provided E_πT_A is large compared to τ. In the random walk case, it is sufficient by Proposition 7.13 that #A be small compared to #G/τ.
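A direct simulation shows how good the exponential approximation already is on a modest group. The sketch below uses the lazy discrete-time walk on Z_2^9 (laziness avoids parity effects, and discrete time stands in for continuous time; the sizes are arbitrary) and compares the empirical tail of T_0, normalized by its mean, with the exponential(1) tail.

```python
import numpy as np

# Simulation for Proposition 7.18: lazy walk on Z_2^d from uniform starts;
# T_0 / (mean of T_0) should have approximately the exponential(1) tail.
rng = np.random.default_rng(3)
d, runs = 9, 2000
T = np.empty(runs)
for r in range(runs):
    x = int(rng.integers(0, 2 ** d))     # stationary (uniform) start
    t = 0
    while x != 0:
        if rng.random() < 0.5:           # move with probability 1/2
            x ^= 1 << int(rng.integers(d))
        t += 1
    T[r] = t
u = T / T.mean()
for q in (0.5, 1.0, 2.0):
    print(q, (u > q).mean(), np.exp(-q))  # empirical tail vs exp(-q)
```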
In particular, our informal definition of "rapid mixing" (5.10) ensures that for a rapidly mixing random walk the exponential approximation for the hitting time on a single state will be valid.

Proposition 7.18 is proved in Aldous (1982b), and we will not repeat the details. The main idea is that the conditional distributions
    ν_s = P_π(X_s ∈ · | T_A > s)
must stay close to π, because the tendency of ν_s to drift away from π (due to paths hitting A being eliminated) is counteracted by the rapid mixing. So P_{ν_s}(T_A > t) is
289
approximately
We
now
and this makes
P,~(TA >s),
discuss the distribution of
when the initial distribution
in
Proposition 7.19 below.
the process hits
In other
words, the
distribution concentrated
to the
exponential
1
mean
p
give estimates
or
Of
( 7.19 )
a
course
if
p
are
formalized
say, that
p,
Given this does
to
will be
of
and
mixture of
a
a
mean
a
distribution close
(with weight 1-p).
is
So
is known, estimates of either
assuming
of the other.
is close to
these arguments tell
l,
us
only that
is small compared to
EvTA
is
So
probability
processes
approximately exponential,
distribution
approximately
remarks
our
(with weight p)
0
near
exponential.
rapidly mixing
(compared
is
of
TA
Pv-distribution
for
certain
a
in the short term
A
TA
arbitrary:
There is
happen, the distribution
not
is
v
be close to
TA
PROPOSITION.
For
arbitrary
vanishing function ~
A,
Zet
m
=
m
There
=
such that
sup
t>Em
where
v,
> t) - m xp(-t/m) ~I
v
m
E
-
~ _
Analogously
process, then
"most" j
the
to
(7.12),
when
is
A
a
will be close to
Pj-distribution
PROOF OF PROPOSITION 7.19. We may suppose α ≤ 1. Set t_0 = α^{2/3}m, so that t_0/τ = α^{-1/3} is large while t_0/m = α^{2/3} is small, and set
    J = {j: P_j(T_A ≤ t_0) ≤ α^{1/3}} .
We assert
(7.20)    E_j min(T_A, T_J) ≤ t_0·α^{-1/3} = α^{1/3}m  for every state j.
Indeed, by definition of J we have P_j(T_A ≤ t_0) > α^{1/3} for j ∉ J; and on the event {min(T_A, T_J) > nt_0} the position X_{nt_0} lies outside J,
and so
    P_j(min(T_A, T_J) > nt_0) ≤ (1 − α^{1/3})^n ,
giving (7.20). Next we assert that, for j ∈ J,
(7.21)    |P_j(T_A > t) − exp(−t/m)| ≤ ψ(α) ,  t ≥ 0 .
For, setting ρ_j = P_j(X_{t_0} ∈ ·),
    |P_j(T_A > t) − P_{ρ_j}(T_A > t − t_0)| ≤ α^{1/3}  by definition of J ;
    |P_{ρ_j}(T_A > t − t_0) − P_π(T_A > t − t_0)| ≤ d(t_0) ≤ ψ(α)  by (3.8) ;
    |P_π(T_A > t − t_0) − exp(−(t − t_0)/m)| ≤ ψ(α)  by Proposition 7.18 ;
and |exp(−(t − t_0)/m) − exp(−t/m)| ≤ t_0/m = α^{2/3}.

Next, set t_2 = α^{1/4}m and let B be the event {T_J < T_A, T_J ≤ t_2}. By (7.20) and Markov's inequality,
    P_ν(min(T_A, T_J) > t_2) ≤ α^{1/3}m/t_2 = α^{1/12} .
For t ≥ t_2, the strong Markov property at T_J and (7.21) give
(7.22)    |P_ν(T_A > t; B) − P(B)·exp(−t/m)| ≤ ψ(α) + t_2/m ,
while P_ν(T_A > t; B^c) ≤ P_ν(min(T_A, T_J) > t_2) ≤ α^{1/12}. Using (7.22),
(7.23)    |P_ν(T_A > t) − P(B)·exp(−t/m)| ≤ ψ(α) ;  t ≥ t_2 .
It remains to estimate P(B). First,
(7.24)    max_j E_jT_A ≤ (t_2 + m)/(1 − d(t_2)) ≤ m(1 + ψ(α))
by Lemma 7.9.
Second, note the elementary inequality
(7.25)    E_jT_A ≥ E_{ρ_j}T_A − P_j(T_A ≤ t_0)·max_k E_kT_A .
Now for j ∈ J,
    |E_{ρ_j}T_A − E_πT_A| ≤ d(t_0)·max_k E_kT_A ,
and d(t_0) + α^{1/3} is small, so by (7.24) and (7.25)
(7.26)    E_jT_A ≥ m(1 − ψ(α)) ;  j ∈ J .
Now Ω is covered by the disjoint events B, B_1, B_2, where
    B_1 = {min(T_A, T_J) > t_2} ,   B_2 = {T_A ≤ min(T_J, t_2)} .
So we estimate the contribution to the expectation of T_A made over these sets:
    E_ν(T_A; B_1) ≤ P(B_1)·{t_2 + max_j E_jT_A} ≤ ψ(α)m , using (7.24) and P(B_1) ≤ α^{1/12} ;
    E_ν(T_A; B_2) ≤ t_2 = α^{1/4}m ;
and, by the strong Markov property at T_J together with (7.24) and (7.26),
    P(B)·m(1 − ψ(α)) ≤ E_ν(T_A; B) ≤ P(B)·{t_2 + m(1 + ψ(α))} .
Combining these estimates with m_ν = E_ν(T_A; B) + E_ν(T_A; B_1) + E_ν(T_A; B_2),
(7.27)    |P(B) − m_ν/m| ≤ ψ(α) .
This estimate for P(B), substituted into (7.23), establishes the Proposition.

REMARK. By applying Proposition 7.19 to the process after the first jump out of state i, one can obtain estimates of the distribution of the return time T_i^+; alternatively, via Proposition 7.2 one can obtain estimates of the density of T_i. Precisely:

(7.28) COROLLARY. In the notation of Proposition 7.19, with m = E_πT_i = R_i/π(i) and α = τ/m,
    sup_{t ≥ ψ(α)m} |P_i(T_i^+ > t) − (q_iR_i)^{-1}·exp(−tπ(i)/R_i)| ≤ ψ(α) .
So the P_i-distribution of T_i^+ is approximately a mixture of an exponential distribution and a distribution concentrated on times which are comparatively small.
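Corollary 7.28 too can be inspected exactly on the lazy cube walk: in discrete time E_0T_0^+ = 1/π(0) and m = E_πT_0 = R·2^d, so the predicted tail is (1/R)·exp(−t/m). (The lazy discrete-time chain stands in for the continuous-time one; d = 8 is arbitrary.)

```python
import numpy as np

# Exact check of Corollary 7.28 on the lazy walk on Z_2^d: the tail of the
# return time T_0^+ should be close to (1/R) exp(-t/m), m = E_pi T_0 = R 2^d.
d = 8
n = 2 ** d
P = np.zeros((n, n))
for x in range(n):
    P[x, x] = 0.5
    for c in range(d):
        P[x, x ^ (1 << c)] += 0.5 / d

B = np.arange(1, n)
PB = P[np.ix_(B, B)]
h = np.zeros(n)
h[B] = np.linalg.solve(np.eye(n - 1) - PB, np.ones(n - 1))
m = h.mean()                             # E_pi T_0 (pi uniform)
R = m / n                                # discrete-time R for this walk

row = P[0, B].copy()                     # after one step, avoiding 0
for t in range(1, 3 * int(m) + 1):
    if t % int(m) == 0:                  # row sums to P_0(T_0^+ > t)
        print(t, row.sum(), np.exp(-t / m) / R)
    row = row @ PB
```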
It seems reasonable to hope that the ideas here will be useful in studying properties of rapidly mixing processes other than first hitting time distributions. Let us merely mention one slightly different result. Let V be the time taken for the process to visit every state. The following result, proved in Aldous (1983), says that for a rapidly mixing random walk the mean of V is approximately R#G log #G, provided log(1+τ) is small compared to log #G.

(7.29) PROPOSITION. There is a vanishing function ψ such that for random walks
    E|V/(R#G log #G) − 1| ≤ ψ(log(1+τ)/log #G) .
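Proposition 7.29 is also easy to probe by simulation. For the lazy walk on Z_2^d the discrete-time R is computable in closed form from the spectrum, and the empirical mean cover time can be compared with R·#G·log #G (d = 8 and 200 runs are arbitrary choices):

```python
import numpy as np
from math import comb, log

# Simulation for Proposition 7.29 on the lazy walk on Z_2^d: the mean cover
# time should be near R * #G * log #G.  R below is the discrete-time sum
# sum_m (p_00(m) - 1/#G), computed from the spectrum.
d = 8
G = 2 ** d
R = sum(comb(d, k) * d / k for k in range(1, d + 1)) / G
rng = np.random.default_rng(4)

covers = []
for _ in range(200):
    x, seen, t = 0, {0}, 0
    while len(seen) < G:
        if rng.random() < 0.5:
            x ^= 1 << int(rng.integers(d))
            seen.add(x)
        t += 1
    covers.append(t)
print(np.mean(covers), R * G * log(G))
```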
8. Hitting times: Examples

Here we apply the theory of Sections 6 and 7 to the examples described previously.

EXAMPLE 3.19. Random walk on the N-dimensional cube. In this example the explicit formula (3.20) for p_{i,i}(t) gives an explicit formula for R_N(t):
    R_N(t) = 2^{-N} Σ_{k=1}^{N} C(N,k)·(N/2k)·(1 − exp(−2kt/N)) .
Calculus gives R_N = R_N(∞) → 1 as N → ∞. Recall from (3.23) that τ ~ ¼N log N, and from (6.7) that E_iT_i^+ = 2^N. In other words, for large N the process has only a small chance of returning to its starting state in the short term.
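The claim R_N → 1 is visible numerically from the spectral formula (a one-line computation):

```python
from math import comb

# R_N for the rate-1 walk on the N-cube, from the explicit spectral formula:
# R_N = 2^{-N} sum_{k=1}^{N} C(N,k) N/(2k), which approaches 1 as N grows.
for N in (5, 10, 20, 40, 80):
    R = sum(comb(N, k) * N / (2 * k) for k in range(1, N + 1)) / 2 ** N
    print(N, R)
```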
We can now apply the results of Section 7. Proposition 7.1 says E_πT_i = R_N·2^N ~ 2^N; Proposition 7.18 says that the P_π-distribution of T_i/2^N converges to the exponential distribution as N → ∞; and Proposition 7.18 also applies to subsets A_N, provided #A_N·τ_N/2^N → 0. In this simple example one could obtain the single-state result analytically, from the explicit formula for p_{i,i}(t); but even here, analytic techniques do not readily yield such results for subsets. Kemperman (1961) investigates this particular example in detail by analytic techniques. Donnelly (1982), in the context of a problem in genetics, compares the exponential approximation with the exact distribution in some special cases of hitting times on the Ehrenfest urn model: the approximation is rather good, even in low dimensions.

EXAMPLE 5.1. Let us indicate how the general results apply in this example. Fix 0 < c < ½ and let i_N = ⌊cN⌋. We assert
(8.1)    R_{i_N} → (1−2c)^{-1}  as N → ∞ .
For, starting at i_N, the process (X_t) initially behaves like the simple random walk on Z with drift, Q(j,j+1) = 1−c, Q(j,j−1) = c; this transient process has R(∞) = (1−2c)^{-1}, and it is not
hard to justify (8.1).
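Numerically, (8.1) can be seen on the Ehrenfest-type birth-and-death chain with rates Q(j,j+1) = (N−j)/N and Q(j,j−1) = j/N, which near j = cN has precisely the drifting-walk rates used above (the values N = 60 and c = 0.2 are arbitrary choices):

```python
import numpy as np
from math import comb

# Numerical look at (8.1): for the chain with Q(j, j+1) = (N-j)/N and
# Q(j, j-1) = j/N (stationary law Binomial(N, 1/2)), R at i_N = cN should
# approach (1 - 2c)^{-1}.
N, c = 60, 0.2
n = N + 1
Q = np.zeros((n, n))
for j in range(n):
    if j < N: Q[j, j + 1] = (N - j) / N
    if j > 0: Q[j, j - 1] = j / N
    Q[j, j] = -Q[j].sum()                # diagonal set last, so rows sum to 0

pi = np.array([comb(N, j) for j in range(n)], dtype=float) / 2.0 ** N
Pi = np.outer(np.ones(n), pi)
D = np.linalg.inv(Pi - Q) - Pi           # deviation matrix
i = int(c * N)
print(D[i, i], 1 / (1 - 2 * c))          # R_{i_N} versus the limit 5/3
```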
Recall that π is the Binomial(N,½) distribution, and that τ ~ ¼N log N. Proposition 7.1 says E_πT_{i_N} = m_N, say, where
    m_N = R_{i_N}/π(i_N) ~ (1−2c)^{-1}·2^N/C(N, i_N) ,
so that
(8.2)    log m_N ~ N(log 2 + c log c + (1−c)log(1−c)) .
We can now apply the results of Section 7. Proposition 7.8 shows
(8.3)    max_j E_jT_{i_N} ≤ m_N(1 + ε_N) , where ε_N → 0 ;
and E_jT_{i_N} is plainly monotone in j for j > i_N. Proposition 7.18 says that T_{i_N}/m_N converges to the exponential(1) distribution for the process with the stationary initial distribution; since ε_N → 0, Proposition 7.19 shows that this holds also for the process started at any j_N > N/2. Finally, consider the first return time T_{i_N}^+. Corollary 7.28 shows that T_{i_N}^+/m_N converges to Y, where
    P(Y = 0) = 2c ,   P(Y > t) = (1−2c)e^{-t} ,  t > 0 .
Let us now consider the card-shuffling models. As explained at (2.5), the continuous-time theory of Section 7 extends to discrete-time random walks. In card-shuffling models it is often true that
(8.4)    R_N → 1  as N → ∞ ;
in other words, when starting with a new deck it is unlikely that the deck will get back to the new-deck state in the short term. When (8.4) holds, Propositions 7.1 and 7.18 show that the P_π-distribution of T_i is asymptotically exponential with mean N!.

In the cases of the uniform riffle shuffle (4.13) and random transpositions (4.17), (8.4) is an immediate consequence of Lemma 6.21, since
    q* = 2^{-N} ,  τ ~ (3/2)·log_2 N   (for the uniform riffle) ;
    q* = 2/N^2 ,  τ ~ ½N log N   (for random transpositions) .
One must work harder to prove (8.4) for the "transposing neighbours" shuffle (4.10), using Lemma 6.21. Fix N. For decks σ, σ′ let f(σ, σ′) = #{i: σ(i) ≠ σ′(i)} be the number of unmatched cards. Fix a reference deck σ′, start in state σ with m = f(σ, σ′), let X_1 be the deck after one shuffle, and let Y = f(X_1, σ′). Plainly m−2 ≤ Y ≤ m+2. (Note Y cannot equal m±1.) So we want to estimate the distribution of Y. For instance, the number of successive pairs of cards in which both cards are matched is at least N−1−2m; if such a pair is transposed, then two cards become unmatched. Estimates of this kind give, after some algebra, bounds (8.5) and (8.6) on the distribution of Y which suffice for Lemma 6.21 when m is small. Applying Lemma 6.21, we obtain R(T*) → 1 as N → ∞, where T* = τ(1 + log(N!)) ≤ N^5; and (6.7) gives |R − R(T*)| ≤ 2T*/N! → 0. Hence R_N → 1, establishing (8.4) for this model.

EXAMPLE 5.5. Sequences in coin-tossing. As at (5.5), for a prescribed sequence i = (i_1,...,i_N) of Heads and Tails, let I_i be the number of tosses of a fair coin until the sequence i appears. Studying I_i is a classical problem in elementary probability: see Feller (1968). We shall derive some known results. Let X_n be the Markov chain of sequences of length N, with uniform initial distribution; let T_i = min{n: X_n = i}, and note
I_i = T_i + N.

The discrete-time analogue of Proposition 7.1 is
    E_πT_i = R_i/π(i) ,  where  R_i = lim_{n→∞} Σ_{m=0}^{n} (p_{i,i}(m) − π(i)) .
In this example we have
    π(i) = 2^{-N} ;
    p_{i,i}(m) = 2^{-m}·1(i_q = i_{q+m}: 1 ≤ q ≤ N−m) ,  0 ≤ m ≤ N ;
    p_{i,i}(m) = 2^{-N} ,  m > N .
Hence, using I_i = T_i + N, we find
    E I_i = 2^N·{1 + Σ_{m=1}^{N-1} 2^{-m}·1(i_q = i_{q+m}: 1 ≤ q ≤ N−m)} .
This is well-known: see e.g. Li (1980).
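The formula for E I_i is easy to check by simulation; the classical example i = HTH, with E I_i = 8(1 + ¼) = 10, is used below.

```python
import numpy as np

# Check of E I_i = 2^N (1 + sum_m 2^{-m} 1(overlap at shift m)) by direct
# simulation, for the classical pattern i = HTH (mean 10).
def mean_wait(pattern):
    N = len(pattern)
    s = sum(2.0 ** -m for m in range(1, N)
            if all(pattern[q] == pattern[q + m] for q in range(N - m)))
    return 2 ** N * (1 + s)

rng = np.random.default_rng(5)
pattern = "HTH"
total, runs = 0, 20000
for _ in range(runs):
    window, t = "", 0
    while not window.endswith(pattern):
        window = (window + ("H" if rng.random() < 0.5 else "T"))[-len(pattern):]
        t += 1
    total += t
print(mean_wait(pattern), total / runs)   # both should be close to 10
```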
Proposition 7.18 says that the distribution of T_i/E_πT_i is approximately exponential: this fact seems implicit in the martingale approach to this problem (Gerber and Li (1981)), but seems not to have been explicitly noted. Moreover, Li (1980) discusses the time until some one of a set A of prescribed sequences of length N occurs: by Propositions 7.19 and 7.13, the distribution of T_A/E_πT_A also converges to exponential.

EXAMPLE 5.7. Random walk in a d-dimensional box. Fix d ≥ 3. Consider points x_N in boxes of side N which are away from the sides, in the sense that the distance from x_N to the boundary of the box tends to infinity. For such points it is not difficult to see that
    R_{x_N} → (1 − F_d)^{-1}  as N → ∞ ,
where F_d is the return probability for the unrestricted d-dimensional simple random walk (F_d can be evaluated by generating function techniques). Thus Proposition 7.1 implies E_πT_{x_N} ~ N^d·(1 − F_d)^{-1}, and Proposition 7.18 implies that the distribution of T_{x_N}/E_πT_{x_N} converges to exponential.
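Finally, F_d itself is easy to estimate by simulation. The sketch below estimates F_3 by truncating each excursion at an arbitrary horizon (so F_3 is very slightly underestimated) and prints the corresponding limit (1 − F_3)^{-1} for R_x; the known value is F_3 ≈ 0.3405.

```python
import random

# Monte Carlo estimate of the return probability F_3 of the simple random
# walk on Z^3, truncating each attempt at a fixed horizon.
random.seed(7)
runs, horizon = 4000, 1000
returns = 0
for _ in range(runs):
    x = y = z = 0
    for _ in range(horizon):
        r = random.randrange(6)          # one of the six unit steps
        if   r == 0: x += 1
        elif r == 1: x -= 1
        elif r == 2: y += 1
        elif r == 3: y -= 1
        elif r == 4: z += 1
        else:        z -= 1
        if x == y == z == 0:
            returns += 1
            break
F3 = returns / runs
print(F3, 1 / (1 - F3))   # compare with F_3 = 0.3405..., (1-F_3)^{-1} = 1.516...
```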
References

ALDOUS, D. J. (1982a). Some inequalities for reversible Markov chains. J. London Math. Soc. 25 564-576.

ALDOUS, D. J. (1982b). Markov chains with almost exponential hitting times. Stochastic Processes Appl. 13, to appear.

ALDOUS, D. J. (1983). On the time taken by a random walk on a finite group to visit every state. Zeitschrift für Wahrscheinlichkeitstheorie, to appear.

DIACONIS, P. (1982). Group theory in statistics. Preprint.

DIACONIS, P. and SHAHSHAHANI, M. (1981). Generating a random permutation with random transpositions. Zeitschrift für Wahrscheinlichkeitstheorie 57 159-179.

DONNELLY, K. (1982). The probability that a relationship between two individuals is detectable given complete genetic information. Theoretical Population Biology, to appear.

EPSTEIN, R. A. (1977). The Theory of Gambling and Statistical Logic (Revised Edition). Academic Press.

FELLER, W. (1968). An Introduction to Probability Theory (3rd Edition). Wiley.

GERBER, H. U. and LI, S.-Y. R. (1981). The occurrence of sequence patterns in repeated experiments and hitting times in a Markov chain. Stochastic Processes Appl. 11 101-108.

KARLIN, S. and TAYLOR, H. M. (1975). A First Course in Stochastic Processes. Academic Press.

KEILSON, J. (1979). Markov Chain Models--Rarity and Exponentiality. Springer-Verlag.

KEMENY, J. G. and SNELL, J. L. (1959). Finite Markov Chains. Van Nostrand.

KEMPERMAN, J. (1961). The First Passage Problem for a Stationary Markov Chain. IMS Statistical Research Monograph 1.

LETAC, G. (1981). Problèmes classiques de probabilité sur un couple de Gelfand. In Analytical Methods in Probability Theory, ed. D. Dugué et al. Springer Lecture Notes in Mathematics 861.

LI, S.-Y. R. (1980). A martingale approach to the study of occurrence of sequence patterns in repeated experiments. Ann. Probability 8 1171-1176.

REEDS, J. (1982). Unpublished notes.

STOUT, W. F. (1974). Almost Sure Convergence. Academic Press.