RANDOM SERIES IN POWERS OF ALGEBRAIC INTEGERS :
HAUSDORFF DIMENSION OF THE LIMIT DISTRIBUTION
STEVEN P. LALLEY
Abstract
We study the distributions $F_{\theta,p}$ of the random sums $\sum_{n=1}^{\infty} \varepsilon_n \theta^n$, where $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Bernoulli-$p$ and
$\theta$ is the inverse of a Pisot number (an algebraic integer $\beta$ whose conjugates all have moduli less than 1)
between 1 and 2. It is known that, when $p = .5$, $F_{\theta,p}$ is a singular measure with exact Hausdorff dimension
less than 1. We show that in all cases the Hausdorff dimension can be expressed as the top Lyapunov
exponent of a sequence of random matrices, and provide an algorithm for the construction of these
matrices. We show that for certain β of small degree, simulation gives the Hausdorff dimension to several
decimal places.
1. Introduction
Let $\varepsilon_1, \varepsilon_2, \ldots$ be independent, identically distributed Bernoulli-$p$ random variables,
and consider the random variable $X$ defined by the series
$$X = \sum_{n=1}^{\infty} \varepsilon_n \theta^n,$$
where $\theta \in (0, 1)$. By the ‘Law of Pure Types’ the distribution $F = F_{\theta,p}$ of $X$ is either
absolutely continuous or purely singular (but of course nonatomic). Erdős proved (a)
that there exist values of $\theta$ larger than $\frac12$, for example, the inverse of the ‘golden ratio’,
such that $F$ is singular [6]; but (b) that there exists $\gamma < 1$ such that for almost every
$\theta \in (\gamma, 1)$ the distribution $F_{\theta,.5}$ is absolutely continuous [7]. (Solomyak [22] has recently
shown that statement (b) is true for $\gamma = .5$.) Erdős’ argument shows that in fact $F$ is
singular whenever θ is the inverse of a ‘ Pisot ’ number. Recall that a Pisot number is
an algebraic integer greater than 1 whose algebraic conjugates are all smaller than 1
in modulus (see [20]) and an algebraic integer is a root of an irreducible monic, integer
polynomial. There are in fact infinitely many Pisot numbers in the interval (1, 2): for
example, for each $n = 2, 3, \ldots$, the largest root $\beta_n$ of the polynomial
$$p_n(x) = x^n - x^{n-1} - x^{n-2} - \cdots - x - 1 \tag{1}$$
is a Pisot number : see [21]. We shall call these the simple Pisot numbers.
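As a minimal numerical sketch of the simple Pisot numbers (in Python, with the illustrative names `p_n` and `simple_pisot`, which are not part of the paper): since $p_n(1) = 1 - n < 0$ and $p_n(2) = 1 > 0$, bisection on (1, 2) locates $\beta_n$. The sketch only locates the root; it does not check the Pisot property, for which see [21].

```python
def p_n(n, x):
    """Evaluate p_n(x) = x^n - x^(n-1) - ... - x - 1."""
    return x**n - sum(x**k for k in range(n))


def simple_pisot(n, tol=1e-14):
    """Root of p_n in (1, 2), located by bisection."""
    lo, hi = 1.0, 2.0          # p_n(1) = 1 - n < 0 and p_n(2) = 1 > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_n(n, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


betas = {n: simple_pisot(n) for n in range(2, 8)}
for n, beta in betas.items():
    # theta = 1/beta is the ratio used in the random series
    print(n, round(beta, 6), round(1 / beta, 6))
```

For $n = 2$ this returns the golden ratio, and the computed roots increase towards 2 as $n$ grows.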
One may ask for those values of $\theta$ such that $F = F_{\theta,p}$ is singular: is it necessarily
the case that F is concentrated on a set of Hausdorff dimension strictly less than 1,
and if so, what is the minimal such dimension ? (We shall call this minimum the
Hausdorff dimension of $F$.) Przytycki and Urbański [18], enlarging on an argument of
Garsia [12], showed that if $\theta$ is the inverse of a Pisot number between 1 and 2 then
in fact $F_{\theta,.5}$ has Hausdorff dimension less than 1. Unfortunately, their method does
not give an effective means of calculating it. Recently, Alexander and Zagier [2]
Received 10 May 1995 ; revised 8 April 1996.
1991 Mathematics Subject Classification 28A78.
Research supported by NSF grant DMS-9307855.
J. London Math. Soc. (2) 57 (1998) 629–654
showed how to calculate the ‘information dimension’ in the case where $\theta$ is the
inverse of the golden ratio and $p = \frac12$. As it turns out, the Hausdorff and information
dimensions coincide for the measures we consider here, but Alexander and Zagier did
not prove this. Their argument is very elegant, relating the dimension to properties
of the ‘ Fibonacci tree ’ and thence to the Euclidean algorithm, but it does not seem
to generalize easily. (Although the authors claim that ‘it seems likely that the
particulars will extend to $\beta_n^{-1}$’ for $\beta_n$ as defined above, this is not readily apparent.)
The purpose of this paper is to characterize the Hausdorff dimension of $F_{\theta,p}$ as the
top Lyapunov exponent of a certain natural sequence of random matrix products.
This characterization is quite general: it is valid for every $\theta$ such that $\beta = 1/\theta \in (1, 2)$
is a Pisot number, and for all values of the Bernoulli parameter $p$. (The method is
applicable even more generally when the process $\varepsilon_1, \varepsilon_2, \ldots$ is a $k$-step Markov chain
taking values in an arbitrary finite set of integers, but we carry out the details only
in the case of Bernoulli processes.) Moreover, although the matrices involved may
have large dimensions (typically increasing exponentially in m, where m is the degree
of the minimal polynomial), they are sparse, so numerical computation of the
Lyapunov exponent may be feasible for $1/\theta$ of moderate degree. Small simulations
give nonrigorous estimates that agree with the rigorous estimate of Alexander and
Zagier (which is accurate to the 4th decimal place) to 3 decimal places. For values of
p ranging from .1 to .5 (for which the methods of Alexander and Zagier do not apply),
small simulations based on our matrix product representation give estimates with
accuracy to $\pm .005$. Some of these numerical results are reported in Section 8 below.
Using a ‘ summation over paths ’ in conjunction with our representation, one could
obtain rigorous upper and lower bounds ; however, we have not yet done so.
Random matrix products occur in a number of other problems involving the
computation of Hausdorff dimensions. See Bedford [3] for their use in computing the
dimensions of the graphs of some self-affine functions, and Kenyon and Peres [14]
for studies of Cantor sets on the line. In each of these examples there is a natural
‘ automatic ’ structure that leads to the matrix products. Here the automatic structure
arises from number-theoretic considerations. The discussion in Section 9 below may
shed further light on this.
2. Dimension and entropy: infinite Bernoulli convolutions
For any probability distribution F on a metric space, one may define its
(Hausdorff) dimension $\delta(F)$ to be the infimum of all $d > 0$ such that $F$ is supported by
a set of Hausdorff dimension d. There is a simple and well known tool for calculating
δ(F ), which we shall call Frostman’s Lemma.
Lemma 1 (Frostman). If for $F$-almost every $x$ the inequalities
$$\delta_1 \le \liminf_{r \to 0} \frac{\log F(B_r(x))}{\log r} \le \delta_2$$
hold, then $\delta_1 \le \delta(F) \le \delta_2$.
Here Br(x) denotes the ball of radius r centered at x. The proof is relatively easy :
see [8, Chapter 1, Problem 1.8] or [24].
Application of the Frostman Lemma to a probability measure on the real line
requires a handle on the probabilities of ‘ typical ’ small intervals. In our applications
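$F$ is the distribution of a random sum $X = \sum \varepsilon_n \theta^n$, and the value of $X$ is (roughly)
determined to within $r$ by the sum of the first $n$ terms of the series, where $r \approx \theta^n$.
Henceforth, for any infinite sequence $\varepsilon = \varepsilon_1 \varepsilon_2 \ldots$ of zeroes and ones we shall write
$$x(\varepsilon) = \sum_{k=1}^{\infty} \varepsilon_k \theta^k \quad\text{and}\quad x_n(\varepsilon) = \sum_{k=1}^{n} \varepsilon_k \theta^k,$$
and for any finite sequence $\varepsilon = \varepsilon_1 \varepsilon_2 \ldots \varepsilon_n$ of length $n \ge 1$ we shall write $x_n(\varepsilon) = \sum_{k=1}^{n} \varepsilon_k \theta^k$.
Observe that $x(\varepsilon) - x_n(\varepsilon) \le \theta^{n+1}/(1-\theta)$ for any $n \ge 1$ and every infinite 0–1
sequence $\varepsilon$. Consequently, if $x_n(\varepsilon') = x_n(\varepsilon)$ then $|x(\varepsilon') - x(\varepsilon)| \le \theta^{n+1}/(1-\theta)$. In general,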
the converse need not be true ; however, if θ is the inverse of a Pisot number then there
is a weak converse, discovered by Garsia [11]. Because this result is of central
importance in the ensuing arguments, and because the proof is rather short, we
include it here. For the remainder of the paper, the following assumption will be in
force.
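Assumption 1. The ratio $\theta$ is the inverse of a Pisot number $\beta \in (1, 2)$ with
minimal integer monic polynomial $p(x)$ of degree $m$.
Lemma 2 (Garsia [11]). There exists a constant $C > 0$ (depending on $\beta$) such that
for any integer $n \ge 1$ and any two sequences $\varepsilon = \varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$ and $\varepsilon' = \varepsilon'_1, \varepsilon'_2, \ldots, \varepsilon'_n$ of
zeroes and ones, either
$$x_n(\varepsilon) = x_n(\varepsilon') \quad\text{or}\quad |x_n(\varepsilon') - x_n(\varepsilon)| \ge C\theta^n.$$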
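Proof. Suppose that $\varepsilon, \varepsilon'$ are 0–1 sequences such that $\sum_{k=1}^{n} \varepsilon_k \theta^k \ne \sum_{k=1}^{n} \varepsilon'_k \theta^k$.
Then $F(\beta) \ne 0$, where $F(x) = \sum_{k=1}^{n} \delta_k x^{n-k}$ and $\delta_k = \varepsilon_k - \varepsilon'_k$. Note that $F$ is a polynomial
with coefficients $0, \pm 1$. Let $\beta_i$ be the algebraic conjugates of $\beta$; then for each $i$ we have
$F(\beta_i) \ne 0$ and
$$F(\beta) \prod_i F(\beta_i) \in \mathbb{Z}.$$
Since each $|\beta_i| < 1$ (recall that $\beta$ is a Pisot number) and the coefficients of $F$ are
bounded by 1 in modulus, $|F(\beta_i)| \le 1/(1-|\beta_i|)$. The product above is a nonzero integer, so its modulus is at least 1. Consequently,
$$|F(\beta)| \ge \prod_i (1-|\beta_i|) = C.$$
Since $x_n(\varepsilon) - x_n(\varepsilon') = \theta^n F(\beta)$, the lemma follows.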
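The separation in Garsia's lemma can be observed directly for the golden ratio. The following Python sketch (the function names are illustrative assumptions, not the paper's) stores $\beta^n x_n(\varepsilon) = \sum_k \varepsilon_k \beta^{n-k}$ exactly as an integer pair $(a, b)$ meaning $a + b\beta$, using $\beta^2 = \beta + 1$, so that coinciding sums are detected without rounding. Only the final gap computation uses floating point.

```python
from collections import Counter
from math import sqrt

BETA = (1 + sqrt(5)) / 2      # golden ratio; theta = 1 / BETA


def partial_sum_classes(n):
    """Count 0-1 words of length n by the value of beta^n * x_n.

    Each value is kept as a pair (a, b) standing for a + b*BETA.
    """
    classes = Counter({(0, 0): 1})
    for _ in range(n):
        nxt = Counter()
        for (a, b), count in classes.items():
            for eps in (0, 1):
                # beta*(a + b*beta) + eps = (b + eps) + (a + b)*beta
                nxt[(b + eps, a + b)] += count
        classes = nxt
    return classes


def min_scaled_gap(n):
    """Smallest gap between distinct values of x_n, divided by theta^n."""
    values = sorted(a + b * BETA for a, b in partial_sum_classes(n))
    return min(v2 - v1 for v1, v2 in zip(values, values[1:]))


for n in (4, 8, 12, 16):
    classes = partial_sum_classes(n)
    # columns: n, number of words, number of distinct sums, scaled minimal gap
    print(n, 2**n, len(classes), round(min_scaled_gap(n), 4))
```

The number of distinct sums grows much more slowly than $2^n$, while the scaled minimal gap stays above $C = 1 - |\beta_2| \approx 0.382$, the constant the proof gives for the golden ratio.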
It follows from this lemma that if $x(\varepsilon') \in B_r(x(\varepsilon))$ for $r = \theta^{n+1}/2(1-\theta)$ then there
are at most $2C + 1$ possible values for $x_n(\varepsilon')$. There are, in general, many pairs of
sequences for which $x_n(\varepsilon) = x_n(\varepsilon')$: the separation bound in the lemma implies that
there are only $O(\theta^{-n})$ possible values of the sum, but there are $2^n$ sequences of zeroes
and ones of length $n$ (and $\theta^{-1} < 2$). For each $n$ the possible values of $x_n(\varepsilon)$ partition
the space $\Sigma$ of all infinite sequences of zeroes and ones: for any two sequences
$\varepsilon, \varepsilon' \in \Sigma$, we see that $\varepsilon$ and $\varepsilon'$ are in the same element of the partition if and only if
$x_n(\varepsilon') = x_n(\varepsilon)$. Call the resulting partition $\mathcal{P}_n$. For any $\varepsilon \in \Sigma$ let $U(n, \varepsilon)$ be the element of
the partition $\mathcal{P}_n$ containing $\varepsilon$. Then for each $\varepsilon \in \Sigma$ and each value of the Bernoulli
parameter $p$ the set $U(n, \varepsilon)$ has a probability $\pi_n(\varepsilon)$ (for notational simplicity, the
dependence on p is suppressed). In the subsequent sections we shall prove the
following theorem.
Theorem 1. For any $p \in (0, 1)$, if $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Bernoulli-$p$ random variables
then with probability 1 the limit
$$\alpha = \lim_{n \to \infty} (\pi_n(\varepsilon))^{1/n} \tag{2}$$
exists, is positive, and is constant.
The proof will exhibit the limit $\alpha = \alpha(p)$ as the top Lyapunov exponent of a
certain sequence of random matrix products, providing an effective means of
calculation. Note that when $p = \frac12$, the probability $\pi_n(\varepsilon)$ is just $2^{-n}$ times the
cardinality of the equivalence class. In this case the matrices may be chosen to have
all entries 0 or 1, so numerical computations are easiest in this case.
Using the close connection between the partitions $\mathcal{P}_n$ and the neighborhood
system $B_r(x(\varepsilon))$ for $r = \kappa\theta^n$, we shall prove the following formula for the dimension
of the measure $F_{\theta,p}$.
Theorem 2. The Hausdorff dimension of $F_{\theta,p}$ is
$$\delta = \frac{\log \alpha}{\log \theta}. \tag{3}$$
This formula is an instance of the by now well-known general principle
‘dimension × expansion rate = entropy’: see [24] for more. However, the proof is not
entirely trivial, even given the result of Theorem 1: it relies on Garsia’s lemma, and
therefore on the algebraic nature of the ratio $\beta = 1/\theta$. In the special case $p = \frac12$,
Przytycki and Urbański proved the inequality $\delta \le \log \alpha/\log \theta$ and used it to deduce
that $\delta < 1$, but did not establish equality. Alexander and Yorke proved, again in the
special case $p = \frac12$, that $\log \alpha/\log \theta$ equals the ‘information dimension’ (also called the
Rényi dimension) of $F$, which always dominates the Hausdorff dimension, but did not
prove equality with the Hausdorff dimension. Theorem 2 follows directly from
Theorem 1 and Propositions 3 and 4 below.
It is worth noting here that the dimension $\delta(F_{\theta,p})$, considered as a function of $p$,
is symmetric about $\frac12$, that is,
$$\delta(F_{\theta,p}) = \delta(F_{\theta,1-p}).$$
The proof is simple. If $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Bernoulli($p$), then $\varepsilon'_1, \varepsilon'_2, \ldots$ are i.i.d. Bernoulli($1-p$),
where $\varepsilon'_j = 1 - \varepsilon_j$. Consequently, if $X = x(\varepsilon)$ has distribution $F_{\theta,p}$, then
$Y = \theta/(1-\theta) - X$ has distribution $F_{\theta,1-p}$. But the distributions of $X$ and $Y$ clearly have the
same dimensions, because $Y$ is obtained from $X$ by an isometry of the real line.
Proposition 1. If $F_{\theta,p}$ is singular with respect to Lebesgue measure, then $\delta < 1$.
In fact this holds in even greater generality ; see Proposition 5 below.
Proposition 2. If $\beta$ is a Pisot number between 1 and 2, then $F_{\theta,p}$ is singular for
every $p \in (0, 1)$.
Proof. This follows by Erdős’ original argument. The Fourier transform of $F_{\theta,p}$
is easily computed as an infinite product:
$$\hat{F}_{\theta,p}(t) = \int e^{itx} F_{\theta,p}(dx) = \prod_{k=1}^{\infty} (1 + p(\exp\{it\theta^k\} - 1)).$$
It is obvious from this that
$$\hat{F}_{\theta,p}(\beta^n t) = \prod_{k=1}^{n} (1 + p(\exp\{it\beta^{n-k}\} - 1)) \, \hat{F}_{\theta,p}(t)$$
for every $t \in (-\infty, \infty)$ and every $n \ge 0$. Since $\beta$ is a Pisot number, distance$(\beta^n, \mathbb{Z}) \to 0$
as $n \to \infty$ at an exponential rate (see, for example, [20, Chapter 1]), and hence
$$\sum_{k=1}^{\infty} |e^{2\pi i \beta^k} - 1| < \infty.$$
It follows from the standard criterion for convergence of an infinite product that for
every integer $m \ge 1$ we have
$$\lim_{n \to \infty} \hat{F}_{\theta,p}(2\pi\beta^n) = \prod_{k=0}^{\infty} (1 + p(\exp\{2\pi i \beta^{k-m}\} - 1)) \, \hat{F}_{\theta,p}(2\pi\beta^{-m}),$$
and that the limit of the infinite product in this expression is nonzero. Now if $m$ is
sufficiently large, then $\hat{F}_{\theta,p}(2\pi\beta^{-m}) \ne 0$, because $\hat{F}_{\theta,p}$ is continuous and takes the value
1 at the argument 0. Therefore, by the Riemann–Lebesgue lemma, $F_{\theta,p}$ is singular.
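The failure of the Fourier transform to decay along $t = 2\pi\beta^n$ can be observed numerically. The following Python sketch evaluates a truncated version of the infinite product for the golden ratio and $p = \frac12$; the function name `fourier_transform` and the truncation at 200 factors are assumptions made for illustration, and the computation is a floating-point check, not a proof.

```python
import cmath
from math import pi, sqrt


def fourier_transform(t, theta, p, terms=200):
    """Truncated product prod_{k=1}^{terms} (1 + p*(exp(i*t*theta^k) - 1))."""
    value = 1.0 + 0.0j
    for k in range(1, terms + 1):
        value *= 1 + p * (cmath.exp(1j * t * theta**k) - 1)
    return value


beta = (1 + sqrt(5)) / 2
theta = 1 / beta
# |F^(2*pi*beta^n)| for n = 1, ..., 20
moduli = [abs(fourier_transform(2 * pi * beta**n, theta, 0.5)) for n in range(1, 21)]
for n in (5, 10, 15, 20):
    print(n, round(moduli[n - 1], 6))
```

The printed moduli settle at a small positive value rather than tending to 0, which is the behaviour the Riemann–Lebesgue lemma rules out for an absolutely continuous distribution.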
Theorems 1 and 2 reduce the problem of computing the Hausdorff dimension of
Fθ,p to that of computing the ‘ entropy ’ log α. This will be carried out in Sections 4–8.
It should be noted that the entropy arises in connection with several other
dimensional quantities, such as the ‘ information dimension ’ : see [1, 2] for more on
this. Before coming to grips with the entropy, however, we shall prove Theorem 2 and
discuss certain generalizations of the results of this section to a class of measures
Fθ, µ including the Fθ,p.
3. Dimension and entropy: stationary measures
Let $\mu$ be an ergodic, shift-invariant measure on the space $\Sigma$ of infinite sequences
$\varepsilon_1, \varepsilon_2, \ldots$ of zeroes and ones. Define $F_{\theta,\mu}$ to be the distribution under $\mu$ of
$X = \sum_{n=1}^{\infty} \varepsilon_n \theta^n$. Observe that if $\mu$ is the Bernoulli-$p$ product measure, then $F_{\theta,\mu} = F_{\theta,p}$.
The Bernoulli measures are not the only measures of interest, however—for
instance, there are measures µ for which Fθ, µ is absolutely continuous relative to
Lebesgue measure. These measures were discovered by Rényi [19] for the case when
θ is the inverse of the golden ratio and by Parry [15, 16] for the other Pisot numbers.
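As in the previous section, for each $n$ let $\mathcal{P}_n$ be the partition of $\Sigma$ induced by the
equivalence relation $\varepsilon \sim \varepsilon'$ if and only if $x_n(\varepsilon) = x_n(\varepsilon')$. Note that these partitions are
not nested. For any sequence $\varepsilon$ let $U_n(\varepsilon)$ be the element of $\mathcal{P}_n$ that contains $\varepsilon$, and let
$\pi_n(\varepsilon) = \mu(U_n(\varepsilon))$. Similarly, for an arbitrary measurable partition $\mathcal{P}$ let $U(\mathcal{P}, \varepsilon)$ be the
element of $\mathcal{P}$ that contains $\varepsilon$, and let $\pi(\mathcal{P}, \varepsilon) = \mu(U(\mathcal{P}, \varepsilon))$. Define the entropy of the
partition $\mathcal{P}$ by
$$H(\mathcal{P}) = -E_\mu \log \pi(\mathcal{P}, \varepsilon) = -\sum_{F \in \mathcal{P}} \mu(F) \log \mu(F).$$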
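Lemma 3. The limit $\lim_{n \to \infty} H(\mathcal{P}_n)/n = -\log \alpha$ exists.
Proof. For any integers $n, m \ge 1$, the partition $\mathcal{P}_n \vee \sigma^{-n}\mathcal{P}_m$ is a refinement of
$\mathcal{P}_{n+m}$ (here $\sigma$ is the shift), because the values of $x_n(\varepsilon)$ and $x_m(\sigma^n \varepsilon)$ determine the value
of $x_{n+m}(\varepsilon)$. Hence, as a refinement, it has the larger entropy. But by an elementary
property of the entropy function, $H(\mathcal{P}_n \vee \sigma^{-n}\mathcal{P}_m) \le H(\mathcal{P}_n) + H(\mathcal{P}_m)$ (see [23, Section
4.3]). Thus, $H(\mathcal{P}_n)$ is a subadditive sequence, and so $H(\mathcal{P}_n)/n$ converges.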
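For the golden ratio and Bernoulli-$p$ digits, the entropies $H(\mathcal{P}_n)$ can be computed exactly enough to watch the subadditivity at work. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, sums are encoded as integer pairs $(a, b)$ meaning $\beta^n x_n = a + b\beta$ (using $\beta^2 = \beta + 1$), and the quantity $H(\mathcal{P}_n)/(n \log \beta)$ is only an upper estimate of $\log\alpha/\log\theta$, since a subadditive sequence divided by $n$ converges to its infimum.

```python
from collections import defaultdict
from math import log, sqrt

BETA = (1 + sqrt(5)) / 2      # golden ratio; theta = 1 / BETA


def class_probabilities(n, p):
    """Probabilities of the elements of P_n under i.i.d. Bernoulli-p digits.

    Keys are pairs (a, b) encoding beta^n * x_n = a + b*BETA exactly.
    """
    probs = {(0, 0): 1.0}
    for _ in range(n):
        nxt = defaultdict(float)
        for (a, b), mass in probs.items():
            nxt[(b, a + b)] += mass * (1 - p)      # next digit is 0
            nxt[(b + 1, a + b)] += mass * p        # next digit is 1
        probs = nxt
    return probs


def partition_entropy(n, p):
    """H(P_n) = -sum_F mu(F) log mu(F)."""
    return -sum(q * log(q) for q in class_probabilities(n, p).values() if q > 0)


def dimension_upper_estimate(n, p):
    """H(P_n) / (n log beta), an upper estimate of log(alpha)/log(theta)."""
    return partition_entropy(n, p) / (n * log(BETA))


for n in (5, 10, 20):
    print(n, round(dimension_upper_estimate(n, 0.5), 4))
```

The estimates decrease slowly with $n$, which is consistent with the remark that the matrix-product representation, rather than brute-force enumeration, is the practical route to several decimal places.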
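Lemma 4. For every $n \ge 1$, we have $\liminf_{k \to \infty} \pi_{nk}(\varepsilon)^{1/k} \ge e^{-H(\mathcal{P}_n)}$ a.e. ($\mu$).
Proof. First, recall that the entropy $h(\sigma^n, \mathcal{P}_n)$ of the partition $\mathcal{P}_n$ with respect to
the measure-preserving transformation $\sigma^n$ is defined by
$$h(\sigma^n, \mathcal{P}_n) = \lim_{k \to \infty} \frac{1}{k} H\Big(\bigvee_{i=0}^{k-1} \sigma^{-ni}\mathcal{P}_n\Big) \le H(\mathcal{P}_n);$$
see [23, Chapter 4] for details. By the Shannon–McMillan–Breiman theorem [17],
$$\lim_{k \to \infty} \frac{1}{k} \log \pi\Big(\bigvee_{i=0}^{k-1} \sigma^{-ni}\mathcal{P}_n, \varepsilon\Big) = -h(\sigma^n, \mathcal{P}_n) \quad\text{a.e. } (\mu).$$
But $\bigvee_{i=0}^{k-1} \sigma^{-ni}\mathcal{P}_n$ is a refinement of $\mathcal{P}_{nk}$, so $\pi_{nk}(\varepsilon) \ge \pi(\bigvee_{i=0}^{k-1} \sigma^{-ni}\mathcal{P}_n, \varepsilon)$, proving the
lemma.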
Conjecture 1. For any ergodic, shift-invariant probability measure $\mu$ on $\Sigma$, we
have
$$\lim_{n \to \infty} \pi_n(\varepsilon)^{1/n} = \alpha \quad\text{almost surely } (\mu). \tag{4}$$
For the special case in which $\mu$ is the product Bernoulli-$p$ measure, this is Theorem
1, and will be proved in Sections 4–8 below. A modification of the argument (which
we shall omit) shows that the conjecture is also true for any $\mu$ making the coordinate
process $\varepsilon_1, \varepsilon_2, \ldots$ a stationary $k$-step Markov chain.
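Proposition 3. For each ergodic, shift-invariant probability measure $\mu$ on $\Sigma$, the
dimension $\delta$ of $F_{\theta,\mu}$ satisfies
$$\delta \le \frac{\log \alpha}{\log \theta}.$$
Proof. By the Frostman lemma, it suffices to show that for $x$ in a set of
$F_{\theta,\mu}$-measure 1,
$$\liminf_{r \to 0} \frac{\log F_{\theta,\mu}(B_r(x))}{\log r} \le \frac{\log \alpha}{\log \theta}, \tag{5}$$
and, by a routine argument, it suffices to consider only $r = \kappa\theta^{nk}$ for some fixed
constant $\kappa > 0$, a fixed integer $n$, and $k = 1, 2, \ldots$. If $\varepsilon, \varepsilon'$ are in the same element of the
partition $\mathcal{P}_{nk}$, then $|x(\varepsilon) - x(\varepsilon')| \le \theta^{nk+1}/(1-\theta)$; consequently, if $\kappa = 2\theta/(1-\theta)$ then
for every $\varepsilon'$ in $U(nk, \varepsilon)$, we have $x(\varepsilon') \in B_r(x(\varepsilon))$. Therefore, $F_{\theta,\mu}(B_r(x(\varepsilon))) \ge \pi_{nk}(\varepsilon)$, and
so (5) follows from the two preceding lemmas.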
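Proposition 4. For any ergodic, shift-invariant measure $\mu$ on $\Sigma$ for which (4) holds
almost surely with respect to $\mu$, the Hausdorff dimension of $F_{\theta,\mu}$ satisfies
$$\delta = \frac{\log \alpha}{\log \theta}.$$
Proof. The inequality $\delta \le \log \alpha/\log \theta$ was proved in Proposition 3 above. Thus,
it is enough to prove the reverse inequality. By Frostman’s lemma, it suffices to show
that for all $x$ in a set of full $F_{\theta,\mu}$-measure,
$$\liminf_{r \to 0} \frac{\log F_{\theta,\mu}(B_r(x))}{\log r} \ge \frac{\log \alpha}{\log \theta}. \tag{6}$$
By a routine argument, it suffices to prove this for the sequence $r = \theta^n$.
Let $\varepsilon, \varepsilon'$ be arbitrary sequences of zeroes and ones. In order that $x(\varepsilon') \in B_r(x(\varepsilon))$
for $r = \theta^n$, it is necessary that $|x_n(\varepsilon') - x_n(\varepsilon)| \le \kappa\theta^n$, where $\kappa = 1 + 2/(1-\theta)$.
Consequently, for any fixed sequence $\varepsilon = \varepsilon_1 \varepsilon_2 \ldots$ of zeroes and ones and for any
$r = \theta^n$, we have
$$F_{\theta,\mu}(B_r(x(\varepsilon))) \le \rho_n(\varepsilon),$$
where
$$\rho_n(\varepsilon) = \mu\{\varepsilon' : |x_n(\varepsilon') - x_n(\varepsilon)| \le \kappa\theta^n\}.$$
Here $\varepsilon$ is fixed (nonrandom). Note that $\rho_n(\varepsilon) \ge \pi_n(\varepsilon)$.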
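Now let $\varepsilon = \varepsilon_1 \varepsilon_2 \ldots \in \Sigma$ be random, with distribution $\mu$. By the Chebyshev
inequality, for each $\eta > 0$ and each $n \ge 1$, we have
$$\mu\{\rho_n(\varepsilon) \ge (1+\eta)^n \pi_n(\varepsilon)\} \le (1+\eta)^{-n} E_\mu\left(\frac{\rho_n(\varepsilon)}{\pi_n(\varepsilon)}\right).$$
We shall argue below that there is a constant $C^* < \infty$ such that $E_\mu(\rho_n(\varepsilon)/\pi_n(\varepsilon)) \le C^*$
for all $n$. It will then follow that $\sum_n \mu\{\rho_n(\varepsilon) \ge (1+\eta)^n \pi_n(\varepsilon)\} < \infty$ for every $\eta > 0$,
and consequently, by the Borel–Cantelli lemma, that with $\mu$-probability 1, we have
$\rho_n(\varepsilon) \ge (1+\eta)^n \pi_n(\varepsilon)$ for at most finitely many $n$. But this will imply, with probability
1, that $\lim_{n \to \infty} \rho_n(\varepsilon)^{1/n} = \lim_{n \to \infty} \pi_n(\varepsilon)^{1/n} = \alpha$. Since $\rho_n(\varepsilon)$ is an upper bound for
$F_{\theta,\mu}(B_r(x(\varepsilon)))$, this will prove (6).
So consider $E_\mu(\rho_n(\varepsilon)/\pi_n(\varepsilon))$. By definition of $\rho_n(\varepsilon)$, we have
$$E_\mu\left(\frac{\rho_n(\varepsilon)}{\pi_n(\varepsilon)}\right) = \sum_{x_n(\varepsilon)} \sum_{x_n(\varepsilon')} \pi_n(\varepsilon'),$$
where the outer sum is over all possible values of $x_n(\varepsilon)$ and the inner sum is over those
values of $x_n(\varepsilon')$ such that $|x_n(\varepsilon') - x_n(\varepsilon)| \le \kappa\theta^n$ (only one representative sequence $\varepsilon'$ is
taken for each such value). But by Garsia’s lemma, each value of $x_n(\varepsilon')$ appears in the
inner sum for at most $C^*$ distinct $x_n(\varepsilon)$ for some $C^* < \infty$ independent of $n$. Therefore,
$$E_\mu\left(\frac{\rho_n(\varepsilon)}{\pi_n(\varepsilon)}\right) \le C^* \sum_{x_n(\varepsilon)} \pi_n(\varepsilon) = C^*.$$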
Proposition 5. Let $\mu$ be an ergodic, shift-invariant probability measure on $\Sigma$. If
$F_{\theta,\mu}$ is singular with respect to Lebesgue measure, then $\delta < 1$.
Proof. This is an adaptation of the arguments of [12] and [18]. By Proposition
3, it suffices to show that $-\log \alpha < -\log \theta$. For this it suffices to show that
$$H(\mathcal{P}_n)/n < -\log \theta \quad\text{for some } n \ge 1,$$
because (see Lemma 3 and its proof) $H(\mathcal{P}_n)$ is a subadditive sequence such that
$H(\mathcal{P}_n)/n$ converges to $-\log \alpha$ as $n \to \infty$.
Recall that the elements of the partition $\mathcal{P}_n$ are in 1–1 correspondence with the
possible values of the sum $x_n(\varepsilon) = \sum_{j=1}^{n} \varepsilon_j \theta^j$, where $\varepsilon_1 \varepsilon_2 \ldots \varepsilon_n$ is a 0–1 sequence. Hence,
by Garsia’s lemma, there are at most $C'\theta^{-n}$ elements of $\mathcal{P}_n$, for some constant $C' < \infty$
independent of $n$. Now the hypothesis that $F_{\theta,\mu}$ is singular, together with Garsia’s
lemma, implies (by the same argument as in [18]) that the probability measure $\mu \,|\, \mathcal{P}_n$
is highly concentrated on a subset of $\mathcal{P}_n$ of much smaller size than the cardinality
of $\mathcal{P}_n$: in particular, for every $\eta > 0$ there exists an $n$ and a subset $\mathcal{Q}_n \subset \mathcal{P}_n$ such that
$$\#\mathcal{Q}_n / \#\mathcal{P}_n < \eta \quad\text{and}\quad \mu\Big(\bigcup_{F \in \mathcal{Q}_n} F\Big) = 1 - \rho > 1 - \eta.$$
Consequently, with $\bar\rho = 1 - \rho$, we have
$$\begin{aligned}
H(\mathcal{P}_n) &= -\sum_{F \in \mathcal{Q}_n} \mu(F) \log \mu(F) - \sum_{F \in \mathcal{P}_n - \mathcal{Q}_n} \mu(F) \log \mu(F) \\
&\le \bar\rho \log(\#\mathcal{Q}_n) - \bar\rho \log \bar\rho + \rho \log(\#(\mathcal{P}_n - \mathcal{Q}_n)) - \rho \log \rho \\
&\le \log \theta^{-n} + \bar\rho \log \eta + \log C' - \bar\rho \log \bar\rho - \rho \log \rho.
\end{aligned}$$
Here we have used the estimates $\#\mathcal{P}_n \le C'\theta^{-n}$ and $\#\mathcal{Q}_n \le C'\eta\theta^{-n}$, and also the fact that
for a given partition, the probability measure that maximizes entropy is the uniform
distribution on the partition. Now $\rho < \eta$ and $\eta > 0$ may be chosen arbitrarily small.
Since $x \log x + (1-x) \log(1-x) \to 0$ as $x \to 0$, the last two terms of the upper bound
may be made arbitrarily small. The term $\log C'$ is independent of $\eta$ and $\rho$. Hence, by
choosing $0 < \rho < \eta$ very small, we may make $\bar\rho \log \eta + \log C'$ much less than 0. It
follows that for sufficiently large $n$, we have $H(\mathcal{P}_n) < -n \log \theta$.
4. Equivalent sequences
In Section 2 we reduced the problem of computing the Hausdorff dimension of the
measure Fθ,p to that of computing the entropy α, which is essentially the same as the
problem of estimating probabilities of equivalence classes. Thus, we must find an
effective way to tell when two sequences are equivalent. Recall that for fixed $\theta$ the
equivalence relation is defined as follows: two sequences $\varepsilon$ and $\varepsilon'$ of zeroes and ones
of length $n$ are equivalent if and only if $x_n(\varepsilon) = x_n(\varepsilon')$. More generally, we say that
two length-$n$ sequences $\varepsilon, \varepsilon'$ of integers are equivalent if and only if $\sum_{k=1}^{n} \varepsilon_k \theta^k = \sum_{k=1}^{n} \varepsilon'_k \theta^k$.
Let $p(z)$ be the minimal polynomial of $\beta = 1/\theta$, and assume that it has degree $m$
and leading coefficient 1. The relation $p(\beta) = 0$ may be rewritten by moving all terms
with negative coefficients to the other side, yielding an identity between two
polynomials in $\beta$ of degree at most $m$ with nonnegative integer coefficients. This equation
translates to an equivalence between two distinct sequences $\varepsilon$ and $\varepsilon'$ of nonnegative
integers of length $m+1$: we shall call this equivalence the fundamental relation. Thus,
for example, if $\beta$ is the golden ratio, the fundamental relation is
100 ∼ 011,
reflecting the fact that the minimal polynomial of the golden ratio is $x^2 - x - 1$. Note
that there are Pisot numbers $\beta$ between 1 and 2 such that the sequences in the
fundamental relation have entries other than 0 and 1: for instance, the leading root
$\beta = 1.755\ldots$ of the cubic $p(x) = x^3 - 2x^2 + x - 1$ is a Pisot number with fundamental
relation 1010 ∼ 0201. But observe that if $\varepsilon \sim \varepsilon'$ is the fundamental relation, then the
first entry of $\varepsilon$ is always 1 and the first entry of $\varepsilon'$ is 0.
Let $\gamma = \gamma_1 \gamma_2 \ldots \gamma_n$ and $\gamma' = \gamma'_1 \gamma'_2 \ldots \gamma'_n$ be arbitrary sequences of integers, and let
$\varepsilon \sim \varepsilon'$ be the fundamental relation. We shall say that $\gamma'$ can be obtained from $\gamma$ by
applying the fundamental relation $k$ times in the $(m+1)$-block starting at the $l$th entry
if $\gamma'_j = \gamma_j$ for all $j$ except $j = l, l+1, \ldots, l+m$, and $\gamma'_{l+j} + k\varepsilon_j = \gamma_{l+j} + k\varepsilon'_j$ for all $j = 0,
1, \ldots, m$. Here $\varepsilon = \varepsilon_0 \varepsilon_1 \ldots \varepsilon_m$ and $\varepsilon' = \varepsilon'_0 \varepsilon'_1 \ldots \varepsilon'_m$, and $k$ may be any integer. In general,
if a 0–1 sequence $\varepsilon$ of arbitrary length $n \ge m+1$ may be obtained from another
sequence $\varepsilon'$ of the same length by repeatedly applying the fundamental relation to
various $(m+1)$-blocks (strings of $m+1$ consecutive entries), then $\varepsilon \sim \varepsilon'$. For example,
when $\beta$ is the golden ratio the sequences 100011 and 011100 are equivalent, because
one may obtain the second from the first by the chain of substitutions
100011 → 100100 → 011100.
Proposition 6. Two sequences $\varepsilon$ and $\varepsilon'$ of length $n$ are equivalent if and only if $\varepsilon'$
may be obtained from $\varepsilon$ by applying the fundamental relation repeatedly to blocks of
length $m+1$, starting at the left and ending at the right.
The stipulation that the substitutions be made left to right will be crucial.
However, it should be noted that in general it is not possible to make the substitutions
in order and have all of the intermediate sequences be sequences of zeroes and ones,
even when the sequences $\varepsilon, \varepsilon'$ in the fundamental relation $\varepsilon \sim \varepsilon'$ are 0–1 sequences.
For example, when the fundamental relation is 100 ∼ 011 the sequences 10100 and
01111 are equivalent, the chain of substitutions being
10100 → 01200 → 01111.
Moreover, the proposition does not state that the fundamental relation is applied
only once at each block of length $m+1$: in fact, it may be that one must use it more
than once in a given direction, for example, changing 200 to 0(–1)(–1). Later we shall
prove certain restrictions on the substitutions that can occur at a given $(m+1)$-block.
Finally, one should bear in mind that the sequences are merely shorthand for sums
$\sum_{j=1}^{k} \varepsilon_j \theta^j$.
Proof. The sequences $\varepsilon$ and $\varepsilon'$ are equivalent if and only if $\theta$ satisfies the
polynomial equation
$$\sum_{k=1}^{n} \delta_k x^k = 0,$$
where $\delta_k = \varepsilon_k - \varepsilon'_k$. This happens if and only if $\beta$ is a root of the polynomial
$f(x) = \sum_{k=1}^{n} \delta_k x^{n-k}$. Since $p(x)$ is the minimal polynomial of $\beta$, it follows that $p$ divides $f$ in
the polynomial ring $\mathbb{Z}[x]$, that is, there exists a polynomial $g(x) = \sum_{j=0}^{n-m} b_j x^j$ with
integer coefficients $b_j$ such that $f = pg$.
The coefficients of $g(x)$ provide the schedule of substitutions. The $j$th coefficient
$b_j$ tells how many times to apply the fundamental relation to the $(m+1)$-block starting
at the $(j+m+1)$th entry from the right; the sign ($+$ or $-$) tells whether the
fundamental relation should be applied in the forward or the backward direction.
This is best illustrated by a simple example: let the fundamental relation be 100 ∼ 011
(with $\beta$ = golden ratio), and let $\varepsilon = 10100$ and $\varepsilon' = 01111$. Then $f(x) = x^4 - x^3 - x - 1$
and $g(x) = x^2 + 1$. The quotient polynomial $g$ has coefficients $+1$ in the 0th and 2nd
positions; these indicate that the substitution 100 → 011 should be made once to the
block at the extreme left and once to the block two positions in, that is,
10100 → 01200 → 01111.
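The schedule of substitutions described above can be sketched as integer long division by the monic polynomial $p$. In the following Python sketch the function name `canonical_transformation` and its argument layout are illustrative assumptions; one forward application of the fundamental relation $\varepsilon \to \varepsilon'$ subtracts $\varepsilon - \varepsilon'$, which is the coefficient vector of $p$, from the current block.

```python
def canonical_transformation(eps, eps_prime, p_coeffs):
    """Left-to-right substitution schedule taking eps to eps_prime.

    p_coeffs lists the monic minimal polynomial, highest degree first;
    for the golden ratio x^2 - x - 1 this is [1, -1, -1].
    Returns the quotient coefficients (highest degree first) and the list
    of sequences produced, starting with eps itself.
    """
    m = len(p_coeffs) - 1
    # Coefficients of f(x) = sum_k delta_k x^(n-k), highest degree first.
    remainder = [a - b for a, b in zip(eps, eps_prime)]
    current = list(eps)
    steps = [list(current)]
    quotient = []
    for start in range(len(eps) - m):
        b = remainder[start]               # next quotient coefficient (p is monic)
        quotient.append(b)
        for i, c in enumerate(p_coeffs):
            remainder[start + i] -= b * c  # long-division step
            current[start + i] -= b * c    # b applications of the relation
        if b != 0:
            steps.append(list(current))
    if any(remainder):
        raise ValueError("sequences are not equivalent: p does not divide f")
    return quotient, steps


quotient, steps = canonical_transformation([1, 0, 1, 0, 0], [0, 1, 1, 1, 1], [1, -1, -1])
print(quotient)   # [1, 0, 1], the coefficients of g(x) = x^2 + 1
for s in steps:
    print("".join(map(str, s)))   # 10100, 01200, 01111
```

A nonzero remainder signals that the two sequences are not equivalent, which gives a simple exact test of equivalence that avoids floating-point comparison of the sums.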
That the quotient polynomial $g(x)$ does in fact provide a left-right sequence of
substitutions in every case is easily proved by induction on $n$. We prove a more general
statement: if $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$ are equivalent sequences of integers
(meaning that $\sum_{j=1}^{n} (u_j - v_j)\theta^j = 0$) then $g(x) = f(x)/p(x)$, with $f(x) = \sum_{j=1}^{n} (u_j - v_j)x^{n-j}$, provides a correct left-right
sequence of substitutions. The statement is clearly true if $n = m+1$, because in this
case $f$ is an integer multiple of $p$, as $p$ is the minimal polynomial of $\beta$. Suppose it is
true whenever $n < N$, and let $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$ be equivalent sequences of
length $n = N$. If the leading term of $g(x)$ is $b_{n-m-1}$ then clearly $u_1 - v_1 = b_{n-m-1}$. (Note
that here $\beta$ is an algebraic integer; this guarantees that the leading term in the
minimal polynomial has coefficient 1.) Consequently, if one makes $b_{n-m-1}$ substitutions
of the fundamental relation in the leading $(m+1)$-block of $u_1, u_2, \ldots, u_n$ one obtains
an equivalent sequence $w_1, w_2, \ldots, w_n$ whose leading entry is $v_1$. But $w_1, w_2, \ldots, w_n$ and
$v_1, v_2, \ldots, v_n$ are equivalent sequences with the same first entry, so $w_2, w_3, \ldots, w_n$ and
$v_2, v_3, \ldots, v_n$ are equivalent sequences of length $n-1$. The induction hypothesis now
implies the result.
Corollary 1. Suppose that $\varepsilon \sim \varepsilon'$ are equivalent sequences of length $n$. Let
$f(x) = \sum_{k=1}^{n} \delta_k x^{n-k}$, where $\delta_k = \varepsilon_k - \varepsilon'_k$, and let $g(x) = \sum_{j=0}^{n-m} b_j x^j = f(x)/p(x)$. Then $\varepsilon'$
may be obtained from $\varepsilon$ by applying the fundamental relation $b_j$ times to the $(m+1)$-block, starting at the $(j+m+1)$th entry from the right, in the order $j = n-m,
n-m-1, \ldots, 0$.
We shall call the sequence of transformations described in this corollary the
canonical transformation taking $\varepsilon$ to $\varepsilon'$. It produces $(n-m-1)$ intermediate sequences
$$\varepsilon^{(1)}, \varepsilon^{(2)}, \ldots, \varepsilon^{(n-m-1)}.$$
In modifying $\varepsilon^{(i)}$ to obtain $\varepsilon^{(i+1)}$, only entries in the $(m+1)$-block starting at the
$(i+1)$th entry are changed; since these $(m+1)$-blocks move left to right one unit at
a time, it follows that
(a) the first $i$ entries of $\varepsilon^{(i)}$ agree with the corresponding entries of $\varepsilon'$;
(b) the final $n-i-m$ entries of $\varepsilon^{(i)}$ agree with the corresponding entries of $\varepsilon$.
In particular, only those entries of $\varepsilon^{(i)}$ in the $m$-block starting at the $(i+1)$th entry can
be different from 0 or 1. Neither Proposition 6 nor Corollary 1 implies that the entries
of $m$-blocks arising in intermediate sequences of canonical transformations are
bounded: in principle, there could be infinitely many possible such $m$-blocks. In the
next section, we shall show that in fact there are only finitely many possibilities, and
give an algorithm for identifying them.
5. Admissible m-blocks
Define an admissible m-block to be an m-block (a sequence of m integers) that
occurs in some intermediate sequence in a canonical transformation of some sequence
$\varepsilon$ of zeroes and ones to an equivalent sequence $\varepsilon'$ of zeroes and ones, and define
$$\mathcal{B} = \{\text{admissible } m\text{-blocks}\}.$$
The matrices $M_0, M_1$ that will appear in the random matrix products used to
characterize the entropy $\alpha$ (see Theorems 3 and 4 below) will have rows and columns
indexed by the admissible $m$-blocks. In this section we shall verify that $\mathcal{B}$ is finite and
give an effective procedure for identifying its elements.
Proposition 7. $|\mathcal{B}| < \infty$.
We know two proofs, one based on Garsia’s lemma, the other on the following
result, a special case of Proposition 2.5 of [10] (which is attributed to J. P. Bezevin).
We reproduce the proof, because it provides an explicit bound for the size of entries
of admissible m-blocks (see Corollary 2 below).
Proposition 8. Assume that $\beta$ is a Pisot number. Then there exists a positive
integer $C < \infty$ with the following property: for any polynomial $f(x)$ with coefficients 0,
1, $-1$ such that $p \mid f$ in the ring $\mathbb{Z}[x]$, the coefficients $b_j$ of the quotient polynomial
$g(x) = f(x)/p(x) = \sum_{j=0}^{n-m} b_j x^j$ are bounded in modulus by $C$. Furthermore, if $\beta = \beta_1,
\beta_2, \ldots, \beta_m$ are the roots of $p(x)$, then we may take $C = [C']$, where $[\cdot]$ denotes greatest
integer and
$$C' = (1-\beta^{-1})^{-1} \prod_{i=2}^{m} (1-|\beta_i|)^{-1}. \tag{7}$$
Note. Since the coefficients $b_j$ of any such quotient polynomial are integers, it
follows that they are elements of the finite set $\{-C, -C+1, \ldots, C-1, C\}$.
Proof. The argument, in brief, is as follows. Say that a polynomial $P(x)$ has the
bounded division property if for each real $a > 0$ there exists a constant $C_a < \infty$ such
that for any polynomial $F(x) = \sum_{i=0}^{n} a_i x^i \in \mathbb{C}[x]$ with coefficients $a_i$ bounded in
modulus by $a$, if $P(x) \mid F(x)$ in the ring $\mathbb{C}[x]$, then the coefficients of the quotient
polynomial $F(x)/P(x)$ are bounded in modulus by $C_a$. Call the assignment $a \mapsto C_a$ an
expansion function for $P(x)$. It is easily seen that if $P_1(x)$ and $P_2(x)$ both have the
bounded division property, then so does their product $P_1 P_2$, and the product has
expansion function $a \mapsto C(a)$ satisfying $C(a) \le C^{(1)} \circ C^{(2)}(a)$, where $C^{(1)}(\cdot)$ and $C^{(2)}(\cdot)$
are expansion functions for $P_1$ and $P_2$, respectively. It is also easily seen that any linear
polynomial of the form $x - \alpha$ has the bounded division property if and only if
$|\alpha| \ne 1$, and that in this case an expansion function is
$$C_a = \begin{cases} (1-|\alpha|)^{-1} a & \text{if } |\alpha| < 1, \\ (1-|\alpha|^{-1})^{-1} a & \text{if } |\alpha| > 1. \end{cases}$$
Therefore, a polynomial $P(x) \in \mathbb{C}[x]$ has the bounded division property if and only if
it has no roots on the unit circle.
Since $\beta$ is a Pisot number, its minimal polynomial $p(x)$ has no roots on the unit
circle. Thus, it has the bounded division property. In fact, $\beta = \beta_1 > 1$ is the only root
outside the unit circle, so the results of the previous paragraph imply that the bound
(7) holds for the constant $C = C_1$.
Corollary 2. Assume that $\beta$ is a Pisot number. Let $H$ be the maximum of the
absolute values of the coefficients of the minimal polynomial $p(x)$. Then the entries of
admissible $m$-blocks are bounded in modulus by $C^*$, where
$$C^* = CHm + CH + 1; \tag{8}$$
here $C$ is the constant in (7) and $m$ is the degree of the minimal polynomial $p(x)$ of $\beta$.
Proof. Consider the sequence of transformations specified in Corollary 1. The
fundamental relation is applied $b_{n-m}$ times at the leftmost $(m+1)$-block of $\varepsilon$; then
$b_{n-m-1}$ times at the leftmost-but-one $(m+1)$-block of the resulting sequence; etc. The
integers $b_j$ are the coefficients of a quotient polynomial $f/p$, where $f$ has all coefficients
in $\{0, -1, 1\}$, so by Proposition 8 each $b_j$ satisfies $|b_j| \le C$. Applying the fundamental
relation once (either in the forward direction or the backward direction) changes
entries by at most $H$ (in absolute value), so applying it $b_j$ times changes entries by at
most $CH$. Moreover, an entry is changed only if its position is in the current $(m+1)$-block. Since no position is in the current $(m+1)$-block more than $m+1$ times, it
follows that the maximum amount by which it can be modified is no more than
$CH(m+1)$. Therefore, since the entries of the original sequence $\varepsilon$ are either 0 or 1,
entries in intermediate sequences cannot be more than $1 + CH(m+1)$ or less than
$-CH(m+1)$.
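The constants in (7) and (8) are easy to evaluate numerically. The following Python sketch is limited to degrees 2 and 3, where the conjugates can be obtained by deflating $p$ by its root $\beta$ and solving the remaining linear or quadratic factor; the function name `bound_constants` is illustrative. One assumption should be flagged: the sketch evaluates $C'H(m+1) + 1$, that is, (8) with $C'$ in place of its integer part $C = [C']$, because for the three polynomials tried below this variant reproduces the corresponding $C^*(\beta)$ entries of Table 1 to the printed accuracy.

```python
import cmath


def bound_constants(p_coeffs):
    """Return (beta, C', C*) for a monic Pisot polynomial of degree 2 or 3.

    p_coeffs lists the coefficients, highest degree first.  C' follows (7);
    C* is evaluated as C'*H*(m + 1) + 1, i.e. (8) with C' in place of its
    integer part (an assumption of this sketch, see the lead-in).
    """
    m = len(p_coeffs) - 1
    if m not in (2, 3):
        raise ValueError("this sketch only handles degree 2 or 3")

    def p(x):
        value = 0.0
        for c in p_coeffs:
            value = value * x + c
        return value

    lo, hi = 1.0, 2.0            # p(1) < 0 < p(2) for a Pisot beta in (1, 2)
    for _ in range(200):
        mid = (lo + hi) / 2
        if p(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2

    # Deflate: p(x) = (x - beta) * q(x), by synthetic division.
    q = [float(p_coeffs[0])]
    for c in p_coeffs[1:-1]:
        q.append(c + beta * q[-1])
    if m == 2:
        conjugates = [-q[1] / q[0]]
    else:
        disc = cmath.sqrt(q[1] ** 2 - 4 * q[0] * q[2])
        conjugates = [(-q[1] + disc) / (2 * q[0]), (-q[1] - disc) / (2 * q[0])]

    c_prime = 1 / (1 - 1 / beta)
    for z in conjugates:
        c_prime /= 1 - abs(z)
    height = max(abs(c) for c in p_coeffs)
    return beta, c_prime, c_prime * height * (m + 1) + 1


for coeffs in ([1, -1, -1], [1, 0, -1, -1], [1, -2, 1, -1]):
    beta, c_prime, c_star = bound_constants(coeffs)
    print(coeffs, round(beta, 3), round(c_prime, 2), round(c_star, 1))
```

The rapid growth of these bounds when a conjugate lies close to the unit circle is what makes a brute-force search over all $m$-tuples unattractive.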
Next, we discuss the problem of explicitly enumerating the set $\mathcal{B}$ of admissible
$m$-blocks. By Corollary 2, $\mathcal{B}$ is contained in the set of $m$-tuples with entries in
$\{-C^*, -C^*+1, \ldots, C^*\}$. Unfortunately, even for Pisot numbers of small degree $m$, $(C^*)^m$
may be fairly large compared to the cardinality of $\mathcal{B}$ (see the table below) and so a
brute force search of the set of all such $m$-tuples may be needlessly lengthy. A much
more useful test seems to be that provided by the following lemma.
Lemma 5. If $\varepsilon = \varepsilon_1 \varepsilon_2 \ldots \varepsilon_m$ is an admissible $m$-block, then
m
k(1kθ)−" εi θi θm(1kθ)−".
i="
Proof. If an application of the fundamental relation changes a sequence
$\delta_1 \delta_2 \ldots \delta_n$ of integers to an equivalent sequence $\delta'_1 \delta'_2 \ldots \delta'_n$, then $\sum_j \delta_j \theta^j = \sum_j \delta'_j \theta^j$. Now
recall that admissible m-blocks occur only in intermediate sequences of canonical
transformations of equivalent sequences of zeroes and ones, and that in these
canonical transformations, the fundamental relation is applied repeatedly, left to
right. Consequently, if ε l ε ε … εm is an admissible m-block such that m
ε θj j=" j
"#
j, then the ‘ excess ’ m ε θ jkm θ j cannot be greater than _ θ j, because
m
θ
j=!
j="
j=" j
j="
this excess must eventually be ‘ transferred ’ to the right of the m-block. Similarly, if
m
ε θ j 0, then this ‘ deficiency ’ cannot be less than k_
θ j, because it must
j=!
j=" j
eventually be compensated by the terms to the right of the m-block.
Thus, $\mathcal{A} \subseteq \mathcal{B}$, where $\mathcal{B}$ is the set of all $m$-blocks with entries bounded in modulus
by $C_*$ such that the inequalities of Lemma 5 are satisfied. Let $b = b_1b_2\cdots b_m$ and
$b' = b_1'b_2'\cdots b_m'$ be elements of $\mathcal{B}$. Write
$$b \hookrightarrow b'$$
if $b'$ can be obtained from $b$ by (1) appending either a 0 or a 1 to the end of $b$, then
(2) applying the fundamental relation to the resulting $(m+1)$-block either $-b_1$ or
$-b_1+1$ times, and finally (3) deleting the entry at the beginning of the transformed
$(m+1)$-block. For example, if the fundamental relation is $100 \sim 011$, then $20 \hookrightarrow 12$
($20 : 201 : 112 : 12$), but $21 \not\hookrightarrow 01$. Observe that for a given $m$-block $b$ there are at most
four $m$-blocks $b'$ such that $b \hookrightarrow b'$.
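Lemma 6. An element $b$ of $\mathcal{B}$ is an element of $\mathcal{A}$ if and only if there is a finite chain
$$b^{(1)} \hookrightarrow b^{(2)} \hookrightarrow \cdots \hookrightarrow b \hookrightarrow \cdots \hookrightarrow b^{(K)}$$
in which both endpoints $b^{(1)}$ and $b^{(K)}$ are sequences of zeroes and ones.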
Table 1

β        p(x)                         $|\mathcal{A}|$    $C_*(\beta)$
1.618    $x^2 - x - 1$                8        21.6
1.325    $x^3 - x - 1$                200      949.5
1.466    $x^3 - x^2 - 1$              68       417.0
1.755    $x^3 - 2x^2 + x - 1$         28       310.5
1.839    $x^3 - x^2 - x - 1$          14       128.1
1.380    $x^4 - x^3 - 1$              1702     28288.8
1.866    $x^4 - 2x^3 + x - 1$         92       3535.6
1.905    $x^4 - x^3 - 2x^2 + 1$       154      4790.6
1.928    $x^4 - x^3 - x^2 - x - 1$    24       1398.2
Proof. This is a direct consequence of Corollary 1 and the discussion following.

Write $b \overset{r}{\hookrightarrow} b'$ if $b' \hookrightarrow b$. Note that $b \overset{r}{\hookrightarrow} b'$ if and only if $b'$ can be obtained from
$b$ by (1) prepending either a 0 or a 1 to the beginning of $b$, then (2) applying
the fundamental relation to the resulting $(m+1)$-block either $-b_m$ or $-b_m+1$
times, and finally (3) deleting the entry at the end of the transformed $(m+1)$-block.
Again, for a given $m$-block $b$ there are at most four $m$-blocks $b'$ such that
$b \overset{r}{\hookrightarrow} b'$. Moreover, the conclusion of the preceding lemma holds with all the arrows
replaced by $\overset{r}{\hookrightarrow}$.
We may now give an algorithm for determining the admissible m-blocks.
A. Set h to be the set of all m-blocks with 0 –1 entries. Update h by
appending to it the set of all bh ? such that b bh for some b ? h. Continue
updating h in this manner until it stabilizes. Next, set d to be the set of all 0, 1
r
m-blocks. Update d by appending to it the set of all bh ? E h such that b bh
for some b ? d. Continue updating d in this manner until it stabilizes. Finally,
l d.
Table 1 shows all Pisot numbers between 1 and 2 of degree at most 4 whose
minimal polynomial has coefficients at most 2 in absolute value, together with the
cardinality of $\mathcal{A}$ and the value of $C_*$.
The following result will not be needed in the analysis to follow, but it is essentially
equivalent to Proposition 7.
Proposition 9. There is a deterministic finite automaton such that the language
it accepts is the set of all finite sequences $\gamma = \gamma_1\gamma_2\cdots\gamma_n$ with entries $\gamma_i = (\varepsilon_i, \varepsilon_i')$ such
that $\varepsilon \sim \varepsilon'$, where
$$\varepsilon = \varepsilon_1\varepsilon_2\cdots\varepsilon_n, \qquad \varepsilon' = \varepsilon_1'\varepsilon_2'\cdots\varepsilon_n'.$$
See [13, Section 2.2] for the definition of a finite automaton and the language it
accepts.

6. Counting equivalence classes by matrix multiplication
We shall now show that the cardinality of the equivalence class of a given
sequence of zeroes and ones may be represented as a certain natural matrix product.
Two matrices, which we shall call $M_0$ and $M_1$, are involved in these products. The
rows and columns are indexed by the set $\mathcal{A}$ of admissible $m$-blocks. By the results of
the preceding section, if $\beta = \beta_m$ is one of the simple Pisot numbers, then the admissible
$m$-blocks are the sequences of length $m$ with entries in $\{-1, 0, 1, 2\}$, subject to the
restrictions that (1) any 2 that occurs must be followed only by zeroes and preceded
only by ones; and (2) any $-1$ that occurs must be followed only by ones and preceded
only by zeroes. Note that for the case of the golden ratio $\beta_2$ there are eight such
sequences,

(0, 0),  (1, 0),  (0, 1),  (1, 1),  (1, 2),  (0, −1),  (2, 0),  (−1, 1).
In general, entries of the matrices $M_i$ are zeroes and ones; they indicate whether
transitions between the various $m$-blocks can be made when $i$ is the entry immediately
to the right of the current $m$-block. Allowable transitions may be described as follows.
(1) If the leading entry of the current $m$-block is a 0 or 1, the transition that
concatenates $i$ to the $m$-block at the end and deletes the initial entry is allowable.
Thus, for example, when $\beta = \beta_2$, the transition $(1, 0) \to (0, i)$ is allowable. (2) A
transition that concatenates $i$ to the $m$-block at the end, then applies the fundamental
relation an integral number of times to the resulting $(m+1)$-block, then deletes the
initial entry is allowable if and only if the deleted entry is a 0 or 1 and the resulting
$m$-block is admissible. For example, when $\beta = \beta_2$, if $i = 1$ then the transition
$(2, 0) \to (1, 2)$ is allowable, but $(2, 0) \to (0, 1)$ is not, because the deleted initial entry
in the latter case would be a 2, nor is $(1, 1) \to (2, 2)$, because $(2, 2)$ is not an admissible
$m$-block.
Note that although the matrices are large (8 by 8 in the case of the golden ratio
and correspondingly larger for the other $\beta_m$) they are ‘sparse’: for each block $b$ and
each $i = 0, 1$ there are at most two blocks $b'$ such that the entry $M_i(b, b')$ is 1. Thus,
to describe $M_i$ it is easier to make a list of allowable transitions than to write out the
entire matrix.
Example. $\beta = \beta_2$.
Table 2 shows the allowable transitions.

Table 2

From        To (under $M_0$)       To (under $M_1$)
(0, 0)      (0, 0)                 (0, 1)
(0, 1)      (1, 0), (0, −1)        (1, 1), (0, 0)
(1, 0)      (0, 0), (1, 1)         (0, 1), (1, 2)
(1, 1)      (1, 0)                 (1, 1)
(2, 0)      (1, 1)                 (1, 2)
(1, 2)      (2, 0)                 (none)
(0, −1)     (none)                 (−1, 1)
(−1, 1)     (0, −1)                (0, 0)
Notice that in $M_0$ there are no allowable transitions from the $m$-block $(0, -1)$.
This is because a $-1$ must always be followed by a 1 in an $m$-block. But notice also
that there is an allowable transition from $(0, -1)$ in the matrix $M_1$.
Proposition 10. Let $\varepsilon = \varepsilon_1\varepsilon_2\cdots\varepsilon_{n+m}$ be an arbitrary sequence of zeroes and ones.
Then the number of sequences of zeroes and ones equivalent to $\varepsilon$ and ending in the
$m$-block $b_1$ is the $(b_0, b_1)$th entry of the matrix product
$$M_{\varepsilon_{m+1}} M_{\varepsilon_{m+2}} \cdots M_{\varepsilon_{n+m}},$$
where $b_0 = \varepsilon_1\varepsilon_2\cdots\varepsilon_m$.

It obviously follows that the total number of 0–1 sequences equivalent to $\varepsilon$
is the sum over all $m$-blocks $b_1$ with 0–1 entries of the $(b_0, b_1)$th entries of the matrix product
$M_{\varepsilon_{m+1}} M_{\varepsilon_{m+2}} \cdots M_{\varepsilon_{n+m}}$.
Proof. We prove a slightly more general statement, specifically, that for any
admissible $m$-blocks $b_0, b_1$ the $(b_0, b_1)$th entry of $M_{\varepsilon_{m+1}} M_{\varepsilon_{m+2}} \cdots M_{\varepsilon_{n+m}}$ is the number of
allowable left-right transformations of the sequence $b_0\varepsilon_{m+1}\varepsilon_{m+2}\cdots\varepsilon_{n+m}$ ending in a
sequence whose first $n$ entries are zeroes and ones and whose last $m$ entries are the
$m$-block $b_1$. The proof is by induction on $n \ge 1$. The case $n = 1$ is easily checked: the
matrices $M_0$ and $M_1$ were defined in such a way that this would be true.

The induction step is also easy. Assume that it is true for all integers less than $n$
and let $\varepsilon = \varepsilon_1\varepsilon_2\cdots\varepsilon_{n+m}$ be a sequence of zeroes and ones. Then we know by the
induction hypothesis that the number of canonical (left-right) transformations of the
sequence $b_0\varepsilon_{m+1}\varepsilon_{m+2}\cdots\varepsilon_{n+m-1}$ that move the ‘cursor’ $n-1$ steps to the right and result
in a sequence beginning with $n-1$ zeroes and ones and ending in a given block $b$ is the
$(b_0, b)$th entry of $M_{\varepsilon_{m+1}} \cdots M_{\varepsilon_{n+m-1}}$. To obtain the number of canonical transformations
of $b_0\varepsilon_{m+1}\varepsilon_{m+2}\cdots\varepsilon_{n+m}$ that move the cursor $n$ steps to the right and transform the final
$m$-block to $b_1$, partition the count by the possible values of the next-to-last $m$-block
$b$ (when the cursor is $n-1$ units to the left). Regardless of the steps taken to reach $b$
the number of ways to go from $b$ to $b_1$ is given by the $(b, b_1)$th entry of $M_{\varepsilon_{n+m}}$, again
by the induction hypothesis. (Notice that, once the first $n-1$ steps of the substitution
sequence have been made, the first $n-1$ entries of the resulting transformed sequence
play no further role, since the cursor has now moved to their right. Therefore, even
though different substitution sequences may result in transformed sequences with
different initial $(n-1)$-blocks, as long as they result in the same $m$-block $b$ to the right
of the cursor the number of ways to complete the transformation will be the same for
each.) Finally, summing over all possible $m$-blocks $b$ and using the definition of
matrix multiplication one obtains the desired equality, completing the inductive phase
of the proof.
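
The counting statement of Proposition 10 can be checked numerically for the golden ratio. The sketch below is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, the transition rule is coded directly from the description of allowable transitions (relation $100 \sim 011$), and equivalence of two 0–1 sequences of the same length is taken to mean equality of the sums $\sum_j \varepsilon_j\theta^j$, tested exactly through $\beta^2 = \beta + 1$.

```python
from itertools import product

# Admissible 2-blocks for the golden ratio, as listed in Section 6.
BLOCKS = [(0, 0), (1, 0), (0, 1), (1, 1), (1, 2), (0, -1), (2, 0), (-1, 1)]
INDEX = {block: k for k, block in enumerate(BLOCKS)}
ZERO_ONE = [block for block in BLOCKS if set(block) <= {0, 1}]


def counting_matrix(i):
    """0-1 matrix M_i: allowable transitions when the next entry is i."""
    size = len(BLOCKS)
    matrix = [[0] * size for _ in range(size)]
    for (b1, b2) in BLOCKS:
        for leading in (0, 1):          # the deleted leading entry must be 0 or 1
            k = b1 - leading            # net number of rewrites 100 -> 011
            new_block = (b2 + k, i + k)
            if new_block in INDEX:
                matrix[INDEX[(b1, b2)]][INDEX[new_block]] = 1
    return matrix


def class_size(eps):
    """Size of the equivalence class of eps via the matrix product."""
    size = len(BLOCKS)
    row = [0] * size
    row[INDEX[tuple(eps[:2])]] = 1      # start in b_0 = eps_1 eps_2
    for i in eps[2:]:
        matrix = counting_matrix(i)
        row = [sum(row[a] * matrix[a][b] for a in range(size)) for b in range(size)]
    return sum(row[INDEX[block]] for block in ZERO_ONE)


def fingerprint(eps):
    """Exact key for sum_j eps_j theta^j: scale by beta^n, reduce with beta^2 = beta + 1."""
    a, b = 0, 0                         # current value is a + b*beta
    for entry in eps:
        a, b = b + entry, a + b         # multiply by beta, then add the entry
    return a, b


def brute_force_class_size(eps):
    target = fingerprint(eps)
    return sum(1 for other in product((0, 1), repeat=len(eps))
               if fingerprint(other) == target)


if __name__ == "__main__":
    sample = (1, 0, 1, 0, 0)
    print(class_size(sample), brute_force_class_size(sample))
```

The final sum runs only over end blocks with 0–1 entries, since a transformation that ends in a block containing a 2 or a −1 does not produce a sequence of zeroes and ones.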
It now follows that the asymptotic behavior of the random variable $\pi_n(\varepsilon)$ when
$\varepsilon_1, \varepsilon_2, \ldots$ is a sequence of i.i.d. Bernoulli-$\tfrac12$ random variables is determined by that of
the random matrix product
$$\Pi_n = M_{\varepsilon_{m+1}} M_{\varepsilon_{m+2}} \cdots M_{\varepsilon_n}.$$
The asymptotic behavior of these random matrix products is in turn described by the
Furstenberg–Kesten theorem ([10]; also [4, Chapter 1]). This theorem implies that
$$\lim_{n\to\infty} \frac{1}{n} \log \|\Pi_n\| = \lambda, \tag{9}$$
where
$$\lambda = \lim_{n\to\infty} \frac{1}{n} E \log \|\Pi_n\|$$
is the ‘top Lyapunov exponent’ of the sequence $\Pi_n$. Neither the convergence nor the
value of $\lambda$ is affected by the choice of norm.
If the entries of $\Pi_n$ were eventually positive, with probability 1, then not only the
norm but also the individual entries would grow exponentially at the rate $\lambda$. This is
not the case, however. For example, examination of the entries of $M_0$, $M_1$ in the case
$\beta = \beta_2$ (see above) shows that the $(1, 1)$ and $(1, 0)$ columns of $\Pi_n$ cannot both have
positive entries. But the (Euclidean) norms of the rows do grow exponentially at rate
$\lambda$, as the following result shows. For any $b \in \mathcal{A}$, let $u_b$ be the vector with $b$th entry 1
and all other entries 0.

Proposition 11. For each $b \in \mathcal{A}$, we have
$$\lim_{n\to\infty} \|u_b \Pi_n\|^{1/n} = e^{\lambda}.$$
Proof. First, we shall argue that for each pair $b, b'$ of admissible $m$-blocks there
exists a sequence $i_1 i_2 \cdots i_k$ of zeroes and ones such that the $(b, b')$ entry of $M_{i_1} M_{i_2} \cdots M_{i_k}$
is positive. Write $b \Rightarrow b'$ if there is such a sequence. Let $b, b'$ be arbitrary admissible
$m$-blocks. By the definition of $\mathcal{A}$, for each $b \in \mathcal{A}$ there exist $m$-blocks $b'', b'''$ with
0–1 entries such that $b \Rightarrow b''$ and $b''' \Rightarrow b$. Consequently, it suffices to prove the
contention only for pairs of $m$-blocks $b, b'$ with all entries in $\{0, 1\}$. But if $b, b'$ have
only 0–1 entries, then it is certainly true that $b \Rightarrow b'$, because all the $m$-blocks in the
concatenation $bb'$ are 0–1 $m$-blocks and are therefore admissible; and hence, if
$b' = i_1 i_2 \cdots i_m$, then the $(b, b')$ entry of $M_{i_1} M_{i_2} \cdots M_{i_m}$ is positive.
Next, we shall argue that with probability 1 there exists an admissible $m$-block $b$
(possibly random) so that
$$\liminf_{n\to\infty} \|u_b \Pi_n\|^{1/n} \ge e^{\lambda}. \tag{10}$$
Since the entries of $\Pi_n$ and $u_b$ are all nonnegative, and since $\sum_{b\in\mathcal{A}} u_b$ is the vector with
all entries 1, it follows that $\|\sum_{b\in\mathcal{A}} u_b \Pi_n\| \ge \|\Pi_n\|$. Consequently, by the Furstenberg–Kesten
theorem, there exists an admissible $m$-block $b$ such that
$$\limsup_{n\to\infty} \|u_b \Pi_n\|^{1/n} \ge e^{\lambda}.$$
But since $\|\Pi_n\|^{1/n} \to e^{\lambda}$, an elementary argument shows that lim sup may be replaced
by lim inf in the above equation.
The proposition follows from the results of the last two paragraphs. Choose an
admissible $m$-block $b_*$ such that, with positive probability, (10) holds for $b = b_*$. Fix
any $b \in \mathcal{A}$; then by the result of the first paragraph (and the fact that the matrices in
the product $\Pi_n$ are i.i.d.) there exist (random) $N_1 < N_2 < \cdots$ such that the $b_*$th entry
of $u_b \Pi_{N_j}$ is at least 1. Hence, for each $j = 1, 2, \ldots$ and each $n \ge 1$, we have
$$\|u_b \Pi_{N_j+n}\| \ge \|u_{b_*} \Pi_{N_j}^{-1} \Pi_{N_j+n}\|.$$
Now consider the sequence of sequences $\{u_{b_*} \Pi_{N_j}^{-1} \Pi_{N_j+n}\}_{n\ge1}$, for $j = 1, 2, \ldots$. By the
Kolmogorov 0–1 Law, this sequence is ergodic, since the invariant σ-algebra is
contained in the tail σ-algebra of the sequence $\varepsilon_1, \varepsilon_2, \ldots$. Moreover, by the choice of
$b_*$, the probability that (10) holds for any one of these sequences is $p > 0$.
Consequently, by the Birkhoff ergodic theorem, with probability 1 there exists $j \ge 1$
such that
$$\liminf_{n\to\infty} \|u_{b_*} \Pi_{N_j}^{-1} \Pi_{N_j+n}\|^{1/n} \ge e^{\lambda}.$$
The proposition now follows directly from this and the last displayed inequality.
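Let $\mathcal{A}_0$ be the set of all admissible $m$-blocks with only 0–1 entries, and let
$v = \sum_{b\in\mathcal{A}_0} u_b$. Let $\mathbf{1} = \sum_{b\in\mathcal{A}} u_b$ be the vector with all entries 1, and let $w = \mathbf{1} - v$.

Corollary 3. For each $b \in \mathcal{A}_0$, we have
$$\lim_{n\to\infty} (u_b \Pi_n v)^{1/n} = e^{\lambda}.$$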
Proof. Recall that for each $b \in \mathcal{A}$ there exist a finite 0–1 sequence $i_b = i_1 i_2 \cdots i_J$
and an admissible $m$-block $b'$ with 0–1 entries such that the $(b, b')$ entry of
$M_{i_1} M_{i_2} \cdots M_{i_J}$ is at least 1. Moreover, there exists $K < \infty$ such that the length $J$ of the string $i_b$
satisfies $J \le K$ for all $b \in \mathcal{A}$, because there are only finitely many $b \in \mathcal{A}$.

Let $u$ be a vector with nonnegative entries, not all 0. For each $k \ge 1$ let $b(k) \in \mathcal{A}$ be
such that the $b(k)$ entry of $u' \Pi_k$ is maximal. (Note that $b(k)$ is random.) For each
$k \ge 1$ it may happen, with probability at least $2^{-K}$, that $\varepsilon_{k+j} = i_j$ for all $1 \le j \le J$, where
$i_1 i_2 \cdots i_J$ is a 0–1 sequence as in the previous paragraph, for $b = b(k)$. Denote this event
by $G_k$. Since $P(G_k) \ge 2^{-K}$ for each $k$, and since the events $G_K, G_{2K}, \ldots$ are independent,
the Borel–Cantelli lemma implies that
$$P\Bigl(\Bigl(\bigcup_{k=n-N_n}^{n-K} G_k\Bigr)^{c} \ \text{i.o.}\Bigr) = 0.$$
We shall argue that on the event $\bigcup_{k=n-N_n}^{n-K} G_k$, with $v = \sum_{b\in\mathcal{A}_0} u_b$ the indicator vector of the
admissible blocks with only 0–1 entries,
$$u' \Pi_n v \ge \frac{\|u' \Pi_n\|}{|\mathcal{A}|^{3} \max_{n-N_n \le k \le n} \|\Pi_k^{-1} \Pi_n\|}. \tag{11}$$
This will prove the corollary, because by the preceding proposition $\|u' \Pi_n\|^{1/n} \to e^{\lambda}$,
and, by Lemma 7 below,
$$\limsup_{n\to\infty} \max_{n-N_n \le k \le n} \|\Pi_k^{-1} \Pi_n\|^{1/n} \le 1.$$
On the event $G_k$, there is at least one $b' \in \mathcal{A}_0$ and $1 \le J \le K$ such that the $b'$ entry
of $u' \Pi_{k+J}$ is at least as large as the $b(k)$ entry of $u' \Pi_k$. Since $b' \in \mathcal{A}_0$, there is at least one
$b'' \in \mathcal{A}_0$ such that the $(b', b'')$ entry of $\Pi_{k+J}^{-1} \Pi_n$ is at least 1. Consequently,
$$u' \Pi_n v \ge u' \Pi_{k+J} u_{b'} \ge u' \Pi_k u_{b(k)} \ge \|u' \Pi_k\| / |\mathcal{A}|,$$
the last inequality because of the choice of $b(k)$. On the other hand, no entry of $u' \Pi_n$
can be larger than $\|u' \Pi_k\| \, \|\Pi_k^{-1} \Pi_n\|$, and hence
$$\|u' \Pi_n\| \le |\mathcal{A}|^{2} \, \|u' \Pi_k\| \, \|\Pi_k^{-1} \Pi_n\|.$$
The inequality (11) follows from the last two displayed inequalities.
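Lemma 7. $\limsup_{n\to\infty} \max_{n-N_n \le k \le n} \|\Pi_k^{-1} \Pi_n\|^{1/n} \le 1$.

Proof. Recall that the random matrices $M_{\varepsilon_i}$ can assume only two values, so
$E \|M_{\varepsilon_1}\| = \rho < \infty$. Now each $\Pi_k^{-1} \Pi_n$ is the product of independent copies of $M_{\varepsilon_1}$, so
for any $n - N_n \le k \le n$ and any $\varepsilon > 0$, Markov’s inequality implies that
$$P\{\|\Pi_k^{-1} \Pi_n\| \ge (1+\varepsilon)^n\} \le \rho^{N_n} (1+\varepsilon)^{-n}.$$
The probability that $\|\Pi_k^{-1} \Pi_n\| \ge (1+\varepsilon)^n$ for some $n - N_n \le k \le n$ cannot be more
than $N_n$ times this. Because $\sum_n N_n \rho^{N_n} (1+\varepsilon)^{-n} < \infty$, it follows from the
Borel–Cantelli lemma that, with probability 1, we have
$$\max_{n-N_n \le k \le n} \|\Pi_k^{-1} \Pi_n\| \ge (1+\varepsilon)^n$$
for only finitely many $n$.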
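Theorem 3. Let $\lambda$ be the top Lyapunov exponent for the sequence $\Pi_n$ of random
matrix products defined above. If $\varepsilon = \varepsilon_1\varepsilon_2\cdots$, where $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Bernoulli-$\tfrac12$ then,
with probability 1 we have
$$\lim_{n\to\infty} \pi_n(\varepsilon)^{1/n} = \tfrac12 e^{\lambda} = \alpha. \tag{12}$$
Notice that this proves Theorem 1 in the special case $p = \tfrac12$.

Proof. By Proposition 10, the cardinality of the equivalence class of $\varepsilon_1\varepsilon_2\cdots\varepsilon_n$ is
$u_b \Pi_n v$, where $b = \varepsilon_1\varepsilon_2\cdots\varepsilon_m$ and $v$ is the indicator vector of the admissible $m$-blocks
with only 0–1 entries. Consequently, $\pi_n(\varepsilon) = 2^{-n} u_b \Pi_n v$. By the preceding
corollary, $\pi_n(\varepsilon)^{1/n} \to e^{\lambda}/2$.

7. Calculating the weight of an equivalence class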
For values of $p$ other than $\tfrac12$ the probabilities $\pi_n(\varepsilon)$ cannot be obtained merely by
enumerating the equivalence classes of the various sequences. Nevertheless, it is still
possible to represent them in terms of a matrix product, with matrices similar,
structurally, to those used to enumerate the classes.

Let $\varepsilon = \varepsilon_1\varepsilon_2\cdots\varepsilon_n$ be a given sequence of zeroes and ones; its probability under the
Bernoulli-$p$ measure is $p^{S(\varepsilon)} q^{n-S(\varepsilon)}$, where $S(\varepsilon) = \sum_{i=1}^{n} \varepsilon_i$ is the number of ones in $\varepsilon$.
Observe that not all sequences of length $n$ equivalent to $\varepsilon$ have the same
probabilities: for instance, when $\beta = \beta_2$ the sequences 100 and 011 are equivalent, but
the first has probability $pq^2$ while the second has probability $p^2q$.

Suppose in general that $\varepsilon \sim \varepsilon'$; then by Corollary 1 there is a sequence of left-right
substitutions based on the fundamental relation, some positive and some negative,
transforming $\varepsilon$ to $\varepsilon'$. Every time a single positive substitution is made, the likelihood
of the sequence is multiplied by the factor $(p/q)^{\kappa}$, where $\kappa$ is the number of ones on
the left-hand side of the fundamental relation minus the number of ones on the
right-hand side. This is because each time the fundamental relation is applied (in the
positive direction) there is a net increase of $\kappa$ in the number of ones and a net decrease
of $\kappa$ in the number of zeroes. Similarly, every time a single negative substitution is
made, the likelihood is multiplied by $(q/p)^{\kappa}$. Hence the likelihood of $\varepsilon'$ is $\rho$ times the
likelihood of $\varepsilon$, where $\rho = (p/q)^{N\kappa}$ and $N$ is the total number of positive substitutions
minus the total number of negative substitutions in the transformation from $\varepsilon$ to $\varepsilon'$.
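
For the golden ratio this bookkeeping can be checked by hand. The worked instance below is a sketch that assumes the fundamental relation is written $100 \sim 011$ and that passing from 100 to 011 counts as a single negative substitution; that orientation is an assumption, chosen because it reproduces the two probabilities quoted above.

```latex
\begin{align*}
\kappa &= (\text{ones in } 100) - (\text{ones in } 011) = 1 - 2 = -1, \\
P(100) &= p\,q^{2}, \qquad P(011) = p^{2} q, \\
\frac{P(011)}{P(100)} &= \frac{p}{q}
  = \Bigl(\frac{q}{p}\Bigr)^{\kappa}
  = \Bigl(\frac{p}{q}\Bigr)^{N\kappa}
  \quad \text{with } N = -1 .
\end{align*}
```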
In the last section we defined matrices $M_i$ with 0–1 entries to indicate whether
transformations between different $m$-blocks could occur when the entry to the right
of the block was $i$. We found that the entries of products of these matrices give the
number of ways to do substitutions left to right on the original sequence and arrive at
given $m$-blocks. If we want to tally total likelihood (relative to the parameter $p$)
instead of cardinality, then instead of ones as entries we should put a positive number
indicating the (multiplicative) effect on likelihood. Thus, we redefine the matrices $M_0$
and $M_1$ as follows. For admissible $m$-blocks $b$ and $b'$ let the $(b, b')$th entry of $M_i$ be
zero when the transition $b \to b'$ is not allowable; $p^i q^{1-i}$ when the transition is allowable
and no substitution is made in the transition; and $p^i q^{1-i} \times \nu$ when the transition is
allowable and a substitution is made, where $\nu = (p/q)^{l\kappa}$ and $l$ is the number of
substitutions made in the transition ($l$ may be positive or negative). For example,
when $\beta = \beta_2$, we have $\kappa = -1$, so the nonzero entries of $M_0$ and $M_1$ are as given in
Table 3.
Table 3

$M_0$
From        To          Entry
(0, 0)      (0, 0)      $q$
(0, 1)      (1, 0)      $q$
(0, 1)      (0, −1)     $q^2/p$
(1, 0)      (0, 0)      $q$
(1, 0)      (1, 1)      $p$
(1, 1)      (1, 0)      $q$
(2, 0)      (1, 1)      $p$
(1, 2)      (2, 0)      $q$
(0, −1)     (none)
(−1, 1)     (0, −1)     $q^2/p$

$M_1$
From        To          Entry
(0, 0)      (0, 1)      $p$
(0, 1)      (1, 1)      $p$
(0, 1)      (0, 0)      $q$
(1, 0)      (0, 1)      $p$
(1, 0)      (1, 2)      $p^2/q$
(1, 1)      (1, 1)      $p$
(2, 0)      (1, 2)      $p^2/q$
(1, 2)      (none)
(0, −1)     (−1, 1)     $p$
(−1, 1)     (0, 0)      $q$
Note that the positive entries occur in exactly the same locations as for the
matrices defined in the previous section. The $(0, -1)$ row of $M_0$ has no nonzero
entries, since there are no allowable transitions from this 2-block when the next entry
of the sequence is a 0, nor does the $(1, 2)$ row of $M_1$.

For any admissible $m$-block $b = \varepsilon_1\varepsilon_2\cdots\varepsilon_m$ define the (row) vector $\mathbf{b}$ to have $b$th
entry $p^i q^{m-i}$, where $i$ is the number of ones in $\varepsilon_1\varepsilon_2\cdots\varepsilon_m$, and all other entries zero.
Define $w$ to be the column vector with entries 1 for admissible $m$-blocks with no
entries $-1$ or 2, and all other entries 0.
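Proposition 12. Let $\varepsilon = \varepsilon_1\varepsilon_2\cdots\varepsilon_n$ be a sequence of zeroes and ones, and let
$b_0 = \varepsilon_1\varepsilon_2\cdots\varepsilon_m$. Then
$$\pi_n(\varepsilon) = \mathbf{b}_0 M_{\varepsilon_{m+1}} M_{\varepsilon_{m+2}} \cdots M_{\varepsilon_n} w.$$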
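
The weighted product in Proposition 12 can be checked against a direct summation for the golden ratio. The following Python sketch is an illustration only: the function names are hypothetical, rows are kept as sparse dictionaries instead of full matrices, the weight of a transition that makes $k$ net rewrites $100 \to 011$ is coded as $p^i q^{1-i} (p/q)^k$ (which reproduces the entries of Table 3), and exact rational arithmetic is assumed for $p$.

```python
from fractions import Fraction
from itertools import product

# Admissible 2-blocks for the golden ratio (Section 6).
BLOCKS = frozenset([(0, 0), (1, 0), (0, 1), (1, 1), (1, 2), (0, -1), (2, 0), (-1, 1)])


def step(row, i, p):
    """Return row * M_i, where M_i is the weighted matrix for next entry i."""
    q = 1 - p
    out = {}
    for (b1, b2), mass in row.items():
        for leading in (0, 1):              # deleted leading entry must be 0 or 1
            k = b1 - leading                # net number of rewrites 100 -> 011
            new_block = (b2 + k, i + k)
            if new_block in BLOCKS:
                weight = (p if i == 1 else q) * (p / q) ** k
                out[new_block] = out.get(new_block, 0) + mass * weight
    return out


def pi_n(eps, p):
    """Probability that a Bernoulli-p sequence of length len(eps) is equivalent to eps."""
    b0 = tuple(eps[:2])
    ones = sum(b0)
    row = {b0: p ** ones * (1 - p) ** (2 - ones)}     # the starting row vector
    for i in eps[2:]:
        row = step(row, i, p)
    # Multiplying by w keeps only the end blocks with 0-1 entries.
    return sum(mass for block, mass in row.items() if set(block) <= {0, 1})


def fingerprint(eps):
    """Exact key for sum_j eps_j theta^j, using beta^2 = beta + 1."""
    a, b = 0, 0
    for entry in eps:
        a, b = b + entry, a + b
    return a, b


def brute_force_pi_n(eps, p):
    target, n = fingerprint(eps), len(eps)
    return sum(p ** sum(other) * (1 - p) ** (n - sum(other))
               for other in product((0, 1), repeat=n)
               if fingerprint(other) == target)


if __name__ == "__main__":
    p = Fraction(1, 3)
    print(pi_n((1, 0, 0), p), brute_force_pi_n((1, 0, 0), p))
```

For the sequence 100 both routes give $pq^2 + p^2q = pq$, the combined weight of 100 and 011.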
The proof is virtually the same as that of the corresponding result in the previous
section. There are analogues of all the results of the preceding section. Let $\lambda$ be the
top Lyapunov exponent of the sequence $\Pi_n = M_{\varepsilon_{m+1}} M_{\varepsilon_{m+2}} \cdots M_{\varepsilon_n}$, and let $u_b$, $b \in \mathcal{A}$,
and $v = \sum_{b\in\mathcal{A}_0} u_b$ be defined as in Proposition 11 and Corollary 3.

Proposition 13. For each $b \in \mathcal{A}_0$, we have
$$\lim_{n\to\infty} (u_b \Pi_n v)^{1/n} = e^{\lambda}.$$

Theorem 4. Let $\lambda$ be the top Lyapunov exponent for the sequence $\Pi_n$ of random
matrix products defined above. If $\varepsilon = \varepsilon_1\varepsilon_2\cdots$, where $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Bernoulli-$p$, then
with probability 1 we have
$$\lim_{n\to\infty} \pi_n(\varepsilon)^{1/n} = e^{\lambda} = \alpha. \tag{13}$$

This completes the proof of Theorem 1.
It should now be clear how to modify the approach for other 0–1 stochastic
sequences, in particular, $k$-step Markov chains. If $k \ge m$ then it is necessary to index
the rows and columns by $(k+1)$-blocks rather than $m$-blocks, to keep track of the
likelihoods involved. It should also be clear that the whole approach could be adapted
to sequences of random variables valued in other finite subsets of $\mathbb{Z}$ than $\{0, 1\}$.

8. Computing the Lyapunov exponent
Computation of Lyapunov exponents is a difficult problem, even for small
matrices. However, because of the special structure of the matrices arising in Sections
6 and 7, computation to a reasonable degree of accuracy is possible for simple Pisot
numbers $\beta_m$ of small degree $m$. In this section, we restrict our discussion to the cases
$\beta = \beta_m$, with $m \ge 2$.

Fix a value of the Bernoulli parameter $p$ and let $\varepsilon_1, \varepsilon_2, \ldots$ be i.i.d. Bernoulli-$p$
random variables. Let $M_0$ and $M_1$ be the matrices defined in the preceding section.
Our problem is to calculate the top Lyapunov exponent of the sequence
$$\Pi_n = M_{\varepsilon_1} M_{\varepsilon_2} \cdots M_{\varepsilon_n}.$$
By Proposition 13, for any vector $u$ with nonnegative entries, not all 0, we have
$$\lambda = \lim_{n\to\infty} \frac{1}{n} \log \|u \Pi_n\|. \tag{14}$$
A propitious choice is the vector $u$ with $(0, 0, \ldots, 0)$ entry $q$, $(1, 1, \ldots, 1)$ entry $p$, and
all other entries 0. (Recall that the rows and columns of the matrices $M_i$ are indexed
by admissible $m$-blocks, hence so are the entries of row vectors.)

The reason for the peculiar choice of $u$ is that the random sequence of vectors $u\Pi_n$
visits the ray through $u$ infinitely often.
Lemma 8. Let $T$ be the infimum of the set of integers $n \ge 1$ such that $u\Pi_n$ is a scalar
multiple of $u$. Then
$$ET < \infty;$$
in particular, $T < \infty$ with probability 1.

The proof will be given later. In fact, we shall obtain an explicit algebraic (in $p$)
expression for $ET$: thus, for instance, when $m = 2$ and $p = \tfrac12$, then $ET = 12$.
The existence of such a stopping time allows one to re-express the Lyapunov
exponent in a simpler form. The vector $u\Pi_T$ is a scalar multiple of the starting vector
$u$. Since scalars may be factored out of matrix products, this implies that the process
$u\Pi_n$ effectively ‘regenerates’ at time $T$: specifically, the law of the sequence
$$\frac{u\Pi_{T+n}}{\|u\Pi_T\|}, \qquad n \ge 0,$$
is the same as that of the original sequence $u\Pi_n$ with $n \ge 0$. It follows that there are
infinitely many times at which $u\Pi_n$ is a scalar multiple of $u$; label these times
$T_0 = 0 < T_1 = T < T_2 < T_3 < \cdots < \infty$. At each of these times the sequence regenerates. For
each $n \ge 1$, we have
$$\frac{1}{T_n} \log \|u\Pi_{T_n}\| = \frac{1}{T_n} \sum_{j=1}^{n} \log \frac{\|u\Pi_{T_j}\|}{\|u\Pi_{T_{j-1}}\|},$$
and the summands are i.i.d. with finite expectation (this because $ET < \infty$). Dividing
numerator and denominator by $n$ and using the strong law of large numbers on each,
one obtains the limit in (14) along the subsequence $T_n$ as a ratio of expectations. But
we know the limit in (14) exists, by the Furstenberg–Kesten theorem, so the limit of
the whole sequence must be given by the same ratio of expectations. (Notice that a
standard argument shows that the limit exists without reference to the
Furstenberg–Kesten theorem.) We summarize this in the following corollary.
Corollary 4. We have
$$\lambda = \frac{E \log \|u\Pi_T\|}{ET} \tag{15}$$
and
$$\delta = \frac{E \log \|u\Pi_T\|}{(\log \theta)\, ET}. \tag{16}$$
Since we have an exact expression for $ET$, only the expectation in the numerator
requires estimation. Unfortunately, no further simplification seems possible: the
numerator can only be estimated by simulation or summation over paths. It should
be remarked, however, that this is a significant improvement on the crude
representation of $\lambda$ as $\lim_{n\to\infty} n^{-1} E \log \|\Pi_n\|$, because, in general, one cannot say how
fast the convergence in this limit is.

The numerator may, in general, be estimated by simulation with very little effort.
Greater or lesser precision may be obtained by adjusting the number of replications.
To obtain estimates with accuracy (to confidence level .99) to within ±.002 requires
on the order of one million replications (depending on the values of $p$, $m$). Results for
$m = 2$ and various values of $p$ are reported in Table 4.
Table 4

p       dimension    estimated error
.5      .9954        .0008
.4      .9868        .001
.3      .9501        .002
.2      .8499        .004
.1      .6085        .008
.05     .3877        .003
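
The regeneration estimate behind Corollary 4 can be sketched in Python for $m = 2$. This is a rough Monte Carlo illustration under stated assumptions, not the procedure used to produce Table 4: the function names are hypothetical, the norm is taken to be the $\ell^1$ norm (so that $\|u\| = p + q = 1$), floating-point arithmetic with a relative tolerance is used to detect when $u\Pi_n$ is a multiple of $u$, and the transition weights are coded as $p^i q^{1-i}(p/q)^k$ for $k$ net rewrites $100 \to 011$.

```python
import math
import random

# Admissible 2-blocks for the golden ratio (Section 6).
BLOCKS = frozenset([(0, 0), (1, 0), (0, 1), (1, 1), (1, 2), (0, -1), (2, 0), (-1, 1)])


def step(vec, i, p):
    """Return vec * M_i for the weighted matrix M_i (sparse dictionary form)."""
    q = 1.0 - p
    out = {}
    for (b1, b2), mass in vec.items():
        for leading in (0, 1):               # deleted leading entry must be 0 or 1
            k = b1 - leading                 # net number of rewrites 100 -> 011
            new_block = (b2 + k, i + k)
            if new_block in BLOCKS:
                weight = (p if i == 1 else q) * (p / q) ** k
                out[new_block] = out.get(new_block, 0.0) + mass * weight
    return out


def is_multiple_of_u(vec, p):
    """True when vec is proportional to u (entry q at (0,0), entry p at (1,1))."""
    if set(vec) != {(0, 0), (1, 1)}:
        return False
    return math.isclose(vec[(0, 0)] * p, vec[(1, 1)] * (1.0 - p), rel_tol=1e-9)


def estimate_dimension(p, cycles, rng):
    """Monte Carlo estimates of ET and of E log||u Pi_T|| / ((log theta) ET)."""
    theta = (math.sqrt(5.0) - 1.0) / 2.0     # inverse of the golden ratio
    total_time, total_log = 0, 0.0
    for _ in range(cycles):
        vec = {(0, 0): 1.0 - p, (1, 1): p}   # each cycle restarts from u itself
        while True:
            vec = step(vec, 1 if rng.random() < p else 0, p)
            total_time += 1
            if is_multiple_of_u(vec, p):
                break
        total_log += math.log(sum(vec.values()))   # l^1 norm of u Pi_T
    mean_T = total_time / cycles
    return mean_T, total_log / (total_time * math.log(theta))


if __name__ == "__main__":
    mean_T, delta = estimate_dimension(0.5, 20000, random.Random(1))
    print(mean_T, delta)
```

For $p = .5$ the two outputs can be compared with the value $ET = 12$ stated after Lemma 8 and with the first row of Table 4; agreement is only up to Monte Carlo error.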
We have conducted simulations for all rational values of $p$ between 0 and .5 with
denominators less than 12 and also for .01, .02, … , .09; these seem to indicate that the
dimension is a strictly increasing function of $p \in (0, .5]$. As yet we have no rigorous
argument for this conjecture.
Proof of Lemma 8. We shall discuss only the case $m = 2$, the general case being
completely similar but requiring more cumbersome notation. Thus, $u$ is the vector
with $(0, 0)$ entry $q$, $(1, 1)$ entry $p$, and all other entries 0.

Let $u_n = u\Pi_n$. We shall say that any admissible 2-block for which the
corresponding entry of $u_n$ is positive occurs in $u_n$. Not all admissible 2-blocks may
occur simultaneously in $u_n$: for instance, 11 and 10 cannot occur simultaneously.
However, all admissible $m$-blocks may occur in some $u_n$ (see the proof of Proposition
11).
Since the random variables $\varepsilon_1, \varepsilon_2, \ldots$ are i.i.d. Bernoulli-$p$, the sequence must, with
probability 1, contain arbitrarily long blocks of ones. Consider what happens to the
vectors $u_n$ when such a long block of ones occurs. If either of the blocks $(0, -1)$ or
$(-1, 1)$ occurs in $u_n$, then after one or two successive ones these blocks will be
converted to the block $(0, 0)$. If either of the blocks $(2, 0)$ or $(1, 2)$ occurs in $u_n$, then
after one or two successive ones they will be ‘killed’. If any of the four blocks with
no 2 or $-1$ entries occurs in $u_n$ then after either one or two successive ones all will
be converted to one of the blocks $(0, 0)$, $(0, 1)$, $(1, 1)$. Consequently, regardless of
which blocks occur in $u_n$, if $\varepsilon_{n+1} = \varepsilon_{n+2} = 1$ then only 00, 01, and 11 can occur in $u_{n+2}$.
Moreover, the blocks 00 and 01 cannot occur simultaneously. Note that if the two
blocks that occur in $u_{n+2}$ are 00 and 11, then if $\varepsilon_{n+3} = 1$, the blocks that occur in $u_{n+3}$
must be 01 and 11. Thus, with probability 1, for some $n$ the vector $u_n$ will have positive
entries in the 01 and 11 entries and all other entries zero.
Now consider what happens when $u_n$ has positive entries in the 01 and 11 entries
and all other entries zero, and $\varepsilon_{n+1} = \varepsilon_{n+2} = 0$. Using Table 2 in Section 6, one easily
verifies that $u_{n+2}$ must have 11 entry $pq(u_n(01) + u_n(11))$, 00 entry $q^2(u_n(01) + u_n(11))$,
and all other entries zero. Thus, $u_{n+2}$ is a scalar multiple of $u$.

The arguments of the last two paragraphs show that, depending on the
composition of $u_n$, a regeneration will occur if either (a) $\varepsilon_{n+1} = 1$ and $\varepsilon_{n+2} = \varepsilon_{n+3} = 0$,
or (b) $\varepsilon_{n+1} = \varepsilon_{n+2} = 1$ and $\varepsilon_{n+3} = \varepsilon_{n+4} = 0$. Elementary arguments show that this
must happen eventually, with probability 1; in fact, that the expected time until it
happens is finite.
The preceding argument shows that the regeneration event is determined by the
set of admissible $m$-blocks that occur in a given $u_n$ and the subsequent pattern of
zeroes and ones in the sequence $\varepsilon_j$, but not on the actual coefficients of the $m$-blocks
that occur in $u_n$. It follows that $T$ is a stopping time for the Markov chain whose state
at any time $n$ is the set of admissible $m$-blocks that occur in $u_n$. It follows from
elementary Markov chain theory that $ET$ may be computed by solving a simple
matrix equation. In the special case $m = 2$, there are 7 distinct sets of admissible
$m$-blocks that may occur simultaneously in $u_n$; the transition probabilities between these
sets are easy to write down, and the resulting matrix equation is easily solved via
Mathematica. This yields the identity (when $m = 2$)
$$ET = \frac{-2 - p + p^2}{-p + 2p^2 - 2p^3 + p^4}. \tag{17}$$
Similar formulas may be obtained for arbitrary m, although the size of the matrix
equation that must be solved grows with m.
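
As a consistency check, the rational expression for $ET$ in the case $m = 2$ can be evaluated at $p = \tfrac12$, where the text after Lemma 8 states $ET = 12$. The factored form in the last line, with $q = 1 - p$, is an elementary rearrangement added here for convenience and is not taken from the source.

```latex
\begin{align*}
\bigl(-2 - p + p^{2}\bigr)\Big|_{p=1/2} &= -2 - \tfrac12 + \tfrac14 = -\tfrac94, \\
\bigl(-p + 2p^{2} - 2p^{3} + p^{4}\bigr)\Big|_{p=1/2}
  &= -\tfrac12 + \tfrac12 - \tfrac14 + \tfrac1{16} = -\tfrac3{16}, \\
ET\Big|_{p=1/2} &= \frac{-9/4}{-3/16} = 12, \\
ET &= \frac{2 + pq}{pq\,(1 - pq)}, \qquad q = 1 - p .
\end{align*}
```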
9. The associated graph
In this section we indicate another approach to the representation of the
probabilities $\pi_n(\varepsilon)$ by matrix products. This approach is essentially geometric in
nature, relying on simple properties of a natural graph associated with the Pisot
number $\beta \in (1, 2)$. In the special case when $\beta$ is the golden ratio, this graph is the
‘Fibonacci tree’ exploited by Alexander and Zagier [2].

The (directed) graph $\Gamma = (V, E)$ is defined as follows. The vertex set $V$ is the
union of countably many finite sets $V_n$ with $n \ge 0$; the elements of $V_n$ are the possible
sums $x_n(\varepsilon) = \sum_{k=1}^{n} \varepsilon_k \theta^k$, where $\varepsilon$ is a 0–1 sequence. The (directed) edges connect
vertices at depths $n$ and $n+1$: there are edges from $x_n(\varepsilon)$ to $x_{n+1}(\varepsilon)$ for every $\varepsilon \in \Sigma$, and
no others. To each edge from $x_n(\varepsilon)$ to $x_n(\varepsilon) + \theta^{n+1}$ attach weight $p$, and to each edge
from $x_n(\varepsilon)$ to $x_n(\varepsilon)$ attach weight $q = 1 - p$.

For any two vertices $x_n(\varepsilon)$, $x_n(\varepsilon')$ at the same depth $n$, define their distance
$\rho(x_n(\varepsilon), x_n(\varepsilon'))$ by
$$\rho(x_n(\varepsilon), x_n(\varepsilon')) = \beta^n |x_n(\varepsilon) - x_n(\varepsilon')|.$$
Fix a constant $\kappa \ge 2\theta/(1-\theta)$. For any vertex $x_n(\varepsilon)$ define its neighborhood
$\mathcal{N}(x_n(\varepsilon)) = \mathcal{N}_\kappa(x_n(\varepsilon))$ to be the set of vertices $x_n(\varepsilon')$ at the same depth such that
$\rho(x_n(\varepsilon), x_n(\varepsilon')) \le \kappa$. Say that two vertices $x_n(\varepsilon)$, $x_k(\varepsilon')$ (not necessarily at the same
depth) have the same neighborhood type if there is a bijective mapping between
$\mathcal{N}(x_n(\varepsilon))$ and $\mathcal{N}(x_k(\varepsilon'))$ that preserves the distance function $\rho$.
Proposition 14. There are only finitely many neighborhood types, that is, there is
a finite set of vertices such that every vertex of $\Gamma$ has the same neighborhood type as
one of the vertices in this set.

Remark. The reader should notice the similarity with [5, Theorem], concerning
the Cayley graph of a finitely generated, discontinuous group of isometries of a
hyperbolic space.
Proof. This follows from Garsia’s lemma, which implies that there is a lower
bound $d$ on the $\rho$-distance between distinct vertices in any neighborhood. Consider
the neighborhoods $\mathcal{N}$ of a vertex $x_{n+k}(\varepsilon)$ and $\mathcal{N}'$ of the vertex $x_k(\sigma^n\varepsilon)$ (here $\sigma$ is the
shift operator). There is a distance-preserving injection $\mathcal{N}' \hookrightarrow \mathcal{N}$, because there is a
copy $\Gamma'$ of the graph $\Gamma$ embedded in $\Gamma$ emanating from the vertex $x_n(\sigma^n\varepsilon)$.
Consequently, for any sequence $\varepsilon = \varepsilon_1\varepsilon_2\cdots \in \Sigma$ and each $n \ge 1$ there is a chain of
distance-preserving injections
$$\mathcal{N}(x_1(\sigma^{n-1}\varepsilon)) \hookrightarrow \mathcal{N}(x_2(\sigma^{n-2}\varepsilon)) \hookrightarrow \cdots \hookrightarrow \mathcal{N}(x_n(\varepsilon)).$$
By Garsia’s lemma, all sufficiently long chains must stabilize, that is, there is a finite
integer $k$ such that all the injections after the $k$th in any such chain must be bijections
(if not, there would be neighborhoods with arbitrarily large cardinalities). It follows
that there are only finitely many neighborhood types.

Let $\mathcal{T}$ be the (finite) set of possible neighborhood types.
For any vertex $x_{n+1}(\varepsilon)$ at depth $n+1 \ge 1$, let $B(x_{n+1}(\varepsilon)) \subseteq V_n$ be the set of all
vertices at depth $n$ from which emanate directed edges of $\Gamma$ leading into $\mathcal{N}(x_{n+1}(\varepsilon))$.

Lemma 9. $B(x_{n+1}(\varepsilon)) \subseteq \mathcal{N}(x_n(\varepsilon))$.

Proof. Observe that for any vertex $x_{n+1}(\varepsilon')$ at depth $n+1$ there are at most 2, and
at least 1, directed edges from $V_n$ to $x_{n+1}(\varepsilon')$. If $\varepsilon_{n+1} = 1$, then there must be a $p$-edge
from $x_n(\varepsilon')$ to $x_{n+1}(\varepsilon')$, and there may be a $q$-edge from some $x_n(\varepsilon'')$ to $x_{n+1}(\varepsilon')$; if
$\varepsilon_{n+1} = 0$, then there must be a $q$-edge from $x_n(\varepsilon')$ to $x_{n+1}(\varepsilon')$, and there may be a $p$-edge
from some $x_n(\varepsilon'')$ to $x_{n+1}(\varepsilon')$. Consequently, $B(x_{n+1}(\varepsilon))$ is finite.

Now consider any directed edge from a vertex $x_n(\varepsilon') \in V_n$ leading into $\mathcal{N}(x_{n+1}(\varepsilon))$.
Depending on whether it is a p-edge or a q-edge,
or
Qxn(εh)jθn+"kxn(ε)kεn+ θn+"Q
"
Qxn(εh)kxn(ε)kεn+ θn+"Q
"
κθn+"
κθn+"
2θn+#\(1kθ)
2θn+#\(1kθ).
In either case,
βn Qxn(εh)kxn(ε)Q
θj2θ#\(1kθ)
2θ\(1kθ)
κ,
since κ 2θ\(1kθ).
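Corollary 5. For any 0–1 sequence $\varepsilon = \varepsilon_1\varepsilon_2\cdots$, the neighborhood type
$\mathcal{N}(x_{n+1}(\varepsilon))$ is determined by the neighborhood type $\mathcal{N}(x_n(\varepsilon))$ and the entry $\varepsilon_{n+1}$.

For any infinite 0–1 sequence $\varepsilon$ and each $n \ge 1$, define $Y_n(\varepsilon)$ to be the vector of
probabilities
$$Y_n(\varepsilon) = (\pi_n(\varepsilon'))_{\mathcal{N}(x_n(\varepsilon))},$$
where the entries are indexed by the elements of $\mathcal{N}(x_n(\varepsilon))$ (for each element $x_n(\varepsilon')$ of
$\mathcal{N}(x_n(\varepsilon))$, choose one representative $\varepsilon'$ and let $\pi_n(\varepsilon')$ be the entry for that element). The
probabilities $\pi_n(\varepsilon)$ are computed under the Bernoulli-$p$ measure (the measure on
sequence space $\Sigma$ making $\varepsilon_1, \varepsilon_2, \ldots$ i.i.d. Bernoulli-$p$ random variables).

Lemma 10. There exist nonnegative matrices $L_{\mathcal{N},i}$, for $\mathcal{N} \in \mathcal{T}$ and $i = 0, 1$, such
that for every 0–1 sequence $\varepsilon$ and every $n \ge 1$, the identity
$$Y_{n+1}(\varepsilon) = L_{\mathcal{N}(x_n(\varepsilon)),\, \varepsilon_{n+1}} Y_n(\varepsilon) \tag{18}$$
is satisfied.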
Proof. For any sequence $\varepsilon' = \varepsilon'_1 \varepsilon'_2 \ldots$ the probability $\pi_{n+1}(\varepsilon')$ is obtained by summing the probabilities of all sequences $\varepsilon''$ such that $x_{n+1}(\varepsilon'') = x_{n+1}(\varepsilon')$. In geometric terms, $\pi_{n+1}(\varepsilon)$ is obtained by summing all $\pi_n(\varepsilon')\,w$ for depth-$n$ vertices $x_n(\varepsilon')$ such that there is an edge from $x_n(\varepsilon')$ to $x_{n+1}(\varepsilon)$; here $w = p$ or $q$, depending on the weight attached to the edge. Note that there are at most two, and at least one, terms in this sum. Moreover, by (9), the factors $\pi_n(\varepsilon')$ in the two terms are both entries of the vector $Y_n(\varepsilon)$. It is clear that these equations may be written in the matrix form (18), with the matrix $L_{\mathcal{T}(x_n(\varepsilon)),\,\varepsilon_{n+1}}$ having rows indexed by elements of $\mathcal{N}(x_{n+1}(\varepsilon))$, columns indexed by elements of $\mathcal{N}(x_n(\varepsilon))$, and entries $0$, $p$, and $1 - p$.
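The recursion behind (18) can be sketched in a few lines. This is only an illustration under assumptions not made in the text: $\theta$ is fixed to the inverse of the golden ratio, so that $\theta^2 = 1 - \theta$ and every $\theta^k$ is an integer combination $a + b\theta$, and the names `theta_powers` and `vertex_probabilities` are hypothetical. Each depth-$(n+1)$ vertex collects mass $p\,\pi_n$ along a $p$-edge and $q\,\pi_n$ along a $q$-edge, and coinciding sums are merged exactly because vertices are stored as integer pairs.

```python
from collections import defaultdict
from fractions import Fraction


def theta_powers(n):
    """Return [theta^1, ..., theta^n] as integer pairs (a, b) meaning a + b*theta.

    Assumes theta = (sqrt(5) - 1) / 2, so theta^2 = 1 - theta and
    theta * (a + b*theta) = b + (a - b)*theta.
    """
    powers, a, b = [], 0, 1
    for _ in range(n):
        powers.append((a, b))
        a, b = b, a - b
    return powers


def vertex_probabilities(n, p):
    """Map each depth-n vertex x_n, stored as a pair (a, b), to pi_n."""
    q = 1 - p
    pi = {(0, 0): Fraction(1)}          # depth 0: the empty sum
    for (a, b) in theta_powers(n):
        nxt = defaultdict(Fraction)
        for (u, v), mass in pi.items():
            nxt[(u + a, v + b)] += p * mass   # p-edge: next digit is 1
            nxt[(u, v)] += q * mass           # q-edge: next digit is 0
        pi = dict(nxt)
    return pi


pi3 = vertex_probabilities(3, Fraction(1, 2))
```

For $p = 1/2$ and $n = 3$ the sequences 100 and 011 reach the same vertex, because $\theta = \theta^2 + \theta^3$; that vertex therefore carries mass $1/4$, and there are 7 distinct vertices rather than 8.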
Note that this argument used the assumption that $\mu$ is a Bernoulli measure. Generalization to arbitrary shift-invariant measures on $\Sigma$ would seem to be problematic. However, if $\mu$ is a Markov measure (that is, under $\mu$ the coordinate process $\varepsilon_1, \varepsilon_2, \ldots$ is a $k$-step Markov chain for some $k < \infty$), then there is an obvious generalization of Lemma 10. We refrain from giving the details.
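The way the neighborhood type evolves with the digits, as asserted in Corollary 5, can be illustrated numerically. The sketch below rests on assumptions that are not taken from the text: $\theta$ is the inverse of the golden ratio, the neighborhood of $x_n(\varepsilon)$ is taken to be the set of depth-$n$ vertices within distance $\kappa\theta^n$ with $\kappa = 2\theta/(1-\theta) = 2 + 2\theta$, and a type is recorded as the set of normalized differences $d = \beta^n(x_n(\varepsilon') - x_n(\varepsilon))$, each stored as an integer pair $(a, b)$ meaning $a + b\theta$. Under these assumptions a neighbor with next digit $\delta$ has new difference $\beta d + (\delta - \varepsilon_{n+1})$. All function names are hypothetical.

```python
THETA = (5 ** 0.5 - 1) / 2          # assumed: inverse of the golden ratio


def value(d):
    """Real value of d = (a, b), which encodes a + b*theta."""
    a, b = d
    return a + b * THETA


# Assumed radius constant: 2*theta/(1 - theta) equals 2 + 2*theta here.
KAPPA = value((2, 2))


def times_beta(d):
    """Multiply a + b*theta by beta = 1 + theta, using theta^2 = 1 - theta."""
    a, b = d
    return (a + b, a)


def next_type(nbhd_type, eps):
    """Type at depth n+1 from the type at depth n and the digit eps."""
    out = set()
    for d in nbhd_type:
        a, b = times_beta(d)
        for delta in (0, 1):              # next digit of the neighboring vertex
            cand = (a + delta - eps, b)
            if abs(value(cand)) < KAPPA:  # keep only vertices still in range
                out.add(cand)
    return frozenset(out)


def reachable_types():
    """All types reachable from the depth-0 type {0}."""
    start = frozenset({(0, 0)})
    seen, stack = {start}, [start]
    while stack:
        t = stack.pop()
        for eps in (0, 1):
            u = next_type(t, eps)
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen


types = reachable_types()
```

The search terminates because only finitely many normalized differences can occur in this example, which is the kind of finiteness that makes the family of matrices $L_{\tau,i}$ finite.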
Lemma 10 provides a representation of the probability vector $Y_n(\varepsilon)$ in terms of matrix products, but, unfortunately, the matrices need not be square. Nevertheless, it is possible to use the theory of random matrix products to determine the asymptotic behavior of the sequence $Y_n(\varepsilon)$. The idea is to embed the vectors $Y_n(\varepsilon)$ into vectors $\mathbf{Y}_n(\varepsilon)$ whose entries are indexed by elements of the union of all possible neighborhood types $\tau$, and to embed the matrices $L_{\tau,i}$ (as blocks) into matrices $\mathbf{L}_i$ with rows and columns indexed by elements of this union. The entries of $\mathbf{Y}_n(\varepsilon)$ are all zero except for those indexed by elements of the neighborhood type $\mathcal{T}(x_n(\varepsilon))$; these entries have the same values as the corresponding entries of $Y_n(\varepsilon)$. Then Lemma 10 implies that $\mathbf{Y}_{n+1}(\varepsilon) = \mathbf{L}_{\varepsilon_{n+1}} \mathbf{Y}_n(\varepsilon)$, and consequently, we have the following result.
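Proposition 15. If $n > k$, then
$$\mathbf{Y}_n(\varepsilon) = \mathbf{L}_{\varepsilon_n} \mathbf{L}_{\varepsilon_{n-1}} \cdots \mathbf{L}_{\varepsilon_{k+1}} \mathbf{Y}_k(\varepsilon). \tag{19}$$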
Proposition 15 provides another representation of πn(ε) in terms of a random
matrix product (recall Propositions 10 and 12), as the value of πn(ε) is one of the
entries of the vector Yn(ε). Although this representation is in some ways more natural,
and easier to derive, it seems less useful for actual computations. This is because the
matrices $\mathbf{L}_i$ are, in general, much larger than the matrices in the products in
Propositions 10 and 12, and enumeration of the neighborhood types may be
practically more difficult than the enumeration of the admissible m-blocks for a
given β.
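A product of the form (19) can be iterated numerically by renormalizing the vector at each step and accumulating the logarithms of the norms. The following is a minimal sketch, not the computation used for the results reported in this paper: the two matrices passed in are placeholders supplied by the caller (they are not the matrices $\mathbf{L}_0$, $\mathbf{L}_1$), the function names are hypothetical, and the $\ell^1$ norm is an arbitrary choice. The sketch assumes the product never annihilates the vector.

```python
import math
import random


def mat_vec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def growth_rate(mats, p, n, y0, seed=0):
    """Estimate (1/n) log || M_{eps_n} ... M_{eps_1} y0 ||_1.

    mats[0] and mats[1] are the matrices applied when the digit is 0 or 1;
    the digits are i.i.d. with P(eps = 1) = p.
    """
    rng = random.Random(seed)
    total = sum(abs(t) for t in y0)
    y = [t / total for t in y0]         # start from a unit vector
    log_norm = 0.0
    for _ in range(n):
        eps = 1 if rng.random() < p else 0
        y = mat_vec(mats[eps], y)
        s = sum(abs(t) for t in y)      # assumed nonzero
        log_norm += math.log(s)
        y = [t / s for t in y]          # renormalize to avoid underflow
    return log_norm / n


# Placeholder matrices: both equal to half the identity.
half = [[0.5, 0.0], [0.0, 0.5]]
rate = growth_rate([half, half], p=0.3, n=100, y0=[1.0, 1.0])
```

With the placeholder matrices above every step shrinks the vector by exactly one half, so the estimate equals $\log(1/2)$ whatever the digits are; this only checks the bookkeeping.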
Acknowledgements. Thanks to Irene Hueter for help with some of the numerical
computations and for valuable conversations. Thanks to Yuval Peres for the
reference to [2], and to Boris Solomyak for pointing the author to [9] after seeing an
earlier version of this manuscript.
References
1. J. C. Alexander and J. A. Yorke, ‘Fat baker’s transformations’, Ergodic Theory Dynamical Systems 4 (1984) 1–23.
2. J. C. Alexander and D. Zagier, ‘The entropy of a certain infinitely convolved Bernoulli measure’, J. London Math. Soc. (2) 44 (1991) 121–134.
3. T. Bedford, ‘On Weierstrass-like functions and random recurrent sets’, Math. Proc. Cambridge Philos. Soc. 106 (1989) 325–342.
4. P. Bougerol and J. Lacroix, Products of random matrices (Birkhäuser, Boston, 1985).
5. J. Cannon, ‘The combinatorial structure of co-compact discrete hyperbolic groups’, Geom. Dedicata 16 (1984) 123–148.
6. P. Erdős, ‘On a family of symmetric Bernoulli convolutions’, Amer. J. Math. 61 (1939) 974–976.
7. P. Erdős, ‘On the smoothness properties of a family of Bernoulli convolutions’, Amer. J. Math. 62 (1940) 180–186.
8. K. Falconer, The geometry of fractal sets (Cambridge University Press, 1985).
9. C. Frougny, ‘Representations of numbers and finite automata’, Math. Systems Theory 25 (1992) 37–60.
10. H. Furstenberg and H. Kesten, ‘Products of random matrices’, Ann. Math. Statistics 31 (1960) 457–469.
11. A. Garsia, ‘Arithmetic properties of Bernoulli convolutions’, Trans. Amer. Math. Soc. 102 (1962) 409–432.
12. A. Garsia, ‘Entropy and singularity of infinite convolutions’, Pacific J. Math. 13 (1963) 1159–1169.
13. J. Hopcroft and J. Ullman, Introduction to automata theory, languages, and computation (Addison-Wesley, Reading, 1979).
14. R. Kenyon and Y. Peres, ‘Intersecting random translates of invariant Cantor sets’, Invent. Math. 104 (1991) 601–629.
15. W. Parry, ‘On the β-expansions of real numbers’, Acta Math. Acad. Sci. Hung. 11 (1960) 401–416.
16. W. Parry, ‘Intrinsic Markov chains’, Trans. Amer. Math. Soc. 112 (1964) 55–64.
17. K. Petersen, Ergodic theory (Cambridge University Press, 1983).
18. F. Przytycki and M. Urbański, ‘On the Hausdorff dimension of some fractal sets’, Studia Math. 93 (1989) 155–186.
19. A. Rényi, ‘Representations for real numbers and their ergodic properties’, Acta Math. Acad. Sci. Hung. 8 (1957) 477–493.
20. R. Salem, Algebraic numbers and Fourier analysis (Heath, Lexington, 1963).
21. C. L. Siegel, ‘Algebraic integers whose conjugates lie in the unit circle’, Duke Math. J. 11 (1944) 597–602.
22. B. Solomyak, ‘On the random series $\sum \pm\lambda^n$’, Ann. of Math. (2) 142 (1995) 611–625.
23. P. Walters, An introduction to ergodic theory (Springer, New York, 1982).
24. L.-S. Young, ‘Dimension, entropy, and Lyapunov exponents’, Ergodic Theory Dynamical Systems 2 (1982) 109–124.
Department of Statistics
Mathematical Sciences Building
Purdue University
West Lafayette
Indiana 47907
USA
E-mail: lalley@stat.purdue.edu