
ITW 1998, San Diego, CA, February 8 – 11
Random sequence generation using the context-tree weighting method
Frans M.J. Willems
Department of Electrical Engineering, Eindhoven University of Technology, Eindhoven, The Netherlands

Abstract — We discuss generating pseudo-random sequences with the context-tree weighting method. The randomness of the constructed sequences is determined by the fact that they are not compressible over the class of tree sources, i.e. not compressible by the context-tree weighting method.

I. Context-tree weighting

The context-tree weighting (CTW) method [4] is a universal source coding technique matched to the class of tree sources (the name tree sources first appears in [1]). In the CTW-algorithm the arithmetic encoder and decoder are provided with coding probabilities that are computed using a context tree. The set of nodes in the context tree is denoted by T_D, where D is the depth of this tree. To each node s ∈ T_D there corresponds a weighted probability P_w^s. These weighted probabilities are defined recursively as

  P_w^s \triangleq \begin{cases} \frac{1}{2} P_e(a_s, b_s) + \frac{1}{2} P_w^{0s} P_w^{1s} & \text{for } 0 \le l(s) < D, \\ P_e(a_s, b_s) & \text{for } l(s) = D, \end{cases}   (1)

where l(s) is the depth of node s, a_s and b_s denote the number of zeros and ones that occurred in context s, and P_e(a_s, b_s) is the Krichevsky–Trofimov estimated probability of such a subsequence (see [4]). The probability P_w^λ in the root λ of the context tree serves as coding probability. It follows from (23) and (25) in [4] that (the base of the log is assumed to be 2)

  \log \frac{1}{P_w^\lambda(x_1^T | x_{1-D}^0)} \le \Gamma_D(S) + |S| \gamma\Big(\frac{T}{|S|}\Big) + \log \frac{1}{P_a(x_1^T | x_{1-D}^0, S, \Theta_S)},   (2)

for all x_1^T ∈ {0,1}^T, for any sequence of past symbols x_{1-D}^0, and relative to all tree sources with model S ∈ C_D and parameter vector Θ_S. Here C_D is the class of tree models with depth not exceeding D. Moreover, the cost of a model S ∈ C_D is defined as

  \Gamma_D(S) \triangleq |S| - 1 + |\{s : s \in S, l(s) \ne D\}|,   (3)

and the function

  \gamma(z) \triangleq \begin{cases} z & \text{for } 0 \le z < 1, \\ \frac{1}{2} \log z + 1 & \text{for } z \ge 1. \end{cases}   (4)
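
To make the recursion in (1) concrete, the following is a minimal Python sketch of a context tree with Krichevsky–Trofimov estimators. It is not the implementation of [2], [3]; the names Node, update and ctw_probability are illustrative, and exact rational arithmetic is used, anticipating Section V.

    from fractions import Fraction

    class Node:
        """A node s of the context tree T_D: counts, P_e(a_s, b_s) and P_w^s."""
        def __init__(self):
            self.a = 0                # zeros observed in this context
            self.b = 0                # ones observed in this context
            self.pe = Fraction(1)     # Krichevsky-Trofimov estimate P_e(a_s, b_s)
            self.pw = Fraction(1)     # weighted probability P_w^s
            self.child = {}           # subtrees 0s and 1s

    def update(node, symbol, context, depth, D):
        """Update the path selected by `context` with `symbol`, recomputing (1)."""
        if depth < D:
            child = node.child.setdefault(context[depth], Node())
            update(child, symbol, context, depth + 1, D)
        # sequential KT update: P_e gains a factor (k + 1/2)/(a_s + b_s + 1),
        # where k is the current count of `symbol` in this context
        k = node.a if symbol == 0 else node.b
        node.pe *= Fraction(2 * k + 1, 2 * (node.a + node.b + 1))
        if symbol == 0:
            node.a += 1
        else:
            node.b += 1
        if depth == D:
            node.pw = node.pe         # leaf: P_w^s = P_e(a_s, b_s)
        else:                         # internal node: the mixture of (1)
            p0 = node.child.get(0, Node()).pw
            p1 = node.child.get(1, Node()).pw
            node.pw = (node.pe + p0 * p1) / 2

    def ctw_probability(x, past, D):
        """P_w^lambda(x_1^T | x_{1-D}^0) for a binary list x; len(past) >= D."""
        root, history = Node(), list(past)
        for sym in x:
            context = history[::-1][:D]   # most recent symbol first
            update(root, sym, context, 0, D)
            history.append(sym)
        return root.pw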

II. CTW-incompressible sequences

It is our objective now to construct a sequence that is incompressible over the class of tree sources with models in C_D. We do this by applying the CTW-method. First we fix some past sequence x_{1-D}^0. Suppose now that we have already constructed x_1^{t-1}. To determine x_t we compute both P_w^λ(x_1^{t-1}, X_t = 0 | x_{1-D}^0) and P_w^λ(x_1^{t-1}, X_t = 1 | x_{1-D}^0). If P_w^λ(x_1^{t-1}, X_t = 0 | x_{1-D}^0) < P_w^λ(x_1^{t-1}, X_t = 1 | x_{1-D}^0) we set x_t = 0; if P_w^λ(x_1^{t-1}, X_t = 0 | x_{1-D}^0) > P_w^λ(x_1^{t-1}, X_t = 1 | x_{1-D}^0) we take x_t = 1. In case of a tie we can choose whatever binary x_t we like.

What we achieve by this method is that the constructed sequence x_1^T satisfies

  \log \frac{1}{P_w^\lambda(x_1^T | x_{1-D}^0)} \ge T.   (5)

This holds since the probabilities of the two extensions add up to P_w^λ(x_1^{t-1} | x_{1-D}^0), so that choosing the less likely one at least halves the weighted probability in each step.
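
A sketch of this construction, reusing Node, update and ctw_probability from the sketch in Section I; generate is an illustrative name, and the deep copies that probe the two extensions merely keep the sketch short (a serious implementation would update and rewind the context path instead).

    import copy

    def generate(T, D, past):
        """Emit the symbol the CTW model finds least likely at every step, so
        that log 1/P_w^lambda(x_1^T | x_{1-D}^0) >= T as in (5); ties go to 0."""
        root, history, x = Node(), list(past), []
        for _ in range(T):
            context = history[::-1][:D]
            trial = {}
            for c in (0, 1):          # probe P_w^lambda(x_1^{t-1}, X_t = c | .)
                probe = copy.deepcopy(root)
                update(probe, c, context, 0, D)
                trial[c] = probe.pw
            sym = 0 if trial[0] <= trial[1] else 1
            update(root, sym, context, 0, D)
            history.append(sym)
            x.append(sym)
        return x

    # the two extension probabilities add up to the previous P_w^lambda, so the
    # smaller branch at least halves it each step, which is exactly (5):
    x = generate(T=32, D=3, past=[0, 0, 0])
    assert ctw_probability(x, [0, 0, 0], 3) <= Fraction(1, 2 ** 32)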

If we combine (5) with (2) we obtain that x_1^T is such that

  \log \frac{1}{P_a(x_1^T | x_{1-D}^0, S, \Theta_S)} \ge T - \Gamma_D(S) - |S| \gamma\Big(\frac{T}{|S|}\Big),   (6)

for all tree models S ∈ C_D and parameter vectors Θ_S, and for all T.

To see what this means, consider model S, and let the number of zeros and ones in x_1^T that occurred in leaf s ∈ S be a_s and b_s respectively. Then

  \sum_{s \in S} (a_s + b_s)\, h\Big(\frac{b_s}{a_s + b_s}\Big) = \log \frac{1}{\prod_{s \in S} \big(\frac{a_s}{a_s + b_s}\big)^{a_s} \big(\frac{b_s}{a_s + b_s}\big)^{b_s}} = \log \frac{1}{P_a\big(x_1^T | x_{1-D}^0, S, \big(\frac{b_1}{a_1 + b_1}, \cdots, \frac{b_{|S|}}{a_{|S|} + b_{|S|}}\big)\big)} \ge T - \Gamma_D(S) - |S| \gamma\Big(\frac{T}{|S|}\Big),   (7)

where h(c) \triangleq -(1 - c) \log(1 - c) - c \log(c) is the binary entropy function. Observe that \sum_{s \in S} (a_s + b_s) h(b_s/(a_s + b_s))/T denotes the empirical conditional entropy of a symbol given its context in S in the constructed sequence x_1^T. If we denote this entropy by Ĥ_{·|S}(x_1^T | x_{1-D}^0) then

  \hat{H}_{\cdot|S}(x_1^T | x_{1-D}^0) \ge 1 - \frac{\Gamma_D(S) + |S| \gamma(T/|S|)}{T},   (8)

which holds for all tree models S ∈ C_D. By the logarithmic behavior of γ(·) with T, we obtain for any ε > 0, for all T large enough,

  \hat{H}_{\cdot|S}(x_1^T | x_{1-D}^0) \ge 1 - ε.   (9)
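
As an illustration of the quantity bounded in (8) and (9), here is a small sketch that tallies (a_s, b_s) for each leaf of a given model S and evaluates the empirical conditional entropy; the function name and the suffix-set representation of S are choices made here, not the paper's.

    import math
    from collections import defaultdict

    def empirical_conditional_entropy(x, past, model):
        """H-hat_{.|S}(x_1^T | x_{1-D}^0): the left-hand side of (7) divided
        by T. `model` is a complete suffix set, most recent symbol first,
        e.g. {'0', '10', '11'}; `past` must be long enough to reach a leaf."""
        counts = defaultdict(lambda: [0, 0])   # leaf s -> [a_s, b_s]
        history = list(past)
        for sym in x:
            ctx = ''
            for c in reversed(history):        # grow the suffix until a leaf
                ctx += str(c)
                if ctx in model:
                    break
            counts[ctx][sym] += 1
            history.append(sym)
        total = 0.0
        for a, b in counts.values():           # (a_s + b_s) h(b_s/(a_s + b_s))
            for k in (a, b):
                if k:
                    total += k * math.log2((a + b) / k)
        return total / len(x)

For sequences produced by the construction above this value obeys (8) for every S ∈ C_D and approaches 1 as T grows.
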
III. Block-behavior
Let S_e \triangleq {0,1}^e for e = 0, …, D, i.e. S_e is a full model of depth e. Now Ĥ_{·|S_e}(x_1^T | x_{1-D}^0) denotes the empirical conditional entropy of a symbol given the e preceding symbols in x_1^T. Since, by the chain rule, the entropy of a block decomposes into conditional entropies of a symbol given its e preceding symbols (e = 0, …, d), we obtain for any d ∈ {0, …, D}

  \frac{\hat{H}_{d+1}(x_1^T | x_{1-D}^0)}{d + 1} \ge 1 - \frac{1}{d + 1} \cdot \frac{\sum_{e=0}^{d} \big[\Gamma_D(S_e) + |S_e| \gamma(T/|S_e|)\big]}{T},   (10)

where Ĥ_{d+1}(x_1^T | x_{1-D}^0) is the empirical entropy of symbol blocks of length d + 1 in x_1^T. This implies for any ε > 0 that for all T large enough

  \frac{\hat{H}_{d+1}(x_1^T | x_{1-D}^0)}{d + 1} \ge 1 - ε.   (11)
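
The block entropies in (10) and (11) can be estimated as follows; the paper does not spell out the block statistics, so this sketch, which uses the disjoint (d + 1)-blocks of x_1^T, is only one plausible reading.

    import math
    from collections import Counter

    def block_entropy(x, d):
        """H-hat_{d+1}: empirical entropy of (d + 1)-blocks of x_1^T,
        estimated from the disjoint blocks of the sequence."""
        n = d + 1
        blocks = [tuple(x[i:i + n]) for i in range(0, len(x) - n + 1, n)]
        m = len(blocks)
        freq = Counter(blocks)
        return -sum(c / m * math.log2(c / m) for c in freq.values())

    # per (11), block_entropy(x, d) / (d + 1) should approach 1 for every
    # fixed d when x is a generated sequence
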
IV. Finite-accuracy implementation
When we represent both the estimated probabilities and the weighted probabilities by f-bit floating point numbers as described in [3], we get a redundancy increase of not more than [\Gamma_D(S) + T(l(S) + 2)] \log(1/\mu), where μ \triangleq 2^{f-1}/(2^{f-1} + 1). Therefore, in the case where an incompressible sequence is to be generated, we achieve an empirical conditional entropy satisfying

  \hat{H}_{\cdot|S}(x_1^T | x_{1-D}^0) \ge 1 - \frac{\Gamma_D(S) + |S| \gamma(T/|S|) + [\Gamma_D(S) + T(l(S) + 2)] \log(1/\mu)}{T} = 1 - (l(S) + 2) \log\frac{1}{\mu} - \frac{\Gamma_D(S) + |S| \gamma(T/|S|) + \Gamma_D(S) \log(1/\mu)}{T},   (12)

which is not smaller than 1 - (l(S) + 2) \log(1/\mu) - ε for any ε > 0 for all T large enough.

V. Exact implementation
Exact implementation of the CTW method requires a lot of storage space and a lot of computational power. Reduction of both can be achieved by observing that the denominator of an exact estimated block probability P_e(a, b) is always a power of 2 whose exponent is not larger than 2(a + b). To prove this, observe first that

  P_e(a, 0) = \binom{2a}{a} \Big/ 2^{2a}.

Then note that P_e(a - b, b) = P_e(a - b, b - 1) - P_e(a - b + 1, b - 1) for b = 1, …, a. The induction hypothesis is that P_e(a - b, b - 1) for b = 1, …, a has a power of 2 as denominator, its exponent being not larger than 2(a - 1). P_e(a, 0) has a 2-power as denominator with exponent not larger than 2a, and hence so does P_e(a - 1, 1), hence also P_e(a - 2, 2), etc.

Note that weighted probabilities also have a denominator which is a power of 2. If P_w^s corresponds to a subsequence of length τ and if the subtree rooted in s has i internal nodes (including s), the exponent is not larger than 2τ + i.
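
These observations are easy to check numerically. In the following sketch the helper pe is mine, accumulating P_e(a, b) with the sequential KT update (P_e gains a factor (k + 1/2)/(n + 1) when a symbol with current count k arrives after n symbols):

    from fractions import Fraction
    from math import comb

    def pe(a, b):
        """Exact estimated block probability P_e(a, b), a zeros then b ones."""
        p = Fraction(1)
        for i in range(a):
            p *= Fraction(2 * i + 1, 2 * (i + 1))
        for j in range(b):
            p *= Fraction(2 * j + 1, 2 * (a + j + 1))
        return p

    for a in range(8):
        # P_e(a, 0) = binom(2a, a) / 2^(2a)
        assert pe(a, 0) == Fraction(comb(2 * a, a), 2 ** (2 * a))
        for b in range(8):
            den = pe(a, b).denominator   # a power of 2, exponent <= 2(a + b)
            assert den & (den - 1) == 0 and den <= 2 ** (2 * (a + b))
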
VI. Ties
In case of a tie, i.e. P_w^λ(x_1^{t-1}, X_t = 0 | x_{1-D}^0) = P_w^λ(x_1^{t-1}, X_t = 1 | x_{1-D}^0), we can choose any symbol value for x_t. This can be regarded as a problem, but on the other hand we get the opportunity to change the sequence x_1^T, while the pseudo-random behavior described by (8) etc. is still guaranteed.

We could even go further. By choosing the symbol value x ∈ {0, 1} that minimizes P_w^λ(x_1^{t-1}, X_t = x | x_{1-D}^0) each time, we obtain a value of -log P_w^λ(x_1^T | x_{1-D}^0) which is larger than T. It is possible to choose 'wrong' values of x, however; we only have to guarantee that always -log P_w^λ(x_1^T | x_{1-D}^0) ≥ T. Choosing wrong values for x gives extra opportunities to change the sequence x_1^T.

VII. How random are the generated sequences?
Is it, e.g., possible to compress these sequences by other compression methods? First note that the restriction that the depth of the tree models must not exceed D is not essential (in the exact implementation). A CTW-method with "infinite D" has been described in [3]. Therefore we achieve (10) for all d = 0, 1, 2, …. Theorem 3 in [5] now states that it is impossible for finite-state codes to achieve compression on the generated sequences.

References

[1] M.J. Weinberger, J. Rissanen and M. Feder, "A Universal Finite Memory Source," IEEE Trans. on Inform. Theory, May 1995, pp. 643-652.
[2] F.M.J. Willems, "Implementing the Context-Tree Weighting Method," Abstracts 1996 IEEE Inform. Theory Workshop, Haifa, Israel, June 9-13, 1996, p. 5.
[3] F.M.J. Willems, "The Context-Tree Weighting Method: Extensions," accepted for publication in IEEE Trans. on Inform. Theory.
[4] F.M.J. Willems, Y.M. Shtarkov and Tj.J. Tjalkens, "The Context-Tree Weighting Method: Basic Properties," IEEE Trans. on Inform. Theory, May 1995, pp. 653-664.
[5] J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable-Rate Coding," IEEE Trans. on Inform. Theory, vol. IT-24, September 1978, pp. 530-536.