Non-stochastic infinite and finite sequences∗
V.V. V’yugin
Institute for Information Transmission Problems,
Russian Academy of Sciences,
Bol’shoi Karetnyi per. 19, Moscow GSP-4, 101447, Russia
Abstract
Combining the outcomes of coin tossing with transducer algorithms, it is possible to generate, with probability close to 1, very pathological sequences for which computable probabilistic forecasting is impossible. These sequences are not random with respect to any reasonable probability distribution. A natural consequence of the definition of such sequences is that every simple measure of the set of all such sequences is equal to 0. It was Kolmogorov's and Levin's idea to estimate the probability of generating such sequences by combinations of probabilistic and algorithmic processes ([8], [14], [21]). We collect several results in this direction for infinite sequences and asymptotic results for finite sequences, including estimates of the space and time at which randomness is lost for time-bounded forecasting systems (a correction to [22]).
1 Introduction
The problem of the existence of non-stochastic objects was discussed in the seventies at Kolmogorov's seminar at Moscow State University (see also [1] and [10]). Levin posed this problem for infinite sequences [20], [7]. Following Levin, the main problem is to estimate the probability of generating such sequences by combinations of stochastic and deterministic processes (see [20], [8]). In [20] a solution of this problem for infinite sequences was obtained. In 1981 Kolmogorov proposed the concept of a finite (α, β)-stochastic sequence. He posed the problem of the existence of finite sequences which are not (α, β)-stochastic. Shen [14] showed the existence of such sequences. In [21] some estimates of the probability of generating non-stochastic sequences by combinations of stochastic and deterministic processes were obtained.
Similar problems were considered independently in [3], [11], [14]. Dawid [2] considered the concept of an infinite calibrable sequence, which is a weak concept of randomness for individual sequences. The problem of the existence of noncalibrable sequences has been discussed in these papers.

∗ The research described in this publication was made possible in part by Grant INTAS, project No. 93-0893.
In this paper we give some amplifications and corrections of results from [21] and [22].
2 Algorithmic background
Let Ω be the set of all infinite binary sequences, Ξ the set of all finite binary sequences, and λ the empty sequence. For any finite or infinite ω = ω1 . . . ωn . . . we denote ω^n = ω1 . . . ωn. Let R be the set of all real numbers extended by adding the infinities −∞ and +∞. [r] denotes the integer part of a real number r.
We need a one-to-one enumeration of all ordered pairs of positive integers. We fix some form of this enumeration. We use the natural correspondence between finite binary sequences and nonnegative integers: ∅–0, 0–1, 1–2, 00–3, 01–4, 10–5, 11–6, 000–7, . . . , such that the absolute value of the difference between the length of a binary sequence and the logarithm (to the base 2) of its ordinal number is less than 1. We encode the ordered pair of binary sequences α and β by the sequence bin∗(|α|)01αβ, where bin(|α|) is a binary code of the length of α and γ∗ = γ1γ1 . . . γnγn for γ = γ1 . . . γn. Then the absolute value of the difference between the length of the binary code of the ordered pair ⟨i, j⟩ and 2 log2 log2 i + log2 i + log2 j + 2 is less than 3. We will identify the ordered pair ⟨i, j⟩ with its ordinal number.
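As an illustration of this self-delimiting pair encoding, here is a minimal Python sketch; the function names and the decoding helper are our own, not taken from the paper.

```python
def duplicate_bits(s: str) -> str:
    """gamma* = g1 g1 ... gn gn: every bit written twice."""
    return "".join(c + c for c in s)

def encode_pair(alpha: str, beta: str) -> str:
    """Encode the ordered pair (alpha, beta) as bin*(|alpha|) 01 alpha beta."""
    return duplicate_bits(bin(len(alpha))[2:]) + "01" + alpha + beta

def decode_pair(code: str):
    """Recover (alpha, beta): read the duplicated length prefix, then split."""
    i, length_bits = 0, ""
    while code[i:i + 2] != "01":
        assert code[i] == code[i + 1], "malformed length prefix"
        length_bits += code[i]
        i += 2
    n = int(length_bits, 2)
    body = code[i + 2:]
    return body[:n], body[n:]

alpha, beta = "1011", "001"
assert decode_pair(encode_pair(alpha, beta)) == (alpha, beta)
print(encode_pair(alpha, beta))   # "110000" + "01" + "1011" + "001"
```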
We need some elements of the theory of algorithms. This theory is systematically treated in, for example, Rogers [12].
We also need some model of computation. Algorithms may be regarded as Turing machines, so the notions of a program and of computation time are well defined. Our considerations will be invariant under polynomial computation time, so the results will be machine-independent. An algorithm transforms finite objects into finite objects. Integer and rational numbers (but not reals) are examples of finite objects. Finite sequences of finite objects are again finite objects. We will use the notion of a computable function transforming finite objects into finite ones. A set of finite objects is called recursively enumerable if it is the domain of some computable function. A set A of finite objects is called (algorithmically) decidable if A and the complement of A are recursively enumerable.
Let A be a set of all finite objects of certain type.
A function f : A → R is called (lower) semicomputable if {(r, x) | r is rational and r < f(x)} is a recursively enumerable set. This means that there is an algorithm which, when fed with a rational number r and a finite object x, eventually stops if r < f(x) and never stops otherwise. In other words, the semicomputability of f means that if f(x) > r this fact will sooner or later be learned, whereas if f(x) ≤ r we may remain forever uncertain.
A function f is upper semicomputable if −f is lower semicomputable.
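To make the "sooner or later learned" intuition concrete, here is a small Python sketch of such a semi-decision procedure; the approximation sequence and the step cut-off are our own illustration (a genuine semi-decision procedure would simply keep running in the uncertain case).

```python
def verifies_lower_bound(f_approx, x, r, max_steps=10_000):
    """Semi-decision sketch: halt (return True) once some stage-s lower
    approximation of f(x) exceeds r; in the true notion the search would
    never stop if f(x) <= r (here cut off after max_steps for demonstration)."""
    for s in range(1, max_steps + 1):
        if f_approx(x, s) > r:
            return True
    return False   # "still uncertain" after max_steps stages

# Toy lower approximations of f(x) = 1 - 1/x, converging from below in s.
f_approx = lambda x, s: 1 - 1 / x - 1 / s
print(verifies_lower_bound(f_approx, 4, 0.7))   # True: 0.75 > 0.7 is learned
```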
Proposition 1 There is a lower (upper) semicomputable real function f (i, a),
where i is an integer number and a ∈ A, such that the sequence f1 , f2 . . . of the
functions fi (a) = f (i, a) consists of all lower (upper) semicomputable functions
on A (we call such a function universal).
Proof. We prove this proposition for lower semicomputable functions. Each algorithm is described by a program, which is a finite sequence of symbols in some alphabet. In practice not all sequences are “meaningful” programs, but it is convenient to consider “meaningless” programs as programs describing algorithms that never stop. Let π1, π2, . . . be a computable enumeration of all programs. We define an algorithm Φ (for checking whether f(i, a) > r holds) as follows. When fed with i, a and r, it applies the program πi to all pairs (a, r′) such that r′ > r and r′ is a rational number. Φ stops when πi stops on one of these pairs. It is easy to check that there is a (unique) function f(i, a) such that r < f(i, a) if and only if Φ stops on (r, i, a). Any lower semicomputable function g on A is computed by some program πi and, therefore, g coincides with fi. 2
Let fi,s(a) = fs(i, a) be equal to the maximal r such that the algorithm Φ stops on (r, i, a) after s steps of computation, and equal to −∞ if no such r exists. Then fi,s(a) ≤ fi,s+1(a) for all i, s, a and fi(a) = lim_{s→∞} fi,s(a). Any such function fi,s takes only a finite number of rational values distinct from −∞. An analogous non-increasing sequence exists for any upper semicomputable function.
A real function f is computable if there exists an algorithm which, when fed with any a ∈ A and a rational ε > 0, computes a rational approximation of f(a) with accuracy ε.
Proposition 2 A function f is computable if and only if it is simultaneously
lower semicomputable and upper semicomputable.
Proof. We need only prove that any function f which is both lower and upper semicomputable is computable. By Proposition 1 and the consideration after it, there are a sequence fi,s(a), non-decreasing in s, and a non-increasing sequence gj,s(a) of functions taking rational values such that lim_{s→∞} fi,s(a) = lim_{s→∞} gj,s(a) = f(a) for some i and j. To compute a rational number r satisfying |f(a) − r| < ε, where ε is positive and rational, it suffices to find s satisfying |fi,s(a) − gj,s(a)| < ε and to output fi,s(a). 2
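A Python sketch of this procedure, with toy monotone approximation sequences standing in for fi,s and gj,s; the particular approximations and all names are our own illustration, not part of the paper.

```python
from fractions import Fraction

def compute(lower_approx, upper_approx, a, eps):
    """Compute f(a) to accuracy eps from a non-decreasing sequence of lower
    approximations and a non-increasing sequence of upper approximations,
    both converging to f(a), as in the proof of Proposition 2."""
    s = 1
    while True:
        lo, hi = lower_approx(a, s), upper_approx(a, s)
        if hi - lo < eps:          # then |f(a) - lo| < eps is guaranteed
            return lo
        s += 1

# Toy approximations of f(a) = a / 3 from below and from above.
lower = lambda a, s: Fraction(a, 3) - Fraction(1, s)
upper = lambda a, s: Fraction(a, 3) + Fraction(1, s)

print(compute(lower, upper, 2, Fraction(1, 1000)))   # a rational close to 2/3
```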
Let f^−(i, a) and f^+(i, a) be functions universal for all lower semicomputable and all upper semicomputable functions from a, respectively. For any computable real function f we call a pair ⟨i, j⟩ such that f(a) = f^−(i, a) = f^+(j, a) a program of f. The pair of functions Φ = ⟨f^−(i, a), f^+(j, a)⟩ is called a mode of description of computable functions. We denote the output fi,s(a) of the algorithm described in the proof of Proposition 2 by Φ(k, a, ε), where k = ⟨i, j⟩ is a program for f.
Let Φ be a mode of description and k be a positive integer number. We say
that a real function f is k-simple with respect to Φ if there exists a program of
length ≤ k which computes the values of f as in Proposition 2 by means of Φ.
Proposition 3 There exists an optimal mode of description Π satisfying the
following. For any mode of description Φ there exists a constant c such that for
all k all functions that are k-simple with respect to Φ are (k + c)-simple with
respect to Π.
Proof. Let f^−(n, i, a) be a function universal for all lower semicomputable functions from (i, a), which exists by Proposition 1. Let f^+(n, i, a) be a function universal for all upper semicomputable functions from (i, a). Let us define a mode of description f̂^−(⟨n, i⟩, a) = f^−(n, i, a), f̂^+(⟨n, i⟩, a) = f^+(n, i, a).
For any mode of description ⟨f^−(i, a), f^+(j, a)⟩ there exist m and n such that f^−(i, a) = f̂^−(⟨m, i⟩, a) and f^+(j, a) = f̂^+(⟨n, j⟩, a). The proposition follows from the form of the enumeration of pairs. 2
We fix such an optimal mode of description Φ = ⟨f^−, f^+⟩ and will consider k-simple functions only with respect to it.
We also use the concept of a computable operation on Ξ ∪ Ω [24], [16]. Let F̂ be a recursively enumerable set of ordered pairs of finite sequences satisfying the following properties:
• (x, ∅) ∈ F̂ for any x, where ∅ is the empty sequence;
• if (x, y) ∈ F̂, x ⊆ x′ and y′ ⊆ y then (x′, y′) ∈ F̂;
• if (x, y) ∈ F̂ and (x, y′) ∈ F̂ then y ⊆ y′ or y′ ⊆ y.
A computable operation F is defined as follows:
$$F(\omega) = \sup\{y \mid x \subseteq \omega \text{ and } (x, y) \in \hat F \text{ for some } x\},$$
where ω ∈ Ω ∪ Ξ.
Informally, the computable operation F is defined by some algorithm which
when fed with an infinite or a finite sequence ω takes it sequentially bit by bit,
processes it and produces an output sequence also sequentially bit by bit.
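The following Python sketch illustrates how such a set F̂ induces a monotone, bit-by-bit transformation of prefixes. The particular operation (duplicating every input bit) and all names are our own illustrative assumptions; only the maximal pairs of a finite fragment of F̂ are listed.

```python
from itertools import product

def apply_operation(f_hat, omega_prefix: str) -> str:
    """Output of the operation on a finite prefix: the longest y such that
    (x, y) is in f_hat for some prefix x of the input read so far."""
    best = ""
    for x, y in f_hat:
        if omega_prefix.startswith(x) and len(y) > len(best):
            best = y
    return best

# A finite fragment of F-hat for the illustrative operation that duplicates
# every input bit (pairs implied by the closure conditions are omitted;
# apply_operation does not need them).
f_hat = []
for n in range(4):
    for bits in product("01", repeat=n):
        x = "".join(bits)
        f_hat.append((x, "".join(c + c for c in x)))

print(apply_operation(f_hat, "101"))   # "110011"
```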
3 Computable calibration
Suppose some device sequentially generates bits forming a sequence ω = ω1ω2 . . . ωn . . .. If this process is completely chaotic we describe it by a probability distribution P. To define a probability distribution (measure) P on Ω it is sufficient to define all the values P(x) such that P(x) = P(x0) + P(x1) for all finite binary sequences x, where xν is the sequence x extended by adding the bit ν = 0, 1 on the right-hand side. The measure of the interval Γx = {ω ∈ Ω | x ⊆ ω}, where x ∈ Ξ, is defined as P(Γx) = P(x). After that P is extended to all measurable subsets of Ω as usual in measure theory. The uniform measure L is defined by L(x) = 2^{−|x|}, where |x| denotes the number of bits in the finite sequence x. The measure P is computable if there exists an algorithm computing the values P(x) with arbitrary degree of accuracy.
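A small Python sketch of these definitions; the additivity check and the biased Bernoulli measure are our own illustrations.

```python
from itertools import product

def uniform(x: str) -> float:
    """The uniform measure L(x) = 2**(-len(x)) of the interval Gamma_x."""
    return 2.0 ** (-len(x))

def bernoulli(p: float):
    """A biased coin measure: another example of a computable measure."""
    return lambda x: (p ** x.count("1")) * ((1 - p) ** x.count("0"))

def check_additivity(P, depth: int = 6) -> bool:
    """Verify P(x) = P(x0) + P(x1) for all x of length < depth."""
    for n in range(depth):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            if abs(P(x) - (P(x + "0") + P(x + "1"))) > 1e-12:
                return False
    return True

print(check_additivity(uniform), check_additivity(bernoulli(0.3)))  # True True
```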
In this section we will use a weak concept of an infinite random sequence. We use Dawid's notion of randomness since it has a suitable interpretation, leads to a wider class of random sequences and so, in this case, to stronger results.
Dawid [3] uses the concept of a forecasting system, which is a real-valued function defined on finite binary sequences and taking values between 0 and 1. A typical situation in which a computable forecasting system f arises is as follows. Some device sequentially generates bits forming a sequence ω = ω1ω2 . . . ωn . . . of a certain kind (such as readings of instruments). At some time we begin to register these observations. If this process is completely chaotic we try to apply some probabilistic theory that is relevant to these observations. For some sequences of observations (or data sequences) ω1ω2 . . . ωn the theory yields a probability forecast for the next observation ω_{n+1}. Such a data sequence is included in the domain of f, and f(ω1ω2 . . . ωn) is defined as the probability forecast issued for ω_{n+1}. For a more general presentation and details we refer the reader to [19].
If an overall probability distribution P on Ω is given, then the function f(x) = P(1 | x), where P(1 | x) = P(x1)/P(x) is the conditional probability that x will be followed by 1 (x1 being the concatenation of x and 1), is a forecasting system, provided all conditional probabilities exist. It is easy to see that for each everywhere defined forecasting system f there is a unique probability distribution P satisfying P(1 | x) = f(x) for all x with P(x) ≠ 0. In this paper we will consider only everywhere defined forecasting systems and the corresponding probability distributions.
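A Python sketch of this correspondence, building the probability of a cylinder Γx from the forecasts along x; the names and the example forecaster are ours.

```python
def measure_from_forecaster(f):
    """The probability P(x) of the interval Gamma_x induced by a forecasting
    system f via P(1 | x) = f(x), with P(lambda) = 1."""
    def P(x: str) -> float:
        p, prefix = 1.0, ""
        for bit in x:
            q = f(prefix)                     # forecast for the next bit
            p *= q if bit == "1" else 1.0 - q
            prefix += bit
        return p
    return P

# Example: a forecaster that always predicts 1 with probability 0.25.
P = measure_from_forecaster(lambda x: 0.25)
print(P("110"))                    # 0.25 * 0.25 * 0.75
print(P("1"), P("10") + P("11"))   # the two values agree: additivity holds
```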
A selection rule is a function on the set of all finite binary sequences taking values 0 and 1. A selection rule δ is said to select the subsequence s = n1n2 . . . under an infinite binary sequence ω = ω1ω2 . . . ωn . . . if n ∈ s just when δ(ω1ω2 . . . ω_{n−1}) = 1. We say that a forecasting system f (or the probability distribution P associated with f) is calibrated for ω1ω2 . . . ωn . . . with respect to δ if either the subsequence n1n2 . . . selected by δ under ω1ω2 . . . ωn . . . is finite or
$$\frac{1}{r}\sum_{i=1}^{r}\omega_{n_i} - \frac{1}{r}\sum_{i=1}^{r} f(\omega_1\omega_2\ldots\omega_{n_i-1}) \longrightarrow 0 \quad \text{as } r \to \infty.$$
We say that f (or the associated probability distribution P ) is computably calibrated for ω1 ω2 . . . ωn . . . if it is calibrated for ω1 ω2 . . . ωn . . . with respect to
all computable selection rules δ. Finally, ω1 ω2 . . . ωn . . . is calibrable if some
computable f is computably calibrated for it; otherwise, ω1 ω2 . . . ωn . . . is noncalibrable.
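The quantity whose convergence to 0 defines calibration can be computed on finite prefixes; here is a hedged Python sketch (the function names and the toy example are our own).

```python
def calibration_gap(omega: str, f, delta) -> float:
    """Difference between the average of the selected outcomes and the average
    forecast issued just before each selected position (0.0 if none selected)."""
    outcomes, forecasts = [], []
    for n in range(1, len(omega) + 1):
        prefix = omega[:n - 1]
        if delta(prefix) == 1:            # position n is selected
            outcomes.append(int(omega[n - 1]))
            forecasts.append(f(prefix))
    r = len(outcomes)
    return 0.0 if r == 0 else sum(outcomes) / r - sum(forecasts) / r

# Toy check: the constant forecast 0.5 on an alternating sequence, with the
# trivial selection rule that selects every position.
print(calibration_gap("0101010101", lambda x: 0.5, lambda x: 1))   # 0.0
```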
Let us compare the notion of calibration with Martin-Löf's concept of an infinite random sequence. Martin-Löf's definition is based on the measure-theoretic approach. This approach distinguishes a property of “typicalness”: a random sequence must belong to each reasonable “majority” of sequences. The accurate definition of a majority is an algorithmic analogue of a set of measure 1. Since each set of measure 1 is the complement of a set of measure 0, it is sufficient to define the concept of an effectively null set. Let P be a computable measure on Ω. A set M ⊆ Ω has P-measure 0 if for each rational ε > 0 there is a sequence Γ_{x1}, Γ_{x2}, . . . of intervals such that M ⊆ U = ∪_i Γ_{xi} and P(U) < ε. A P-null set M is called an effectively P-null set if there exists a computable function x(i, ε) such that xi = x(i, ε) for all i.
Martin-Löf [9] proved that for any computable measure P there exists the largest (with respect to inclusion) effectively P-null set. The complement of this largest effectively P-null set is called the constructive support of the measure P. A sequence ω ∈ Ω is called typical (random in the sense of Martin-Löf) if it belongs to the constructive support of the measure P [6].
In the framework of the measure-theoretic approach we also consider a quantitative measure of impossibility of an outcome ω with respect to a measure P. Let P be a computable measure and E the mathematical expectation: E(f) = ∫ f(ω) dP. Following [17], let us consider a nonnegative function p(ω) from Ω to R which characterizes the degree of disagreement between the measure P and an outcome ω: an outcome ω is impossible with respect to the measure P at a level r if p(ω) > r. More precisely, a function p(ω) is called a measure of impossibility of ω with respect to P if
1) the function p(ω) is lower semicomputable;
2) E(p) = ∫ p(ω) dP < 1.
For any measure of impossibility p(ω) we can define a sequence of sets Um = {ω | p(ω) > 2^m}, m = 1, 2, . . .. By 2), P(Um) ≤ 2^{−m} and U_{m+1} ⊆ Um for all m. Then, using 1), it is easy to see that the set ∩_{m=1}^{∞} Um is effectively P-null. If p(ω) = ∞ then ω ∈ ∩_{m=1}^{∞} Um. Hence, p(ω) < ∞ for each typical sequence ω. In [23] it is proved that the converse assertion also holds.
Let f be a computable forecasting system and P the corresponding probability distribution. Dawid's [2] general calibration theorem asserts that the set of all infinite sequences ω such that f is computably calibrated for ω has P-measure 1. Here we present a version of this theorem for individual random sequences.
Theorem 1 Any computable forecasting system f is computably calibrated for
each infinite sequence ω typical with respect to the corresponding probability
distribution P .
Proof. We give an algorithmic version of Dawid's general calibration theorem [2]. Let f be a computable forecasting system, P the corresponding probability distribution, and ω an infinite sequence. Let ω^n = ω1 . . . ωn. For any computable selection rule δ define ξn = 1 if δ(ω^{n−1}) = 1 and ξn = 0 otherwise. Let nk = ∑_{i=1}^{k} ξi be the length of the subsequence selected under ω^k by δ. Define p1 = f(λ) and pi = f(ω^{i−1}) for all i = 2, 3, . . .. Let us consider
$$\mu_k = \frac{1}{n_k}\sum_{i=1}^{k}\xi_i\omega_i, \qquad \pi_k = \frac{1}{n_k}\sum_{i=1}^{k}\xi_i p_i.$$
Define a sequence of functions
$$x_i(\omega) = \frac{1}{n_i(\omega)}\,\xi_i(\omega)\bigl(\omega_i - p_i(\omega)\bigr)$$
and
$$\psi_k(\omega) = \sum_{i=1}^{k} x_i(\omega).$$
Let us consider a sequence Λ1 ⊂ Λ2 . . . of Borel fields, where for any n the field
Λn is generated by all intervals Γx with |x| = n.
The functions ξi = ξi(ω), ni = ni(ω) and pi = pi(ω) depend only on the first i − 1 bits of ω and so they are Λ_{i−1}-measurable. As follows from the definition, pi = E(ωi | Λ_{i−1}). Then E(xi | Λ_{i−1}) = 0. From this it follows also that E(ψk | Λ_{k−1}) = ψ_{k−1} for k ≥ 2, i.e. {ψk} is a martingale with respect to the Borel fields Λ1 ⊂ Λ2 . . .. We have also E(ψ_k^2) = ∑_{s=1}^{k} E(x_s^2). (A definition of a martingale will be given in Section 5.)
We have
$$E(\psi_k^2) = \sum_{i=1}^{k} E(x_i^2) \le \frac{1}{4}\sum_{i=1}^{k} E\Bigl(\frac{1}{n_i^2}\,\xi_i\Bigr).$$
For any ω the non-zero elements of the sequence (1/n_1^2)ξ1, (1/n_2^2)ξ2, . . . are 1, 1/2^2, 1/3^2, . . .. Then
$$E(\psi_k^2) \le \frac{1}{4}\sum_{i=1}^{\infty}\frac{1}{i^2} \le \frac{\pi^2}{24}.$$
Let r1 and r2 be rational numbers such that r1 < r2. For any infinite sequence ω and positive integer number n define a sequence of positive integer numbers: ν0 = 0 and, for i > 0,
$$\nu_i = \min\{j \mid j \le n,\ \psi_j < r_1,\ j > \nu_{i-1}\}$$
if i is odd, and
$$\nu_i = \min\{j \mid j \le n,\ \psi_j > r_2,\ j > \nu_{i-1}\}$$
if i is even. Let K be the maximal even i such that νi is defined, if such i exists, and K = 0 otherwise. We define an upcrossing function σn(ω, r1, r2) = K/2. It is easy to verify that if lim_{n→∞} ψn(ω) does not exist, then for some r1 and r2 the value σn(ω, r1, r2) is unbounded as n tends to infinity. Define
$$\sigma(\omega, r_1, r_2) = \sup_n \sigma_n(\omega, r_1, r_2).$$
By definition this function is lower semicomputable.
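For a finite sequence of values ψ1, . . . , ψn the upcrossing count σn(ω, r1, r2) can be computed directly from the definition above; a small Python sketch, given as an illustration only, with our own names.

```python
def upcrossings(psi, r1: float, r2: float) -> int:
    """sigma_n: the number of completed upcrossings of the interval (r1, r2)
    by the finite sequence psi = (psi_1, ..., psi_n), with r1 < r2."""
    count, below = 0, False
    for value in psi:
        if not below and value < r1:      # an odd index nu_i is found
            below = True
        elif below and value > r2:        # an even index nu_i: one upcrossing
            below = False
            count += 1
    return count

print(upcrossings([0.0, 1.0, -0.5, 2.0, 0.1, 1.5], r1=-0.1, r2=1.2))   # 1
```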
By Doob [4] (Theorem 3.3, Section 3, Part 8), for any n
$$E(\sigma_n) \le \frac{E(|\psi_n|) + |r_1|}{r_2 - r_1}.$$
By definition σn(ω, r1, r2) ≤ σ_{n+1}(ω, r1, r2) for each n. We have also E(|ψn|) ≤ π/(2√6) for each n. Then the function σ(ω, r1, r2) is integrable in ω and
$$E(\sigma) \le \frac{\frac{\pi}{2\sqrt 6} + |r_1|}{r_2 - r_1}.$$
Let us take an expectation of σ by r1 and r2 . Consider two computable functions
r1 (i) and r2 (i) such that r1 (i) < r2 (i) for all i and for each pair of rational
numbers r1 and r2 such that r1 < r2 it holds r1 (i) = r1 and r2 (i) = r2 for some
i.
Let us define
$$p(\omega) = \frac{1}{4}\sum_{i=1}^{\infty}\frac{1}{i^2}\,\frac{r_2(i) - r_1(i)}{\frac{\pi}{2\sqrt 6} + |r_1(i)|}\,\sigma(\omega, r_1(i), r_2(i)).$$
It is easy to verify that this function is lower semicomputable and
$$E(p) = \int p(\omega)\,dP < 1.$$
Hence, this function is a measure of impossibility, and so p(ω) < ∞ for each typical sequence ω. As follows from the above, if lim_{n→∞} ψn(ω) does not exist then p(ω) = ∞.
Hence, lim_{n→∞} ψn(ω) exists for any typical infinite sequence ω. By Kronecker's Lemma [15] (Lemma 2, Section 3, Part 4), if lim_{k→∞} ∑_{i=1}^{k} (1/ni) ξi(ω)(ωi − pi) exists, then
$$\mu_k(\omega) - \pi_k(\omega) = \frac{1}{n_k}\sum_{i=1}^{k}\xi_i(\omega)(\omega_i - p_i) \longrightarrow 0$$
for this ω. Theorem 1 is proved. 2
4 Infinite noncalibrable sequences
By a probabilistic algorithm we mean a pair (P, F ), where P is a computable
measure on Ω and F is a computable operation.
The following theorem shows that, using a probabilistic algorithm, it is possible to generate, with probability extremely close to 1, infinite noncalibrable sequences.
Theorem 2 For any ε > 0 there exists a probabilistic algorithm (L, F) which, when fed with an infinite binary sequence generated by the uniform distribution L, with probability 1 − ε outputs a noncalibrable infinite binary sequence, i.e.
L{ω ∈ Ω | F(ω) is infinite and noncalibrable} > 1 − ε.
Proof. As in [2] we use the notations ωr(s) = (1/r) ∑_{i=1}^{r} ω_{ni} and fr(s) = (1/r) ∑_{i=1}^{r} f(ω1ω2 . . . ω_{ni−1}), where f is a forecasting system and s = n1n2 . . . is a subsequence selected under ω by some selection rule. Let β = β1 . . . βm be a finite binary sequence. A selection rule δ is said to select a subsequence s = n1 . . . nr of length r under β if n ∈ s just when n ≤ m and δ(β1 . . . β_{n−1}) = 1. For any forecasting system f define the deviation of the forecasts by f on β with respect to δ as Dev(f, β, δ) = |fr(s) − βr(s)|.
Let ϕi(α) = Π(i, α, 0.05) for all i and α; thus we fix the accuracy of the computation of the forecasting systems. For technical reasons we redefine ϕi so that every computable forecasting system has infinitely many programs. To do this it is sufficient to replace this sequence with ϕ′_{⟨i,l⟩} = ϕi for all l. We denote this new sequence also by ϕi.
Let ε be an arbitrary number such that 0 < ε < 1. Choose a number i1 such that ∑_{i=i1}^{∞} i^{−(1+ε)} < ε/2. Since each computable forecasting system has infinitely many programs, we can use in the construction below only programs i ≥ i1.
Let i be a program, α a finite binary sequence, and l a positive integer number. Let us define an auxiliary computable function β(i, α, l) as follows. For j ≤ |α| define βj = αj. Let |α| < j ≤ |α| + l and let β1 · · · β_{j−1} be already defined. If ϕi(β1 · · · β_{j−1}) terminates then define
$$\beta_j = \begin{cases}0, & \text{if } \varphi_i(\beta_1\cdots\beta_{j-1}) > 0.5,\\ 1, & \text{otherwise.}\end{cases}$$
If j = |α| + l define β(i, α, l) = β1 · · · βj and finish the computation of β(i, α, l). If ϕi(β1 · · · βj) does not terminate for some |α| ≤ j < |α| + l then β(i, α, l) is undefined.
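The following Python sketch mimics this adversarial extension for a total forecaster φ playing the role of ϕi; the threshold 0.5 is the one used above, and everything else (names, the example forecaster) is our own illustration.

```python
def adversarial_extension(phi, alpha: str, l: int) -> str:
    """beta(i, alpha, l): extend alpha by l bits, at every step choosing the
    outcome the forecaster considers less likely (0 if phi > 0.5, else 1).
    phi plays the role of phi_i and is assumed to be total here."""
    beta = alpha
    for _ in range(l):
        beta += "0" if phi(beta) > 0.5 else "1"
    return beta

# A forecaster that always leans towards 1 is driven onto an all-zero extension.
print(adversarial_extension(lambda x: 0.8, "01", 4))   # "010000"
```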
Lemma 1 If i is a program of some computable forecasting system then β(i, α, l) terminates for every finite binary sequence α and every positive integer l.
The proof is trivial, since ϕi(x) terminates for any x.
Define β(i, α) = β(i, α, 4|α|).
Let us explain the meaning of this definition. We denote by ω^n the sequence ω1ω2 . . . ωn for any ω. In the construction below, for any ε > 0 we shall define, by mathematical induction on the length of sequences, a computable operation F and an auxiliary function ∆(i, γ, n). The operation F will define the probabilistic algorithm we need. For each infinite binary sequence ω, at some step n of the induction we pay a visit to a program i and try to define F(ω^m) = β(i, F(ω^s)) for some m and s. Let i be a program of some computable forecasting system f. We shall define two selection rules δ1 and δ2, depending on i, such that if this attempt is successful then the deviation of the forecasts by f on β(i, F(ω^m)) with respect to δ1 or δ2 will be sufficiently large. Lemma 1 will help us to show that the attempt to define F(ω^m) = β(i, F(ω^s)) will be successful for some sufficiently large n. Since every computable forecasting system f has infinitely many programs i, it will follow that each infinite sequence from the range of F is noncalibrable.
The probabilistic algorithm F (ω) uses the infinite coin-toss string ω as input.
The construction proceeds in steps.
By step n − 1, let γ ⊆ ω be the string of input bits already processed, and some output F(γ) produced. At step n, some more bits of ω will be read and some more bits of F(ω) will be output. Along with F, we also define a function ∆(i, γ, n) which takes 0 and ±1 as values. This function plays the role of a monitor in the definition of F. F(ω^n) is constant for all sufficiently large n if and only if ∆(i, ω^s, n) = 1 for all such n and for some i and s. The second requirement for the construction is that for each i the uniform probability measure of all ω such that, for some γ ⊆ ω and almost all n, ∆(i, γ, n) = 1 is less than i^{−(1+ε)}. From this it will follow that the measure of all ω such that F(ω) is infinite is at least 1 − ε/2.
If ∆(i, γ, n) = 1 we say that a finite sequence γ has an i-label at step n. This
i-label marks an attempt to use string γ by adding to it some bits that thwart
the forecasting program i.
Let εi = i^{−(1+ε)}, where i ≥ i1.
We also need some technical details. For any such i consider the binary expansion of the real number εi with precision i^{−5}, namely εi = c1 2^{−1} + . . . + cm 2^{−m} + c·i^{−5}, where cj ∈ {0, 1} for 1 ≤ j ≤ m and 0 ≤ c ≤ 1. There exists a set Di of mutually incomparable (under ⊆) finite binary sequences such that L(Di) = εi − c·i^{−5}, where L is the uniform distribution. Without loss of generality we will ignore the term c·i^{−5} in the following estimates. We fix some way of generating the sets Di given i. Note that this way can be chosen so that the set ∪_{i=i1}^{∞} Di is algorithmically decidable. Then we can also define a decidable set E of finite binary sequences, incomparable mutually and with all sequences from the set ∪_{i=i1}^{∞} Di, such that for any infinite binary sequence ω there is some γ ∈ D̂ = ∪_{i=i1}^{∞} Di ∪ E with γ ⊆ ω.
Step 0. Define ∆(i1 , λ, 0) = 1 and F (λ) = λ, where λ is empty sequence.
Step n > 0.
Let γ be the sequence such that γ ⊆ ω and, for any i, if ∆(i, γ′, n − 1) = 1 and γ′ ⊆ ω then |γ| ≥ |γ′δ|, where δ ∈ Di. We shall show how to define F(γ).
The induction hypotheses are as follows. After the (n − 1)-th step the following hold:
1) For any i there is at most one m such that ∆(i, γ^m, n − 1) = ±1.
2) For any γ′ ⊆ γ, if ∆(i, γ′, k) = 1, where k ≤ n − 2, and ∆(i, γ′, n − 1) ≠ 1, then F(γ′δ) = F(γ′δ′) for all δ, δ′ ∈ Di.
3) If for some γ′ ⊆ γ, ∆(i, γ′, n − 1) = 1 and γ′δ ⊆ γ for some δ ∈ Di, then γ′ ⊆ ω is the maximal sequence such that F(γ′) has been defined in the construction. In this case we say that γ is in an i-wait zone.
Define m(i, n − 1) = sup{m | ∆(i, γ^m, n − 1) = 1} (we suppose that sup ∅ = ∞), and γ(i, n − 1) = γ^{m(i,n−1)} if m(i, n − 1) < ∞. Let I be the set of all i ≤ i1 + n − 1 such that m(i, n − 1) < ∞ and β(i, F(γ(i, n − 1))) terminates in time ≤ n.
Suppose that I ≠ ∅. For each i ∈ I do the following. Let γ′ = γ(i, n − 1).
Case 1. γ′δ ⊈ γ for each δ ∈ Di.
In this case, if there is no j ∉ I such that γ is in a j-wait zone, then define ∆(i, γ, n) = 1, ∆(i, γ′, n) = 0 and ∆(i1 + n, γ, n) = 1. Otherwise, define ∆(i, γ′, n) = −1 and ∆(i1 + n, γ′, n) = −1. Informally, we say that the i-label either is transferred from γ′ to γ or is delayed.
Case 2. γ′δ ⊆ γ for some δ ∈ Di.
By 3) the current output is F(γ′). In this case define F(γ) = β(i, F(γ′)) and ∆(i, γ, n) = 0. Informally, we say that γ is i-satisfied. Define also ∆(j, γ, n) = 1 for j = i1 + n and for each j such that ∆(j, γ″, n − 1) = −1 for some γ″ ⊆ γ. Then for each such j define ∆(j, γ″, n) = 0. Informally, we say that such a j-label is transferred from γ″ to γ with delay.
If I = ∅ define ∆(i1 + n, γ, n) = 1.
At the end of step n, for each i, x define ∆(i, x, n) = ∆(i, x, n − 1) if the value of the right-hand side is defined and the value of the left-hand side has not been defined at this step. It is easy to verify that items 1)–3) hold after step n.
Let γ be an arbitrary finite binary sequence. If the value F(γ) has not been defined in the construction then define F(γ) = F(γ′), where γ′ is the longest sequence such that γ′ ⊆ γ and F(γ′) has been defined in the construction.
Lemma 2 Let ω ∈ Ω. Then with probability 1 − ε the output F(ω) is infinite and for each program i of a computable forecasting system f some initial fragment γ ⊆ ω is i-satisfied.
Proof. Let Vj = {γ ∈ Ξ | ∆(j, γ, n) = 1 for all sufficiently large n} and Uj = {ω ∈ Ω | γδ ⊆ ω for some γ ∈ Vj, δ ∈ Dj}. Then by item 3)
$$\{\omega \mid F(\omega)\ \text{is finite}\} = \bigcup_j U_j.$$
If γ ∈ Vj then some j-label has been transferred to γ at some step of the construction and does not move at the following steps. From this we obtain that any two different γ1, γ2 from Vj are incomparable. It follows that
$$L(U_j) = L\{\gamma\delta \mid \delta\in D_j,\ \gamma\in V_j\} = L(D_j)\sum_{\gamma\in V_j} L(\gamma) < \varepsilon_j.$$
Hence L{ω | F(ω) is finite} < ε/2.
Let ω be an arbitrary infinite input sequence whose output F(ω) is also infinite. Let V_{i,n} be the event that no initial fragment of ω is i-satisfied at steps ≤ n.
By the construction, L(V_{i,n} | V_{i,n−1}) = 1 − εi if i ∈ I, and L(V_{i,n} | V_{i,n−1}) = 1 otherwise. Here L(A | B) is the conditional probability of the event A given B. Since f is everywhere defined and computable, by Lemma 1 and the construction, if i ∈ I and β(i, F(γ)) ⊈ F(ω) at some step n, then the i-label will be transferred from an initial fragment γ of ω to some extension γδ ⊆ ω (possibly with delay). The probability that this happens k times is ≤ (1 − εi)^k. So the probability that no initial fragment of ω is i-satisfied for all sufficiently large k is < εi. Since ∑ εi < ε/2, Lemma 2 is proved. 2
Lemma 3 With probability 1 − ε no computable forecasting system is calibrated for F(ω).
Proof. Let f be an arbitrary computable forecasting system and i a program computing f. Recall that for each program j we have defined a computable sequence of programs ⟨j, l⟩ such that ϕ_{⟨j,l⟩} = ϕj, l = 1, 2, . . .. Define two computable selection rules
$$\delta_1(\alpha) = \begin{cases}1, & \text{if } \varphi_i(\alpha) > 0.5,\\ 0, & \text{otherwise,}\end{cases}\qquad \delta_2(\alpha) = \begin{cases}1, & \text{if } \varphi_i(\alpha) \le 0.5,\\ 0, & \text{otherwise.}\end{cases}$$
Let F(ω) be infinite and let l be arbitrary. By Lemma 2, with probability 1 − ε there exist n and m such that the sequence ω^n is ⟨i, l⟩-satisfied and β = β(⟨i, l⟩, ω^m) = ω^n. Let the selection rule δν select under β a subsequence sν, and let rν be the number of all elements selected from β_{n+1}, . . . , β_{5n}, where ν is equal to 1 or 2. Note that δ1(α) = 0 if and only if δ2(α) = 1. From this we obtain that rν ≥ 2n for some ν equal to 1 or 2. Since |f(α) − ϕi(α)| < 0.05 for each α, it follows from the definition of β that Dev(f, β, δν) > 0.45(2n − 1 − n)/5n > 0.08 for this ν.
Hence, there exists ν equal to 1 or 2 such that the selection rule δν selects under θ an infinite sequence s satisfying |fr(s) − θr(s)| > 0.08 for infinitely many different r. Hence f is not calibrated for θ with probability 1 − ε. Lemma 3 and Theorem 2 are proved. 2
Note that, effectively in i, we can construct a single selection rule as follows: each time we compute β(i, F(γ′)), if the extension contains more 1s than 0s then we select the 1s, otherwise we select the 0s.
The proof of Theorem 2 is based on the proof from [22], with some simplifications proposed by An.A. Muchnik. The difference is the following: in [22], in addition to the assertion of Theorem 2, it was proved that for any ω, if F(ω) is infinite then it is noncalibrable.
Let us consider a forecasting program learning the outputs ω1ω2 . . . ω_{n−1} of some device and trying to assign a definite probability to the future event ωn. By Theorem 2, an infinite output ω1ω2 . . . of F with probability 1 − ε eventually destroys the performance of each forecasting program. In this sense computable probabilistic forecasting on such sequences is impossible.
5 Predictive sequential measure of impossibility
In the following section we consider the concept of a finite non-stochastic sequence and give some estimates of the probability of generating such sequences by probabilistic algorithms.
Let f be a computable forecasting system. We will use the notion of the prequential (predictive sequential) measure of impossibility of an outcome x ∈ Ξ with respect to f, which corresponds to Dawid's prequential approach to statistics [2], [19]. A measure of impossibility ψ with respect to f defines potential falsifiers x of the forecasting system f (and of the corresponding probability distribution P, where P(1 | x) = f(x)): f is falsified by an outcome x at a level r if ψ(x) > 2^r.
The definition of the prequential measure of impossibility is based on two appealing principles:
The empirical character of the measure of impossibility implies that this function should be lower semicomputable.
The principle of the excluded gambling strategy: the adherent of a probabilistic theory should consider it practically impossible that a predefined gambling strategy against the theory will ever win very much.
The explication of “gambling strategy” is “non-negative supermartingale” [4], [15], [18]. Let f be a forecasting system and P the corresponding probability distribution. We consider a sequence Λ1 ⊂ Λ2 . . . of Borel fields, where for any n the field Λn is generated by all intervals Γx with |x| = n. A supermartingale is a sequence {ξn} of real functions on Ω such that for any n the following holds:
• ξn is Λn-measurable;
• E(|ξn|) < ∞ (E is the symbol of mathematical expectation with respect to P);
• E(ξ_{n+1} | Λn) ≤ ξn (P-almost surely).
As follows from the first item, for any n the value ξn(ω) depends only on the first n bits of ω, i.e. ξn(ω) = ψ(ω^n) for some real function ψ(x) defined on Ξ.
The last item shows that
$$\psi(x) \ge \psi(x0)(1 - f(x)) + \psi(x1)f(x) \qquad (1)$$
holds for all x ∈ Ξ (we put ∞ · 0 = 0). If we replace the inequality by equality we obtain the notion of an f-martingale.
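Inequality (1) can be checked mechanically on all strings up to a given length; here is a Python sketch, where the example ψ is the likelihood ratio of a biased coin against the fair coin (a classical f-martingale). All names are our own.

```python
from itertools import product

def is_supermartingale(psi, f, depth: int = 5) -> bool:
    """Check psi(x) >= psi(x0)(1 - f(x)) + psi(x1) f(x) for all |x| < depth."""
    for n in range(depth):
        for bits in product("01", repeat=n):
            x = "".join(bits)
            if psi(x) + 1e-12 < psi(x + "0") * (1 - f(x)) + psi(x + "1") * f(x):
                return False
    return True

# f is the fair-coin forecaster; psi is the likelihood ratio of a biased coin
# (p = 0.7) against the fair coin -- in fact an f-martingale with psi(lambda) = 1.
f = lambda x: 0.5
q = lambda x: 0.7 ** x.count("1") * 0.3 ** x.count("0")
psi = lambda x: q(x) / 0.5 ** len(x)

print(is_supermartingale(psi, f))   # True
```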
The gambling strategy corresponding to ψ is as follows. We start with capital
ψ(λ). This ψ(λ) is given to the adherent of f on condition that he will return
us ψ(ω1 ) after the first outcome ω1 is known. The ψ(ω1 ) received is again given
to him on condition of returning ψ(ω1 ω2 ) where ω2 is a second outcome, etc.
Inequality (1) guarantees that the game is fair or even favourable to him.
Any lower semicomputable non-negative f-supermartingale ψ will be called a prequential measure of impossibility of an outcome x with respect to a forecasting system f if ψ(λ) ≤ 1.
As will be shown, for any computable forecasting system f there is a largest, to within a multiplicative constant, prequential measure of impossibility ψ̂(x) with respect to f: for any prequential measure of impossibility ψ(x) with respect to f there is a positive constant c such that cψ̂(x) ≥ ψ(x) for any x (see below).
The notion of impossibility for infinite sequences is absolute. Let f be a computable forecasting system. For infinite ω we define ψ(ω) = sup_n ψ(ω^n), where ψ is any prequential measure of impossibility with respect to f and ω^n = ω1ω2 . . . ωn.
An infinite sequence ω is called possible with respect to a forecasting system f (or the corresponding probability distribution P) if ψ(ω) < ∞ for each measure of impossibility ψ with respect to f (ω is impossible with respect to f if ψ(ω) = ∞ for some prequential measure of impossibility ψ with respect to f). It is easy to prove that the P-measure of the set of all sequences possible under P is equal to 1.
An infinite sequence ω is called stochastic if it is possible under some computable forecasting system.
Proposition 4 Let P be a computable measure such that P (x) 6= 0 for all x.
An infinite sequence ω is typical with respect to P if and only if it is possible
with respect to P .
Proof. Let M be an effectively P-null set, and let M ⊆ Vε = ∪_i Γ_{x(i,ε)} with P(Vε) < ε for each rational ε > 0, where x(i, ε) is a computable function. We put Un = ∪_{m>n} V_{2^{−m}}. Then M ⊆ ∩_{n=1}^{∞} Un, and P(Un) ≤ 2^{−n} for all n. Let us define a sequence of measures of impossibility with respect to P:
$$\psi_m(x) = \frac{P(\Gamma_x \cap U_m)}{P(x)}.$$
Put ψ(x) = ∑_{m=1}^{∞} ψm(x). It is easy to verify that ψ(x) is a measure of impossibility with respect to P. As follows from the definition of ψ, for any ω ∈ M it holds that sup_n ψ(ω^n) = ∞.
To prove the converse assertion, note that for any measure of impossibility ψ the set Um = ∪{Γx | ψ(x) > 2^m} has P-measure ≤ 2^{−m}. This can be obtained from the inequality
$$\sum_{i=1}^{k}\psi(x_i)P(x_i) \le 1$$
for any set x1, . . . , xk of pairwise incomparable finite binary sequences. Hence, if sup_n ψ(ω^n) = ∞ then ω belongs to the effectively P-null set ∩_{m=1}^{∞} Um. 2
In the sequel we will consider only prequential measures of impossibility.
We also need the concept of a uniform measure of impossibility. A non-negative function ψ is called a lower semicomputable non-negative uniform supermartingale if
$$\psi(i, x) \ge \psi(i, x0)(1 - f(x)) + \psi(i, x1)f(x) \qquad (2)$$
for all x ∈ Ξ and all programs i of computable forecasting systems f, and the set {(i, r, x) | r < ψ(i, x)} is recursively enumerable, where r is rational.
Any lower semicomputable non-negative uniform supermartingale ψ is called a uniform measure of impossibility if ψ(i, λ) ≤ 1 for all programs i of computable forecasting systems.
Proposition 5 There exists a universal uniform measure of impossibility ψ̂ such that for each uniform measure of impossibility ψ there exists a constant c such that cψ̂(i, x) ≥ ψ(i, x) for all x and all programs i of computable forecasting systems.
Proof. By Proposition 1 there exists a function f (n, i, x) universal for
all lower semicomputable functions from (i, x). Put fn (i, x) = f (n, i, x) and
fn,s (i, x) = fs (n, i, x). For any n, s let us define ψ to be the least non-negative
uniform supermartingale such that ψ ≥ fn,s and (2) holds for all x ∈ Ξ and
all programs i of computable forecasting systems f . This ψ can be defined as
follows.
Let ⟨f^−(i, x), f^+(j, x)⟩ be a mode of description of functions from x ∈ Ξ. Put f^−_i(x) = f^−(i, x), f^−_{i,s}(x) = f^−_s(i, x) and f^+_j(x) = f^+(j, x), f^+_{j,s}(x) = f^+_s(j, x). Let π and τ be computable functions such that π(⟨i, j⟩) = i and τ(⟨i, j⟩) = j for all integer numbers i and j. For any function f(x) we also define f̄(x) = 0 if f(x) < 0, f̄(x) = 1 if f(x) > 1, and f̄(x) = f(x) otherwise.
For any positive integer number t define ψt and ϕt to be the minimal non-negative functions such that ψt ≥ f_{n,s}, ϕt ≥ f_{n,s} and
$$\psi_t(i, x) \ge \psi_t(i, x0)\bigl(1 - \bar f^{+}_{\tau(i),t}(x)\bigr) + \psi_t(i, x1)\,\bar f^{-}_{\pi(i),t}(x) \qquad (3)$$
$$\varphi_t(i, x) \ge \varphi_t(i, x0)\bigl(1 - \bar f^{-}_{\pi(i),t}(x)\bigr) + \varphi_t(i, x1)\,\bar f^{+}_{\tau(i),t}(x)$$
hold for all x, i. Since f_{n,s}(i, x) = −∞ for almost all pairs (i, x), such ψt and ϕt can easily be defined. Note that these functions take rational values and equal 0 for almost all pairs ⟨i, x⟩. From f^{+}_{π(i),t+1}(x) ≤ f^{+}_{π(i),t}(x) and f^{−}_{τ(i),t+1}(x) ≥ f^{−}_{τ(i),t}(x) it follows that ψ_{t+1}(i, x) ≥ ψ_t(i, x) and ϕ_{t+1}(i, x) ≤ ϕ_t(i, x) for all t, i, x. Put
$$\psi(i, x) = \sup_t \psi_t(i, x).$$
This function is lower semicomputable. Letting t tend to infinity in (3) we obtain
$$\psi(i, x) \ge \psi(i, x0)\bigl(1 - \bar f^{+}_{\tau(i)}(x)\bigr) + \psi(i, x1)\,\bar f^{-}_{\pi(i)}(x) \qquad (4)$$
for all i, x.
Let i be a program of some computable forecasting system f. Then f(x) = f^{−}_{π(i)}(x) = f^{+}_{τ(i)}(x) and (2) holds. We have also ψ(i, x) = lim_{t→∞} ϕt(i, x).
Now we return to the proof of the proposition. Let us denote ψ_{n,s} = ψ, ψ_{n,s,t} = ψt and ϕ_{n,s,t} = ϕt. Since f_{n,s+1} ≥ f_{n,s} we have ψ_{n,s+1} ≥ ψ_{n,s}. Define
$$\psi_n(i, x) = \sup_s\{\psi_{n,s}(i, x) \mid \varphi_{n,s,t}(i, \lambda) \le 2 \text{ for some } t\}.$$
We set ψn(i, x) = 0 for all x if ϕ_{n,s,t}(i, λ) > 2 for all s, t. Put
$$\hat\psi(i, x) = \sum_{n=1}^{\infty} 2^{-n-1}\psi_n(i, x).$$
It is easy to see that ψ̂ is a uniform measure of impossibility.
For any other uniform measure of impossibility ψ′ there exists an n such that ψ′(i, x) = fn(i, x). Let i be a program of a computable forecasting system. Then for any s, ϕ_{n,s,t}(i, λ) ≤ 2 for all sufficiently large t. Hence ψ′(i, x) = ψn(i, x) and 2^{n+1}ψ̂(i, x) ≥ ψ′(i, x) for all x. The proposition is proved. 2
We fix some ψ̂(i, x) with these properties.
6 Finite non-stochastic sequences
Let k, l be positive integer numbers. A finite sequence x of length n is called (k, l)-stochastic if for some program i of length ≤ k of a k-simple forecasting system f it holds that ψ̂(i, x^j) ≤ 2^l for every j ≤ n.
A finite sequence x of length n is (k, l)-non-stochastic if for each program i (of length ≤ k) of a k-simple forecasting system f there is some j ≤ |x| such that ψ̂(i, x^j) > 2^l. For any positive integer numbers k and l let D^n_{k,l} be the set of all (k, l)-non-stochastic sequences of length n.
Proposition 6 For all positive integer numbers k, l, n and all k-simple forecasting systems f it holds that P(D^n_{k,l}) ≤ 2^{−l}, where P is the probability distribution corresponding to f.
Proof. Let f be a k-simple forecasting system and let i be a corresponding program for f of length ≤ k. For any x ∈ D^n_{k,l} there is an initial fragment x′ of x such that ψ̂(i, x′) > 2^l. Let x1, . . . , xs be all such fragments of maximal length. Clearly, they are pairwise incomparable. By the definition of ψ̂ we have
$$1 \ge \sum_{j=1}^{s}\hat\psi(i, x_j)P(x_j) > 2^{l}\sum_{j=1}^{s}P(x_j).$$
Then P(D^n_{k,l}) ≤ ∑_{j=1}^{s} P(xj) ≤ 2^{−l}. 2
The next theorem provides an upper estimate of the probability of generating (k, l)-non-stochastic sequences of length n. First we present a suitable notion of (lower) semicomputable semimeasure.
For any probabilistic algorithm (P, F) we consider the function
$$Q(x) = P\{\omega \in \Omega \mid x \subseteq F(\omega)\}. \qquad (5)$$
It is easy to verify that this function has the following properties:
Q(λ) ≤ 1,
Q(x0) + Q(x1) ≤ Q(x) for all x,
{(r, x) | r is rational, r < Q(x)} is recursively enumerable.
Any function having these properties is called a semicomputable semimeasure. It can be proved that for any semicomputable semimeasure Q there exists a probabilistic algorithm (L, F) such that (5) holds with P = L [24], [16].
Let P be a semimeasure and D a finite set of pairwise incomparable finite sequences. Then define P(D) = ∑_{x∈D} P(x). If P is semicomputable then define Ps(x) = max{r | (r, x) ∈ As}, where A = {(r, x) | r < P(x)} and As is a finite part of the recursively enumerable set A computed after s steps of enumeration.
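A Python sketch illustrating (5) and the semimeasure property for a toy computable operation F and the uniform measure L; the enumeration up to a fixed input length plays the role of the approximations Ps. The operation and all names are our own assumptions.

```python
from itertools import product

def F(y: str) -> str:
    """An illustrative computable operation: copy the input, dropping every bit
    that repeats its predecessor (monotone with respect to the prefix order)."""
    out = ""
    for i, c in enumerate(y):
        if i == 0 or c != y[i - 1]:
            out += c
    return out

def Q_lower(x: str, depth: int = 12) -> float:
    """Lower bound on Q(x) = L{omega | x is a prefix of F(omega)}: sum the
    uniform measure over minimal inputs y with |y| <= depth and x <= F(y)."""
    total = 0.0
    for n in range(depth + 1):
        for bits in product("01", repeat=n):
            y = "".join(bits)
            if F(y).startswith(x) and not (n > 0 and F(y[:-1]).startswith(x)):
                total += 2.0 ** (-n)        # L(Gamma_y) for a minimal y
    return total

print(Q_lower("0"), Q_lower("00"), Q_lower("01"))
# approx 0.5, 0.0, 0.4998: consistent with Q(x0) + Q(x1) <= Q(x)
```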
Theorem 3 Let ε be an arbitrarily small positive number and P be a lower semicomputable semimeasure. Then for all positive integer numbers k, l and all sufficiently large n it holds that
$$P(D^{n}_{k,l}) \le 2^{-k+(1+\varepsilon)\log_2 n} + 2^{-l}.$$
Proof. For any finite sequence δ and positive integer n define a measure Q as follows. If there exists an s such that
$$\sum_{|z|=n} P_s(z) \ge \sum_{j=1}^{|\delta|}\delta_j 2^{-j},$$
choose the minimal such s and define Q(x) = Ps(x) for each x of length n such that x ≠ 0^n, where 0^n is the sequence of n zeros. We extend Q to all other x in a natural fashion. If such an s does not exist, Q is undefined.
Without loss of generality suppose that ε is rational. Let an = ∑_{|z|=n} P(z) and let δ be a binary sequence representing the rational approximation of an from below with accuracy 2^{−(k−(1+ε) log2 n)}. Then the function Q is a measure, and the program (δ, n) of the corresponding forecasting system has length less than k − (1 + ε) log2 n + log2 n + 2 log2 log2 n + c ≤ k for all sufficiently large n, where c is a positive constant.
Since 0^n does not belong to D^n_{k,l} for all sufficiently large n, we have
$$a_n - 2^{-k+(1+\varepsilon)\log_2 n} \le Q(D^{n}_{k,l}) \le a_n$$
for these n. By Proposition 6, Q(D^n_{k,l}) ≤ 2^{−l}. Hence P(D^n_{k,l}) ≤ 2^{−k+(1+ε) log2 n} + 2^{−l}. The theorem is proved. 2
Let k(n) and l(n) be two integer-valued functions. Then D^n_{k(n),l(n)} is the set of all (k(n), l(n))-non-stochastic sequences of length n. Let us define
$$I^{n}_{k,l} = \bigcup_{m=n}^{\infty} D^{m}_{k(m),l(m)}.$$
The following theorem shows that, in the case of computable bounds k = k(n) and l = l(n), the probability of generating finite non-stochastic sequences asymptotically decreases.
Theorem 4 Let k(n) and l(n) be unbounded non-decreasing computable integer-valued functions. Then for any semicomputable semimeasure P it holds that
$$\lim_{n\to\infty} P(I^{n}_{k,l}) = 0.$$
Proof. Let ε be an arbitrary positive rational number and let 0 = n0 < n1 < . . . be a computable sequence of integers such that
[(1 − ε)k(ni)] < [(1 − ε)k(n_{i+1})] and l(ni) < l(n_{i+1})
for all i. For any δ ∈ Ξ define a measure Pδ as follows. If there exist i and s such that |δ| = [(1 − ε)k(ni)] and
$$\sum_{|z|=n_{i+1}} P_s(z) \ge \sum_{j=1}^{|\delta|}\delta_j 2^{-j}, \qquad (6)$$
put Pδ(x) = P_{si}(x) if |x| = n_{i+1} and x ≠ 0^{n_{i+1}}, where si is the minimal s satisfying (6). We extend Pδ to all other x in a natural fashion. If such i and s do not exist, Pδ is undefined.
Let us denote rn = ∑_{|x|=n} P(x). Since P is a semimeasure, r_{n+1} ≤ rn for all n. If lim_{n→∞} rn = 0 then the assertion of the theorem is true. Suppose that this limit is positive. Then ai = ∑_{|z|=n_{i+1}} P(z) > 0 for all i.
Let δ^i be a binary sequence representing the rational approximation of the number ai from below with accuracy 2^{−[(1−ε)k(ni)]}. Then P_{δ^i} is a measure. Put Pi = P_{δ^i}.
By definition the forecasting system corresponding to the measure Pi is ([(1 − ε)k(ni)] + c)-simple, where c is a constant. We have [(1 − ε)k(ni)] + c ≤ k(ni) for all sufficiently large i. We will consider only such i.
Let X consist of finite sequences of length ≤ n_{i+1}, incomparable pairwise and with the infinite sequence of zeros. Let
$$R_j = \sum\bigl\{P(x) \mid |x| = n_{j+1} \text{ and } z \subseteq x \text{ for some } z \in X\bigr\}.$$
It holds that Rs ≤ Ri for s ≥ i. Since Pj(X) = Rj − εj for j = i, s, where εj ≤ 2^{−[(1−ε)k(nj)]}, we have
$$P_s(X) \le P_i(X) + 2^{-[(1-\varepsilon)k(n_i)]}.$$
Let D be a finite set of pairwise incomparable finite sequences such that each x ∈ D is (k(n), l(n))-non-stochastic, where n is the length of x, and x ⊈ 0^∞ (= 00 . . .). We call such a D a (k, l)-non-stochastic section.
Let D be any (k, l)-non-stochastic section. Define Ai = {x ∈ D | ni < |x| ≤ n_{i+1}}. Let s = s(D) be the maximal index such that ns < |z| for all z ∈ D, and let r be the minimal index such that |z| ≤ n_{r+1} for all z ∈ D. The forecasting system associated with the measure Pi is k(n)-simple if ni < n ≤ n_{i+1}, since k(ni) ≤ k(n) for these n.
As follows from the definition of non-stochasticity, for any x ∈ Ai there is a longest x′ ⊆ x such that ψ̂(δ^i, x′) > 2^{l(ni)}. Let x1, . . . , xt be all such x′. Clearly, they are pairwise incomparable. We have also ∑_{j=1}^{t} ψ̂(δ^i, xj)Pi(xj) ≤ 1. From this we obtain Pi(Ai) ≤ ∑_{j=1}^{t} Pi(xj) ≤ 2^{−l(ni)}. Hence,
$$P_r(D) = \sum_{i=s}^{r} P_r(A_i) \le \sum_{i=s}^{r} P_i(A_i) + \sum_{i=s}^{r} 2^{-[(1-\varepsilon)k(n_i)]} \le 2^{-l(n_s)+1} + 2^{-[(1-\varepsilon)k(n_s)]+1}.$$
Let D̂ be the set of all sequences of length n_{r+1} extending sequences from D. As follows from the above,
$$P(\hat D) \le P_r(\hat D) + 2^{-[(1-\varepsilon)k(n_r)]+1} \le 2^{-l(n_s)+1} + 2^{-[(1-\varepsilon)k(n_s)]+1} + 2^{-[(1-\varepsilon)k(n_r)]+1}.$$
Suppose that lim_{n→∞} P(I^n_{k,l}) ≥ h > 0. Then for any n there exists a (k, l)-non-stochastic section D such that s(D) > n, P(D̂) > h, and every sequence from D is incomparable with 0^∞. This is a contradiction, since by the previous estimates this measure is arbitrarily small for all sufficiently large s and r. The theorem is proved. 2
By definition the set D^n_{k,l} consists of all sequences of length n on which every forecasting program of length ≤ k will be falsified at level l. Theorem 4 shows that, in the case where k = k(n) and l = l(n) are computable, the probability of producing any such sequence of length ≥ n tends to 0. Compare this with Theorem 2, which asserts that it is possible to generate, with probability ≥ 1 − ε, infinite sequences on which every forecasting program will eventually be falsified. In the following section we present some estimates of the place and time at which randomness is lost in the situation of Theorem 2.
7 Non-calibration effects for time-bounded forecasting systems
In this section we present an analogue of Theorem 2 in a restricted setting. An analysis of the time and space of computation shows that we need only polynomially bounded resources to demonstrate non-calibration effects on finite sequences for time-bounded forecasting systems with probability extremely close to 1.
Theorem 3 of [22] in its direct formulation is partially incorrect. Here we present the correct version of that theorem, which corresponds to the correct construction of Section 4 of [22].
Let k and T be positive integer numbers. A forecasting system f is called
(k, T )-simple if there exists a program of length ≤ k (under some optimal mode
of description) computing any value of f in time ≤ T .
Theorem 5 For each ε > 0 there exist a probabilistic algorithm (L, F), a uniform measure of impossibility ψ(i, x) and polynomials p1, p2, p3 such that the following holds.
• With probability 1 − ε the algorithm F, when fed with an infinite sequence ω, for each positive integer number l and each (k, T)-simple forecasting system f outputs in time ≤ p1(2^k, l, T) a sequence β ⊆ F(ω) of length ≤ p2(2^k, l, T) such that ψ(i, β) > 2^l, where i is a program of f of length ≤ k computing any value of f in time ≤ T.
• If i is a program of a (k, T)-simple forecasting system f of length ≤ k then ψ(i, x) is a computable f-martingale and its values are computed in time p3(|x|, T).
The corresponding construction is given in Section 4 of [22]. It is similar to the construction of Theorem 2 but is more efficient with respect to computation time.
A finite sequence x is called (k, l, T)-non-stochastic if for any (k, T)-simple forecasting system f there is a j ≤ |x| such that ψ̂(i, x^j) > 2^l, where i is a program of f of length ≤ k computing any value of f in time ≤ T.
As follows from Theorem 5, for each ε > 0 there exist a probabilistic algorithm (L, F), a rational number δ > 0 and a polynomial p such that for all sufficiently large n the algorithm F outputs with probability 1 − ε, in time ≤ p(n), a (δ log2 n, n^δ, n^δ)-non-stochastic sequence of length n.
References
[1] T.M. Cover, P. Gacs, R.M. Gray. (1989) Kolmogorov’s contributions to
information theory and algorithmic complexity, Ann. Probab. 17, No.1,
840–865.
[2] A.P. Dawid. (1982) The well-calibrated Bayesian [with discussion], J. Am.
Statist. Assoc. 77, 605–613.
[3] A.P. Dawid. (1985) Calibration-based empirical probability [with discussion] Ann. Statist. 13, 1251–1285.
[4] J.L. Doob. (1953) Stochastic Processes New York: John Wiley & Sons,
London: Chapman & Hall.
[5] A.N. Kolmogorov. (1965) Three approaches to the quantitative definition
of information, Problems Inform. Transmission 1, 4–7.
[6] A.N. Kolmogorov, V.A. Uspensky. (1987) Algorithms and randomness,
Theory Probab. Applic. 32, 389–412.
[7] L.A. Levin, V.V. Vyugin. (1977) Invariant properties of information bulks
In Lecture Notes in Computer Science 53, 359–364. Springer-Verlag.
[8] L.A. Levin. (1984) Randomness conservation inequalities: Information and
independence in mathematical theories, Inform. and Control 61, 15–37.
[9] P. Martin-Löf. (1966) The definition of random sequences, Information and
Control 9 (6) , 602–619.
[10] An.A. Muchnik, A.L. Semenov, V.A. Uspensky. Mathematical metaphysics
of randomness Theor. Comp. Sci. (this issue).
[11] D. Oakes. (1985) Self-calibrating priors do not exist [with discussion], J.
Am. Statist. Assoc. 80, 339–342.
[12] H. Rogers. (1967) Theory of recursive functions and effective computability,
New York: McGraw Hill.
[13] M.J. Schervish. (1985) Comment [to Oakes, 1985], J. Am. Statist. Assoc.
80, 341–342.
[14] A.Kh. Shen. (1983) The concept of (α, β)-stochasticity in the Kolmogorov
sense, and its properties, Soviet Math. Dokl. 28, 295–299.
[15] A.N. Shiryaev. (1984) Probability, Berlin:Springer.
[16] V.A. Uspensky, A.L. Semenov, A.Kh. Shen. (1990) Can an individual sequence of zeros and ones be random?, Russian Math. Surveys 45, No. 1,
121–189.
[17] V.G. Vovk, V.V. V’yugin. (1993) On the empirical validity of the Bayesian
method, J. of the Royal Statist. Soc. B 55 (1), 253–266.
[18] V.G. Vovk and V.V. V’yugin. (1994) Prequential Level of impossibility with
some applications, J. R. Statist. Soc. B, 56, 115–123.
[19] V.G. Vovk. (1994) A logic of probability, with application to the foundation
of statistics [with discussion] J. R. Statist. Soc. B 55, 317–351.
[20] V.V. V’yugin. (1976) On Turing invariant sets, Soviet Math. Dokl. 17, No.
4, 1090–1094.
[21] V.V. V’yugin. (1985) Nonstochastic objects, Problems Inform. Transmission 21, 77–88.
[22] V.V. V’yugin. (1996) Bayesianism: An algorithmic analysis, Information
and Computation 127, N1, 1–10.
[23] V.V. V’yugin. (1997) Ergodic theorems for individual random sequences,
Theoretical Computer Science (this issue).
[24] A.K. Zvonkin and L.A. Levin. (1970) The complexity of finite objects and
the algorithmic concepts of information and randomness, Russ. Math. Surv.
25, 83–124.