Johnson, N.L.; (1973).Return to repetitions."

•
IThis research was supported in part by the
Durham, under Contract No. DAHC04-71-0042.
u.s.
Army Research Offic,
A RETURN TO REPETITIONS
N.L. Johnson
D~partment of Statistios
University of North Carolina at Chapel HiZl
Institute of Statistics Memeo Sereis No. 871
June, 1973
•
A RETURN TO REPETITIONS
by
N.L. Johnson
University of North Ca;r>oUna at Chapel. HiZZ
1.
Introduction
In [1] it was shown that in a series of independent trials with m mutually
E ,E , ... ,E, and Pr[E.] = p. (j=l, •.• ,m;
I 2
m
J
J
PJ' > 0,
p. = 1) at each trial, then denoting by N the number of trials needed
L
j=l J
to obtain a randomly chosen sequence of L outcomes (an L-sequenae)
exclusive possible outcomes
m
r
(1)
whatever be the p.'s.
J
(The random choice of the L-sequence is supposed to be
such that the probability of obtaining a sequence with b. E.'s (j=l, •.. ,m) in
J J
m
b.
a specified order is IT p. J .)
j=l J
In the present paper we first give formulae for the variance of N and
L
then exploit (1) to obtain some further results.
2.
Variance of N
L
For a specified L-sequence with k critical points (see [1]) and a ..
J1
Ej's occuring up to the i-th critical point. the conditional distribution of
has probability generating function
[1
+
(l-t)g(t)]-l
N
L
•
2
where
r ( mIT p. -a..J1}t -a.1
k
get) =
i=l j=l J
m
r
a..
Note that a k = L, since the last (k-th) critical point
j=l J1
is always at the end of the sequence.
with a. =
1
The conditional expected value of
k
g (1) =
m
II
r ( p.
i=l j=l
N
L
is
-a ..
J
1)
J
and the conditional second factorial moment is
k
2[{g(I)}2
+
gl (1)] = 2({
I
m
a
(II p~ ji)}2 -
i=l j=l
J
k
m -a ..
r
a. ( IT p. Jl)]
i=l
j=l J
.
1
The unconditional moments of N are obtained by taking expected values over
L
the .distribution of L-sequences. Using the symbol I to denote summation
S
over all L-squences we have
(2)
(shown in [1]), and
(3)
Note that
k, as well as the
in S.
We use the formula
(4)
e.
a .. 's will depend on the particular L-sequence
J1
•
(3)
to obtain var(N ) .
L
We first evaluate
m a'
T
Remembering that each
T
=
S j=l
J
L a.
i=l
i=l
1
L-sequence has
= Lm
L
a
m
-a ..
IT p. J1) •
j=l J
k
= L, and using (2) we
hav~
..
I( mIT p.Ja' k ) k-lI a. ( IT p.-aJ1)
ill
+
S j=l J
o
(with
k
H IT p.J k ) I a. (
i=l
1
j=l J
= 0).
1
Following an argument similar to that in Sections 2 and 3 of [1], the
second term in
T can be shown to be equal to
1
+
2
+ ..• +
1
(L-l) = IL(L-l) .
Hence from (4)
(5)
The first term has to be evaluated by direct enumeration, though some partial
simplification is possible.
L = 1,2,3,4,5.
Table 1 shows the results obtained for
Note that, in contradistinction to
NL does depend on the
p. ' s.
J
E[N L], the variance of
•
Table 1.
m
Variance of N (Note:
L
<P,-
h
=I
j=l J
L
1
2
3
4
5
4
-h
p. ) .
Var(N L) with Pl= ..• =Pm
2</>1-m(m+l)
242
2[<Pl+2</>1]-m -Sm +2m-2
363
2[</>1+2<P 2+(2m+l)<P l ]-m -gm +6m-B
422
2[<P 1+2</>3+ 2<P l +</>2+2(m +1)</>1]
8
-m -l3m4 +2m2+6m-14
S
2
3
2[</>1+2<P4+</>3+2</>1</>2+2m¢1+(2m +m+2H 1 ]
-m lO -17mS+2m2+14m_28
Since the minimum value of
= m-1
2
m -m
4
2
m -m +2m-2
2 +6m-8
m6
-m3
+2m
8432'
m -m +2m +6m +6m-14
10 S
4
3
2
m -m +2m +6m +6m +14m-28
-1
L is attained when Pl="'=Pm = m
these values of the Pj'S minimize Var(N ). The minimum values of Var(N L)
L
are shown in the last column of Table 1. Note that (m-l) is a factor in
<P
each expression, as must be the case since N = Land Var(N ) = 0 when
L
L
m=!.
3.
Effects of Partial Repetition
Suppose that, in the situation described in Section 1, we wait until
the first L' «L) terms of the required L-sequence appear.
What is the
expected number of further trials needed in order to obtain a completion
of the L-sequence?
Let
denote the needed number of trials.
Since the
L' term
initial subsequence must be obtained prior to obtaining the L-sequence, we
have
5
where
N,
L
has the same distribution as
N , with
L
L' replacing
L.
Hence
(6)
E[NL/L,J = E[N L] - E[N L,]
= (mL + L-I) - (mL' + L'-l)
= mL -
L'
m
+
L - L'
As with E[N ] this is an integer, independent of the p.' s. At each trial
L
J
the probability that there is E. at that position of the L-sequence is p.,
J
J
and the probability that
E. is observed in the trial is also p., so the
m
J
J
probability of correct matching is
so, in the next
correctly the next
s
I p7
j=l J
trials (s ~ L-L') the probability of obtaining
terms (i.e. the (L'+I)-th, ... ,(L'+s)-th terms) in the
m 2
L-sequence is on average (I p.)s. If these terms are obtained correctly,
j=lJ
the conditional expected number of further terms needed to complete the
L-sequence is
s
E[NL1L'+sJ since we now already have the initial (L'+s) terms
of the L-sequence.
Hence
(7)
where
m 2 s
m 2
E[NLI L'] = s + (J.I=lPJ.l E[NLIL'+s] + {I - (j=l
I p.)'}ELIL'
J
,s
ELIL',s
= expected
number of further trials needed to complete the
L-sequence when the first L' terms have been obtained
correctly but the correct sequence has been lost during
the next
s
trials, (s
~
L - L').
6
e
From (6) and (7), putting
(8)
€LIL',s
=mL-m L' +L-L'-s
m 2
= I
p.
j=l J
we have
s
5 -1 L'
5
+ e (I-e) m (m -1)
or, equivalently
(8)'
ELIL',s
= mL-m L'+s +L-L'-s
s -1 L' s
+ (I-e) m (m -1)
It is of interest to enquire whether an initial correct
followed by s
L'-su~sequence
trials which are not all correct (with respect to the
relevant L-sequence) is on balance favorable to early completion of the
L-sequence, or not.
We compare
€LjL',s with the expected number of trials
needed to complete a random L-sequence, after the first
s(~L)
trials.
The latter is
E[N L] - s
= mL + L -
1 - s
(taking the needed number of trials to be zero when
5
= L = NL).
Now
€LIL' ,s - {E[N L] - 5)
= (l_es)-lCmse s _ l)m L'_ L' + I .
(Note that this does not depend on L.)
The difference is positive if
(m 5 e5 - l)mL' > (l-e 5 )(L'-I)
That is
(9)
eS
>
...!.5 •
m
I
+
(L' -l)m
-L'
1 +m -5 (L'-l)m -L'
The right hand side of this inequality is greater than or equal to
-5
m , and less than or equal to
7
G = max (L'-l)m-L' .
where
L'
Remembering that
we find
L'
must be a positive integer, and m > 1,
G = (L'-l)m-L'I L'=2 = m-2
Hence the right hand side of (9) lies between
Since
m-s
~
e s<
1 .
Further, it is possible to choose the p.'s so that
J
to 1.
In particular, the
e is arbitrarily close
Pj'S can be chosen so that
(10)
i. e. so that
for all L' and L
>
L'.
On the other hand, for all
L' > 1, it is possible to choose the
p. 's so that
J
e<
(11)
m-l[{l
+
(L'-l)m-L'}/{l
+
(L'_l)m-L'-s}]l/S
i. e. so that
for this particular (but not necessarily all) L' and L
Note that to get low values of
equal (to m-1 ).
>
L'.·
e we must make the Pj'S nearly
If this is the case then there is still on average, some
residual advantage in having had the first L'-subsequence in correct order,
even though the next
s
terms were not in correct order.
8
Table 2 shows values of the right hand side of (10) and Table 3
the ratio of this quantity to m- l (the minimum possible value of
It can be seen that only for a relatively small range of values of
is it possible to have
€LILIJs < E[NL]-S
range decreases as m and/or
Table 2.
1+m- 2 l/s
Values of (s -2)
m +m
m\s
1
2
3
LI
6
at a11 J and this
increases.
= 68
•
4
5
6
2
0.556
0.543
0.533
0.527
0.522
0.519
3
0.357
0.349
0.345
0.342
0.340
0.339
4
0.262
0.257
0.255
0.254
0.253
0.253
5
0.206
0.204
0.203
0.202
0.202
0.201
6
0.171
0.169
0.168
0.168
0.168
0.167
Values of 6 /m- 1
Table 3.
4.
s
for any
6).
8
m\s
1
2
3
4
5
6
2
1.111
1.085
1.066
1.053
1.044
1.037
3
1.071
1.048
1.034
1.026
1.021
1.018
4
1.046
1.029
1.020
1.015
1.012
1.010
5
1.032
1.019
1.013
1.010
1.008
1.007
6
1.023
1.013
1.009
1.007
1.005
1.005
Derived Sequences
From any L-sequence based on m possible outcomes
L-sequence based on ml «m) possible outcomes
.
E , ... ,E , an
1
m
Ell J' .. ,EI I can be derived
m
9
by combining certain of the first set of outcomes to form single outcomes
in the second set.
Thus, from an L-sequence based on El, ... ,E
lO
we might
derive an L-sequence based on
Ei=E l ; E2=E 2uE 3 ; E;=E4 ; E4=E SUE 6 uE 7; ES=EsuE g ; E6=E lO '
m'
(Here m • 10,
= 6).
For example, the 8-sequence
E2E2E7~lESElElOE3
becomes the 8-sequence
Ei Ei E4 Ei E4 Ei E6 Ei·
Of course there are many ways in which this can be done.
Let us denote the class of sequences based on {E.} by S, and the
J
class based on {E!} by S'.
Any
J
L-sequence in S'
L-sequence in S corresponds to a single
(but not conversely).
If the original sequence is random-
ly chosen (in the sense described in Section 1) in
~
derived sequence in S',
The expected number of trials needed to produce
the latter is
m,L
This is, of course, less than
in S'
+
L- 1
= mL
+
L - I, because the L-sequence
which is first obtained may not actually be constructed from the
same L-sequence in S as that required.
S'
S, so will be the
Given that the L-sequence in
has just been completed, the expected number of further trials needed to
complete the original L-sequence in S is, by an argument similar to that
used in Section 2 to obtain
(12)
E[NLIL,J,
(mL + L _ 1) - (m fL + L _ 1)
= mL _ m,L
The probability that a realized L-sequence in . S'
constructed from the original L-sequence in S is
.
is indeed
10
(13)
L(u) denotes summation over the values of j
where
E' (u
belongs to
u
= l, •.. ,m').
for which
.
E.
J
This is, in particular, the probability
that the first realization of the derived L-sequence in S'
is a reali-
zation of the original L-sequence in S.
So if
€'
denotes the expected number of further trials needed to
obtain a realization of the original L-sequence following a realization
of the derived L-sequence in S'
which is not also of the arigina1
L-sequence in S, then
whence
L
= (m L-mL )/(l-e(S'))
t
(14)
€
(Note that
condition rot
6(S')
<
=1
if and only if S'
= S,
•
which is excluded by the
m.)
The expected value of the number of additional trials needed given
only that the first
L have not produced the original L-sequence is
(15)
m
(1 -
(I p~) L = probability
j=l J
chosen L-sequence.)
that is
that first. L trial --does not give the randomly
11
m,L_ l
(16)
>
(mL-l)eL(S') _ (mL_m,L)(
r p~)L .
j=l J
-1
In the special case Pl= P2=···=Pm= m
m n m- 2
=I
8(S')
(where n
u
= number
u -1
u=l num
= m'/m
of E.'s belonging to E'.)
u
J
Hence (14) and (15) become
(14' )
L
=m
(whatever the value of m')
and
= mL
(15' )
•
respectively.
So in this case
€'
= E[NLINL
>
L].
It makes
DO
difference,
on average, to the expected number of additional trials needed, whether
or not the unsuccessful first
in S'.
L trials constitute the derived L-sequence
This is not, of course, true in the general case.
..
REFERENCE
[1]
Johnson, N.L. (1968) "Repetitions," Ameriaan MathematiaaZ MonthZy.,
75., 382-3.