UNIVERSITY OF NORTH CAROLINA
Department of Statistics
Chapel Hill, N. C.

Mathematical Sciences Directorate
Air Force Office of Scientific Research
Washington 25, D. C.

AFOSR Report No.

ON THE RECURRENCE OF PATTERNS
by
Marcel Paul Schützenberger
March, 1961

Contract No. AF 49(658)-213

Simple formulae are derived for the generating function of the recurrence event defined by a set of words (or "patterns").

Qualified requestors may obtain copies of this report from the ASTIA Document Service Center, Arlington Hall Station, Arlington 12, Virginia. Department of Defense contractors must be established for ASTIA services, or have their "need-to-know" certified by the cognizant military agency of their project or contract.

Institute of Statistics
Mimeo Series No. 283

I. Introduction.

Let us consider a Bernoulli process in which the individual outcomes are the letters $x_i \in X$, and a set $U$ of words (or "patterns") in these letters. By definition, the event $\mathcal{E}$ occurs for the first time at the end of a sequence $f$ of letters if the last letters of $f$ form a word from $U$ and if this occurrence has not happened earlier on $f$. This type of recurrent event has practical applications in engineering and in statistics, and it has been considered at some length by W. Feller (1), especially in the case where $U$ reduces to the word $u = x_i^n$ ($= x_i x_i \cdots x_i$), i.e. in the case of the so-called success run of length $n$.

In this note we develop Feller's results in order to get a simple expression for the generating function of $U$ when $U$ is a finite set and, with the help of some combinatorial properties of free monoids, we discuss in more detail the case where $U$ reduces to a single word.

II. Notations.

We denote by $F$ the free monoid generated by $X$ (cf. (2), chap. 1), that is, the set of all finite words in the letters $x_i$ with the product $ff'$ defined for any $f, f' \in F$ as the word made up of the word $f$ followed by the word $f'$. The empty word is always denoted by $e$; $|f|$ denotes the degree (or length) of $f$ (i.e. $|e| = 0$ and $|fx| = |f| + 1$ for any $f \in F$ and $x \in X$).

We consider a mapping $\lambda_t$ which sends every $x \in X$ onto $t \Pr(x)$, where $t$ is an ordinary variate and where $\Pr(x)$ is the probability of $x$; $\lambda_t$ extends in a natural fashion to a homomorphism of the free algebra generated by $X$ into the ring of the polynomials in $t$. We further extend $\lambda_t$ to the module $\bar F$ with real coefficients of the infinite sums $\bar f = \sum_{f \in F} \langle \bar f, f \rangle f$ which have the property that the ordinary power series in $t$, $\sum_{f \in F} |\langle \bar f, f \rangle| \, \lambda_t f$, is convergent for all probability distributions $(\Pr(x))$ in some open domain around 0. Then, classically, $\bar F$ is a topological algebra and every $\bar f \in \bar F$ which is such that $\langle \bar f, e \rangle = 1$ admits a (continuous) inverse

$$\bar f^{-1} = e + \sum_{n > 0} (e - \bar f)^n .$$

We observe that for any subset $F'$ of $F$ the sum $\sum_{f \in F'} f$ ($= 0$ if $F'$ is empty) belongs to $\bar F$, and we shall usually represent it by the same letter $F'$; then $\lambda_t F'$ is just the ordinary generating function of the subset $F'$. With these notations, the intersection of two subsets $F'$ and $F''$ can be interpreted as a special case of the Hadamard product $\sum_{f \in F} \langle F', f \rangle \langle F'', f \rangle f$ of the two elements $F'$ and $F''$ of $\bar F$.

Let now $\mathcal{E}$ be any recurrent event on $F$ and $A$ (resp. $A^*$) be the set of the words at the end of which $\mathcal{E}$ occurs (resp. occurs for the first time); we denote by $S$ the complement in $F$ of the right ideal $A^*F$ and, consequently, $S$ is the set of the words on which $\mathcal{E}$ has not yet occurred; we recall that it is a natural convention to assume that $e \in A$, $e \in S$ but $e \notin A^*$. Feller's original definition can be translated into the two following statements:

II.1. Every $f \in F$ can be factorized in one and only one manner as a product $f = as$ with $a \in A$ and $s \in S$.

II.1'. Every $a \in A^+$ ($= A - \{e\}$) can be factorized in a unique manner as a product $a = a_{i_1} a_{i_2} \cdots a_{i_m}$ where the words $a_{i_j}$ all belong to $A^*$.
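The two factorisation statements II.1 and II.1' can be made concrete when the recurrent event is the pattern event of the introduction. The following Python sketch is only an illustration and not part of the original report: the function name `factorize`, the representation of words as strings, and the sample pattern set are all assumptions. It reads a word from left to right, cuts it each time a pattern occurs for the first time since the previous cut, and returns the successive factors from A* together with the remainder in S.

```python
def factorize(f: str, patterns: set[str]) -> tuple[list[str], str]:
    """Split f as a_1 a_2 ... a_m s with each a_j in A* and s in S.

    A* = words ending with a pattern and containing no earlier occurrence;
    S  = words on which no pattern has occurred yet.
    (Hypothetical helper; patterns are assumed to be non-empty words.)
    """
    factors = []   # successive first-occurrence words a_j (statement II.1')
    start = 0      # index just after the last cut
    for end in range(1, len(f) + 1):
        current = f[start:end]
        # The event occurs as soon as the part read since the last cut
        # ends with some pattern; earlier prefixes did not, so this is
        # a first occurrence.
        if any(current.endswith(u) for u in patterns):
            factors.append(current)
            start = end
    return factors, f[start:]


factors, s = factorize("aaabaab", {"aa"})
a = "".join(factors)          # the element of A in f = a s (statement II.1)
print(factors, repr(a), repr(s))
```

With the single pattern "aa", the word "aaabaab" is cut after "aa" and again after "abaa", leaving the remainder "b"; the scan restarts from scratch after each cut, which is what distinguishes a recurrent event from a count of overlapping occurrences.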
Thus, after Feller, we can write the following identities in $\bar F$:

$$A = (e - A^*)^{-1}; \qquad S = F - A^*F = (e - A^*)(e - X)^{-1} \quad (\text{since } F = (e - X)^{-1});$$

or, in equivalent fashion, since $A = e + A^+$ is invertible,

$$A^* = (e + A^+)^{-1} A^+ .$$

If $\mathcal{E}$ is persistent (i.e., by definition, if $\lim_{t \to 1} \lambda_t A^* = 1$), the mean recurrence time $T$ of $\mathcal{E}$ is equal to $\lim_{t \to 1} (1 - \lambda_t A^*)(1 - \lambda_t X)^{-1}$ and, because of our previous formula, we have also $T = \lim_{t \to 1} \lambda_t S$.

III. A general formula.

We revert to the problem described in the introduction and we consider a fixed, non-empty set of words $U$; according to our definitions a word $a \in F$ belongs to $A^*$ if and only if it satisfies the two conditions:

i. the word $a$ ends with some word $u \in U$, that is, $a$ belongs to some left ideal $Fu$;

ii. it is not possible to factorise $a$ in the form $a = fuf'$ with $f \in F$, $u \in U$ and $f' \in F^+$ ($= F - \{e\}$).

It follows instantly that we still define the same recurrent event if we reduce $U$ by eliminating from it all the words $u'$ which are such that $u' \in FuF$ for some other word $u \in U$.

Let us introduce the following notations for each $u$ belonging to the reduced set $U$:

$$A_u = A \cap Fu; \qquad A^*_u = A^* \cap Fu .$$

In the algebra $\bar F$ we have the identities:

$$A = e + \sum_{u \in U} A_u; \qquad A^* = \sum_{u \in U} A^*_u; \qquad A_u = A A^*_u .$$

As observed by Feller, the value $\pi_u$ of $\lambda_t A^*_u$ for $t = 1$ is exactly the probability that the word $u$ occurs before any other word $u'$ from $U$.

Multiplying on the left by $(e - X)$ the last identity above and using the fact that $e - X + \sum_{u \in U} (e - X) A_u$ is invertible, we obtain

III.1. $$A^*_u = \Big( e - X + \sum_{u' \in U} (e - X) A_{u'} \Big)^{-1} (e - X) A_u .$$

Since $\lambda_1 X = 1$ and, as is well known, $U$ is persistent, we deduce

III.2. $$\pi_u = \lim_{t \to 1} \frac{\lambda_t (e - X) A_u}{\sum_{u' \in U} \lambda_t (e - X) A_{u'}} .$$

In similar manner, we find that $S = \big( e - X + \sum_{u \in U} (e - X) A_u \big)^{-1}$ and finally,

III.2'. $$T = \lim_{t \to 1} \Big( \sum_{u \in U} \lambda_t (e - X) A_u \Big)^{-1} .$$

Thus the problem reduces to the computation of the sums $(e - X) A_u$ and we prove:

III.3. For any $u \in U$ one has the identity in $\bar F$,

$$u = \sum_{u' \in U} (e - X) A_{u'} M_{u',u},$$

where the set corresponding to $M_{u',u}$ is defined as the set of the words $f$ of degree strictly less than $|u|$ that satisfy the relation $u'f \in Fu$.

Proof. Let us left-multiply the identity by $(e - X)^{-1}$; by definition, its left member $(e - X)^{-1} u$ is equal to $Fu$, the set of all words ending with $u$. Because of II.1, any word $fu \in Fu$ admits one and only one factorisation $fu = as$ with $a \in A$ and $s \in S$; by the very definition of $S$ it is impossible that $|s| \ge |u|$ because, then (1), $s$ would have the form $s = f'u$ and it would belong to $A^*$; thus, $|s| < |u|$ and consequently $a \ne e$. Because of II.1', this implies that $a = a''a'$ with $a'' \in A$ and $a'$ contained in a certain $A^*_{u'}$ ($u' \in U$), that is, $a' = f'u'$. Since $U$ is assumed to be reduced, $u'$ cannot be a proper factor of $u$ and there exists a uniquely determined word $f''$ satisfying the relations $f = a''f'f''$ and $f''u = u's$. Thus we have proved that $s$ belongs to $M_{u',u}$.

(1) Here, as repeatedly in this paper, we use the following theorem of F. W. Levy (3) which embodies the so-called cancellation laws of the free monoid: If $f_1, f_2, f_3, f_4 \in F$ are such that $f_1 f_2 = f_3 f_4$ and $|f_1| \ge |f_3|$, there exists one and only one $f_5 \in F$ which satisfies the equations $f_1 = f_3 f_5$ and $f_4 = f_5 f_2$.

Reciprocally, any word of the form $f = as$ with $a \in A_{u'}$ and $s \in M_{u',u}$ belongs to $Fu$ and, because of the uniqueness of the factorisation described in the first part of the proof, this remark concludes the verification of the identity.

Let us assume now that $U$ is a finite set and consider the $U \times U$ matrix $M$ whose entries are the sums $M_{u',u}$; since $e$ belongs to $M_{u',u}$ if and only if $u = u'$, $\det \lambda_t M$ is arbitrarily close to one when $t$ tends to zero.
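Evaluated at t = 1, the identity III.3 becomes a finite linear system, Pr(u) = Σ_{u'} y_{u'} m_{u',u}, where y_{u'} stands for the value of λ_t (e − X) A_{u'} and m_{u',u} for the value of λ_t M_{u',u}; III.2 and III.2' then read π_u = y_u / Σ y and T = 1 / Σ y. The Python sketch below works this out with exact fractions. It is an illustration under stated assumptions, not the author's procedure: the names `overlap_set`, `word_prob` and `solve_patterns` are hypothetical, the pattern set is assumed to be reduced, and the system is assumed to remain non-singular at t = 1.

```python
from fractions import Fraction
from math import prod


def overlap_set(u_prime: str, u: str) -> list[str]:
    """M_{u',u}: words g with |g| < |u| such that u' + g ends with u."""
    return [u[k:] for k in range(1, len(u) + 1) if u_prime.endswith(u[:k])]


def word_prob(word: str, pr: dict[str, Fraction]) -> Fraction:
    """Probability of a word in the Bernoulli process (lambda_t at t = 1)."""
    return prod((pr[x] for x in word), start=Fraction(1))


def solve_patterns(patterns: list[str], pr: dict[str, Fraction]):
    """Return ({u: pi_u}, T) for a reduced, finite pattern set."""
    n = len(patterns)
    # Row i is the equation attached to u = patterns[i]:
    #   sum_j y_j * m(patterns[j], u) = Pr(u)
    rows = [[sum((word_prob(g, pr) for g in overlap_set(v, u)), Fraction(0))
             for v in patterns] + [word_prob(u, pr)]
            for u in patterns]
    # Gauss-Jordan elimination over the rationals
    # (next() raises StopIteration if the system is singular).
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    y = [rows[i][n] for i in range(n)]
    total = sum(y, Fraction(0))
    first_probs = {u: y_u / total for u, y_u in zip(patterns, y)}
    return first_probs, 1 / total


coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
print(solve_patterns(["HH", "TH"], coin))
print(solve_patterns(["HH"], coin))
```

For a fair coin and U = {HH, TH} the sketch gives π_HH = 1/4 and T = 3, in agreement with a direct argument: HH can only come first if the first two outcomes are HH, and the event occurs at the first H in position two or later. For the single pattern HH it gives the familiar mean recurrence time 6.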
Consequently, the system formed by the linear equations

$$\lambda_t u = \sum_{u' \in U} \big( \lambda_t (e - X) A_{u'} \big) \big( \lambda_t M_{u',u} \big)$$

in the unknown generating functions $\lambda_t (e - X) A_u$ can be solved by determinantal expressions once the $\lambda_t M_{u',u}$'s are known and, using the formulas III.1, III.2 and III.2', this provides us with the desired result. For example, when $U = \{u\}$, a single word, we have

$$T = \lim_{t \to 1} \frac{\lambda_t M_{u,u}}{\lambda_t u} .$$

It can be proved that the non-commutative sums $A_u$ can also be obtained from the equations III.3 by successive elimination but, by necessity, the formulas are rather cumbersome. However, for $U = \{u\}$ we have

III.4. $$A_u = (e - X)^{-1} u M_{u,u}^{-1}; \qquad A^*_u = (e - X)^{-1} u \big( (e - X) M_{u,u} + u \big)^{-1} (e - X) .$$

For $U = \{u, u'\}$ we have for instance

III.4'. $$A_u = (e - X)^{-1} \big( u - u' M_{u',u'}^{-1} M_{u',u} \big) \big( M_{u,u} - M_{u,u'} M_{u',u'}^{-1} M_{u',u} \big)^{-1} .$$

As a straightforward consequence of the general formulas we mention the following fact, whose proof is left to the reader:

III.5. If $U$ is the direct union of sets $U_j$ which are such that $M_{u',u} = 0$ when $u' \in U_j$, $u \in U_{j'}$, $j \ne j'$, the generating function $A^+$ ($= A - \{e\}$) corresponding to the event $U$ is the direct sum of the generating functions $A^+_j$ corresponding to each of the events $U_j$.

This explains the simplicity of the results when $U$ is the union of runs $u_j = x_j^{n_j}$ of length $n_j$ in the letters $x_j$.

IV. A symmetry property.

Let us denote by $f \to \tilde f$ the involution of $F$ which sends every $f = x_{i_1} x_{i_2} \cdots x_{i_m}$ onto $\tilde f = x_{i_m} \cdots x_{i_2} x_{i_1}$, and consider the set $\tilde U = \{\tilde u\}$ to which we associate, as above, a recurrent event and the set $A'^*$ of the words at the end of which $\tilde U$ occurs for the first time.

IV.1. There exists a bijection $A^* \to A'^*$ which commutes with $\lambda_t$.

Proof. It is more convenient to keep $U$ unchanged and to define $B^*$ by the following conditions, symmetric of those used in the definition of $A^*$: the word $b$ belongs to $B^*$ if and only if:

i'. it begins with some word $u \in U$, that is, it belongs to some right ideal $uF$;

ii'. it is not possible to factorise $b$ in the form $b = f'uf$ with $f \in F$, $u \in U$ and $f' \in F^+$.

Thus, clearly, $A'^* = \{\tilde b : b \in B^*\}$ and we only have to prove the existence of a bijection $\beta : A^* \to B^*$.

Let us consider any word $a = fu$ from $A^*$ where $u = x_{i_1} x_{i_2} \cdots x_{i_m}$. If $f = e$, we define $\beta a$ as being $a$ itself; if $f \ne e$, we consider the words

$$a(k) = x_{i_{m-k+1}} x_{i_{m-k+2}} \cdots x_{i_m} \, f \, x_{i_1} x_{i_2} \cdots x_{i_{m-k}}$$

obtained by a cyclic permutation of the letters of $a$, and we define $\beta a$ as $a(k)$ where $k$ is the smallest integer which is such that $a(k)$ belongs to some right ideal $u'F$ ($u' \in U$). We surely have $k \le m$ ($= |u|$) since $a(m) = uf$ belongs to $uF$; all the words $f x_{i_1} x_{i_2} \cdots x_{i_{m-k'}}$ with $k' < k$ belong to $S = F - A^*F$ and, by the very definition of $A^*$, they do not admit a factor from $U$; thus, by construction, $\beta a$ satisfies i' and, according to this last remark, it also satisfies ii'; this proves the existence of a mapping $\beta : A^* \to B^*$.

Reciprocally, given any $b = uf$ in $B^*$, we can find a minimal cyclic permutation $\alpha b$ of its letters which belongs to some left ideal $Fu'$ ($u' \in U$) and, as above, $\alpha b$ belongs to $A^*$. Since, trivially, $(\alpha \circ \beta) a = a$ and $(\beta \circ \alpha) b = b$ for any $a \in A^*$ and $b \in B^*$, the mapping $\beta$ is a bijection and the proof is ended.

Remark. A slightly less explicit result follows instantly from $S = F - A^*F$ and from the conditions i and ii used for defining $A^*$ which, together, show that $S$ is the complement in $F$ of the two-sided ideal $FUF$ or, in our present notations, that $S$ can also be defined as $F - FB^*$.

V. The case of U consisting of a single word.

Although, as we shall see, $M_{u,u}$ has the simple form $(e - v^n)(e - v)^{-1}$ for "almost all" words $u$, its expression can happen to be more intricate (as, e.g., for $u = xyxzxyxyxzxyx$) and we shall discuss a recurrence formula for computing it.
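For a single word, the set M_{u,u} can be read off directly from its definition in III.3: it consists of the words g of degree less than |u| such that ug ends with u. The Python sketch below is only an illustration: the function names `self_overlaps` and `has_simple_form` are assumptions, words are represented as strings, and the "simple form" is taken to mean M_{u,u} = e + v + v^2 + ... + v^(n-1) for some word v.

```python
def self_overlaps(u: str) -> list[str]:
    """M_{u,u}: words g with |g| < |u| such that u + g ends with u,
    listed by increasing length (the empty word comes first)."""
    n = len(u)
    # u + g ends with u exactly when g = u[k:] and the prefix u[:k]
    # is also a suffix of u, for some k with 1 <= k <= n.
    return [u[k:] for k in range(n, 0, -1) if u.endswith(u[:k])]


def has_simple_form(u: str) -> bool:
    """True if M_{u,u} = {e, v, v^2, ..., v^(n-1)} for some word v."""
    m = self_overlaps(u)
    if len(m) == 1:          # M_{u,u} = {e}, i.e. the case n = 1
        return True
    v = m[1]                 # shortest non-empty element of M_{u,u}
    return all(g == v * i for i, g in enumerate(m))


for u in ["xyz", "xxx", "xyxy", "xyxzxyxyxzxyx"]:
    print(u, self_overlaps(u), has_simple_form(u))
```

For u = xyxzxyxyxzxyx the sketch lists the four words e, yxzxyx, zxyxyxzxyx and yxzxyxyxzxyx, which are not the successive powers of a single word, whereas xxx and xyxy do have the simple form.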
These remarks have some relevance also for the general case since, if $u \ne u'$ and $M_{u',u} \ne 0$, we have $M_{u',u} = (M_{u'',u''}) g$, where $g$ is the word of lowest degree in $M_{u',u}$ and where $u''$ is uniquely determined, with the help of F. W. Levy's theorem, by the relations $u = u''g$ and $u' \in Fu''$.

As usual, for any $f \in F$, $f^0$ is defined as $e$, and we start by verifying two statements of independent interest which will be needed later on.

V.1. If $a, b, c \in F$ are such that $ab = ca$, there exist $d, d' \in F$ and $m \ge 0$ which satisfy the relations $a = (dd')^m d$; $b = d'd$; $c = dd'$.

Proof. We use repeatedly F. W. Levy's theorem and we assume that the result is true for all similar equations with a strictly lower total degree $|ab|$. If $|a| \le |b|$, we have directly $b = d'a$ and $c = ad'$ (and $a = d$, i.e. $m = 0$). If $|a| > |b|$, we can write two new relations $a = a_1 b$ and $a = ca''$ and, in fact, $a'' = a_1$ since we have $(ca'')b = c(a_1 b)$ by hypothesis; thus, we are back to an equation $a_1 b = c a_1$ ($= a$) of the same type as the original one and the result follows from the induction hypothesis.

V.2. If $a, b, c, d \in F$ and $n, m \ge 0$ are such that [...], there exist $h \in F$ and $n_1, n_2, n_3 \ge 0$ which satisfy the relations: $a = h^{n_1}$; $bc = h^{n_2}$; $bd = h^{n_3}$.

Proof. We distinguish three mutually exclusive cases:

1°. $|a| \le |b|$. Then, by considering the left factors of the given equation, we can define in a unique manner the integers $p_i$ ($1 \le i \le 3$) and the words $a_i$ ($1 \le i \le 6$) satisfying the relations: [...]. From these we deduce the new equation [...], that is, finally, $a_3 a_4 = a_4 a_3$ ($= a$). Using V.1 and induction this shows that $a_3 = h^{n'_3}$ and $a_4 = h^{n'_4}$ for some $h \in F$ and the full result follows easily. If [...], we obtain [...]. If $p_3 = 0$, we must also have $p_1 = 0$ and then $a_3 a_4 a_1 = a_4 a_5$ ($= b$); consequently, $|a_5| \ge |a_3|$ and, taking into account the relation $a_3 a_4 = a_5 a_6$ ($= a$), we finally get $a_3 a_4 = a_4 a_3$ and the end of the proof is the same as previously.

2°. $|b| < |a| \le |bc|$. The hypothesis [...] implies now that $m = 1 + m'$, say. We write $b' = bc$; $c' = e$; $d' = (bc)^{m'} bd$ and the equation takes the form $a^{2+n} = b'c'b'd'$ with, now, $|a| \le |b'|$; thus we can apply 1° and the result is proved in this case.

3°. $|bc| < |a|$. We define $n'', m'' \ge 0$ and $b'', c'' \in F$ by the relations [...]; $b''c'' = a$. Writing $a'' = bc$ we are back to an equation of the original type but in which $d'' = e$ and $|a''|$ ($= |bc|$) $<$ [...]. Thus we can apply 2° and the result is proved in all cases.

We revert to our problem of studying $M_{u,u}$ for a given $u \in F^+$ ($= F - \{e\}$); from V.1 it instantly follows that $g \in M_{u,u}$ (i.e. $ug = g'u$ for some $g' \in F$ and $|g| < |u|$) if and only if $u = vg = g'v$ for some $v \in F$. In order to obtain a more useful formula we define $e_u$ for any $u \in F^+$ as the largest integer (possibly zero) which is such that $u$ can be written in the form

$$u = (ff')^{e_u} f \quad (= f (f'f)^{e_u})$$

for some pair ($f \in F^+$, $f' \in F$). It may happen (only if $e_u = 1$, as will be seen later) that several pairs satisfy this relation; in this case we consider only the principal one, which we characterise by the supplementary condition that its total degree $|ff'|$ is the lowest possible; in any case we define $\bar u = (ff')^{e_u - 1} f$ if $e_u > 0$ and $\bar u = e$ if $e_u = 0$.

For instance, if $u = xyxzxyxyxzxyx$ we have $e_u = 2$ since $u = (x)(yxzxy)(x)(yxzxy)(x)$, and $\bar u = xyxzxyx$; but, now, the principal pair of $u_1 = \bar u$ is $(xyx, z)$ and we have $\bar u_1 = xyx$ ($= u_2$); $\bar u_2 = x$ ($= u_3$); $\bar u_3 = e$.

It is a natural convention to define $M_{e,e}$ as zero and we have:

V.3. For any $u \in F^+$, $M_{u,u} = e + (M_{\bar u, \bar u})(f'f)$.

Proof. It follows immediately from V.1 that $M_{u,u} = e$ if and only if $e_u = 0$, and our formula is valid in this special case. Thus we may assume from now on that $e_u > 0$ and that $M_{u,u}$ contains a word $g \in F^+$, that is, a word which satisfies the conditions $0 < |g| < |u|$, $ug = g'u$ for some $g' \in F$. We distinguish several (possibly overlapping) cases.

1°. If $|g| \le |(ff')^{e_u - 1} f|$ ($= |\bar u|$), we have the equation $(ff')^{1+n} f g = g' f (f'f)^{1+n}$ (with $n + 1 = e_u$); cancelling on the left a factor of degree $|g'f|$ we obtain a relation $f''(f'f)^{1+n'} g = (f'f)^{1+n}$ for some [...], where $f''$ is a right factor of $f'f$, that is, where $f'''f'' = f'f$ for some $f''' \in F$. We now compare the left factors of the last relation above and we obtain $f''f''' = f'''f''$; according to the remark made in the proof of V.2 this implies that $f'' = h^m$ and $f''' = h^{m'}$ for some $h \in F$; consequently $f = h^{m''} h'$ for some left factor $h'$ of $h$. Reverting to the expression of $u$ we now obtain $u = h^{n''} h'$ with $n'' = (m + m')(1 + n) + m''$. Because of the maximal character of $1 + n = e_u$, this is only possible if $m + m' = 1$ and $m'' = 0$; thus, finally, $f'' = e$ (or $= f'f$) and $g$ is a power of $f'f$. Since, trivially, any such word belongs to $M_{u,u}$, we have proved that the set of the words of degree at most $|\bar u|$ from $M_{u,u}$ is just the set $\{(f'f)^m\}$, $0 \le m < e_u$.

2°. If $|g| \ge |ff'|$, we can write $g = g''f'f$ and $g' = ff'g'''$ and, after simplification, we derive from the original equation the new relation $\bar u g'' = g''' \bar u$. Consequently the formula is also true in this case, and 1° and 2° cover all the possibilities except when $e_u = 1$.

3°. If $e_u = 1$ and [...] $< |g| < |ff'|$. Applying V.1 to the equation $ug = g'u$, we find that $g = ba$, $g' = ab$ and $u = (ab)^m a$; we have $m = 1$ since $|g| < |u|$ and $e_u = 1$ and, thus, $u = ff'f = aba$ with $|a| < |f|$ because $(f, f')$ is the principal pair; consequently, $f = aa' = a''a$ for some $a', a'' \in F^+$ and $b = a'f'a''$. Using once more V.1, it follows that $a = (cd)^{m'} c$, $a' = dc$, $a'' = cd$ and $f = (cd)^{1+m'} c$ for some $c, d \in F$. Finally, $g = dcf'cd(cd)^{m'}c = dcf'f$ with $dc \in M_{f,f}$, and this concludes the proof since $\bar u = f$ when $e_u = 1$.

Let us say that a word $u \in F^+$ belongs to $F_1$ if $M_{u,u} = (e - v^n)(e - v)^{-1}$ for some $v \in F^+$. It is an open question to find an exact expression for the probability $\alpha_n$ that a random word of degree $n$ belongs to $F_1$. However we have

V.4. The probability $\alpha_n$ tends to one when $n$ tends to infinity.

Proof. The statement is a straightforward consequence of the remark, to be proved below, that every word not in $F_1$ admits at least one factorisation of the form $abacaba$ with $a \in F^+$ and $b, c \in F$.

Indeed, by definition, $u \in F_1$ if $e_u = 0$ or if $e_u = 1$ and $e_f = 0$ since, in those cases, $M_{u,u} = e$ or $= e + f'f$; thus, when $e_u = 1$, $u \notin F_1$ only if $e_f > 0$, i.e. if for some $a \in F^+$, $b \in F$ and $c$ ($= f'$) $\in F$ we have $u = abacaba$.

Let us assume now that $e_u > 1$. It is a direct consequence of V.2 that, then, the equation $u = (ff')^{e_u} f$ uniquely determines $f$ and $f'$; thus, by V.3, $u \in F_1$ if $(f, f')$ is also the principal pair of $ff'f$ and if $e_f = 0$. The case where $e_f > 0$ has already been discussed and we consider the case where the principal pair $(g, g')$ of $ff'f$ is not $(f, f')$; by hypothesis $gg'g = ff'f$ and by definition $|g| > |f|$; thus, as in the proof of V.3, $f = (ab)^m a$ for some $a \in F^+$. Consequently, either $m > 0$ and then $e_f > 0$ or, else, we have $f = a$, $f' = bag'ab$ and, finally, $u = (abag'ab)^{e_u} a$. This concludes the proof.

Remark. A more accurate description can be given for the set $F_0 = \{u : e_u = 0\}$. Indeed, if $u \in F^+$ is such that $e_u > 0$, there exists at least one $f \in F^+$ which is such that $u = ff'f$ for some $f' \in F$; as in the last proof above, it is easily checked that the word $f$ defined in this manner is unique and, consequently, the proportion $\alpha'_n$ of the words $u$ with $e_u = 0$ among the words of degree $n$ satisfies the recurrence relations

$$\alpha'_{2n+1} = \alpha'_{2n}; \qquad \alpha'_{2n} = \alpha'_{2n-2} - (\operatorname{card} X)^{-n} \alpha'_n .$$

By standard techniques, it is easily deduced that $\lim_{n \to \infty} \alpha'_n > 0$.

References

(1) W. Feller, An Introduction to Probability Theory and Its Applications, Vol. 1, second ed., Chap. XIII, sect. 7 and 8.
(2) C. Chevalley, Fundamental Concepts of Algebra.
(3) F. W. Levy, Bull. Calcutta Math. Soc. 36 (1944), pp. 191-196.