On the Probability Distribution on a Compact Group. I.*

By Yukiyosi KAWADA and Kiyosi ITO.

(Read April 2, 1940.) [Vol. 22 (1940), 977-998.]

In the theory of probability, real or vectorial random variables have been mainly considered. But in order to study in detail the problems of "battage des cartes"(1), roulette, or dice, treated by Poincaré as Markoff processes, one must establish a probability theory on the substitution group of n letters, on the rotation group of the circle, or on the rotation group of the sphere. The case of the rotation group of the circle has recently been studied by P. Lévy(2). Our aim is to generalize some of his results, and to establish a probability theory on the compact separable topological group in general, so that we can treat the above cited problems of Poincaré as special cases.

We have actually two powerful methods in the probability theory: the method of characteristic functions, and that of the function of concentration of P. Lévy(3). In our general case also we can use these methods with some modifications. To use the first method, we have to replace the characteristic functions by "characteristic matrices" based on the Weyl-v. Neumann representation theory of compact groups. In this paper we treat chiefly the Markoff process with this method. In the appendix we give a method of constructing an invariant measure in a compact separable group by means of a Markoff process(4).

(*) Monbusyu-Kagakukenkyu: Tokyo-Teikokudaigaku, Rigakubu, Dai nigo Kenkyu.
(1) H. Poincaré, Calcul des probabilités, (1912), p. 301.
(2) P. Lévy, L'addition des variables aléatoires définies sur une circonférence, Bull. Soc. math. France, 67 (1939), 1-41. Cf. also H. Weyl, Ueber die Gleichverteilung von Zahlen mod. Eins, Math. Ann., 77 (1916), 313-352.

I. Characteristic matrix.

Let G be a compact separable topological group.
By a probability distribution p on G is meant a real function p(E) defined for all Borel sets E of G which satisfies the following conditions:

$p(E) \ge 0$, $p(G) = 1$, $p(\sum_n E_n) = \sum_n p(E_n)$ for mutually disjoint Borel sets $E_n$.

It is well known that for any Borel set E there exists a $G_\delta$-set F containing E such that $p(F) = p(E)$.

For a random variable x which takes values in G we can determine a probability distribution $p_x(E)$: that is, the probability of the event that x takes values in E. An important probability distribution p on G is the uniform distribution on G:

$p(E) = m_G(E)$,

where $m_G(E)$ is an invariant measure:

$m_G(sE) = m_G(Es) = m_G(E)$. (5)

By the theory of A. Haar and J. von Neumann(5) there always exists an invariant measure, and $m_G(E)$ is uniquely determined by (5) and the additional condition $m_G(G) = 1$.

(3) See P. Lévy, Théorie de l'addition des variables aléatoires, (1937).
(4) Cf. also S. Bochner, Average distribution of arbitrary masses under group transformations, Proc. Nat. Acad. U.S.A., 20 (1934), 206-210.

A Borel set E is said to be a continuity set of the probability distribution p when $p(\bar{E}) = p(E) = p(E^\circ)$ is verified, where $\bar{E}$ is the closure of E and $E^\circ$ the open kernel of E. It is a well known fact that if $p_1(E) = p_2(E)$ for every continuity set of both probability distributions $p_1$ and $p_2$, then we can conclude $p_1 = p_2$, namely $p_1(E) = p_2(E)$ for all Borel sets E. Similarly, a necessary and sufficient condition for $p_1 = p_2$ is that

$\int_G f(s)\, p_1(ds) = \int_G f(s)\, p_2(ds)$

is verified for any real continuous function f(s) on G. Two random variables $x_1$ and $x_2$ are said to be "equal in law" if $p_{x_1}(E) = p_{x_2}(E)$ for any Borel set E on G; therefore a necessary and sufficient condition for the equality in law of $x_1$ and $x_2$ is

$E(f(x_1)) = E(f(x_2))$

for any real continuous function f, where E means the expectation.

There are at most an enumerable infinite number of mutually inequivalent irreducible continuous unitary representations of the compact separable group G, as shown in the Weyl-v. Neumann theory of representations(6).
Let these be denoted by $D^{(r)}(s) = (d^{(r)}_{ij}(s))$ ($r = 0, 1, 2, \ldots$), and let the degree of $D^{(r)}$ be denoted by $n_r$. Especially $D^{(0)}$ shall always mean the principal representation of G: $D^{(0)}(s) = 1$. As $D^{(r)}(s)$ is a unitary matrix, the absolute values of the components are not greater than 1:

$|d^{(r)}_{ij}(s)| \le 1$, (7)

and since $D^{(r)}(s)$ is continuous on G we can define $D^{(r)}(p)$ as

$D^{(r)}(p) = \int_G D^{(r)}(s)\, p(ds)$, namely $d^{(r)}_{ij}(p) = \int_G d^{(r)}_{ij}(s)\, p(ds)$.

These matrices $D^{(r)}(p)$ ($r = 0, 1, 2, \ldots$) shall be called the characteristic matrices of the probability distribution p on G. Similarly the characteristic matrices $D^{(r)}(x)$ of a random variable x are defined by

$D^{(r)}(x) = D^{(r)}(p_x) = E(D^{(r)}(x))$.

(5) J. von Neumann, Zum Haarschen Mass in topologischen Gruppen, Comp. Math., 1 (1933), 106-114.
(6) J. von Neumann, Almost periodic functions on a group, Trans. Am. Math. Soc., 38 (1934), 445-492.

For the principal representation we have $D^{(0)}(p) = 1$ for any probability distribution p on G. By the inequality (7) we have also

$|d^{(r)}_{ij}(p)| \le 1$, (10)

and especially for the uniform distribution $p = m_G$ on G we can deduce from the orthogonality relation of the inequivalent irreducible representations

$D^{(r)}(m_G) = 0$ ($r = 1, 2, \ldots$). (11)

For a random variable $x_s$ which takes only a fixed point s on G as value, we have $D^{(r)}(x_s) = D^{(r)}(s)$, and for a random variable $x' = axb$ ($a, b \in G$) we have $D^{(r)}(x') = D^{(r)}(a)\, D^{(r)}(x)\, D^{(r)}(b)$.

Theorem 1. Two probability distributions $p_1$ and $p_2$ are equal if and only if

$D^{(r)}(p_1) = D^{(r)}(p_2)$ ($r = 0, 1, 2, \ldots$)

hold.

Proof. In virtue of the approximation theorem of Weyl(6) we can choose, for any given complex valued continuous function f(s) on G and for any positive number $\varepsilon$, a finite number of constants $c^{(r)}_{ij}$ such that

$|f(s) - \sum_{r,i,j} c^{(r)}_{ij} d^{(r)}_{ij}(s)| < \varepsilon$. (13)

Integrating on G, we have therefore

$|\int_G f(s)\, p(ds) - \sum_{r,i,j} c^{(r)}_{ij} d^{(r)}_{ij}(p)| < \varepsilon$ (14)

for any probability distribution p. Hence equality of the characteristic matrices of $p_1$ and $p_2$ gives $\int_G f\, dp_1 = \int_G f\, dp_2$ for every continuous f, that is, $p_1 = p_2$.

Now for two probability distributions $p_1$ and $p_2$ we define the product probability distribution (or convolution) $p = p_1 * p_2$ by the following expression:

$p(E) = \int_G p_1(Es^{-1})\, p_2(ds) = \int_G p_2(s^{-1}E)\, p_1(ds)$,

whereby the integrability results from the fact that both $p_1(Es^{-1})$ and $p_2(s^{-1}E)$ are Baire functions of s. Let $x_1$ and $x_2$ be two independent random variables which take values on G; then the probability distribution $p_{x_1 x_2}$ of the product variable $x_1 x_2$ is given clearly by $p_{x_1 x_2} = p_{x_1} * p_{x_2}$.

Theorem 2. For two probability distributions $p_1$ and $p_2$ on G, we have the following relation between the characteristic matrices of $p_1$, $p_2$ and $p_1 * p_2$:

$D^{(r)}(p_1 * p_2) = D^{(r)}(p_1)\, D^{(r)}(p_2)$.

Proof. From the definitions,

$D^{(r)}(p_1 * p_2) = \int_G \int_G D^{(r)}(st)\, p_1(ds)\, p_2(dt) = \int_G D^{(r)}(s)\, p_1(ds) \int_G D^{(r)}(t)\, p_2(dt)$.

By (11) and theorems 1, 2 we have

$p * m_G = m_G * p = m_G$

for any probability distribution p on G.

A probability distribution p on G is said to be stable, after P. Lévy(7), if $p * p = p$. Then we have

Theorem 3. A probability distribution p on G is stable when and only when p is a uniform distribution $m_H$ on a closed subgroup H.

We can prove this theorem directly with the help of the characteristic matrices, but we shall prove it later as a corollary of theorem 7 in §3.

(7) See P. Lévy, loc. cit., (2).

II. Convergence of probability distributions.

The sequence $\{p_n\}$ of probability distributions on G is said to be convergent to a probability distribution p on G, if for any Borel set E which is a continuity set of p holds

$\lim_{n\to\infty} p_n(E) = p(E)$.

The limit probability distribution is then uniquely determined by $\{p_n\}$. It is well known that this convergence can be also defined as follows: $p_n \to p$ if for any real (or complex) continuous function f(s) on G

$\lim_{n\to\infty} \int_G f(s)\, p_n(ds) = \int_G f(s)\, p(ds)$. (18)

As usual it is defined that a sequence of random variables $\{x_n\}$ is convergent in law (or convergent in the sense of Bernoulli) to a random variable x if $\{p_{x_n}\}$ converges to $p_x$.

Theorem 4. A necessary and sufficient condition for the convergence of the probability distributions $\{p_n\}$ to a probability distribution $p_0$ is

$\lim_{n\to\infty} D^{(r)}(p_n) = D^{(r)}(p_0)$ ($r = 0, 1, 2, \ldots$). (19)

Proof. We shall first prove that the condition (19) is sufficient. For any continuous function f(s) and for any positive number $\varepsilon$ there exist from (14) a finite number of constants $c^{(r)}_{ij}$ such that

$|\int_G f(s)\, p_n(ds) - \sum_{r,i,j} c^{(r)}_{ij} d^{(r)}_{ij}(p_n)| < \varepsilon$ ($n = 0, 1, 2, \ldots$). (20)

On the other hand, from the hypothesis (19) there exists an integer $n_0$ for which

$|\sum_{r,i,j} c^{(r)}_{ij} (d^{(r)}_{ij}(p_n) - d^{(r)}_{ij}(p_0))| < \varepsilon$ for $n \ge n_0$, (21)

where r runs over the finitely many indices with $c^{(r)}_{ij} \ne 0$. From (20), (21) we get

$|\int_G f(s)\, p_n(ds) - \int_G f(s)\, p_0(ds)| < 3\varepsilon$ for $n \ge n_0$,

and therefore, as $\varepsilon$ can be arbitrarily chosen,

$\lim_{n\to\infty} \int_G f(s)\, p_n(ds) = \int_G f(s)\, p_0(ds)$,

that is, $p_n \to p_0$, by (18). To prove the necessity, it suffices to observe that the $d^{(r)}_{ij}(s)$ are continuous functions of s, Q.E.D.

Theorem 5. Let $\{p_n\}$ be a sequence of probability distributions on G.
If the characteristic matrices $\{D^{(r)}(p_n)\}$ converge to a matrix $D^{(r)} = (d^{(r)}_{ij})$:

$\lim_{n\to\infty} D^{(r)}(p_n) = D^{(r)}$ ($r = 0, 1, 2, \ldots$), (23)

then the sequence $\{p_n\}$ converges to a probability distribution p, which has these matrices $D^{(r)}$ as characteristic matrices.

Proof. By (13) and the hypothesis (23) we can choose for any continuous function f(s) a finite number of constants $c^{(r)}_{ij}(m)$ such that [...]. Then the sequence $M_m = \sum c^{(r)}_{ij}(m)\, d^{(r)}_{ij}$ ($m = 1, 2, \ldots$) is a fundamental sequence. Let its limit be M(f). The correspondence $f(s) \to M(f)$ satisfies the following conditions: [...], where f(s) and g(s) are continuous functions on G, as is easily verified. By a theorem of F. Riesz(8), therefore, there exists a uniquely determined probability distribution p such that

$M(f) = \int_G f(s)\, p(ds)$.

From (24) and (25) we can conclude $p_n \to p$, and especially for $f(s) = d^{(r)}_{ij}(s)$ we have $D^{(r)} = \lim D^{(r)}(p_n) = D^{(r)}(p)$, Q.E.D.

Let $K_{n_r}$ be the space of all matrices of degree $n_r$ whose components are less than or equal to 1 in their absolute values, with the topology introduced as usual. We shall define R as the topological product of $\{K_{n_r}\}$. Then the $K_{n_r}$ are compact separable Hausdorff spaces; consequently this is also the case for R. Now in virtue of (10) we can define a corresponding point P(p) in R to any probability distribution p on G by

$P(p) = (D^{(0)}(p), D^{(1)}(p), D^{(2)}(p), \ldots)$.

This correspondence is one to one and continuous; and the subset of all the elements of the form P(p) of R is closed in R, as follows from theorems 1, 4, and 5. Therefore we have the following theorem:

Theorem 6. The space of all the probability distributions on G with the topology $p_n \to p$ as defined above is a compact separable Hausdorff space, embeddable in R(9).

(8) See, for example, S. Saks, Integration in abstract spaces, Duke Math. Jour., 4 (1938), 408-411.

Remark.
The definition of characteristic matrices is a modification of that of the characteristic function of the usual real distribution: namely, we use the continuous irreducible representations of G instead of the character function $e^{iux}$ of the additive group of real numbers {x} in the usual case. Theorems 1, 2, 5 are quite parallel to the usual distributions; they correspond to theorems 9, 13, 11 in the work of H. Cramér, "Random variables and probability distributions," (1937)(10).

III. Markoff process (homogeneous case).

Now we consider a simple Markoff process on G. Let $P_n(s, E)$ denote the transition probability that an element $s \in G$ may be transferred into a Borel set $E \subset G$ by the n-th operation of the process.

(9) Cf. N. Kryloff and N. Bogoliouboff, La théorie générale de la mesure dans son application à l'étude des systèmes dynamiques de la mécanique non linéaire, Annals of Math., 38 (1937), 65-113.
(10) When G is a finite group of order g, the characteristic matrices are more closely related to the probability distribution. Let $s \to R(s) = (\delta_{u, vs^{-1}})_{u,v}$ ($\delta_{s,t} = 1$ when $s = t$, and $= 0$ when $s \ne t$) be the right regular representation of G, and $s \to D^{(r)}(s)$ ($r = 0, 1, \ldots, h-1$) be the irreducible representations of G. If we reduce the regular representation R(s) into irreducible representations with a non-singular matrix T [...], then $R(p) = \int_G R(s)\, p(ds) = \sum_s p(s) R(s) = (p(u^{-1}v))_{u,v}$. Therefore, if we know $D^{(0)}(p), \ldots, D^{(h-1)}(p)$, or R(p), the probability p(s) is given as the components of R(p). Instead of the regular representation R(s), the representation $s \to D(s) = R(s) - U$ of G is often used in the literature, where U is defined by $g^{-1} \sum_s R(s)$, namely the matrix whose components are all equal to $g^{-1}$. Then [...]; therefore D(p) is the zero matrix if and only if p is a uniform distribution. See T. Uno and Y. Hasimoto, Sur le problème du battage des cartes, this Proc., Ser. 3, 17 (1935), 224-233.
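The finite-group case described in footnote (10) can be checked numerically. The following Python sketch is an illustration only and is not taken from the paper: it assumes G is the symmetric group on three letters, stores a distribution as a dictionary of exact fractions, and uses hypothetical helper names (`mul`, `inv`, `convolve`, `regular_matrix`, `make_distribution`). Under these assumptions it confirms that the (u, v) entry of R(p) is p(u^{-1}v), that p can be read off from the row of the unit element, and that R(p1*p2) = R(p1)R(p2), the finite analogue of theorem 2.

```python
from fractions import Fraction
from itertools import permutations

# Assumption: G = S_3, elements stored as tuples; G[0] is the unit element.
G = list(permutations(range(3)))


def mul(a, b):
    """Group product a*b of two permutations (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(len(a)))


def inv(a):
    """Inverse permutation of a."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)


def convolve(p1, p2):
    """(p1*p2)(g) = sum_a p1(a) p2(a^{-1} g): law of x1*x2 for independent x1, x2."""
    return {g: sum(p1[a] * p2[mul(inv(a), g)] for a in G) for g in G}


def regular_matrix(p):
    """R(p) = sum_s p(s) R(s); its (u, v) entry equals p(u^{-1} v)."""
    return [[p[mul(inv(u), v)] for v in G] for u in G]


def matmul(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def make_distribution(weights):
    """Normalize non-negative integer weights (one per element of G)."""
    total = sum(weights)
    return {g: Fraction(w, total) for g, w in zip(G, weights)}


p1 = make_distribution([1, 2, 3, 4, 5, 6])
p2 = make_distribution([3, 0, 1, 0, 2, 1])
uniform = make_distribution([1] * len(G))

# Finite analogue of theorem 2: R(p1*p2) = R(p1) R(p2).
product_ok = (regular_matrix(convolve(p1, p2))
              == matmul(regular_matrix(p1), regular_matrix(p2)))

# The row of the unit element lists p(v) itself, so p is recovered from R(p).
recovered = dict(zip(G, regular_matrix(p1)[0]))
```

Exact fractions are used instead of floating-point numbers so that the matrix identities can be compared with `==`; the same check with floats would need a tolerance.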
If we apply to the unit element e of G the operations of the simple Markoff process, we obtain a sequence of random variables $s_1, s_2, s_3, \ldots$ on G. The transition probability $P_n(s, E)$ is equal to the conditional probability $P_{s_{n-1}=s}(s_n \in E)$, and the n-th operation corresponds to the multiplication by the random variable $s_{n-1}^{-1} s_n$. We will denote this variable by $x_n$.

An important special case arises when the n-th operation ($x_n$) is independent of the last result ($s_{n-1}$) for any n: i.e. $x_n$ is independent of $s_{n-1} = x_1 x_2 \cdots x_{n-1}$. Such a special process is characterized by the property(11) that the transition probabilities are represented by probability distributions on G as follows:

$P_n(s, E) = p_n(s^{-1}E)$,

where $p_n$ ($n = 1, 2, \ldots$) is the probability distribution of $x_n$ on G. Let $p^{(n)}$ be the probability distribution of $s_n$; then we have

$p^{(n)} = p_1 * p_2 * \cdots * p_n$, $P^{(n)}(s, E) = p^{(n)}(s^{-1}E)$,

where $P^{(n)}(s, E)$ is the transition probability that s may be transferred into E after the first n operations.

Another special case is the one where the Markoff process is temporally homogeneous, namely $P_n(s, E)$ is independent of n.

A famous example of a simple homogeneous Markoff process with the property (G) is the problem of "battage des cartes" treated by H. Poincaré. In this case G is the symmetric group of m letters. Let the permutations of m cards be denoted by $\sigma_1, \ldots, \sigma_N$ ($N = m!$). We denote with the same notation $\sigma_i$ also the substitution which carries $\sigma_1$ into $\sigma_i$, where $\sigma_1 = (1, 2, \ldots, m)$. We assume that a man has a certain habit in cutting the cards that may be represented as a probability distribution on G, namely: let $p(\sigma_i)$ be the probability with which he effects the substitution $\sigma_i$ by cutting the cards once. Then the transition probability $p_{ij}$ that the permutation $\sigma_i$ may be transferred into $\sigma_j$ after cutting the cards once is given by

$p_{ij} = p(\sigma_i^{-1} \sigma_j)$.

(11) This property will be called (G) in this paper.

In the following we treat a simple temporally homogeneous Markoff process with the property (G)(11) on a general compact separable group G(12). It is, therefore, sufficient only to consider the probability distribution $p^{(n)}$, which is merely the n-th power of p:

$p^{(n)} = p * p * \cdots * p$ (n factors).

As usual we define the spectrum S(p) of a probability distribution p as the aggregate of all the points s of G for which the probability p(E) of any open set E containing s is positive: $p(E) > 0$. It can be easily verified that S(p) is a closed set on G. Let H be the closed subgroup generated by S(p). If a closed subgroup $H_0$ has the property that $p(H_0) = 1$, then clearly $H_0 \supseteq H$ holds: H is the smallest closed subgroup with this property.

A probability distribution p(E) on G is called proper in G if the closed subgroup H generated by the spectrum S(p) is equal to G. Generally H is not equal to G, but [...] and [...]. Therefore there is no loss of generality if we consider p(E) only within H instead of within G or, what is the same thing, if we consider only the probability distribution which is proper in G.

(12) [...] and similarly [...] must be equal to $m_G(E)$ from the unicity of the invariant measure of G.

We have the following mean ergodic theorem:

Theorem 7. Let p be a probability distribution on G which is proper in G. Then for any continuity set E of the uniform distribution $m_G$ holds

$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)}(E) = m_G(E)$.

More precisely we have

Theorem 8. Let p be proper in G.
1) If the spectrum S(p) is contained in a residue class $s_0H$ or $Hs_0$ ($s_0 \notin H$, $H \ne G$) of a closed subgroup H, then H is an invariant subgroup of G and G is the smallest closed subgroup containing $s_0H$. Therefore G/H is abelian. For any continuity set $E' \subset H$ of the uniform distribution $m_H$ on H holds

$\lim_{n\to\infty} p^{(n)}(s_0^n E') = m_H(E')$. (36)

2) If the spectrum S(p) is not contained in any residue class of any closed subgroup H, then

$\lim_{n\to\infty} p^{(n)}(E) = m_G(E)$ (37)

for every continuity set E of $m_G$.

Proof of theorem 8. To prove (37) it is sufficient by theorem 4 and (11) to show that

$\lim_{n\to\infty} (D^{(r)}(p))^n = 0$ ($r = 1, 2, \ldots$),

i.e. that any characteristic root $\lambda^{(r)}$ of the matrices $D^{(r)}(p)$ ($r = 1, 2, \ldots$) is smaller than 1 in its absolute value:

$|\lambda^{(r)}| < 1$.

Let $y^{(r)}$ be a characteristic vector for $\lambda^{(r)}$ with unit length; then $D^{(r)}(p)\, y^{(r)} = \lambda^{(r)} y^{(r)}$. Let the coordinate vectors be denoted by $x^{(r)}_i$. For the unitary transformation $U_r$ which transforms $x^{(r)}_1$ into $y^{(r)}$, the matrices $\bar{D}^{(r)}(s) = U_r^{-1} D^{(r)}(s) U_r$ form a unitary irreducible continuous representation, and $\bar{D}^{(r)}(p) = U_r^{-1} D^{(r)}(p) U_r$ is its characteristic matrix. The coordinate vector $x^{(r)}_1$ is then a characteristic vector of $\bar{D}^{(r)}(p)$ for $\lambda^{(r)}$. Therefore the first column of the matrix $\bar{D}^{(r)}(p)$ must be $(\lambda^{(r)}, 0, \ldots, 0)$, namely

$\lambda^{(r)} = \int_G \bar{D}^{(r)}_{11}(s)\, p(ds)$. (40)

$\bar{D}^{(r)}(s)$ being a unitary representation, we have

$|\bar{D}^{(r)}_{11}(s)| \le 1$. (41)

If $\lambda^{(r)}$ were equal to 1 for some r, then by (40) and (41) the spectrum S(p) should be contained in the aggregate H of all elements {s} of G for which $\bar{D}^{(r)}_{11}(s) = 1$. As $\bar{D}^{(r)}(s)$ is a unitary matrix, H should be the set of all the elements {s} of G for which $U_r^{-1} D^{(r)}(s) U_r$ has first row and first column $(1, 0, \ldots, 0)$. Therefore H should be a closed proper subgroup of G, which contradicts our hypothesis that p is proper in G; so we must have

$\lambda^{(r)} \ne 1$. (42)

We shall prove next that if there is no closed subgroup H mentioned in 1) of theorem 8, then we have

$|\lambda^{(r)}| < 1$. (43)

If $\lambda^{(r)}$ were equal to $e^{i\theta}$ ($\theta$ a real number), then by (40) and (41) the spectrum S(p) should be contained in the aggregate F of all the elements {s} with $\bar{D}^{(r)}_{11}(s) = e^{i\theta}$; in other words F is the set of all the elements for which $\bar{D}^{(r)}(s)$ has first row and first column $(e^{i\theta}, 0, \ldots, 0)$. Let $s_0$ be any element of F; then [...] holds. The closed subgroup H ($\ne G$) could then be defined as the aggregate of all the elements {s} for which $\bar{D}^{(r)}_{11}(s) = 1$, and we would have $S(p) \subset F = s_0H$; this contradicts our assumption. Thus by (42) and (43) theorem 8, 2) is proved.

In the general case let $e^{i\theta^{(r)}_1}, \ldots, e^{i\theta^{(r)}_{m_r}}$ ($\theta^{(r)}_j \not\equiv 0 \pmod{2\pi}$) be the characteristic roots of absolute value 1 of $D^{(r)}(p)$ ($r = 1, 2, \ldots$); then we can choose unitary matrices $U_r$ ($r = 1, 2, \ldots$) such that

$U_r^{-1} D^{(r)}(p) U_r = \mathrm{diag}(e^{i\theta^{(r)}_1}, \ldots, e^{i\theta^{(r)}_{m_r}}) \oplus A^{(r)}$, (44)

where the square matrix $A^{(r)}$ has no characteristic root with absolute value 1.
Let F (or H) be the aggregate of all the elements {s} for which $\bar{D}^{(r)}_{jj}(s) = e^{i\theta^{(r)}_j}$ (or $= 1$) ($j = 1, \ldots, m_r$; $r = 1, 2, \ldots$). Then H is a closed subgroup of G and for any element $s_0 \in F$

$F = s_0H = Hs_0$,

and, just as in the former case, we can prove that the spectrum S(p) is contained in F. The closed subgroup generated by $s_0H = F \supseteq S(p)$ must be equal to G by our assumption that p is proper in G. Therefore H must be an invariant subgroup and the factor-group G/H abelian. Now let p'(E) be defined by $p'(E) = p(s_0E)$; then the spectrum S(p') is contained in H and [...]. (45)

As $\bar{D}^{(r)}(s)$ is a unitary matrix we can easily see from (44) that [...]. (*)

We shall prove next that the matrices (*) ($r = 1, 2, \ldots$) are characteristic matrices of the uniform distribution $m_H$ on H. Decomposing the representations $\bar{D}^{(r)}(s)$, considered as representations of H, into the continuous irreducible representations of H, we obtain [...] (46), where there is no principal representation among $D_H^{(r_1)}(s), \ldots, D_H^{(r_{q_r})}(s)$; for if one of them were the principal representation of H, then [...]. This contradicts (45). Thus from [...] follows that the characteristic matrices $\bar{D}^{(r)}(m_H)$ are of the form (*) ($r = 1, 2, \ldots$). By (45) we have proved, therefore, [...] for the continuity set E of $m_H$ in G. But by theorem 4 it is sufficient for the proof of (36) for the continuity set $E \subset H$ of $m_H$ in H to observe that all the continuous irreducible representations of the closed subgroup H of the compact group G are found in the decomposition of the continuous irreducible representations of G in the form (46)(13). Therefore the proof of theorem 8 is completed.

Proof of theorem 7. It is sufficient to show that

$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} (D^{(r)}(p))^k = 0$ ($r = 1, 2, \ldots$).

By (44) we have [...].

Proof of theorem 3. Let H be the closed subgroup generated by the spectrum S(p) of the probability distribution p. Then by theorem 7 we have

$p = \lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} p^{(k)} = m_H$.

Corollary. In the problem of "battage des cartes" we distinguish the three following sorts of habit.
1) Starting from a permutation, there is at least one permutation which cannot be obtained however often we cut the cards, namely the spectrum of the corresponding probability distribution p is contained in a proper subgroup of G.
2) All the possible substitutions by cutting the cards once are odd substitutions, namely the spectrum of p consists merely of odd substitutions.
3) The other case.
Only in the third case we have the uniform distribution $m_G$ as the limit probability distribution, but not in cases 1) and 2).

(13) This group-theoretical theorem can be proved easily by means of the concept of modul of representations of a compact group used by E. R. van Kampen, Almost periodic functions and compact groups, Ann. of Math., 37 (1936), 78-91. The irreducible representations of H which are given by decomposing irreducible continuous representations of G form clearly a modul of representations of H; that is, conjugate complex representations of such representations, or the representations obtained by decomposing direct products of such representations, take also part in this modul. On the other hand, for any two elements s, t in G there is at least one irreducible representation $D_G$ of G for which $D_G(s) \ne D_G(t)$. Therefore for any two elements s, t in H we can find a representation $D_H$ of this modul for which $D_H(s) \ne D_H(t)$. From these two properties we can conclude that the modul contains all the irreducible continuous representations of H (see the corollary at p. 82 in the paper of E. R. van Kampen).

For any continuous function f(s) on G we define $f^{(n)}(s)$ by

$f^{(n)}(s) = \int_G f(st)\, p^{(n)}(dt)$;

then from theorem 7 follows

$\lim_{n\to\infty} \frac{1}{n} \sum_{k=1}^{n} f^{(k)}(s) = \int_G f(t)\, m_G(dt)$. (50)

But we have a stronger result:

Theorem 9. If the probability distribution p is proper in G, then (50) holds uniformly for s in G. Especially in the case 2) of theorem 8 we have

$\lim_{n\to\infty} f^{(n)}(s) = \int_G f(t)\, m_G(dt)$ (51)

uniformly for s in G.

Proof. We have seen the convergence of (50) and (51), so that it remains only to prove its uniformity.
We shall prove it only for (51) in the case 2) of theorem 8, the general case being proved analogously. For any continuous function f(s) we have by (13) a finite number of constants $c^{(r)}_{ij}$ such that

$|f(s) - \sum_{r,i,j} c^{(r)}_{ij} d^{(r)}_{ij}(s)| < \varepsilon$. (52)

Let Max [...] be denoted by C. Then from (52) and $D^{(r)}(st) = D^{(r)}(s) D^{(r)}(t)$ follows [...], where $n_0$ can be chosen independently of s; this shows the uniformity of the convergence of (51), Q.E.D.

IV. Non homogeneous case.

We will treat next a Markoff process with the property (G), where the homogeneity is not assumed.

Theorem 10. Consider a simple Markoff process with the property (G). If there is a positive number $\delta > 0$ independent of E such that

$p_n(E) \ge \delta\, m_G(E)$ ($n = 1, 2, \ldots$)

for any Borel set E, then we have

$\lim_{n\to\infty} p^{(n)}(E) = m_G(E)$

uniformly for all Borel sets E(14).

Proof. Without loss of generality we can assume that $\delta < 1$. Let $p_n'(E)$ be given by $p_n'(E) = p_n(E) - m_G(E)$; then $p_n' * m_G = (p_n - m_G) * m_G = p_n * m_G - m_G = 0$ and similarly $m_G * p_n' = 0$. Henceforth follows by mathematical induction

$p^{(n)} - m_G = p_1' * p_2' * \cdots * p_n'$.

If we put $p_n = \delta\, m_G + (1-\delta)\, q_n$, then we have $q_n(E) \ge 0$ and

$p^{(n)}(E) \ge (1 - (1-\delta)^n)\, m_G(E)$. (60)

Taking G-E for E in (60) we have $1 - p^{(n)}(E) \ge (1 - (1-\delta)^n)(1 - m_G(E))$, namely

$p^{(n)}(E) \le m_G(E) + (1-\delta)^n (1 - m_G(E))$. (61)

From (60) and (61) it results that $\lim_{n\to\infty} p^{(n)}(E) = m_G(E)$ uniformly for all Borel sets E, Q.E.D.

(14) Cf. T. Uno and Y. Hasimoto, loc. cit. (10).

We define now two or more random variables $x_1, \ldots, x_n$ as mutually commutative in law, if

$D^{(r)}(x_{i_1} \cdots x_{i_n}) = D^{(r)}(x_1 \cdots x_n)$

holds for any permutation $(i_1, \ldots, i_n)$ of $(1, \ldots, n)$, namely [...].

Theorem 11. [...] If

$p_{x_n}(E) \le 1 - \delta$ (62)

holds for any Borel set E with $m_G(E) < \delta$, then [...].

As special cases of this theorem we can replace (62) by the condition that the $p_{x_n}$ are absolutely continuous with respect to $m_G$ uniformly in n, or by a stronger condition that [...].

Proof. By theorem 6 we can choose a subsequence $\{x_{n_r}\}$ so that $p_{x_{n_r}} \to p$ ($r = 1, 2, 3, \ldots$) holds. Then we shall first show that

$p(E) \le 1 - \delta$ (63)

for any Borel set E with $m_G(E) < \delta/2$. If E is a continuity set of p with $m_G(E) < \delta$, then $p(E) = \lim_{r\to\infty} p_{x_{n_r}}(E) \le 1 - \delta$. If E is any open set with $m_G(E) < \delta$, then we can represent E as a union of open continuity sets $E_n$: $E = \sum E_n$, $E_n \subset E_{n+1}$ ($n = 1, 2, 3, \ldots$); therefore $p(E) = \lim_{n\to\infty} p(E_n) \le 1 - \delta$. If E is any Borel set with $m_G(E) < \delta/2$, we can choose an open set O ($O \supseteq E$) so that $m_G(O - E) < \delta/2$. Then we have $p(E) \le p(O) \le 1 - \delta$, as $m_G(O) < \delta$. Thus (63) is in any case proved.

From (63) we can conclude that the spectrum S(p) of p generates the whole group G and that the condition 1) in theorem 8 cannot hold, because any proper subgroup H (and any residue class) has its measure 0: [...]. Therefore from theorem 8 we have $\lim_{n\to\infty} p^{*n} = m_G$. From $p_{x_{n_r}} \to p$ follows consequently, for a suitable subsequence $r_m$, [...], or [...]. (65) From (65), (10) and the hypothesis that the $x_n$ are mutually commutative in law we conclude that [...], Q.E.D.

Appendix.

We have seen by theorem 9 how we can give concretely the mean value $M(f) = \int_G f(s)\, m_G(ds)$ for any continuous function f(s) on a compact separable group by a Markoff process: let p be a probability distribution (namely a mass distribution with total mass 1) with the condition 2) in theorem 8; then

$\lim_{n\to\infty} f^{(n)}(s) = M(f)$ (66)

uniformly in s, where $p^{(n)}$ is defined recurrently by

$p^{(n)} = p^{(n-1)} * p$.

The condition 2) of theorem 8 is satisfied when p has the spectrum S(p) = G. For example, let $\{s_n\}$ ($n = 1, 2, 3, \ldots$) be an enumerable set everywhere dense on G, and let us distribute the mass $1/2^n$ on the point $s_n$. Then the so defined distribution p:

$p(E) = \sum_{s_n \in E} 2^{-n}$ (68)

satisfies clearly the condition S(p) = G.

Our proof of theorem 8 is based on the existence of a uniform distribution $m_G$ on G. But conversely we can prove the uniform convergence of (66) directly in the case of the probability distribution p of (68), whence we can prove the existence of $m_G$ by the method used by J. von Neumann(15).

Theorem 12. Let p(E) be the probability distribution on a compact separable group with S(p) = G. Then the sequence

$f^{(n)}(s) = \int_G f(st)\, p^{(n)}(dt)$ (69)

for a continuous function f(s) on G converges uniformly to a constant M(f): [...].

Proof. The uniform convergence of (69) is proved quite similarly as in a note of S. Iyanaga and K. Kodaira(16).

(15) J. von Neumann, loc. cit. (5).
(16) S. Iyanaga and K. Kodaira, On the Theory of Almost Periodic Functions in a Group, Proc. Imp. Acad. Tokyo, 16 (1940), 136-140.

Let $M_n$, $m_n$ and $O_n$ be defined by

$M_n = \max_s f^{(n)}(s)$, $m_n = \min_s f^{(n)}(s)$, $O_n = M_n - m_n$.

Then $m_{n-1} \le m_n \le M_n \le M_{n-1}$ by the definition of $f^{(n)}(s)$. Therefore it is sufficient to prove $\lim_{n\to\infty} O_n = 0$ for the proof of (69). Let $\rho$ be an invariant metric on G: $\rho(s, t) = \rho(sa, ta) = \rho(as, at)$; then we can find for any assigned positive number $\varepsilon$ a positive number $\delta$ so that $\rho(s, t) < \delta$ implies $|f(s) - f(t)| < \varepsilon/2$, from the uniform continuity of f(s) on G. We can see then [...]. (70)

We can divide G into a finite number of mutually disjoint Borel sets $E_i$: [...], where $E_i$ has its diameter smaller than $\delta/2$ and has at least an inner point. Then from S(p) = G we have [...]. For any given pair of points s, t of G we choose $E_j$ by [...] so that [...]. (72) But by (70), (72) and the definition of $\{E_i\}$ the oscillation of $f^{(l-1)}(u)$ in $sE_1 \cup tE_j$ is less than $\varepsilon/2$; therefore we have [...]. If we choose $n_0$ so that $(1-\eta)^{n_0} O_0 < \varepsilon/2$, then $O_n < \varepsilon$ for $n > n_0$, namely $\lim_{n\to\infty} O_n = 0$ is proved. The conditions 1)-5) are direct consequences from our definition of M(f), Q.E.D.

For an almost periodic function f(s) on a group G we can quite analogously construct the mean value of f(s) by means of the metric introduced in G: [...].

Mathematical Institute, Tokyo Imperial University.

(Received Oct. 24, 1940.)
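The shrinking oscillation $O_n$ used in the appendix can be observed on a finite group. The following Python sketch is an illustration under stated assumptions, not the authors' construction: G is taken to be the symmetric group on three letters, p gives positive mass to every element (so that S(p) = G), the recursion is taken in the form f^{(n)}(s) = sum over t of f^{(n-1)}(st) p(t), and the names `average_step`, `oscillation` and `delta` are hypothetical. With delta = |G| * min p one has p(E) >= delta * m_G(E), which in this finite setting gives O_n <= (1 - delta)^n O_0, and the common limit is the uniform mean of f.

```python
from fractions import Fraction
from itertools import permutations

# Assumption: G = S_3, elements stored as tuples of images.
G = list(permutations(range(3)))


def mul(a, b):
    """Group product a*b of two permutations (apply b first, then a)."""
    return tuple(a[b[i]] for i in range(len(a)))


def average_step(f, p):
    """One averaging step: returns f' with f'(s) = sum_t f(s*t) p(t)."""
    return {s: sum(f[mul(s, t)] * p[t] for t in G) for s in G}


def oscillation(f):
    """O = max f - min f over the group."""
    return max(f.values()) - min(f.values())


# A non-uniform distribution charging every element, so that S(p) = G.
weights = [1, 2, 3, 4, 5, 6]
total = sum(weights)
p = {g: Fraction(w, total) for g, w in zip(G, weights)}

# An arbitrary test function on G.
f0 = {g: Fraction(k * k) for k, g in enumerate(G)}

delta = len(G) * min(p.values())      # p(E) >= delta * m_G(E) for every E
mean = sum(f0.values()) / len(G)      # mean of f under the uniform distribution

history = [f0]
for _ in range(25):
    history.append(average_step(history[-1], p))
oscillations = [oscillation(f) for f in history]
```

Each step preserves the uniform mean of f, so the value of `mean` always lies between the minimum and maximum of the current iterate; this is why a vanishing oscillation forces convergence to that mean.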