Quaderni di Statistica, Vol. 2, 2000

On parametric models for invariant probability measures

Patrizia Berti, Dipartimento di Matematica Pura ed Applicata "G. Vitali", Università di Modena
Sandra Fortini, Istituto di Metodi Quantitativi, Università "L. Bocconi"
Lucia Ladelli, Dipartimento di Matematica, Politecnico di Milano
Eugenio Regazzini, Dipartimento di Matematica "F. Casorati", Università di Pavia
Pietro Rigo, Dipartimento di Statistica "G. Parenti", Università di Firenze

Summary: Let $\tilde\xi=(\tilde\xi_1,\tilde\xi_2,\ldots)$ be a sequence of random variables with values in a Polish space $X$. If $\tilde\xi$ is exchangeable (stationary), then $\tilde\xi$ is i.i.d. (ergodic) conditionally on some random probability measure $\tilde p$ on $X$ ($\tilde q$ on $X^\infty$). In the exchangeable case, two necessary and sufficient conditions are given for $\tilde\xi$ to be i.i.d. conditionally on $\tilde t(\tilde p)$, where $\tilde t$ is a given transformation. Essentially the same conditions apply when $\tilde\xi$ is stationary, except that "i.i.d." is replaced by "ergodic" and $\tilde p$ by $\tilde q$. The basic difference between the exchangeable and the stationary case lies in the empirical measure to be used. Up to a proper choice of the latter, the two conditions work in a large class of invariant distributions for $\tilde\xi$. Finally, the set of ergodic probability measures is shown to be a Borel set, and almost sure weak convergence of a certain sequence of empirical measures is proved in the stationary case.

Keywords: Empirical distribution, Exchangeability, Invariant probability measure, Predictive distribution, Predictive sufficient statistic, Stationarity.

1. Introduction

Let $\tilde\xi=(\tilde\xi_1,\tilde\xi_2,\ldots)$ be a sequence of random variables with values in a Polish space $X$. Suppose $\tilde\xi$ is exchangeable or, equivalently, $\tilde\xi$ is independent and identically distributed (i.i.d.) conditionally on some random probability measure $\tilde p$ on the Borel $\sigma$-field on $X$, $\mathcal B(X)$.
In Theorem (7.2) of Fortini, Ladelli and Regazzini (2000), referred to as FLR in the sequel, conditions are given for the sequence $\tilde\xi$ to be i.i.d. conditionally on $\tilde t(\tilde p)$, where $\tilde p$ is the random probability measure of de Finetti's theorem and $\tilde t$ is a given transformation. In fact, the hypotheses of Theorem (7.2) concern sequences of predictive sufficient statistics, and are conceived to have some statistical content within a Bayesian predictive framework. As far as the sole representation part is concerned (i.e., finding conditions under which $\tilde\xi$ is i.i.d. given $\tilde t(\tilde p)$, where $\tilde t$ is an assigned transformation), the hypotheses of Theorem (7.2) are sufficient but not necessary. The aim of this paper is to take up the representation part again and to provide necessary and sufficient conditions. From a statistical point of view, such conditions are useful for investigating the existence of underlying parametric models for the distribution of $\tilde\xi$.

In particular, two conditions are given. The first one (denoted as condition (b)) is of the abstract type. Nevertheless, by using it, various other sufficient conditions are easily obtained, including the one proposed in FLR. Moreover, it allows a shorter and more direct proof of Theorem (7.2). The second condition (condition (c)) applies to the special case where $\tilde t$ is continuous (an assumption also made in Theorem (7.2)). In this case, however, it has some statistical meaning.

The problem studied in this paper is different from the one tackled in Olshen (1974). In the latter, among other things, an exchangeable sequence $\tilde\xi$ is shown to be i.i.d. conditionally on some real random variable. Such a result, clearly, is based on the Borel isomorphism theorem. In our case, instead, we investigate whether $\tilde\xi$ is i.i.d. conditionally on $\tilde t(\tilde p)$, where $\tilde t$ is a given transformation (possibly suggested by a sufficient statistic) and $\tilde p$ is the almost sure weak limit of the empirical measures.
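The setting just described can be made concrete with a small simulation. The sketch below is illustrative only and is not taken from the paper: it assumes $X=\{0,1\}$, a uniform mixing measure for the random probability measure, and the transformation "mass given to $\{1\}$"; the names `sample_exchangeable`, `empirical_measure` and `t` are hypothetical. It draws an exchangeable sequence as a mixture of i.i.d. sequences and compares the statistic computed on the empirical measure with the same statistic computed on the random probability measure.

```python
import random
from collections import Counter


def sample_exchangeable(n, rng):
    """Draw a random p.m. on {0, 1} first, then n i.i.d. observations from it.

    Assumption for illustration: the mixing measure is uniform on (0, 1).
    """
    p = rng.random()  # mass that the random p.m. gives to {1}
    xs = [1 if rng.random() < p else 0 for _ in range(n)]
    return p, xs


def empirical_measure(xs):
    """Empirical measure of the data: mass 1/n on each observation."""
    n = len(xs)
    return {value: count / n for value, count in Counter(xs).items()}


def t(measure):
    """Illustrative transformation: the mass the measure gives to {1}."""
    return measure.get(1, 0.0)


rng = random.Random(0)
p, xs = sample_exchangeable(20000, rng)
e_n = empirical_measure(xs)
# t(e_n) should be close to p, the value of t at the random p.m.
print(p, t(e_n))
```

On a two-point space this transformation is injective on the whole set of probability measures, so the random probability measure can be recovered from its image; in larger spaces injectivity is exactly what has to be checked.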
A further, minor goal of this paper is to start, in a particular case, an analysis of Bayesian nonparametric problems for non-exchangeable data. Suppose $\tilde\xi$ is stationary or, equivalently, $\tilde\xi$ is ergodic conditionally on some random probability measure $\tilde q$ on $\mathcal B(X^\infty)$. Then, conditions (b) and (c) still work, except that "i.i.d." is replaced by "ergodic" and $\tilde p$ by $\tilde q$. In this framework, incidentally, the set of ergodic probability measures is shown to be a Borel set, and almost sure weak convergence of a certain sequence of empirical measures is proved. The basic difference between the exchangeable and the stationary case lies in the empirical measure to be used. Up to a proper choice of the latter, conditions (b) and (c) work for a large class of invariant distributions for $\tilde\xi$.

The paper is organized as follows. Section 2 is devoted to preliminaries and notation, Section 3 includes the statement of Theorem (7.2), while Sections 4 and 5 contain the main results, in the exchangeable and stationary case, respectively.

2. Preliminaries and notation

The basic notation is that of FLR, with some slight adaptations in view of Section 5. Let $S$ be any set. Then, $S^\infty$ is the space of all sequences of elements of $S$, and $\tilde\xi_n$ is the $n$-th coordinate map on $S^\infty$, that is,

$$\tilde\xi_n(s)=s_n\quad\text{for all } s=(s_1,s_2,\ldots)\in S^\infty \text{ and } n\ge 1.$$

When $S$ is a topological space, $\mathcal B(S)$ denotes the Borel $\sigma$-field on $S$, and $S^\infty$ is equipped with the product topology. By a Polish space we mean a separable topological space whose topology is induced by a complete metric. If $S$ is Polish, then $S^\infty$ is Polish, too, and $\mathcal B(S^\infty)=\mathcal B(S)^\infty$, the product $\sigma$-field. Given a $\sigma$-field $\mathcal E$ on $S$, let $\mathbb P(\mathcal E)$ be the set of probability measures (p.m.'s) on $\mathcal E$, and let $\Sigma(\mathcal E)$ be the $\sigma$-field on $\mathbb P(\mathcal E)$ generated by the sets $\{p\in\mathbb P(\mathcal E): p(A)\in B\}$, for $A\in\mathcal E$ and $B\in\mathcal B(\mathbb R)$. Any measurable function $\alpha:(\Omega,\mathcal A)\to(\mathbb P(\mathcal E),\Sigma(\mathcal E))$, where $(\Omega,\mathcal A)$ is any measurable space, is called a random p.m.
Thus, a function $\alpha$ on $\Omega\times\mathcal E$ is a random p.m. whenever $\alpha(\omega,\cdot)$ is a p.m. on $\mathcal E$ for $\omega\in\Omega$, and $\alpha(\cdot,A)$ is $\mathcal A$-measurable for $A\in\mathcal E$.

We will consider a sequence of observable random elements taking values in a Polish space $X$. For our purposes, it is convenient to work in the coordinate space $(X^\infty,\mathcal B(X^\infty))$ and to identify the sequence of observables with $\tilde\xi=(\tilde\xi_1,\tilde\xi_2,\ldots)$. Moreover, we let $\mathbb P=\mathbb P(\mathcal B(X))$ and $\mathbb Q=\mathbb P(\mathcal B(X^\infty))$, i.e., $\mathbb P$ and $\mathbb Q$ denote the sets of p.m.'s on $\mathcal B(X)$ and on $\mathcal B(X^\infty)$, respectively. Both $\mathbb P$ and $\mathbb Q$ are equipped with the topology of weak convergence of p.m.'s. Since $X$ is Polish, $\mathbb P$ and $\mathbb Q$ are Polish, too, and $\mathcal B(\mathbb P)$ and $\mathcal B(\mathbb Q)$ coincide with the $\sigma$-fields $\Sigma(\mathcal B(X))$ and $\Sigma(\mathcal B(X^\infty))$ defined earlier. For every $n\ge 1$, the empirical measure associated with $(\tilde\xi_1,\ldots,\tilde\xi_n)$ is defined as

$$\tilde e_n=\frac1n\sum_{i=1}^n\delta_{\tilde\xi_i},$$

where $\delta_a$ is the unit mass at $a$. Clearly, $\tilde e_n(x)$ is a p.m. on $\mathcal B(X)$ for $x\in X^\infty$, and $x\mapsto\tilde e_n(x)(A)$ is Borel measurable for $A\in\mathcal B(X)$, i.e., $\tilde e_n$ is a random p.m. from $X^\infty$ into $\mathbb P$.

A p.m. $P$ on $\mathcal B(X^\infty)$ is exchangeable if it is invariant under all finite permutations of $X^\infty$, that is, if $P=P\circ f^{-1}$ for all functions $f:X^\infty\to X^\infty$ of the form

$$f(x_1,\ldots,x_n,x_{n+1},\ldots)=(x_{\pi_1},\ldots,x_{\pi_n},x_{n+1},\ldots)$$

for some $n\ge1$ and some permutation $(\pi_1,\ldots,\pi_n)$ of $(1,\ldots,n)$. As usual, $P\circ f^{-1}$ denotes the p.m. on $\mathcal B(X^\infty)$ given by $P\circ f^{-1}(A)=P(f^{-1}A)$. When $P$ is exchangeable, we will also say that $\tilde\xi$ is exchangeable under $P$.

Given $p\in\mathbb P$, let $p^\infty$ denote the corresponding product p.m., i.e., $p^\infty$ is the p.m. on $\mathcal B(X^\infty)$ which makes the coordinate random variables $\tilde\xi_1,\tilde\xi_2,\ldots$ i.i.d. according to $p$. We are now in a position to state de Finetti's representation theorem for (infinite) exchangeable sequences.

Theorem 1 (de Finetti's representation theorem) Let $P$ be a p.m. on $\mathcal B(X^\infty)$, where $X$ is Polish. Then, the following statements are equivalent:
(i) $P$ is exchangeable;
(ii) There is a unique p.m. $\mu$ on $\mathcal B(\mathbb P)$ such that $P(A)=\int p^\infty(A)\,\mu(dp)$ for all $A\in\mathcal B(X^\infty)$;
(iii) There is a random p.m.
$\tilde p$ (i.e., a Borel function $\tilde p:X^\infty\to\mathbb P$) such that, for each $A\in\mathcal B(X^\infty)$, $\tilde p^\infty(A)$ is a version of $P(\tilde\xi\in A\mid\tilde p)$.
Moreover, when $P$ is exchangeable, one has $\mu(B)=P(\tilde p\in B)$ for all $B\in\mathcal B(\mathbb P)$ and

$$\tilde e_n\Rightarrow\tilde p,\quad P\text{-a.s.},$$

where "$\Rightarrow$" stands for weak convergence of p.m.'s.

We close this section with the notion of statistic. By definition, a statistic is a measurable function of the data. In our case, the data are the $n$-tuple $(\tilde\xi_1,\ldots,\tilde\xi_n)$ for some $n$. Under the exchangeability assumption, the order in which observations appear is not relevant, and $(\tilde\xi_1,\ldots,\tilde\xi_n)$ can be summarized by the empirical measure $\tilde e_n$. Let

$$\mathbb P_0=\bigcup_{n\ge1}\tilde e_n(X^\infty)$$

be the union of the ranges of all the random p.m.'s $\tilde e_n$, and let $\mathbb P_1$ be a Borel subset of $\mathbb P$ such that $\mathbb P_1\supset\mathbb P_0$. Further, let $T$ be a Polish space and $\tilde t:\mathbb P_1\to T$ a Borel function. Then, for each $n$, $\tilde t(\tilde e_n)$ is Borel measurable on $X^\infty$ and is a summary of the data $(\tilde\xi_1,\ldots,\tilde\xi_n)$. Accordingly, in Sections 3 and 4 a statistic is meant as the restriction to $\mathbb P_0$, $\tilde t|\mathbb P_0$, of any Borel function $\tilde t:\mathbb P_1\to T$, where $\mathbb P_1\in\mathcal B(\mathbb P)$ and $\mathbb P_1\supset\mathbb P_0$. In Section 5, a slightly different (but conceptually equivalent) notion of statistic is used.

3. Statement of Theorem (7.2) of FLR

Let $P$ be an exchangeable p.m. on $\mathcal B(X^\infty)$ and let $\tilde\xi=(\tilde\xi_1,\tilde\xi_2,\ldots)$ be the sequence of coordinate random variables on $X^\infty$. According to Theorem 1, conditionally on some random p.m. $\tilde p$, $\tilde\xi$ is i.i.d. according to $\tilde p$. Fix a Borel function $\tilde t:\mathbb P_1\to T$, where $T$ is Polish, $\mathbb P_1\in\mathcal B(\mathbb P)$ and $\mathbb P_1\supset\mathbb P_0$, and suppose that $\mu(\mathbb P_1)=1$, where $\mu$ is as in Theorem 1. Under these conditions one question is whether, conditionally on $\tilde t(\tilde p)$, $\tilde\xi$ is again i.i.d. according to $\tilde p$. Answering this question is useful, for instance, for investigating the existence of an underlying parametric model for the distribution of $\tilde\xi$. Suppose in fact that, conditionally on $\tilde t(\tilde p)$, $\tilde\xi$ is i.i.d. according to $\tilde p$.
Then, $\tilde p$ and $\tilde t(\tilde p)$ play essentially the same role, and a random parameter can be defined as $\tilde\theta=\tilde t(\tilde p)$. In other terms, the "original random parameter" $\tilde p$, whose existence is granted by de Finetti's theorem, can be reduced through $\tilde t$. As far as $\tilde t$ is concerned, in what follows it should be viewed as any given transformation. However, in a particular statistical problem, $\tilde t$ is typically suggested by some sufficient statistic. In any case, a positive answer to the earlier question occurs in case

(a) $\tilde p\in\mathbb P_1$ and $\tilde p=h(\tilde t(\tilde p))$, $P$-a.s., for some function $h:T\to\mathbb P$ such that $h^{-1}(B)\in\overline{\mathcal B(T)}$ for all $B\in\mathcal B(\mathbb P)$,

where $\overline{\mathcal B(T)}$ is the completion of $\mathcal B(T)$ with respect to the distribution of $\tilde t(\tilde p)$. Indeed, under (a), $\tilde\xi$ is i.i.d. (according to $\tilde p$) conditionally on $\tilde t(\tilde p)$. In particular, one has

$$P(A)=\int h(\theta)^\infty(A)\,\overline\gamma(d\theta)\quad\text{for all } A\in\mathcal B(X^\infty),$$

where $\gamma$ is the distribution of $\tilde t(\tilde p)$ and $\overline\gamma$ is the completion of $\gamma$.

Theorem (7.2) of FLR, quoted in Section 1, just gives conditions for (a). To state it, various definitions are needed. First, let $P$ be any p.m. on $\mathcal B(X^\infty)$. For each $n$, $P(\tilde\xi_{n+1}\in\cdot\mid\tilde\xi_1,\ldots,\tilde\xi_n)$ is called the predictive distribution (at time $n$). A statistic $\tilde t|\mathbb P_0$ is predictive sufficient if, for each $n$ and $A\in\mathcal B(X)$, one has

$$P(\tilde\xi_{n+1}\in A\mid\tilde\xi_1,\ldots,\tilde\xi_n)=P(\tilde\xi_{n+1}\in A\mid\tilde t(\tilde e_n)),\quad P\text{-a.s.}$$

So, the informal idea is that $\tilde t|\mathbb P_0$ is predictive sufficient provided predictive distributions depend on the data only through it. We refer to FLR for more on predictive sufficiency and for some references.

Next, let $\mathcal C$ be a countable $\pi$-class such that […] and […] for all $C\in\mathcal C$, where $P_1$ is the marginal of $P$ on the first coordinate. Further, for each $n$, fix a function $\alpha_n$ on $T\times\mathcal B(X)$ such that: $\alpha_n(\theta,\cdot)$ is a p.m.
on $\mathcal B(X)$ for $\theta\in T$, $\alpha_n(\cdot,A)$ is $\mathcal B(T)$-measurable for $A\in\mathcal B(X)$, and

$$P\big(\tilde\xi_{n+1}\in A,\ \tilde t(\tilde e_n)\in D\big)=\int_D\alpha_n(\theta,A)\,\gamma_n(d\theta)$$

for all $A\in\mathcal B(X)$ and $D\in\mathcal B(T)$, where $\gamma_n$ denotes the distribution of $\tilde t(\tilde e_n)$. Clearly, such $\alpha_n$ exists since $X$ is Polish. Finally, consider the following condition; cf. condition (7.1) of FLR.

(*) There is a set $A\in\mathcal B(X^\infty)$, with $P(A)=1$, satisfying:
each […] is a convergent sequence;
for each […] and […], there is […] such that […] for all sufficiently large $n$ whenever […] and […] for all sufficiently large $n$, $d_T$ being a metric for $T$.

We are now able to state Theorem (7.2).

Theorem (7.2) of FLR Let $X$, $T$ be Polish spaces, $P$ an exchangeable p.m. on $\mathcal B(X^\infty)$, and $\tilde t:\mathbb P_1\to T$ a Borel function. If $\tilde t|\mathbb P_0$ is predictive sufficient, $\tilde t$ is continuous on $\mathbb P_1$, $\mu(\mathbb P_1)=1$, and condition (*) is satisfied, then condition (a) holds.

Finally we note that, if $\tilde t|\mathbb P_0$ is predictive sufficient, $\tilde t$ is continuous on $\mathbb P_1$ and $\mu(\mathbb P_1)=1$, then condition (*) is necessary in order that (a) holds for a continuous $h$; cf. Theorem (7.4) of FLR.

4. The exchangeable case

In this section, $P$ is an exchangeable p.m. on $\mathcal B(X^\infty)$, and $\mu$ and $\tilde p$ are as in Theorem 1. For definiteness, setting

$$A_0=\{x\in X^\infty:\ \tilde e_n(x)\text{ converges weakly as } n\to\infty\},$$

it is supposed that

$$\tilde p(x)=\text{weak}\lim_n\tilde e_n(x)\quad\text{for all } x\in A_0$$

and $\tilde p(x)=p_0$ for all $x\notin A_0$, where $p_0$ is any fixed element of $\mathbb P$.

Given any Borel function $\tilde t:\mathbb P_1\to T$, with $\mathbb P_1\in\mathcal B(\mathbb P)$ and $\mathbb P_1\supset\mathbb P_0$, let us consider the condition:

(b) $\tilde t$ is injective on $B$, for some $B\in\mathcal B(\mathbb P)$ such that $B\subset\mathbb P_1$ and $\mu(B)=1$.

Before commenting on (b), we prove that it amounts to (a).

Theorem 2 Let $X$, $T$ be Polish spaces, $P$ an exchangeable p.m. on $\mathcal B(X^\infty)$, and $\tilde t:\mathbb P_1\to T$ a Borel function. Then, conditions (a) and (b) are equivalent.
Moreover, under (a) or (b), the function $h$ involved in (a) can be taken Borel measurable.

Proof (a) $\Rightarrow$ (b). Under (a), there is a set $A\in\mathcal B(X^\infty)$, $P(A)=1$, such that $\tilde p(x)\in\mathbb P_1$ and $\tilde p(x)=h(\tilde t(\tilde p(x)))$ for all $x\in A$. Then, $\tilde t(p_1)\ne\tilde t(p_2)$ whenever $p_1,p_2\in\tilde p(A)$ and $p_1\ne p_2$, i.e., $\tilde t$ is injective on $\tilde p(A)$. Moreover, $\tilde p(A)$ has $\mu$-outer measure 1, since every Borel set including $\tilde p(A)$ has $\mu$-probability at least $P(A)=1$. Further, $\tilde p(A)$ is an analytic set, hence universally measurable. Hence, there is $B\in\mathcal B(\mathbb P)$ with $B\subset\tilde p(A)$ and $\mu(B)=1$. Since $\tilde t$ is injective on $B$, condition (b) holds.

(b) $\Rightarrow$ (a). Since $\tilde t$ is injective on $B$, for each $\theta\in\tilde t(B)$ there is precisely one $p_\theta\in B$ such that $\tilde t(p_\theta)=\theta$. Fix $p_0\in\mathbb P$, and define $h(\theta)=p_\theta$ for $\theta\in\tilde t(B)$, and $h(\theta)=p_0$ for $\theta\notin\tilde t(B)$. Then, $h(\tilde t(p))=p$ for all $p\in B$. Further, $h$ is Borel measurable. In fact, $\tilde t(B)\in\mathcal B(T)$, since $\tilde t$ is Borel and injective on $B$, and for the same reason one has

$$\{\theta\in\tilde t(B):\ h(\theta)\in C\}=\tilde t(B\cap C)\in\mathcal B(T)\quad\text{for all } C\in\mathcal B(\mathbb P).$$

Finally, let $A=\{\tilde p\in B\}$. Then, $A\in\mathcal B(X^\infty)$, $P(A)=\mu(B)=1$ and, for all $x\in A$, one has $\tilde p(x)\in\mathbb P_1$ and $\tilde p(x)=h(\tilde t(\tilde p(x)))$. Hence, (a) holds for a Borel function $h$.

A first remark on Theorem 2, even if not essential, is that the function $h$ in condition (a) can always be taken Borel measurable. Apart from this fact, two other points need to be discussed. One is the possible meaning of (b) and the other is that, taking (b) as a starting point, various other sufficient conditions for (a) can be obtained.

4.1. Meaning of condition (b)

From a statistical point of view, to get condition (a) (which is our goal, as explained in Section 3), it would be desirable to impose conditions only on the way $\tilde t$ summarizes data. More precisely, it would be desirable to impose conditions on $\tilde t|\mathbb P_0$, in particular on its connections with predictive distributions, but not on the behaviour of $\tilde t$ on $\mathbb P_1\setminus\mathbb P_0$. Strictly speaking, this is not possible in general. Suppose in fact that $\mu(\mathbb P_0)=0$, choose any Borel function $t_0$ on $\mathbb P_0$, and take $\mathbb P_1$ to be any Borel set with $\mathbb P_1\supset\mathbb P_0$ and $\mu(\mathbb P_1)=1$.
Then, apart from trivial cases, $t_0$ admits two Borel extensions to $\mathbb P_1$, say $\tilde t_1$ and $\tilde t_2$, such that (b) holds for $\tilde t_1$ and fails for $\tilde t_2$. In view of Theorem 2, condition (a) holds for $\tilde t_1$ but fails for $\tilde t_2$, even if $\tilde t_1|\mathbb P_0=\tilde t_2|\mathbb P_0$. In Theorem (7.2) of FLR, for instance, continuity of $\tilde t$ is required on all of $\mathbb P_1$. However, all other conditions of Theorem (7.2) concern $\tilde t|\mathbb P_0$ only, and the continuity assumption also looks admissible. Indeed, when $\tilde t$ is continuous on $\mathbb P_0$ and admits a continuous extension to $\mathbb P_1$ (e.g., when $\tilde t$ is uniformly continuous on $\mathbb P_0$), it is natural to take $\tilde t$ on $\mathbb P_1$ to be that continuous extension.

Generally, condition (b) deals with the behaviour of $\tilde t$ on $\mathbb P_1\setminus\mathbb P_0$, and in this sense it does not have an intuitive statistical content. Nevertheless, (b) is also necessary, and thus one can think in terms of it without any real loss of generality. In addition, (b) makes clear which properties are required of $\tilde t$ in order that $\tilde p$ can be reduced through $\tilde t$: over a set of $\mu$-probability 1, $\tilde t$ must be able to distinguish between two different weak limits of empirical measures.

4.2. Other sufficient conditions for (a)

By using Theorem 2, sufficient conditions for (a) can be obtained. Moreover, it is possible to give a shorter (and more direct) proof of Theorem (7.2). We begin with the latter point.

Proof of Theorem (7.2) of FLR Since $\mathcal C$ is countable, the relations […] hold for all $C\in\mathcal C$, $P$-a.s., where the set $A$ and the $\alpha_n$ are as in condition (*). Hence, there is a set $A_1\in\mathcal B(X^\infty)$, with $A_1\subset A\cap A_0$ and $P(A_1)=1$, on which they hold. Since $\tilde p(A_1)$ is an analytic set and has $\mu$-outer measure 1, there is $B\in\mathcal B(\mathbb P)$ with $B\subset\tilde p(A_1)\cap\mathbb P_1$ and $\mu(B)=1$. Let $p_1,p_2\in B$. By definition of $B$, continuity of $\tilde t$ and condition (*), it follows that $\tilde t(p_1)\ne\tilde t(p_2)$ whenever $p_1\ne p_2$. Hence, $\tilde t$ is injective on $B$, and since $\mu(B)=1$, condition (b) holds. By Theorem 2, this concludes the proof.
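To fix ideas on how Theorem 2 can be read, the following worked example treats the simplest case of a two-point space. It is an illustration added here, not an example from FLR: the choice $\tilde t(p)=p(\{1\})$ is an assumption made for the sake of the example, and $h$, $\mu$ and $\gamma$ play the roles they have in conditions (a) and (b).

```latex
% Worked example (illustrative assumption: X = {0,1}, T = [0,1]).
\begin{align*}
&X=\{0,1\},\qquad T=[0,1],\qquad \tilde t(p)=p(\{1\})\quad\text{for } p\in\mathbb{P}.\\[4pt]
&\text{Since } p=\tilde t(p)\,\delta_1+\bigl(1-\tilde t(p)\bigr)\,\delta_0,\ \tilde t
  \text{ is continuous and injective on } \mathbb{P},\\
&\text{so (b) holds with } B=\mathbb{P}_1=\mathbb{P}
  \text{ for every exchangeable } P.\\[4pt]
&\text{The function of condition (a) is }
  h(\theta)=\theta\,\delta_1+(1-\theta)\,\delta_0,
  \qquad h\bigl(\tilde t(p)\bigr)=p .\\[4pt]
&\text{With } \gamma \text{ the distribution of } \tilde t(\tilde p)
  \text{ and } k=\textstyle\sum_{i=1}^n x_i:\\
&P(\tilde\xi_1=x_1,\ldots,\tilde\xi_n=x_n)
  =\int_0^1\theta^{k}(1-\theta)^{n-k}\,\gamma(d\theta),\\[4pt]
&\tilde t(\tilde e_n)=\frac1n\sum_{i=1}^n\tilde\xi_i
  \;\longrightarrow\;\tilde t(\tilde p),\quad P\text{-a.s.}
\end{align*}
```

In this case the statistic is the sample mean and the random parameter $\tilde t(\tilde p)$ is the usual success probability. On a larger space the same map $p\mapsto p(\{1\})$ would no longer be injective, and (b) would then hold only if $\mu$ concentrates on a Borel set where it is.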
Let us turn now to some other sufficient conditions for (a). If assumptions on the behaviour of $\tilde t$ on $\mathbb P_1\setminus\mathbb P_0$ are allowed, then, by Theorem 2, plenty of conditions are available. Because of the remarks in Subsection 4.1, however, we focus on conditions concerning $\tilde t|\mathbb P_0$ only, apart from continuity of $\tilde t$, which is required on all of $\mathbb P_1$. Then, one possible candidate is:

(c) There is a set $A\in\mathcal B(X^\infty)$, $P(A)=1$, such that

$$\limsup_n d_T\big(\tilde t(\tilde e_n(x)),\tilde t(\tilde e_n(y))\big)>0$$

whenever $x,y\in A$ and

$$\limsup_n\rho\big(\tilde e_n(x),\tilde e_n(y)\big)>0,$$

where $d_T$ is a metric for $T$ and $\rho$ is the Prohorov metric on $\mathbb P$.

We recall that the Prohorov metric on $\mathbb P$ is given by

$$\rho(p,q)=\inf\{\epsilon>0:\ p(C)\le q(C^\epsilon)+\epsilon\ \text{ for all } C\in\mathcal B(X)\}\quad\text{for all } p,q\in\mathbb P,$$

where $C^\epsilon=\{a\in X:\ d_X(a,C)<\epsilon\}$, $d_X$ being a metric for $X$.

Theorem 3 Let $X$, $T$ be Polish spaces, $P$ an exchangeable p.m. on $\mathcal B(X^\infty)$, and $\tilde t:\mathbb P_1\to T$ a Borel function. If $\tilde t$ is continuous on $\mathbb P_1$ and $\mu(\mathbb P_1)=1$, then conditions (a), (b) and (c) are equivalent.

Proof Let $\tilde t$ be continuous on $\mathbb P_1$ with $\mu(\mathbb P_1)=1$. By Theorem 2, it is enough to prove that (b) and (c) are equivalent.

(c) $\Rightarrow$ (b). Let $A_0$ be defined as at the beginning of Section 4. Since $\tilde p(A\cap A_0)$ is analytic and has $\mu$-outer measure 1, there is $B\in\mathcal B(\mathbb P)$ with $B\subset\tilde p(A\cap A_0)\cap\mathbb P_1$ and $\mu(B)=1$. Then, (b) holds with this $B$. To see this, the only non-trivial fact is injectivity of $\tilde t$ on $B$. Fix $p_1,p_2\in B$ such that $p_1\ne p_2$, and take $x_1,x_2\in A\cap A_0$ with $\tilde p(x_i)=p_i$ for $i=1,2$. Then, $\lim_n\rho(\tilde e_n(x_1),\tilde e_n(x_2))=\rho(p_1,p_2)>0$, due to $x_1,x_2\in A_0$ and $p_1\ne p_2$, and thus continuity of $\tilde t$ and condition (c) yield

$$d_T\big(\tilde t(p_1),\tilde t(p_2)\big)=\lim_n d_T\big(\tilde t(\tilde e_n(x_1)),\tilde t(\tilde e_n(x_2))\big)>0.$$

(b) $\Rightarrow$ (c). Define $A=A_0\cap\{\tilde p\in B\}$, with $B$ as in (b), note that $A\in\mathcal B(X^\infty)$ and $P(A)=1$, and fix $x,y\in A$ such that $\limsup_n\rho(\tilde e_n(x),\tilde e_n(y))>0$. Then, $\tilde p(x),\tilde p(y)\in B$ and, since $\tilde e_n(x)\Rightarrow\tilde p(x)$ and $\tilde e_n(y)\Rightarrow\tilde p(y)$,

$$\rho\big(\tilde p(x),\tilde p(y)\big)=\lim_n\rho\big(\tilde e_n(x),\tilde e_n(y)\big)>0.$$
Thus, continuity of $\tilde t$ on $\mathbb P_1$ and injectivity of $\tilde t$ on $B$ yield

$$\lim_n d_T\big(\tilde t(\tilde e_n(x)),\tilde t(\tilde e_n(y))\big)=d_T\big(\tilde t(\tilde p(x)),\tilde t(\tilde p(y))\big)>0.$$

By Theorem 3, in the relevant case where $\tilde t$ is continuous on $\mathbb P_1$ and $\mu(\mathbb P_1)=1$, (c) is equivalent to (a). Moreover, in line with the remarks in Subsection 4.1, condition (c) deals with $\tilde t|\mathbb P_0$ only. Hence, Theorem 3 is an improvement of Theorem (7.2).

Next, given a function $f:M_1\to M_2$, with $M_1$ and $M_2$ metric spaces (with distances $d_1$ and $d_2$), let us call $f$ "uniformly injective" in case: for each $\epsilon>0$ there is $\delta>0$ such that $d_2(f(a),f(b))\ge\delta$ whenever $a,b\in M_1$ and $d_1(a,b)\ge\epsilon$. If $\tilde t|\mathbb P_0$ is uniformly injective on $\mathbb P_0$, then condition (c) trivially holds. Thus, by Theorem 3, it is enough for condition (a) that $\tilde t$ is uniformly injective on $\mathbb P_0$ and continuous on $\mathbb P_1$, with $\mu(\mathbb P_1)=1$.

We close this section by noting that condition (c) becomes more meaningful if $\rho(\tilde e_n(x),\tilde e_n(y))$ is translated into a sort of distance between the data $(x_1,\ldots,x_n)$ and $(y_1,\ldots,y_n)$. To this end, it is convenient to replace the Prohorov distance with some other equivalent metric. Let $L$ be the set of real valued functions $\phi$ on $X$ such that, for all $a,b\in X$,

$$|\phi(a)-\phi(b)|\le d_X(a,b),$$

with the metric $d_X$ bounded by 1, as in Huber (1981). The so-called bounded Lipschitz metric on $\mathbb P$ is defined as

$$\beta(p,q)=\sup_{\phi\in L}\Big|\int\phi\,dp-\int\phi\,dq\Big|,$$

and it can be shown to satisfy $\rho^2\le\beta\le2\rho$ (see, e.g., Huber, 1981, Corollary 4.3, p. 33). Thus, by using $\beta$ instead of $\rho$, condition (c) can be equivalently written as:

(c) There is a set $A\in\mathcal B(X^\infty)$, $P(A)=1$, such that

$$\limsup_n d_T\big(\tilde t(\tilde e_n(x)),\tilde t(\tilde e_n(y))\big)>0$$

whenever $x,y\in A$ and

$$\limsup_n\ \sup_{\phi\in L}\Big|\frac1n\sum_{i=1}^n\phi(x_i)-\frac1n\sum_{i=1}^n\phi(y_i)\Big|>0.$$

5. The stationary case

This section includes versions of Theorems 2 and 3 and a convergence result for the case where $P$ is stationary. Some remarks on Bayesian nonparametric inference for stationary data, or more generally for data with invariant distribution, are also given.
We begin with a result asserting that, under general conditions, every invariant p.m. is a unique integral mixture of extreme points. Given a measurable space $(\Omega,\mathcal A)$ and a class $F$ of measurable functions $f:\Omega\to\Omega$, let $\mathbb I_F$ be the set of those p.m.'s $Q$ on $\mathcal A$ which are $F$-invariant, i.e., $Q=Q\circ f^{-1}$ for all $f\in F$. Further, let $\mathbb E_F$ be the set of extreme points of $\mathbb I_F$, and let $\mathbb E_F$ be equipped with the trace $\sigma$-field $\Sigma_F=\{D\cap\mathbb E_F:\ D\in\Sigma(\mathcal A)\}$. According to Section 2, $\Sigma_F$ is generated by the maps $Q\mapsto Q(A)$, $A\in\mathcal A$, from $\mathbb E_F$ into the reals. We also recall that a p.m. $Q$ on $\mathcal A$ is perfect if, for each measurable function $\phi:\Omega\to\mathbb R$, there is $B\in\mathcal B(\mathbb R)$ such that $B\subset\phi(\Omega)$ and $Q(\phi\in B)=1$. Theorem 4 below, due to Maitra (1977), unifies results of Bogoliouboff, de Finetti, Farrell, Kryloff and Varadarajan.

Theorem 4 (Maitra) Suppose $F$ is countable, $\mathcal A$ is countably generated and includes the singletons, and every p.m. on $\mathcal A$ is perfect. Then, for each $Q\in\mathbb I_F$ there is a unique p.m. $\nu$ on $\Sigma_F$ such that

$$Q(A)=\int_{\mathbb E_F}R(A)\,\nu(dR)\quad\text{for all } A\in\mathcal A.$$

In our case, $(\Omega,\mathcal A)=(X^\infty,\mathcal B(X^\infty))$ where $X$ is Polish, so that $\mathcal A$ meets the conditions of Theorem 4 and the $\sigma$-field $\Sigma(\mathcal A)$ reduces to $\mathcal B(\mathbb Q)$. If $F$ is the class of all finite permutations of $X^\infty$, then $\mathbb I_F$ is the set of exchangeable p.m.'s and $\mathbb E_F$ the set of product p.m.'s, i.e., $\mathbb E_F=\{p^\infty:\ p\in\mathbb P\}$. Hence, the equivalence between (i) and (ii) in Theorem 1 follows from Theorem 4, after noting that $\mu$ and $\nu$ are connected by the relation $\nu=\mu\circ\psi^{-1}$, where $\psi(p)=p^\infty$ for $p\in\mathbb P$. In other terms, at least formally, de Finetti's theorem can be embedded into a more general result on invariant p.m.'s.

Since de Finetti's theorem is fundamental in Bayesian nonparametric inference for exchangeable data, one could hope that, taking Theorem 4 as a starting point, a relevant part of the usual theory can be extended to the invariant case. In principle, this is possibly true. However, moving from the exchangeable to the invariant case, the problem becomes technically much more intricate.
So, developing a nonparametric theory for invariant data, analogous to the usual one for exchangeable data, seems to be very hard. In particular, it looks hard to get usable statistical procedures. On the other hand, it would be interesting to investigate which part, if any, of the usual Bayesian nonparametric theory can be extended to invariant data. In the sequel, as a significant example, we discuss the stationary case.

Let $s:X^\infty\to X^\infty$ be the shift transformation: $s(x_1,x_2,\ldots)=(x_2,x_3,\ldots)$. A p.m. $P$ on $\mathcal B(X^\infty)$ is stationary if $(\tilde\xi_1,\tilde\xi_2,\ldots)$ and $(\tilde\xi_2,\tilde\xi_3,\ldots)$ have the same distribution under $P$, or equivalently if $P=P\circ s^{-1}$. Clearly, $P$ is stationary in case $P$ is exchangeable, but the converse is not true. If $P$ is stationary and $P(A)\in\{0,1\}$ for each Borel set $A$ with $s^{-1}A=A$, then $P$ is said to be ergodic. When $P$ is stationary or ergodic, we will also say that $\tilde\xi$ is stationary or ergodic under $P$. Let $\mathbb E$ be the set of ergodic p.m.'s and $\mathbb Q$ the set of all p.m.'s on $\mathcal B(X^\infty)$; cf. Section 2. By relying on an argument of Maitra (1977), we now prove that $\mathbb E$ is a Borel subset of $\mathbb Q$.

Lemma 5 If $X$ is a separable metric space, then $\mathbb E\in\mathcal B(\mathbb Q)$.

Proof Let $\mathbb S$ be the set of stationary p.m.'s and $\mathcal I=\{A\in\mathcal B(X^\infty):\ s^{-1}A=A\}$ the $\sigma$-field of invariant sets. By Lemma 4 of Maitra (1977), the $\sigma$-field $\mathcal I$ is sufficient (in the classical sense) for $\mathbb S$. Since $\mathcal B(X^\infty)$ is countably generated, sufficiency of $\mathcal I$ implies existence of a sufficient and countably generated $\sigma$-field $\mathcal I_0$ such that $\mathcal I_0\subset\mathcal I\subset\overline{\mathcal I}_0$, where $\overline{\mathcal I}_0$ collects the sets which differ from some set in $\mathcal I_0$ by a $Q$-null set for every $Q\in\mathbb S$; cf. Burkholder (1961, Theorem 1). Fix countable fields $\mathcal F$ and $\mathcal F_0$ such that $\sigma(\mathcal F)=\mathcal B(X^\infty)$ and $\sigma(\mathcal F_0)=\mathcal I_0$, and define

$$H=\{Q\in\mathbb Q:\ Q(s^{-1}A)=Q(A)\ \text{for all } A\in\mathcal F,\ \text{and } Q(A)\in\{0,1\}\ \text{for all } A\in\mathcal F_0\}.$$

Since $\mathcal F$ and $\mathcal F_0$ are countable, $H$ is Borel, and since $\mathcal F$ is a field generating $\mathcal B(X^\infty)$, one has $H\subset\mathbb S$. Let $Q\in H$. Since $Q$ is degenerate on the $\pi$-class $\mathcal F_0$, then $Q$ is also degenerate on $\mathcal I_0=\sigma(\mathcal F_0)$. Since $\mathcal I\subset\overline{\mathcal I}_0$, it follows that $Q$ is degenerate on $\mathcal I$. Hence, $H\subset\mathbb E$, while it is clear that $\mathbb E\subset H$.
To sum up, $\mathbb E$ coincides with the Borel set $H$ defined above, so that $\mathbb E\in\mathcal B(\mathbb Q)$.

Setting $F=\{s\}$, Theorem 4 applies to stationary p.m.'s, and the set of extreme points of stationary p.m.'s coincides with $\mathbb E$. Hence, each stationary $P$ admits the representation $P(A)=\int_{\mathbb E}Q(A)\,\nu(dQ)$, $A\in\mathcal B(X^\infty)$, for some unique p.m. $\nu$ on $\mathcal B(\mathbb E)$. Furthermore, just as in the exchangeable case, $\nu$ is the probability distribution of $\tilde q$, for some Borel function $\tilde q:X^\infty\to\mathbb E$ such that $\tilde q(A)$ is a version of $P(\tilde\xi\in A\mid\tilde q)$ for all $A\in\mathcal B(X^\infty)$.

To realize the program sketched above, i.e., to develop a Bayesian nonparametric theory for stationary data, one has to assess priors on $\mathbb E$. Precisely, one should "propose" some reasonable class of priors on $\mathbb E$, and calculate the corresponding posterior and predictive distributions. Such priors should have large support, so as to obtain a real nonparametric theory. Further, they should cover a broad range of potential beliefs, and the posterior and predictive distributions should not be too difficult to evaluate. Clearly, it is not easy to meet all these requirements together.

As a preliminary step, we investigate, for stationary data, the same problem as in Section 4, i.e., existence of underlying parametric models. A different kind of empirical measure is to be used. Given $k\ge1$, define:

$$\tilde\xi_{i,k}=(\tilde\xi_i,\ldots,\tilde\xi_{i+k-1})\quad\text{for } i\ge1,\qquad \tilde e_{n,k}=\frac1n\sum_{i=1}^n\delta_{\tilde\xi_{i,k}}\quad\text{for } n\ge1.$$

For fixed $n$, $k$ and $x\in X^\infty$, $\tilde e_{n,k}(x)$ is a p.m. on $\mathcal B(X^k)$. To obtain a p.m. on $\mathcal B(X^\infty)$, we fix any $p_0\in\mathbb P$ and we refer to $\tilde e_{n,k}\times p_0^\infty$ instead of $\tilde e_{n,k}$. (For each p.m. $\lambda$ on $\mathcal B(X^k)$, $\lambda\times p_0^\infty$ denotes the p.m. on $\mathcal B(X^\infty)$ under which $(\tilde\xi_1,\ldots,\tilde\xi_k)$ has distribution $\lambda$, $(\tilde\xi_{k+1},\tilde\xi_{k+2},\ldots)$ has distribution $p_0^\infty$, and $(\tilde\xi_1,\ldots,\tilde\xi_k)$ is independent of $(\tilde\xi_{k+1},\tilde\xi_{k+2},\ldots)$.) Clearly, this is only a rough device to transform $\tilde e_{n,k}$ into a p.m. on $\mathcal B(X^\infty)$, and $p_0$ will not play any essential role. Next, call $k_n$ the integer part of […] and define

$$\tilde f_n=\tilde e_{n,k_n}\times p_0^\infty.$$

Thus, each $\tilde f_n:X^\infty\to\mathbb Q$ is a Borel function, and it is a summary of the data $\tilde\xi_1,\ldots,\tilde\xi_{n+k_n-1}$.
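The role of block empirical measures can be illustrated numerically. The sketch below is not from the paper: it assumes that the block empirical measure is the frequency of overlapping $k$-blocks, takes $X=\{0,1\}$, and uses a two-state Markov chain started from its stationary law as a convenient example of an ergodic sequence. The names `simulate_chain`, `block_empirical_measure` and `stationary_pair_law`, and the transition probabilities `a` and `b`, are all hypothetical. It compares the frequencies of pairs with the law of two consecutive coordinates.

```python
import random
from collections import Counter


def simulate_chain(n, a, b, rng):
    """Two-state Markov chain with P(0 -> 1) = a and P(1 -> 0) = b.

    The first state is drawn from the stationary law, so the sequence
    produced is stationary (and ergodic when 0 < a, b < 1).
    """
    state = 1 if rng.random() < a / (a + b) else 0
    path = []
    for _ in range(n):
        path.append(state)
        flip_probability = a if state == 0 else b
        if rng.random() < flip_probability:
            state = 1 - state
    return path


def block_empirical_measure(xs, n, k):
    """Mass 1/n on each of the first n overlapping blocks (x_i, ..., x_{i+k-1}).

    Requires len(xs) >= n + k - 1.
    """
    blocks = Counter(tuple(xs[i:i + k]) for i in range(n))
    return {block: count / n for block, count in blocks.items()}


def stationary_pair_law(a, b):
    """Law of two consecutive coordinates under the stationary chain."""
    pi = {0: b / (a + b), 1: a / (a + b)}
    step = {(0, 0): 1 - a, (0, 1): a, (1, 0): b, (1, 1): 1 - b}
    return {(i, j): pi[i] * step[(i, j)] for i in (0, 1) for j in (0, 1)}


rng = random.Random(0)
n, k, a, b = 50000, 2, 0.3, 0.6
xs = simulate_chain(n + k - 1, a, b, rng)
estimate = block_empirical_measure(xs, n, k)
target = stationary_pair_law(a, b)
for block in sorted(target):
    print(block, round(estimate.get(block, 0.0), 4), round(target[block], 4))
```

Overlapping blocks are used because the sequence of overlapping $k$-blocks of an ergodic sequence is again ergodic, which need not be true for disjoint blocks.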
When $P$ is stationary, we will use the $\tilde f_n$ as empirical measures. One reason is the following.

Theorem 6 If $X$ is a Polish space and $P$ a stationary p.m. on $\mathcal B(X^\infty)$, then

$$\tilde f_n\Rightarrow\tilde q,\quad P\text{-a.s.},$$

where "$\Rightarrow$" stands for weak convergence of p.m.'s.

Proof Conditionally on $\tilde q$, $\tilde\xi$ is ergodic with distribution $\tilde q$. Hence, it is enough to show that, if $P$ is ergodic then $\tilde f_n\Rightarrow P$, $P$-a.s. Suppose that $P$ is ergodic, and fix $k$. Let $\pi_k$ be the canonical projection of $X^\infty$ onto $X^k$. Since the sequence of $X^k$-valued random variables $\tilde\xi_{1,k},\tilde\xi_{2,k},\ldots$ is ergodic under $P$, one has $\tilde e_{n,k}\Rightarrow P\circ\pi_k^{-1}$, $P$-a.s. as $n\to\infty$. Further, for all […] and […], a direct calculation shows that […]. Thus, $\tilde f_n\Rightarrow P$, $P$-a.s. as $n\to\infty$, and this concludes the proof.

Since $\tilde f_n$ (and not $\tilde e_n$) is now used as empirical measure, the notion of statistic is to be slightly modified, too. Let

$$\mathbb Q_0=\bigcup_{n\ge1}\tilde f_n(X^\infty)$$

be the union of the ranges of all the $\tilde f_n$, and let $\mathbb Q_1\in\mathcal B(\mathbb Q)$ be such that $\mathbb Q_1\supset\mathbb Q_0$. In what follows, in line with the notion adopted so far, a statistic is meant as the restriction to $\mathbb Q_0$, $\tilde t|\mathbb Q_0$, of any Borel function $\tilde t:\mathbb Q_1\to T$.

At this stage, the argument proceeds essentially as in Sections 3 and 4. The first part of Section 3 remains unaffected, except that "exchangeable" is to be replaced by "stationary", "i.i.d." by "ergodic", $\mu$ by $\nu$, and $\tilde p$ by $\tilde q$. In particular, given a stationary $P$ and a Borel function $\tilde t:\mathbb Q_1\to T$, where $\mathbb Q_1\in\mathcal B(\mathbb Q)$ and $\mathbb Q_1\supset\mathbb Q_0$, condition (a) simply becomes

(a*) $\tilde q\in\mathbb Q_1$ and $\tilde q=h(\tilde t(\tilde q))$, $P$-a.s., for some Borel function $h:T\to\mathbb Q$.

A slight difference between (a*) and (a), suggested by Theorem 2, is that $h$ is now required to be a Borel function, and not merely a measurable function.
In any case, if (a*) holds then, conditionally on $\tilde t(\tilde q)$, $\tilde\xi$ is ergodic with distribution $\tilde q$. So, under (a*), the "original random parameter" $\tilde q$ can be reduced through $\tilde t$, i.e., the random parameter can be taken to be $\tilde\theta=\tilde t(\tilde q)$. Next, conditions (b) and (c) turn into:

(b*) $\tilde t$ is injective on $B$, for some $B\in\mathcal B(\mathbb Q)$ such that $B\subset\mathbb Q_1$ and $\nu(B)=1$;

(c*) There is a set $A\in\mathcal B(X^\infty)$, $P(A)=1$, such that

$$\limsup_n d_T\big(\tilde t(\tilde f_n(x)),\tilde t(\tilde f_n(y))\big)>0$$

whenever $x,y\in A$ and

$$\limsup_n\rho\big(\tilde f_n(x),\tilde f_n(y)\big)>0;$$

where $\rho$ is now the Prohorov metric on $\mathbb Q$. As in Section 4, condition (c*) is perhaps more expressive if $\rho$ is replaced by some other equivalent metric, like the bounded Lipschitz metric on $\mathbb Q$.

Finally, the arguments for proving Theorems 2 and 3 do not depend on exchangeability of $P$, and can be repeated for a stationary $P$. In fact, up to a proper choice of the empirical measure (and thus up to Theorem 6), the results in this section hold for any $F$-invariant p.m. on $\mathcal B(X^\infty)$, with $F$ countable. We state the stationary versions of Theorems 2 and 3 jointly, and we omit proofs.

Theorem 7 Let $X$, $T$ be Polish spaces, $P$ a stationary p.m. on $\mathcal B(X^\infty)$, and $\tilde t:\mathbb Q_1\to T$ a Borel function. Then, (a*) is equivalent to (b*). Moreover, if $\tilde t$ is continuous on $\mathbb Q_1$ and $\nu(\mathbb Q_1)=1$, then conditions (a*), (b*) and (c*) are equivalent.

Acknowledgements: Research partially supported by M.U.R.S.T. 40% "Processi Stocastici, Calcolo Stocastico ed Applicazioni".

References

Burkholder D.L. (1961) Sufficiency in the undominated case, Annals of Mathematical Statistics, 32, 1191-1200.
Fortini S., Ladelli L., Regazzini E. (2000) Exchangeability, predictive distributions and parametric models, Sankhyā, Series A, 62, 86-109.
Huber P.J. (1981) Robust statistics, J. Wiley, New York.
Maitra A. (1977) Integral representations of invariant measures, Transactions of the American Mathematical Society, 229, 209-225.
Olshen R. (1974) A note on exchangeable sequences, Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 28, 317-321.