Nonparametric statistical problems under partial
exchangeability. The role of associative means.
Translated from
Problemi statistici non parametrici in condizioni di scambiabilità
parziale: impiego di medie associative ∗
Donato Michele Cifarelli
University of Pavia and L. Bocconi, Milan, Italy
Eugenio Regazzini
University of Turin and L. Bocconi, Milan, Italy
1 Introduction and summary
In a work that will be published in Rivista di Matematica per le Scienze Economiche e Sociali (1) we developed some nonparametric statistical problems from a Bayesian standpoint. The paper has been split into two parts. In the former, we presented the problems and described some of the approaches that have been successfully undertaken to solve them in recent years. Moreover, we tried to point out the role that certain numerical characteristics of a probability distribution (such as, in particular, associative means) play in the solution of the above-mentioned problems. The latter, of a more mathematical character, has been devoted to the statement of the probability distribution of any associative mean of a random distribution function (d.f.) on $\mathbb{R}$ when the law of such a random distribution is derived from the infinite-dimensional Dirichlet distribution introduced, for statistical purposes, in Ferguson (1973).
The relevance of these distributions (of associative means) becomes apparent when
one thinks of them as overall Bayesian answers to nonparametric statistical problems in
which one deems it appropriate, for specific statistical decision issues, to summarize the
information contained in a random distribution by suitable associative means.
(*) Quaderni Istituto di Matematica Finanziaria dell'Università di Torino, 1978, Serie III, n. 12, pp. 136. Work done under a GNAFA-CNR project and presented in a Seminar at the Institute of Financial Mathematics, Trieste University, February 1979.

(1) See Cifarelli and Regazzini (1979) in the References.
A distinguishing feature of the work mentioned at the beginning is that the random elements associated with the observations, say $(X_n)_{n \ge 1}$, are assumed to be exchangeable. Then,
from a merely formal standpoint, the random variables Xn turn out to be conditionally
independent and identically distributed given a random probability distribution. From
a more concrete point of view, the assumption of exchangeability is consistent with the
(subjective) choice of letting the observations exert their influence on the prior distribution
in a perfectly symmetric way, in the sense that the order of their values is thought of as
irrelevant.
There might be circumstances under which the beliefs of an observer could be in
disagreement with the above–mentioned symmetric role of the observations. This might
happen when trials are made under appreciably different conditions so that their outcomes
must be seen as “heterogeneous”. Consider, as examples, either the case of technical trials
made under different environmental conditions, or the case of a set of economic data that
can be divided into a certain number of subsets according to their origin from economic situations presenting appreciable differences such as different levels of development. In such
cases it seems more proper to use the objective criteria, which determine “heterogeneity”,
to partition the entire set of observations into a certain number, say k, of classes in such
a way that exchangeability might be a reasonable assumption within each class. This is
the idea underlying the definition of partial exchangeability proposed by B. de Finetti in
1938: the random variables which form the array
$$\{X_{i,n} : n = 1, 2, \dots;\; i = 1, \dots, k\}$$
of k sequences are said to be partially exchangeable if the probability distribution (p.d., for short) of the array is the same as the p.d. of
$$\{X_{i,\sigma_i(n)} : n = 1, 2, \dots;\; i = 1, \dots, k\}$$
for all finite permutations $\sigma_1, \dots, \sigma_k$ of $(1, 2, \dots)$.
This way, the elements of each sequence {Xi,n : n = 1, 2, . . .} turn out to be exchangeable, whereas elements belonging to different sequences can turn out to be dependent in
various ways. The actual choice of any specific form of dependence among elements of different classes has, however, a relevant bearing on the conclusions of the inductive reasoning, since this dependence determines the features of the conditional distribution of
any subset of the above array given any other disjoint subset. That choice can range from
the extreme case of stochastic independence among elements of different classes (observations from one or more sequences have no influence on the prevision of random variables
belonging to other sequences) to the equally extreme case of exchangeability of all Xi,n ,
which would in fact negate “heterogeneity”.
Examples of applications of the concept of partial exchangeability can be found in de Finetti (1938), Daboni (1972), Wedlin (1974a, 1974b). In this connection, we are
completing the writing of two papers that indicate applications of the results we present in the next sections. More specifically, these papers deal with applications of a p.d. of a vector of k associative means, the j-th of which is a distinguished associative mean of the j-th random probability measure appearing in the de Finetti representation of the law of the above array of partially exchangeable random variables. An accurate deduction and description of that p.d. of a vector of associative means will be provided, for the first time, in Sections 4, 5 and 6 of the present paper. Indeed, as far as we know, no specific (prior) probability law has yet been specified for a vector of random probability measures, even if laws of this type are essential in order to apply the Bayesian mechanism to partially exchangeable observations. Then, in Sections 2 and 3 we will define a specific probability of this type and study some of its properties. In particular, in Section 3 we shall determine predictive distributions, since they represent essential tools of the inductive reasoning that only Bayesian methods can provide. Moreover, from the analysis of these predictive distributions one can get useful indications for the actual elicitation of any distinguished distribution within the class determined by the definition given in Section 2.
2 A class of prior distributions on a random vector of normalized measures
We formulate our proposal by clinging to the common principle of looking, when possible,
for a compromise between the requirement of flexibility, with respect to the problem of
representing a wide range of possible prior beliefs, and the requirement of finding forms
which might be simple both with regard to their applicability and in comparison with other expressions deduced from different approaches. After defining the symbols that will be used more frequently, we shall hint at the Dirichlet process, which represents the starting point for our work. For the sake of simplicity, we confine ourselves to considering real-valued random variables (i.e. random numbers: r.n. for short), but both the definitions and results
explained in the present section, as well as those introduced in the next one, could be
easily extended to other random elements.
Let:
- $\mathcal{B}$ be the σ-algebra generated by the half-open intervals in $\mathbb{R}$;
- $[0,1]^{\mathcal{B}}$ be the space of functions $P : \mathcal{B} \to [0,1]$;
- $\mathcal{B}_F^{\mathcal{B}}$ be the σ-algebra of the finite-dimensional cylinders in $[0,1]^{\mathcal{B}}$;
- $(B_1, \dots, B_m)$ be a partition of $\mathbb{R}$ by sets in $\mathcal{B}$.

Finally,
$$(P(B_1), \dots, P(B_m)) \in \mathcal{D}(\,\cdot\,|\alpha_1, \dots, \alpha_m)$$
is used to mean that the p.d. of the random vector $(P(B_1), \dots, P(B_m))$ is the same as the p.d. of $(Y_1, \dots, Y_m)$ when
$$Y_j = \frac{Z_j}{Z_1 + \cdots + Z_m} \qquad (j = 1, \dots, m)$$
and $Z_1, \dots, Z_m$ are independent r.n.'s defined in such a way that $Z_j$ has probability density function
$$f_j(z|\alpha_j, 1) = \frac{1}{\Gamma(\alpha_j)}\, e^{-z} z^{\alpha_j - 1}\, \mathbf{1}_{(0,+\infty)}(z)$$
if $\alpha_j > 0$, for every $j = 1, \dots, m$. As usual, $\mathbf{1}_A$ stands for the indicator function of the set $A$. The p.d. of $(Y_1, \dots, Y_m)$ is singular with respect to the Lebesgue measure on $\mathbb{R}^m$ since $\sum_{j=1}^{m} Y_j = 1$. If $\alpha_j = 0$ for some $j$, then $Y_j$ is assumed to be equal to zero with probability one. If $\alpha_j > 0$ for every $j$, then the p.d. of $(Y_1, \dots, Y_{m-1})$ has the classical Dirichlet density (with respect to the Lebesgue measure on $\mathbb{R}^{m-1}$), that is
$$f(y_1, \dots, y_{m-1}|\alpha_1, \dots, \alpha_m) = \frac{\Gamma(\alpha_1 + \cdots + \alpha_m)}{\Gamma(\alpha_1)\cdots\Gamma(\alpha_m)}\, \prod_{j=1}^{m-1} y_j^{\alpha_j - 1}\, \Bigl(1 - \sum_{1}^{m-1} y_j\Bigr)^{\alpha_m - 1}\, \mathbf{1}_S(y_1, \dots, y_{m-1})$$
where $S \equiv \{(y_1, \dots, y_{m-1}) \in \mathbb{R}^{m-1} : y_j \ge 0,\ \sum_{1}^{m-1} y_j \le 1\}$.
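The normalized-gamma construction just described is easy to check by simulation. The following minimal numpy sketch (the mass values are merely illustrative, not taken from the paper) samples $(Y_1, \dots, Y_m)$ exactly as above, with the convention $Y_j = 0$ when $\alpha_j = 0$:

```python
import numpy as np

rng = np.random.default_rng(0)

def dirichlet_via_gammas(alphas, size=1):
    """Sample (Y_1, ..., Y_m) by normalizing independent Gamma(alpha_j, 1) r.n.'s;
    coordinates with alpha_j = 0 are set to 0 with probability one, as in the text."""
    alphas = np.asarray(alphas, dtype=float)
    z = np.zeros((size, len(alphas)))
    pos = alphas > 0
    z[:, pos] = rng.gamma(alphas[pos], 1.0, size=(size, pos.sum()))
    return z / z.sum(axis=1, keepdims=True)

# illustrative masses alpha_j = alpha(B_j) on a partition (B_1, ..., B_4)
samples = dirichlet_via_gammas([1.0, 2.0, 0.0, 0.5], size=5)
print(samples.round(3), samples.sum(axis=1))   # rows sum to 1; third column is 0
```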
A definition of the Dirichlet process is based on the following result:

Theorem (Ferguson, 1973). Let $\alpha$ be a finite, non-null measure on $(\mathbb{R}, \mathcal{B})$ with $\bar{\alpha} = \alpha(\mathbb{R})$. Then there is a probability space $([0,1]^{\mathcal{B}}, \mathcal{B}_F^{\mathcal{B}}, \mathcal{P})$ such that, for every measurable partition $\{B_1, \dots, B_m\}$ of $\mathbb{R}$, the process $P$ defined by $P(\,\cdot\,) = P(\,\cdot\,, p) := p(\,\cdot\,)$ for every $p \in [0,1]^{\mathcal{B}}$ satisfies
$$(P(B_1), \dots, P(B_m)) \in \mathcal{D}(\,\cdot\,|\alpha(B_1), \dots, \alpha(B_m)).$$
Then $P$ is said to be a Dirichlet process with parameter $\alpha$ and $\mathcal{P}$ is its (functional Dirichlet) p.d.
Given partially exchangeable arrays, formed by k sequences of r.n.'s, one has to consider vector-valued processes $(P_1, \dots, P_k)$, with $P_i$ a function from $\mathcal{B}$ into $[0,1]$ for every $i$, and probability distributions on $\bigl(\prod_{1}^{k}[0,1]^{\mathcal{B}},\ \bigotimes_{1}^{k}\mathcal{B}_F^{\mathcal{B}}\bigr)$. These distributions will play the role of prior distributions in possible applications of the Bayesian paradigm. As a first example, one can consider k independent Dirichlet processes, the j-th of which has parameter $\alpha_j$. This way one would obtain k independent sequences $\{X_{i,n} : n = 1, 2, \dots\}$ for $i = 1, \dots, k$.
To go beyond the definitely narrow range of applicability of such a proposal, one can resort to an idea expounded in Lindley (1971) and applied, for example, to statistical models in Lindley and Smith (1972). We think that this very same idea, transported into a nonparametric setting, led Antoniak (1974) to define mixtures of Dirichlet processes. According to this definition, we start by considering k independent Dirichlet processes $P_1, \dots, P_k$ and assume that each $\alpha_i$ depends on a random parameter $U_i$ (a real-valued parameter, for the sake of simplicity). Then the p.d. of $(P_1, \dots, P_k)$ turns out to be a mixture of products of k functional Dirichlet distributions. More precisely, let
- $U_i$ be a r.n. which "randomizes" the parameter of the i-th Dirichlet process $P_i$ in such a way that the resulting parameter $\alpha_{i,u_i}$ can be defined by $\alpha_{i,u_i}(A) \equiv \alpha_i(u_i, A) : \mathbb{R} \times \mathcal{B} \to [0, \infty)$, for $i = 1, \dots, k$, and

  (i) for any $u_i$ in $\mathbb{R}$, $\alpha_i(u_i, \cdot\,)$ is a non-null finite measure on $(\mathbb{R}, \mathcal{B})$;

  (ii) for any $A$ in $\mathcal{B}$, $\alpha_i(\,\cdot\,, A)$ is a Borel-measurable function;

- $H(\,\cdot\,)$ be a probability measure on $(\mathbb{R}^k, \bigotimes_1^k \mathcal{B})$, with p.d. $\varphi$.
Following the proof of the above-mentioned Ferguson's theorem, we can state the consistency of the following

Definition 1. The process $(P_1, \dots, P_k)$ is a mixture of products of Dirichlet processes if, with respect to a probability measure $\mathcal{P}$ on some suitable measurable space, the following property holds for any family of k partitions $((B_{i,1}, \dots, B_{i,m_i}) : i = 1, \dots, k)$ and any collection of k real vectors $((y_{i,1}, \dots, y_{i,m_i-1}) : i = 1, \dots, k)$:
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\bigcap_{j_i=1}^{m_i-1}\{P_i(B_{i,j_i}) \le y_{i,j_i}\}\Bigr) = \int_{\mathbb{R}^k}\prod_{i=1}^{k} D_i\bigl(y_{i,1}, \dots, y_{i,m_i-1} \,\big|\, \alpha_i(u_i, B_{i,1}), \dots, \alpha_i(u_i, B_{i,m_i})\bigr)\, d\varphi(u_1, \dots, u_k). \tag{1}$$
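A finite-dimensional sampling sketch may help fix ideas about Definition 1. In the following hypothetical specification (an assumption of this sketch, not of the paper) we take k = 2, let $\alpha_i(u_i, \cdot\,)$ be a rescaled Gaussian measure centred at $u_i$, and let $\varphi$ be the law of a dependent Gaussian pair $(U_1, U_2)$; the sampled vectors then follow a discretized instance of (1):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical specification: alpha_i(u_i, .) = c_i * N(u_i, 1) on a fixed partition.
edges = np.array([-np.inf, -1.0, 0.0, 1.0, np.inf])    # cells B_1, ..., B_4
c = (2.0, 5.0)                                         # total masses alpha_i(u_i, R)

def alpha_on_partition(u, c_i):
    # vector (alpha_i(u, B_1), ..., alpha_i(u, B_4))
    return c_i * np.diff(norm.cdf(edges, loc=u, scale=1.0))

def sample_finite_dim():
    u1 = rng.normal(0.0, 1.0)
    u2 = 0.8 * u1 + 0.6 * rng.normal(0.0, 1.0)         # dependent mixing variables
    p1 = rng.dirichlet(alpha_on_partition(u1, c[0]))   # (P_1(B_1), ..., P_1(B_4))
    p2 = rng.dirichlet(alpha_on_partition(u2, c[1]))
    return p1, p2

print(sample_finite_dim())
```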
The formal coherence of the previous definition cannot exempt us from answering the most fundamental question about the meaning of the representation of a prior through the joint distributions (1), in view of its possible use in inductive reasoning. For this purpose, we deem it natural to investigate specific probability assessments (for events concerning a given partially exchangeable array of observables) which characterize the p.d. defined by (1). In point of fact, induction ought to regard the elements of the array which are really observable (thus leading to a predictive approach), rather than formulating inferences for a vector of random probabilities (according to the hypothetical approach). In view of the "circularity" between these two types of inferences (see de Finetti and Savage (1962)), they turn out to be in a one-to-one correspondence. Then, in a subsequent step we could obtain representations for the law of $(P_1, \dots, P_k)$ from distributional information on partially exchangeable observables. However, this does not exclude the existence of problems and situations for which it would be easier to proceed in the opposite direction.
With respect to the predictive approach, the main point is singling out common simple events such as, for example,
$$\{X_{i,n} \le x_i \mid U_1 = u_1, \dots, U_k = u_k\}, \qquad i = 1, \dots, k;\ n = 1, 2, \dots;\ x_i \in \mathbb{R},$$
and
$$\Bigl\{X_{1,n_1+1} \in B_{1,j_1}, \dots, X_{k,n_k+1} \in B_{k,j_k} \,\Big|\, \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}},\ U_1 = u_1, \dots, U_k = u_k\Bigr\},$$
where $E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}} \equiv \{$among the first $n_i$ terms of the i-th sequence, there are $n_{i,j_i}$ terms in $B_{i,j_i}$, for $j_i = 1, \dots, m_i\}$. Clearly, for each $i$, the equality $\sum_{j_i=1}^{m_i} n_{i,j_i} = n_i$ is valid. As to the latter event, it should be noted that it represents the argument of the conditional predictive distribution, given $U_1 = u_1, \dots, U_k = u_k$, according to the actuarial usage suggested by the so-called "credibility theory". More specifically, this theory is implemented by assuming
$$\mathcal{P}\{X_{i,n} \le x_i \mid U_j = u_j,\ j = 1, \dots, k\} = \frac{\alpha_i(u_i, x_i)}{\alpha_i(u_i, \mathbb{R})} \tag{2}$$
with $i = 1, \dots, k$, $n = 1, 2, \dots$, $x_i \in \mathbb{R}$, where $\alpha_i(u_i, x_i)$ is a short form for $\alpha_i(u_i, (-\infty, x_i])$, and
$$\mathcal{P}\Bigl\{X_{1,n_1+1} \in B_{1,j_1}, \dots, X_{k,n_k+1} \in B_{k,j_k} \,\Big|\, \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}},\ U_1 = u_1, \dots, U_k = u_k\Bigr\} = \prod_{i=1}^{k}\Bigl[\frac{\alpha_i(u_i, \mathbb{R})}{n_i + \alpha_i(u_i, \mathbb{R})}\,\frac{\alpha_i(u_i, B_{i,j_i})}{\alpha_i(u_i, \mathbb{R})} + \frac{n_i}{n_i + \alpha_i(u_i, \mathbb{R})}\,\frac{n_{i,j_i}}{n_i}\Bigr]. \tag{2 bis}$$
The form of these probabilities is clear enough to reveal their inspiring principle. Conditional predictive distributions, given $U_1, \dots, U_k$, are assessed as linear convex combinations of a true p.d. and of an empirical distribution. Moreover, the peculiar features of (2)-(2 bis), combined with the exchangeability of each row (2), imply, through a result stated in [10], that the conditional p.d. of $(P_1, \dots, P_k)$, given $(U_1, \dots, U_k)$, can be expressed as a product of k functional Dirichlet distributions with parameters $\alpha_1(U_1, \cdot\,), \dots, \alpha_k(U_k, \cdot\,)$, respectively. Hence, the p.d. of $(P_1, \dots, P_k)$ is a mixture of these products. As a result, partial exchangeability (3) and (2)-(2 bis) yield the p.d. of Definition 1. Conversely, that distribution and partial exchangeability together yield (2) and (2 bis), as one can easily verify using a natural adaptation of de Finetti's representation theorem.
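The convex-combination structure of (2 bis) is transparent when written as a one-line function. The sketch below (with illustrative numbers, assumed only for this example) evaluates, for a single group and a fixed $u_i$, the predictive probability of one cell $B_{i,j_i}$:

```python
def predictive_prob(alpha_total, alpha_B, n_i, n_iB):
    """Credibility form (2 bis) for one group, given U_i = u_i: a convex combination
    of the prior probability alpha_B/alpha_total and the empirical frequency n_iB/n_i."""
    w = alpha_total / (n_i + alpha_total)       # credibility weight of the prior
    return w * alpha_B / alpha_total + (1 - w) * n_iB / n_i

# e.g. alpha_i(u_i, R) = 4, alpha_i(u_i, B) = 1, with 9 of 20 observations in B
print(predictive_prob(4.0, 1.0, 20, 9))         # lies between 0.25 and 0.45
```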
Now we give two propositions, useful to clarify how the processes in Definition 1 satisfy
the first requirement pointed out at the beginning of this section. In particular, the latter
highlights how (1) makes it possible to select functions Pi close – according to a meaning
we will make precise later – to preassigned measures Qi , for each i = 1, 2, . . . , k. Indeed,
the former gives conditions to be satisfied by αi (ui , ·) and ϕ(·) in order to ensure that the
realizations of the $P_i$'s have a support consistent with the requirements of the statistical problem under analysis.

(2) In point of fact, one can prove that this condition is redundant. See, for example, Fortini, Ladelli and Regazzini (2000). Exchangeability, predictive distributions and parametric models. Sankhyā (A), 62, 86-109.

(3) See the comment in the previous footnote.
Proposition 1. Let $A_{i,j_i}$ be subsets of $\mathbb{R}$, for every $i = 1, \dots, k$, such that

(i) $\displaystyle\int_{\Theta} d\varphi(u) = 1$, with $\Theta := \{u \in \mathbb{R}^k : \cap_{1}^{k}\{\alpha_i(u_i, A_{i,j_i}) = 0\}\}$.

Then
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\{P_i(A_{i,j_i}) = 0\}\Bigr) = 1.$$

Proof. By Definition 1 and (i) it follows that
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\{P_i(A_{i,j_i}) \le y_{i,j_i}\}\Bigr) = \int_{\Theta}\prod_{i=1}^{k} D_i\bigl(y_{i,j_i} \,\big|\, 0,\ \alpha_i(u_i, \bar{A}_{i,j_i})\bigr)\, d\varphi(u),$$
where $\bar{A}_{i,j_i} \equiv \mathbb{R} - A_{i,j_i}$. Since, for every $i$, $(P_i(A_{i,j_i}) \mid u_i)$ is degenerate at 0, because $\alpha_i(u_i, A_{i,j_i}) = 0$ ($u_i$ being the i-th component of the vector $u \in \Theta$), the thesis follows immediately.
Proposition 2. Given k normalized measures $Q_1, \dots, Q_k$, each defined on $(\mathbb{R}, \mathcal{B})$, and k classes of measurable subsets of $\mathbb{R}$, $\{(A_{i,1}, \dots, A_{i,m_i}),\ i = 1, \dots, k\}$, let $\Theta_1$ be the subset of $\mathbb{R}^k$ including the vectors $u$ corresponding to which every $Q_i(\,\cdot\,)$ is absolutely continuous with respect to $\alpha_i(u_i, \cdot\,)$, i.e.
$$\Theta_1 \equiv \bigl\{u \in \mathbb{R}^k : Q_i(\,\cdot\,) \ll \alpha_i(u_i, \cdot\,),\ i = 1, \dots, k\bigr\}.$$
Then, if condition

(ii) $\displaystyle\int_{\Theta_1} d\varphi(u) = 1$

is satisfied, for any vector $\varepsilon \equiv (\varepsilon_1, \dots, \varepsilon_k)$ with positive components,
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\bigcap_{j_i=1}^{m_i}\{|P_i(A_{i,j_i}) - Q_i(A_{i,j_i})| < \varepsilon_i\}\Bigr) > 0.$$
Proof. Using the above condition (ii) and Definition 1 we have
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\bigcap_{j_i=1}^{m_i}\{|P_i(A_{i,j_i}) - Q_i(A_{i,j_i})| < \varepsilon_i\}\Bigr) = \int_{\Theta_1}\prod_{i=1}^{k}\mathcal{P}\Bigl(\bigcap_{j_i=1}^{m_i}\{|P_i(A_{i,j_i}) - Q_i(A_{i,j_i})| < \varepsilon_i\}\,\Big|\, U = u\Bigr)\, d\varphi(u).$$
From Proposition 3 in Ferguson (1973), p. 215, it follows that
$$\mathcal{P}\Bigl(\bigcap_{j_i=1}^{m_i}\{|P_i(A_{i,j_i}) - Q_i(A_{i,j_i})| < \varepsilon_i\}\,\Big|\, u\Bigr) = \gamma_i > 0 \qquad (u \in \Theta_1)$$
and this completes the proof.
Now we state a result, whose proof is straightforward, useful to highlight some elementary descriptive aspects of the random process under investigation.

Proposition 3. If $(P_1, \dots, P_k)$ is a mixture of products of Dirichlet processes according to (1), then

(a) $\displaystyle\mathcal{P}\Bigl(\bigcap_{j_i=1}^{m_i-1}\{P_i(B_{i,j_i}) \le y_{i,j_i}\}\Bigr) = \int_{\mathbb{R}} D\bigl(y_{i,1}, \dots, y_{i,m_i-1} \,\big|\, \alpha_i(u_i, B_{i,1}), \dots, \alpha_i(u_i, B_{i,m_i})\bigr)\, d\varphi_i(u_i)$, $\varphi_i$ being the i-th marginal of $\varphi$;

(b) $\displaystyle E(P_i(B_{i,j_i})) = \int_{\mathbb{R}} \frac{\alpha_i(u_i, B_{i,j_i})}{\alpha_i(u_i, \mathbb{R})}\, d\varphi_i(u_i)$;

(c) $\displaystyle E(P_i(B_{i,j_i})\,P_q(B_{q,j_q})) = \int_{\mathbb{R}^2} \frac{\alpha_i(u_i, B_{i,j_i})}{\alpha_i(u_i, \mathbb{R})}\,\frac{\alpha_q(u_q, B_{q,j_q})}{\alpha_q(u_q, \mathbb{R})}\, d\varphi_{i,q}(u_i, u_q) \qquad (i \ne q)$,

where $\varphi_{i,q}$ denotes a bivariate marginal of $\varphi$.
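For a concrete feeling of (b), the moment $E(P_i(B))$ can be computed by one-dimensional quadrature once $\alpha_i(u, \cdot\,)$ and $\varphi_i$ are specified. The choices below (a rescaled Gaussian $\alpha_i$ and a standard Gaussian $\varphi_i$) are assumptions made only for this sketch:

```python
from scipy.integrate import quad
from scipy.stats import norm

# Illustrative assumption: alpha_i(u, B) = c * N(u, 1)(B), so that
# alpha_i(u, B)/alpha_i(u, R) = N(u, 1)(B), and phi_i = N(0, 1).
B = (0.0, 1.0)                                       # the set B_{i, j_i}
p_B = lambda u: norm.cdf(B[1], loc=u) - norm.cdf(B[0], loc=u)

# (b): E(P_i(B)) = integral of p_B(u) against dphi_i(u)
EPB, _ = quad(lambda u: p_B(u) * norm.pdf(u), -10, 10)
print(EPB)
# (c) is computed in the same way, integrating the product of the two ratios
# against the bivariate marginal phi_{i,q}; it factorizes under independence.
```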
So far, we have mentioned observations (i.e., statistical data) without giving a formal
definition of sample. This definition turns out to be useful since it allows a faster statement
of the concepts we shall investigate. Of course, it must be consistent with the hypothesis
of partial exchangeability. In order to guarantee this coherence one can go back to the
already cited de Finetti’s representation theorem, according to which
k infinite sequences ({Xi,n }n=1,2,... , i = 1, . . . , k) of random numbers have a distribution satisfying the hypothesis of partial exchangeability if and only if, conditionally on a
vector of random measures (P1 , . . . , Pk ), the random numbers are mutually stochastically
independent and the elements of the i–th sequence have common conditional p.d. Pi for
i = 1, . . . , k.
If the partial exchangeability condition is in force, then for any pair of classes of subsets of $\mathbb{R}$,
$$\{C_{i,j_i}\} \quad\text{and}\quad \{A_{i,t_i}\} \qquad \text{for } j_i = 1, \dots, n_i,\ t_i = 1, \dots, m_i \text{ and } i = 1, \dots, k,$$
one has
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\bigcap_{j_i=1}^{n_i}\{X_{i,j_i} \in C_{i,j_i}\}\,\Big|\, P_i(A_{i,t_i}), P_i(C_{i,j_i});\ j_i = 1, \dots, n_i;\ t_i = 1, \dots, m_i;\ u\Bigr) = \prod_{i=1}^{k}\prod_{j_i=1}^{n_i} P_i(C_{i,j_i}). \tag{3}$$
This justifies the following

Definition 2. $X = \{X_{1,n_1}, \dots, X_{k,n_k}\}$, with $X_{i,n_i} \equiv \{X_{i,j_i}\}_{j_i=1,\dots,n_i}$, is a sample of size $(n_1, \dots, n_k)$ drawn from $(P_1, \dots, P_k)$ if, for each $(n_1, \dots, n_k)$, $(m_1, \dots, m_k)$ and for each pair of families of measurable sets
$$(\{C_{i,j_i}\}_{j_i=1,\dots,n_i};\ i = 1, \dots, k), \qquad (\{A_{i,t_i}\}_{t_i=1,\dots,m_i};\ i = 1, \dots, k),$$
condition (3) is verified.

In this context, (b) and (c) of Proposition 3 represent
$$\mathcal{P}(\{X_{i,q} \in B_{i,j_i}\}), \qquad q = 1, 2, \dots,$$
and
$$\mathcal{P}\bigl(\{X_{i,c} \in B_{i,j_i}\} \cap \{X_{q,d} \in B_{q,j_q}\}\bigr), \qquad c, d = 1, 2, \dots,$$
respectively.
A question pertaining to Definition 1 and not yet answered concerns how to choose $\varphi(u)$. Now we are able to give some hints about this problem too. For instance, as for the i-th marginal $\varphi_i$ of $\varphi$, it can be observed that
$$\mathcal{P}(\{X_{i,q} \le x_i\}) = \int_{\mathbb{R}} \frac{\alpha_i(u_i, x_i)}{\alpha_i(u_i, \mathbb{R})}\, d\varphi_i(u_i) = F_i(x_i)$$
so that, given $F_i(x_i)$ and $\alpha_i(u_i, x_i)$, if sufficient conditions for the identifiability of the mixtures are satisfied, $\varphi_i(\,\cdot\,)$ may be determined. In any case, if these identifiability conditions are not satisfied, knowledge of $F_i$ and $\alpha_i$ can anyway yield useful information for choosing $\varphi_i$. Once again, the circularity between probability evaluations marks limits for the assessment of the probability laws of other non-observable random elements such as, for example, the vector $(U_1, \dots, U_k)$. Indeed, limits might follow from the more well-grounded assignment of p.d.'s for observable elements. To illustrate this aspect, since
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\{X_{i,j_i} \le x_i\}\Bigr) = \int_{\mathbb{R}^k}\prod_{i=1}^{k}\frac{\alpha_i(u_i, x_i)}{\alpha_i(u_i, \mathbb{R})}\, d\varphi(u)$$
holds true for every $x_i$, we note that if we are in a position to make precise the strength of the "correlation" existing between elements of different sequences, then we can gather useful information regarding the choice of a distinguished distribution $\varphi$ within the class of all k-dimensional p.d.'s with marginals agreeing with the conditions cited above for the $\varphi_i$'s.
3 Posterior distribution of $(P_1, \dots, P_k)$ and predictive distribution
In this section we will provide the posterior distribution of $(P_1, \dots, P_k)$, i.e. the distribution of
$$(P_1, \dots, P_k \mid X = x)$$
where $(P_1, \dots, P_k)$ has the same p.d. as given in Definition 1 and $X$ is a sample from it according to Definition 2.

Under the assumptions stated in the previous section, we can say (see Ferguson, 1973) that the p.d. of
$$(P_1, \dots, P_k \mid X = x, u)$$
is a product of functional Dirichlet distributions with parameters
$$\alpha_i(u_i, B_{i,j_i}) + n_{i,j_i}, \qquad i = 1, 2, \dots, k;\ j_i = 1, 2, \dots, m_i, \tag{4}$$
where $n_i(B_{i,j_i}) = n_{i,j_i}$ is the number of elements of $X_{i,n_i}$ taking values in $B_{i,j_i}$. Denoting by $\varphi(u|x)$ the conditional distribution function of $(U_1, \dots, U_k)$ given a realization of the sample, it is straightforward to show the following

Proposition 4. Let $(P_1, \dots, P_k)$ be a mixture of products of Dirichlet processes according to Definition 1. Let $X$ be a sample drawn from such a process as in Definition 2. Then the p.d. of $(P_1, \dots, P_k \mid x)$ is a mixture of products of functional Dirichlet distributions with parameters given by (4) and weight $\varphi(u \mid x)$.
As a consequence, the prior distributions (1) are updated as follows:
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\bigcap_{j_i=1}^{m_i-1}\{P_i(B_{i,j_i}) \le y_{i,j_i}\}\,\Big|\, x\Bigr) = \int_{\mathbb{R}^k}\prod_{i=1}^{k} D_i\bigl(y_{i,1}, \dots, y_{i,m_i-1} \,\big|\, \alpha_i(u_i, B_{i,1}) + n_{i,1}, \dots, \alpha_i(u_i, B_{i,m_i}) + n_{i,m_i}\bigr)\, d\varphi(u \mid x).$$
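On any fixed partition, the updating (4) amounts to adding the cell counts to the prior masses, as the following trivial sketch (with made-up numbers) makes explicit:

```python
import numpy as np

def posterior_dirichlet_params(alpha_vals, counts):
    """Updated parameters (4) for one group and a fixed u_i: the prior masses
    alpha_i(u_i, B_{i,j}) plus the cell counts n_{i,j}."""
    return np.asarray(alpha_vals, dtype=float) + np.asarray(counts, dtype=float)

# made-up prior masses on a 3-cell partition and observed cell counts
print(posterior_dirichlet_params([0.5, 1.0, 0.5], [3, 10, 7]))   # [ 3.5 11.   7.5]
```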
In order to determine the expression of $\varphi(u \mid x)$, we can resort to a slight modification of a result by Antoniak (1974, Lemma 1, p. 1164). Suppose that $\mu_i$ $(i = 1, \dots, k)$ is a measure which is equal to the Lebesgue measure on $\mathbb{R}$ except at the points where $\alpha_i(u_i, \cdot\,)$ has an atom and that, at these points, $\mu_i$ concentrates a unit mass. Let $\alpha_i(u_i, \cdot\,)$ be absolutely continuous with respect to $\mu_i$, for each $i$. Moreover, set $\mu = \prod_{j=1}^{k}\mu_j$.

In general, to each vector $x_{i,n_i}$ one can associate its distinct values $x^*_{i,1}, \dots, x^*_{i,r_i}$ and the corresponding frequencies $n_{i,j_i}$ (4). Accordingly, for each $i$ one has $\sum_{j_i=1}^{r_i} n_{i,j_i} = n_i$. Then, the above-mentioned result allows us to write
$$d_\mu\varphi(u|x) = \frac{\prod_{i=1}^{k}\frac{1}{\alpha_i(u_i,\mathbb{R})_{(n_i)}}\prod_{j_i=1}^{r_i}\alpha_i'(u_i, x^*_{i,j_i})\,\bigl(m_i(u_i, x^*_{i,j_i}) + 1\bigr)_{(n_{i,j_i}-1)}\ d_\mu\varphi(u)}{\displaystyle\int_{\mathbb{R}^k}\prod_{i=1}^{k}\frac{1}{\alpha_i(u_i,\mathbb{R})_{(n_i)}}\prod_{j_i=1}^{r_i}\alpha_i'(u_i, x^*_{i,j_i})\,\bigl(m_i(u_i, x^*_{i,j_i}) + 1\bigr)_{(n_{i,j_i}-1)}\ d_\mu\varphi(u)}$$
where $a_{(n)}$ is the Pochhammer symbol, namely $a_{(n)} = \prod_{j=1}^{n}(a + j - 1)$ for each $n \ge 1$ and $a_{(0)} \equiv 1$. Furthermore, $\alpha_i'(u_i, \cdot\,)$ is the Radon-Nikodym derivative of $\alpha_i(u_i, \cdot\,)$ with respect to $\mu_i$ and
$$m_i(u_i, x_{i,j_i}) = \begin{cases} \alpha_i'(u_i, x_{i,j_i}) & \text{if } x_{i,j_i} \text{ is an atom of } \alpha_i(u_i, \cdot\,)\\ 0 & \text{elsewhere.} \end{cases}$$

(4) By virtue of partial exchangeability, the consideration of the distinct values and the respective frequencies, in place of $x_{i,n_i}$, does not cause any loss of information.
We can now provide the predictive distribution:
$$\mathcal{P}\{X_{1,n_1+t_1} \le x_1, \dots, X_{k,n_k+t_k} \le x_k \mid x\} = E\Bigl(\prod_{i=1}^{k} P_i((-\infty, x_i]) \,\Big|\, x\Bigr) = \int_{\mathbb{R}^k}\prod_{i=1}^{k}\Bigl[\frac{\alpha_i(u_i, \mathbb{R})}{\alpha_i(u_i, \mathbb{R}) + n_i}\,\frac{\alpha_i(u_i, x_i)}{\alpha_i(u_i, \mathbb{R})} + \frac{n_i}{\alpha_i(u_i, \mathbb{R}) + n_i}\,\frac{n_i((-\infty, x_i])}{n_i}\Bigr]\, d_\mu\varphi(u|x),$$
for $t_j = 1, 2, \dots$; $j = 1, 2, \dots, k$.
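Given draws of $u$ approximately distributed according to $\varphi(\,\cdot\,|x)$ (obtained, e.g., by importance sampling with the weight derived above), the predictive d.f. can be approximated by a Monte Carlo average of the integrand. A sketch, under the illustrative assumption $\alpha_i(u, x)/\alpha_i(u, \mathbb{R}) = \Phi(x - u)$ with total masses constant in $u$ (both assumptions of this sketch, not of the paper):

```python
import numpy as np
from scipy.stats import norm

def predictive_cdf(xs, u_draws, c, n, ecdf_at_xs):
    """Monte Carlo approximation of the predictive d.f. above.

    xs[i]: evaluation point for group i; u_draws: (S, k) array of draws
    (approximately) from phi(u|x); c[i]: total mass alpha_i(u_i, R), here
    assumed constant in u_i; n[i]: group sample size; ecdf_at_xs[i]: the
    empirical d.f. n_i((-inf, xs[i]])/n_i."""
    total = 0.0
    for u in u_draws:
        term = 1.0
        for i in range(len(c)):
            w = c[i] / (c[i] + n[i])                  # weight of the prior guess
            term *= w * norm.cdf(xs[i] - u[i]) + (1 - w) * ecdf_at_xs[i]
        total += term
    return total / len(u_draws)

u_draws = np.zeros((100, 2))   # placeholder draws; replace with draws from phi(u|x)
print(predictive_cdf([0.5, 1.0], u_draws, c=[2.0, 5.0], n=[10, 20], ecdf_at_xs=[0.6, 0.7]))
```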
Often statistical data are in the form of grouped data, i.e. they are numbers of occurrences in each interval of a known partition of $\mathbb{R}$. In these cases, if we denote by $B_{i,j_i}$ such intervals for the i-th group of observations, with intervals arranged from left to right as $j_i$ varies from 1 to $m_i$, the predictive distribution is given by
$$\mathcal{P}\Bigl\{X_{1,n_1+t_1} \le x_1, \dots, X_{k,n_k+t_k} \le x_k \,\Big|\, \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr\} = \int_{\mathbb{R}^k}\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\{X_{i,n_i+t_i} \le x_i\}\,\Big|\, u,\ \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr)\, d\varphi\Bigl(u \,\Big|\, \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr). \tag{5}$$
For its computation notice that, from the hypothesis of partial exchangeability and Definition 1, we have
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\,\Big|\, u\Bigr) = \prod_{i=1}^{k}\int_0^1\!\cdots\!\int_0^1\prod_{j_i=1}^{m_i-1} y_{i,j_i}^{\,n_{i,j_i}}\Bigl(1 - \sum_{j_i=1}^{m_i-1} y_{i,j_i}\Bigr)^{n_{i,m_i}} dD_i\bigl(y_{i,1}, \dots, y_{i,m_i-1} \,\big|\, \alpha_i(u_i, B_{i,1}), \dots, \alpha_i(u_i, B_{i,m_i})\bigr) = \prod_{i=1}^{k}\frac{\prod_{j_i=1}^{m_i}\alpha_i(u_i, B_{i,j_i})_{(n_{i,j_i})}}{(\alpha_i(u_i, \mathbb{R}))_{(n_i)}},$$
where $n_i = \sum_{j_i=1}^{m_i} n_{i,j_i}$ for each $i = 1, \dots, k$. Hence,
$$d\varphi\Bigl(u \,\Big|\, \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr) = \frac{\prod_{i=1}^{k}\frac{1}{\alpha_i(u_i,\mathbb{R})_{(n_i)}}\prod_{j_i=1}^{m_i}\alpha_i(u_i, B_{i,j_i})_{(n_{i,j_i})}\ d\varphi(u)}{\displaystyle\int_{\mathbb{R}^k}\prod_{i=1}^{k}\frac{1}{\alpha_i(u_i,\mathbb{R})_{(n_i)}}\prod_{j_i=1}^{m_i}\alpha_i(u_i, B_{i,j_i})_{(n_{i,j_i})}\ d\varphi(u)}$$
and
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\{X_{i,n_i+t_i} \le x_i\}\,\Big|\, u,\ \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr) = \prod_{i=1}^{k}\mathcal{P}\Bigl(\{X_{i,n_i+t_i} \le x_i\}\,\Big|\, u_i,\ \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr).$$
Now, a result in Regazzini (1978, Sec. 3.a) yields
$$\mathcal{P}\Bigl\{X_{i,n_i+t_i} \le x_i \,\Big|\, u_i,\ \bigcap_{i=1}^{k} E^{(n_i)}_{n_{i,1},\dots,n_{i,m_i}}\Bigr\} = \begin{cases}\dfrac{n_i}{\alpha_i(u_i,\mathbb{R}) + n_i}\,\dfrac{n_{i,1}}{n_i}\,\dfrac{\alpha_i(u_i, x_i)}{\alpha_i(u_i, B_{i,1})} + \dfrac{\alpha_i(u_i,\mathbb{R})}{\alpha_i(u_i,\mathbb{R}) + n_i}\,\dfrac{\alpha_i(u_i, x_i)}{\alpha_i(u_i,\mathbb{R})}, & x_i \in B_{i,1}\\[2ex] \dfrac{n_i}{\alpha_i(u_i,\mathbb{R}) + n_i}\,\dfrac{1}{n_i}\Bigl[\displaystyle\sum_{j_i=1}^{j-1} n_{i,j_i} + n_{i,j}\,\frac{\alpha_i(u_i, B_{i,j}\cap(-\infty, x_i])}{\alpha_i(u_i, B_{i,j})}\Bigr] + \dfrac{\alpha_i(u_i,\mathbb{R})}{\alpha_i(u_i,\mathbb{R}) + n_i}\,\dfrac{\alpha_i(u_i, x_i)}{\alpha_i(u_i,\mathbb{R})}, & x_i \in B_{i,j}\ (j \ge 2).\end{cases}$$
Thus, we have completed the analysis of the elements needed to compute formula (5).
We close this section with two illustrative examples arising from two specific forms for
αi (ui , ·).
Example 1. Assume
$$\alpha_i(u_i, B_{i,j_i}) = \alpha_i(B_{i,j_i}) + \rho_i\,\delta_{u_i}(B_{i,j_i}), \qquad u_i \in \mathbb{R},\ i = 1, \dots, k;\ j_i = 1, \dots, m_i;$$
where
$$\delta_{u_i}(B_{i,j_i}) = \begin{cases} 0 & u_i \notin B_{i,j_i}\\ 1 & u_i \in B_{i,j_i}.\end{cases}$$
Then
$$\mathcal{P}\Bigl(\bigcap_{i=1}^{k}\bigcap_{j_i=1}^{m_i-1}\{P_i(B_{i,j_i}) \le y_{i,j_i}\}\Bigr) = \sum_{j_1=1}^{m_1}\cdots\sum_{j_k=1}^{m_k} H\Bigl(\prod_{i=1}^{k} B_{i,j_i}\Bigr)\times\prod_{i=1}^{k} D\bigl(y_{i,1}, \dots, y_{i,m_i-1} \,\big|\, \alpha_i(B_{i,1}) + \rho_i\delta_{1,j_i}, \dots, \alpha_i(B_{i,m_i}) + \rho_i\delta_{m_i,j_i}\bigr),$$
where $H$ is the probability measure on $\mathbb{R}^k$ induced by $\varphi(u)$, the $\rho_i$'s are non-negative constants for each $i$, and
$$\delta_{r,j_c} = \begin{cases} 0 & j_c \ne r\\ 1 & j_c = r.\end{cases}$$
If $\varphi^{(k)}(u)$ and $\alpha_i'(\,\cdot\,)$ denote the density functions of $\varphi(u)$ and $\alpha_i(\,\cdot\,)$, respectively, we obtain
$$d_\mu\varphi(u|x) = \frac{\prod_{i=1}^{k}\prod_{j_i=1}^{r_i}\bigl(\alpha_i'(x^*_{i,j_i})(1 - \delta_{u_i, x^*_{i,j_i}}) + \rho_i\,\delta_{u_i, x^*_{i,j_i}}\bigr)\bigl(1 + \rho_i\,\delta_{u_i, x^*_{i,j_i}}\bigr)_{(n_{i,j_i}-1)}\ \varphi^{(k)}(u)\, d\mu}{\displaystyle\int_{\mathbb{R}^k}\prod_{i=1}^{k}\prod_{j_i=1}^{r_i}\bigl(\alpha_i'(x^*_{i,j_i})(1 - \delta_{u_i, x^*_{i,j_i}}) + \rho_i\,\delta_{u_i, x^*_{i,j_i}}\bigr)\bigl(1 + \rho_i\,\delta_{u_i, x^*_{i,j_i}}\bigr)_{(n_{i,j_i}-1)}\ \varphi^{(k)}(u)\, d\mu},$$
where $\mu$ is the product of k measures whose i-th component, $\mu_i$, coincides with the Lebesgue measure except at the points $x^*_{i,j_i}$ $(j_i = 1, \dots, r_i)$, at each of which it concentrates a unit mass.
Example 2. Assume that the functions $\alpha_i(u_i, x)$ and $\varphi(u)$ are absolutely continuous with respect to the Lebesgue measure. Then
$$d\varphi(u|x) = \frac{\prod_{i=1}^{k}\frac{1}{\alpha_i(u_i,\mathbb{R})_{(n_i)}}\prod_{j_i=1}^{r_i}\alpha_i'(u_i, x^*_{i,j_i})\,(n_{i,j_i} - 1)!\ \varphi^{(k)}(u)\, du}{\displaystyle\int_{\mathbb{R}^k}\prod_{i=1}^{k}\frac{1}{\alpha_i(u_i,\mathbb{R})_{(n_i)}}\prod_{j_i=1}^{r_i}\alpha_i'(u_i, x^*_{i,j_i})\,(n_{i,j_i} - 1)!\ \varphi^{(k)}(u)\, du}.$$
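Example 2 lends itself to direct numerical normalization when k = 1. The sketch below assumes, purely for illustration, an $\alpha(u, \cdot\,)$ with density $c\,N(u,1)$ and a standard Gaussian density for $\varphi^{(1)}$; it evaluates the posterior density of $u$ at a point by quadrature:

```python
import math
import numpy as np
from scipy.integrate import quad
from scipy.special import poch          # Pochhammer symbol: poch(a, n) = a_(n)
from scipy.stats import norm

# Assumptions of this sketch: k = 1, alpha(u, .) with density c * N(u, 1),
# phi^(1) = standard Gaussian density; x_star = distinct values, n_j = multiplicities.
c = 3.0
x_star = np.array([-0.4, 0.7, 1.1])
n_j = np.array([2, 1, 4])
fact = np.prod([math.factorial(m - 1) for m in n_j])   # the (n_j - 1)! factors

def unnorm_post(u):
    # (1 / alpha(u,R)_(n)) * prod_j alpha'(u, x*_j) (n_j - 1)! * phi^(1)(u)
    return np.prod(c * norm.pdf(x_star - u)) * fact / poch(c, int(n_j.sum())) * norm.pdf(u)

Z, _ = quad(unnorm_post, -10.0, 10.0)
print(unnorm_post(0.5) / Z)             # normalized posterior density at u = 0.5
```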
Finally, it is relevant to stress the possibility of extending this framework to more general probability spaces than those analyzed here. In any case, it is worthwhile to confine oneself to standard Borel spaces if one wants to obtain results similar to those of Proposition 4 (see Antoniak, 1974).
4 Prior distribution of a vector of associative means
As recalled in the Introduction, in a previous paper we tried to clarify the usefulness of associative means in nonparametric problems. In our opinion, those considerations maintain their validity also in a context of partial exchangeability. In this section, we set $F_i(x) := P_i((-\infty, x])$ and study the prior distribution of the vector of associative means
$$\Bigl(\psi_1^{-1}\Bigl(\int_{-\infty}^{+\infty}\psi_1(x)\, dF_1(x)\Bigr), \dots, \psi_k^{-1}\Bigl(\int_{-\infty}^{+\infty}\psi_k(x)\, dF_k(x)\Bigr)\Bigr), \tag{6}$$
which is a random vector since the probability distributions $P_1, \dots, P_k$ are random according to Definition 1. In the sequel we will proceed through the following intermediate steps:

- evaluation of the distribution of each component in (6);
- evaluation of the joint distribution when $\psi_i(x) = x$, for any $i$;
- evaluation of the joint distribution with arbitrary functions $\psi_i$.

First, we will assume the following

Condition C.

(i) The functions $\psi_i$, besides being strictly increasing, are continuous on $[0, +\infty)$ with $\psi_i(0) = 0$;

(ii) $\alpha_i(u, x) = 0$ for any negative $x$, for each $i = 1, \dots, k$ and $u$ in $\mathbb{R}$;

(iii) The functions $\alpha_i(u, x)$ are absolutely continuous (with respect to the Lebesgue measure), for any $u \in \mathbb{R}$.

Notice that point (ii) of Condition C, together with Proposition 1, implies
$$\mathcal{P}(\{F_i(x) = 0,\ \forall x < 0\}) = 1 \qquad \text{for every } i,$$
so that, in order to determine the distribution function (d.f. for short) of the random vector (6), it is sufficient to study the vector
$$\Bigl(\psi_j^{-1}\Bigl(\int_0^{+\infty}\psi_j(x)\, dF_j(x)\Bigr),\ j = 1, \dots, k\Bigr).$$
4.1 The marginal distribution of the i-th component of vector (6)
After considering the r.n.
$$\int_\tau^T\{1 - F_i(t)\}\, dt, \qquad 0 \le \tau < T < +\infty, \tag{7}$$
we will first look for assumptions that allow us to apply the results in Cifarelli and Regazzini (1979) (part 2, number 2.2). In the first place, since (by Condition C)
$$\int_0^{+\infty} t\, dF_i(t) \overset{\text{a.s.}}{=} \int_0^{+\infty}\{1 - F_i(t)\}\, dt, \tag{8}$$
we must assign a condition in order that the d.f. of the r.n. (8) can be obtained from the one of (7), with $\tau = 0$, by taking the limit as $T \to +\infty$. To this end we observe that, by point (b) in Proposition 3, one has
$$E\Bigl(\int_\tau^T(1 - F_i(t))\, dt\Bigr) = E\Bigl(\int_\tau^S(1 - F_i(t))\, dt\Bigr) - \int_T^S\Bigl\{1 - \int_{\mathbb{R}}\frac{\alpha_i(u, t)}{\alpha_i(u, \mathbb{R})}\, d\varphi_i(u)\Bigr\}\, dt$$
for every $S > T > \tau \ge 0$. If the condition
$$\int_{\mathbb{R}}\frac{1}{\alpha_i(u, \mathbb{R})}\Bigl(\int_0^{+\infty} t\, d_t\alpha_i(u, t)\Bigr)\, d\varphi_i(u) = E(X_{i,j}) < +\infty \tag{9}$$
holds true, then
$$\lim_{T,S\to+\infty}\int_T^S\Bigl\{1 - \int_{\mathbb{R}}\frac{\alpha_i(u, t)}{\alpha_i(u, \mathbb{R})}\, d\varphi_i(u)\Bigr\}\, dt = 0.$$
Let us denote by $\sigma_{i,T}(\xi, \tau)$ the d.f. of the r.n. (7) and by $\sigma_i(\xi)$ the d.f. of the r.n. (8). If (9) holds, then, as $T \to +\infty$,
$$\sigma_{i,T}(\xi, 0) \to \sigma_i(\xi) \qquad \text{(in a weak sense)}.$$
On the basis of the previous remarks, the conditional d.f. of the r.n. (7), given $U_i = u_i$, coincides with the one deduced in the case of a Dirichlet process with parameter $\alpha_i(u, \cdot\,)$. Hereafter, we denote such a d.f. by $\sigma_{i,T}(\xi, \tau|u_i)$.

Now, if (9) is met we can write
$$\sigma_i(\xi) = \lim_{T\to+\infty}\sigma_{i,T}(\xi, 0) = \lim_{T\to+\infty}\int_{\mathbb{R}}\sigma_{i,T}(\xi, 0|u)\, d\varphi_i(u)$$
and, by the dominated convergence theorem,
$$\sigma_i(\xi) = \int_{\mathbb{R}}\lim_{T\to+\infty}\sigma_{i,T}(\xi, 0|u)\, d\varphi_i(u).$$
We have seen (Lemma 7 of the 2nd part of our previous paper) that, if the condition
$$\int_0^{+\infty}\log v\, d\alpha_i(u, v) < \infty \tag{10}$$
is satisfied, then it holds that
$$\lim_{T\to+\infty}\sigma_{i,T}(\xi, 0|u) = \int_0^\xi(\xi - t)^{\alpha_i(u,\mathbb{R})-1}\,\mu'(u, t)\, dt, \qquad \xi \ge 0, \tag{11}$$
with
$$\mu'(u, t) = e^{-\int_0^{+\infty}\log|v - t|\, d\alpha_i(u, v)}\,\frac{\sin\pi\alpha_i(u, t)}{\pi}.$$
If (10) holds true for $u \in \Theta \subseteq \mathbb{R}$ with
$$\int_\Theta d\varphi_i(u) = 1, \tag{10 bis}$$
then
$$\sigma_i(\xi) = \int_\Theta\sigma_i^*(\xi|u)\, d\varphi_i(u), \qquad \xi \ge 0,$$
where $\sigma_i^*(\xi|u)$ coincides with the right-hand side of (11).
We are now able to state the following

Theorem 1. Let $(P_1, \dots, P_k)$ be a process defined according to Definition 1 and whose parameters are such that Condition C, (9), (10) and (10 bis) hold true. Then the d.f. $\sigma_i(\xi)$ of the functional
$$\int_{-\infty}^{+\infty} t\, dF_i(t) \overset{\text{a.s.}}{=} \int_0^{+\infty} t\, dF_i(t)$$
is given by
$$\sigma_i(\xi) = \begin{cases}\displaystyle\int_\Theta\sigma_i^*(\xi|u)\, d\varphi_i(u) & \xi \ge 0\\[1ex] 0 & \xi < 0\end{cases}$$
where $\sigma_i^*(\xi|u)$ is the right-hand side of equation (11).
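Formula (11) can be evaluated numerically by nested quadrature. In the sketch below we assume, for illustration only, that $\alpha(u, \cdot\,)$ has density $\bar{\alpha}e^{-v}$ on $(0, +\infty)$ with total mass $\bar{\alpha} \le 1$ (so that $\alpha(u, t) \in [0,1]$ and the sine factor stays nonnegative); every numeric choice is an assumption of this sketch, not of the paper:

```python
import numpy as np
from scipy.integrate import quad

abar = 0.8                                                  # alpha(u, R), assumed <= 1
alpha_cdf = lambda t: abar * (1.0 - np.exp(-t))             # alpha(u, t)

def mu_prime(t):
    # mu'(u, t) = exp(-int log|v - t| dalpha(u, v)) * sin(pi * alpha(u, t)) / pi
    inner, _ = quad(lambda v: np.log(abs(v - t)) * abar * np.exp(-v),
                    0.0, 50.0, points=[t], limit=200)
    return np.exp(-inner) * np.sin(np.pi * alpha_cdf(t)) / np.pi

def sigma_star(xi):
    # sigma*(xi | u) = int_0^xi (xi - t)^(abar - 1) * mu'(u, t) dt, as in (11)
    val, _ = quad(lambda t: (xi - t) ** (abar - 1.0) * mu_prime(t),
                  0.0, xi, limit=200)
    return val

print([round(sigma_star(x), 3) for x in (0.5, 1.0, 3.0)])   # should increase toward 1
```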
Theorem 1 can be exploited in order to determine the d.f. of the r.n.
$$\psi_i^{-1}\Bigl(\int_{\mathbb{R}}\psi_i(x)\, dF_i(x)\Bigr).$$
To this end, we point out a preliminary remark and accordingly modify assumptions (9), (10), (10 bis).

In view of Condition C one has
$$\int_{\mathbb{R}}\psi_i(x)\, dF_i(x) = \int_0^{+\infty}\bigl\{1 - F_i(\psi_i^{-1}(t))\bigr\}\, dt$$
and the distribution of
$$\bigl(1 - F_i(\psi_i^{-1}(t_1)), \dots, 1 - F_i(\psi_i^{-1}(t_k))\bigr), \qquad t_1 \le \cdots \le t_k,$$
is a mixture of Dirichlet distributions directed by the parameters $\alpha_i(\psi_i^{-1}(\,\cdot\,))$ (see Cifarelli and Regazzini, 1979, Section 1).
Consequently, we can state the following

Theorem 2. Let $(P_1, \dots, P_k)$ be as in Definition 1 and assume its parameters satisfy Condition C. Suppose the following assumptions hold:

(i) $\displaystyle\int_{\mathbb{R}}\int_0^{+\infty}\psi_i(t)\, d\alpha_i(u, t)\, d\varphi_i(u) < +\infty$;

(ii) $\displaystyle\int_0^{+\infty}\log\psi_i(t)\, d\alpha_i(u, t) > -\infty$ for $u \in \Theta$ with $\int_\Theta d\varphi_i(u) = 1$.

Then the d.f. of the functional
$$\psi_i^{-1}\Bigl(\int_0^{+\infty}\psi_i(t)\, dF_i(t)\Bigr)$$
is given by
$$\sigma_{\psi_i}(\xi) = \mathbf{1}_{[0,\infty)}(\xi)\int_{\mathbb{R}}\int_0^\xi(\psi_i(\xi) - \psi_i(z))^{\alpha_i(u,\mathbb{R})-1}\,\frac{\sin\pi\alpha_i(u, z)}{\pi}\, e^{-\zeta(u,z)}\, d\psi_i(z)\, d\varphi_i(u) \tag{12}$$
where $\zeta(u, z) := \int_0^{+\infty}\log|\psi_i(s) - \psi_i(z)|\, d\alpha_i(u, s)$.

Proof. See Theorem 3, part 2, of Cifarelli and Regazzini (1979), together with the arguments developed to deduce Theorem 1.
4.2 The distribution of a vector of k associative means
The arguments developed so far, combined with the structure of the random process of Definition 1, allow us to deduce in a simple way the joint distribution of the vector (6). We will proceed under assumptions C, (9), (10) and (10 bis), as in the previous section. To simplify the treatment, we will first consider the case
$$\psi_i(x) = x, \qquad i = 1, \dots, k.$$
Now, assumption (9) has an interesting and useful implication. Since $\int_0^T x\, dF_i(x)$ converges, in mean of order 1, to the r.n.
$$\int_0^{+\infty} x\, dF_i(x), \qquad i = 1, \dots, k,$$
then, for every vector $(\lambda_1, \dots, \lambda_k)$ of real numbers, we will have
$$\sum_{i=1}^{k}\lambda_i\int_0^T x\, dF_i(x) \overset{\mathcal{L}}{\to} \sum_{i=1}^{k}\lambda_i\int_0^{+\infty} x\, dF_i(x), \qquad T \to +\infty,$$
where $\overset{\mathcal{L}}{\to}$ denotes convergence in law. On the other hand, the above equation implies (see C.R. Rao, 1975, (xi) page 123) that
$$\Bigl(\int_0^T x\, dF_1(x), \dots, \int_0^T x\, dF_k(x)\Bigr) \overset{\mathcal{L}}{\to} \Bigl(\int_0^{+\infty} x\, dF_1(x), \dots, \int_0^{+\infty} x\, dF_k(x)\Bigr). \tag{13}$$
Denoting by $\sigma(\xi_1, \dots, \xi_k)$ the d.f. of the vector (6) with $\psi_i(x) = x$, $i = 1, \dots, k$, and taking into account Definition 1, we will have
$$\sigma(\xi_1, \dots, \xi_k) = \int_{\mathbb{R}^k}\prod_{i=1}^{k}\sigma_i^*(\xi_i|u_i)\, d\varphi(u), \qquad (\xi_i \ge 0,\ \forall i).$$
Analogously, if assumptions (i) and (ii) of Theorem 2 hold for every $i$, the d.f. of (6), without restrictions on the $\psi_i$'s, will satisfy, for every $\xi_1, \dots, \xi_k \ge 0$,
$$\sigma_{\psi_1,\dots,\psi_k}(\xi_1, \dots, \xi_k) = \int_{\mathbb{R}^k}\prod_{i=1}^{k}\int_0^{\xi_i}(\psi_i(\xi_i) - \psi_i(z))^{\alpha_i(u_i,\mathbb{R})-1}\,\frac{\sin\pi\alpha_i(u_i, z)}{\pi}\,\exp\Bigl(-\int_0^{+\infty}\log|\psi_i(s) - \psi_i(z)|\, d\alpha_i(u_i, s)\Bigr)\, d\psi_i(z)\, d\varphi(u).$$
The following theorem makes the above results precise.
Theorem 3. Let $(P_1, \dots, P_k)$ be a process defined according to Definition 1 and whose parameters satisfy Condition C. Suppose that the following assumptions hold:

(i) $\displaystyle\int_{\mathbb{R}}\int_0^{+\infty}\psi_i(t)\, d\alpha_i(u, t)\, d\varphi_i(u) < +\infty$, $i = 1, \dots, k$;

(ii) $\displaystyle\int_0^{+\infty}\log\psi_i(t)\, d\alpha_i(u, t) > -\infty$, $u_i \in \Theta_i \subseteq \mathbb{R}$, $i = 1, \dots, k$, with $\int_{\prod_1^k\Theta_i} d\varphi(u) = 1$.

Then the d.f. of the random vector
$$\Bigl(\psi_i^{-1}\Bigl(\int_0^{+\infty}\psi_i(t)\, dF_i(t)\Bigr),\ i = 1, \dots, k\Bigr)$$
is given by
$$\sigma_{\psi_1,\dots,\psi_k}(\xi_1, \dots, \xi_k) = \int_{\mathbb{R}^k}\prod_{i=1}^{k}\mathbf{1}_{[0,\infty)}(\xi_i)\int_0^{\xi_i}(\psi_i(\xi_i) - \psi_i(z))^{\alpha_i(u_i,\mathbb{R})-1}\,\frac{\sin\pi\alpha_i(u_i, z)}{\pi}\, e^{-\zeta_i(u_i,z)}\, d\psi_i(z)\, d\varphi(u) \tag{12 bis}$$
where $\zeta_i(u_i, z) := \int_0^{+\infty}\log|\psi_i(s) - \psi_i(z)|\, d\alpha_i(u_i, s)$ for each $i = 1, \dots, k$.
5 An example of processes with non-absolutely continuous parameters $\alpha_i(u_i, x)$
We will devote this section to deducing the d.f. of the vector (6) in the case where the process is defined as in Example 1 of Section 3. Points (i) and (ii) of Condition C are still valid, whereas it is obvious that point (iii) of the same condition cannot hold true in this case. If the function $\alpha_i(\,\cdot\,)$ in Example 1 is absolutely continuous, then the d.f. $\alpha_i(u_i, x)$ turns out to be absolutely continuous as well, for every $x$, except at the point $x = u_i$ where it has an atom equal to $\rho_i$.

The condition arising from the combination of points (i), (ii) of C with the absolute continuity of the d.f. $\alpha_i(\,\cdot\,)$ of Example 1 in Section 3 will be designated by C'.

The discontinuity of $\alpha_i(u_i, \cdot\,)$ does not allow us to apply directly the results in the 2nd part of Cifarelli and Regazzini (1979).

We first deal with the problem of evaluating the d.f. of the i-th component of vector (6), with $\psi_i(x) = x$. Moreover, it is assumed that, in addition to Condition C', (9), (10) and (10 bis) hold true. The same arguments employed to deduce (11) of Cifarelli and Regazzini (1979), part 2, lead to
$$\int_0^{+\infty}\frac{d\sigma_i(\xi)}{(s + \xi)^{\alpha_i + \rho_i}} = \exp\Bigl(-\int_0^{+\infty}\log(s + v)\, d\alpha_i(v)\Bigr)\int_0^{+\infty}\frac{d\varphi_i(v)}{(s + v)^{\rho_i}}, \qquad s > 0, \tag{14}$$
where $\alpha_i \equiv \alpha_i(\mathbb{R})$. After denoting by $\mathcal{L}\{f(x)\}$ the Laplace transform of $f(x)$, (14) yields
$$\int_0^{+\infty}\frac{d\sigma_i(\xi)}{(s + \xi)^{\alpha_i + \rho_i}} = \mathcal{L}\Bigl\{\frac{x^{\alpha_i + \rho_i - 1}}{\Gamma(\alpha_i + \rho_i)}\int_0^{+\infty} e^{-\xi x}\, d\sigma_i(\xi)\Bigr\};$$
Theorem 1 of the 2nd part of our previous paper gives
$$\exp\Bigl(-\int_0^{+\infty}\log(s + v)\, d\alpha_i(v)\Bigr) = \int_0^{+\infty}\frac{d\sigma_i^*(t)}{(s + t)^{\alpha_i}}$$
with
$$\sigma_i^*(t) = \int_0^t(t - \xi)^{\alpha_i - 1}\mu'(\xi)\, d\xi, \qquad \mu'(t) = \exp\Bigl(-\int_0^{+\infty}\log|v - t|\, d\alpha_i(v)\Bigr)\frac{\sin\pi\alpha_i(t)}{\pi}.$$
Therefore
$$\int_0^{+\infty}\frac{d\sigma_i(\xi)}{(s + \xi)^{\alpha_i + \rho_i}} = \int_0^{+\infty}\frac{d\sigma_i^*(t)}{(s + t)^{\alpha_i}}\,\mathcal{L}\Bigl\{\frac{x^{\rho_i - 1}}{\Gamma(\rho_i)}\int_0^{+\infty} e^{-xv}\, d\varphi(v)\Bigr\}$$
and
$$\mathcal{L}\Bigl\{\frac{x^{\alpha_i + \rho_i - 1}}{\Gamma(\alpha_i + \rho_i)}\int_0^{+\infty} e^{-\xi x}\, d\sigma_i(\xi)\Bigr\} = \mathcal{L}\Bigl\{\frac{x^{\alpha_i - 1}}{\Gamma(\alpha_i)}\int_0^{+\infty} e^{-\xi x}\, d\sigma_i^*(\xi)\Bigr\}\,\mathcal{L}\Bigl\{\frac{x^{\rho_i - 1}}{\Gamma(\rho_i)}\int_0^{+\infty} e^{-xv}\, d\varphi(v)\Bigr\}.$$
From the last equation it follows that the Laplace-Stieltjes transform $\mathcal{L}_S\{\sigma_i(\xi)\}$ of $\sigma_i(\,\cdot\,)$ is given by
$$\mathcal{L}_S\{\sigma_i(\xi)\} = \int_0^1\frac{z^{\alpha_i - 1}(1 - z)^{\rho_i - 1}}{B(\alpha_i, \rho_i)}\Bigl(\int_0^{+\infty}\int_0^{+\infty} e^{-x(\xi z + v(1 - z))}\, d\sigma_i^*(\xi)\, d\varphi(v)\Bigr)\, dz.$$
The expression in round brackets on the right-hand side of the above equation is the Laplace-Stieltjes transform of the d.f. of the r.n. $zX_1 + (1 - z)X_2$, with $z$ in $(0, 1)$, $X_1$ and $X_2$ being independent r.n.'s whose d.f.'s are $\sigma_i^*(\,\cdot\,)$ and $\varphi(\,\cdot\,)$, respectively. Therefore, for $\xi \ge 0$, one has
$$\sigma_i(\xi) = \int_0^1\int_0^{\xi/z}\frac{z^{\alpha_i - 1}(1 - z)^{\rho_i - 1}}{B(\alpha_i, \rho_i)}\,\varphi\Bigl(\frac{\xi - z x_1}{1 - z}\Bigr)\, d\sigma_i^*(x_1)\, dz, \qquad \rho_i > 0,$$
$$\sigma_i(\xi) = \sigma_i^*(\xi), \qquad \rho_i = 0.$$
Analogously, if condition C' and the hypotheses (i) and (ii) of Theorem 2 are in force, the d.f. of the r.n. $\psi_i^{-1}\bigl(\int_{-\infty}^{+\infty}\psi_i(x)\, dF_i(x)\bigr)$ can be expressed as
$$\sigma_{\psi_i}(\xi) = \begin{cases} 0 & \xi < 0\\[1ex] \displaystyle\int_0^1\frac{z^{\alpha_i - 1}(1 - z)^{\rho_i - 1}}{B(\alpha_i, \rho_i)}\int_0^{\psi_i^{-1}(\psi_i(\xi)/z)}\varphi\Bigl(\psi_i^{-1}\Bigl(\frac{\psi_i(\xi) - z\psi_i(v)}{1 - z}\Bigr)\Bigr)\, d\sigma_{\psi_i}^*(v)\, dz & \xi \ge 0\end{cases}$$
when $\rho_i > 0$, and as
$$\sigma_{\psi_i}(\xi) = \sigma_{\psi_i}^*(\xi) = \begin{cases} 0 & \xi < 0\\[1ex] \displaystyle\int_0^\xi\frac{\sin\pi\alpha_i(z)}{\pi}\,[\psi_i(\xi) - \psi_i(z)]^{\alpha_i - 1}\, e^{-\zeta_i(z)}\, d\psi_i(z) & \xi \ge 0\end{cases}$$
when $\rho_i = 0$, where we have set $\zeta_i(z) := \int_0^{+\infty}\log|\psi_i(s) - \psi_i(z)|\, d\alpha_i(s)$.
The expression defining $\sigma_{\psi_i}(\xi)$ for $\rho_i > 0$ is the d.f. of
$$V_i = \psi_i^{-1}\bigl(z_i\psi_i(M_{\psi_i}) + (1 - z_i)\psi_i(X_i)\bigr), \qquad z_i \in (0, 1),$$
where $M_{\psi_i}$ and $X_i$ are independent r.n.'s with d.f.'s given by $\sigma_{\psi_i}^*(\,\cdot\,)$ and $\varphi(\,\cdot\,)$, respectively, and $\sigma_{\psi_i}^*(\,\cdot\,)$ is the d.f. of the associative mean characterized by $\psi_i$ on the basis of Theorem 3 of our already cited article Cifarelli and Regazzini (1979).

As a consequence, $\sigma_{\psi_i}(\,\cdot\,)$ is a mixture of such distributions when $Z_i$ has a Beta distribution with parameters $(\alpha_i, \rho_i)$ and $\rho_i > 0$. On the other hand, when $\rho_i = 0$ the Beta distribution degenerates at 1 and $\sigma_{\psi_i}(\,\cdot\,)$ reduces to the d.f. of $M_{\psi_i}$. Also note that $V_i$ can be seen as an associative mean; a sampling sketch of this representation follows.
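The representation of $V_i$ suggests an immediate Monte Carlo scheme: draw $Z_i$ from a Beta$(\alpha_i, \rho_i)$ distribution, draw $M_{\psi_i}$ and $X_i$ independently, and combine them through $\psi_i$. In the sketch below, the samplers standing in for $\sigma_{\psi_i}^*$ and $\varphi$, as well as $\psi_i(x) = x^2$, are illustrative placeholders, not choices made in the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

a_i, rho_i = 0.8, 1.5                      # alpha_i and rho_i of the Beta mixing law
psi = lambda x: x ** 2                     # an illustrative associative-mean function
psi_inv = lambda y: np.sqrt(y)

def sample_V(size):
    z = rng.beta(a_i, rho_i, size)
    M = rng.gamma(2.0, 1.0, size)          # stand-in sampler for the d.f. sigma*_psi
    X = np.abs(rng.normal(1.0, 0.5, size)) # stand-in sampler for the d.f. phi
    return psi_inv(z * psi(M) + (1 - z) * psi(X))

print(sample_V(5))
```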
Finally, let us analyze the d.f. of vector (6) under the particular process we are considering.
Under conditions C' and (i), (ii) of Theorem 2 for any $i = 1, \dots, k$, if $\rho_i > 0$ for each $i$, then
$$\sigma_{\psi_1,\dots,\psi_k}(\xi_1, \dots, \xi_k) = \prod_{i=1}^{k}\mathbf{1}_{[0,\infty)}(\xi_i)\int_0^1\!\cdots\!\int_0^1\Bigl(\prod_{i=1}^{k}\frac{z_i^{\alpha_i - 1}(1 - z_i)^{\rho_i - 1}}{B(\alpha_i, \rho_i)}\Bigr)\times\Bigl\{\int_0^{\psi_1^{-1}(\psi_1(\xi_1)/z_1)}\!\cdots\!\int_0^{\psi_k^{-1}(\psi_k(\xi_k)/z_k)} g(\xi, u, z)\, d\Bigl(\prod_{i=1}^{k}\sigma_{\psi_i}^*(u_i)\Bigr)\Bigr\}\prod_{i=1}^{k} dz_i \tag{15}$$
where
$$g(\xi, u, z) = \varphi\Bigl(\psi_1^{-1}\Bigl(\frac{\psi_1(\xi_1) - z_1\psi_1(u_1)}{1 - z_1}\Bigr), \dots, \psi_k^{-1}\Bigl(\frac{\psi_k(\xi_k) - z_k\psi_k(u_k)}{1 - z_k}\Bigr)\Bigr).$$
If some $\rho_i$ is equal to 0, the previous formula is still valid provided that the corresponding $Z_i$ degenerates at 1. In this case, $\varphi(\,\cdot\,)$ has to be replaced by the marginal distribution of the components corresponding to the $\rho_i$'s equal to 0.

Equation (15) also admits an interpretation similar to that of $\sigma_{\psi_i}$. Consider the vector $V = (V_1, \dots, V_k)$ whose generic component has been previously defined. Let the random vectors $M \equiv \{M_{\psi_1}, \dots, M_{\psi_k}\}$ and $W \equiv \{X_1, \dots, X_k\}$ be independent with d.f.'s $\prod_1^k\sigma_{\psi_i}^*(\xi_i)$ and $\varphi(x_1, \dots, x_k)$, respectively. Then the expression in curly brackets in (15) is the d.f. of $V$, and $\sigma_\psi(u_1, \dots, u_k)$ is a mixture of such d.f.'s when $(Z_1, \dots, Z_k)$ has density function on $[0,1]^k$ coinciding with
$$\prod_{i=1}^{k}\frac{z_i^{\alpha_i - 1}(1 - z_i)^{\rho_i - 1}}{B(\alpha_i, \rho_i)},$$
with the proviso that when $\rho_i = 0$ the corresponding $Z_i$ degenerates at 1.
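The same sampling scheme extends componentwise to the vector case of (15): independent Beta variables $Z_i$, a vector $M$ with independent components and a vector $X$ drawn from a joint $\varphi$, possibly with dependent components. Again, all distributional choices below are stand-ins assumed only for this sketch:

```python
import numpy as np

rng = np.random.default_rng(4)

a = np.array([0.8, 1.2]); rho = np.array([1.5, 0.5])   # illustrative (alpha_i, rho_i)

def sample_V_vector(size):
    z = rng.beta(a, rho, size=(size, 2))               # independent Z_i ~ Beta(a_i, rho_i)
    M = rng.gamma(2.0, 1.0, size=(size, 2))            # stand-ins for M_{psi_i}, independent
    X = np.abs(rng.multivariate_normal([1.0, 1.0],     # stand-in for a draw from a joint phi
                                       [[0.25, 0.15], [0.15, 0.25]], size))
    return z * M + (1 - z) * X                         # psi_i = identity here

print(sample_V_vector(3))
```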
6 The posterior distribution of vector (6) of associative means
As in Cifarelli and Regazzini (1979) we shall address the problem by suggesting a recursive
formula. We prefer such an approach to the one which aims at deriving the explicit
expression of the conditional d.f.’s that will occur throughout.
We shall confine ourselves to the case where data arise from a sample defined as in Definition 2. We shall then exploit the result stated in Proposition 4. Note that the parameters of the process in Proposition 4 reduce to
$$\beta_i(u_i, x) := \beta_i(u_i, (-\infty, x]) = \alpha_i(u_i, x) + n_i((-\infty, x]), \qquad i = 1, \dots, k,$$
where $n_i((-\infty, x])$ is the number of observations belonging to the i-th group that are not greater than $x$. Hence, unlike Section 4, we now have to take into account the fact that $\beta_i(u_i, x)$ is not continuous, its jumps being located at the points corresponding to the realizations of the i-th subsample.
Under the conditions recalled in the previous sections, the structure of the conditional process $(P_1, \dots, P_k \mid X)$ makes it possible to determine the d.f. $\sigma_{\psi_1,\dots,\psi_k}(\,\cdot \mid X)$ of the random vector
$$\Bigl(\psi_1^{-1}\Bigl(\int_{-\infty}^{+\infty}\psi_1(x)\, dF_1(x)\Bigr), \dots, \psi_k^{-1}\Bigl(\int_{-\infty}^{+\infty}\psi_k(x)\, dF_k(x)\Bigr)\Bigr)\ \Big|\ X \tag{16}$$
and it will be of the form
$$\sigma_{\psi_1,\dots,\psi_k}(\xi \mid x) = \int_{\mathbb{R}^k}\prod_{i=1}^{k}\sigma_{\psi_i}^*(\xi_i \mid x_{i,n_i})\, d\varphi(u|x), \qquad \xi_i \ge 0,\ i = 1, \dots, k,$$
where $\sigma_{\psi_i}^*(\xi_i|x_{i,n_i})$ is the d.f. whose density can be recursively obtained from equation (23) in the 2nd part of Cifarelli and Regazzini (1979). It will suffice to replace, in that equation, $\alpha_i(\,\cdot\,)$ with $\alpha_i(u, \cdot\,)$, $x$ with the subsample $x_{i,n_i}$ and $\psi$ with $\psi_i$.

Similar changes in (15) make it possible to evaluate the d.f. of the random variable (16) when considering the process described in Example 1.

The previous formulae reveal that the impact the observations of a given group have on the d.f. of the associative mean of another group depends on the kind of stochastic dependence characterizing $\varphi(\,\cdot\,)$.
7 Concluding remarks
In conclusion, we wish to mention two possible developments of the analysis that might be worthy of investigation.

First of all, it would be interesting to study the main features of alternative processes, different from the one defined by (1), that can be used as prior distributions under the assumption of partial exchangeability for the observables.

Secondly, it should be underlined that knowledge of the distributions of the associative means is not enough to solve several nonparametric problems where the analysis of other kinds of functionals is of primary importance. One can refer, for instance, to functionals that are used in classical statistics when addressing the nonparametric problem known as the "problem of two or more than two samples".

These extensions of the present work would make it possible to resort to the Bayesian approach in order to solve nonparametric problems. Without such an effort, the approach we have been undertaking here, though theoretically well grounded, would be very limited in practice, since it would not be able to address relevant problems to which the classical approach has been giving answers for several years.
References

Antoniak, C.E. (1974). Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2, 1152-1174.

Cifarelli, D.M. and Regazzini, E. (1979). Considerazioni generali sull'impostazione bayesiana di problemi non parametrici. Le medie associative nel contesto del processo aleatorio di Dirichlet. Rivista di Matematica per le Scienze Economiche e Sociali, part 1, 2, 39-52; part 2, 2, 95-111.

Daboni, L. (1972). Aspetti strutturali ed inferenziali di particolari processi di arrivo di sinistri. Tavola rotonda: Problemi matematici ed economici odierni sulle assicurazioni, Accademia Nazionale dei Lincei, Roma, 1975, 55-68.

de Finetti, B. (1938). Sur la condition d'équivalence partielle. VI Colloque Genève: Actualités scientifiques et industrielles, n. 739, 5-18; Hermann, Paris.

de Finetti, B. and Savage, L.J. (1962). Sul modo di scegliere le probabilità iniziali. Bibl. d. Metron, 1, 81-157.

Ferguson, T.S. (1973). A Bayesian analysis of some nonparametric problems. Annals of Statistics, 1, 209-230.

Lindley, D.V. (1971). The estimation of many parameters. In Foundations of Statistical Inference (V.P. Godambe and D.A. Sprott, eds.), Holt, Rinehart and Winston, Toronto, 435-455.

Lindley, D.V. and Smith, A.F.M. (1972). Bayes estimates for the linear model. Journal of the Royal Statistical Society, B, 34, 1-41.

Rao, C.R. (1975). Linear Statistical Inference and Its Applications, 2nd edition. J. Wiley and Sons, New York.

Regazzini, E. (1978). Intorno ad alcune questioni relative alla definizione del premio secondo la teoria della credibilità. Giornale dell'Istituto Italiano degli Attuari, 1-2, 77-89.

Wedlin, A. (1974a). Sui processi di arrivo di sinistri per rischi analoghi. Quaderni dell'Istituto di Matematica Finanziaria, Università di Trieste, n. 12, A.A. 1973-74.

Wedlin, A. (1974b). Sul processo di danno accumulato. Quaderni dell'Istituto di Matematica Finanziaria, Università di Trieste, n. 13, A.A. 1973-74.