
ON CONVEX SETS OF MULTIVARIATE DISTRIBUTIONS
AND THEIR EXTREME POINTS
C.R. Rao
Department of Statistics
Pennsylvania State University, USA
M. Bhaskara Rao
Department of Statistics
North Dakota State University, Fargo, USA
D.N. Shanbhag
Statistics Division, Department of Mathematical Sciences
University of Sheffield, UK
1. Introduction
Suppose F1 and F2 are two n-variate distribution functions. It is clear that
any convex combination of F1 and F2 is again an n-variate distribution function.
More precisely, if 0 ≤ λ ≤ 1, then the function λF1 + (1 − λ)F2 is also a
distribution function. A collection ℑ of n-variate distributions is said to be
convex if for any given distributions F1 and F2 in ℑ, every convex combination
of F1 and F2 is also a member of ℑ. An element F in a convex set ℑ is said to be
an extreme point of ℑ if there is no way we can find two distinct distributions F1
and F2 in ℑ and a number 0 < λ < 1 such that F = λF1 + (1 − λ)F2.
Verifying whether or not a collection of multivariate distributions is convex
is, usually, fairly easy. For example, the collection of all n-variate normal
distributions is not convex. On the other hand, the collection of all n-variate
distributions is convex. The difficult task is identifying all the extreme points of
a given convex set. However, for the set of all n-variate distributions, any
distribution which gives probability mass unity at a single point of the
n-dimensional Euclidean space is an extreme point, and every extreme point of
the set is of this form.
The notion of a convex set and its extreme points is very general. Under
some reasonable conditions, one can write every point in the convex set as a
mixture or a convex combination of its extreme points. Suppose
C = {(x, y) ∈ R2 : 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1}. The set C is, in fact, the rim and the
interior of the unit square in R2 with vertices (0,0), (0,1), (1,0), and (1,1) and is
indeed a convex subset of R2. The four vertices are precisely the extreme points
of the set C. If (x,y) is any given point of C, then we can find four non-negative
numbers λ1, λ2, λ3, and λ4 with sum equal to unity such that (x,y) = λ1(0,0) +
λ2(1,0) + λ3(0,1) + λ4(1,1). In other words, the element (x,y) in C is a convex
combination of the extreme points of C.
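For the unit square this representation can be made completely explicit. One product-form choice of weights (the weights are not unique, and the function name below is ours) can be sketched as:

```python
def square_weights(x, y):
    """One explicit (product-form) choice of convex weights expressing a
    point (x, y) of the unit square as a mixture of the four vertices
    (0,0), (1,0), (0,1), (1,1).  The weights are not unique."""
    l1 = (1 - x) * (1 - y)  # weight on (0, 0)
    l2 = x * (1 - y)        # weight on (1, 0)
    l3 = (1 - x) * y        # weight on (0, 1)
    l4 = x * y              # weight on (1, 1)
    return l1, l2, l3, l4

l1, l2, l3, l4 = square_weights(0.3, 0.7)
assert min(l1, l2, l3, l4) >= 0 and abs(l1 + l2 + l3 + l4 - 1) < 1e-12
# l1*(0,0) + l2*(1,0) + l3*(0,1) + l4*(1,1) reproduces (0.3, 0.7):
assert abs(l2 + l4 - 0.3) < 1e-12 and abs(l3 + l4 - 0.7) < 1e-12
```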
In a similar vein, under reasonable conditions, every member of a given
convex set ℑ of distributions can be written as a mixture or convex combination
of the extreme points of ℑ. Depending on the circumstances, the mixture could
take the form of an integral, i.e., a generalized convex combination. If the
number of extreme points is finite, the mixture is always a finite convex
combination; if the number of extreme points is infinite, the mixture can be
expressed as an integral.
As an example of integral mixtures, it is well-known that the distributions
of exchangeable sequences of random variables constitute a convex set and that
the distributions of independent identically distributed sequences of random
variables constitute the collection of all its extreme points. Thus one can write
the distribution of any exchangeable sequence of random variables
as an integral mixture of distributions of independent identically distributed
sequences of random variables. This representation has a bearing on certain
characterization problems in distribution theory. Radhakrishna Rao and
Shanbhag (1995) explore this phenomenon in their book.
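As a finite toy illustration of this mixture representation (the mixing weights and Bernoulli parameters below are hypothetical), a pair of exchangeable coin tosses can be built as a two-point mixture of iid laws:

```python
from itertools import product

# Hypothetical mixing law: with probability 0.4 the pair (X1, X2) is iid
# Bernoulli(0.2), and with probability 0.6 it is iid Bernoulli(0.8).
weights = {0.2: 0.4, 0.8: 0.6}

def pmf(x1, x2):
    """Pr(X1 = x1, X2 = x2) under the mixture of iid laws."""
    return sum(w * (th if x1 else 1 - th) * (th if x2 else 1 - th)
               for th, w in weights.items())

# The mixture is exchangeable (pmf is symmetric in its arguments) ...
for a, b in product((0, 1), repeat=2):
    assert abs(pmf(a, b) - pmf(b, a)) < 1e-12
# ... but not independent: mixing over the parameter couples the tosses.
p_one = pmf(1, 0) + pmf(1, 1)          # Pr(X1 = 1) = 0.56
assert pmf(1, 1) > p_one * p_one       # 0.4 > 0.3136
```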
There are some advantages in making an effort to enumerate all extreme
points of a given convex set ℑ of distributions. Suppose all the extreme points
of ℑ possess a particular property. Suppose the property is preserved under
mixtures. Then we can claim that every distribution in the set ℑ possesses the
same property. Sometimes, some computations involving a given distribution in
the set ℑ can be simplified using similar computations for the extreme point
distributions. We will see an instance of this phenomenon. The focus of this
paper is on three specific classes of distributions:
1. Positive Quadrant Dependent Distributions.
2. Distributions which are Totally Positive of Order Two.
3. Regression Dependence.
2. Positive Quadrant Dependent Distributions
In this section, we look at two-dimensional distributions. Let X and Y be
two random variables with some joint distribution function F. The random
variables are said to be positive quadrant dependent or F is positive quadrant
dependent if P(X ≥ c, Y ≥ d ) ≥ P(X ≥ c)P(Y ≥ d) for all real numbers c and d.
If X and Y are independent, then, obviously, X and Y are positive quadrant
dependent. For various properties of positive quadrant dependence, see
Lehmann (1966) or Eaton (1982).
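For a finite joint probability table, this definition can be checked directly; a minimal sketch (the function name is ours):

```python
import numpy as np

def is_pqd(P, tol=1e-12):
    """Check positive quadrant dependence for a finite joint distribution
    given as a matrix P with P[i, j] = Pr(X = i+1, Y = j+1).  It suffices
    to verify Pr(X >= c, Y >= d) >= Pr(X >= c) Pr(Y >= d) at the support
    points c, d."""
    P = np.asarray(P, dtype=float)
    p = P.sum(axis=1)   # marginal distribution of X
    q = P.sum(axis=0)   # marginal distribution of Y
    for i in range(P.shape[0]):
        for j in range(P.shape[1]):
            if P[i:, j:].sum() < p[i:].sum() * q[j:].sum() - tol:
                return False
    return True

# Independence gives equality everywhere, hence PQD; mass pushed toward
# the diagonal stays PQD, mass on the anti-diagonal does not.
assert is_pqd(np.outer([0.5, 0.5], [0.5, 0.5]))
assert is_pqd([[0.4, 0.1], [0.1, 0.4]])
assert not is_pqd([[0.1, 0.4], [0.4, 0.1]])
```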
Let F1 and F2 be two fixed univariate distribution functions. Let M(F1, F2)
be the collection of all bivariate distribution functions F whose first marginal
distribution is F1 and the second marginal distribution is F2. The set M(F1, F2) is
a convex set and has been intensively studied in the literature. See Kellerer
On Convex Sets Of Multivariate…
147
(1984) and the references therein. See also Rachev and Ruschendorf (1998).
Let MPQD(F1,F2) be the collection of all bivariate distributions F whose first
marginal distribution is F1, the second marginal distribution is F2, and F is
positive quadrant dependent. It is not hard to show that the set MPQD(F1,F2) is
convex. See Bhaskara Rao, Krishnaiah, and Subramanyam (1987). We will
now focus on subsets of MPQD(F1,F2). Suppose F1 has support {1,2,…,m} with
probabilities p1,p2,…, pm and F2 has support {1,2,…,n} with probabilities
q1,q2,…,qn. The set MPQD(F1, F2) can be denoted simply by MPQD(p1,p2,…,pm,
q1,q2,…,qn). Every member of MPQD(p1,p2,…,pm,q1,q2,…,qn) can be identified as
a matrix P=(pij) of order m×n such that each entry pij is non-negative with row
sums equal to p1,p2,...,pm, column sums equal to q1,q2,…,qn, and the joint
distribution is positive quadrant dependent. The property of positive quadrant
dependence translates into a collection of inequalities involving the pij’s, pi’s, and qj’s.
Let us look at a simple example. Take m = 2 and n = 3. Let p1, p2, q1, q2, and q3
be five given positive numbers satisfying p1 + p2 = 1 = q1 + q2 + q3. Take any
matrix P=(pij) from MPQD(p1,p2,q1,q2,q3). The entries of P must satisfy the
following two inequalities.
p2q3 ≤ p23 ≤ p2 ∧ q3                                          (1)
and
(p2q2 + p2q3) ∨ p23 ≤ p22 + p23 ≤ p2 ∧ (q2 + p23),            (2)
where a ∨ b indicates the maximum of the numbers a and b and a ∧ b indicates
the minimum of the numbers a and b. Conversely, let p22 and p23 be any two
non-negative numbers satisfying the above inequalities. Construct the matrix
P = [ q1 − p2 + p22 + p23    q2 − p22    q3 − p23 ]
    [ p2 − p22 − p23         p22         p23      ]
of order 2×3. One can show that P ∈ MPQD(p1,p2,q1,q2,q3). The implication of
this observation is that the numbers p22 and p23 determine whether or not the
matrix P constructed above belongs to MPQD(p1,p2,q1,q2,q3). The two
inequalities (1) and (2) determine a simplex in the (p22, p23) plane. Let us look at
a very specific example: p1 = p2 = 1/2 and q1 = q2 = q3 = 1/3. The inequalities (1)
and (2) become
1/6 ≤ p23 ≤ 1/3
and
1/3 = p23 ∨ 1/3 ≤ p22 + p23 ≤ (1/2) ∧ (1/3 + p23) = 1/2.
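The construction above is easy to mechanize; a sketch in exact rational arithmetic (the function names are ours):

```python
from fractions import Fraction as Fr

def member(p2, q, p22, p23):
    """The 2x3 matrix constructed in the text from the free entries
    p22, p23 and the marginals (p1, p2) and q = (q1, q2, q3)."""
    q1, q2, q3 = q
    return [[q1 - p2 + p22 + p23, q2 - p22, q3 - p23],
            [p2 - p22 - p23,      p22,      p23]]

def in_simplex(p2, q, p22, p23):
    """Inequalities (1) and (2), which characterize membership in
    MPQD(p1, p2, q1, q2, q3)."""
    q1, q2, q3 = q
    return (p2 * q3 <= p23 <= min(p2, q3)
            and max(p2 * q2 + p2 * q3, p23) <= p22 + p23 <= min(p2, q2 + p23))

p2, q = Fr(1, 2), (Fr(1, 3), Fr(1, 3), Fr(1, 3))
# (p22, p23) = (1/6, 1/6) satisfies (1) and (2); it yields the uniform
# table, which indeed has row sums 1/2 and column sums 1/3.
assert in_simplex(p2, q, Fr(1, 6), Fr(1, 6))
P = member(p2, q, Fr(1, 6), Fr(1, 6))
assert all(x == Fr(1, 6) for row in P for x in row)
```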
There are four extreme points of the set MPQD(1/2, 1/2, 1/3, 1/3, 1/3), given by
1 / 6 1 / 6

1 / 6 1 / 6
1 / 3 0

 0 1/ 3
1/ 6
,
1 / 6 
1/ 3
.
1 / 6 
1 / 6 1 / 3 0 

,
1 / 6 0 1 / 3 
1 / 3 1 / 6 0 

, and
 0 1 / 6 1 / 3
Every member P of MPQD(1/2,1/2,1/3,1/3,1/3) is a convex combination of these
four matrices. Even for moderate values of m and n, determining the extreme
points of MPQD(p1,p2,…,pm, q1,q2,…,qn) could become very laborious. A method
of determining the extreme points of MPQD(p1,p2,…,pm, q1,q2,…,qn) has been
outlined in Bhaskara Rao, Krishnaiah, and Subramanyam (1987).
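The preservation of the defining inequalities under mixtures, mentioned in the Introduction, can be verified numerically for this example; a sketch in exact arithmetic (variable names are ours):

```python
from fractions import Fraction as Fr
import random

# The four extreme matrices of MPQD(1/2, 1/2, 1/3, 1/3, 1/3), recorded by
# their free entries (p22, p23): the vertices of the region cut out by
# inequalities (1) and (2).
vertices = [(Fr(1, 6), Fr(1, 6)), (Fr(1, 3), Fr(1, 6)),
            (Fr(0), Fr(1, 3)), (Fr(1, 6), Fr(1, 3))]

def in_set(p22, p23):
    """Inequalities (1) and (2) for p1 = p2 = 1/2, q1 = q2 = q3 = 1/3."""
    return Fr(1, 6) <= p23 <= Fr(1, 3) and Fr(1, 3) <= p22 + p23 <= Fr(1, 2)

random.seed(0)
for _ in range(100):
    w = [Fr(random.randrange(1, 10)) for _ in vertices]
    total = sum(w)
    w = [x / total for x in w]                 # random convex weights
    p22 = sum(wi * v[0] for wi, v in zip(w, vertices))
    p23 = sum(wi * v[1] for wi, v in zip(w, vertices))
    assert in_set(p22, p23)                    # the mixture stays in the set
```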
3. Distributions which are Totally Positive of Order Two
Let X and Y be two random variables each taking a finite number of values.
For simplicity, assume that X takes values 1,2,…,m and Y takes values 1,2,…,n.
Let p ij = Pr(X = i, Y = j), i = 1,2,…,m and j = 1,2,…,n. Let P = (pij) be the
matrix of order m×n, which gives the joint distribution of X and Y.
The random variables X and Y are said to be totally positive of order 2 (TP2)
(or, equivalently, P is said to be totally positive of order two) if the determinants
| p i1 j1    p i1 j2 |
| p i2 j1    p i2 j2 |   ≥ 0
for all 1 ≤ i1 < i 2 ≤ m and 1 ≤ j1 < j2 ≤ n. In the literature, this notion also goes
by the name positive likelihood ratio dependence. See Lehmann (1966). For
some ramifications of this definition, see Barlow and Proschan (1972).
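The definition lends itself to a direct check over all 2×2 minors; a sketch (the function name is ours):

```python
import numpy as np
from itertools import combinations

def is_tp2(P, tol=1e-12):
    """Check total positivity of order two: every 2x2 determinant taken
    from rows i1 < i2 and columns j1 < j2 must be non-negative."""
    P = np.asarray(P, dtype=float)
    m, n = P.shape
    return all(P[i1, j1] * P[i2, j2] - P[i1, j2] * P[i2, j1] >= -tol
               for i1, i2 in combinations(range(m), 2)
               for j1, j2 in combinations(range(n), 2))

# Independence gives a rank-one matrix, so every 2x2 determinant vanishes.
assert is_tp2(np.outer([0.5, 0.5], [0.2, 0.3, 0.5]))
# Mass concentrated on the diagonal is TP2; on the anti-diagonal it is not.
assert is_tp2([[0.4, 0.1], [0.1, 0.4]])
assert not is_tp2([[0.1, 0.4], [0.4, 0.1]])
```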
We now assume that m = 2. Let q1,q2,…,qn be n given positive numbers
with sum equal to unity. Let q = (q1,q2,…,qn). Let Mq(TP2) be the collection of
all matrices P of order 2×n with column sums equal to q1,q2,…,qn and P is
totally positive of order two. It is not hard to show that Mq(TP2) is convex. See
Subramanyam and Bhaskara Rao (1988). The extreme points of this convex set
have been identified precisely. In fact, the convex set Mq(TP2) has n+1 extreme
points, given by

P0 = [ 0   0   ...  0  ]
     [ q1  q2  ...  qn ]

and

Pi = [ q1  ...  qi   0     ...  0  ]
     [ 0   ...  0    qi+1  ...  qn ] ,   i = 1,2,…,n.

The determination of extreme
points in the general case of m×n matrices is still an open problem.
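The n+1 extreme points displayed above can be generated and verified mechanically; a sketch in exact arithmetic (the function names are ours):

```python
from fractions import Fraction as Fr

def extreme_tp2(q):
    """The n+1 extreme points P0, P1, ..., Pn of Mq(TP2) displayed above:
    Pi carries q1, ..., qi in its first row and q_{i+1}, ..., qn in its
    second row."""
    n = len(q)
    return [[list(q[:i]) + [Fr(0)] * (n - i),
             [Fr(0)] * i + list(q[i:])] for i in range(n + 1)]

def is_tp2_2xn(top, bot):
    """All 2x2 determinants of a 2xn matrix are non-negative."""
    n = len(top)
    return all(top[j1] * bot[j2] >= top[j2] * bot[j1]
               for j1 in range(n) for j2 in range(j1 + 1, n))

q = (Fr(1, 2), Fr(1, 3), Fr(1, 6))
mats = extreme_tp2(q)
assert len(mats) == len(q) + 1
for top, bot in mats:
    assert is_tp2_2xn(top, bot)
    assert all(a + b == c for a, b, c in zip(top, bot, q))  # column sums
```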
4. Regression Dependence
Let X and Y be two random variables. Assume that X takes values 1,2,…,m
and Y takes values 1,2,…,n. Let p ij = Pr(X = i, Y = j), i = 1,2,…,m and j =
1,2,…,n. Let P = (pij). Say that Y is strongly positive regression dependent on
X (SPRD(Y | X)) if Pr(Y ≤ j | X = i) is non-increasing in i for each j. See
Lehmann (1966) and Barlow and Proschan (1981) for an exposition of this
notion. This notion also goes by the name that Y is stochastically increasing in
X. We now assume that m = 2. In this case, Y being strongly positive regression
dependent on X is equivalent to X and Y being positive quadrant dependent.
Let p1 and p2 be two given positive numbers with sum equal to
unity. Let MPQD(n,p1,p2) be the collection of all matrices P = ( pij ) of order
2×n such that the row sums are equal to p1 and p2 and P is positive quadrant
dependent. Equivalently, with respect to the joint distribution P, Y is strongly
positive regression dependent on X. We would like to report some new results
on the set MPQD(n,p1,p2).
Theorem 1: The set MPQD(n,p1,p2) is convex.
Theorem 2: The total number of extreme points of MPQD(n,p1,p2) is
n(n + 1) / 2 and these are given by
[ p1  0  ...  0 ]   [ p1  0   ...  0 ]          [ p1  0  ...  0  ]
[ p2  0  ...  0 ] , [ 0   p2  ...  0 ] , ... ,  [ 0   0  ...  p2 ] ,

[ 0  p1  ...  0 ]          [ 0  p1  0  ...  0  ]
[ 0  p2  ...  0 ] , ... ,  [ 0  0   0  ...  p2 ] ,

... ,

[ 0  ...  0  p1 ]
[ 0  ...  0  p2 ] .
Theorem 1 is not hard to establish. The proof of Theorem 2 is rather involved.
The details will appear elsewhere.
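Theorem 2's family of matrices can be enumerated, and its count and dependence property verified, for small n; a sketch (the function names are ours):

```python
from fractions import Fraction as Fr

def extreme_sprd(n, p1, p2):
    """The n(n+1)/2 matrices of Theorem 2: p1 sits in column i of the
    first row and p2 in column j of the second row, with i <= j."""
    mats = []
    for i in range(n):
        for j in range(i, n):
            top, bot = [Fr(0)] * n, [Fr(0)] * n
            top[i], bot[j] = p1, p2
            mats.append((top, bot))
    return mats

def is_sprd(top, bot):
    """With m = 2, Y is SPRD on X iff Pr(Y <= j | X = 1) >=
    Pr(Y <= j | X = 2) for every j."""
    p1, p2 = sum(top), sum(bot)
    c1 = c2 = Fr(0)
    for a, b in zip(top, bot):
        c1, c2 = c1 + a, c2 + b
        if c1 / p1 < c2 / p2:
            return False
    return True

mats = extreme_sprd(4, Fr(1, 3), Fr(2, 3))
assert len(mats) == 4 * 5 // 2          # n(n+1)/2 = 10 for n = 4
assert all(is_sprd(top, bot) for top, bot in mats)
```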
5. An Application
Suppose X and Y are discrete random variables with unknown joint
distribution. Suppose we wish to test the null hypothesis H0: X and Y are
independent against the alternative H1: X and Y are positive quadrant dependent
but not independent. Suppose the marginal distributions of X and Y are known.
Let (X1,Y1), (X2,Y2),…,(XN,YN) be N independent realizations of (X,Y). Let T be a
statistic (a function of the given data) that is used to test the validity of H0
against H1. Let C be the critical region of the test. We want to compute the
power function of the test based on T. Let µ be a specific joint distribution of X
and Y under H1. The computation of Pr(T ∈ C) under the joint distribution µ of
X and Y is usually difficult. Let µ(1),µ(2),…,µ(k) be the extreme points of the set
of all distributions of X and Y with the given marginals which are positive
quadrant dependent. One can write µ = λ1µ(1) + λ2µ(2) + … + λkµ(k) for
some non-negative λ’s with sum equal to unity. The joint distribution of the
data is given by the product measure µN. One can write the joint distribution of
the data as a convex combination of joint distributions of the type
µ(1)⊗n1 ⊗ µ(2)⊗n2 ⊗ … ⊗ µ(k)⊗nk, where µ(i)⊗ni denotes the ni-fold product of µ(i)
and n1,n2,…,nk are non-negative integers with sum equal to N.
Generally, computing Pr(T ∈ C) is easy under such special joint
distributions. Also, Pr(T ∈ C) under µN will be a convex combination of
Pr(T ∈ C) ’s evaluated under such special joint distributions. The relevant
formula has been hammered out in Bhaskara Rao, Krishnaiah, and
Subramanyam (1987).
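The linearity behind this formula can be illustrated on a toy example (the distributions, weights, the statistic T, and the critical region C below are all hypothetical): Pr(T ∈ C) under the product measure µN is recovered exactly as a convex combination over products of the extreme points.

```python
from itertools import product
from math import prod
from fractions import Fraction as Fr

# Two hypothetical extreme-point distributions on the values {0, 1} of a
# single observation, and their mixture mu = l1*mu1 + l2*mu2.
mu1 = {0: Fr(3, 4), 1: Fr(1, 4)}
mu2 = {0: Fr(1, 4), 1: Fr(3, 4)}
l1, l2 = Fr(2, 5), Fr(3, 5)
mu = {x: l1 * mu1[x] + l2 * mu2[x] for x in (0, 1)}

N = 3
def power(dists):
    """Pr(T in C) when observation k follows dists[k]; here the statistic
    T is the sample sum and the critical region C is {T >= 2}."""
    return sum(prod(d[x] for d, x in zip(dists, xs))
               for xs in product((0, 1), repeat=N) if sum(xs) >= 2)

# Pr(T in C) under the product measure mu^N equals the convex combination
# of Pr(T in C) under products of the extreme-point distributions.
direct = power([mu] * N)
mixture = sum(prod(l1 if d is mu1 else l2 for d in choice) * power(choice)
              for choice in product((mu1, mu2), repeat=N))
assert direct == mixture
```

The exact rational arithmetic makes the multilinear expansion of µN an identity rather than a numerical approximation.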
6. References
Barlow, R.E. and Proschan, F. (1981). Statistical Theory of Reliability and Life Testing: Probability
Models, Holt, Rinehart and Winston, Silver Spring, Maryland, USA.
Eaton, M.L. (1982). A Review of Selected Topics in Multivariate Probability Inequalities, Ann.
Statist., 10, 11-43.
Kellerer, H.G. (1984). Duality Theorems for Marginal Problems, Z. Wahrscheinlichkeitstheorie
Verw. Geb., 67, 399-432.
Lehmann, E.L. (1966). Some Concepts of Dependence, Ann. Math. Statist., 37, 1137-1153.
Rachev, S.T. and Ruschendorf, L. (1998). Mass Transportation Problems, Part I: Theory, Springer,
New York.
Rachev, S.T. and Ruschendorf, L. (1998). Mass Transportation Problems, Part II: Applications,
Springer, New York.
Rao, M. Bhaskara, Krishnaiah, P.R., and Subramanyam, K. (1987). A Structure Theorem on
Bivariate Positive Quadrant Dependent Distributions and Tests for Independence in Two-Way
Contingency Tables, J. Multivariate Anal., 23, 93-118.
Rao, C. Radhakrishna and Shanbhag, D.N. (1995). Choquet-Deny Type Functional Equations
with Applications to Stochastic Models, John Wiley, New York.
Subramanyam, K. and Rao, M. Bhaskara (1988). Analysis of Odds Ratios in 2×n Ordinal
Contingency Tables, J. Multivariate Anal., 27, 478-493.
Received: September 2001