Adding Unimodality or Independence Makes Interval
Probability Problems NP-Hard

Daniel J. Berleant
Electrical and Computer Engineering
Iowa State University
Ames, IA 50011, USA
[email protected]

Olga Kosheleva
NASA Pan-American Center for
Earth and Environmental Studies (PACES)
University of Texas at El Paso
El Paso, TX 79968, USA
[email protected]

Hung T. Nguyen
Dept. of Mathem. Sciences
New Mexico State Univ.
Las Cruces, NM 88003, USA
[email protected]
Abstract

In many real-life situations, we only have partial information about probabilities. This information is usually described by bounds on moments, on probabilities of certain events, etc. – i.e., by characteristics c(p) which are linear in terms of the unknown probabilities $p_j$. If we know interval bounds on some such characteristics $\underline{a}_i \le c_i(p) \le \overline{a}_i$, and we are interested in a characteristic c(p), then we can find the bounds on c(p) by solving a linear programming problem. In some situations, we also have additional conditions on the probability distribution – e.g., we may know that the two variables $x_1$ and $x_2$ are independent, or that for each value of $x_2$, the corresponding conditional distribution for $x_1$ is unimodal. We show that adding each of these conditions makes the corresponding interval probability problem NP-hard.

Keywords: Interval Probability, Unimodality, Independence, NP-Hard.
1  Introduction
Interval probability problems can often be reduced to linear programming (LP). In many real-life situations, in addition to the intervals $[\underline{x}_i, \overline{x}_i]$ of possible values of the unknowns $x_1, \ldots, x_n$, we also have partial information about the probabilities of different values within these intervals. This information is usually given in terms of bounds on the standard characteristics c(p) of the corresponding probability distribution p, such as the k-th moment $M_k \stackrel{\rm def}{=} \int x^k \cdot \rho(x)\,dx$ (where $\rho(x)$ is the probability density), the values of the cumulative distribution function (cdf) $F(t) \stackrel{\rm def}{=} {\rm Prob}(x \le t) = \int_{-\infty}^{t} \rho(x)\,dx$ of some of the variables, etc. Most of these characteristics are linear in terms of $\rho(x)$ – and many other characteristics like central moments are combinations of linear characteristics: e.g., variance V can be expressed as $V = M_2 - M_1^2$.
A typical practical problem is when we know
the ranges of some of these characteristics
$\underline{a}_i \le c_i(p) \le \overline{a}_i$, and we want to find the range
of possible values of some other characteristic
c(p). For example, we know the bounds on
the marginal cdfs for the variables x1 and x2 ,
and we want to find the range of values of the
cdf for x1 + x2 .
In such problems, the range of possible values of c(p) is an interval $[\underline{a}, \overline{a}]$. To find $\underline{a}$ (correspondingly, $\overline{a}$), we must minimize (correspondingly, maximize) the linear objective
function c(p) under linear constraints — i.e.,
solve a linear programming (LP) problem;
see, e.g., [15, 16, 17, 20].
Other simple examples of linear conditions
include bounds on the values of the density
function ρ(x); see, e.g., [14].
Comment. Several more complex problems
can also be described in LP terms. For example, when we select a new strategy for a
company (e.g., for an electric company), one
of the reasonable criteria is that the expected
monetary gain should be not smaller than the
expected gain for a previously known strategy. In many cases, for each strategy, we
can estimate the probability of different production values – e.g., the probability F (t) =
Prob(x ≤ t) that we will produce the amount
≤ t. However, the utility u(t) corresponding
to producing t depends on the future prices
and is not well known; therefore, we cannot
predict the exact value of the expected utility $\int u(x) \cdot \rho(x)\,dx$. One way to handle this situation is to require that for every monotonic utility function u(t), the expected utility under the new strategy – with probability density function (pdf) $\rho(x)$ and cdf F(x) – is larger than or equal to the expected utility under the old strategy – with pdf $\rho_0(x)$ and cdf $F_0(x)$:
$$\int u(x) \cdot \rho(x)\,dx \ge \int u(x) \cdot \rho_0(x)\,dx.$$
This condition is called first order stochastic dominance.
It is known that this condition is equivalent
to the condition that F (x) ≤ F0 (x) for all x.
Indeed, the condition is equivalent to
$$\int_0^t u(x) \cdot (\rho(x) - \rho_0(x))\,dx \ge 0.$$
Integrating by parts, we conclude that
$$-\int_0^t u'(x) \cdot (F(x) - F_0(x))\,dx \ge 0;$$
since u(x) is non-decreasing, the derivative u'(x) can be an arbitrary non-negative function; so, the above condition is indeed equivalent to $F(x) \le F_0(x)$ for all x.
Each of these inequalities is linear in terms of $\rho(x)$ – so, optimizing a linear objective function under the constraints $F(x) \le F_0(x)$ is also an LP problem.
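For discretized distributions, the first order stochastic dominance test $F(x) \le F_0(x)$ reduces to comparing cumulative sums. The following sketch (made-up densities, numpy assumed) illustrates the check.

```python
import numpy as np

def first_order_dominates(rho_new, rho_old):
    """Check F(x) <= F0(x) for all x, i.e., first order stochastic dominance
    of the new strategy over the old one, for discretized densities."""
    F_new = np.cumsum(rho_new)
    F_old = np.cumsum(rho_old)
    return bool(np.all(F_new <= F_old + 1e-12))   # small tolerance for rounding

# Made-up example: the new strategy shifts probability mass to larger outputs.
rho_old = np.array([0.4, 0.3, 0.2, 0.1])
rho_new = np.array([0.1, 0.2, 0.3, 0.4])
print(first_order_dominates(rho_new, rho_old))    # True
```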
This requirement may be too restrictive; in
practice, preferences have the property of risk
aversion: it is better to gain a value x with
probability 1 than to have either 0 or 2x with
probability 1/2. In mathematical terms, this
condition means that the corresponding utility function u(x) is concave. It is therefore
reasonable to require that for all such risk
aversion utility functions u(x), the expected
utility under the new strategy is larger than
or equal to the expected utility under the old
strategy. This condition is called second order
stochastic dominance (see, e.g., [7, 8, 18, 19]),
and it is known to be equivalent to the condition that $\int_0^t F(x)\,dx \le \int_0^t F_0(x)\,dx$.
Indeed, the condition is equivalent to
$$\int_0^t u(x) \cdot (\rho(x) - \rho_0(x))\,dx \ge 0$$
for every concave function u(x). Integrating by parts twice, we conclude that
$$\int_0^t u''(x) \cdot \left(\int_0^x F(z)\,dz - \int_0^x F_0(z)\,dz\right) dx \ge 0.$$
Since u(x) is concave, the second derivative u''(x) can be an arbitrary non-positive function; so, the above condition is indeed equivalent to $\int_0^t F(x)\,dx \le \int_0^t F_0(x)\,dx$ for all t.
The cdf F(x) is a linear combination of the values $\rho(x)$; thus, its integral $\int F(x)\,dx$ is also linear in $\rho(x)$, and hence the above condition is still linear in terms of the values $\rho(x)$. Thus, we again have an LP problem; for details, see [2].
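Similarly, for discretized distributions the second order condition compares the running sums of the two cdfs. The sketch below (made-up densities, unit grid spacing assumed) illustrates a case that is dominated in the second order but not in the first.

```python
import numpy as np

def second_order_dominates(rho_new, rho_old):
    """Check the second order stochastic dominance condition
    int_0^t F(x) dx <= int_0^t F0(x) dx for all t (unit grid spacing assumed)."""
    F_new = np.cumsum(rho_new)
    F_old = np.cumsum(rho_old)
    return bool(np.all(np.cumsum(F_new) <= np.cumsum(F_old) + 1e-12))

# Made-up example: same mean, but the new strategy is less spread out,
# so it dominates in the second order although not in the first order.
rho_old = np.array([0.5, 0.0, 0.0, 0.5])
rho_new = np.array([0.0, 0.5, 0.5, 0.0])
print(second_order_dominates(rho_new, rho_old))   # True
print(second_order_dominates(rho_old, rho_new))   # False
```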
Most of the corresponding LP problems
can be efficiently solved. Theoretically,
some of these LP problems have infinitely
many variables ρ(x), but in practice, we can
discretize each coordinate and thus get an LP
problem with finitely many variables.
There are known efficient algorithms and software for solving LP problems with finitely
many variables. These algorithms require
polynomial time ($\le n^k$) to solve problems with $\le n$ unknowns and $\le n$ constraints; these algorithms are actively used in imprecise probabilities; see, e.g., [1, 3, 4, 5, 6].
For example, for the case of two variables x1
and x2 , we may know the probabilities pi =
p(x1 ∈ [i, i + 1]) and qj = p(x2 ∈ [j, j + 1]) for
finitely many intervals [i, i + 1]. Then, to find
the range of possible values of, e.g.,
Prob(x1 + x2 ≤ k),
we can consider the following linear programming problem: the unknowns are
$$p_{i,j} \stackrel{\rm def}{=} p(x_1 \in [i, i+1]\ \&\ x_2 \in [j, j+1]),$$
the constraints are $p_{i,j} \ge 0$, $p_{i,1} + p_{i,2} + \ldots = p_i$, $p_{1,j} + p_{2,j} + \ldots = q_j$, and the objective function is $\sum_{i,j:\, i+j \le k} p_{i,j}$.
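The following sketch (with hypothetical marginals $p_i$, $q_j$; scipy's linprog assumed) sets up exactly this LP and reports the resulting range for ${\rm Prob}(x_1 + x_2 \le k)$.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical marginals for x1 and x2 over intervals [i, i+1], i = 0..3.
p = np.array([0.1, 0.4, 0.3, 0.2])        # p_i = Prob(x1 in [i, i+1])
q = np.array([0.25, 0.25, 0.25, 0.25])    # q_j = Prob(x2 in [j, j+1])
n, k = len(p), 3                           # bound of interest: x1 + x2 <= k

# Unknowns: the joint probabilities p_{i,j}, flattened to a vector of length n*n.
# Objective: sum of p_{i,j} over i + j <= k.
c = np.array([1.0 if i + j <= k else 0.0 for i in range(n) for j in range(n)])

# Equality constraints: row sums equal p_i, column sums equal q_j.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0       # sum_j p_{i,j} = p_i
for j in range(n):
    A_eq[n + j, j::n] = 1.0                # sum_i p_{i,j} = q_j
b_eq = np.concatenate([p, q])

low  = linprog(c,  A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
high = linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
print("Prob(x1 + x2 <= k) in [", round(low.fun, 3), ",", round(-high.fun, 3), "]")
```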
Comment. The only LP problems for which there may not be an efficient solution are problems involving a large number of variables v. If we discretize each variable into n intervals, then overall, we need $n^v$ unknowns $p_{i_1, i_2, \ldots, i_v}$ ($1 \le i_1 \le n$, $1 \le i_2 \le n$, ..., $1 \le i_v \le n$) to describe all possible probability distributions. When v grows, the number of unknowns grows exponentially with v and thus, for large v, becomes unrealistically large.
It is known (see, e.g., [12]) that this exponential increase in complexity is inherent to
the problem: e.g., for v random variables
x1 , . . . , xv with known marginal distributions,
the problem of finding the exact bounds on
the cdf for the sum x1 + . . . + xv is NP-hard.
Beyond LP. There are important practical
problems which lie outside LP. One example is
problems involving independence, when constraints are linear in p(x, y) = p(x) · p(y) and
thus, bilinear in p(x) and p(y). In this paper,
we show that the corresponding range estimation problem is NP-hard.
Another example of a condition which cannot be directly described in terms of LP is the condition of unimodality. For a one-variable distribution with probabilities $p_1, \ldots, p_n$, unimodality means that there exists a value m ("mode") such that $p_i$ increases (non-strictly) until m and then decreases after m:
$$p_1 \le p_2 \le \ldots \le p_{m-1} \le p_m; \quad p_m \ge p_{m+1} \ge \ldots \ge p_{n-1} \ge p_n.$$
When the location of the mode is known, we get several linear inequalities, so we can still use efficient techniques such as LP; see, e.g., [9, 21].
For a 1-D case, if we do not know the location of the mode, we can try all n possible
locations and solve n corresponding LP problems. Since each LP problem requires a polynomial time to run, running n such problems
still requires a polynomial time.
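A sketch of this mode-enumeration strategy, assuming the linear constraints are given as a hypothetical coefficient matrix B with bounds lo ≤ B p ≤ hi (scipy's linprog assumed; the test data are made up):

```python
import numpy as np
from scipy.optimize import linprog

def unimodal_feasible(B, lo, hi, n):
    """Check whether some unimodal p_1,...,p_n >= 0 with sum p = 1 satisfies
    lo <= B @ p <= hi, by trying each of the n possible mode locations."""
    A_ub_base = np.vstack([B, -B])
    b_ub_base = np.concatenate([hi, -lo])
    for m in range(n):                      # candidate mode location
        rows = []
        for i in range(n - 1):
            r = np.zeros(n)
            if i < m:                       # p_i <= p_{i+1} on the way up
                r[i], r[i + 1] = 1.0, -1.0
            else:                           # p_i >= p_{i+1} on the way down
                r[i], r[i + 1] = -1.0, 1.0
            rows.append(r)
        A_ub = np.vstack([A_ub_base] + rows)
        b_ub = np.concatenate([b_ub_base, np.zeros(n - 1)])
        res = linprog(np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                      A_eq=np.ones((1, n)), b_eq=[1.0], bounds=(0, 1))
        if res.status == 0:                 # a feasible unimodal solution exists
            return True
    return False

# Made-up constraint: the mean must lie in [3.8, 4.2] for values 1..5.
B = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])
print(unimodal_feasible(B, np.array([3.8]), np.array([4.2]), 5))   # True
```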
In the 2-D case, it is reasonable to consider
the situation when, e.g., for every value of x2 ,
the corresponding conditional distribution for
$x_1$ is unimodal. In this case, to describe this as an LP problem, we must select a mode for every $x_2$. If there are n values of $x_2$, and at least 2 possible choices of mode location, then we get an exponential number of $2^n$ possible choices. In this paper, we show that this problem is also NP-hard – and therefore, that, unless P=NP, no algorithm can solve it in polynomial time.
Comment. Other possible restrictions on
probability may involve bounds on the entropy of the corresponding probability distributions; such problems are also, in general,
NP-hard [11].
2  Adding Unimodality Makes Interval Probability Problems NP-Hard
Definition 1 Let $n_1 > 0$ and $n_2 > 0$ be given integers.

• By a probability distribution, we mean a collection of real numbers $p_{i_1,i_2} \ge 0$, $1 \le i_1 \le n_1$, $1 \le i_2 \le n_2$, such that $\sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} p_{i_1,i_2} = 1$.

• We say that the distribution $p_{i_1,i_2}$ is unimodal in the 1st variable (or simply unimodal, for short) if for every $i_2$ from 1 to $n_2$, there exists a value m such that $p_{i_1,i_2}$ grows with $i_1$ for $i_1 \le m$ and decreases with $i_1$ for $i_1 \ge m$:
$$p_{1,i_2} \le p_{2,i_2} \le \ldots \le p_{m,i_2}; \quad p_{m,i_2} \ge p_{m+1,i_2} \ge \ldots \ge p_{n_1,i_2}.$$

• By a linear constraint on the probability distribution, we mean a constraint of the type $\underline{b} \le \sum_{i_1=1}^{n_1} \sum_{i_2=1}^{n_2} b_{i_1,i_2} \cdot p_{i_1,i_2} \le \overline{b}$ for some given values $\underline{b}$, $\overline{b}$, and $b_{i_1,i_2}$.

• By an interval probability problem under unimodality constraint, we mean the following problem: given a finite list of linear constraints, check whether there exists a unimodal distribution which satisfies all these constraints.
Theorem 1 Interval probability problem under unimodality constraint is NP-hard.

Comment. So, under the unimodality constraint, even checking whether a system of linear constraints is consistent – i.e., whether the range of a given characteristic is empty – is computationally difficult (NP-hard).

Proof. We will show that if we can check, for every system of linear constraints, whether this system is consistent or not under unimodality, then we would be able to solve a partition problem which is known to be NP-hard [10, 13]. The partition problem consists of the following: given n positive integers $s_1, \ldots, s_n$, check whether there exist n integers $\varepsilon_i \in \{-1, 1\}$ for which $\varepsilon_1 \cdot s_1 + \ldots + \varepsilon_n \cdot s_n = 0$.

Indeed, for every instance of the partition problem, we form the following system of constraints: $n_1 = 3$, $n_2 = n$,

• $p_{2,i_2} = 0$ for every $i_2 = 1, \ldots, n_2$;

• $p_{1,i_2} + p_{2,i_2} + p_{3,i_2} = 1/n$ for every $i_2 = 1, \ldots, n_2$;

• $\sum_{i_2=1}^{n_2} (-s_{i_2} \cdot p_{1,i_2} + s_{i_2} \cdot p_{3,i_2}) = 0$.

Let us prove that this system is consistent if and only if the original instance of the partition problem has a solution.

"If" part. If the original instance has a solution $\varepsilon_i \in \{-1, 1\}$, then, for every $i_2$ from 1 to $n_2$, we can take $p_{2+\varepsilon_{i_2}, i_2} = 1/n$ and $p_{i_1, i_2} = 0$ for $i_1 \ne 2 + \varepsilon_{i_2}$. In other words:

• if $\varepsilon_{i_2} = -1$, then we take $p_{1,i_2} = 1/n$ and $p_{2,i_2} = p_{3,i_2} = 0$;

• if $\varepsilon_{i_2} = 1$, then we take $p_{1,i_2} = p_{2,i_2} = 0$ and $p_{3,i_2} = 1/n$.

The resulting distribution is unimodal: indeed, for each $i_2$, its mode is the value $2 + \varepsilon_{i_2}$. Let us check that it satisfies all the desired constraints. It is easy to check that for every $i_2$, we have $p_{2,i_2} = 0$ and $p_{1,i_2} + p_{2,i_2} + p_{3,i_2} = 1/n$. Finally, due to our choice of $p_{i_1,i_2}$, we conclude that $-s_{i_2} \cdot p_{1,i_2} + s_{i_2} \cdot p_{3,i_2} = \frac{1}{n} \cdot \varepsilon_{i_2} \cdot s_{i_2}$ and thus,
$$\sum_{i_2=1}^{n_2} (-s_{i_2} \cdot p_{1,i_2} + s_{i_2} \cdot p_{3,i_2}) = \frac{1}{n} \cdot \sum_{i_2=1}^{n_2} \varepsilon_{i_2} \cdot s_{i_2} = 0.$$
"Only if" part. Vice versa, let us assume that we have a unimodal distribution $p_{i_1,i_2}$ for which all the desired constraints are satisfied.

Since the distribution is unimodal, for every $i_2$, there exists a mode $m_{i_2} \in \{1, 2, 3\}$ for which the values $p_{i_1,i_2}$ increase for $i_1 \le m_{i_2}$ and decrease for $i_1 \ge m_{i_2}$. This mode cannot be equal to 2, because otherwise, the value $p_{2,i_2} = 0$ would be the largest of the three values $p_{1,i_2}$, $p_{2,i_2}$, and $p_{3,i_2}$, hence all three values would be 0 – which contradicts the constraint $p_{1,i_2} + p_{2,i_2} + p_{3,i_2} = 1/n$. Thus, this mode is either 1 or 3:

• if the mode is 1, then due to monotonicity, we have $0 = p_{2,i_2} \ge p_{3,i_2}$, hence $p_{3,i_2} = p_{2,i_2} = 0$;

• if the mode is 3, then due to monotonicity, we have $p_{1,i_2} \le p_{2,i_2} = 0$, hence $p_{1,i_2} = p_{2,i_2} = 0$.

In both cases, for each $i_2$, only one value of $p_{i_1,i_2}$ is different from 0 – the value $p_{m_{i_2},i_2}$. Since the sum of these three values is 1/n, this non-zero value must be equal to 1/n. If we denote $\varepsilon_{i_2} \stackrel{\rm def}{=} m_{i_2} - 2$, then we conclude that $\varepsilon_{i_2} \in \{-1, 1\}$. For each $i_2$, we have
$$-s_{i_2} \cdot p_{1,i_2} + s_{i_2} \cdot p_{3,i_2} = \varepsilon_{i_2} \cdot s_{i_2} \cdot (1/n),$$
hence from the constraint
$$\sum_{i_2=1}^{n_2} (-s_{i_2} \cdot p_{1,i_2} + s_{i_2} \cdot p_{3,i_2}) = \frac{1}{n} \cdot \sum_{i_2=1}^{n_2} \varepsilon_{i_2} \cdot s_{i_2} = 0,$$
we conclude that $\sum \varepsilon_i \cdot s_i = 0$, i.e., that the original instance of the partition problem has a solution.
The theorem is proven.
Comment. The above constraints are not just mathematical tricks, they have a natural interpretation: for $x_1$, we take the values −1, 0, and 1 as corresponding to $i_1 = 1, 2, 3$, and for $x_2$, we take the values $s_1, \ldots, s_n$. Then:

• the constraint $p_{2,i_2} = 0$ means that ${\rm Prob}(x_1 = 0) = 0$;

• the constraint $p_{1,i_2} + p_{2,i_2} + p_{3,i_2} = 1/n$ means that ${\rm Prob}(x_2 = s_i) = 1/n$ for all n values $s_i$; and

• the constraint $\sum_{i_2=1}^{n_2} (-s_{i_2} \cdot p_{1,i_2} + s_{i_2} \cdot p_{3,i_2}) = 0$ means that the expected value of the product is 0: $E[x_1 \cdot x_2] = 0$.
So, the difficult-to-solve problem is to check
whether it is possible that E[x1 · x2 ] = 0 and
Prob(x1 = 0) = 0 for some unimodal distribution for which the marginal distribution on
x2 is “uniform”.
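For small instances, the equivalence established in the proof can be checked directly by brute force. The following illustrative sketch (not part of the proof; the test instances are made up, numpy assumed) builds the candidate distributions from the "if" part of the proof and tests the three constraints.

```python
import itertools
import numpy as np

def unimodal_instance_consistent(s):
    """Brute-force check of the reduction from the partition instance s_1,...,s_n:
    by the proof of Theorem 1, the constructed constraint system has a unimodal
    solution iff some signs eps_i in {-1,+1} give sum eps_i * s_i = 0."""
    n = len(s)
    for eps in itertools.product([-1, 1], repeat=n):
        # Candidate distribution from the "if" part of the proof:
        # p_{2+eps_i, i} = 1/n, the other two entries in column i are 0.
        p = np.zeros((3, n))
        for i, e in enumerate(eps):
            p[1 + e, i] = 1.0 / n          # rows 0, 1, 2 stand for i1 = 1, 2, 3
        ok = (np.allclose(p[1, :], 0.0)                               # p_{2,i2} = 0
              and np.allclose(p.sum(axis=0), 1.0 / n)                 # column sums = 1/n
              and abs(np.dot(np.asarray(s), p[2, :] - p[0, :])) < 1e-9)  # E[x1*x2] = 0
        if ok:
            return True
    return False

print(unimodal_instance_consistent([3, 1, 1, 2, 2, 1]))   # True: 3+2 = 1+1+2+1
print(unimodal_instance_consistent([1, 1, 3]))            # False: total sum is odd
```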
3  Adding Independence Makes Interval Probability Problems NP-Hard
In general, in statistics, independence makes
problems easier. We will show, however, that
for interval probability problems, the situation is sometimes opposite: the addition of
the independence assumption turns easy-to-solve
problems into NP-hard ones.
Definition 2 Let $n_1 > 0$ and $n_2 > 0$ be given integers.

• By an independent probability distribution, we mean a collection of real numbers $p_i \ge 0$, $1 \le i \le n_1$, and $q_j \ge 0$, $1 \le j \le n_2$, such that $\sum_{i=1}^{n_1} p_i = \sum_{j=1}^{n_2} q_j = 1$.

• By a linear constraint on the independent probability distribution, we mean a constraint of the type
$$\underline{b} \le \sum_{i=1}^{n_1} a_i \cdot p_i + \sum_{j=1}^{n_2} b_j \cdot q_j + \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} c_{i,j} \cdot p_i \cdot q_j \le \overline{b}$$
for some given values $\underline{b}$, $\overline{b}$, $a_i$, $b_j$, and $c_{i,j}$.

• By an interval probability problem under independence constraint, we mean the following problem: given a finite list of linear constraints, check whether there exists an independent distribution which satisfies all these constraints.
Comment. Independence means that pi,j =
pi · qj for every i and j. The above constraints
are linear in terms of these probabilities pi,j =
pi · qj .
Theorem 2 Interval probability problem under independence constraint is NP-hard.
Proof. To prove this theorem, we will again use a reduction from the same known NP-hard problem as in the proof of Theorem 1: the partition problem.
For every instance of the partition problem,
we form the following system of constraints:
n1 = n2 = n,
• pi − qi = 0 for every i from 1 to n;
• $S_i \cdot p_i - p_i \cdot q_i = 0$ for all i from 1 to n, where $S_i \stackrel{\rm def}{=} \dfrac{2 \cdot s_i}{\sum_{k=1}^{n} s_k}$.
Let us prove that this system is consistent if
and only if the original instance of the partition problem has a solution.
Indeed, if the original instance has a solution $\varepsilon_i \in \{-1, 1\}$, then, for every i from 1 to n, we can take $p_i = q_i = \dfrac{1 + \varepsilon_i}{2} \cdot S_i$, i.e.:

• if $\varepsilon_i = -1$, we take $p_i = q_i = 0$;

• if $\varepsilon_i = 1$, we take $p_i = q_i = S_i$.
Let us show that for this choice, $\sum_{i=1}^{n} p_i = \sum_{j=1}^{n} q_j = 1$. Indeed,
$$\sum_{i=1}^{n} p_i = \sum_{i=1}^{n} \frac{1 + \varepsilon_i}{2} \cdot S_i = \frac{1}{2} \cdot \sum_{i=1}^{n} S_i + \frac{1}{2} \cdot \sum_{i=1}^{n} \varepsilon_i \cdot S_i.$$
By definition of $S_i = \dfrac{2 \cdot s_i}{\sum_{k=1}^{n} s_k}$, we have
$$\sum_{i=1}^{n} S_i = 2 \cdot \frac{\sum_{i=1}^{n} s_i}{\sum_{k=1}^{n} s_k} = 2,$$
and
$$\sum_{i=1}^{n} \varepsilon_i \cdot S_i = 2 \cdot \frac{\sum_{i=1}^{n} \varepsilon_i \cdot s_i}{\sum_{k=1}^{n} s_k}.$$
Since $\sum_{i=1}^{n} \varepsilon_i \cdot s_i = 0$, the second sum is 0, hence $\sum_{i=1}^{n} p_i = 1$.

In both cases $\varepsilon_i = \pm 1$, we have $S_i \cdot p_i - p_i \cdot q_i = 0$, so all the constraints are indeed satisfied.
Vice versa, if the constraints are satisfied, this means that for every i, we have $p_i = q_i$ and $S_i \cdot p_i - p_i \cdot q_i = p_i \cdot (S_i - q_i) = p_i \cdot (S_i - p_i) = 0$, so $p_i = 0$ or $p_i = S_i$. Thus, the value $p_i / S_i$ is equal to 0 or 1, hence the value $\varepsilon_i \stackrel{\rm def}{=} 2 \cdot (p_i / S_i) - 1$ takes values −1 or 1. In terms of $\varepsilon_i$, we have $p_i / S_i = \dfrac{1 + \varepsilon_i}{2}$, hence $p_i = \dfrac{1 + \varepsilon_i}{2} \cdot S_i$. Since $\sum_{i=1}^{n} p_i = 1$, we conclude that
$$\sum_{i=1}^{n} p_i = \frac{1}{2} \cdot \sum_{i=1}^{n} S_i + \frac{1}{2} \cdot \sum_{i=1}^{n} \varepsilon_i \cdot S_i = 1.$$
We know that $\frac{1}{2} \cdot \sum_{i=1}^{n} S_i = 1$, hence $\sum_{i=1}^{n} \varepsilon_i \cdot S_i = 0$. We know that this sum is proportional to $\sum_{i=1}^{n} \varepsilon_i \cdot s_i$, hence $\sum_{i=1}^{n} \varepsilon_i \cdot s_i = 0$ – i.e., the original instance of the partition problem has a solution.
The theorem is proven.
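As with Theorem 1, the reduction can be illustrated on small instances by brute force over the sign assignments, using the correspondence $p_i = q_i = \frac{1+\varepsilon_i}{2} \cdot S_i$ from the proof; the sketch below (made-up test instances, numpy assumed) does exactly this.

```python
import itertools
import numpy as np

def independent_instance_consistent(s):
    """Brute-force check of the reduction in the proof of Theorem 2: the
    constructed system (p_i = q_i, S_i*p_i - p_i*q_i = 0, sum p = sum q = 1)
    is consistent iff some signs eps_i in {-1,+1} give sum eps_i * s_i = 0."""
    s = np.asarray(s, dtype=float)
    S = 2.0 * s / s.sum()
    for eps in itertools.product([-1, 1], repeat=len(s)):
        p = (1.0 + np.asarray(eps)) / 2.0 * S      # candidate from the "if" part
        q = p.copy()
        ok = (abs(p.sum() - 1.0) < 1e-9
              and abs(q.sum() - 1.0) < 1e-9
              and np.allclose(S * p - p * q, 0.0))
        if ok:
            return True
    return False

print(independent_instance_consistent([2, 3, 5, 4]))   # True: 2+5 = 3+4
print(independent_instance_consistent([1, 1, 1]))      # False: total sum is odd
```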
Acknowledgements

This work was supported in part by NASA under cooperative agreement NCC5-209, NSF grant EAR-0225670, NIH grant 3T34GM008048-20S1, and Army Research Lab grant DATM-05-02-C-0046.

The authors are very thankful to the participants of the 4th International Symposium on Imprecise Probabilities and Their Applications ISIPTA'05 (Carnegie Mellon University, July 20–24, 2005), especially to Mikelis Bickis (University of Saskatchewan, Canada), Arthur P. Dempster (Harvard University), and Damjan Škulj (University of Ljubljana, Slovenia), for valuable discussions.

References
[1] D. Berleant, M.-P. Cheong, C. Chu,
Y. Guan, A. Kamal, G. Sheblé, S. Ferson, and J. F. Peters, Dependable handling of uncertainty, Reliable Computing,
2003, Vol. 9, No. 6, pp. 407–418.
[2] D. Berleant, M. Dancre, J. Argaud, and
G. Sheblé, Electric company portfolio optimization under interval stochastic dominance constraints, In: F. G. Cozman,
R. Nau, and T. Seidenfeld, Proceedings
of the 4th International Symposium on
Imprecise Probabilities and Their Applications ISIPTA’05, Pittsburgh, Pennsylvania, July 20–24, 2005, pp. 51–57.
[3] D. Berleant, L. Xie, and J. Zhang, Statool: a tool for Distribution Envelope
Determination (DEnv), an interval-based
algorithm for arithmetic on random variables, Reliable Computing, 2003, Vol. 9,
No. 2, pp. 91–108.
[4] D. Berleant and J. Zhang, Using Pearson
correlation to improve envelopes around
the distributions of functions, Reliable
Computing, 2004, Vol. 10, No. 2, pp. 139–
161.
[5] D. Berleant and J. Zhang, Representation and Problem Solving with the Distribution Envelope Determination (DEnv)
Method, Reliability Engineering and System Safety, 2004, Vol. 85, No. 1–3.
[6] D. Berleant and J. Zhang, Using Pearson
correlation to improve envelopes around
the distributions of functions, Reliable
Computing, 2004, Vol. 10, No. 2, pp. 139–
161.
[7] A. Borglin and H. Keiding, Stochastic
dominance and conditional expectation
an insurance theoretical approach, The
Geneva Papers on Risk and Insurance
Theory, 2002, Vol. 27, pp. 31–48.
[8] A. Chateauneuf, On the use of capacities
in modeling uncertainty aversion and risk
aversion, Journal of Mathematical Economics, 1991, Vol. 20, pp. 343-369.
[9] S. Ferson, RAMAS Risk Calc 4.0, CRC
Press, Boca Raton, Florida.
[10] M. R. Garey and D. S. Johnson, Computers and Intractability, a Guide to the
Theory of NP-Completeness, W. H. Freeman and Company, San Francisco, CA,
1979.
[11] G. J. Klir, G. Xiang, and O. Kosheleva, “Estimating information amount
under interval uncertainty: algorithmic
solvability and computational complexity”, Proceedings of the International
Conference on Information Processing and Management of Uncertainty
in Knowledge-Based Systems IPMU’06,
Paris, France, July 2–7, 2006 (to appear).
[12] V. Kreinovich and S. Ferson, Computing Best-Possible Bounds for the Distribution of a Sum of Several Variables is NP-Hard, International Journal of Approximate Reasoning, 2006, Vol. 41, pp. 331–
342.
[13] V. Kreinovich, A. Lakeyev, J. Rohn,
P. Kahl, Computational complexity and
feasibility of data processing and interval
computations, Kluwer, Dordrecht, 1997.
[14] V. G. Krymsky, Computing Interval Estimates for Components of Statistical Information with Respect to Judgements
on Probability Density Functions, In:
J. Dongarra, K. Madsen, and J. Wasniewski (eds.), PARA’04 Workshop on
State-of-the-Art in Scientific Computing,
Springer Lecture Notes in Computer Science, 2005, Vol. 3732, pp. 151–160.
[15] V. Kuznetsov, Interval Statistical Models, Radio i Svyaz, Moscow, 1991 (in
Russian).
[16] V. P. Kuznetsov, Interval methods
for processing statistical characteristics,
Proceedings of the International Workshop on Applications of Interval Computations APIC’95, El Paso, Texas, February 23–25, 1995 (a special supplement
to the journal Reliable Computing), pp.
116–122.
[17] V. P. Kuznetsov, Auxiliary problems of
statistical data processing: interval approach, Proceedings of the International
Workshop on Applications of Interval
Computations APIC’95, El Paso, Texas,
February 23–25, 1995 (a special supplement to the journal Reliable Computing),
pp. 123–129.
[18] D. Skulj, “Generalized conditioning in
neighbourhood models”, In: F. G. Cozman, R. Nau, and T. Seidenfeld, Proceedings of the 4th International Symposium
on Imprecise Probabilities and Their Applications ISIPTA’05, Pittsburgh, Pennsylvania, July 20–24, 2005.
[19] D. Skulj, A role of Jeffrey’s rule of conditioning in neighborhood models, to appear.
[20] P. Walley, Statistical Reasoning with Imprecise Probabilities, Chapman & Hall,
N.Y., 1991.
[21] J. Zhang and D. Berleant, Arithmetic
on random variables: squeezing the envelopes with new joint distribution constraints, Proceedings of the 4th International Symposium on Imprecise Probabilities and Their Applications ISIPTA’05,
Pittsburgh, Pennsylvania, July 20–24,
2005, pp. 416–422.