Primal-dual RNC approximation algorithms for (multi)-set (multi)-cover and covering integer programs

Vijay V. Vazirani†
Indian Institute of Technology, Delhi.

Sridhar Rajagopalan*
University of California, Berkeley.
Abstract

We build on the classical greedy sequential set cover algorithm, in the spirit of the primal-dual schema, to obtain simple parallel approximation algorithms for the set cover problem and its generalizations. Our algorithms use randomization, and our randomized voting lemmas may be of independent interest. Fast parallel approximation algorithms were known before for set cover [BRS89] [LN93], though not for any of its generalizations.

1  Introduction
Given a collection of sets over a universe containing n elements, and a cost for each set, the set cover problem asks for a minimum cost sub-collection that covers all the n elements. Set multi-cover and multi-set multi-cover are natural generalizations in which elements have specified coverage requirements, and the instance consists of multi-sets, i.e., elements may occur multiple times in a set. Because of its generality, wide applicability and good combinatorial structure, the set cover problem occupies a central place in the theory of approximation algorithms.
Set cover was one of the first problems shown to be NP-complete, in Karp's seminal paper [Ka72]. Soon after this, the natural greedy algorithm was shown to be an H_n factor approximation algorithm for this problem (H_n = 1 + 1/2 + ... + 1/n), by Johnson [Jo74] and Lovász [Lo75] for the unit cost case, and by Chvátal [Ch79] for the arbitrary cost case. This approximation factor was recently shown to be essentially tight by Lund and Yannakakis [LY92]. The greedy algorithm is inherently sequential, however, and can be made to take Ω(n) iterations on suitably constructed examples.
The first parallel algorithm for approximating set cover was obtained by Berger, Rompel and Shor [BRS89]. They gave an RNC^5 algorithm, and they derandomized it to obtain an NC^7 algorithm. More recently, Luby and Nisan [LN93] obtained a (1 + ε) factor (for any constant ε > 0) NC^3 algorithm for positive linear programs. Since the integrality gap for set cover is O(log n), this approximates the cost of the optimal set cover to within an O(log n) factor. Furthermore, using a randomized rounding technique [Ra88], they can obtain an integral solution achieving the same approximation factor, thus getting an RNC^3 set cover algorithm. For special set systems, where each element occurs in a constant number of sets (this includes the vertex cover problem), Khuller, Vishkin and Young [KVY93] parallelize the approximation algorithm of Hochbaum [Ho82].
In this paper, we present a new RNC^3 approximation algorithm for set cover, and RNC^4 approximation algorithms for set multi-cover, multi-set multi-cover and covering integer programs (i.e., integer programs of the form min c · x, s.t. Ax ≥ b, where x ∈ Z_+^n and the entries in the vectors c and b and the matrix A are all non-negative), each achieving an approximation guarantee of O(log n). In addition, our algorithm for covering integer programs achieves a better approximation factor than the best known sequential guarantee [Do82].

*Supported by NSF PYI Award CCR 88-96202 and NSF grant IRI 91-20074. Part of this work was done when visiting IIT, Delhi.
†Partial support provided by DIMACS.

0272-5428/93 $03.00 © 1993 IEEE
Our algorithms utilize several ideas from the previous works. As described below, the basis of our work is the sequential greedy algorithm. We use the general idea from [BRS89] of relaxing the greedy set picking condition so that rapid progress is guaranteed; however, our relaxation criterion is new, and our algorithm is quite different. Also, our rounding lemma 4.1.1 is similar to that in [LN93], except that it is carried out in the integer programming framework.

The greedy set cover algorithm can be viewed as a primal-dual algorithm, and can be analyzed within the powerful framework of LP-duality; this is the starting point of our work. Primal-dual considerations and randomized voting lemmas enable us to build on the sequential algorithm to obtain the parallelization. For details of the primal-dual schema in the exact and approximate settings, see [PS82] and [WGMV93].

Since the greedy sequential algorithm is based on very general principles, i.e., the primal-dual framework, and not on properties that are specific to set cover, it extends easily to solving the generalizations of set cover mentioned above. In turn, our parallel set cover algorithm also extends to these problems. The main new ideas needed for parallelizing multi-set multi-cover are in the analysis of the randomized step. These are best shown in the simpler setting of set multi-cover. It turns out that for both these problems, it is easier to parallelize the version in which each set can be picked at most once; the other version, in which sets can be picked arbitrarily many times, is an instance of covering integer programs. Finally, we give an approximation preserving NC^1 reduction from covering integer programs to multi-set multi-cover.
2  Parallel set cover algorithm

Our parallel set cover algorithm builds on the classical sequential greedy algorithm. This (sequential) algorithm is iterative; in each iteration it picks a set that covers new elements at the minimum average cost, i.e., a set that minimizes c_S/|S|, and removes these elements from consideration (c_S is the cost of set S, and |S| denotes the number of uncovered elements in S). The algorithm terminates when all elements are covered. In an iteration of this algorithm, for an uncovered element, define its current-cost to be the minimum average cost at which it can be covered in the current iteration, i.e.,

    current-cost(e) = min_{S ∋ e} c_S/|S|.

Thus, in an iteration, this algorithm covers elements having least current-cost.

Our parallelization involves two central ideas. The first is a way of relaxing the criterion for choosing cost-effective sets so that rapid progress is guaranteed, and at the same time, the approximation guarantee does not go up by more than a constant factor. Our parallel algorithm is also iterative; in an iteration, the algorithm first identifies all sets that are cost-effective by the relaxed criterion (for this, the current-cost of an element is as defined above). However, there may be too many such (overlapping) sets, and a good set cover cannot contain them all. The second idea is the use of randomization for picking a sub-collection of these sets such that on an average, elements are not covered too many times. This is done in a series of phases.

PARALLEL SETCOV
Iteration:
    For each element e, compute current-cost(e).
    For each set S: include S in L if
        (*)  Σ_{e ∈ S} current-cost(e) ≥ c_S/2.
    Phase:
        (a) Permute L at random.
        (b) Each element e votes for the first set S (in the random order) such that e ∈ S.
        (c) If Σ_{e votes S} current-cost(e) ≥ c_S/16, S is added to the set cover.
        (d) Delete covered elements from sets. If any set fails to satisfy (*), it is deleted from L.
    Repeat until L is empty.
Iterate until all elements are covered.

Remark: Notice that current-cost(e) is computed only at the beginning of an iteration, and is not updated as the iteration proceeds. □

We will prove that our algorithm requires at most O(log n) iterations, and O(log n) phases in each iteration, and achieves an approximation guarantee of 16 H_n.
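The iteration/phase structure of PARALLEL SETCOV can be simulated sequentially; the sketch below is our own (names, data layout and the use of Python's random.shuffle are assumptions, and the parallelism is only simulated):

```python
import random

def parallel_set_cover(sets, costs, universe):
    """Simplified simulation of PARALLEL SETCOV.

    sets:  dict mapping set name -> frozenset of elements
    costs: dict mapping set name -> positive cost
    Assumes every element of `universe` occurs in some set.
    Returns the list of picked set names (a cover of `universe`)."""
    uncovered = set(universe)
    cover = []
    while uncovered:
        # current-cost is frozen for the whole iteration (see the Remark)
        cc = {e: min(costs[S] / len(sets[S] & uncovered)
                     for S in sets if e in sets[S])
              for e in uncovered}
        live = {S: sets[S] & uncovered for S in sets}
        # (*) cost-effective sets by the relaxed criterion
        L = [S for S in sets
             if live[S] and sum(cc[e] for e in live[S]) >= costs[S] / 2]
        while L:
            random.shuffle(L)                     # (a) random order on L
            votes = {S: 0.0 for S in L}
            for e in set().union(*(live[S] for S in L)):
                first = next(S for S in L if e in live[S])
                votes[first] += cc[e]             # (b) e votes for its first set
            for S in L:
                if votes[S] >= costs[S] / 16:     # (c) enough votes: pick S
                    cover.append(S)
                    uncovered -= live[S]
            # (d) remove covered elements; drop picked sets and sets failing (*)
            live = {S: live[S] & uncovered for S in L}
            L = [S for S in L if S not in cover and live[S]
                 and sum(cc[e] for e in live[S]) >= costs[S] / 2]
    return cover
```

Note that in every phase the first set in the random order receives the votes of all its live elements, so by (*) it is always picked; this is why each phase makes progress.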
3  Key ideas behind our algorithm

As stated in the introduction, our parallel set cover algorithm emerged from LP-duality principles. In this section, we will first state the LP-relaxation of the set cover problem and recall a simple proof of the sequential algorithm. This will enable us to point out the ideas that lead to the parallelization.
3.1  The set cover problem and its LP relaxation

We can express the set cover problem as an integer program as follows. Here x_S is the 0/1 variable indicating whether set S is picked in the set cover.

IP:   min  Σ_S c_S x_S
      s.t. Σ_{S ∋ e} x_S ≥ 1   for each element e
           x_S ∈ {0, 1}        for each set S

The LP-relaxation and its dual are as follows.

LP:   min  Σ_S c_S x_S
      s.t. Σ_{S ∋ e} x_S ≥ 1   for each element e
           x_S ≥ 0             for each set S

DP:   max  Σ_e y_e
      s.t. Σ_{e ∈ S} y_e ≤ c_S   for each set S
           y_e ≥ 0               for each element e

The primal problem is a covering problem, and the dual is a packing problem. Henceforth we will use IP, LP and DP to refer to both the problems and the value of the objective achieved by the specific solution being considered; the context will resolve any ambiguity. The superscript * will be used for the optimal solution. Hence, we want an approximation to IP*.

3.2  Proof of approximation guarantee for the classical greedy algorithm

As mentioned above, the sequential greedy algorithm picks a most cost-effective set in each iteration. When a set S is picked, its cost, c_S, is ascribed equally to each of the new elements that S covers. Let cost(e) be the cost ascribed to e. Notice that cost(e) is the current-cost of e in the iteration in which e was covered, and the cost of the set cover is the sum of the costs ascribed to all the elements:

    Σ_e cost(e) = cost of set cover = IP.

For each element e, let y_e = cost(e)/H_n, and let y be the vector of all y_e's.

Lemma 3.2.1  y is dual feasible.

Proof: Consider an arbitrary set S. Let a_1, a_2, ... be the reverse order in which the elements in S were covered. When a_i was covered, S had at least i uncovered elements. Therefore, it was covering elements at a cost not larger than c_S/i. Since a_i was covered at the least possible cost, cost(a_i) ≤ c_S/i. Therefore,

    Σ_{e ∈ S} cost(e) ≤ Σ_{i=1}^{|S|} c_S/i = H_{|S|} · c_S ≤ H_n · c_S.

Since y_e = cost(e)/H_n, Σ_{e ∈ S} y_e ≤ c_S, and y is dual feasible. □

The value of the dual objective is

    Σ_e y_e = IP / H_n.

Hence, the greedy algorithm finds an integral primal feasible solution and a dual feasible solution such that IP = H_n · DP. By the LP-duality theorem, DP ≤ DP* = LP* ≤ IP*. Therefore, IP ≤ H_n · IP*. This completes the proof of the following theorem.

Theorem 3.1 [Ch79], [Jo74], [Lo75]  The greedy algorithm is an H_n factor approximation algorithm for set cover.

Remark: Notice that theorem 3.1 and its proof will hold even if H_n is replaced by H_k, where k is the cardinality of the largest set in the instance. This is true for our parallel algorithm as well. □

3.3  Relaxing the set picking criterion, and the primal-dual framework

The greedy algorithm is inherently sequential, and it is easy to construct examples on which it takes Ω(n) iterations. Notice that in an iteration, this algorithm considers a set S for inclusion in the set cover only if Σ_{e ∈ S} current-cost(e) = c_S, and it picks any one of these sets. Our central idea for achieving parallelism is to relax appropriately this criterion for choosing cost-effective sets so that rapid progress is guaranteed. Furthermore, we still want to attribute the cost of picked sets to elements in such a way that the amount charged to an element, called cost(e), is at most α · current-cost(e), in the iteration in which the element was covered, where α is a constant. Then, by essentially the proof of lemma 3.2.1, y_e = cost(e)/(α H_n) will again be a dual feasible solution, and our algorithm would achieve a factor of α H_n.

We first give a straightforward way of relaxing the criterion for picking sets, together with an example to show that the algorithm can be forced to take Θ(n) iterations.

Example: Number the sets arbitrarily. In each iteration, each element will vote for the lowest numbered set covering it at the least cost, and any set whose votes total at least half its cost will add itself to the set cover. This process is repeated until all elements are covered. (Notice that replacing half by the full cost essentially gives us the greedy algorithm.) This algorithm takes Ω(n) iterations on the following input. Let the universal set be U = {u_i, v_i : i = 1 ... n}. Let the sets be U_i = {u_i, u_{i+1}, v_{i+1}} and V_i = {v_i, u_{i+1}, v_{i+1}} for i = 1 ... (n-1). Finally, choose the costs of sets so that U_i and V_i have the same cost, and the cost of U_i > the cost of U_{i+1}. □
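Returning to lemma 3.2.1, the charging argument is easy to exercise directly. The following sketch (our own; exact arithmetic via Fraction) runs the sequential greedy algorithm and checks that y_e = cost(e)/H_n is dual feasible:

```python
from fractions import Fraction

def greedy_set_cover(sets, costs):
    """Classical greedy set cover; also returns the dual certificate
    y_e = cost(e)/H_n of lemma 3.2.1 (costs should be integers)."""
    universe = set().union(*sets.values())
    n = len(universe)
    h = lambda k: sum(Fraction(1, i) for i in range(1, k + 1))  # harmonic H_k
    uncovered, cover, cost_of = set(universe), [], {}
    while uncovered:
        # pick the set minimizing average cost over its uncovered elements
        best = min((S for S in sets if sets[S] & uncovered),
                   key=lambda S: Fraction(costs[S]) / len(sets[S] & uncovered))
        new = sets[best] & uncovered
        for e in new:                     # ascribe c_S equally to new elements
            cost_of[e] = Fraction(costs[best]) / len(new)
        cover.append(best)
        uncovered -= new
    y = {e: cost_of[e] / h(n) for e in universe}
    # dual feasibility (lemma 3.2.1): sum of y_e over S is at most c_S
    assert all(sum(y[e] for e in sets[S]) <= costs[S] for S in sets)
    return cover, y
```

The internal assertion is exactly the packing constraint of DP, and it holds with exact rational arithmetic by the proof of lemma 3.2.1.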
Our criterion for identifying cost-effective sets in an iteration is Σ_{e ∈ S} current-cost(e) ≥ c_S/2. These sets are included in the collection L. As pointed out in Section 2, these sets may have too many overlaps. For example, if the sets were all possible pairs of elements, each with cost 1, then all sets would be identified as cost-effective. Clearly, we will not be able to attribute the cost of all these sets to elements so that elements are not over-charged (i.e., cost(e) ≤ α · current-cost(e), for some constant α). Our algorithm gets around this problem as follows: we will pick an ordering of the sets in L, and then each element will give a vote of value α · current-cost(e) (for a suitable choice of α) to the first set in this ordering that contains this element. A set gets picked in the set cover if it gets a substantial fraction of its cost as votes, and it attributes its cost to the elements that voted for it. Clearly the first set in the ordering will get picked; however, that may not be sufficient to guarantee rapid progress. We do not know of any deterministic way of picking the ordering so that votes get concentrated on several sets. Instead, we show that picking a random ordering works. This is done in phases, until each set in L is either picked, or is identified as not cost-effective any more (since elements that get covered are removed from all sets). This is the only use of randomness in our algorithm.

It will be useful to view the greedy algorithm and our parallelization in the framework of the primal-dual schema. Such an algorithm starts with a primal infeasible solution and a dual feasible solution, and it iteratively improves the feasibility of the primal solution, and the optimality of the dual solution, terminating when it obtains a primal feasible solution. This framework is particularly useful for obtaining approximation algorithms for NP-hard combinatorial optimization problems that can be stated as integer programming problems: the schema works with the LP-relaxation and dual, and always seeks an integral extension to the primal solution. On termination, the primal integral feasible solution can be compared directly with the dual feasible solution obtained, since the latter is a lower bound on the fractional, and hence integral, OPT (assuming we started with a minimization problem). The framework leaves sufficient scope for effectively using the combinatorial structure of the problem at hand: in designing the algorithm for the improvement steps, and carrying out the proof of approximation guarantee.

The greedy algorithm can be viewed as a primal-dual algorithm as follows. At any point in the algorithm, for element e, let y_e be cost(e)/H_n if e is already covered, and current-cost(e)/H_n if e is not yet covered. Then, by essentially the proof of lemma 3.2.1, it can be shown that y is a dual feasible solution. The currently picked sets constitute the primal solution. By replacing H_n by α · H_n in setting the y_e's, our algorithm can also be viewed as a primal-dual algorithm.

Our criterion for picking sets can now be stated in terms of the element dual variables, the y_e's. The main difference between our criterion and the one given in the above-stated example is that elements of different current-costs can contribute to a set's being picked, and their contribution is proportional to their current-costs. This weighted mixing of diverse elements, and the corresponding better use of dual variables, appears to lend power to our criterion.

4  Approximation guarantee and running time

Lemma 4.0.1  PARALLEL SETCOV is a 16 H_n approximation algorithm for set cover.

Proof: Each set that is included in the set cover ascribes its cost to the elements that voted for it. Because of step (c) in the algorithm, this can be done without charging an element e more than 16 · current-cost(e) at the moment e was covered. Since Σ_e cost(e) = cost of set cover and y_e = cost(e)/(16 H_n) is dual feasible, the lemma follows. □

4.1  Bounding the number of iterations

We will first show that the set costs can be modified so that they lie in a polynomial range.

Lemma 4.1.1  It suffices to consider the case c_S ∈ [1, n^2].

Proof: Define β = max_e min_{S ∋ e} c_S. Then it is easy to verify that β ≤ IP* ≤ nβ. Any set with weight more than nβ could not possibly be in the optimum set cover. Hence, we can omit all such sets from consideration. Further, if there are e and S such that e ∈ S and c_S ≤ β/n, we cover e immediately using S. Since there are at most n elements, the total cost incurred is at most an additional β. Since β is a lower bound on IP*, this cost is subsumed in the approximation. □

Lemma 4.1.2  In an iteration, define α = min_e current-cost(e). Then α increases by at least a factor of 2 after each iteration.

Proof: At any point in the current iteration, consider a set S that has not yet been included in the set cover and such that c_S ≤ 2α|S|, where |S| is the number of uncovered elements in S. Then by definition of α,

    Σ_{e ∈ S} current-cost(e) ≥ |S| α ≥ c_S / 2.

Thus S satisfies (*) and must have satisfied it at the inception of the iteration. Thus S ∈ L. However, at
the end of an iteration, L is empty. Thus for every remaining S, c_S > 2α|S|. This ensures that in the next iteration, α increases by a factor of 2. □

Since α is restricted to [1/n, n^2] as an immediate consequence of lemma 4.1.1 above, and it doubles with each iteration, we obtain the following lemma.

Lemma 4.1.3  PARALLEL SETCOV requires at most O(log n) iterations.

4.2  Bounding the number of phases in each iteration

An easy argument for bounding the number of phases in an iteration by O(log^2 n) can be based on the following intuition: Let the degree of an element be the number of sets in L in which it occurs. Elements with large degree will occur early in the random ordering of L, and these early sets stand a good chance of being picked. Hence such elements get covered with constant probability, and the max degree decreases by a constant factor after O(log n) phases.

Below we show that the number of phases in an iteration is bounded by O(log n), by considering the relative degrees of elements in each set and using a potential function that weights each element to the extent of its degree.

Call a set-element pair (S, e), e ∈ S, good if deg(e) ≥ deg(f) for at least three-fourths of the elements f ∈ S. For such a pair, we will show that if e votes for S then S gets picked with high probability. This is so because the low degree elements in S are unlikely to occur in a set before S in the random ordering of L.

Lemma 4.2.1  Let e, f ∈ S such that deg(e) ≥ deg(f). Then,

    prob[f votes for S | e votes for S] ≥ 1/2.

Proof: Let N_e, N_f and N_b be the number of sets that contain e but not f, f but not e, and both e and f, respectively. deg(e) ≥ deg(f) implies that N_e ≥ N_f. Hence,

    prob[f votes for S | e votes for S] = (N_e + N_b) / (N_e + N_b + N_f) ≥ 1/2. □

Lemma 4.2.2  If (S, e) is good then

    prob(S is picked | e votes for S) ≥ 1/15.

Proof: For any element f ∈ S, current-cost(f) ≤ c_S/|S|, since S covers f at this average cost. Since (S, e) is good, at most a fourth of the elements f ∈ S have deg(f) > deg(e), so the total current-cost of such elements is at most c_S/4. Furthermore, since the total current-cost of all elements in S is at least c_S/2, the total current-cost of the elements f with deg(e) ≥ deg(f) is at least c_S/4. By lemma 4.2.1, each of the latter elements votes for S with probability at least 1/2, conditioned on the event that e votes for S, and so the expected vote that S gets is at least c_S/8. Since the total vote to S is bounded by c_S, the lemma follows from Markov's inequality. □

Lemma 4.2.3  O(log nm) phases suffice to complete any iteration.

Proof: (Sketch) We will consider the following potential function:

    Φ = Σ_{e uncovered} deg(e).

We will ascribe the decrease in Φ when an element e votes for a set S and S is subsequently added to the set cover to the set-element pair (S, e). The decrease in Φ that will then be ascribed to (S, e) is deg(e), since Φ decreases by 1 for each set containing e. Since e voted for only one set, any decrease in Φ is ascribed to only one (S, e) pair. Thus the expected decrease in Φ is at least

    Σ_{(S,e) good} prob(e voted for S, S was picked) · deg(e)
      ≥ Σ_{(S,e) good} prob(e voted for S) · prob(S was picked | e voted for S) · deg(e).

Finally, since at least a fourth fraction of all (S, e) pairs are good, we observe that E(ΔΦ) ≥ Φ/60. Thus, Φ decreases by a constant factor in each phase. Since Φ starts at a value of at most nm, in O(log nm) phases it will be down to zero and the iteration will be over. □

Clearly, each phase can be implemented in O(log n) time on a linear number (in the size of the instance, i.e., sum of cardinalities of all sets) of processors. Hence we get:

Theorem 4.1  PARALLEL SETCOV is an RNC^3 algorithm that approximates set cover to within 16 H_n, using a linear number of processors.

Comment: The constant 16 can be reduced to 2 + ε for any ε. The factor 2 is inherent in the probabilistic analysis of lemma 4.2.1. □
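The conditional probability in lemma 4.2.1 can be verified exactly by enumerating permutations (our own check, not from the paper; here nb counts the sets other than S containing both e and f, so the lemma's N_b equals nb + 1):

```python
from itertools import permutations
from fractions import Fraction

def exact_conditional(ne, nf, nb):
    """Exactly compute prob[f votes for S | e votes for S].
    Besides S (which contains both e and f), ne sets contain only e,
    nf only f, and nb both. 'e votes for S' means S precedes every
    other set containing e in the random order."""
    labels = ["S"] + ["e"] * ne + ["f"] * nf + ["b"] * nb
    joint = cond = 0
    for perm in permutations(range(len(labels))):
        order = [labels[i] for i in perm]
        if next(x for x in order if x in ("S", "e", "b")) == "S":
            cond += 1                      # e voted for S
            if next(x for x in order if x in ("S", "f", "b")) == "S":
                joint += 1                 # f voted for S as well
    return Fraction(joint, cond)

# With N_b counting S itself, the closed form of lemma 4.2.1 is
# (1 + ne + nb) / (1 + ne + nb + nf), which is >= 1/2 whenever ne >= nf.
for ne, nf, nb in [(2, 1, 1), (3, 2, 0), (2, 2, 2)]:
    p = exact_conditional(ne, nf, nb)
    assert p == Fraction(1 + ne + nb, 1 + ne + nb + nf)
    assert p >= Fraction(1, 2)
```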
5  Set multi-cover

The version of set multi-cover that we will first parallelize is the one in which each set can be picked at most once. Dobson [Do82] shows how the greedy set cover algorithm extends to this problem; however, he compares the solution obtained to the integral optimal IP*. For our purpose, we want to compare the solution obtained to the fractional optimal LP*, and we will first present this argument.

5.1  Sequential analysis

Assume each element e has a coverage requirement r_e. The greedy set cover algorithm extends as follows: say that an element is alive until it is covered to the extent of its requirement. In each iteration, pick a set that covers alive elements at the least average cost, and remove all elements that are not alive from consideration. The LP-relaxation and dual for this problem are:

LP:   min  Σ_S c_S x_S
      s.t. Σ_{S ∋ e} x_S ≥ r_e   for each element e
           x_S ≤ 1               for each set S
           x_S ≥ 0               for each set S

DP:   max  Σ_e r_e y_e − Σ_S z_S
      s.t. Σ_{e ∈ S} y_e − z_S ≤ c_S   for each set S
           y_e ≥ 0,  z_S ≥ 0

When a set S is picked, its cost, c_S, is ascribed to (S, e) pairs for each alive element e it covers, i.e., for each such pair, cost(S, e) = c_S/|S|, where |S| is the number of alive elements in S. For each element e, define max-cost(e) = max_S {cost(S, e)}, and let y_e = max-cost(e)/H_n. If a set S is not picked, let z_S = 0, and otherwise let

    z_S = (1/H_n) · Σ_{e covered by S} (max-cost(e) − cost(S, e)).

Lemma 5.1.1  (y, z) is dual feasible.

Proof: Suppose S is not picked in the set cover. Then, by ordering elements in the reverse order in which their requirement was satisfied, and the proof of lemma 3.2.1, it is easy to check that

    Σ_{e ∈ S} y_e ≤ c_S.

Next assume S is picked in the set cover. Then,

    Σ_{e ∈ S} y_e − z_S = (1/H_n) · ( Σ_{e ∈ S, not covered by S} max-cost(e) + Σ_{e covered by S} cost(S, e) ).

Now, by ordering the elements covered by S first, and the remaining elements in the reverse order in which their requirement was satisfied, the RHS can be shown to be bounded by c_S. □

Lemma 5.1.2  Cost of the set multi-cover, IP = Σ_{(S,e)} cost(S, e) = DP · H_n.

Proof: The first equality follows from the manner in which the cost of the sets picked is attributed to the set-element pairs. For the second equality, notice that

    DP · H_n = Σ_e r_e max-cost(e) − Σ_S Σ_{e covered by S} (max-cost(e) − cost(S, e))
             = Σ_{(S,e)} cost(S, e)
             = IP. □

Theorem 5.1  The extended greedy algorithm finds a set multi-cover within an H_n factor of LP*.

5.2  Parallel algorithm

PARALLEL SETMULTCOV
Iteration:
    For each element e, compute current-cost(e).
    For each set S: include S in L if
        (*)  Σ_{e ∈ S} current-cost(e) ≥ c_S/2.
    Initialization:
        Let r(e) be the dynamic requirement, i.e., the requirement that is yet to be satisfied.
        Furthermore, r(e) is restricted to be at most deg(e) in L; if not, set r(e) = deg(e).
    Phase:
        (a) Permute L at random.
        (b) Each element e votes for the first r(e) sets it belongs to in the random ordering of L.
        (c) If Σ_{e votes S} current-cost(e) ≥ c_S/32, S is added to the set cover.
        (d) Delete sets that are picked or fail to satisfy (*) from L.
            Decrement r(e) to the smaller of the dynamic requirement or deg(e) in L.
    Repeat until L is empty.
Iterate until all elements are covered.
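A single phase of PARALLEL SETMULTCOV can be sketched as follows (our own simplified sequential simulation; the 1/32 threshold and the capping of r(e) by deg(e) follow the pseudocode above):

```python
import random

def multicover_phase(L, costs, live, cc, r):
    """One phase of PARALLEL SETMULTCOV (simplified simulation).

    L:     list of cost-effective set names
    costs: set name -> cost c_S
    live:  set name -> set of alive elements it contains
    cc:    element -> current-cost (frozen for the iteration)
    r:     element -> dynamic requirement, already capped at deg(e) in L
    Returns the sets picked in this phase and decrements r accordingly."""
    order = L[:]
    random.shuffle(order)                                  # (a)
    votes = {S: 0.0 for S in order}
    for e, req in r.items():
        containing = [S for S in order if e in live[S]]
        for S in containing[:req]:                         # (b) first r(e) sets
            votes[S] += cc[e]
    picked = [S for S in order if votes[S] >= costs[S] / 32.0]   # (c)
    for S in picked:                                       # coverage satisfied
        for e in live[S]:
            if r[e] > 0:
                r[e] -= 1
    return picked
```

As in the set cover case, the first set in the random order gets a vote from each of its live elements (every alive element has r(e) ≥ 1), so a phase always picks at least one set when (*) holds.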
5.3  Analysis

The main difference in the analysis is in bounding the number of phases in an iteration. The other differences
are small, and we will sketch them first. When a set is picked, it assigns to each set-element pair that voted for it a cost of c_S divided by the number of votes it received. It is easy to see that if y_e and z_S are scaled down by a factor of 32, (y, z) is dual feasible (by essentially the proof of lemma 5.1.1). Hence the algorithm achieves an approximation factor of 32 H_n.

The number of iterations is bounded by O(log nm). The only change required for proving this is in the definition of β: Let S_1, S_2, ..., S_m be the sets arranged in increasing order of cost. Let β_e be the weight of the r_e-th set in this sequence containing e, and let β = max_e β_e. Then, β ≤ IP* ≤ mnβ.

The number of phases required in an iteration is at most O(log^2 nm). The main differences in the proof are in proving a lemma analogous to lemma 4.2.1, and in extending the potential function to take care of multiple requirements.

Lemma 5.3.1  Let e, f ∈ S. Then,

    prob(e and f vote for S) ≥ 1 / ( 2 ( (deg(e) − 1)/r(e) + (deg(f) − 1)/r(f) ) ).

Proof: Consider picking the random permutation of L by first choosing a real number X_S ∈ [0, 1] for each S and then sorting sets by X_S. Define an indicator variable I(e, T) ∈ {0, 1}; here I(e, T) = 1 iff e ∈ T. We begin by conditioning on the value of X_S:

    prob(e and f vote for S) = ∫_0^1 prob(e and f vote for S | X_S = x) dx
      ≥ ∫_0^θ ( 1 − prob(e does not vote for S | X_S = x) − prob(f does not vote for S | X_S = x) ) dx.    (1)

Here θ ∈ [0, 1] is a parameter that we will choose later to maximize the RHS. We will now focus on the term prob(e does not vote for S | X_S = x). Define Y_e(x) to be the number of occurrences of e in sets earlier than S:

    Y_e(x) = Σ_{T ≠ S} I(e, T) · [X_T < x].

It is easy to see that the expectation of Y_e(x) is given by E(Y_e(x)) = x (deg(e) − 1). Finally, we notice that e does not vote for S only if Y_e(x) ≥ r(e). Applying Markov's inequality to Y_e(x), we get

    prob(e does not vote for S | X_S = x) ≤ x (deg(e) − 1) / r(e).

Substituting this value back in (1), we obtain

    prob(e and f vote for S) ≥ θ − (θ^2/2) ( (deg(e) − 1)/r(e) + (deg(f) − 1)/r(f) ).

Therefore, choosing the value of θ to maximize the right hand side, we get

    prob(e and f vote for S) ≥ 1 / ( 2 ( (deg(e) − 1)/r(e) + (deg(f) − 1)/r(f) ) ). □

Remark: This lemma has interpretations in other situations, such as permutations of n white and m black balls, and up-right walks in Manhattan grids. □

Corollary 5.3.2  Let e, f ∈ S such that (deg(e) − 1)/r(e) ≥ (deg(f) − 1)/r(f). Then,

    prob(f votes for S | e votes for S) ≥ 1/4.

Say that a set-element pair (S, e) is good if (deg(e) − 1)/r(e) ≥ (deg(f) − 1)/r(f) for at least three-fourths of the elements f ∈ S. By essentially the proof of lemma 4.2.2, one can show that if (S, e) is good, then

    prob(S is picked | e votes for S) ≥ 1/31.

The potential function we use is:

    Φ = Σ_e deg(e) · H_{r(e)}.

The initial value of Φ is at most mn log r, where r is the largest requirement. The expected decrease in Φ ascribable to a good set-element pair (S, e) is given by

    E(ΔΦ_{(S,e)}) ≥ prob(e voted for S) · prob(S was picked | e voted for S) · deg(e)/r(e)
                 ≥ prob(e voted for S) · (1/31) · deg(e)/r(e).

Hence, the expected decrease in Φ after a phase is a constant fraction of Φ, and so the number of phases required is O(log nm · log r) in expectation. Since the requirements are bounded by a fixed polynomial in n, this is O(log^2 nm).

Theorem 5.2  PARALLEL SETMULTCOV is an RNC^4 algorithm that approximates set multi-cover to within 32 H_n, using a linear number of processors.
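The device of realizing the random permutation by independent uniform labels X_S is easy to simulate; the following Monte Carlo check (our own, on an illustrative instance) estimates prob(e and f vote for S) and compares it with the bound of lemma 5.3.1:

```python
import random

def both_vote_prob(a, b, c, r_e, r_f, trials=100_000):
    """Estimate prob(e and f both vote for S). Model (ours): besides S,
    which contains both e and f, there are a sets containing only e,
    b sets containing only f, and c sets containing both. An element
    votes for the first r(.) sets containing it in the order obtained
    by sorting independent uniform labels X_T."""
    hits = 0
    for _ in range(trials):
        xs = random.random()                       # label of S
        before_e = sum(random.random() < xs for _ in range(a))
        before_f = sum(random.random() < xs for _ in range(b))
        before_b = sum(random.random() < xs for _ in range(c))
        # e (resp. f) votes for S iff fewer than r_e (resp. r_f)
        # sets containing it received a smaller label than S did
        if before_e + before_b < r_e and before_f + before_b < r_f:
            hits += 1
    return hits / trials

random.seed(2)
deg_e, deg_f = 1 + 3 + 2, 1 + 1 + 2                # a = 3, b = 1, c = 2
bound = 1 / (2 * ((deg_e - 1) / 2 + (deg_f - 1) / 2))   # r(e) = r(f) = 2
assert both_vote_prob(3, 1, 2, 2, 2) >= bound      # estimate is well above 1/8
```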
6  Multi-set multi-cover

In the multi-set multi-cover problem, the sets can have multiple copies of an element; clearly, we may assume that the multiplicity of an element is bounded by its requirement. We will consider the version of this problem in which a set can be picked at most once. For technical reasons pointed out later, we will assume that the requirement of each element is bounded by some fixed polynomial in n. The IP formulation is as follows:

IP:   min  Σ_S c_S x_S
      s.t. Σ_S M(S, e) · x_S ≥ r_e   for each element e
           x_S ∈ {0, 1}              for each set S

Here M(S, e) denotes the multiplicity of element e in set S.

The only change needed in the sequential algorithm for this problem from the set multi-cover case is that multiplicities have to be taken into consideration for evaluating the cost-effectiveness of a set. For the parallel algorithm, in step (b), an element votes for the first r(e) copies of itself, and the constant 32 in step (c) changes to 128.

For the analysis, we will assume that the copies of an element in a set are numbered, and the votes are given in this order. If (e, k) is the kth copy of element e in set S, define r'(e, k) = r(e) − k + 1. Define deg'(e, k) to be the degree of e in a set system where each set has been truncated to contain at most r'(e, k) copies of e. Consider two copies of two elements (not necessarily different) in set S. The statement of lemma 5.3.1 changes only in that r(e) and deg(e) are replaced by r'(e, k) and deg'(e, k), and the proof is also identical. We need the following additional lemma:

Lemma 6.0.3  Consider the kth copy of e, (e, k), in S. Then prob((e, k) votes for S) is bounded above and below by constant multiples of r'(e, k)/deg'(e, k).

Proof: (Sketch) The lower bound follows from considerations similar to those in the proof of lemma 5.3.1. In particular, we need to bound

    prob((e, k) votes for S) = ∫_0^1 ( 1 − prob((e, k) does not vote for S | X_S = x) ) dx
      ≥ ∫_0^θ ( 1 − x (deg'(e, k) − 1)/r'(e, k) ) dx.

Choosing θ appropriately, we get the lower bound. For the upper bound, we notice that with each permutation of the sets such that (e, k) votes for S, we can associate a number of permutations such that (e, k) does not vote for S. These permutations are constructed by repeatedly performing 'rotations', as shown in figure 1 below. A rotation is performed by choosing a chunk of sets containing at least r'(e, k) copies of e, enumerated from the rear of the permutation, and putting them in the front, as shown. Since we may assume that any set has at most r'(e, k) copies of e, each chunk has between r'(e, k) and 2r'(e, k) copies of e. So we can 'rotate' at least deg'(e, k)/(2 r'(e, k)) − 1 times. Hence we have the claimed number of associated permutations, and the lemma follows, using the fact that r'(e, k) ≤ deg'(e, k). □

Figure 1: A rotation

Corollary 6.0.4  Let (e, k) and (f, l) be the kth and the lth copies of e and f in S, and suppose deg'(e, k)/r'(e, k) ≥ deg'(f, l)/r'(f, l). Then,

    prob((e, k) votes for S | (f, l) votes for S) ≥ 1/8.

The rest of the proof is identical to the set multi-cover case, yielding an O(log n) factor RNC^4 approximation algorithm for multi-set multi-cover. The number of processors required is linear in the sum of cardinalities of all sets, taking multiplicities into consideration.

7  Covering integer programs

Covering integer programs are integer programs of the following form:

CIP:  min  Σ_S c_S x_S
      s.t. Σ_S M(S, e) · x_S ≥ R_e   for each element e
           x_S ∈ Z_+                 for each set S

Here Z_+ denotes the non-negative integers. Without loss of generality, we may assume that M(S, e) ≤ R_e. These programs differ from multi-set multi-cover in that the multiplicities and requirements are arbitrary non-negative real numbers, and sets can be picked any integral number of times.
Notice that the multi-set multi-cover problem with the restriction that each set can be picked at most once is not a covering integer program, since we will have negative coefficients. This raises the question whether there is a larger class than covering integer programs for which we can achieve (even sequentially) an O(log n) approximation factor.

Lemma 7.0.5  There is an approximation preserving NC^1 reduction from covering integer programs to multi-set multi-cover.

Proof: Essentially, we need to reduce the element requirements to a polynomial range, and ensure that the requirements and multiplicities are integral. Then, replicating sets to the extent of the largest element requirement, we would get an instance of multi-set multi-cover. We will achieve the first requirement as follows: we will obtain a (crude) approximation to OPT within a polynomial factor, and then we will ensure that the costs of sets are in a polynomial range around this estimate. Once this is done, we will show that rounding down multiplicities and rounding up requirements will not cause much loss in the approximation guarantee.

Let β_e = R_e · min_S ( c_S / M(S, e) ), and β = max_e β_e. Clearly, β ≤ CIP*. On the other hand, it is possible to cover e at cost 2β_e by the following reasoning. Let S_e be the set minimizing c_S/M(S, e). Then, we can cover e by picking ⌈R_e / M(S_e, e)⌉ copies of S_e. The cost incurred is at most ⌈R_e / M(S_e, e)⌉ · c_{S_e}, which in turn is at most 2β_e, since R_e ≥ M(S_e, e). Therefore, all elements can be covered at cost at most 2 Σ_e β_e, which is at most 2nβ. Hence,

    β ≤ CIP* ≤ 2nβ.

If a set S has large cost, i.e., c_S > 2nβ, then S will not be used by OPT, and so we can eliminate it from consideration. So, for the remaining sets, c_S ≤ 2nβ. Define a_S = ⌈β/(n c_S)⌉. We will clump together a_S copies of S. The cost of the set so created is at least β/n. If any element is covered to its requirement by a set of cost at most β/n, cover the element using that set. The cost incurred in the process is at most β for all elements so covered, and this is subsumed in the approximation factor. The reason we require a_S to be integral is that we need to map back solutions from the reduced problem to the original problem. Henceforth, we can assume that the costs satisfy c_S ∈ [β/n, 2nβ].

Since the cost of each set is at least β/n and the optimum value is at most 2nβ, we can deduce that the number of sets such that x*_S > 0 is at most 2n^2. We set M'(S, e) = ⌊4n^3 · M(S, e)/R_e⌋, and we also set r'_e = 4n^3. Now, it is easily seen that the total rounding error introduced is at most a 1/(2n) fraction per element. Since the value of the optimal is at least 1, this amount is subsumed in the approximation factor. This completes the reduction. □
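The scaling step of this reduction can be made concrete; the sketch below is our own (names are ours, and the clumping of cheap sets into integral copies a_S is omitted):

```python
from math import floor

def scale_cip(costs, M, R):
    """Sketch of the requirement-scaling step of lemma 7.0.5.

    costs: dict S -> cost c_S
    M:     dict (S, e) -> multiplicity M(S, e) (absent means 0)
    R:     dict e -> requirement R_e
    Prunes overly expensive sets and returns integral data with every
    requirement equal to 4 n^3 and multiplicities rounded down."""
    n = len(R)
    # crude estimate: beta <= CIP* <= 2 n beta
    beta = max(R[e] * min(costs[S] / M[S, e]
                          for S in costs if M.get((S, e), 0) > 0)
               for e in R)
    # a set costing more than 2 n beta cannot be used by the optimum
    kept = {S: c for S, c in costs.items() if c <= 2 * n * beta}
    r_new = {e: 4 * n ** 3 for e in R}                 # integral requirements
    m_new = {(S, e): floor(4 * n ** 3 * M[S, e] / R[e])
             for (S, e) in M if S in kept}             # multiplicities rounded down
    return kept, m_new, r_new
```

With at most 2n^2 sets in the optimum, rounding each multiplicity down loses less than one unit of the 4n^3 scale per set, which is the 1/(2n) fraction per element cited in the proof.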
Theorem 7.1  There is an O(log n) factor RNC^4 approximation algorithm for covering integer programs, requiring O(n^2 · #(A)) processors, where #(A) is the number of non-zero entries in A.

The previous best approximation factor for covering integer programs was H_{n · max(A)}, where max(A) is the largest entry in A, assuming that the smallest one is 1 [Do82].

References

[BRS89]  Berger, B., Rompel, J., and Shor, P. Efficient NC Algorithms for Set Cover with applications to Learning and Geometry. 30th IEEE Symposium on the Foundations of Computer Science (1989), proceedings, pp. 54-59.

[Ch79]  Chvátal, V. A greedy heuristic for the set covering problem. Math. Oper. Res. 4, pp. 233-235.

[Do82]  Dobson, G. Worst-case Analysis of Greedy Heuristics for Integer Programming with Non-Negative Data. Math. Oper. Res. 7, pp. 515-531.

[Ho82]  Hochbaum, D.S. Approximation algorithms for the set cover and vertex cover problems. SIAM J. Comput. 11(3), pp. 555-556.

[Jo74]  Johnson, D.S. Approximation Algorithms for Combinatorial Problems. J. Comput. Sys. Sci. 9, pp. 256-278.

[Ka72]  Karp, R.M. Reducibility among combinatorial problems. Complexity of Computer Computations, R.E. Miller and J.W. Thatcher (eds.), Plenum Press, New York, pp. 85-103.

[KVY93]  Khuller, S., Vishkin, U. and Young, N. A Primal-dual parallel approximation technique applied to weighted set and vertex cover. Third IPCO Conference (1993), proceedings, pp. 333-341.

[LN93]  Luby, M. and Nisan, N. A parallel approximation algorithm for Positive Linear Programming. 25th ACM Symposium on Theory of Computing (1993), proceedings, pp. 448-457.

[Lo75]  Lovász, L. On the ratio of Optimal Integral and Fractional Covers. Discrete Math. 13, pp. 383-390.

[LY92]  Lund, C. and Yannakakis, M. On the hardness of approximating Minimization Problems. 25th ACM Symposium on Theory of Computing (1993), proceedings, pp. 286-293.

[PS82]  Papadimitriou, C.H., and Steiglitz, K. Combinatorial Optimization: Algorithms and Complexity. Prentice Hall Inc., New Jersey, 1982.

[Ra88]  Raghavan, P. Probabilistic Construction of Deterministic Algorithms: Approximating Packing Integer Programs. J. Comput. Sys. Sci. 37, pp. 130-143.

[WGMV93]  Williamson, D.P., Goemans, M.X., Mihail, M., and Vazirani, V.V. A Primal-Dual Approximation Algorithm for Generalized Steiner Network Problems. 25th ACM Symposium on Theory of Computing (1993), proceedings, pp. 708-717.