
Complexity Theory in Practice
The Power of Randomization
Speaker: He Yan
November 17, 2008
Introduction (Traditional Algorithms)
Traditional algorithms have the following properties:
• They are always correct.
• They are always precise: the answer is not given as a range.
• They are deterministic: although there may be multiple correct outputs, the same instance of a problem always produces the same output.
• They run with the same efficiency every time on the same instance of a problem.
Introduction (Randomized Algorithms)
Randomized algorithms are the class of algorithms that employ a degree of randomness as part of their logic. They can be:
• Nondeterministic: they make random (but still correct) decisions, which means the same algorithm may behave differently when applied twice to the same instance of a problem.
• Not always precise: usually, the more time is allowed, the better the precision that can be guaranteed.
• Unpredictable on a single execution, although we can get a probabilistic characterization of their behavior over a number of runs with different efficiencies.
• Sometimes incorrect and sometimes nonterminating: we can only expect a high probability of being correct, and some runs may not produce an answer at all. The reason: a randomized algorithm makes random choices during its execution.
Why do we need randomized algorithms?
• If an algorithm is confronted with a choice, it may be preferable to choose a course of action at random rather than to spend time figuring out which alternative is best.
• Sometimes we simply have no better method than making random choices.
• One advantage is that, if more than one correct answer exists, several different ones may be obtained by running the probabilistic algorithm more than once.
Comparison of the factors influencing the performance of the two kinds of algorithms:
• A randomized algorithm's performance is a random variable determined by the random bits (by the algorithm itself), not by the data.
• A deterministic algorithm's performance depends on the data as well as on the algorithm; here it is the data that induces a probability distribution.
Expected vs. Average Time
E.g., sorting 3 integers: there are 6 equally likely input orderings,
(1,2,3), (1,3,2), (2,1,3), (2,3,1), (3,1,2), (3,2,1),
and a deterministic sorting algorithm takes times T123, T132, T213, T231, T312, T321 on them, giving the average time
  (T123 + T132 + T213 + T231 + T312 + T321) / 6
The average time of a deterministic algorithm is computed by considering each possible instance of a given size equally likely.
Running a probabilistic sorting algorithm six times on the one instance (1,2,3) gives six (generally different) times T1, T2, T3, T4, T5, T6, and we compute their mean.
The expected time of a probabilistic algorithm is defined on each individual instance: it is the mean time required to solve the same instance again and again. The two definitions are restated in symbols below.
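Restating the two notions side by side in symbols (notation mine, content as on the slides): with T(I) the time taken on instance I and I_n the set of instances of size n,

  average time:  avg(n) = (1 / |I_n|) · Σ_{I ∈ I_n} T(I)    (mean over equally likely instances)
  expected time: exp(I) = Σ_r Pr[r] · T(I, r)               (mean over the random choices r on one instance I)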
Pseudorandom Number Generation
• In randomized algorithms, we assume the availability of a random number generator that can be called at unit cost.
• We assume that a call on uniform(i, j) returns an integer x that is chosen randomly in the interval i ≤ x ≤ j.
• We assume that the distribution of x is uniform on the interval and that successive calls on the generator yield independent values of x.
• However, truly random generators are not usually available in practice; most of the time, pseudorandom generators are used instead.
Pseudorandom Number Generators (PRNG)
• An algorithm for generating a sequence of numbers that approximates the properties of random numbers. The sequence is not really random, since it is completely determined by a set of initial values called the PRNG's state.
NOTE:
• A sequence of calls to a pseudorandom generator will produce
values that appear to have the properties of a random sequence.
• A pseudorandom generator will need a seed. The same seed
will produce the same sequence of values.
• A sequence generated by a pseudorandom generator is periodic, with a period that cannot exceed the number of distinct internal states of the generator.
• A good pseudorandom generator can generate a sequence that
for most practical purposes is indistinguishable from a truly
random sequence.
• Common classes of PRNG
Linear congruential generators:
  X_{n+1} = (a · X_n + c) mod m
where X_n is the sequence of random values and the integer constants specifying the generator are:
  0 < m: the "modulus"
  0 < a < m: the "multiplier"
  0 ≤ c < m: the "increment"
  0 ≤ X_0 < m: the "seed" or "start value"
Lagged Fibonacci generators:
  S_n = S_{n−j} ★ S_{n−k} (mod m), 0 < j < k
where m is usually a power of 2.
  Additive: uses addition as the ★ operator
  Multiplicative: uses multiplication as the ★ operator
A code sketch of a linear congruential generator follows below.
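As a concrete illustration, here is a minimal Python sketch of a linear congruential generator with a uniform(i, j) built on top of it. The constants are those commonly quoted for ANSI C's rand(); any full-period LCG parameters would do, and the small modulo bias of uniform() is ignored here.

  class LCG:
      def __init__(self, seed):
          self.m = 2**31          # modulus
          self.a = 1103515245     # multiplier
          self.c = 12345          # increment
          self.state = seed       # the PRNG's state

      def next(self):
          # X_{n+1} = (a * X_n + c) mod m
          self.state = (self.a * self.state + self.c) % self.m
          return self.state

      def uniform(self, i, j):
          # integer x with i <= x <= j, approximately uniform
          return i + self.next() % (j - i + 1)

  gen = LCG(seed=42)
  print([gen.uniform(1, 6) for _ in range(5)])   # same seed, same sequence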
Simple example: Quicksort
• Quicksort is a familiar, commonly used algorithm in which randomness can be useful.
• Best-case running time: O(n log n)
• Worst-case running time: O(n²)
• Average-case running time: O(n log n)
• If we always choose the first element as the pivot, the complexity depends on the input data: a data-dependent distribution.
• However, randomized quicksort (randomly selecting the pivot) requires O(n log n) expected time regardless of the input, since the worst case, O(n²), cannot be triggered repeatedly by any fixed arrangement of the input elements. A sketch follows below.
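A minimal Python sketch of randomized quicksort (written out-of-place for clarity, rather than with the usual in-place partitioning):

  import random

  def randomized_quicksort(xs):
      # O(n log n) expected time on every input: no fixed input
      # can repeatedly trigger the O(n^2) worst case.
      if len(xs) <= 1:
          return xs
      pivot = random.choice(xs)          # the random pivot selection
      left  = [x for x in xs if x < pivot]
      mid   = [x for x in xs if x == pivot]
      right = [x for x in xs if x > pivot]
      return randomized_quicksort(left) + mid + randomized_quicksort(right)

  print(randomized_quicksort([3, 1, 4, 1, 5, 9, 2, 6]))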
• Randomized algorithms have largely been used to speed up existing solutions to tractable problems, as well as to provide approximate solutions for hard problems.
• Applied to a decision problem, a randomized algorithm returns "yes" or "no" with a probabilistic guarantee of the correctness of the answer; at the least, this probability can be improved to any desired level.
• Here we still focus on decision problems; however, REMEMBER: randomized algorithms are also used to provide approximate solutions for optimization problems.
• Let us now look at two famous kinds of randomized algorithms!
Monte Carlo algorithms (MC)
• MC methods are a class of computational algorithms that rely on repeated random sampling to compute their results.
• A random MC algorithm runs in polynomial time but may err with probability less than some constant (say ½).
• A one-sided MC decision algorithm never errs when it returns one type of answer (say "no") and errs with probability less than some constant (say ½) when returning the other (say "yes").
• The most interesting feature of an MC algorithm is that it is often possible to reduce the error probability arbitrarily at the cost of a slight increase in computing time. This is called amplifying the stochastic advantage; the calculation below makes it concrete.
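A standard back-of-the-envelope version of this amplification (the numbers are mine, not the slide's): if each run errs with probability at most ½ and errors occur on one side only, then running the algorithm t times independently and returning the reliable answer as soon as it appears errs only when all t independent runs err, i.e., with probability at most (½)^t = 2^(−t); ten runs already bring the error bound below 0.1%.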
MC example
• Monte Carlo Pi
We want to approximate π using MC methods. Inscribe a circle of radius r = 1 in a square of side 2r = 2.
The area of the circle: π r² = π · 1² = π
The area of the square: (2r)² = 2² = 4
The ratio of the area of the circle to the area of the square is
  p = (area of circle) / (area of square) = π r² / (2r)² = π/4 = 3.1415926/4 ≈ 0.78539815
If we can estimate this ratio, we can multiply it by four to obtain π.
One simple way: pick lattice points in the square and count how many of them lie inside the circle. With 812 points inside and 212 outside the circle, p = 812/(812 + 212) = 0.79296875, and then π ≈ p · (area of square) = 0.79296875 · 4 = 3.171875.
MC method for π
• Randomly select n points in the unit square and determine the ratio p = m/n, where m is the number of points satisfying x² + y² ≤ 1.
• Sample size n = 1000: 787 points satisfy x² + y² ≤ 1, so p = m/n = 787/1000 = 0.787 and π ≈ 0.787 · 4 = 3.148. A code sketch follows below.
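A minimal Python sketch of this sampling method (function name mine):

  import random

  def mc_pi(n):
      # count points of the unit square falling inside the quarter circle
      m = sum(1 for _ in range(n)
              if random.random()**2 + random.random()**2 <= 1.0)
      return 4.0 * m / n

  print(mc_pi(1000))      # typically within a few percent of pi
  print(mc_pi(1000000))   # error shrinks slowly, on the order of 1/sqrt(n)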
An example of a one-sided MC algorithm
Example 8.3 (P336)
• Given a Boolean function, we can construct a binary decision tree for it. In a binary decision tree, each internal node represents a variable of the function and has two children: one represents the value "true" for the variable, while the other represents "false". Each leaf is labeled "true" or "false", representing the value of the function for the truth assignment denoted by the path from the root to the leaf. A binary decision tree example appears in Fig. 8.10 (P336).
One fundamental question is whether or not two trees represent the same Boolean function. This problem belongs to coNP: if two trees represent different functions, then there is at least one truth assignment under which the two functions return different values, so we can guess this truth assignment and verify that the two binary decision trees return distinct values. However, no deterministic polynomial-time algorithm has been found for this problem, and nobody has proved it coNP-complete. So, instead of guessing a truth assignment to the n variables and computing a Boolean value, we use a random assignment of integers in the range S = [0, 2n−1] and compute (modulo p, where p is a prime no smaller than |S|) an integer characterizing the entire tree under this assignment. If x is assigned value i, then we assign 1−i (modulo p) to its complement, so that the sum of the values of x and x̄ is 1. For each leaf of the tree labeled "true", compute the product of the values of the variables encountered along the path; then sum all these products. Finally, compare the two resulting numbers (one for each tree). If they differ, the algorithm concludes that the trees represent different functions;
• otherwise it concludes that they represent the same function. The algorithm gives the correct answer whenever the two values differ but may err when the two values are equal. We claim that at least (|S| − 1)^n of the |S|^n possible assignments of values to the n variables yield distinct values when the two functions are distinct; this claim implies that a difference is detected with probability at least

  (|S| − 1)^n / |S|^n = ((2n − 1)/(2n))^n > 1/2

so that the probability of error is less than 1/2 and we have a one-sided Monte Carlo algorithm for this problem (a code sketch of the algorithm appears after the proof).
The claim trivially holds for functions of one variable; assume that it holds for functions of n or fewer variables and consider two distinct functions f and g of n+1 variables, writing f = x̄ f_{x=0} + x f_{x=1} (and similarly for g). If f and g differ, then f_{x=0} and g_{x=0} differ, or f_{x=1} and g_{x=1} differ, or both. In order to have the value computed
• for f equal that computed for g, we must have

  (1 − |x|) · |f_{x=0}| + |x| · |f_{x=1}| = (1 − |x|) · |g_{x=0}| + |x| · |g_{x=1}|

(where we denote by |x| the value assigned to x and by |f| the value computed for f). If |f_{x=0}| and |g_{x=0}| differ, we can write

  |x| · (|f_{x=1}| − |f_{x=0}| − |g_{x=1}| + |g_{x=0}|) = |g_{x=0}| − |f_{x=0}|

which has at most one solution for |x|, since the right-hand side is nonzero. Thus at least |S| − 1 assignments for x maintain the difference in values between f and g, given a difference in values between |f_{x=0}| and |g_{x=0}|; because the latter can be obtained by at least (|S| − 1)^n assignments, at least (|S| − 1)^{n+1} assignments lead to different values whenever f and g differ, which is the desired result!
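A hedged Python sketch of this test (the representation and names are mine: trees are nested tuples, either ('leaf', True/False) or (variable, false-child, true-child), and each variable is assumed to appear at most once on any root-to-leaf path):

  import random

  def fingerprint(tree, assignment, p):
      # Sum, over all "true" leaves, of the product of the values of the
      # variables along the path, modulo p.  A variable with value v
      # contributes v on its "true" branch and 1 - v on its "false" branch.
      if tree[0] == 'leaf':
          return 1 if tree[1] else 0
      var, lo, hi = tree
      v = assignment[var]
      return ((1 - v) * fingerprint(lo, assignment, p)
              + v * fingerprint(hi, assignment, p)) % p

  def probably_same_function(t1, t2, variables, p):
      # One-sided: never wrong when answering "different"; when answering
      # "same" it errs with probability < 1/2 per run.
      n = len(variables)
      assignment = {x: random.randrange(2 * n) for x in variables}
      return fingerprint(t1, assignment, p) == fingerprint(t2, assignment, p)

  # two different trees for the function x AND y; p = 5 >= |S| = 4
  t1 = ('x', ('leaf', False), ('y', ('leaf', False), ('leaf', True)))
  t2 = ('y', ('leaf', False), ('x', ('leaf', False), ('leaf', True)))
  print(probably_same_function(t1, t2, ['x', 'y'], p=5))   # True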
Las Vegas Algorithms
• In computing, a Las Vegas algorithm is a randomized algorithm that never gives incorrect results; that is, it either produces the correct result or reports failure.
• Because of its nondeterministic nature, the run-time of a Las Vegas algorithm is a random variable. It runs in polynomial time on average if, assuming that all instances of size n are equally likely and that the running time on instance x is f(x), the expression Σ_x 2^(−n) f(x), where the sum is taken over all x of size n, is bounded by a polynomial in n.
• Las Vegas algorithms are used in approaches to some NP-complete problems, in genetic algorithms, evolution strategies, ant colony optimization, etc. Another application is cryptography, e.g., the generation of very long prime numbers.
• Example: the randomized quicksort discussed before, where the pivot is chosen randomly but at the end we always get sorted data. A toy example follows below.
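A toy Las Vegas algorithm (an illustrative example of mine, not from the slides): find the position of an 'a' in an array half of whose cells hold 'a'. The answer, when returned, is always correct; only the number of probes is random, with expectation 2.

  import random

  def find_an_a(xs):
      while True:
          i = random.randrange(len(xs))
          if xs[i] == 'a':
              return i   # verified before returning, hence always correct

  print(find_an_a(['a', 'b'] * 8))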
Comparison
• Las Vegas algorithms and Monte Carlo algorithms are both randomized algorithms.
• Both are considered here as one-sided decision algorithms (an instance is either a "yes" or a "no" instance).
• A random MC algorithm runs in polynomial time but may err with probability less than some constant (say ½).
• Unlike MC methods, a Las Vegas algorithm never returns a wrong answer but may not run in polynomial time on all instances. It does not gamble with the truth of the result; it gambles only with the resources used for the computation.
• For a one-sided MC method, given a "no" instance, all leaves of the computation tree are "no" leaves, while for a "yes" instance, at least half of the leaves of the computation tree are "yes" leaves.
• For Las Vegas, given a "no" instance, the computation tree has only "no" leaves, whereas given a "yes" instance, it has at least one "yes" leaf.
• We attempt to solve a problem in NP by using a randomized method, i.e., by producing and verifying a random certificate.
• If the answer returned by the algorithm is "yes", then the probability of error is 0; if the answer is "no", however, the probability of error can be large.
• In particular, there are 2^|x| possible certificates, and possibly only one of them results in acceptance, so that the probability of error is bounded by (1 − 2^(−|x|)) times the probability that instance x is a "yes" instance.
• Since this bound depends on the input size, we cannot achieve a fixed probability of error by using a fixed number of trials.
• Generally speaking, we can conclude that a nondeterministic algorithm is a generalization of a Monte Carlo algorithm (both are one-sided), with the latter itself a generalization of a Las Vegas algorithm.
• We have a model of computation called the random Turing machine (RTM), which is similar to a nondeterministic machine in that it has a choice of moves at every step. However, unlike its nondeterministic counterpart, it makes each decision by tossing a fair coin.
• An RTM operates in polynomial time if the height of its computation tree is bounded by a polynomial function of the instance size.
• If the computation is cut off after a polynomial number of moves, then the machine may be prevented from reaching a conclusion.
• Leaves of a polynomially bounded computation tree are therefore marked with one of "yes", "no", or "don't know".
• Definition 8.15 (P339)
PP is the class of all decision problems Π for which there exists a polynomial-time random TM such that, for any instance x of Π:
- if x is a "yes" instance, then the machine accepts x with probability larger than 1/2;
- if x is a "no" instance, then the machine rejects x with probability larger than 1/2.
• BPP is the class of all decision problems Π for which there exists a polynomial-time random Turing machine (PTRTM) and a positive constant ε ≤ 1/2 such that, for any instance x of Π:
-- if x is a "yes" instance, then the machine accepts x with probability no less than 1/2 + ε;
-- if x is a "no" instance, then the machine rejects x with probability no less than 1/2 + ε.
("B" indicates that the probability is bounded away from 1/2.)
• RP is the class of all decision problems Π for which there exists a polynomial-time random Turing machine (PTRTM) and a positive constant ε ≤ 1 such that, for any instance x of Π:
-- if x is a "yes" instance, then the machine accepts x with probability no less than ε;
-- if x is a "no" instance, then the machine always rejects x.
• RP is a one-sided class, whose complementary class is denoted coRP.
• The class RP ∪ coRP represents problems for which one-sided Monte Carlo algorithms exist, whereas RP ∩ coRP corresponds to problems for which Las Vegas algorithms exist.
• Lemma 8.1 (P339)
A problem Π belongs to RP ∩ coRP iff there exists a PTRTM and a positive constant ε ≤ 1 such that:
-- the machine accepts or rejects an arbitrary instance with probability no less than ε;
-- the machine accepts only "yes" instances and rejects only "no" instances.
• This characterization is quite similar to the definition of NP ∩ coNP: the only change needed is to make ε dependent upon the instance rather than only upon the problem.
• We can conclude that RP ∩ coRP is a subset of NP ∩ coNP, RP is a subset of NP, coRP is a subset of coNP, and BPP is a subset of PP.
• Furthermore, since all computation trees are limited to polynomial height, it is obvious that all of these classes are contained within PSPACE.
• Finally, since no computation tree is required to have all of its leaves labeled "yes" for a "yes" instance or all labeled "no" for a "no" instance, P is contained within all of these classes.
• Continuing to examine the relationships among these classes, we find that the ε value given in the definition of RP could as easily have been specified to be larger than ½.
• Given a machine M with some ε no larger than ½, we can construct a machine M' with an ε larger than ½ by making M' iterate M for a series of trials. (This is the main feature of MC algorithms: the probability of error can be decreased to any fixed value with a fixed number of trials.) Therefore the definitions of RP and coRP are just strengthened (one-sided) versions of that of BPP, so that both RP and coRP are within BPP.
• Theorem 8.27 (P340)
NP (and also coNP) is a subset of PP.
Proof. We can use a random TM to approximate the nondeterministic machine for an NP problem. Comparing the definitions of NP and PP, we find that the only thing needed is to show how to take the nondeterministic machine M for our problem and turn it into a suitable random machine M'. M accepts a "yes" instance with probability larger than 0, but that probability can be smaller than any fixed constant. We need to make it larger than ½, and this can be done by tossing a coin before starting any computation and accepting the instance a priori if the toss produces heads. This introduces an a priori probability of acceptance, call it Pa, of ½; with p() the polynomial bounding the height of the computation tree, the probability of acceptance of a "yes" instance x is now at least ½ + 2^(−p(|x|)−1), but the probability of rejection of a "no" instance, which was exactly 1 without the coin toss, is now only 1 − Pa = 1/2, not larger than 1/2 as required.
• The solution is quite straightforward: it is enough to make Pa less than ½ while keeping it large enough that Pa + 2^(−p(|x|)) > ½. Tossing an additional p(|x|) coins suffices: M' accepts a priori exactly when the first toss returns heads and the next p(|x|) tosses do not all return tails, so that Pa = ½ · (1 − 2^(−p(|x|))) = ½ − 2^(−p(|x|)−1). Hence a "yes" instance is accepted with probability at least Pa + 2^(−p(|x|)) = ½ + 2^(−p(|x|)−1), and a "no" instance is rejected with probability 1 − Pa = ½ + 2^(−p(|x|)−1). Because M' runs in polynomial time iff M does, our conclusion follows.
Q.E.D
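As a quick sanity check with concrete numbers (mine, not the text's): take p(|x|) = 3. Then Pa = ½ − 2^(−4) = 0.4375, a "yes" instance is accepted with probability at least 0.4375 + 2^(−3) = 0.5625 > ½, and a "no" instance is rejected with probability 1 − 0.4375 = 0.5625 > ½, exactly as the PP definition demands.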
The hierarchy of randomized complexity classes
[Figure: the resulting hierarchy of randomized classes and its relation to P, NP, and PSPACE, drawn as nested regions. Writing R for RP, it shows P ⊆ R ∩ co-R ⊆ NP ∩ co-NP; R ⊆ NP and co-R ⊆ co-NP; and NP, co-NP, and BPP all contained in PP, itself contained in PSPACE.]
• There is one more complexity class, corresponding to the Las Vegas algorithms (algorithms that always return the correct answer but have a random execution time with a polynomial expectation). The class of decision problems solvable by this type of algorithm is denoted ZPP: "Z" stands for zero error probability. It is none other than RP ∩ coRP.
• Theorem 8.28 (P342)
ZPP equals RP ∩ coRP.
Proof. We prove containment in each direction.
(ZPP ⊆ RP ∩ coRP) Given a machine M for a problem in ZPP, we construct a machine M' that meets the conditions for RP ∩ coRP by simply cutting off the execution of M after a polynomial amount of time. This may prevent M from returning a result, so the resulting machine M', while running in polynomial time and never returning a wrong answer, has a small probability of not returning any answer. It remains only to show that this probability is bounded above by some constant ε < 1. Let q() be the polynomial bound on the expected running time of M. Define M' by stopping M on all paths exceeding some polynomial bound r(), where the polynomials r() and r'() are chosen such that r(n) + r'(n) = q(n) and such that r() provides the desired ε.
• Without loss of generality, we may assume that all computation paths that lead to a leaf within the bound r() do so in exactly r(n) steps. Let px denote the probability that M' does not give an answer. On an instance of size n, the expected running time of M is given by (1 − px) · r(n) + px · tmax(n), where tmax(n) is the average number of steps on the paths that need more than polynomial time. By hypothesis, this expectation is bounded by q(n) = r(n) + r'(n). Solving for px, we get

  px ≤ r'(n) / (tmax(n) − r(n))

This quantity is always smaller than 1, since the denominator is superpolynomial by assumption. Because we can pick r() and r'(), we can make px smaller than any desired constant ε < 1.
(RP ∩ coRP ⊆ ZPP) Given a machine M for a problem in RP ∩ coRP, we construct a machine M' that meets the conditions for ZPP. Let 1/k (k > 1) be the bound on the probability that M does not return an answer, let r() be the polynomial bound on the running time of M, and let k^(q(n)) be a bound on the time required to solve an instance of size n deterministically. On an instance of size n, M' simply runs M for up to q(n) trials. As soon as M returns an answer, M' returns the same answer and stops; on the other hand, if none of the q(n)
• successive runs of M returns an answer, then M' solves the instance deterministically. Since the probability that M does not return any answer in q(n) trials is k^(−q(n)), the expected running time of M' is bounded by (1 − k^(−q(n))) · r(n) + k^(−q(n)) · k^(q(n)) = 1 + (1 − k^(−q(n))) · r(n). Hence the expected running time of M' is bounded by a polynomial in n.
Q.E.D
Because all known randomized algorithms are MC algorithms, Las Vegas algorithms, or ZPP algorithms, the problems that we are able to tackle with randomized algorithms appear to be confined to a subset of RP ∪ coRP. Furthermore, since the membership of an NP-complete problem in RP would imply NP = RP, an outcome considered unlikely, this subset of RP ∪ coRP presumably does not include any NP-complete or coNP-complete problem. Therefore, in its current state of development, randomization is far from being a panacea for hard problems!
As for the other two classes of randomized complexity: membership in BPP indicates the existence of randomized algorithms that run in polynomial time with an arbitrarily small, fixed probability of error.
• Theorem 8.29 (P343)
Let Π be a problem in BPP. Then, for any δ > 0, there exists a polynomial-time randomized algorithm that accepts "yes" instances and rejects "no" instances of Π with probability at least 1 − δ.
Proof. Since Π is in BPP, it has a polynomial-time randomized algorithm A that accepts "yes" instances and rejects "no" instances of Π with probability at least ½ + ε, for some constant ε > 0. Consider the following new algorithm, where k is an odd integer to be defined shortly:

  yes_count := 0;
  for i := 1 to k do
      if A(x) accepts
          then yes_count := yes_count + 1;
  if yes_count > k div 2
      then accept
      else reject
• If x is a "yes" instance of Π, then A(x) accepts with probability at least ½ + ε; thus, for j not exceeding k/2, the probability of observing exactly j acceptances (and thus k − j rejections) in k runs of A(x) is at most

  C(k, j) · (1/2 + ε)^j · (1/2 − ε)^(k−j)

We can derive a simplified bound on this value when j does not exceed k/2 by equalizing the two exponents at k/2:

  C(k, j) · (1/2 + ε)^j · (1/2 − ε)^(k−j) ≤ C(k, j) · (1/4 − ε²)^(k/2)

Summing these bounds over all values of j not exceeding k/2, we get the probability that our new algorithm rejects a "yes" instance:

  Σ_{j=0..k/2} C(k, j) · (1/2 + ε)^j · (1/2 − ε)^(k−j) ≤ (1/4 − ε²)^(k/2) · Σ_{j=0..k/2} C(k, j) ≤ (1 − 4ε²)^(k/2)

(the last step uses Σ_{j=0..k} C(k, j) = 2^k). Now we choose k so as to ensure (1 − 4ε²)^(k/2) ≤ δ, which gives k ≥ 2 log δ / log(1 − 4ε²), so that k is a constant depending only on the input constants ε and δ.
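A small Python sketch of this final bound (the function name and the sample numbers are mine):

  import math

  def choose_k(eps, delta):
      # smallest odd k with (1 - 4*eps**2)**(k/2) <= delta
      k = math.ceil(2 * math.log(delta) / math.log(1 - 4 * eps**2))
      return k if k % 2 == 1 else k + 1

  # with a 60/40 advantage (eps = 0.1), about 227 majority-vote runs
  # push the error probability below 1%
  print(choose_k(0.1, 0.01))   # -> 227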
• Theorem 8.30 (P344)
BPP is a subset of Σ₂^p ∩ Π₂^p (where these two classes are the nondeterministic and co-nondeterministic classes at the second level of the polynomial hierarchy discussed in Section 7.3.2).
If NP is not equal to coNP, then neither NP nor coNP is closed under complementation, whereas BPP of course is; thus, under the standard conjecture, BPP equals neither NP nor coNP. A result that we shall not prove states that giving a machine for the class BPP an oracle that solves any problem in BPP does not increase the power of the machine; in our notation, BPP^BPP equals BPP. By comparison, the same result holds trivially for the class P, while it does not appear to hold for NP, since NP^NP appears to be a proper superset of NP. An immediate consequence of this result and of Theorem 8.30 is that, if we had NP ⊆ BPP, then the entire polynomial hierarchy would collapse into BPP, something that would be most surprising. Hence BPP does not appear to contain any NP-complete problem, so that the
scope of randomized algorithms is indeed fairly restricted.
What, then, about the largest class, PP? Membership in PP is not likely to be of much help, as the probabilistic guarantee on the error bound is so poor: the amount by which the probability exceeds the bound of ½ may depend on the instance size n, and reducing the probability of error to a small fixed value for such a problem may require an exponential number of trials. PP is quite closely related to #P, the class of enumeration problems corresponding to decision problems in NP. A complete problem (under Turing reductions) for #P is "How many satisfying truth assignments exist for a given 3SAT instance?"; the similar problem "Do more than half of the possible truth assignments satisfy a given 3SAT instance?" is complete for PP (Exercise 8.36). Thus PP contains decision versions of the problems in #P: versions that do not ask for the number of certificates but ask whether the number of certificates meets a certain bound. As a result, an oracle for PP is as good as an oracle for #P; that is, P^PP equals P^#P.
Conclusion:
Randomized algorithms have the potential for providing efficient and elegant solutions for many problems, as long as said problems are not too hard. Whether or not a randomized algorithm indeed makes a difference remains unknown; the hierarchy of classes described earlier is not firm, since it is based only on the usual conjecture that all containments are proper (strict).
Randomized algorithms depend on the random bits they use. However, in fact these bits are not truly random, since they are generated by a pseudorandom number generator. In reality, the randomized algorithms that we actually run are completely deterministic for a fixed choice of seed.
Review
Classes of randomized (probabilistic) algorithms
• 1. Numerical probabilistic algorithms --- give an
approximation to the correct answer.
• 2. Monte Carlo Algorithms --- always give an answer,
but there is a probability of being completely wrong.
• 3. Las Vegas Algorithms --- sometimes fail to give an
answer, but if an answer is given, it is correct.
References
• B. M. Moret, The Theory of Computation, Chapter 8.4: The Power of Randomization, Addison-Wesley, Reading, Massachusetts, 1998, pp. 335-345.
• Wikipedia.
• G. Brassard and P. Bratley, "Randomized Algorithms," in Fundamentals of Algorithmics, Prentice Hall, 1996.
• http://www.datastructures.info/the-las-vegasalgorithmmethod/
THANKS