
ECS 120 Notes
David Doty
based on Introduction to the Theory of Computation by Michael Sipser
© June 7, 2017, David Doty

Copyright. No part of this document may be reproduced without the expressed written consent of the author. All rights reserved.
Based heavily (some parts copied) on Introduction to the Theory of Computation, by
Michael Sipser. These are notes intended to assist in lecturing from Sipser’s book; they are
not entirely the original work of the author. Some proofs are also taken from Automata and
Computability by Dexter Kozen.
Contents

Introduction
   0.1 Automata, computability, and complexity
   0.2 Mathematical Background
       0.2.1 Sequences and Tuples
   0.3 Proof by Induction
       0.3.1 Proof by Induction on Natural Numbers
       0.3.2 Induction on Other Structures

1 Regular Languages
   1.1 Finite Automata
       1.1.1 Formal Definition of a Finite Automaton (Syntax)
       1.1.2 More Examples
       1.1.3 Formal Definition of Computation by a DFA (Semantics)
       1.1.4 The Regular Operations
   1.3 Regular Expressions
       1.3.1 Formal Definition of a Regular Expression
   1.2 Nondeterminism
       1.2.1 Formal Definition of an NFA (Syntax)
       1.2.2 Formal Definition of Computation by an NFA (Semantics)
       1.2.3 More examples
       1.2.4 Equivalence of DFAs and NFAs
   1.3 Regular Expressions (again)
       1.3.1 Equivalence with Finite Automata
   1.4 Nonregular Languages
       1.4.1 The Pumping Lemma for Regular Languages

2 Context-Free Languages
   2.1 Context-Free Grammars
   2.2 Pushdown Automata

3 The Church-Turing Thesis
   3.1 Turing Machines
       3.1.1 Formal Definition of a Turing machine
       3.1.2 Formal Definition of Computation by a Turing Machine
   3.2 Variants of Turing Machines
       3.2.1 Multitape Turing Machines

7 Time Complexity
   7.1 Measuring Complexity
       7.1.1 Analyzing Algorithms
       7.1.2 Complexity Relationships Among Models
   7.2 The Class P
       7.2.1 Polynomial Time
       7.2.2 Examples of Problems in P
   7.3 The Class NP
       7.3.1 (More) Examples of Problems in NP
   3.2 Variants of Turing Machines
       3.2.1 Nondeterministic Turing Machines
   7.3 The class NP
       7.3.1 Nondeterministic polynomial-time Turing machines
       7.3.2 The P Versus NP Question
   7.4 NP-Completeness
       7.4.1 Polynomial-Time Reducibility
       7.4.2 Definition of NP-Completeness
       7.4.3 The Cook-Levin Theorem
   7.5 Additional NP-Complete Problems
       7.5.1 The Vertex Cover Problem
       7.5.2 The Subset Sum Problem

4 Decidability
   4.1 Decidable Languages
   4.2 The Halting Problem

5 Reducibility
   5.1 Undecidable Problems from "Language Theory" (Computability)
       5.1.1 A Non-c.e. Language

4 Decidability
   4.2 The Halting Problem
       4.2.2 Diagonalization
       4.2.3 The Halting Problem is Undecidable
Introduction
What This Course is About
What are the fundamental capabilities and limitations of computers?
The following is a rough sketch of what to expect from this course:
• In ECS 30/40/60, you programmed computers without studying them formally.
• In ECS 20, you formally studied things that are not computers.
• In ECS 120, you will use the tools of ECS 20 to formally study computers.
Newton’s equations of motion tell us that each body of mass obeys certain rules that cannot
be broken. For instance, a body cannot accelerate in the opposite direction in which force is being
applied to it. Of course, nothing in the world is a rigid body of mass subject to no friction, etc., so
Newton’s equations of motion do not exactly predict anything. But they are a useful abstraction
of what real matter is like, and many things in the world are close enough to this abstraction that
Newton’s predictions are reasonably accurate.
The fundamental premise of the theory of computation is that the computer on your desk obeys
certain laws, and therefore, certain unbreakable limitations. I often think of the field of computer
science outside of theory as being about proving what can be done with a computer, by doing
it. Much of research in theoretical computer science is about proving what cannot be done with
a computer. This can be more difficult, since you cannot simply cite your failure to invent an
algorithm for a problem to be a proof that there is no algorithm. But certain important problems
cannot be solved with any algorithm, as we will see.
We will draw no distinction between the idea of “formal proof” and more nebulous instructions
such as “show your work”/“justify your answer”/“explain”. A “proof” of a claim is an argument
that convinces an intelligent person who has never seen the claim before and cannot see why it
is true without having it explained. It does not matter if the argument uses formal notation or not
(though formal notation is convenient for achieving brevity), or if it uses induction or contradiction
or just a straightforward argument (though it is often easier to think in terms of induction or
contradiction). What matters is that there are no holes or counter-arguments that can be thrown
at the argument, and that every statement is precise and unambiguous.
Note, however, that one effective technique used by Sipser to prove theorems is to first give
a “proof idea”, helping you to see how the proof will go. The proof is easier to read because of
the proof idea, but the proof idea by itself is not a proof. In fact, I would go so far as to say
that the proof by itself is not a very effective proof either, since bare naked details and formalism,
without any intuition to reinforce it, do not communicate why the theorem is true any better than
the hand-waving proof idea. Both are usually necessary to accomplish the goal of the proof: to
communicate why the theorem is true. In this course, in the interest of time, I will often give the
intuitive proof idea only verbally, and write only the formal details on the board, since learning to
turn the informal intuition into a formal detailed proof is the most difficult part of this course. On
your homework, however, you should explicitly write both, to make it easy for the TA to understand
your proof and give you full credit.
0.1 Automata, computability, and complexity
Multivariate polynomials

x^2 − y^2 = 2. Does this have a solution? Yes: x = √2, y = 0. What about an integer solution? No.

x^2 + 2xy − y^3 = 13. Does this have an integer solution? Yes: x = 3, y = 2.

Equations involving multivariate polynomials for which we ask for an integer solution are called Diophantine equations.

Task A: write an algorithm that indicates whether a given multivariable polynomial equation has any real solution.

Fact: Task A is possible.

Task A′: write an algorithm that indicates whether a given multivariable polynomial equation has any integer solution.

Fact: Task A′ is impossible.[1]
Paths touring a graph
Does the graph G = (V, E) given in Figure 1 have a path that contains each edge exactly once?
(Eulerian path)
Task B: write an "efficient" algorithm that indicates if a graph G has a path visiting each edge exactly once.

Algorithm: Check whether G is connected (by breadth-first search) and at most two nodes have odd degree. A sketch of this check appears below.
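This condition can be checked in time linear in the size of the graph. The following is a rough C++ sketch of the check; the adjacency-list representation, the function name, and the hard-coded Königsberg graph are my own illustrative choices, not part of the notes.

// Test for a path using every edge exactly once (Eulerian path): the graph
// must be connected (ignoring isolated nodes) and have at most two nodes of
// odd degree.
#include <iostream>
#include <queue>
#include <vector>
using namespace std;

bool has_eulerian_path(int n, const vector<vector<int>>& adj) {
    // Count nodes of odd degree.
    int odd = 0;
    for (int v = 0; v < n; v++)
        if (adj[v].size() % 2 == 1) odd++;
    if (odd > 2) return false;

    // Breadth-first search from some node with an edge; "connected" here
    // means every node that touches an edge is reached.
    int start = -1;
    for (int v = 0; v < n; v++)
        if (!adj[v].empty()) { start = v; break; }
    if (start == -1) return true;  // no edges at all: trivially yes
    vector<bool> seen(n, false);
    queue<int> q;
    seen[start] = true;
    q.push(start);
    while (!q.empty()) {
        int v = q.front(); q.pop();
        for (int u : adj[v])
            if (!seen[u]) { seen[u] = true; q.push(u); }
    }
    for (int v = 0; v < n; v++)
        if (!adj[v].empty() && !seen[v]) return false;
    return true;
}

int main() {
    // The Königsberg graph of Figure 1 (as a multigraph): degrees 5, 3, 3, 3,
    // so all four nodes have odd degree and no Eulerian path exists.
    vector<vector<int>> adj = {{1, 1, 2, 2, 3}, {0, 0, 3}, {0, 0, 3}, {0, 1, 2}};
    cout << (has_eulerian_path(4, adj) ? "yes" : "no") << endl;  // prints "no"
}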
Task B′: write an "efficient" algorithm that indicates if a graph G has a path visiting each node exactly once.

Fact: Task B′ is impossible (assuming P ≠ NP).
Counting
A regular expression (regex) is an expression that matches some strings and not others. For example
(0(0 ∪ 1 ∪ 2)∗ ) ∪ ((0 ∪ 2)∗ 1)
matches any string of digits that either starts with a 0, followed by any number of 0's, 1's, and 2's, or ends with a 1, preceded by any number of 0's and 2's.

[1] We could imagine trying "all" possible integer solutions, but if there is no integer solution, then we will be trying forever and the algorithm will not halt.

Figure 1: "Königsberg graph". Licensed under CC BY-SA 3.0 via Commons – https://upload.wikimedia.org/wikipedia/commons/5/5d/Konigsberg_bridges.png
If x is a string and a is a symbol, let #a(x) be the number of a's in x. For example, #0(001200121) = 4 and #1(001200121) = 3.

Task C: write a regex that matches a ternary string x exactly when #0(x) + #1(x) = 3.

Answer: 2∗(0 ∪ 1)2∗(0 ∪ 1)2∗(0 ∪ 1)2∗
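In the regex syntax of most programming languages, ∪ becomes | (or a character class) and concatenation is implicit, so the answer above can be checked directly. A quick C++ sketch using std::regex; the test strings are my own.

// The Task C answer 2*(0 ∪ 1)2*(0 ∪ 1)2*(0 ∪ 1)2* written in ECMAScript
// syntax. regex_match requires the entire string to match, mirroring the
// course's notion of a regex "matching" a string.
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    regex task_c("2*[01]2*[01]2*[01]2*");
    // "010" and "221021" have #0(x) + #1(x) = 3; the other two do not.
    for (string x : {"010", "221021", "22222", "0100"})
        cout << x << (regex_match(x, task_c) ? " matches" : " does not match") << endl;
}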
Task C′: write a regex that matches a ternary string x exactly when #0(x) − #1(x) = 3.

Fact: Task C′ is impossible.
Rough layout of this course
The following are the units of this course:
Computability theory (unit 2): What problems can algorithms solve? (finding real roots of
multivariate polynomials, but not integer roots)
Complexity theory (unit 3): What problems can algorithms solve efficiently? (finding paths
visiting every edge, but not every node)
Automata theory (unit 1): What problems can algorithms solve with “optimal” efficiency (constant space and “real time”, i.e., time = size of input)? (finding whether the sum of # of 0’s
and # of 1’s equals some constant, but not the difference)
Sorting these by increasing order of the power of the algorithms studied: 1, 3, 2.
Historically, these were discovered in the order 2, 1, 3.
It is most common in a theory course to cover them in order 1, 2, 3.
We are going to cover them in order 1, first part of 2, 3, rest of 2.
The reasoning is this: Most of computer science is about writing programs to solve problems.
You think about a problem, you write a program, and you demonstrate that the program solves
the problem.
Now, in computability theory, most of the problems studied are questions about the behavior
of programs themselves.2 But what sort of objects might solve these problems? Other programs!
So we imagine writing a program P that takes as input the source code of another program Q.
Worse yet, the typical case involves proving that a certain problem is not solvable by any program.
Usually this is done by writing a program P, which takes as input another program Q, and P outputs
a third program R.[3] So in the interest of showing no program exists to solve a certain problem,
we introduce not one but three new programs, all of which are not solving that problem. It’s quite
remarkable that anyone was able to create such a chain of reasoning in the first place to prove limits
on the ability of programs, but at the same time, it adds many conceptual layers of abstraction
onto what most computer science students are used to doing.
Complexity theory, on the other hand, has the virtue that the most typically studied problems
are about more familiar objects such as lists of integers, graphs, and Boolean formulas. In complexity theory, we think about programs that take these objects as input and produce them as
output, and there is no danger of getting confused about which program is supposed to solve which
problem, because there’s just one program.
0.2 Mathematical Background

Reading assignment: Chapter 0 of Sipser.[4]

Implication Statements

Given two boolean statements p and q,[5] the implication p =⇒ q is shorthand for "p implies q", or "If p is true, then q is true";[6] p is the hypothesis, and q is the conclusion. The following statements are related to p =⇒ q:

• the inverse: ¬p =⇒ ¬q
• the converse: q =⇒ p
• the contrapositive: ¬q =⇒ ¬p[7]

[2] The Diophantine equation problem above is a notable exception, but it took 70 years to prove it is unsolvable, and even then it was done by creating a Diophantine equation that essentially "mimics" the behavior of a program so as to connect the existence of its roots to a question about the behavior of certain programs.
[3] Now, this sort of idea, of programs receiving and producing other programs, is not crazy in principle. The C compiler, for example, is itself a program, which takes as input the source code of a program (written in C) and outputs the code of another program (machine instructions, written in the "language" of the host machine's instruction set). The C preprocessor takes as input C programs and produces C programs as output.
[4] This is largely material from ECS 20.
[5] e.g., "Hawaii is west of California", or "The stoplight is green."
[6] e.g., "If the stoplight is green, then my car can go."
If an implication statement p =⇒ q and its converse q =⇒ p are both true, then we say p if and
only if (iff) q, written p ⇐⇒ q. Proving a “p ⇐⇒ q” theorem usually involves proving p =⇒ q
and q =⇒ p separately.
Sets

A set is a group of objects, called elements, with no duplicates.[8] The cardinality of a set A is the number of elements it contains, written |A|. For example, {7, 21, 57} is the set consisting of the integers 7, 21, and 57, with cardinality 3.

For two sets A and B, we write A ⊆ B, and say that A is a subset of B, if every element of A is also an element of B. A is a proper subset of B, written A ⊊ B, if A ⊆ B and A ≠ B.

We use the following sets throughout the course:

• the natural numbers N = {0, 1, 2, . . .}
• the integers Z = {. . . , −2, −1, 0, 1, 2, . . .}
• the rational numbers Q = { p/q | p ∈ Z, q ∈ N, q ≠ 0 }
• the real numbers R

The unique set with no elements is called the empty set, written ∅.

To define sets symbolically,[9] we use set-builder notation: for instance, { x ∈ N | x is odd } is the set of all odd natural numbers.

We write ∀x ∈ A as a shorthand for "for all elements x in the set A ...", and ∃x ∈ A as a shorthand for "there exists an element x in the set A ...". For example, (∃n ∈ N) n > 10 means "there exists a natural number greater than 10".

Given two sets A and B, A ∪ B = { x | x ∈ A or x ∈ B } is the union of A and B, A ∩ B = { x | x ∈ A and x ∈ B } is the intersection of A and B, and A \ B = { x ∈ A | x ∉ B } is the difference between A and B (also written A − B). Ā = { x | x ∉ A } is the complement of A.[10]

[7] The contrapositive of a statement is logically equivalent to the statement itself. For example, it is equivalent to state "If someone is allowed to drink alcohol, then they are at least 21" and "If someone is under 21, then they are not allowed to drink alcohol". Hence a statement's converse and inverse are logically equivalent to each other, though not equivalent to the statement itself.
[8] Think of std::set.
[9] In other words, to express them without listing all of their elements explicitly, which is convenient for large finite sets and necessary for infinite sets.
[10] Usually, if A is understood to be a subset of some larger set U, the "universe" of possible elements, then Ā is understood to be U \ A. For example, if we are dealing only with N, and A ⊆ N, then Ā = { n ∈ N | n ∉ A }. In other words, we use "typed" sets, in which case each set we use has some unique superset – such as {0, 1}∗, N, R, Q, the set of all finite automata, etc. – that is considered to contain all the elements of the same type as the elements of the set we are discussing. Otherwise, we would have the awkward situation that for A ⊆ N, Ā would contain not only nonnegative integers that are not in A, but also negative integers, real numbers, strings, functions, stuffed animals, and other objects that are not elements of A.
Given a set A, P(A) = { S | S ⊆ A } is the power set of A, the set of all subsets of A. For example,

P({2, 3, 5}) = {∅, {2}, {3}, {5}, {2, 3}, {2, 5}, {3, 5}, {2, 3, 5}}.

Given any set A, it always holds that ∅, A ∈ P(A),[11] and that |P(A)| = 2^|A| if |A| < ∞.[12]
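One way to see why |P(A)| = 2^|A| for finite A: each subset corresponds to an independent in-or-out choice for every element, i.e., to one setting of |A| bits. The following short C++ sketch (mine, not from the notes) enumerates P({2, 3, 5}) this way.

// Enumerate P(A) for a finite set A by letting the bits of a counter decide,
// for each element, whether it belongs to the subset. There are 2^|A|
// settings of the counter, hence 2^|A| subsets.
#include <iostream>
#include <vector>
using namespace std;

int main() {
    vector<int> A = {2, 3, 5};
    int n = A.size();
    for (int mask = 0; mask < (1 << n); mask++) {  // 2^n subsets
        cout << "{ ";
        for (int i = 0; i < n; i++)
            if (mask & (1 << i)) cout << A[i] << " ";
        cout << "}" << endl;
    }
}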
0.2.1 Sequences and Tuples

A sequence is an ordered list of objects.[13] For example, (7, 21, 57, 21) is the sequence of integers 7, then 21, then 57, then 21.

A tuple is a finite sequence.[14] (7, 21, 57) is a 3-tuple. A 2-tuple is called a pair.

For two sets A and B, the cross product of A and B is A × B = { (a, b) | a ∈ A and b ∈ B }.

For k ∈ N, we write A^k = A × A × · · · × A (k times), and A^≤k = ⋃_{i=0}^{k} A^i.

For example, N^2 = N × N is the set of all ordered pairs of natural numbers.
Functions and Relations
A function f that takes an input from set D (the domain) and produces an output in set R (the
range) is written f : D → R. 15 Given A ⊆ D, define f (A) = { f (x) | x ∈ A }; call this the image
of A under f .
Given f : D → D, k ∈ N and d ∈ D, define f^k : D → D by f^k(d) = f(f(· · · f(d) · · · )) (f applied k times), i.e., f composed with itself k times.
If f might not be defined for some values in the domain, we say f is a partial function.16 If f
is defined on all values, it is a total function.17
[11] Why?
[12] Actually, Cantor's theory of infinite set cardinalities makes sense of the claim that |P(A)| = 2^|A| even if A is an infinite set. The furthest we will study this theory in this course is to observe that there are at least two infinite set cardinalities: that of the set of natural numbers, and that of the set of real numbers, which is bigger than the set of natural numbers according to this theory.
[13] Think of std::vector.
[14] The closest Java analogy to a tuple, as we will use them in this course, is an object. Each member variable of an object is like an element of the tuple, although Java is different in that each member variable of an object has a name, whereas the only way to distinguish one element of a tuple from another is their position. But when we use tuples, for instance to define a finite automaton as a 5-tuple, we intuitively think of the 5 elements as being like 5 member variables that would be used to define a finite automaton object. And of course, the natural way to implement such an object in C++ is by defining a FiniteAutomaton class with 5 member variables, which is an easier way to keep track of what each of the 5 elements is supposed to represent than, for instance, using a void[] array of length 5.
[15] Think of a static method; Integer.parseInt, which takes a String and returns the int that the String represents (if it indeed represents an integer), is like a function with domain String and range int. Math.max is like a function with domain int × int (since it accepts a pair of ints as input) and range int.
[16] For instance, Integer.parseInt is (strictly) partial, because not all Strings look like integers, and such Strings will cause the method to throw a NumberFormatException.
[17] Every total function is a partial function, but the converse does not hold for any function that is undefined for at least one value. We will usually assume that functions are total unless explicitly stated otherwise.
A function f with a finite domain can be represented with a table. For example, the function f : {0, 1, 2, 3} → Q defined by f(n) = n/2 is represented by the table

   n | f(n)
   0 | 0
   1 | 1/2
   2 | 1
   3 | 3/2
If

(∀d1, d2 ∈ D) d1 ≠ d2 =⇒ f(d1) ≠ f(d2),

then we say f is 1-1 (one-to-one or injective).[18]
If
(∀r ∈ R)(∃d ∈ D) f (d) = r,
then we say f is onto (surjective). Intuitively, f “covers” the range R, in the sense that no element
of R is left un-mapped-to by f .
f is a bijection (a.k.a. a 1-1 correspondence) if f is both 1-1 and onto.
A predicate is a function whose output is boolean.
Given a set A, a relation R on A is a subset of A × A. Intuitively, the elements in R are the
ones related to each other. Relations are often written with an operator; for instance, the relation
≤ on N is the set R = { (n, m) ∈ N × N | (∃k ∈ N) n + k = m }.
Graphs
See the textbook for review.
Boolean Logic
See the textbook for review.
LECTURE: end of day 1
Strings and Languages
An alphabet is any non-empty finite set, whose elements we call symbols or characters. For example,
{0, 1} is the binary alphabet, and
{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z}
is the Roman alphabet.[19]

[18] Intuitively, f does not map any two points in D to the same point in R. It does not lose information; knowing an output r ∈ R suffices to identify the input d ∈ D that produced it (through f).
A string over an alphabet is a finite sequence of symbols taken from the alphabet. We write
strings such as 010001, without the parentheses and commas. If x is a string, |x| denotes the length
of x.
If Σ is an alphabet, the set of all strings over Σ is denoted Σ∗. For n ∈ N, Σ^n = { x ∈ Σ∗ | |x| = n } is the set of strings in Σ∗ of length n. Similarly Σ^≤n = { x ∈ Σ∗ | |x| ≤ n } and Σ^<n = { x ∈ Σ∗ | |x| < n }.
The string of length 0 is written ε, and in some textbooks, λ. In most programming languages
it is written "".
Note in particular the difference between ε, ∅, and {ε}.20
Given n, m ∈ N, x[n . . m] is the substring consisting of the nth through mth symbols of x, and
x[n] = x[n . . n] is the nth symbol in x.
We write xy (or x ◦ y when we would like an explicit operator symbol) to denote the concatenation of x and y, and given k ∈ N, we write x^k = xx · · · x (x repeated k times).[21]
Given two strings x, y ∈ Σ∗ for some alphabet Σ, x is a prefix of y, written x ⊑ y, if x is a substring that occurs at the start of y. x is a suffix of y if x is a substring that occurs at the end of y.
The length-lexicographic ordering (a.k.a. military ordering) of strings over an alphabet is the
standard dictionary ordering, except that shorter strings precede longer strings.22 For example,
the length-lexicographical ordering of {0, 1}∗ is
ε, 0, 1, 00, 01, 10, 11, 000, 001, 010, 011, 100, 101, 110, 111, 0000, 0001, . . .
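Because shorter strings come first and there are only finitely many strings of each length, this ordering can be generated length by length. A small C++ sketch (purely illustrative):

// Print {0,1}* in length-lexicographic (military) order: for each length n,
// the 2^n strings of that length in dictionary order.
#include <iostream>
#include <string>
#include <vector>
using namespace std;

int main() {
    vector<string> current = {""};  // the strings of the current length
    for (int n = 0; n <= 3; n++) {
        for (const string& x : current) cout << '"' << x << '"' << " ";
        vector<string> next;
        for (const string& x : current) {  // appending 0 before 1 keeps dictionary order
            next.push_back(x + "0");
            next.push_back(x + "1");
        }
        current = next;
    }
    cout << endl;
}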
A language (a.k.a. a decision problem) is a set of strings. A class is a set of languages. 23
Given two languages A, B ⊆ Σ∗ , let AB = { ab | a ∈ A and b ∈ B } (also denoted A ◦ B).24
[19] We always let the symbols in the alphabet have single-character names.
[20] ε is a string, ∅ is a set with no elements, and {ε} is a set with one element. Intuitively, think of the following Java code as defining these three objects:

    String epsilon = "";
    Set empty_set = new HashSet();
    Set<String> set_with_epsilon = new HashSet<String>();
    set_with_epsilon.add(epsilon);

In C++, this corresponds to (assuming we've imported everything from the standard library std):

    string epsilon("");
    set<string> empty_set;
    set<string> set_with_epsilon;
    set_with_epsilon.insert(epsilon);

[21] Alternatively, define x^k inductively as x^0 = ε and x^k = x x^(k−1).
[22] It is also common to call this simply the "lexicographical ordering". However, this term is ambiguous and is also used to mean the standard alphabetical ordering. In length-lexicographical order, 1 < 00, but in alphabetical order, 00 < 1. We use the term length-lexicographical to avoid confusion.
[23] These terms are useful because, without them, we would just call everything a "set", and easily forget whether it is a set of strings, a set of sets of strings, or even the dreaded set of sets of sets of strings (they are out there; the arithmetical and polynomial hierarchies are sets of classes).
[24] The set of all strings formed by concatenating one string from A to one string from B.
Similarly, for all n ∈ N, A^n = AA · · · A (n-fold concatenation),[25][26] A^≤n = A^0 ∪ A^1 ∪ . . . ∪ A^n, and A^<n = A^≤n \ A^n.

Given a language A, let A∗ = ⋃_{n=0}^{∞} A^n.[27] Note that A = A^1 (hence A ⊆ A∗).
Examples. Define the languages A, B ⊆ {0, 1, 2}∗ as follows: A = {0, 11, 222} and B = {000, 11, 2}. Then

AB = {0000, 011, 02, 11000, 1111, 112, 222000, 22211, 2222}
A^2 = {00, 011, 0222, 110, 1111, 11222, 2220, 22211, 222222}
A∗ = {ε, 0, 11, 222, 00, 011, 0222, 110, 1111, 11222, 2220, 22211, 222222, 000, 0011, . . .}

(the elements of A∗ listed are those of A^0, then A^1, then A^2, then part of A^3)
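A quick way to check a finite example like this is to compute the concatenation mechanically. Here is a C++ sketch using std::set (which, as footnote [8] suggests, is a natural stand-in for a finite set of strings); the function name is my own.

// Compute AB = { ab | a in A, b in B } and A^2 = AA for the finite languages
// A = {0, 11, 222} and B = {000, 11, 2} from the example above.
#include <iostream>
#include <set>
#include <string>
using namespace std;

set<string> concat(const set<string>& A, const set<string>& B) {
    set<string> result;
    for (const string& a : A)
        for (const string& b : B)
            result.insert(a + b);  // string concatenation; duplicates collapse
    return result;
}

int main() {
    set<string> A = {"0", "11", "222"}, B = {"000", "11", "2"};
    for (const string& x : concat(A, B)) cout << x << " ";  // the 9 strings of AB
    cout << endl;
    for (const string& x : concat(A, A)) cout << x << " ";  // the 9 strings of A^2
    cout << endl;
}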
0.3 Proof by Induction[28]

0.3.1 Proof by Induction on Natural Numbers
Theorem 0.3.1. For every n ∈ N, |{0, 1}^n| = 2^n.

Proof. (by induction on n)[29]

Base case: {0, 1}^0 = {ε}.[30] |{ε}| = 1 = 2^0, so the base case holds.

Inductive case: Assume |{0, 1}^(n−1)| = 2^(n−1).[31] We must prove that |{0, 1}^n| = 2^n. Note that every x ∈ {0, 1}^(n−1) appears as a prefix of exactly two unique strings in {0, 1}^n, namely x0 and x1.[32] Then

   |{0, 1}^n| = 2 · |{0, 1}^(n−1)|
              = 2 · 2^(n−1)         (inductive hypothesis)
              = 2^n.

[25] The set of all strings formed by concatenating n strings from A.
[26] Note that there is ambiguity, since A^n could also mean the set of all n-tuples of strings from A, which is a different set. We assume that A^n means n-fold concatenation whenever A is a language. The difference is that in concatenation of strings, boundaries between strings are lost, whereas tuples always have the various elements of the tuple delimited explicitly from the others.
[27] The set of all strings formed by concatenating 0 or more strings from A.
[28] The book discusses proof by construction and proof by contradiction as two alternate proof techniques. These are both simply formalizations of the way humans naturally think about ordinary statements and reasoning. Proof by induction is the only general technique among the three that really is a technique that must be taught, rather than a name for something humans already intuitively understand. Luckily, you already understand proof by induction better than most people, since it is merely the "proof" version of the technique of recursion you learned in programming courses.
[29] To start, state in English what the theorem is saying: For every string length n, there are 2^n strings of length n.
[30] Note that {0, 1}^0 is not ∅; there is always one string of length 0, so the set of such strings is not empty.
[31] Call this the inductive hypothesis, the fact we get to assume is true in proving the inductive case.
[32] The fact that they are unique means that if we count two strings in {0, 1}^n for every one string in {0, 1}^(n−1), we won't double-count any strings. Hence |{0, 1}^n| = 2 · |{0, 1}^(n−1)|.
Theorem 0.3.2. For every n ∈ Z+, ∑_{i=1}^{n} 1/(i(i+1)) = n/(n+1).

Proof. Base case (n = 1): ∑_{i=1}^{1} 1/(i(i+1)) = 1/(1(1+1)) = 1/2 = n/(n+1), so the base case holds.

Inductive case: Let n ∈ Z+ and suppose the theorem holds for n. Then

   ∑_{i=1}^{n+1} 1/(i(i+1)) = 1/((n+1)(n+2)) + ∑_{i=1}^{n} 1/(i(i+1))    (pull out last term)
                            = 1/((n+1)(n+2)) + n/(n+1)                   (inductive hypothesis)
                            = (1 + n(n+2)) / ((n+1)(n+2))
                            = (n^2 + 2n + 1) / ((n+1)(n+2))
                            = (n+1)^2 / ((n+1)(n+2))
                            = (n+1)/(n+2),

so the inductive case holds.
0.3.2 Induction on Other Structures[33]
Here is an inductive definition of the number of 0's in a binary string x, denoted #0(x).[34]

   #0(x) = 0,           if x = ε;                          (base case)
           #0(w) + 1,   if x = w0 for some w ∈ {0, 1}∗;    (inductive case)
           #0(w),       if x = w1 for some w ∈ {0, 1}∗.    (inductive case)
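The inductive definition translates directly into a recursive function (recursion being the programming counterpart of induction, as footnote [28] notes). A C++ sketch, with my own function name:

// Recursive count of 0's in a binary string, following the inductive
// definition above: peel off the last symbol (the "w0" / "w1" cases) until
// the string is empty (the base case).
#include <iostream>
#include <string>
using namespace std;

int num_zeros(const string& x) {
    if (x.empty()) return 0;                    // x = epsilon
    string w = x.substr(0, x.size() - 1);       // x = w0 or x = w1
    return num_zeros(w) + (x.back() == '0' ? 1 : 0);
}

int main() {
    cout << num_zeros("001001") << endl;  // prints 4
}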
To prove a theorem by induction, identify the base case as the “smallest” object35 for which
the theorem holds.36
[33] Induction is often taught as something that applies only to natural numbers, but one can write recursive algorithms that operate on data structures other than natural numbers. Similarly, it is possible to prove something by induction on something other than a natural number.
[34] We will do lots of proofs involving induction on strings, but for now we will just give an inductive definition. Get used to breaking down strings and other structures in this way.
[35] In the case of strings, this is the empty string. In the case of trees, this could be the empty tree, or the tree with just one node: the root (just like with natural numbers, the base case might be 0 or 1, depending on the theorem).
[36] The inductive step should then employ the truth of the theorem on some "smaller" object than the target object. In the case of strings, this is typically a substring, often a prefix, of the target string. In the case of trees, a subtree, typically a subtree of the root. Using smaller subtrees than the immediate subtrees of the root, or shorter substrings than a one-bit-shorter prefix, is like using a number smaller than n − 1 to prove the inductive case for n; this is the difference between weak induction (using the truth of the theorem on n − 1 to prove it for n) and strong induction (using the truth of the theorem on all m < n to prove it for n).
Theorem 0.3.3. Every binary tree T of depth d has at most 2^d leaves.

Proof. (by induction on a binary tree T) For T a tree, let d(T) be the depth of T, and l(T) the number of leaves in T.

Base case: Let T be the tree with one node. Then d(T) = 0, and 2^0 = 1 = l(T).

Inductive case: Let T's root have subtrees T0 and T1, at least one of them non-empty. If only one is non-empty (say Ti), then

   l(T) = l(Ti)
        ≤ 2^d(Ti)        (inductive hypothesis)
        = 2^(d(T)−1)     (definition of depth)
        < 2^d(T).

If both subtrees are non-empty, then

   l(T) = l(T0) + l(T1)
        ≤ 2^d(T0) + 2^d(T1)                          (ind. hyp.)
        ≤ max{2^d(T0) + 2^d(T0), 2^d(T1) + 2^d(T1)}
        = max{2^(d(T0)+1), 2^(d(T1)+1)}
        = 2^(max{d(T0)+1, d(T1)+1})                  (2^n is monotone increasing)
        = 2^d(T).                                    (definition of depth)
Chapter 1
Regular Languages
1.1 Finite Automata
Figure 1.1: left: Top view of an automatic door, with a front pad and a rear pad. right: State diagram for the automatic door controller, with states CLOSED and OPEN and input signals FRONT, REAR, BOTH, NEITHER. bottom: State transition table giving the same information as the state diagram:

   state    NEITHER   FRONT   REAR     BOTH
   CLOSED   CLOSED    OPEN    CLOSED   CLOSED
   OPEN     CLOSED    OPEN    OPEN     OPEN
Reading assignment: Section 1.1 in Sipser.
Suppose we have an automatic door for entering a building, with sensors for detecting if a
person is on the pad in front of the door, or the pad in the rear of the door, or both, or neither. It
swings open to the inside and would hit a person on the rear pad. So if a person is on the rear pad
and the door is closed, it should remain closed no matter what. If the rear pad is free, and there’s
a person on the front pad, then it should open if closed, or stay open if already open. Finally, if
there’s no one on either pad, then the door should close if it is opened, and remain closed if already
closed.
The door has two states, OPEN and CLOSED, and sensors to indicate one of four possible
scenarios. We have just described rules for handling any situation it encounters. We can write these
rules in a state diagram, shown in Figure 1.1. The state diagram can be equivalently represented
by a state transition table, also shown in Figure 1.1.
See Figure 1.2, a state diagram of a finite automaton M1.

• states: q1, q2, and q3
• start state: q1
• accept state: q2
• input symbols: 0, 1
• transitions: arrows labeled with input symbols

Figure 1.2: A finite automaton M1 with three states.
M1 accepts or rejects based on the state it is in after reading the input.1
If we give the input string 1101 to M1 , the following happens.
1. Start in state q1
2. Read 1, follow transition from q1 to q2
3. Read 1, follow transition from q2 to q2
4. Read 0, follow transition from q2 to q3
5. Read 1, follow transition from q3 to q2
6. Accept the input because M1 is in state q2 at the end of the input.
M1 also accepts 1, 01, and 010101. In fact all transitions labeled 1 go to q2 , so it accepts any
string ending in a 1.
Is that all the strings it accepts?
M1 also accepts 100, 0100, 110000, and any string that ends with an even number of 0’s following
the last 1.
Now that we’ve seen some examples, let’s formally define finite automata. They are a model of
computation; I think in almost any circumstance, you can take the phrase “model of computation”
to be synonymous with programming language. Of course, some of them look much different from
the languages you are used to! But they all involve some way of specifying a list of rules
governing what the program is supposed to do.
There are always two parts to defining any formal model of computation: syntax and semantics.
The syntax of a model says what an instance of the model is. The semantics of a model says what
an instance of the model does.
For example, I can say that a C++ program is any ASCII string that the gcc compiler can
compile with no errors: that’s the definition of the syntax of C++.2 But knowing that gcc
successfully compiled a program does little to help you understand what will happen when you actually run the program. The definition of the semantics of C++ is more involved, telling you what each line of code does: what effect does it have on the memory of the computer?

[1] One way to interpret M1 is as an implementation of a programming language routine that takes a binary string as input and returns a Boolean output.
[2] Of course, that doesn't do much to help you write syntactically valid C++ programs. Another more useful way to define it is that it is any ASCII string that is produced by the C++ grammar (http://www.nongnu.org/hcb/). We will cover grammars briefly later in the course, and you will understand better what it means for a string to be produced by a grammar.
And C++ semantics are complex indeed. For example, a = b+c; has the following semantics:
add the integer stored in b to the integer stored in c and store the result in a... unless one of b
or c is a float or a double, in which case use floating point arithmetic instead, and if the other
argument is an int it is first converted to a float... unless b is actually an instance of a class that
overloads the + operator, in which case call the member function associated with that operator...
Hopefully this makes it clear why, to learn how to reason formally about computer programs,
we start with a much simpler model of computation than C++ or Python, one whose syntax and
semantics definitions fit on a page.
1.1.1 Formal Definition of a Finite Automaton (Syntax)
To describe how a finite automaton transitions between states, we introduce a transition function δ. The goal is to express that if an automaton is in state q, and it reads the symbol 1 (for example), and transitions to state q′, then this means δ(q, 1) = q′.
Definition 1.1.1. A (deterministic) finite automaton (DFA) is a 5-tuple (Q, Σ, δ, s, F ), where
• Q is a non-empty, finite set of states,
• Σ is the input alphabet,
• δ : Q × Σ → Q is the transition function,
• s ∈ Q is the start state, and
• F ⊆ Q is the set of accepting states.
LECTURE: end of day 2
For example, the DFA M1 of Figure 1.2 is defined M1 = (Q, Σ, δ, s, F ), where
• Q = {q1 , q2 , q3 },
• Σ = {0, 1},
• δ is defined
δ(q1 , 0) = q1 ,
δ(q1 , 1) = q2 ,
δ(q2 , 0) = q3 ,
δ(q2 , 1) = q2 ,
δ(q3 , 0) = q2 ,
δ(q3 , 1) = q2 ,
or more succinctly, we represent δ by the transition table
δ 0 1
q1 q1 q2
q2 q3 q2
q3 q2 q2
• q1 is the start state, and
• F = {q2 }.3
If M = (Q, Σ, δ, s, F ) is a DFA, how many transitions are in δ; in other words, how many entries
are in the transition table?
If A ⊆ Σ∗ is the set of all strings that M accepts, we say that M recognizes (accepts,decides)
A, and we write L(M ) = A.
Every DFA recognizes a language. If it accepts no strings, what language does it recognize? ∅
M1 recognizes the language

L(M1) = { w ∈ {0, 1}∗ | w contains at least one 1 and an even number of 0's follow the last 1 }
Figure 1.3: A DFA M2.
Example 1.1.2. See Figure 1.3.
Formally, M2 = (Q, Σ, δ, q1 , F ), where Q = {q1 , q2 }, Σ = {0, 1}, F = {q2 }, and δ is defined
δ 0 1
q1 q1 q2
q2 q1 q2
What does this DFA do on input 1101?
L(M2 ) = { w ∈ {0, 1}∗ | w ends in a 1 } = {1, 01, 11, 001, 011, 101, 111, 0001, . . .}
Example 1.1.3. See Figure 1.4. L(M3 ) = { w ∈ {0, 1}∗ | w does not end in a 1 }
Example 1.1.4. See Figure 1.5. L(M4 ) = { w ∈ {a, b}∗ | |w| > 0 and w[1] = w[|w|] }
= {a, b, aa, bb, aaa, aba, bab, bbb, aaaa, . . .}
Show DFA simulator and file format.
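The course's DFA simulator and its file format are shown in lecture; as a rough stand-in, here is a minimal C++ sketch of a DFA simulator with M1 from Figure 1.2 hard-coded. The data structures and names below are mine, not the course's format.

// Minimal DFA simulator. A DFA is a 5-tuple (Q, Sigma, delta, s, F); here
// states are ints, symbols are chars, delta is a map, and F is a set.
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
using namespace std;

struct DFA {
    map<pair<int, char>, int> delta;  // transition function
    int start;                        // start state s
    set<int> accept;                  // accept states F

    bool accepts(const string& x) const {
        int q = start;
        for (char b : x) q = delta.at({q, b});  // one transition per symbol
        return accept.count(q) > 0;
    }
};

int main() {
    DFA m1;
    m1.delta = {{{1, '0'}, 1}, {{1, '1'}, 2},
                {{2, '0'}, 3}, {{2, '1'}, 2},
                {{3, '0'}, 2}, {{3, '1'}, 2}};
    m1.start = 1;
    m1.accept = {2};
    for (string x : {"1101", "110000", "0", ""})
        cout << x << (m1.accepts(x) ? " accepted" : " rejected") << endl;
}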
[3] The diagram and this formal description contain exactly the same information. The diagram is easy for humans to read, and the formal description is easy to work with mathematically, and to program.

Figure 1.4: A DFA M3.

Figure 1.5: A DFA M4.
1.1.2 More Examples
Figure 1.6: A DFA recognizing whether its input string's length is a multiple of 3: states 0, 1, 2 arranged in a cycle of a-transitions, with start and accept state 0.
Example 1.1.5. Design a DFA that recognizes the language

{ a^(3n) | n ∈ N } = { w ∈ {a}∗ | |w| is a multiple of 3 } = {ε, aaa, aaaaaa, aaaaaaaaa, . . .}.

See Figure 1.6.
Example 1.1.6. Design a DFA that recognizes the language

{ a^(3n+2) | n ∈ N } = { w ∈ {a}∗ | |w| ≡ 2 mod 3 } = {aa, aaaaa, aaaaaaaa, . . .}.

Just switch the accept state in Figure 1.6 from 0 to 2.
Example 1.1.7. Design a DFA that recognizes the language
{ w ∈ {0, 1}∗ | w represents a multiple of 2 in binary } .
For example it should accept 0, 1010, and 11100 and reject 1, 101, and 111.
This is actually M3 from Figure 1.4.
Figure 1.7: Design of a DFA to recognize multiples of 3 in binary. If a natural number n has b ∈ {0, 1} appended to its binary expansion, then this number is 2n + b. Top: a DFA with transitions to implement "if the state k means that n ≡ k mod 3, after reading bit b, replacing n with 2n + b, we change to state k′ because 2n + b ≡ k′ mod 3". This assumes ε represents 0, and that leading 0's are allowed, e.g., 5 is represented by 101, 0101, 00101, 000101, . . . Bottom: a modification of the top DFA to reject ε and any positive integers with leading 0's.
Example 1.1.8. Design a DFA that recognizes the language
{ w ∈ {0, 1}∗ | w represents a multiple of 3 in binary } .
See Figure 1.7.
Intuitively, we can keep track of whether the bits read so far represent an integer n ∈ N that is
of the form n = 3k for some k ∈ N (i.e., n is a multiple of 3, or n ≡ 0 mod 3), n = 3k + 1 (n ≡ 1
mod 3), or n = 3k + 2 (n ≡ 2 mod 3). Appending the bit 0 to the end of n’s binary expansion
multiplies n by 2, resulting in 2n, whereas appending the bit 1 results in 2n + 1.
For example, consider appending a 1 to a multiple of 3. Since n = 3k, then 2n + 1 = 2(3k) + 1 =
3(2k) + 1 ≡ 1 mod 3, so the transition function δ obeys δ(0, 1) = 1. Consider appending a 0 to n
where n ≡ 2 mod 3. Since n = 3k + 2, then 2n = 2(3k + 2) = 6k + 4 = 3(2k + 1) + 1, so 2n ≡ 1
mod 3, so δ(2, 0) = 1. The other four cases are similar.
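All six cases can also be checked mechanically, since the new remainder is always (2n + b) mod 3 and depends only on the old remainder and the bit b. A small C++ sketch (illustrative, not from the notes):

// Check the "multiple of 3 in binary" DFA: from state k (meaning the bits
// read so far represent n with n ≡ k mod 3), reading bit b goes to
// state (2k + b) mod 3.
#include <iostream>
#include <string>
using namespace std;

int main() {
    // Print the full transition table delta(k, b) = (2k + b) mod 3.
    for (int k = 0; k < 3; k++)
        for (int b = 0; b < 2; b++)
            cout << "delta(" << k << "," << b << ") = " << (2 * k + b) % 3 << endl;

    // Run the DFA on a binary string and report the result.
    string w = "10010";  // 18 in binary, a multiple of 3
    int state = 0;
    for (char c : w) state = (2 * state + (c - '0')) % 3;
    cout << w << (state == 0 ? " is" : " is not") << " a multiple of 3" << endl;
}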
1.1.3 Formal Definition of Computation by a DFA (Semantics)
Sipser defines DFA acceptance this way: We say M = (Q, Σ, δ, s, F ) accepts a string x ∈ Σn if there
is a sequence of states q0 , q1 , q2 , . . . , qn ∈ Q such that
1. q0 = s,
2. δ(qi , x[i + 1]) = qi+1 for all i ∈ {0, 1, . . . , n − 1}, and
3. qn ∈ F .
Otherwise, M rejects x.
For example, for the DFA in Figure 1.6, for the input x = aaaaaa, the sequence of states
q0 , q1 , q2 , q3 , q4 , q5 , q6 testifying that the DFA accepts x is 0, 1, 2, 0, 1, 2, 0.
Define the language recognized by M as L(M ) = { x ∈ Σ∗ | M accepts x } . We say a language
L is regular if some DFA recognizes it.
1.1.4 The Regular Operations
So far we have read and “programmed” DFAs that recognize particular regular languages. We now
study fundamental properties shared by all regular languages.
Example 1.1.9. Design a DFA M = (Q, {a}, δ, s, F) to recognize the language { a^n | n ≡ 2 mod 3 or n ≡ 1 mod 5 or n ≡ 4 mod 5 }.
• Q = { (i, j) | 0 ≤ i < 3, 0 ≤ j < 5 }
• s = (0, 0)
• δ((i, j), a) = (i + 1 mod 3, j + 1 mod 5)
• F = { (i, j) | i = 2 or j = 1 or j = 4 }
See Figure 1.8.
M is essentially simulating two DFAs at once: one that computes congruence mod 3 and the
other that computes congruence mod 5. We call this general idea of one DFA simultaneously
simulating two others, by having two parts of its state set, one to remember the state of one DFA
and the other to remember the state of the other DFA, the product construction, because the larger
DFA’s state set is the cross product of the smaller two.
LECTURE: end of day 3
For an example of the product construction on DFAs with a larger alphabet, as well as a different
way to visualize the product construction, see Figure 1.9.
We now generalize this idea.
Recall the following operations.
Definition 1.1.10. Let A and B be languages.
Union: A ∪ B = { x | x ∈ A or x ∈ B }[4]

Concatenation: AB = { xy | x ∈ A and y ∈ B }[5] (also written A ◦ B)

(Kleene) Star: A∗ = ⋃_{n=0}^{∞} A^n = { x1 x2 . . . xk | k ∈ N and x1, x2, . . . , xk ∈ A }[6]

[4] The normal union operation that makes sense for any two sets, which happen to be languages in this case.

Figure 1.8: Bottom left: a DFA recognizing the language L3 = { a^n | n ≡ 2 mod 3 }. Top right: a DFA recognizing the language L5 = { a^n | n ≡ 1 mod 5 or n ≡ 4 mod 5 }. Bottom right: a DFA recognizing the language L3 ∪ L5, with states (i, j) for 0 ≤ i < 3 and 0 ≤ j < 5. For readability, transitions are not labeled, but they should all be labeled with a. The accept states are row 2 and columns 1 and 4. To recognize L3 ∩ L5, we choose the accept states to be {(2, 1), (2, 4)} instead, i.e., where row 2 meets columns 1 and 4.
Each is an operator on one or two languages, with another language as output.
The regular languages are closed under each of these operations.[7] This means that if A, B are regular, A ∪ B, A ◦ B, and A∗ are all regular also.[8] The regular languages are also closed under complementation (this is easier to show) and intersection (which follows from closure under complementation and union by DeMorgan's laws).
The proof that the regular languages are closed under union generalizes the construction of the
“divisible by 3 or 5” DFA above: we show how to simulate two DFAs (that recognize the languages
A and B) with a third DFA (that recognizes A ∪ B).
Theorem 1.1.11. The class of regular languages is closed under ∪.
[5] The set of all strings formed by concatenating one string from A to one string from B; only makes sense for languages because not all types of objects can be concatenated.
[6] The set of all strings formed by concatenating 0 or more strings from A. Just like with Σ∗, except now whole strings may be concatenated instead of individual symbols.
[7] Note that this is not the same as the property of a subset A ⊆ R^d being closed in the sense that it contains all of its limit points. In particular, one typically does not talk about any class of languages being merely "closed," but rather, closed with respect to a certain operation such as union, concatenation, or Kleene star. The usage of the word "closed" in real analysis can be seen as a special case of being closed with respect to the operation of taking limits of infinite sequences from the set.
[8] The intuition for the terminology is that if you think of the class of regular languages as being like a room, and doing an operation as being like moving from one language (or languages) to another, then moving via union, concatenation, or star won't get you out of the room; the room is "closed" with respect to moving via those operations.

Figure 1.9: The product construction applied to the DFAs M1 and M2 from Figures 1.2 and 1.3, to obtain a DFA M with L(M) = L(M1) ∪ L(M2). We can think of the states of M as being grouped into three groups representing q1, q2, and q3 respectively, and within each of those groups, there's a state representing r1 and a state representing r2. In this figure, accept states have bold outlines. To recognize L(M1) ∩ L(M2) instead, we should choose only (q2, r2) to be the accept state.
Proof.[9] (Product Construction)[10]

Let M1 = (Q1, Σ, δ1, s1, F1) and M2 = (Q2, Σ, δ2, s2, F2) be DFAs. We construct the DFA M = (Q, Σ, δ, s, F) to recognize L(M1) ∪ L(M2) by simulating both M1 and M2 in parallel, where

• Q keeps track of the states of both M1 and M2:
  Q = Q1 × Q2 (= { (r1, r2) | r1 ∈ Q1 and r2 ∈ Q2 })

• δ simulates moving both M1 and M2 one step forward in response to the input symbol. Define δ for all (r1, r2) ∈ Q and all b ∈ Σ as
  δ( (r1, r2), b ) = ( δ1(r1, b), δ2(r2, b) )[11]

• s ensures both M1 and M2 start in their respective start states:
  s = (s1, s2)

• F must be accepting exactly when either one, or the other, or both, of M1 and M2 are in an accepting state:
  F = { (r1, r2) | r1 ∈ F1 or r2 ∈ F2 }.[12]

Then, after reading an input string x that puts M1 in state r1 and M2 in state r2, we have that the state q = (r1, r2) is in F if and only if r1 ∈ F1 or r2 ∈ F2, which is true if and only if x ∈ L(M1) or x ∈ L(M2), i.e., x ∈ L(M1) ∪ L(M2).

[9] Proof Idea: (The Product Construction) We must show that if A1 and A2 are regular, then so is A1 ∪ A2. Since A1 and A2 are regular, some DFA M1 recognizes A1, and some DFA M2 recognizes A2. It suffices to show that some DFA M recognizes A1 ∪ A2; i.e., it accepts a string x if and only if at least one of M1 or M2 accepts x. M will simulate M1 and M2. If either accepts the input string, then so will M. M must simulate them simultaneously, because if it tried to simulate M1, then M2, it could not remember the input to supply it to M2.
[10] The name comes from the cross product used to define the state set.
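The construction is completely mechanical, which is what makes it easy to carry out by program as well as by hand. Here is a C++ sketch of the product construction applied to the mod-3/mod-5 DFAs of Example 1.1.9; the representation and names are my own, not the course's.

// Product construction: given DFAs M1 and M2 over the same alphabet, build a
// DFA whose states are pairs (r1, r2), simulating both at once. Using "or"
// for the accept condition gives union; "and" would give intersection.
#include <iostream>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>
using namespace std;

struct DFA {
    vector<char> sigma;
    map<pair<int, char>, int> delta;
    int start;
    set<int> accept;
};

// Encode the pair (r1, r2) as the single int r1 * n2 + r2, where the states
// of M1 are 0..n1-1 and the states of M2 are 0..n2-1.
DFA product_union(const DFA& m1, int n1, const DFA& m2, int n2) {
    DFA m;
    m.sigma = m1.sigma;
    m.start = m1.start * n2 + m2.start;
    for (int r1 = 0; r1 < n1; r1++)
        for (int r2 = 0; r2 < n2; r2++) {
            for (char b : m.sigma)
                m.delta[{r1 * n2 + r2, b}] =
                    m1.delta.at({r1, b}) * n2 + m2.delta.at({r2, b});
            if (m1.accept.count(r1) || m2.accept.count(r2))  // "or" = union
                m.accept.insert(r1 * n2 + r2);
        }
    return m;
}

bool accepts(const DFA& m, const string& x) {
    int q = m.start;
    for (char b : x) q = m.delta.at({q, b});
    return m.accept.count(q) > 0;
}

int main() {
    // M3: unary strings of length ≡ 2 mod 3;  M5: length ≡ 1 or 4 mod 5.
    DFA m3{{'a'}, {}, 0, {2}}, m5{{'a'}, {}, 0, {1, 4}};
    for (int i = 0; i < 3; i++) m3.delta[{i, 'a'}] = (i + 1) % 3;
    for (int j = 0; j < 5; j++) m5.delta[{j, 'a'}] = (j + 1) % 5;
    DFA m = product_union(m3, 3, m5, 5);
    for (int len = 0; len <= 12; len++)
        if (accepts(m, string(len, 'a'))) cout << len << " ";
    cout << endl;  // prints 1 2 4 5 6 8 9 11
}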
Theorem 1.1.12. The class of regular languages is closed under complement.
Proof Idea:
Swap the accept and reject states.
Theorem 1.1.13. The class of regular languages is closed under ∩.
Proof Idea:
DeMorgan’s Laws.
Proof. Let A, B be regular languages. By DeMorgan's laws, A ∩ B is the complement of Ā ∪ B̄, i.e., the complement of the union of two languages Ā and B̄ that are themselves complements of regular languages. By the union and complement closure properties, this language is also regular.
Alternately, we can modify the proof for ∪ and instead let F = { (r1 , r2 ) | r1 ∈ F1 and r2 ∈ F2 } ,
i.e., F = F1 × F2 . Because of this, we can say the product construction applies to ∩ as well.
We have only proved that the regular languages are closed under a single application of ∪ or ∩ to two regular languages A1 and A2. But, we can use induction to prove that they are closed under finite union and intersection; for instance, if A1, . . . , Ak are regular, then ⋃_{i=1}^{k} Ai is regular, for any k ∈ N. What about infinite union or intersection?
Theorem 1.1.14. The class of regular languages is closed under ◦.

In other words, we need to construct M so that, if M1 accepts x and M2 accepts y, then M accepts xy. If we try to reason about this like with the union operation, there is one additional complication. We are no longer simulating the machines on the same input; rather, x precedes y in the string xy. We could say, "build M to simulate M1 on x, and after the x portion of xy has been read, then simulate M2 on y."

What is wrong with this idea? When x and y are concatenated to form xy, we no longer know where x ends and y begins. So M would not know when to switch from simulating M1 to simulating M2.

[11] This is difficult to read! Be sure to read it carefully and ensure that the type of each object is what you expect. Q has states that are pairs of states from Q1 × Q2. Thus, δ's first input, a state from Q, is a pair of states (r1, r2), where r1 ∈ Q1 and r2 ∈ Q2. Similarly, the output of δ must also be a pair from Q1 × Q2. However, δ1's first input, and its output, are just single states from Q1, and similarly for δ2.
[12] This is not the same as F = F1 × F2. What would the automaton do in that case?
The same difficulty arises when trying to show the following theorem:
Theorem 1.1.15. The class of regular languages is closed under ∗ .
Not only would a DFA M for A∗ , simulating a DFA N for A, have to know when to “reset”
N from one of its accept states to its start state, but M would furthermore have to know how
many times to do this: the input string could consist of any number of strings from A concatenated
together.
Both of these theorems are true, but it is not obvious why, certainly not as obvious as with ∪,
∩, and complement.
Rather than begin proving these theorems now, we will switch to studying another very simple
model of computation: regular expressions. Regular expressions have the operations of ◦ and ∗
built right into them, so it is trivial to show how to convert regular expressions for languages A and
B into regular expressions for A ◦ B and A∗ . (∪ is also built in to regular expressions.) However,
in contrast to DFAs, it is quite difficult to convert regular expressions to handle intersection and
complement.
After exploring the power of regular expressions, we will unite the two seemingly different models
of DFAs and regular expressions with a third model: nondeterministic finite automata (NFA), and
we will show that any of these can be converted into any of the others. As a result, they all define
exactly the same class of decision problems: the regular languages.
1.3 Regular Expressions
Reading assignment: Section 1.3 in Sipser.
A regular expression is a pattern that matches some set of strings, used in many programming
languages and applications such as grep. For example, the regular expression
(0 ∪ 1)0∗
matches any string starting with a 0 or 1, followed by zero or more 0's.[13]

In fact, with a small substitution of symbols, the above expression is quite literally an expression of the set of strings it matches. Simply replace each bit with a singleton set: ({0} ∪ {1}){0}∗, which equals {0, 1}{0}∗ = {0, 1, 00, 10, 000, 100, 0000, 1000, 00000, 10000, . . .}
[13] The notation (0|1)0∗ is more common, but we use the ∪ symbol in place of | to emphasize the connections to set theory, and because | looks too much like 1 when hand-written.
The next regular expression
(0 ∪ 1)∗
matches any string consisting of zero or more 0’s and 1’s; i.e., any binary string.
Let Σ = {0, 1, 2}. Then
(0Σ∗ ) ∪ (Σ∗ 1)
matches any ternary string that either starts with a 0 or ends with a 1. The above is shorthand for
(0(0 ∪ 1 ∪ 2)∗ ) ∪ ((0 ∪ 1 ∪ 2)∗ 1)
since Σ = {0} ∪ {1} ∪ {2}.
Each regular expression R defines a language L(R).
We now give a formal inductive definition of a regular expression. It has three base cases and
three inductive cases.
1.3.1 Formal Definition of a Regular Expression
Definition 1.3.1. Let Σ be an alphabet.
R is a regular expression (regex) if R is
1. b for some b ∈ Σ, defining the language {b},
2. ε, defining the language {ε},
3. ∅, defining the language {},
4. R1 ∪ R2 , where R1 , R2 are regex’s, defining the language L(R1 ) ∪ L(R2 ),
5. R1 R2 (or R1 ◦ R2 ), where R1 , R2 are regex’s, defining the language L(R1 ) ◦ L(R2 ), or
6. R∗ , where R is a regex, defining the language L(R)∗ .
The operators have precedence ∗ > ◦ > ∪. Parentheses may be used to override this precedence.
So we can think of a regular expression as itself being a string, with symbols from some alphabet Σ, as well as the symbols ( ) * ∪.
We sometimes abuse notation and write R to mean L(R), relying on context to interpret the
meaning.
For convenience, define R+ = RR∗; for each k ∈ N, let R^k = RR · · · R (k copies of R); and given an alphabet Σ = {a1, a2, . . . , ak}, Σ is shorthand for the regex a1 ∪ a2 ∪ . . . ∪ ak.
LECTURE: end of day 4
Figure 1.10: Recursive structure of the regular expressions ab∗ and (a ∪ b)(a∗b ∪ bbab), and of their union ab∗ ∪ (a ∪ b)(a∗b ∪ bbab), viewed as trees.
But a more structured way to think of a regular expression, based on the inductive definition,
is as having a tree structure (similar to the parse tree for an arithmetic expression such as (7 + 3) ·
9 + 11 · 5). The base cases are leaves, and the recursive cases are internal nodes, with one child in
the case of the unary operation ∗ and two children in the case of the binary operations ◦ and ∪.
See Figure 1.10 for an example.
Example 1.3.2. Let Σ be an alphabet.
• 0∗ 10∗ = { w | w contains a single 1 }.
• Σ∗ 1Σ∗ = { w | w has at least one 1 }.
• Σ∗ 001Σ∗ = { w | w contains the substring 001 }.
• 1∗ (01+ )∗ = { w | every 0 in w is followed by at least one 1 }.
• (ΣΣ)∗ = { w | |w| is even }.
• 01 ∪ 10 = {01, 10}.
• 0 (0∪1)∗ 0 ∪ 1 (0∪1)∗ 1 ∪ 0 ∪ 1 = { w ∈ {0, 1}∗ | w starts and ends with the same symbol }
• (0 ∪ ε)1∗ = 01∗ ∪ 1∗ .
• (0 ∪ ε)(1 ∪ ε) = {ε, 0, 1, 01}.
• ∅∗ = {ε}.
• R ∪ ∅ = R, where R is any regex.
• Rε = R, where R is any regex.
• R∅ = ∅, where R is any regex.
Example 1.3.3. Design a regex to match C++ float literals (e.g., 2, 3.14, -.02, 0.02, .02, 3.).
Let D = 0 ∪ 1 ∪ . . . ∪ 9 be a regex recognizing a single decimal digit.
(+ ∪ − ∪ ε)(D+ ∪ D+ .D∗ ∪ D∗ .D+ )
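In ECMAScript regex syntax (∪ → |, D → [0-9], and the dot escaped, since . is a metacharacter there), the same pattern can be tested with std::regex. A quick C++ sketch; the test strings are my own.

// The float-literal regex (+ ∪ − ∪ ε)(D+ ∪ D+.D* ∪ D*.D+) written in
// ECMAScript syntax and checked with regex_match (whole-string match).
#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() {
    regex float_literal(R"([+-]?([0-9]+|[0-9]+\.[0-9]*|[0-9]*\.[0-9]+))");
    for (string x : {"2", "3.14", "-.02", "3.", ".", "1.2.3"})
        cout << x << (regex_match(x, float_literal) ? " matches" : " does not match") << endl;
}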
Automatic transformation of regex’s and DFAs
Now, consider this task: write a regex matching the complement of the previous language, i.e., all
strings over D ∪ {+, −, .} that are not C++ float literals.
It’s not obvious how to do it, right?
Or, consider these two regex’s over alphabet Σ = {0, 1}: R1 = Σ(ΣΣ)∗ , matching any oddlength binary string, and R2 = Σ∗ 0011Σ∗ , matching any string containing the substring 0011. The
regex R1 ∪ R2 matches their union: any string that is odd length or contains the substring 0011.
What about their intersection, odd length strings that contain the substring 0011?
We can consider that wherever 0011 appears, since it is even length, for the whole string to be
odd length, we must either have an even number of bits before 0011 and an odd number of bits
after, or vice versa. Thus, this regex works:
R3 = (ΣΣ)∗ 0011 Σ(ΣΣ)∗ ∪ Σ(ΣΣ)∗ 0011 (ΣΣ)∗
But this is ad-hoc, requiring us to think about the details of the two languages L(R1 ) and
L(R2 ). Consider changing to R1 = ΣΣΣ(ΣΣΣΣΣ)∗ matching all strings whose length is congruent
to 3 mod 5. We could do a similar case analysis of all the combinations of numbers of bits before
and after 0011 that would lead the whole length to be congruent to 3 mod 5. But this would be
tedious, and it similarly requires us to understand the details of the two languages.
Could we devise something akin to the product construction for DFAs, which would automatically process R1 and R2 based solely on their structure, not requiring any special reasoning specific to their languages, and which would tell us how to construct R3 with L(R3) = L(R1) ∩ L(R2)?

Similarly, can we devise a way to automatically convert a regex R into one recognizing the complement of L(R)? These tasks are easy enough with DFAs, but it's not clear for regex's.
Conversely, we saw at the end of Section 1.1 that it is not obvious how to show the regular
languages are closed under ◦ or ∗ . Given two DFAs M1 and M2 , it seems difficult to create a DFA
M recognizing L(M1 ) ◦ L(M2 ). The obvious thing to try is to let M have all the states of both M1
and M2 (i.e., take the union of their states), and on input x, to start simulating M1 reading some
prefix y of x, and then at some point to switch from M1 to M2 , and have M2 process the remaining
suffix z of x, so that x = yz and M accepts if and only if M1 accepts y and M2 accepts z. But
how does M know when to switch? If x is the concatenation yz of y ∈ L(M1 ) and z ∈ L(M2 ),
there’s no delimiter to tell us where y ends and z begins. In fact, it might be possible to create x
in multiple ways, e.g., if L(M1 ) = {0, 00} and L(M2 ) = {0, 00}, then x = 000 could be obtained by
letting y = 0 and z = 00, or y = 00 and z = 0.
So to recap: DFAs can easily be combined for certain operations: ∪, ∩, complement. Regex’s
can trivially be combined for certain operations: ∪, ◦, ∗ (it’s right there in the definition). Some
closure properties that are easy to prove for one model seem difficult in the other.
But the claim we made earlier is that in fact, both of these models define the exact same set
of decision problems: the regular languages, which are claimed to be closed under all of these
operations. Thus, if there are DFAs M1 , M2 or regex’s R1 , R2 for languages L1 , L2 , then there are
DFAs and regex’s for all of L1 ∪ L2 , L1 ∩ L2 , L1 ◦ L2 , L∗1 , and the complement of L1 .
How would we go about proving something like this?
1.2 Nondeterminism
Reading assignment: Section 1.2 in Sipser.
We now reach a central motif in theoretical computer science:
Proof Technique: When you want to prove that an algorithm can solve a problem, brazenly assume your algorithm has a magic ability that you can use to cheat. Then prove that adding the magic ability does not improve the fundamental power of the algorithm.
Deterministic finite automata (DFAs) are realistic, but (comparatively) difficult to program. Nondeterministic finite automata (NFAs) are (apparently) unrealistic, but easy to program.
Figure 1.11: A nondeterministic finite automaton (NFA) called N1 . Differences with a DFA are
highlighted in red: 1) There are two 1-transitions leaving state q0 , 2) There is an “ε-transition”
leaving state s, 3) There is no 1-transition leaving states s, r0 , or q2 and no 0-transition leaving
states r1 or q2 . Normally “missing” transitions are not shown in NFAs, but we highlight them here
to emphasize the difference with a DFA.
Differences between DFA’s and NFA’s:
• An NFA state may have any number (including 0) of transition arrows out of a state, for the
same input symbol.14
14 Unlike a DFA, an NFA may attempt to read a symbol a in a state q that has no transition arrow for a; in this
case the NFA is interpreted as immediately rejecting the string without reading any more symbols.
• An NFA may change states without reading any input (an ε-transition).
• If there is a series of choices (when there is a choice) to reach an accept state, then the NFA
accepts. If there is no series of choices that leads to an accept state, the NFA rejects.15
Example 1.2.1. Design an NFA to recognize the language {x ∈ {0, 1}∗ | x[|x| − 2] = 1}.
See Figure 1.12. This uses nondeterministic transitions (but no ε-transitions) to “guess” when
the string is 3 bits from the end.
Figure 1.12: An NFA N2 recognizing the language {x ∈ {0, 1}∗ | x[|x| − 2] = 1}.
1.2.1 Formal Definition of an NFA (Syntax)
For any alphabet Σ, let Σε = Σ ∪ {ε}.
Definition 1.2.2. A nondeterministic finite automaton (NFA) is a 5-tuple (Q, Σ, ∆, s, F ), where
• Q is a non-empty, finite set of states,
• Σ is the input alphabet,
• ∆ : Q × Σε → P(Q) is the transition function,
• s ∈ Q is the start state, and
• F ⊆ Q is the set of accepting states.
When defining ∆, we assume that if for some q ∈ Q and b ∈ Σε , ∆(q, b) is not explicitly defined,
then ∆(q, b) = ∅.
Example 1.2.3. Recall the NFA N1 in Figure 1.11. Formally, N1 = (Q, Σ, ∆, s, F ), where
• Q = {s, r0 , r1 , q0 , q1 , q2 },
• Σ = {0, 1},
• F = {q2 , r0 , r1 }, and
15 Note that accepting and rejecting are treated asymmetrically; if there is a series of choices that leads to accept and there is another series of choices that leads to reject, then the NFA accepts.
• ∆ is defined by the following table (rows are states, columns are symbols in Σε ):

∆    | 0      | 1           | ε
s    | {q0 }  | ∅           | {r0 }
r0   | {r1 }  | ∅           | ∅
r1   | ∅      | {r0 }       | ∅
q0   | {q0 }  | {q0 , q1 }  | ∅
q1   | {q2 }  | {q2 }       | ∅
q2   | ∅      | ∅           | ∅
What are all the states N1 could be in after reading the following strings? (Recall the accepting states are r0 , r1 , q2 .)

ε: s, r0
0: q0 , r1
1: ∅
00: q0
01: q0 , q1 , r0
10: ∅
11: ∅
000: q0
010: q0 , q2 , r1
011: q0 , q1 , q2
N1 accepts a string if any of the states it could be in after reading the string is accepting; it’s
okay for some of them to be rejecting as long as at least one is accepting. Thus, N1 accepts
ε, 0, 01, 010, 011 and rejects 1, 00, 10, 11, 000.
1.2.2 Formal Definition of Computation by an NFA (Semantics)
An NFA N = (Q, Σ, ∆, s, F ) accepts a string x ∈ Σ∗ if there are sequences y1 , y2 , . . . , ym ∈ Σε and
q0 , q1 , . . . , qm ∈ Q such that
1. x = y1 y2 . . . ym ,
2. q0 = s,
3. qi+1 ∈ ∆(qi , yi+1 ) for i ∈ {0, 1, . . . , m − 1}, and
4. qm ∈ F .
Note that if N is actually deterministic (i.e., no ε-transitions and |∆(q, b)| = 1 for all q ∈ Q
and b ∈ Σ), then the sequence of states leading to an accept state on a given input is unique, and
it is the same sequence of states as in the definition of DFA acceptance.
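The existential definition above translates directly into a search over the NFA's choices. Below is a small sketch (mine, not from the notes) that decides acceptance by exploring all reachable (state, input position) pairs; the dictionary delta maps (state, symbol) pairs to sets of states, with "" standing for ε and missing entries meaning ∅, matching the convention after Definition 1.2.2. The example transitions are my encoding of N1 , assuming the transition table above.

def nfa_accepts(delta, start, accepting, x):
    # delta: dict mapping (state, symbol) -> set of states, with "" for ε; missing entries mean ∅
    frontier = {(start, 0)}          # (current state, number of input symbols consumed)
    seen = set()
    while frontier:
        q, i = frontier.pop()
        if (q, i) in seen:
            continue
        seen.add((q, i))
        if i == len(x) and q in accepting:
            return True              # some sequence of choices consumes all of x and ends in F
        for r in delta.get((q, ""), set()):        # ε-transitions: change state, consume nothing
            frontier.add((r, i))
        if i < len(x):
            for r in delta.get((q, x[i]), set()):  # ordinary transitions: consume x[i]
                frontier.add((r, i + 1))
    return False                     # no sequence of choices leads to an accept state

# N1 from Example 1.2.3, assuming the transition table above:
delta_N1 = {("s", "0"): {"q0"}, ("s", ""): {"r0"},
            ("r0", "0"): {"r1"}, ("r1", "1"): {"r0"},
            ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
            ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"}}
for w in ["", "0", "1", "00", "01", "010", "011"]:
    print(repr(w), nfa_accepts(delta_N1, "s", {"r0", "r1", "q2"}, w))

If the table above is encoded correctly, this reproduces the accept/reject answers listed in Example 1.2.3.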
With NFAs, strings appear
• easier to accept, but
• harder to reject.16
16 The trick with NFAs is that it becomes (apparently) easier to accept a string, since multiple paths through the NFA could lead to an accept state, and only one must do so in order to accept. But NFAs aren’t magic; you can’t simply put accept states and ε-transitions everywhere and claim that there exist paths doing what you want. By the same token that makes acceptance easier, rejection becomes more difficult, because you must ensure that no path leads to an accept state if the string ought to be rejected. Therefore, the more transitions and accept states you throw in to make accepting easier, that much more difficult does it become to design the NFA to properly reject. The key difference is that the condition “there exists a path to an accept state” becomes, when we negate it to define rejection, “all paths lead to a reject state”. It is more difficult to verify a “for all” claim than a “there exists” claim.
LECTURE: end of day 5
1.2.3 More examples
Example 1.2.4. Design an NFA recognizing the language {x ∈ {a}∗ | |x| is a multiple of 2 or 3 or
5 }.
See Figure 1.13. This uses ε-transitions to guess which of the three DFAs to simulate: one that
recognizes multiples of 2, or 3, or 5.
Figure 1.13: Example of the NFA union construction. This NFA recognizes the language L(N ) =
L(D1 ) ∪ L(D2 ) ∪ L(D3 ) = {x ∈ {a}∗ | |x| is a multiple of 2 or 3 or 5 }, using ε-transitions to guess
which of the three DFAs to simulate.
Example 1.2.5. If x ∈ {0, 1}∗ , let nx ∈ N be the integer x represents in binary (with nε = 0).
Design an NFA to recognize the language
A = { xy ∈ {0, 1}∗ | (nx ≡ 0 mod 3 or nx ≡ 1 mod 3) and ny ≡ 2 mod 3 }.
For example, 1000101 ∈ A since 100₂ = 4 ≡ 1 mod 3 and 0101₂ = 5 ≡ 2 mod 3. However, 01 ∉ A since the possible values for x and y are (x = ε, y = 01), (x = 0, y = 1), or (x = 01, y = ε), and in all cases ny ≢ 2 mod 3.
See Figure 1.14. This simulates the DFA from Figure 1.7 twice, once on x and once on y: it
uses ε-transitions to guess where x ends and y begins (and it only will make that jump if the DFA
is currently accepting x).
Figure 1.14: Example of the NFA concatenation construction. An NFA N recognizing the language
{xy ∈ {0, 1}∗ | (nx ≡ 0 mod 3 or nx ≡ 1 mod 3) and ny ≡ 2 mod 3}. N essentially consists of
two copies, named D1 and D2 , of the DFA from Figure 1.7 (with appropriate accept states) with
ε-transitions to “guess” when to switch from D1 to D2 (but only when D1 is accepting). Then
L(N ) = L(D1 )L(D2 ).
Example 1.2.6. Let A = { x ∈ {a, b}∗ | |x| > 0 and x[1] ≠ x[|x|] }. Design an NFA to recognize
the language A∗ .
A is recognized by the DFA D in Figure 1.15. See N in Figure 1.15. N simulates D over and
over in a loop, using ε-transitions to guess when to start over (but only starting over if the DFA is
currently accepting x).
Figure 1.15: Example of the NFA Kleene star construction. An NFA N recognizing the language
A∗ , where A = { x ∈ {a, b}∗ | |x| > 0 and x[1] ≠ x[|x|] }. D recognizes A. N simulates D over
and over in a loop, using ε-transitions to guess when to start over (but only when D is accepting).
Then L(N ) = L(D)∗ .
Although all three of the preceding examples started with DFAs, the constructions work just as well starting from NFAs. See textbook Figures 1.46, 1.48, and 1.50.
1.2.4 Equivalence of DFAs and NFAs
We say two finite automata D and N are equivalent if L(D) = L(N ).
Theorem 1.2.7. Every DFA has an equivalent NFA.
Proof. A DFA D = (Q, Σ, δ, s, F ) is an NFA N = (Q, Σ, ∆, s, F ) with no ε-transitions and, for
every q ∈ Q and a ∈ Σ, ∆(q, a) = {δ(q, a)}.
We now show the converse of Theorem 1.2.7.
Theorem 1.2.8. Every NFA has an equivalent DFA.
Figure 1.16: Example of the subset construction. We transform an NFA N into a DFA D. Each
state in D represents a subset of states in N . The D states {1} and {1, 2} are unreachable from the
start state {1, 3}, so we can remove them without altering the behavior of D, but we leave them in
to show the full subset construction.
Proof. (Subset Construction) Let N = (QN , Σ, ∆, sN , FN ) be an NFA with no ε-transitions.17
Define the DFA D = (QD , Σ, δ, sD , FD ) as follows
• QD = P(QN ). Each state of D keeps track of a set of states in N , representing the set of all
states N could be in after reading some portion of the input.
• For all R ∈ QD (i.e., all R ⊆ QN ) and b ∈ Σ, define
δ(R, b) = ⋃q∈R ∆(q, b).
If N is in state q ∈ R after reading some portion of the input, then the states it could be in after reading the next symbol b are all the states in ∆(q, b); since N could be in any state q ∈ R before reading b, we must take the union over all q ∈ R.
• sD = {sN }. After reading no input, N can only be in state sN .
• FD = { A ⊆ QN | A ∩ FN ≠ ∅ }. Recall the asymmetric acceptance criterion; we want to
accept if there is a way to reach an accept state, i.e., if the set of states N could be in after
reading the whole input contains any accept states.
17 At the end of the proof we explain how to modify the construction to handle them.
Now we show how to handle the ε-transitions. For any R ⊆ QN , define
E(R) = { q ∈ QN | q is reachable from some state in R by following 0 or more ε-transitions } .
For example, in Figure 1.15, if we let R = {q1 , q2 }, then E(R) = {q1 , q2 , s′ , s}.
To account for the ε-transitions, D must be able to simulate
1. N following ε-transitions after each input-consuming transition, i.e., define δ(R, b) = E( ⋃q∈R ∆(q, b) ).
2. N following ε-transitions before the first non-ε-transition, i.e., define sD = E({sN }).
We have constructed a DFA D such that the state D is in after reading string x is the subset of
states that N could be in after reading x. So D accepts x if and only if at least one of the states
N could be in is accepting, i.e., if and only if N accepts x. Thus L(D) = L(N ).
Note that the subset construction uses the power set of QN , which is exponentially larger than
QN . It can be shown (see Kozen’s textbook) that there are languages for which this is necessary;
the smallest NFA recognizing the language has n states, while the smallest DFA recognizing the
language has 2n states. For example, for any n ∈ N, the language { x ∈ {0, 1}∗ | x[|x| − n] = 0 }
has this property.
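For reference, here is a compact sketch of the subset construction in Python (my own illustration, not from the notes). NFAs are given as dictionaries mapping (state, symbol) to sets of states, with "" for ε; DFA states are frozensets of NFA states. It only generates the subsets reachable from the start state, which is why, as noted for Figure 1.16, unreachable subsets like {1} can simply be dropped.

from itertools import chain

def e_closure(delta, R):
    # E(R): all states reachable from R by 0 or more ε-transitions
    stack, closure = list(R), set(R)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ""), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def subset_construction(delta, start, accepting, alphabet):
    # returns (dfa_delta, dfa_start, dfa_accepting); DFA states are frozensets of NFA states
    dfa_start = e_closure(delta, {start})
    dfa_delta, dfa_accepting = {}, set()
    todo, seen = [dfa_start], {dfa_start}
    while todo:                      # build only the states reachable from the start state
        R = todo.pop()
        if R & set(accepting):       # accept if R contains some accept state of N
            dfa_accepting.add(R)
        for b in alphabet:
            S = e_closure(delta, set(chain.from_iterable(
                    delta.get((q, b), set()) for q in R)))
            dfa_delta[(R, b)] = S
            if S not in seen:
                seen.add(S)
                todo.append(S)
    return dfa_delta, dfa_start, dfa_accepting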
Corollary 1.2.9. The class of languages recognized by some NFA is precisely the regular languages.
Proof. This follows immediately from Theorems 1.2.7 and 1.2.8.
LECTURE: end of day 6
Theorem 1.2.10. The class of regular languages is closed under ◦.
Proof. Let N1 = (Q1 , Σ, ∆1 , s1 , F1 ) and N2 = (Q2 , Σ, ∆2 , s2 , F2 ) be NFAs. It suffices to define an
NFA N recognizing L(N1 ) ◦ L(N2 ) = { xy ∈ Σ∗ | x ∈ L(N1 ) and y ∈ L(N2 ) } .
See Figure 1.14 for an example. Intuitively, on input w ∈ Σ∗ , N first simulates N1 for some
prefix x of w, then N2 on the remaining suffix y (so w = xy), nondeterministically guessing when
to switch, but only when N1 is accepting. N accepts if N2 does, i.e., if N1 accepts x and N2 accepts
y.
To fully define N = (Q, Σ, ∆, s, F ):
• N has all the states of N1 and N2 , so Q = Q1 ∪ Q2 .
• N starts by simulating N1 , so s = s1 .
• N accepts if N2 accepts after N has switched to simulating N2 , so F = F2 .
• ∆ simulates ∆1 initially, at some point when N1 is accepting, guesses where to switch to the start state of N2 , then simulates ∆2 . So for all q ∈ Q and b ∈ Σε ,

∆(q, b) = ∆1 (q, b)             if q ∈ Q1 \ F1 ;
          ∆1 (q, b)             if q ∈ F1 and b ≠ ε;
          ∆1 (q, b) ∪ {s2 }     if q ∈ F1 and b = ε;
          ∆2 (q, b)             if q ∈ Q2 .
Detailed proof of correctness: To see that L(N1 ) ◦ L(N2 ) ⊆ L(N ), let w ∈ Σ∗ . If there are x ∈ L(N1 ) and y ∈ L(N2 ) such that w = xy (i.e., w ∈ L(N1 ) ◦ L(N2 )), then there is a sequence of choices of N such that N accepts w (i.e., w ∈ L(N )): follow the choices N1 makes to accept x, ending in a state in F1 , then execute the ε-transition to state s2 defined above, then follow the choices N2 makes to accept y. This shows L(N1 ) ◦ L(N2 ) ⊆ L(N ).
To see the reverse containment L(N ) ⊆ L(N1 ) ◦ L(N2 ), suppose w ∈ L(N ). Then there is a sequence of choices such that N accepts w. By construction, all paths from s = s1 to some state in F = F2 pass through s2 , so N must reach s2 after reading some prefix x of w, and the remaining suffix y of w takes N from s2 to a state in F2 , i.e., y ∈ L(N2 ). By construction, all paths from s = s1 to s2 go through a state in F1 , and those states are connected to s2 only by an ε-transition, so x takes N from s1 to a state in F1 , i.e., x ∈ L(N1 ). Since w = xy, this shows that L(N ) ⊆ L(N1 ) ◦ L(N2 ).
Thus N recognizes L(N1 ) ◦ L(N2 ).
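In code, the construction is just a dictionary merge plus the new ε-transitions. The sketch below (mine, not from the notes) assumes the two NFAs use disjoint state names and the same (delta, start, accepting) representation as the earlier simulation sketch.

def concat_nfa(nfa1, nfa2):
    # each NFA is a triple (delta, start, accepting); delta maps (state, symbol or "") to a set of states
    delta1, s1, F1 = nfa1
    delta2, s2, F2 = nfa2
    delta = {**delta1, **delta2}            # Q = Q1 ∪ Q2 (state names assumed disjoint)
    for q in F1:                            # ε-transition from each accept state of N1 to s2
        delta[(q, "")] = set(delta.get((q, ""), set())) | {s2}
    return delta, s1, set(F2)               # start state s1, accept states F2

The resulting triple can be passed directly to the nfa_accepts sketch from Section 1.2.2 to test membership in L(N1 ) ◦ L(N2 ).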
Theorem 1.2.11. The class of regular languages is closed under ∗ .
Proof. Let N1 = (Q1 , Σ, ∆1 , s1 , F1 ) be an NFA. It suffices to define an NFA N recognizing L(N1 )∗ = ⋃k≥0 L(N1 )^k = { x1 x2 . . . xk ∈ Σ∗ | k ∈ N and, for all 1 ≤ i ≤ k, xi ∈ L(N1 ) }. See Figure 1.15 for an example.
N should accept w ∈ Σ∗ if w can be broken into several pieces x1 , . . . , xk ∈ Σ∗ such that N1 accepts each piece. N will simulate N1 repeatedly, nondeterministically guessing when to ε-transition from an accept state to the start state (signifying a switch from xi to xi+1 ). Since N must also accept ε, we add one new accepting state to N , which is N ’s start state, and an ε-transition to the original start state.18
To fully define N = (Q, Σ, ∆, s, F ):
• N has the states of N1 plus the new start state s, so Q = Q1 ∪ {s}.
• N accepts whenever N1 does, and to ensure we accept ε, the new start state is also accepting:
F = F1 ∪ {s}.
18 It is tempting simply to make N1 ’s original start state s1 accepting. However, if s1 was rejecting before, and if
there are existing transitions into s1 , then this will add new strings to the language accepted by N1 , which we don’t
want. Adding a new start state just to handle ε solves this problem.
• For all q ∈ Q and b ∈ Σ ∪ {ε}:

∆(q, b) = ∆1 (q, b)             if q ∈ Q1 \ F1 ;
          ∆1 (q, b)             if q ∈ F1 and b ≠ ε;
          ∆1 (q, b) ∪ {s1 }     if q ∈ F1 and b = ε;
          {s1 }                 if q = s and b = ε;
          ∅                     if q = s and b ≠ ε.
Detailed proof of correctness: To match Figure 1.15, call the machine being starred D = (Q, Σ, δ, sD , F ) (there it is a DFA; the same argument works for an NFA), and write N = (Q′ , Σ, ∆′ , s′ , F ′ ) for the NFA constructed above, so s′ is the new start state and F ′ = F ∪ {s′ }. Clearly N accepts ε ∈ L(D)∗ , so we consider only nonempty strings.
To see that L(D)∗ ⊆ L(N ), suppose w ∈ L(D)∗ \ {ε}. Then w = x1 x2 . . . xk , where each xj ∈ L(D) \ {ε}. Thus for each j ∈ {1, . . . , k}, there is a sequence of states sD = qj,0 , qj,1 , . . . , qj,|xj | ∈ Q, where qj,|xj | ∈ F , and a sequence yj,0 , . . . , yj,|xj |−1 ∈ Σ, such that xj = yj,0 . . . yj,|xj |−1 , and for each i ∈ {0, . . . , |xj | − 1}, qj,i+1 = δ(qj,i , yj,i ). The following sequence of states testifies that N accepts w:
( s′ , sD , q1,1 , . . . , q1,|x1 | , sD , q2,1 , . . . , q2,|x2 | , . . . , sD , qk,1 , . . . , qk,|xk | ).
Each transition between adjacent states in the sequence is either one of the transitions δ(qj,i , yj,i ) listed above, or is the ε-transition from s′ to sD or from qj,|xj | to sD . Since qk,|xk | ∈ F ⊆ F ′ , N accepts w, i.e., L(D)∗ ⊆ L(N ).
To see that L(N ) ⊆ L(D)∗ , let w ∈ L(N ) \ {ε}. Then there are sequences q0 , q1 , . . . , qm ∈ Q′ with q0 = s′ , and y0 , . . . , ym−1 ∈ Σε , such that qm ∈ F ′ , w = y0 y1 . . . ym−1 , and, for all i ∈ {0, . . . , m − 1}, qi+1 ∈ ∆′ (qi , yi ). Since w ≠ ε, qm ≠ s′ , so qm ∈ F . Since the start state s′ = q0 has no outgoing non-ε-transition, the first transition from q0 to q1 is the ε-transition from s′ to sD . Suppose there are k − 1 ε-transitions from a state in F to sD among q1 , . . . , qm . Then we can write q1 , . . . , qm as
( sD , q1,1 , . . . , q1,|x1 | , sD , q2,1 , . . . , q2,|x2 | , . . . , sD , qk,1 , . . . , qk,|xk | ),
where each qj,|xj | ∈ F and has an ε-transition to sD .19 For each j ∈ {1, . . . , k} and i ∈ {0, . . . , |xj | − 1}, let yj,i be the corresponding symbol in Σε causing the transition from qj,i to qj,i+1 , and let xj = yj,0 yj,1 . . . yj,|xj |−1 . Then the sequences sD , qj,1 , qj,2 , . . . , qj,|xj | and yj,0 yj,1 . . . yj,|xj |−1 testify that D accepts xj , thus xj ∈ L(D). Thus w = x1 x2 . . . xk , where each xj ∈ L(D), so w ∈ L(D)∗ , showing L(N ) ⊆ L(D)∗ .
Thus N recognizes L(D)∗ .
19 Perhaps there are no such ε-transitions, in which case k = 1.
1.3 Regular Expressions (again)
1.3.1 Equivalence with Finite Automata
Theorem 1.3.1. A language is regular if and only if some regular expression defines it.
We prove each direction separately via two lemmas.
Lemma 1.3.2. If a language is defined by a regular expression, then it is regular.
Proof. Let R be a regular expression over an alphabet Σ. It suffices to construct an NFA N =
(Q, Σ, ∆, s, F ) such that L(R) = L(N ).
The definition of regex gives us six cases:
1. R = b, where b ∈ Σ, so L(R) = {b}, recognized by a two-state NFA: a single b-transition from the (non-accepting) start state to an accepting state.
2. R = ε, so L(R) = {ε}, recognized by an NFA with a single accepting start state and no transitions.
3. R = ∅, so L(R) = ∅, recognized by an NFA with a single non-accepting start state and no transitions.
4. R = R1 ∪ R2 , so L(R) = L(R1 ) ∪ L(R2 ).
5. R = R1 ◦ R2 , so L(R) = L(R1 ) ◦ L(R2 ).
6. R = R1∗ , so L(R) = L(R1 )∗ .
For the last three cases, assume inductively that L(R1 ) and L(R2 ) are regular. Since the regular
languages are closed under the operations of ∪, ◦, and ∗ , L(R) is regular.
Thus, to convert a regex into an NFA, we replace the base cases in the regex with the simple 1- or 2-state NFAs above, and then we connect them using the NFA union construction, concatenation
construction, and Kleene star construction, shown by example in Figures 1.13, 1.14, and 1.15.
An example of this conversion is shown in Figure 1.17.
Lemma 1.3.3. If a language is regular, then it is defined by a regular expression.
Proof. Let D = (Q, Σ, δ, s, F ) be a DFA. It suffices to construct a regex R such that L(R) = L(D).
Given P ⊆ Q and u, v ∈ Q, we will construct a regex R^P_{uv} to define the language
L(R^P_{uv}) = { x ∈ Σ∗ | v = δ̂(u, x) and (∀i ∈ {1, 2, . . . , |x| − 1}) δ̂(u, x[0 . . i − 1]) ∈ P }.
draw picture
That is, x ∈ L(R^P_{uv}) if and only if input x takes D from state u to state v “entirely through P , except possibly at the start and end”. The construction of R^P_{uv} is inductive on P .20
20 That is, we assume inductively that the regexes R^{P ′}_{u′v′} can be constructed for proper subsets P ′ ⊊ P and all pairs of states u′ , v ′ ∈ Q, and use these to define R^P_{uv}.
Figure 1.17: Example of converting the regex (ab ∪ a)∗ to an NFA. The recursion is shown “bottom-up”, starting with the base cases for a and b and building larger NFAs using the union construction,
concatenation construction, and Kleene star construction, as in Figures 1.13, 1.14, and 1.15.
Base case: Let P = ∅, and let b1 , . . . , bk ∈ Σ be the symbols such that v = δ(u, bi ). If u ≠ v, define
R^∅_{uv} = b1 ∪ . . . ∪ bk if k ≥ 1, and R^∅_{uv} = ∅ if k = 0.
If u = v, define
R^∅_{uu} = b1 ∪ . . . ∪ bk ∪ ε if k ≥ 1, and R^∅_{uu} = ε if k = 0.
The ε represents “traversing” from u to u without reading a symbol, by simply sitting still.
Inductive case: Let P ⊆ Q be nonempty. Choose q ∈ P arbitrarily, and define
R^P_{uv} = R^{P\{q}}_{uv} ∪ R^{P\{q}}_{uq} ( R^{P\{q}}_{qq} )∗ R^{P\{q}}_{qv} .
draw picture
That is, either x traverses from u to v through P without ever visiting q, or it visits q one or
more times, and between each of these visits, traverses within P without visiting q. 21
21 Suppose x follows a path from u to v through P . It either visits state q or not. If not, then the first sub-regex R^{P\{q}}_{uv} will match x. If so, then it will get to q, take 0 or more paths through P \ {q}, revisiting q after each one, then proceed within P \ {q} from q to v. The second sub-regex expresses this. Therefore R^P_{uv} matches x. Since these two sub-regexes express the only two ways for x to follow a path from u to v through P , the converse holds: R^P_{uv} matching x implies x follows a path from u to v through P .
To finish the proof, observe that the regex
R = R^Q_{s f1} ∪ R^Q_{s f2} ∪ . . . ∪ R^Q_{s fk} ,
where F = {f1 , f2 , . . . , fk }, defines L(D), since the set of strings accepted by D is precisely those strings that follow a path from the start state s to some accept state fi , staying within the entire state set Q.
Example: convert the DFA ({1, 2}, {a, b}, δ, 1, {2}) to a regular expression using the given procedure, where δ is given by:

δ | a | b
1 | 1 | 2
2 | 2 | 2

In the first step of the recursion, we choose q = 2, i.e., we define R^{{1,2}}_{12} inductively in terms of R^{{1,2}\{2}}_{uv} = R^{{1}}_{uv} :

R^{{1,2}}_{12} = R^{{1}}_{12} ∪ R^{{1}}_{12} ( R^{{1}}_{22} )∗ R^{{1}}_{22}
R^{{1}}_{12} = R^∅_{12} ∪ R^∅_{11} ( R^∅_{11} )∗ R^∅_{12}
R^{{1}}_{22} = R^∅_{22} ∪ R^∅_{21} ( R^∅_{11} )∗ R^∅_{12}

The base cases are

R^∅_{12} = b
R^∅_{22} = a ∪ b ∪ ε
R^∅_{21} = ∅
R^∅_{11} = a ∪ ε

We use the identity that for any regex X, (X ∪ ε)∗ = X∗ (since concatenating ε any number of times does not alter a string). We also use this observation: for any regexes X and Y , X ∪ XY Y∗ = XY∗ , since X means X followed by 0 Y ’s, and XY Y∗ is X followed by 1 or more Y ’s, so their union is the same as X followed by 0 or more Y ’s, i.e., XY∗ . Similarly, X ∪ Y Y∗X = Y∗X.

Substituting the definitions of R^∅_{11}, R^∅_{12}, R^∅_{21}, R^∅_{22}, we have

R^{{1}}_{12} = b ∪ (a ∪ ε)(a ∪ ε)∗ b
            = (a ∪ ε)∗ b          since X ∪ Y Y∗X = Y∗X
            = a∗ b                since (X ∪ ε)∗ = X∗
R^{{1}}_{22} = (a ∪ b ∪ ε) ∪ ∅(a ∪ ε)∗ b
            = a ∪ b ∪ ε           since ∅X = ∅

Substituting the definitions of R^{{1}}_{12} and R^{{1}}_{22}, we have

R^{{1,2}}_{12} = R^{{1}}_{12} ∪ R^{{1}}_{12} ( R^{{1}}_{22} )∗ R^{{1}}_{22}
              = a∗ b ∪ (a∗ b)(a ∪ b ∪ ε)∗ (a ∪ b ∪ ε)
              = a∗ b(a ∪ b ∪ ε)∗      since X ∪ XY Y∗ = XY∗
              = a∗ b(a ∪ b)∗          since (X ∪ ε)∗ = X∗
In other words, L(R) is the set of strings with at least one b.
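A quick empirical check of that conclusion, using Python's re module (an illustration only; | plays the role of ∪):

import re
from itertools import product

R = re.compile(r"^a*b(a|b)*$")
for n in range(5):
    for t in product("ab", repeat=n):
        w = "".join(t)
        # matches exactly when w contains at least one b
        assert bool(R.match(w)) == ("b" in w)
print("a*b(a|b)* matches exactly the strings with at least one b (checked up to length 4)")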
LECTURE: end of day 7
1.4 Nonregular Languages
Reading assignment: Section 1.4 in Sipser.
Recall that given strings a, b, #(a, b) denotes the number of times that a appears as a substring
of b. Consider the languages
C = { x ∈ {0, 1}∗ | #(0, x) = #(1, x) } , and
D = { x ∈ {0, 1}∗ | #(01, x) = #(10, x) } .
C is not regular,22 but D is.
1.4.1 The Pumping Lemma for Regular Languages
The pumping lemma is based on the pigeonhole principle. The proof informally states, “If an input
to a DFA is long enough, then some state must be visited twice, and the substring between the two
visitations can be repeated without changing the DFA’s final answer.”23
See the DFA M in Figure 1.18. For q ∈ Q and x ∈ Σ∗ , let δ̂(q, x) be the state M is in after reading x (starting from state q).
Note that if we reach, for example, q1 , then read the bits 1001, we will return to q1 . Therefore, for any string x such that δ̂(s, x) = q1 , such as x = 0, it follows that δ̂(s, x1001) = q1 , and δ̂(s, x10011001) = q1 , etc. I.e., for all i ∈ N, defining y = 1001, δ̂(s, xyⁱ) = q1 .
Also notice that if we are in state q1 , and we read the string z = 110, we end up in accepting state q5 . Therefore, δ̂(s, xz) = q5 , thus M accepts xz. But combined with the previous reasoning, we also have that M accepts xyⁱz for all i ∈ N. In this example, x = 0, y = 1001, z = 110.
22 Intuitively, it appears to require unlimited memory to count all the 0’s. But we must prove this formally; the example D shows that a language can appear intuitively to require unbounded memory while actually being regular.
23 That is, we “pump” more copies of the substring into the full string. The goal is to pump until we change the membership of the string in the language, at which point we know the DFA cannot decide the language, since it gives the same answer on two strings, one of which is in the language and the other of which is not.
Figure 1.18: A DFA M = (Q, Σ, δ, s, F ) to help illustrate the idea of the pumping lemma. Since reading
x reaches q1 , then reading y returns to q1 , then reading z reaches accepting state q5 , the strings xz,
xyz, xyyz, xyyyz, . . . are all accepted too. Their paths through M differ only in how many times
they follow the cycle (q1 , q2 , q3 , q4 , q1 ).
Note that the cycle in which a string gets trapped may depend on the string. Any long enough
string starting with 1 will repeatedly visit the state q6 , but will never visit q1 .
More generally, for any DFA M , any string w with |w| ≥ |Q| has to cause M to traverse a cycle.
The substring y read along this cycle can either be removed, or more copies added, and M will end
in the same state. Calling x the part of w before y and z the part after, we have that whatever is
M ’s answer on w = xyz, it is the same on xyⁱz for any i ∈ N, since all of those strings take M to
the same state.
Pumping Lemma. If A is a regular language, there is a number p (the pumping length) so
that if w ∈ A and |w| ≥ p, then w may be divided into three substrings w = xyz, satisfying the
conditions:
1. for all i ∈ N, xyⁱz ∈ A,
2. |y| > 0, and
3. |xy| ≤ p.
Proof. Let A be a regular language, recognized by a DFA M = (Q, Σ, δ, s, F ). Let p = |Q|.
Let w ∈ A be a string with |w| = n ≥ p. Let r0 , r1 , . . . , rn be the sequence of n + 1 states M
enters while reading w, i.e., ri+1 = δ(ri , w[i + 1]) for 0 ≤ i < n. Note r0 = s and rn ∈ F . Since
n + 1 ≥ p + 1 > |Q|, by the pigeonhole principle, two of the first p + 1 states r0 , r1 , . . . , rp must be equal. We call them rj and rl , with j < l ≤ p. Let x = w[1 . . j], y = w[j + 1 . . l], and z = w[l + 1 . . n].
Since δ̂(rj , y) = rl = rj , for all i ∈ N, δ̂(rj , yⁱ) = rj . This example shows why, with rj = q3 :
Reading w = w[1] w[2] . . . w[9], the machine visits the states r0 , r1 , . . . , r9 = q1 , q7 , q6 , q2 , q3 , q5 , q4 , q3 , q6 , q2 . The state q3 repeats (r4 = q3 = r7 ), so x = w[1 . . 4], y = w[5 . . 7], and z = w[8 . . 9]; reading y takes the machine from q3 back to q3 .
Therefore δ̂(r0 , xyⁱz) = rn ∈ F , so xyⁱz ∈ L(M ), satisfying (1). Since j ≠ l, |y| > 0, satisfying (2). Finally, l ≤ p, so |xy| ≤ p, satisfying (3).24
Theorem 1.4.1. The language B = { 0n 1n | n ∈ N } is not regular.
Proof. Assume for the sake of contradiction that B is regular, with pumping length p. Let w =
0⌈p/2⌉1⌈p/2⌉. Since w ∈ B and |w| ≥ p, by the Pumping Lemma, we can divide w into three substrings w = xyz, where |y| > 0 and for all i ∈ N, xyⁱz ∈ B.
Since |y| > 0, there are three cases for y:
1. y is all 0’s. Then xyyz has more 0’s than 1’s, so xyyz ∉ B, violating condition (1) of the Pumping Lemma, a contradiction.
2. y is all 1’s. Then xyyz has more 1’s than 0’s, also a contradiction.
3. y has both 1’s and 0’s. Then xyyz has some 1’s before 0’s, so xyyz ∉ B, a contradiction.
Theorem 1.4.2. The language C = {w | #(0, w) = #(1, w)} is not regular.
Bad proof. Assume for the sake of contradiction that C is regular, with pumping length p. Let
w = 0⌈p/2⌉1⌈p/2⌉. Since w ∈ C and |w| ≥ p, by the Pumping Lemma, we can divide w into three substrings w = xyz, where |y| > 0, |xy| ≤ p, and for all i ∈ N, xyⁱz ∈ C.
Let x = ε, y = 0⌈p/2⌉, and z = 1⌈p/2⌉. Then xyyz has 2⌈p/2⌉ 0’s but only ⌈p/2⌉ 1’s, so xyyz ∉ C, a contradiction.
What’s wrong with this proof? We wanted y to be all 0’s so that when we pump a second copy into w = xyz to create xyyz, we end up with more 0’s than 1’s. But just because we want y to be all 0’s doesn’t mean we can simply declare it is all 0’s. What if x = z = ε and y = 0⌈p/2⌉1⌈p/2⌉? This is totally consistent with the Pumping Lemma. But if this is the case, then xyⁱz = (0⌈p/2⌉1⌈p/2⌉)ⁱ, which has an equal number of 0’s and 1’s. So xyⁱz ∈ C for all i, and we don’t get a contradiction.
Think of the Pumping Lemma as a contract: we give it w of length at least p, and it gives
back x, y, and z. It is guaranteed to fulfill the exact terms of the contract: |y| > 0, |xy| ≤ p, and
xy i z ∈ C for all i ∈ N. Nothing more, nothing less.
If we want y to have some extra property, such as being all 0’s, we have to prove that y has
that property, that this property follows from the fact that |y| > 0 and |xy| ≤ p. We might have to
be careful how we choose w so that any choice of x, y, and z satisfying the Pumping Lemma will
give us the property we want.
In this case, if we choose w to be a bit longer, we can use condition (3) to conclude that y is
indeed all 0’s.
24 The strategy for employing the Pumping Lemma to show a language L is not regular is: fix a DFA M , find a
long string in L, long enough that it can be pumped (relative to M ), then prove that pumping it “moves it out of
the language”. Thus M does not recognize L, since M either accepts both strings or rejects both, by the Pumping
Lemma.
Proof of Theorem 1.4.2. Assume for the sake of contradiction that C is regular, with pumping
length p. Let w = 0p 1p . Since w ∈ C and |w| ≥ p, by the Pumping Lemma, we can divide w into
three substrings w = xyz, where |y| > 0, |xy| ≤ p, and for all i ∈ N, xyⁱz ∈ C.
Since |xy| ≤ p and w starts with p 0’s, y is all 0’s. Then xyyz has more 0’s than 1’s. Thus xyyz ∉ C, contradicting condition (1) of the Pumping Lemma.
Here’s an alternate proof that does not directly use the Pumping Lemma. It uses closure
properties and appeals to the fact that we already proved that B = {0n 1n | n ∈ N} is not regular.
Alternate proof of Theorem 1.4.2. Assume for the sake of contradiction that C is regular. Since
the regular languages are closed under ∩, C ∩ 0∗ 1∗ = {0n 1n | n ∈ N} is regular, contradicting
Theorem 1.4.1.
LECTURE: end of day 8
Theorem 1.4.3. The language F = { ww | w ∈ {0, 1}∗ } is not regular.
Proof. Assume for the sake of contradiction that F is regular, with pumping length p. Let w =
0p 10p 1. Since w ∈ F and |w| ≥ p, by the Pumping Lemma, we can divide w into three substrings
w = xyz, where |y| > 0, |xy| ≤ p, and for all i ∈ N, xyⁱz ∈ F .
Since |xy| ≤ p, x and y are all 0’s. Let k = |y| > 0. Then xyyz = 0^{p+k}10^p 1 ∉ F since p + k > p,
contradicting condition (1) of the Pumping Lemma.
Here is a nonregular unary language.
Theorem 1.4.4. The language D = { 1^(n²) | n ∈ N } is not regular.
Proof. Assume for the sake of contradiction that D is regular, with pumping length p. Let w = 1^(p²). Since w ∈ D and |w| ≥ p, by the Pumping Lemma, we can divide w into three substrings w = xyz, where |y| > 0, |xy| ≤ p, and for all i ∈ N, xyⁱz ∈ D.
Let u = 1^((p+1)²) be the next biggest string in D after w; then no string with length strictly between |w| and |u| is in D. Then
|u| − |w| = (p + 1)² − p² = p² + 2p + 1 − p² = 2p + 1 > p.
Since |y| ≤ p, |xyyz| − |w| ≤ p, so |xyyz| < |u|. Since |y| > 0, |w| < |xyyz| < |u|. Thus xyyz ∉ D, contradicting condition (1) of the Pumping Lemma.
The next example shows that “pumping down” (replacing xyz with xz) can be useful.
Theorem 1.4.5. The language E = {0i 1j | i > j} is not regular.
Proof. Assume for the sake of contradiction that E is regular, with pumping length p. Let w =
0p+1 1p ∈ E. By the Pumping Lemma, we can divide w into three substrings w = xyz, where
|y| > 0, |xy| ≤ p, and for all i ∈ N, xyⁱz ∈ E.
Since |xy| ≤ p, y is all 0’s. Let k = |y| > 0. Then xz = 0^{p+1−k}1^p ∉ E since p + 1 − k ≤ p,
contradicting condition (1) of the Pumping Lemma.
Chapter 2
Context-Free Languages
Reading assignment: Section 2.1 of Sipser.
2.1 Context-Free Grammars
A → 0A1
A → B
B → :
A grammar consists of substitution rules (productions), one on each line. The single symbol on the
left is a variable, and the string on the right consists of variables and other symbols called terminals.
One variable is designated as the start variable, on the left of the topmost production. The fact
that the left side of each production has a single variable means the grammar is context-free. We
abbreviate two rules with the same left-hand variable as follows: A → 0A1 | B.
A and B are variables, and 0, 1, and : are terminals.
The idea is that we start with a single copy of the start variable. We pick a rule with the start
variable and replace the variable with the string on the right side of the rule. This gives a string
mixing variables and terminals. We pick some variable in it, pick a rule with that variable on the
left side, and again replace the variable with the right side of the rule. Do this until the string is
all terminal symbols.
Example 2.1.1. Write a grammar generating the language {0n 1n | n ∈ N}.
S → 0S1 | ε
Example 2.1.2. Write a grammar generating the language of properly nested parentheses.
S → (S) | SS | ε
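To get a feel for how a grammar generates strings, here is a tiny sketch (in Python; mine, not from the notes) that samples random derivations from this grammar, forcing S → ε once a step budget runs out so that every derivation terminates:

import random

def derive(max_steps=50):
    s = ["S"]                       # current sentential form, as a list of symbols
    steps = 0
    while "S" in s and steps < max_steps:
        i = s.index("S")            # pick the leftmost variable
        rule = random.choice(["(S)", "SS", ""]) if steps < max_steps - 1 else ""
        s[i:i+1] = list(rule)       # replace this S by the right-hand side of the rule
        steps += 1
    return "".join(c for c in s if c != "S")   # apply S -> ε to any leftover variables

random.seed(0)
print([derive() for _ in range(5)])  # five sampled properly nested parentheses strings

Every string it produces is a properly nested parentheses string, since each step applies one of the three rules of the grammar.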
Definition 2.1.3. A context-free grammar (CFG) is a 4-tuple (V, Σ, R, S), where
• V is a finite alphabet of variables,
Figure 2.1: A parse tree for a derivation of (()(())). Each internal node represents a variable to
which a production rule was applied, the children represent the variable(s) and terminal(s) on the
right side of the rule.
• Σ is a finite alphabet, disjoint from V , called the terminals,
• R ⊆ V × (V ∪ Σ)∗ is a finite set of rules, and
• S ∈ V is the start symbol.
By convention, if only the rules are listed, the variable on the left-hand side of the first rule
is the start symbol. Variables are often uppercase letters, but not necessarily: the variables are
whatever appears on the left-hand side of rules. It is common in CFGs for defining programming
languages for the variables to have long names, similar to programming language variable names,
such as AssignmentOperator or while_statement. Terminals are symbols other than uppercase
letters, such as lowercase letters, numbers, and other symbols such as parentheses.
A sequence of productions applied to generate a string of all terminals is a derivation. For
example, a derivation of 000:111 is
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111 ⇒ 000B111 ⇒ 000:111.
We also say that the grammar produces the string 000:111.
For the balanced parentheses grammar S → (S) | SS | ε, a derivation of (()(())) is
S ⇒ (S) ⇒ (SS) ⇒ ((S)S) ⇒ (()S) ⇒ (()(S)) ⇒ (()((S))) ⇒ (()(()))
We also may represent this derivation with a parse tree, shown in Figure 2.1.
The set of all-terminal strings generated by a CFG G is the language of the grammar L(G).
We also say G generates L(G).
Given the example CFG G above, its language is L(G) = { 0n :1n | n ∈ N }. A language
generated by a CFG is called context-free.
Example 2.1.4. This is a small portion of the CFG for the Python language:
compound_stmt → if_stmt | while_stmt | for_stmt | funcdef | classdef
if_stmt → ’if’ test ’:’ suite (’elif’ test ’:’ suite)* [’else’ ’:’ suite]
while_stmt → ’while’ test ’:’ suite [’else’ ’:’ suite]
suite → simple_stmt | NEWLINE INDENT stmt+ DEDENT
P → 0T | 1R
T → 0R | 1P
R → 0P | 1R
P → ε
T → ε
Figure 2.2: Left: A DFA D. Right: A right-regular grammar generating L(D).
Theorem 2.1.5. Every regular language is context-free.
Proof. Let D = (Q, Σ, δ, s, F ) be a DFA. We construct a CFG G = (V, Σ, R, S) generating L(D).
See Figure 2.2 for an example.
V = Q, Σ is the same for each, and S = s. There is one rule for each transition in δ: if
δ(q, b) = r, then G has a rule q → br. There is also one rule for each accept state: if q ∈ F , then
G has a rule q → ε.
Throughout any derivation, there is always exactly one variable q. To eliminate this variable
and result in an all-terminal string, q must be accepting, so the string of terminals is a string
accepted by D. Conversely, any string x accepted by D, which visits states s = q0 , q1 , . . . , qn , can
be produced by applying these rules in order
q0 → x[1]q1
q1 → x[2]q2
...
qn−1 → x[n]qn
qn → ε,
resulting in the derivation q0 ⇒ x[1]q1 ⇒ x[1]x[2]q2 ⇒ . . . ⇒ x[1..n − 1]qn−1 ⇒ xqn ⇒ x. Thus
x ∈ L(D) ⇐⇒ x ∈ L(G).
A grammar is called right-regular if all of its rules are of the form R → ε or R → bT for some
terminal b and variable T . In fact, every right-regular grammar can be interpreted as implementing
an NFA with no ε-transitions, so each right-regular grammar has an equivalent NFA. Thus, right-regular grammars are yet another way to characterize the regular languages, along with DFAs,
NFAs, and regex’s.
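The DFA-to-grammar construction in the proof above is mechanical enough to write in a few lines. The sketch below (mine, not from the notes) applies it to the two-state DFA ({1, 2}, {a, b}, δ, 1, {2}) from the regex-conversion example in Section 1.3.1, whose language is the set of strings containing at least one b:

def dfa_to_right_regular(delta, start, accepting):
    # delta: dict mapping (state, symbol) -> state; returns a list of rules (lhs, rhs)
    rules = [(q, b + r) for (q, b), r in delta.items()]   # one rule q -> b r per transition
    rules += [(q, "") for q in accepting]                 # one rule q -> ε per accept state
    return rules

delta = {("1", "a"): "1", ("1", "b"): "2", ("2", "a"): "2", ("2", "b"): "2"}
for lhs, rhs in dfa_to_right_regular(delta, "1", {"2"}):
    print(f"{lhs} -> {rhs or 'ε'}")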
2.2 Pushdown Automata
[Figure 2.3 sketch: the NPDA first pushes ⊥ with an ε-transition from q0 to q1 ; at q1 it pushes an a for each 0 read; a 1-transition that pops an a moves it to q2 , where it pops an a for each further 1; finally an ε-transition that pops ⊥ moves it to the accept state q3 .]
Figure 2.3: Top: Intuitive schematic of a (nondeterministic) pushdown automaton (PDA). Like
a DFA/NFA, it has a finite-state control, and it has access to input that it can read, but not write
to. Unlike a DFA/NFA, it also has access to an unlimited source of memory: a stack, where it can
push arbitrary symbols on top, read the top of the stack, and remove (pop) symbols from the top.
Bottom: NPDA recognizing {0n 1n | n ∈ N}.
Finite automata cannot recognize the language L = { 0n 1n | n ∈ N } because their memory is
limited and cannot count 0’s. A (nondeterministic) pushdown automaton (NPDA) is an NFA augmented with an infinite stack memory, which enables it to recognize some nonregular languages
such as L. See Figure 2.3.
Partially because of limited time in the quarter, and partially because they can be tedious to
deal with formally, we are not going to spend any more time talking about pushdown automata.
Facts to know about CFGs and NPDAs:
1. A language is context-free (generated by a CFG) if and only if it is recognized by some NPDA.
2. Deterministic PDAs (DPDAs) are strictly weaker than NPDAs, so some context-free languages are not recognized by any DPDA. An example is {x ∈ {0, 1}∗ | x = xR }.
3. Since an NPDA that does not use its stack is simply an NFA, every regular language is context-free. The converse is not true; for instance, {0n 1n | n ∈ N} is context-free but not regular.
LECTURE: end of day 9
Chapter 3
The Church-Turing Thesis
3.1 Turing Machines
Reading assignment: Section 3.1 in Sipser.
A Turing machine (TM) is a finite automaton with an unbounded read/write tape memory.
Figure 3.1: Intuitive schematic of a Turing machine (TM).
Differences between finite automata and TMs:
• The TM can write on its tape and read from it.
• The read-write tape head can move left or right.
• The tape is unbounded.
• There are special accept and reject states that cause the machine to immediately halt; conversely, the machine will not halt until it reaches one of these states, which may never happen.
Example 3.1.1. Design a Turing machine to test membership in the language
A = { w:w | w ∈ {0, 1}∗ } .
1. Zig-zag to either side of the :, checking if the leftmost symbol on each side is equal. If not,
or if there is no :, reject. Cross off symbols as they are checked.
2. When all symbols to the left of the : have been crossed off, check for remaining symbols to
the right of the :. If any remain, reject; otherwise, accept.
Show TMSimulator with w : w example TM.
3.1.1 Formal Definition of a Turing machine
Definition 3.1.2. A Turing machine (TM ) is a 7-tuple (Q, Σ, Γ, δ, s, qa , qr ), where
• Q is a finite set of states,
• Σ is the input alphabet, assumed not to contain the blank symbol ␣,
• Γ is the tape alphabet, where ␣ ∈ Γ and Σ ⊆ Γ,
• s ∈ Q is the start state,
• qa ∈ Q is the accept state,
• qr ∈ Q is the reject state, where qa ≠ qr , and
• δ : (Q \ {qa , qr }) × Γ → Q × Γ × {L, R} is the transition function.
Figure 3.2: TM that recognizes {w:w | w ∈ {0, 1}∗ }.
Example 3.1.3. We formally describe the TM M1 = (Q, Σ, Γ, δ, q1 , qa , qr ) described earlier, which
decides the language B = {w:w | w ∈ {0, 1}∗ }:
• Q = {q1 , q2 , q3 , q4 , q5 , q6 , q7 , q8 , qa , qr },
• Σ = {0, 1, :} and Γ = {0, 1, :, x, ␣},
• δ is shown in Figure 3.2.
3.1.2 Formal Definition of Computation by a Turing Machine
The TM starts with its input written at the leftmost positions on the tape, with ␣ written everywhere else.
A configuration of a TM M = (Q, Σ, Γ, δ, s, qa , qr ) is a triple (q, p, w), where
• q ∈ Q is the current state,
• p ∈ N is the tape head position,
• w ∈ Γ∗ is the tape content, the string consisting of the symbols starting at the leftmost
position of the tape, until the rightmost non-blank symbol, or the largest position the tape
head has scanned, whichever is larger.
We often represent the triple (q, p, w0 w1 . . . wn−1 ) as a string
w0 w1 . . . wp−1 q wp . . . wn−1 ,
where the position of the string representing the state also communicates the position of the tape head: the state is placed immediately to the left of the tape symbol currently being scanned. For instance,
starting a TM in state q1 with input 0100 gives the initial configuration q1 0100. If the first action
of the TM is to write the bit 1, change to state q3 , and move right, then the configuration becomes
1 q3 100.
Given two configurations C and C′ , we say C yields C′ , and we write C → C′ , if C′ is the configuration that the TM will enter immediately after C. Formally, given C = (q, p, w) and C′ = (q′ , p′ , w′ ), C → C′ if and only if δ(q, w[p]) = (q′ , w′ [p], m), where
• w[i] = w′ [i] for all i ∈ {0, . . . , |w| − 1} \ {p},
• if m = L, then p′ = max{0, p − 1} and |w′ | = |w|,
• if m = R, then p′ = p + 1, and |w′ | = |w| if p′ < |w|; otherwise |w′ | = |w| + 1 and w′ [p′ ] = ␣.
A configuration (q, p, w) is accepting if q = qa , rejecting if q = qr , and halting if it is accepting
or rejecting. M accepts input x ∈ Σ∗ if there is a finite sequence of configurations C1 , C2 , . . . , Ck
such that
1. C1 = (s, 0, x) (initial/start configuration),
2. for all i ∈ {1, . . . , k − 1}, Ci → Ci+1 , and
3. Ck is accepting.
The language recognized (accepted) by M is L(M ) = { x ∈ Σ∗ | M accepts x } .
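These definitions translate directly into a step-by-step simulator. The sketch below (in Python; mine, not from the notes) uses "_" as the blank symbol and a dictionary for δ, and it caps the number of steps because, as noted below, a TM may loop forever. The toy machine at the end, which accepts exactly the strings starting with 0, is a hypothetical example.

BLANK = "_"   # stands for the blank symbol ␣

def run_tm(delta, start, accept, reject, x, max_steps=10_000):
    # delta: dict mapping (state, symbol) -> (new state, written symbol, 'L' or 'R')
    q, p, w = start, 0, list(x) if x else [BLANK]   # start configuration (s, 0, x)
    for _ in range(max_steps):
        if q == accept:
            return True
        if q == reject:
            return False
        q, b, move = delta[(q, w[p])]
        w[p] = b                    # write the new symbol at the old head position
        if move == "L":
            p = max(0, p - 1)       # p' = max{0, p − 1}
        else:
            p += 1
            if p == len(w):
                w.append(BLANK)     # the tape content grows by one blank
    return None                     # did not halt within max_steps (the TM may loop)

# Hypothetical toy machine: accept iff the first symbol is 0.
delta = {("s", "0"): ("qa", "0", "R"),
         ("s", "1"): ("qr", "1", "R"),
         ("s", BLANK): ("qr", BLANK, "R")}
print(run_tm(delta, "s", "qa", "qr", "010"), run_tm(delta, "s", "qa", "qr", "1"))  # True False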
LECTURE: end of day 10
Definition 3.1.4. A language is called Turing-recognizable (Turing-acceptable, computably enumerable, recursively enumerable, c.e., or r.e.) if some TM recognizes it.
A language is co-Turing-recognizable (co-c.e.) if its complement is c.e.
On any input, a TM may accept, reject, or loop, meaning that it never enters a halting state.
If a TM M halts on every input string, then we say it is a decider, and that it decides the language
L(M ). We also say that M is total, meaning that the function φ : Σ∗ → {0, 1} that it computes
is total. φ is a partial function if M does not halt on certain inputs, since for those inputs, φ is
undefined.
Definition 3.1.5. A language is called Turing-decidable (recursive), or simply decidable, if some
TM decides it.
3.2 Variants of Turing Machines
Reading assignment: Section 3.2 in Sipser.
Why do we believe Turing machines are an accurate model of computation?
One reason is that the definition is robust: we can make all number of changes, including
apparent enhancements, without actually adding any computation power to the model. We describe
some of these, and show that they have the same capabilities as the Turing machines described in
the previous section.1
In a sense, the equivalence of these models is unsurprising to programmers. Programmers are
well aware that all general-purpose programming languages can accomplish the same tasks, since
an interpreter for a given language can be implemented in any of the other languages.
1 This does not mean that any change whatsoever to the Turing machine model will preserve its abilities; by replacing the tape with a stack, for instance, we would reduce the power of the Turing machine to that of a pushdown automaton. Likewise, allowing the start configuration to contain an arbitrary infinite sequence of symbols already written on the tape (instead of ␣ everywhere but the start, as we defined) would add power to the model; by writing
an encoding of an undecidable language, the machine would be able to decide that language. But this would not
mean the machine had magic powers; it just means that we cheated by providing the machine (without the machine
having to “work for it”) an infinite amount of information before the computation even begins.
3.2.1 Multitape Turing Machines
A multitape Turing machine has more than one tape, say k ≥ 2 tapes, each with its own head for
reading and writing. Each tape besides the first one is called a worktape, each of which starts blank,
and the first tape, the input tape, starts with the input string written on it in the same fashion as
a single-tape TM.
Such a machine’s transition function is therefore of the form
δ : (Q \ {qa , qr }) × Γk → Q × Γk × {L, R, S}k
The expression
δ(qi , a1 , . . . , ak ) = (qj , b1 , . . . , bk , S, R, . . . , L)
means that, if the TM is in state qi and heads 1 through k are reading symbols a1 through ak , the
machine goes to state qj , writes symbols b1 through bk , and directs each head to move left or right,
or to stay put, as specified.
Theorem 3.2.1. For every multitape TM M , there is a single-tape TM S such that L(M ) = L(S).
Furthermore, if M is total, then S is total.
See Figure 3.3.
Proof. On input w = w1 w2 . . . wn , S begins by replacing the string w1 w2 . . . wn with the string
•
•
•
•
#w1 w2 . . . wn # xy#xy# . . . xy#
|
{z
}
k−1
on its tape. S represents the contents of the k tapes between adjacent # symbols, and the tape
head position of each by the dotted symbol.
Whenever a single transition of M occurs, S must scan the entire tape, using its state to store
all the symbols under each • (since there are only a constant k of them, this can be done in its finite
state set). This information can be stored in the finite states of S since there are a fixed number of
42
CHAPTER 3. THE CHURCH-TURING THESIS
tapes and therefore a fixed number of •’s. Then, S will reset its tape to the start, and move over
each of the representations of the k tapes of M , moving the • appropriately (either left, right, or
stay), and writing a new symbol on the tape at the old location of the •, according M ’s transition
function. Each time the • attempts to move to the right onto the right-side boundary # symbol,
the entire single tape contents to the right of the • are shifted right to make room to represent the
now-expanded worktape contents.
This allows S to represent the configuration of M and to simulate the computation done by M .
Note that S halts if and only if M halts, implying that S is total if M is total.
Some of the homework problems involve programming multitape Turing machines. One of the
goals of this is to instill a sense that, although they look very different from any popular programming language, they are in fact equally as powerful, if tedious to program.
From this point on, we will describe Turing machines in terms of algorithms in pseudocode,
knowing that if we can write an algorithm in some programming language to recognize or decide
a language, then we can write a Turing machine as well. We will only refer to low-level details of
Turing machines when it is convenient to do so; for instance, when simulating an algorithm, it is
easier to simulate a Turing machine than a Python program.
A friend once told me he was glad nobody programmed Turing machines anymore, because it’s
so much harder than a programming language. This is a misunderstanding of why we study Turing
machines. No one ever programmed Turing machines; they are a model of computation whose
simplicity makes them easy to handle mathematically (and whose definition is intended to model
a mathematician sitting at a desk with paper and a pencil), though this same simplicity makes
them difficult to program. We generally use Turing machines when we want to prove limitations on
algorithms. When we want to design algorithms, there is no reason to use Turing machines instead
of pseudocode or a regular programming language.
We will also break from the textbook’s ordering at this point and move on to computational
complexity theory.
Chapter 7
Time Complexity
Computability focuses on which problems are computationally solvable in principle. Computational
complexity focuses on which problems are solvable in practice.
It is common to write algorithms in terms of the data structures they are operating on, even
though these data structures must be encoded in binary before delivering them to a computer (or
in some alphabet Σ before delivering them to a TM). Given any finite object O, such as a string,
graph, tuple, or even a Turing machine, we use the notation ⟨O⟩ to denote the encoding of O as a string in the input alphabet of the Turing machine we are using. To encode multiple objects, we use the notation ⟨O1 , O2 , . . . , Ok ⟩ to denote the encoding of the objects O1 through Ok as a single
string.
For example, to encode the graph G = (V, E), where V = {v1 , v2 , v3 , v4 } and
E = {{v1 , v2 }, {v1 , v3 }, {v2 , v3 }, {v3 , v4 }},
we can use an adjacency matrix

0 1 1 0
1 0 1 0
1 1 0 1
0 0 1 0
which can itself be encoded as a binary string by concatenating the rows: ⟨G⟩ = 0110101011010010.
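A few lines of code make this encoding concrete (a sketch, mine, not from the notes; the vertex naming 1..n and the list of 2-element sets for E are my own choices):

def encode_graph(n, edges):
    # n: number of vertices (named 1..n); edges: collection of 2-element sets {u, v}
    rows = []
    for u in range(1, n + 1):
        rows.append("".join("1" if {u, v} in edges and u != v else "0"
                            for v in range(1, n + 1)))
    return "".join(rows)   # concatenate the rows of the adjacency matrix

E = [{1, 2}, {1, 3}, {2, 3}, {3, 4}]
print(encode_graph(4, E))   # 0110101011010010, matching ⟨G⟩ above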
7.1 Measuring Complexity
Reading assignment: Section 7.1 in Sipser.
Definition 7.1.1. Let M be a TM, and let x ∈ {0, 1}∗ . Define timeM (x) to be the number of
configurations M visits on input x (so that even a TM that immediately accepts takes 1 step rather than 0). If M is total, define the (worst-case) running time (or time complexity) of M to be the function t : N → N defined for all n ∈ N by
t(n) = max{ timeM (x) | x ∈ {0, 1}^n }.
We call such a function t a time bound.
LECTURE: end of day 11
Midterm
LECTURE: end of day 12
Definition 7.1.2. Given f, g : N → R+ , we write f = O(g) (or f (n) = O(g(n))) if there exist
c, n0 ∈ N such that, for all n ≥ n0 ,
f (n) ≤ c · g(n).
We say g is an asymptotic upper bound for f .1
Bounds of the form n^c for some constant c are called polynomial bounds (n, n², n³, n^2.2, n^1000, etc.). Bounds of the form 2^(n^δ) for some real constant δ > 0 are called exponential bounds (2^n, 2^(2n), 2^(100n), 2^(0.01n), 2^(n²), 2^(n^100), 2^(√n), etc.).
Definition 7.1.3. Given f, g : N → R+ , we write f = o(g) (or f (n) = o(g(n))) if
lim_{n→∞} f (n)/g(n) = 0.
The formal definitions above are “official”, but some shortcuts are handy to remember.
One way to see that f (n) = o(g(n)) is to find h(n) so that f (n) · h(n) = g(n) for some h(n) that
grows unboundedly. For example, suppose we want to compare n and n log n: letting f (n) = n, g(n) = n log n, and h(n) = log n, since f (n) · h(n) = g(n), and h(n) grows unboundedly, then f (n) = o(g(n)), i.e., n = o(n log n).
Another useful shortcut is to ignore all but the largest terms, and to remove all constants. For
example, 10n⁷ + 100n⁴ + n² + 10n = O(n⁷), and 2ⁿ + n^100 + 2n = O(2ⁿ). This is useful when
analyzing an algorithm, where you are counting steps from many portions of the algorithm, but
only the “slowest part” (contributing the most number of steps) really makes a difference in the
final analysis.
Finally, taking the log of anything, or taking the square root of anything (or more generally, raising it to a power smaller than 1) makes it strictly smaller, and raising it to a power larger than 1 makes it bigger. So log n = o(n), and √n = o(n). Also, log n⁴ = o(n⁴) (actually, log n⁴ = 4 log n = O(log n)). It’s also handy to remember that
• 1 = o(log n),
• log n = o(√n) (or more generally log n = o(n^α) for any α > 0, e.g., n^(1/100)),
1 Because f and g are both positive, we could simplify the definition slightly and say f = O(g) if there exists c such that, for all n ∈ N, f (n) ≤ c · g(n). Why?
• √n = o(n),
• n^c = o(n^k) if c < k (√n = o(n) is the special case of this for c = 0.5, k = 1),
• n^k = o(2^(δn)) for any k > 0 and any δ > 0 (even if k is huge and δ is tiny, e.g., n^100 = o(2^(0.001n))).
f = O(g) is like saying f “≤” g.
f = o(g) is like saying f “<” g.
Based on this analogy, we introduce notations that are analogous to the three other comparison
operators ≥, >, and =:
• f = Ω(g) if and only if g = O(f ) (like saying f “≥” g),
• f = ω(g) if and only if g = o(f ) (like saying f “>” g), and
• f = Θ(g) if and only if f = O(g) and f = Ω(g) (like saying f “=” g; note it is not the same as saying they are equal! e.g., n² = Θ(3n²) but n² ≠ 3n²; it’s saying they grow at the same asymptotic rate).
Example 7.1.4. The following are easy to check.
1. √n = o(n).
2. n = o(n log log n).
3. n log log n = o(n log n).
4. n log n = o(n²).
5. n² = o(n³).
6. For every f : N → R+ , f = O(f ), but f ≠ o(f ).
7.1.1 Analyzing Algorithms
The main utility of big-O notation is to make it easier to analyze the complexity of algorithms.
Rather than counting the exact number of steps, which would be tedious, we can “throw away
constants”. It is often much easier to quickly glance at an algorithm and get a rough estimate of
its running time in big-O notation, compared to computing the precise number of steps it takes on
a certain input.
Definition 7.1.5. Let t : N → N be a time bound.2 Define TIME(t) ⊆ P({0, 1}∗ ) to be the set of
decision problems
TIME(t) = { L ⊆ {0, 1}∗ | there is a TM with running time O(t) that decides L } .
2 We will prove many time complexity results that actually do not hold for all functions t : N → N. Many results hold only for time bounds that are what is known as time-constructible, which means, briefly, that t : N → N has the property that the related function ft : {0}∗ → {0}∗ defined by ft (0n ) = 0t(n) is computable in time t(n). This is equivalent to requiring that there is a TM that, on input 0n , halts in exactly t(n) steps. The reason is to require that for every time bound used, there is a TM that can be used as a “clock” to time the number of steps that another TM is using, and to halt that TM if it exceeds the time bound. If the time bound itself is very difficult to compute, then this cannot be done.
All “natural” time bounds we will study, such as n, n log n, n², 2ⁿ, etc., are time-constructible. Time-constructibility is an advanced (and boring) issue that we will not dwell on, but it is worth noting that it is possible to define unnatural time bounds relative to which unnatural theorems can be proven, such as a time bound t such that TIME(t(n)) = TIME(2^(2^(2^t(n)))). This is an example of what I like to call a “false theorem”: it is true, of course, but its truth tells us that our model of reality is incorrect and should be adjusted. Non-time-constructible time bounds do not model anything found in reality.
Observation 7.1.6. For all time bounds t1 , t2 : N → N such that t1 = O(t2 ), TIME(t1 ) ⊆ TIME(t2 ).
The following result is the basis for all complexity theory. We will not prove it in this course
(take 220 for that), but it is important because it tells us that complexity classes are worth studying.
We aren’t just wasting time making up different names for the same class of languages. Informally,
it says that given more time, one can decide more languages.
Theorem 7.1.7 (Deterministic Time Hierarchy Theorem). Let t1 , t2 : N → N be time bounds such
that t1 (n) log t1 (n) = o(t2 (n)). Then TIME(t1 ) ( TIME(t2 ).
For instance, there is a language decidable in time n2 that is not decidable in time n, and
another language decidable in time n3 that is not decidable in time n2 , etc.
Sipser discusses three algorithms for deciding the language { 0n 1n | n ∈ N }, taking time O(n2 )
on a one-tape TM, then a better one taking time O(n log n) on a one-tape TM, then one taking
time O(n) on a two-tape TM, on pages 251-253. Read that section, and after proving the next
theorem, we will analyze the time complexity of algorithms at a high level (rather than at the
level of individual transitions of a TM), assuming, as in ECS 122A, that individual lines of code in
C++, Python, or pseudocode take constant time (so long as those lines are not masking loops or
recursion or something that would not take constant time, such as a Python list comprehension,
which is shorthand for a loop).
Note that since a TM guaranteed to halt in time t(n) is a decider, in complexity theory we
are back to the situation we had with finite automata, and unlike computability theory: we can
equivalently talk about the language recognized by a TM and the language decided by a TM.
7.1.2 Complexity Relationships Among Models

Theorem 7.1.8. Let t : N → N, where t(n) ≥ n. Then every t(n) time multitape TM can be simulated in time O(t(n)^2) by a single-tape TM.

Proof. A single-tape TM S can simulate a k-tape TM M by storing the contents of all k tapes on its single tape, separated by # symbols, with the position of each of M's heads marked; for example (for k = 3), the tape contents might be #0100#11␣#00␣10#. M's tape heads move right by at most t(n) positions, so S's tape contents have length at most

k · t(n) + (k + 1) = O(t(n)),

where k is the number of tapes, t(n) bounds the size of each of M's tape contents, and k + 1 is the number of #'s.
Simulating one step of M requires moving S's tape by this length and back, which is O(t(n)) time. Since M takes t(n) steps total, and S takes O(t(n)) steps to simulate each step of M, S takes O(t(n)^2) total steps.
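The bookkeeping in this proof is easy to mimic in code. The following small sketch is our own illustration (not the actual TM construction): it stores k tape contents on one string separated by #'s and checks that the length matches the count above.

def merge_tapes(tapes):
    """Store the contents of k tapes on a single string, separated by #'s,
    as in the simulation of a k-tape TM by a single-tape TM."""
    return "#" + "#".join(tapes) + "#"

tapes = ["0100", "11", "0010"]   # hypothetical contents of a 3-tape TM
merged = merge_tapes(tapes)
k = len(tapes)
# total length = (sum of tape content lengths) + (k+1) #'s,
# which is at most k*t(n) + (k+1) when each content has length at most t(n)
assert len(merged) == sum(len(t) for t in tapes) + (k + 1)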
7.2 The Class P
Reading assignment: Section 7.2 in Sipser.
7.2.1 Polynomial Time

Definition 7.2.1. P = ⋃_{k=1}^{∞} TIME(n^k).
In other words, P is the class of languages decidable in polynomial time on a deterministic, one-tape
TM. But, for reasons we outline below, the easiest way to think of P is as the class of languages
decidable using only a polynomial number of steps in your favorite programming language.
• We consider polynomial running times to be “small”, and exponential running times to be
“large”.
• For n = 1000, n^3 = 1 billion, whereas 2^n is greater than the number of atoms in the universe.
• Usually, exponential time algorithms are encountered when a problem is solved by brute force
search (e.g., searching all paths in a graph, looking for a Hamiltonian cycle). Polynomial
time algorithms are due to someone finding a shortcut that allows the solution to be found
without searching the whole space.
• All “reasonable” models of computation are polynomially equivalent: M1 (e.g., a TM) can
simulate a t(n)-time M2 (e.g., a C++ program) in time p(t(n)), for some polynomial p (usually
not much bigger than O(n2 ) or O(n3 )).
• Most “reasonable” encodings of inputs as binary strings are polynomially equivalent (see
discussion below).
• In this course, we generally ignore polynomial differences in running time. Polynomial differences are important, but we simply focus our lens farther out, to see where in complexity theory the really big differences lie. In ECS 122A, for instance, a difference of a log n or √n factor is considered more significant. In practice, even a constant factor can be significant; for example, the fastest known algorithm for matrix multiplication takes time O(n^{2.373}), but the O() hides a rather large constant factor. The naïve O(n^3) algorithm for matrix multiplication, properly implemented to take advantage of pipelining and modern memory caches, tends to outperform the asymptotically faster algorithms even on rather large matrices. So this is a lesson: computational complexity theory should be the first place you check for an efficient algorithm for a problem, but it should not be the last place you check.
• In ignoring polynomial differences, we can make conclusions that apply to any model of
computation, since they are polynomially equivalent, rather than having to choose one such
model, such as TM’s or C++, and stick with it. Our goal is to understand computation in
general, rather than an individual programming language.
• One objection is that some polynomial running times are not feasible, for instance, n^{1000}. In practice, there are few algorithms with such running times. Nearly every algorithm known to have a polynomial running time has a running time less than n^{10}. Also, when the first polynomial-time algorithm for a problem is discovered, such as the O(n^{12})-time algorithm for Primes discovered in 2002,3 it is usually brought down within a few years to a smaller degree, once the initial insight that gets it down to polynomial inspires further research. Primes currently is known to have an O(n^6)-time algorithm, and this will likely be improved in the future.
1. Although TIME(t) is different for different models of computation, P is the same class of
languages, in any model of computation polynomially equivalent to single-tape TM’s (which is
all of them worth studying, except possibly for quantum computers, whose status is unknown).
2. P roughly corresponds to the problems feasibly solvable by a deterministic algorithm.4
We must be careful to use reasonable encodings with the encoding function ⟨·⟩ : D → {0, 1}*, which maps elements of a discrete set D (such as N, Σ*, or the set of all Java programs) to binary strings. For instance, for n ∈ N, two possibilities are ⟨n⟩₁ = 0^n (the unary expansion of n) and ⟨n⟩₂ = the binary expansion of n. |⟨n⟩₁| ≥ 2^{|⟨n⟩₂|−1}, so ⟨n⟩₁ is a bad choice. Even doing simple arithmetic would take time exponential in the length of the binary expansion, if we choose the wrong encoding. Alternately, exponential-time algorithms might appear mistakenly to be polynomial time, since the running time is a function of the input size, and exponentially expanding the input lowers the running time artificially, even though the algorithm still takes the same (very large) number of steps. Hence, an analysis showing the very slow algorithm to be technically polynomial would be misleading.
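To make this gap concrete, here is a tiny sketch comparing the lengths of the two encodings (the helper names below are our own, not from the notes' notebook):

def encode_unary(n):
    """<n>_1: the unary encoding of n."""
    return "0" * n

def encode_binary(n):
    """<n>_2: the binary expansion of n."""
    return bin(n)[2:]

n = 1000
print len(encode_unary(n)), len(encode_binary(n))   # 1000 bits versus 10 bits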
As another example, a reasonable encoding of a directed graph G = (V, E) with V = {0, 1, . . . , n−1} is via its adjacency matrix, where for i, j ∈ {0, . . . , n−1}, the (n · i + j)th bit of ⟨G⟩ is 1 if and only if (i, j) ∈ E. For an algorithms course, we would care about the difference between this encoding and an adjacency list, since sparse graphs (those with |E| ≪ |V|^2) are more efficiently encoded by an adjacency list than an adjacency matrix. But since these two representations differ by at most a linear factor, we ignore the difference.
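Here is a sketch of the adjacency-matrix encoding just described (the helper name is ours; nodes are assumed to be numbered 0, . . . , n−1):

def encode_graph(G):
    """Adjacency-matrix encoding of a directed graph with nodes 0,...,n-1:
    bit n*i + j of <G> is 1 if and only if (i,j) is an edge."""
    V, E = G
    n = len(V)
    return "".join("1" if (i, j) in E else "0" for i in range(n) for j in range(n))

G = ([0, 1, 2], [(0, 1), (1, 2)])
print encode_graph(G)   # a string of length exactly n^2 = 9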
To be very technical and precise, the "input size" is the number of bits needed to encode an input. For a graph G = (V, E), this means |⟨G⟩|, where ⟨G⟩ ∈ {0, 1}*. For an adjacency matrix encoding, for example, this would be exactly n^2 if n = |V|. However, for many data structures there is another concept meant by "size". For example, the "size" of G could refer to the number of nodes |V|.
The "size" of a list of strings could mean the number of strings in it, which would be smaller than the number of bits needed to write the whole list down. For example, the list (010, 1001, 11) has 3 elements, but they have 9 bits in total. Furthermore, there's no obvious way to encode the list with only 9 bits; simply concatenating them to 010100111 could encode many lists: (010100111), or (0, 1, 0, 1, 0, 0, 1, 1, 1). So even more bits are needed to encode them in a way that they can be delimited; for instance, by "bit-doubling": the list above can be encoded as 001100 01 11000011 01 1111, where bits 0 and 1 of the original strings are represented as 00 and 11, respectively, and 01 is a delimiter to mark the boundary between strings.
3
Here, n is the size of the input, i.e., the number of bits needed to represent an integer p to be tested for primality,
which is ≈ log p.
4 Here, "deterministic" is intended both to emphasize that P does not take into account nondeterminism, which is an unrealistic model of computation, and that it does not take into account randomized algorithms, which is a realistic model of computation. BPP, the class of languages decidable by polynomial-time randomized algorithms, is actually conjectured to be equal to P, though this has not been proven.
This increases the length of the string by at most a factor of 4 (the worst case is a list of single-bit strings, where each bit becomes 2 bits and each string is followed by a 2-bit delimiter).
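The bit-doubling encoding is straightforward to implement; the following sketch (our own helper, not from the notes' notebook) produces the encoding described above:

def encode_list(strings):
    """Bit-doubling encoding of a list of binary strings:
    0 -> 00, 1 -> 11, and 01 delimits consecutive strings."""
    doubled = ["".join(2 * bit for bit in s) for s in strings]
    return "01".join(doubled)

print encode_list(["010", "1001", "11"])   # 001100 01 11000011 01 1111 (without the spaces)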
This frees us to talk about the “size” of an input without worrying too much about whether
we mean the number of bits in an encoding of the input, or whether we mean some more intuitive
notion of size, such as the number of nodes or edges (or both) in a graph, the number of strings in
a list of strings, etc.
In an algorithms course, we would care very much about the differences between these representations, because they could make a large difference in the analysis of the running time. But in this course, we observe that all of these differences are within a polynomial factor of each other.5 Therefore, if we can find an algorithm that is polynomial time under one encoding, it will be polynomial time under the others as well.
We use pseudocode to describe algorithms, knowing that the running time will be polynomially
close to the running time on a single-tape TM.
So to summarize, we choose to study the class P for two main reasons:
1. Historically, when an “efficient” algorithm is discovered for a problem, it tends to be polynomial time.
2. By ignoring polynomial differences, we are able to cast aside detailed differences between
models of computation and particular encodings. This gives us a general theory that applies
to all computers and programming languages.
LECTURE: end of day 13
7.2.2 Examples of Problems in P
Some of the Python code in this chapter can be found in an iPython notebook:
https://github.com/dave-doty/UC-Davis-ECS120/blob/master/120.ipynb
If you just want to read the code without running it, you can look at it directly on GitHub by
clicking on the link above.
To try running it yourself, download the 120.ipynb file above. (Go to https://github.com/dave-doty/UC-Davis-ECS120, right/option-click the file 120.ipynb, and select "Save link as..."/"Download linked file as...")
If you have ipython installed (there are a few ways to do this; I recommend Anaconda: https://www.continuum.io/downloads), open a command line in the directory where 120.ipynb is stored and type ipython notebook 120.ipynb
5 To say that two time bounds t and s are "within a polynomial factor" of each other means that (assuming s is the smaller of the two) t = O(s), or t = O(s^2), or t = O(s^3), etc. For example, all polynomial time bounds are within a polynomial factor of each other. Letting s(n) = 2^n and t(n) = 2^{3n}, t and s are within a polynomial factor because t(n) = 2^{3n} = (2^n)^3 = O(s^3).
If you don’t have ipython installed, go to https://try.jupyter.org/, click the “upload”
button, and choose the 120.ipynb file from your local computer. Once it’s uploaded, click on it in
your browser.
Paths in graphs. Define
Path = { ⟨G, s, t⟩ | G is a directed graph with a path from node s to node t } .
Theorem 7.2.2. Path ∈ P.
Proof. Breadth-first search:
 1  import collections
 2  def path(G, s, t):
 3      """Check if there is a path from node s to node t in graph G."""
 4      V, E = G
 5      visited = []
 6      queue = collections.deque([s])
 7      while queue:
 8          node = queue.popleft()
 9          if node == t:
10              return True
11          if node not in visited:
12              visited.append(node)
13              node_neighbors = [v for (u, v) in E if u == node]
14              for neighbor in node_neighbors:
15                  if neighbor not in visited:
16                      queue.append(neighbor)
17      return False
How long does this take? Assume we have n nodes and m = O(n^2) edges. Each node goes into the queue at most once, so the loop at line (7) executes O(n) iterations. It takes time O(n) to check whether a node is in the list visited. It takes time O(m) for line (13) to find all the neighbors of a given node. The inner loop at line (14) executes at most n times, and the check for membership in visited takes O(n) time. Putting this all together, it takes time O(n · (n + m + n · n)) = O(n^3), which is polynomial time.
We can test this out on a few inputs:
V = [1,2,3,4,5,6,7,8]
E = [ (1,2), (2,3), (3,4), (4,5), (6,7), (7,8), (8,6) ]
G = (V, E)
print "path from {} to {}? {}".format(4, 1, path(G, 4, 1))
print "path from {} to {}? {}".format(1, 4, path(G, 1, 4))
Let’s try it on an undirected graph:
def make_undirected(G):
    """Makes graph undirected by adding reverse edges."""
    V, E = G
    new_edges = []
    for (u, v) in E:
        if (v, u) not in E and (v, u) not in new_edges:
            new_edges.append((v, u))
    return (V, E + new_edges)

G = make_undirected(G)
print "path from {} to {}? {}".format(4, 1, path(G, 4, 1))
print "path from {} to {}? {}".format(1, 4, path(G, 1, 4))
print "path from {} to {}? {}".format(4, 7, path(G, 4, 7))
There are an exponential number of simple paths from s to t in the worst case (at least (n − 2)! paths in the complete directed graph with n vertices), but we do not examine them all in a breadth-first search. The BFS takes a shortcut to zero in on one particular path in polynomial time.
Of course, we know from studying algorithms that with an appropriate graph representation,
such as an adjacency list, and with a more careful analysis that doesn’t make so many worst-case
assumptions, this time can be reduced to O(n + m). But in computational complexity theory, since
we have decided in advance to consider any polynomial running time to be “efficient”, we can often
be quite lazy in our analysis and choice of data structures and still obtain a polynomial running
time.
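For comparison, here is a sketch of the O(n + m) version alluded to above, using an adjacency list (built as a dictionary) and a set for visited nodes; it is our own variant, and it is not needed for the complexity-theoretic point, since the lazier O(n^3) analysis already gives polynomial time.

import collections

def path_fast(G, s, t):
    """BFS from s to t in O(n + m) time using an adjacency list and a visited set."""
    V, E = G
    adj = {u: [] for u in V}          # adjacency list
    for (u, v) in E:
        adj[u].append(v)
    visited = set([s])
    queue = collections.deque([s])
    while queue:
        node = queue.popleft()
        if node == t:
            return True
        for neighbor in adj[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)
    return False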
Relatively prime integers. Given x, y ∈ Z+ , we say x and y are relatively prime if the largest
integer dividing both of them is 1. Define
RelPrime = { ⟨x, y⟩ | x and y are relatively prime } .
Theorem 7.2.3. RelPrime ∈ P.
Proof. Since the input size is |⟨x, y⟩| = O(log x + log y) (the number of bits needed to represent x and y), we must be careful to use an algorithm that is polynomial in n = |⟨x, y⟩|, not polynomial in x and y themselves, which would be exponential in the input size.
Euclid’s algorithm for finding the greatest common divisor of two integers works.
def gcd(x, y):
    """Euclid's algorithm for greatest common divisor."""
    while y > 0:
        x = x % y
        x, y = y, x
    return x

def rel_prime(x, y):
    return gcd(x, y) == 1

print gcd(24,60), rel_prime(24,60)
print gcd(25,63), rel_prime(25,63)
Every other iteration of the loop of gcd cuts the value of x at least in half, for the following reason. After the assignment x = x % y, if y ≤ x/2, then x mod y < y ≤ x/2, and if y > x/2, then x mod y = x − y < x/2; the swap makes this halved value the new y, and one iteration later it becomes x again. Therefore, O(log x) = O(|⟨x, y⟩|) iterations of the loop execute, and each iteration requires O(1) arithmetic operations, each polynomial-time computable, whence rel_prime runs in polynomial time.
Note that there are an exponential (in |⟨x, y⟩|) number of integers that could potentially be divisors of x and y (namely, all the integers less than min{x, y}), but Euclid's algorithm does not check all of them to see if they divide x or y; it uses a shortcut.
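One can check this behavior empirically by instrumenting the loop; the following sketch (our own variant of gcd, added only for this check) counts iterations and compares the count to the number of bits of the inputs:

def gcd_count_iterations(x, y):
    """Like gcd above, but also count the number of loop iterations."""
    iterations = 0
    while y > 0:
        x = x % y
        x, y = y, x
        iterations += 1
    return x, iterations

x, y = 2**100 + 1, 3**60 + 7
g, its = gcd_count_iterations(x, y)
# compare the iteration count to the roughly 195 bits needed to write x and y
print g, its, len(bin(x)) + len(bin(y))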
Connected graphs. Recall that an undirected graph is connected if every node has a path to
every other node. Define
Connected = { ⟨G⟩ | G is a connected undirected graph } .
Theorem 7.2.4. Connected ∈ P.
Proof. G = (V, E) is connected if and only if, for all s, t ∈ V, ⟨G, s, t⟩ ∈ Path:
import itertools
def connected(G):
    """Check if G is connected."""
    V, E = G
    for (s, t) in itertools.combinations(V, 2):
        if not path(G, s, t):
            return False
    return True
Let n = |V|. Since Path ∈ P, and connected calls path O(n^2) times, connected takes polynomial time.
Eulerian paths in graphs. Recall that an Eulerian path in a graph visits each edge exactly once. Define
EulerianPath = { ⟨G⟩ | G is an undirected graph with an Eulerian path } .
Theorem 7.2.5. EulerianPath ∈ P.
Proof. Euler showed that an undirected graph has an Eulerian path if and only if exactly zero or
two vertices have odd degree, and all of its vertices with nonzero degree belong to a single connected
component. Thus, we can check the degree of each node to verify the first part, and make a new
graph of all the nodes with positive degree, ensuring it is connected, to check the second part.
def degree(node, E):
    """Return degree of node."""
    return sum(1 for (u, v) in E if u == node)

def eulerian_path(G):
    """Check if G has an Eulerian path."""
    V, E = G
    num_deg_odd = 0
    V_pos = []  # nodes with positive degree
    for u in V:
        deg = degree(u, E)
        if deg % 2 == 1:
            num_deg_odd += 1
        if deg > 0:
            V_pos.append(u)
    if num_deg_odd not in [0, 2]:
        return False
    G_pos = (V_pos, E)
    return connected(G_pos)
Let n = |V | and m = |E|. The loop executes n = |V | iterations, each of which takes O(m) time to
execute degree, which iterates over each of the m edges. So the loop takes O(nm) time, which is
polynomial. It also calls connected on a potentially smaller graph than G, which we showed also
takes polynomial time.
7.3 The Class NP
Reading assignment: Section 7.3 in Sipser.
Some problems have polynomial-time deciders. Other problems are not known to have polynomial-time deciders, but, given a candidate solution to the problem (an informal notion that we will make formal later), the solution's validity can be verified in polynomial time.
A Hamiltonian path in a directed graph G is a path that visits each node exactly once. Define
HamPath = { ⟨G⟩ | G is a directed graph with a Hamiltonian path } .
HamPath is not known to have a polynomial-time algorithm (and it is generally believed not to
have one), but, a related problem, that of verifying whether a given path is a Hamiltonian path in
a given graph, does have a polynomial-time algorithm:
HamPathV = { ⟨G, p⟩ | G is a directed graph with the Hamiltonian path p } .
The algorithm simply verifies that each adjacent pair of nodes in p is connected by an edge in G
(so that p is a valid path in G), and that each node of G appears exactly once in p (so that p is
Hamiltonian).
def ham_path_verify(G, p):
    """Verify that p is a Hamiltonian path in G."""
    V, E = G
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    # verify p and V have same number of nodes
    if len(p) != len(V):
        return False
    # verify each node appears at most once in p
    if len(set(p)) != len(p):
        return False
    return True
Let’s try it out on some candidate paths:
V = [1,2,3,4,5]
E = [ (1,2), (2,4), (4,3), (3,5), (3,1) ]
G = (V, E)

p_bad = [1,2,3,4,5]
print ham_path_verify(G, p_bad)

p_bad2 = [1,2,4,3,1,2,4,3,5]
print ham_path_verify(G, p_bad2)

p_good = [1,2,4,3,5]
print ham_path_verify(G, p_good)
Another problem with a related polynomial-time verification language is
Composites = { ⟨n⟩ | n ∈ Z^+ and n = pq for some integers p, q > 1 } ,
with the related verification language
CompositesV = { ⟨n, p⟩ | n, p ∈ Z^+, p divides n, and 1 < p < n } .
Actually, Composites ∈ P, so sometimes a problem can be decided and verified in polynomial time.
The algorithm deciding Composites in polynomial time is a bit more complex (not terribly so, see
https://en.wikipedia.org/wiki/AKS_primality_test#The_algorithm). Here’s the verifier:
def composite_verify(n, d):
    """Verify that d is a nontrivial divisor of n."""
    if not 1 < d < n:
        return False
    return n % d == 0
Let's try it out on some integers:

print composite_verify(15, 3)
for d in range(1,20):  # 17 is prime so these should all be false
    print composite_verify(17, d),
LECTURE: end of day 14
We now formalize these notions.
Definition 7.3.1. A polynomial-time verifier for a language A is an algorithm V, where there are polynomials p, q such that V runs in time p and

A = { x ∈ {0, 1}* | (∃w ∈ {0, 1}^{≤q(|x|)}) V accepts ⟨x, w⟩ } .

That is, x ∈ A if and only if there is a "short" string w where ⟨x, w⟩ ∈ L(V) (where "short" means bounded by a polynomial in |x|). We call such a string w a witness (or a proof or certificate) that testifies that x ∈ A.
Example 7.3.2. For a graph G with a Hamiltonian path p, ⟨p⟩ is a witness testifying that G has a Hamiltonian path.
Example 7.3.3. For a composite integer n with a divisor 1 < d < n, ⟨d⟩ is a witness testifying that n is composite. Note that n may have more than one such divisor; this shows that a witness for an element of a polynomially-verifiable language need not be unique.
Definition 7.3.4. NP is the class of languages that have polynomial-time verifiers. We call the
language decided by the verifier the verification language of the NP language.
For instance, HamPathV is the verification language of HamPath.
What, intuitively, is this class NP? And why should anyone care about it? It is a very natural
notion, even if its mathematical definition looks unnatural.
In summary:
• P is the class of decision problems that can be decided quickly.
• NP is the class of decision problems that can be verified quickly, given a purported solution
(the “witness”).
NP is, in fact, a very natural notion, because it is far more common than not that real problems
we want to solve have this character that, whether or not we know how to find a solution efficiently,
we know what a correct solution looks like, i.e., we can verify efficiently whether a purported solution
is, in fact, correct.
For instance, it may not be obvious to me, given an encrypted file C, how to find the decrypted
version of it. But if you give me the decrypted version P and the encryption key k, I can run the
encryption algorithm E to verify that E(P, k) = C.
Even problems that don’t look like decision problems (such as optimization problems) can be recast as decision problems, and when we do, they tend also to have this feature of being efficiently
checkable. For instance, suppose I want to find the cheapest flights from Sacramento to Rome.
Casting this as a decision problem is often done by adding a new input, a threshold value, and
asking whether there are flights costing under the threshold: e.g., given two cities C1 and C2 and
a budget of b dollars, we ask whether there is a sequence of flights from C1 to C2 costing at most b
dollars. It may not be obvious to me how to find an airline route from Sacramento to Rome that
costs less than $1200. But if you give me a sequence of flights with their ticket prices, I can easily
check that their total cost is ≤ $1200, that they start at Sacramento and end at Rome, and that
the destination airport of each intermediate leg is the starting airport of the next leg.
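As a concrete illustration, here is a sketch of what a verifier for this flight-budget decision problem could look like; the function name, airport codes, and input format (a list of (origin, destination, price) triples serving as the witness) are made up for illustration and are not part of the notes:

def flights_verify(city1, city2, budget, itinerary):
    """Verify a witness for the decision problem 'is there a sequence of flights
    from city1 to city2 costing at most budget dollars?'
    itinerary is a list of (origin, destination, price) triples."""
    if not itinerary:
        return False
    if itinerary[0][0] != city1 or itinerary[-1][1] != city2:
        return False
    # each leg must start where the previous one ended
    for i in range(len(itinerary) - 1):
        if itinerary[i][1] != itinerary[i+1][0]:
            return False
    total = sum(price for (_, _, price) in itinerary)
    return total <= budget

print flights_verify("SMF", "FCO", 1200,
                     [("SMF", "JFK", 450), ("JFK", "FCO", 700)])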
For whatever reason, this feature of “efficient checkability” occurs more often than not when
it comes to natural decision problems that we care about solving. It seems that most “natural”
problems can have their solutions verified efficiently (those problems are in NP). However, what we
care about is efficiently finding solutions to problems (showing a problem is in P). The P versus NP
question is essentially asking whether we are doomed to fail in some of the latter cases, i.e., whether
there are problems whose solutions can be verified efficiently (which is most of the problems we
encounter “in the wild”), but those solutions cannot be found efficiently.
NP stands for nondeterministic polynomial time, not for “non-polynomial time”. In fact, P ⊆
NP, so some problems in NP are solvable in polynomial time, in addition to being merely verifiable
in polynomial time. For instance, we can efficiently find a path p from s to t in a graph G, and
we can also easily verify that p is actually a path from s to t in G. The name comes from a
characterization of NP in terms of nondeterministic polynomial-time machines, which we discuss
after the next few examples of more problems in NP.
7.3.1 (More) Examples of Problems in NP
A clique in a graph is a subgraph in which every pair of nodes in the clique is connected by an
edge. A k-clique is a clique with k nodes. Define
Clique = { ⟨G, k⟩ | G is an undirected graph with a k-clique } .
Theorem 7.3.5. Clique ∈ NP.
Proof. The following is a polynomial-time verifier for Clique, taking a set C of nodes as the
witness:
import itertools
def clique_v(G, k, C):
    V, E = G
    # verify C is the correct size
    if len(C) != k:
        return False
    # verify each pair of nodes in C shares an edge
    for (u, v) in itertools.combinations(C, 2):
        if (u, v) not in E:
            return False
    return True
Try it out on some examples:
V = [1,2,3,4,5,6]
E = [(1,2), (1,3), (1,4), (2,3), (2,4), (3,4), (4,5), (5,6), (4,6)]
G = (V, E)
G = make_undirected(G)

C = [1,2,3,4]
print "{} is a {}-clique in G? {}".format(C, clique_v(G, len(C), C))
C = [3,4,5]
print "{} is a {}-clique in G? {}".format(C, clique_v(G, len(C), C))
The SubsetSum problem concerns integer arithmetic: given a collection (set with repetition)
of integers x1 , . . . , xk and a target integer t, is there a subcollection that sums to t?
SubsetSum = { ⟨C, t⟩ | C = {x1, . . . , xk} is a collection of integers and (∃S ⊆ C) t = Σ_{y∈S} y } .

For example, ⟨{4, 4, 11, 16, 21, 27}, 29⟩ ∈ SubsetSum because 4 + 4 + 21 = 29, but ⟨{4, 11, 16}, 13⟩ ∉ SubsetSum.
Theorem 7.3.6. SubsetSum ∈ NP.
Proof. The following is a polynomial-time verifier for SubsetSum:
import itertools
def subset_sum_v(C, t, S):
    """Verifies that S is a subcollection of C summing to t."""
    if sum(S) != t:  # check sum
        return False
    C = list(C)  # make copy that we can change
    for n in S:  # ensure S is a subcollection of C
        if n not in C:
            return False
        C.remove(n)
    return True
Note that the complements of these languages (the complement of Clique and the complement of SubsetSum) are not obviously members of NP (and are believed not to be). The class coNP is the class of languages whose complements are in NP, so the complements of Clique and SubsetSum are in coNP. It is not known whether coNP = NP, but this is believed to be false, although proving that is at least as difficult as proving that P ≠ NP: since P = coP, if coNP ≠ NP, then P ≠ NP.
3.2 Variants of Turing Machines

We skipped nondeterministic TMs on our first run through chapter 3.

3.2.1 Nondeterministic Turing Machines
We may define a nondeterministic TM (NTM) in an analogous fashion to an NFA. Its transition
function is of the form
δ : (Q \ {qa , qr }) × Γ → P(Q × Γ × {L, R})
An NTM accepts a string x if any of its computation paths on input x accepts, where a computation path for x is a sequence of configurations, starting with the initial configuration representing x, corresponding to one sequence of nondeterministic choices. Note that it may be possible for some NTM to halt on some computation paths for x, and not others. For example, if
δ(s, b) = {(s, b, R), (qA , b, R)}, then the NTM has both finite and infinite computation paths (no
matter the input).
An NTM is total if every computation path for every input is finite.
Rather than having only a single sequence of configurations, an NTM, together with an input,
defines a tree of configurations. See Figure 3.1.
Theorem 3.2.2. For every NTM N , there is a TM M such that L(N ) = L(M ). Furthermore, if
N is total, then M is total.
Proof. Let G_N = (V, E) be a directed graph, where V = the set of N's configurations, and (C, C′) ∈ E if C → C′.6 Let C_x be the initial configuration of N on input x ∈ Σ*. M on input x conducts a breadth-first search of G_N starting at C_x. If an accepting configuration is found, M accepts x.
Now suppose N is total. Then the connected component of C_x is finite. If no computation path is accepting, M's search will terminate, and M rejects x. Since M accepts or rejects every string, M is total.
The following Python code implements this idea. First, we need a function that can apply a
nondeterministic transition function to a configuration to get a list of all possible next configurations:
def next_configs(delta, config):
    """Apply delta to configuration (state, pos, content)
    to get list of possible next configurations."""
    state, pos, content = config  # give names to each part of triple
    configs = []
    symbol = content[pos]
    for (new_state, new_symbol, move) in delta(state, symbol):
        new_content = content[:pos] + new_symbol + content[pos+1:]
        new_pos = max(0, pos + move)
        if new_pos >= len(new_content):
            new_content += "_"
        config = (new_state, new_pos, new_content)
        configs.append(config)
    return configs

6 Nodes with more than one out-neighbor represent nondeterministic choices, and nodes with no out-neighbors represent halting configurations.

Figure 3.1: Tree of configurations of an NTM. The root is the initial configuration, the leaves are halting configurations, and multiple children of a node correspond to nondeterministic transitions.
Now we apply Theorem 3.2.2 to a particular NTM, represented by its transition function. The example NTM N is rather boring: it simply replaces its input string with a nondeterministically chosen binary string of the same length, accepting if it ever guessed a 1 and rejecting otherwise. Moves are represented as 1 for right and −1 for left. It assumes N's start state is the string "q0" and the accept state is "qA".
import collections

def M(x):
    """An NTM that replaces its input with a
    nondeterministically chosen string of the same length;
    accepts if it ever guessed a 1, otherwise rejects."""
    init_config = ("q0", 0, x + "_")

    def delta(state, symbol):
        # if in halting state, no next configuration
        if state in ["qA", "qR"]:
            return []
        if symbol == "_":  # if blank, halt
            if state == "q1":
                return [("qA", "_", -1)]
            elif state == "q0":
                return [("qR", "_", -1)]
        else:
            # write a 0 or 1 over the current symbol
            # remember if a 1 is written
            if state == "q0":
                return [("q0", "0", 1), ("q1", "1", 1)]
            elif state == "q1":
                return [("q1", "0", 1), ("q1", "1", 1)]
        assert False

    visited = []
    queue = collections.deque([init_config])
    while queue:
        config = queue.popleft()
        if config[0] == "qA":
            return True
        if config not in visited:
            visited.append(config)
            # get all possible next configurations; see next_configs above
            neighbors = next_configs(delta, config)
            for neighbor in neighbors:
                if neighbor not in visited:
                    queue.append(neighbor)
    return False
Note we do not first build the graph of all configurations, and then do a breadth-first search
on it. Instead, we do the breadth-first search starting at the initial configuration, dynamically
generating new configurations from the transition function as they are reached.
Now, back to the realm of polynomial-time computations.
7.3 The class NP

7.3.1 Nondeterministic polynomial-time Turing machines
The following NTM ham_path_ntm (with the nondeterminism encoded using Python’s random
library) decides HamPath in polynomial time.
import random

def ham_path_ntm(G):
    """"NTM" (implemented via random) for deciding HamPath."""
    V, E = G
    # guess a path p that visits each node exactly once
    p = list(V)
    random.shuffle(p)
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    return True
Note what ham_path_ntm does: it randomly guesses a witness p = (v1 , . . . , vn ) and then runs
the HamPath verification algorithm with hG, pi.
Implementing nondeterminism with the random library makes for short and easy-to-read Python
code; however, we must caution that randomness and nondeterminism are two different (though
related) concepts. In particular, running the above code, which generates a witness at random, is
very likely to guess incorrectly. However, running it enough times makes it likely that at least one
time will randomly guess a Hamiltonian path:
V = [1,2,3,4,5]
E = [ (1,2), (2,4), (4,3), (3,5), (3,1) ]
G = (V, E)
for _ in range(500):
    print ham_path_ntm(G),
Of course, it would be better simply to iterate through possible witnesses one by one, rather
than randomly guessing, because then we would be guaranteed not to repeat a witness wastefully.
However, Python (like most languages) has no “nondeterministic transition” functionality built-in,
which is why we choose the random library above to illustrate the idea. HOWEVER, that does not
mean that we have a “randomized algorithm for Hamiltonian path”. The program ham_path_ntm
has a large probability of an error when the answer is “yes”. If there is a Hamiltonian path, an
error corresponds to randomly guessing a sequence of nodes that is not a Hamiltonian path, which
is very likely in many cases, even though there is a small, but positive, probability of guessing the
Hamiltonian path. (aside question: what is the sort of graph in which any guess would actually
succeed?) Randomized algorithms, however, are typically algorithms that have a small probability
of an error for all inputs, unlike the above code.
The structure of ham_path_ntm—nondeterministically guess witness, then run a verifier on the
input string and the witness—is no coincidence: polynomial-time verifiers and polynomial-time
NTMs define precisely the same class of languages, NP.
Theorem 7.3.1. A language is in NP (i.e., has a polynomial-time verifier) if and only if it is
decided by some polynomial-time NTM.
skip proof in lecture
Proof. We prove each direction separately.
(NTM =⇒ verifier): Let A be decided by an NTM N running in time m^k on inputs of length m.7 Define the verifier V as follows
V = "On input ⟨x, w⟩,
1. Run N on input x, and whenever N comes to a nondeterministic choice, use the next bit of w to make the choice.
2. If N accepts, accept, otherwise, reject."
Let m = |x|. Then |w| ≤ m^c for some constant c. Let n = |⟨x, w⟩| = 2m + 2 + |w| ≥ m. N(x) runs in time m^k, so as a function of V's input size n, V runs in time m^k ≤ n^k, so V runs in polynomial time.
(verifier =⇒ NTM): The idea is illustrated in Figure 7.2.
Let A ∈ NP be verified by verifier V accepting a string x and a witness w of length at most q(|x|) for some polynomial q and running in time n^k, where n = |⟨x, w⟩|. Define the NTM N as follows
N = "On input x,
1. Nondeterministically choose a string w of length at most q(|x|).8
2. Run V on input ⟨x, w⟩.
3. If V accepts, accept, otherwise, reject."
Let m = |x|. Suppose q(n) = n^c for some constant c. Since V runs in time n^k, and N provides V with an input of length |⟨x, w⟩| = 2|x| + 2 + |w| ≤ 2m + 2 + m^c, V runs in time (2m + 2 + m^c)^k = O(m^{ck}), which is polynomial in m. Therefore N runs in polynomial time (in its own input length m).
7 First note that, although we have defined NTMs so that they might nondeterministically choose from among more than two choices at a time, we can always model an NTM as if, each time it makes a choice, it is choosing between exactly two possible paths. To simulate choosing among, for instance, six choices, we alter the state set and transition function to break this one choice into three choices, each with only two options. The 3-bit string representing the result of these three choices can encode all six possibilities of the original choice, and we can simply let the extra two leftover strings indicate an immediate transition to the reject state.
8
Use nondeterminism to choose each bit, and also to choose whether to continue selecting bits at each step, so
that a shorter string can also be chosen.
Figure 7.2: Illustration of how a polynomial-time verifier is turned into a polynomial-time NTM. The nondeterminism happens "up-front": on input x, the NTM guesses a witness w ∈ {0,1}^{≤q(|x|)} (top part of tree), then runs a deterministic verification algorithm deciding whether (x, w) ∈ A_V. In other words, A ∈ NP iff there is a polynomial q and A_V ∈ P so that for all x, x ∈ A ⇔ (∃w ∈ {0,1}^{≤q(|x|)}) (x, w) ∈ A_V.
That is, NTMs can be simulated by deterministic TMs with at most an exponential blowup in
the running time.9 Much of the focus of this chapter is in providing evidence that this exponential
blowup is fundamental, and not an artifact of the above proof that can be avoided through a clever
trick.
Definition 7.3.2. NTIME(t) = { L ⊆ {0, 1}* | L is decided by an O(t)-time NTM } .

Corollary 7.3.3. NP = ⋃_{k=1}^{∞} NTIME(n^k).
We will mostly use the verification characterization of NP.10
9 Compare this to the exponential blowup in states of the subset construction; the reasons are similar, in that n binary choices lead to 2^n possibilities.
10
Most proofs showing that a language is in NP that claim to use the NTM characterization actually use an algorithm
similar to that for HamPath: on input x, nondeterministically guess a witness w, then run the verification algorithm
on x and w. In other words, they implicitly use the verification characterization by structuring the algorithm as in
the proof of Theorem 7.3.1; the only point of the nondeterminism is to choose a witness at the start of the algorithm,
which is then provided to the verification algorithm.
I consider the verification characterization to be cleaner, as it separates these two conceptually different ideas –
producing a witness that a string is in the language, and then verifying that it is actually a “truthful” witness – into
two separate steps, rather than mingling the production of the witness with the verification algorithm into a single
algorithm with nondeterministic choices attempting to guess bits of the witness while simultaneously verifying it.
Verifiers also make more explicit what physical reality is being modeled by the class NP. That is, there is no such thing as a nondeterministic polynomial-time machine that can be physically implemented. Verification algorithms, however, can be implemented, without having to introduce a new "magic" ability to guess nondeterministic choices; the model simply states that the witness is provided as an input, without specifying the source of this input.
Theorem 7.3.4. Let t : N → N, where t(n) ≥ n. Then for every t(n) time NTM N, there is a constant c such that N can be simulated in time 2^{ct(n)} by a deterministic TM.
We can prove this two ways, one by using the NTM definition directly, and (assuming we only
care about NTMs running in polynomial time) the other using the verification characterization of
NP.
Skip general proof in lecture and go straight to example with Hamiltonian path.
Proof using NTM definition directly. We examine the construction in the proof of Theorem 3.2.2, and show that it incurs an exponential overhead.
Let M be the deterministic TM simulating the NTM N, let x ∈ {0, 1}*, and let Tx be the BFS tree created when searching the configuration graph of N on input x. If |x| = n, every branch of Tx has length at most t(n). If each nondeterministic transition in δ has ≤ b choices, each node in Tx has ≤ b children ("branching factor b"). A tree with branching factor b and depth ≤ t has O(b^t) = O((2^{log b})^t) = O(2^{t log b}) total nodes. We showed (with a very lazy algorithm and analysis) that BFS on a graph with n nodes takes time O(n^3), so this takes time O((2^{t log b})^3) = O(2^{3t log b}). Letting c = 3 log b, this is O(2^{ct(n)}).
Proof using verification characterization of NP when t(n) is a polynomial. To solve a problem in NP deterministically, iterate over all possible witnesses, and run the verifier on each of them. Let A ∈ NP. Then there are polynomials t, q and an O(t(n))-time verifier V_A such that, for all x ∈ {0, 1}* (letting n = |x|), x ∈ A ⇐⇒ (∃w ∈ {0, 1}^{≤q(|x|)}) V_A(x, w) accepts. The following algorithm decides if the latter condition is true (assume both t(n) and q(n) are equal to n^c for some constant c; if one exponent is smaller, the argument still works if we make the exponent larger):
import itertools

def binary_strings_of_length(length):
    """Generate all strings of a given length"""
    return map(lambda lst: "".join(lst),
               itertools.product(["0", "1"], repeat=length))

def A_decider(x):
    """Exponential-time algorithm for finding witnesses."""
    n = len(x)
    for m in range(n**c + 1):  # check lengths m in [0,1,...,n^c]
        for w in binary_strings_of_length(m):
            if V_A(x, w):
                return True
    return False
We now analyze the running time. The two loops iterate over all binary strings of length at most n^c, of which there are Σ_{m=0}^{n^c} 2^m = 2^{n^c + 1} − 1 = O(2^{n^c}). Each iteration calls V_A(x, w), which takes time t(n) = n^c. Then the total time is O(n^c · 2^{n^c}) = O(2^{2n^c}).
This is shown by example for HamPath below. It is a bit more efficient than above: rather
than searching all possible witness binary strings up to some length, it uses the fact that we know
any potential witness will encode a list of nodes, that list will have length n = |V |, and it will be
a permutation of the nodes (each node in V will appear exactly once). (ham_path_verify is the
same as the verification algorithm we used above, except that the checks intended to ensure that
p is a permutation of V have been removed for simplicity, since we are calling ham_path_verify
only with p = a permutation of V )
import itertools
def ham_path_verify(G, p):
    """Verify that a permutation p of nodes in V is a Hamiltonian
    path in G."""
    V, E = G
    # verify each pair of adjacent nodes in p shares an edge
    for i in range(len(p) - 1):
        if (p[i], p[i+1]) not in E:
            return False
    return True

def ham_path(G):
    """Exponential-time algorithm for finding Hamiltonian paths,
    which calls the verifier on all potential witnesses."""
    V, E = G
    for p in itertools.permutations(V):
        if ham_path_verify(G, p):
            return True
    return False
LECTURE: end of day 15
7.3.2 The P Versus NP Question
Exactly one of the two possibilities shown in Figure 7.3 is correct.
Figure 7.3: We know that P ⊆ NP, so either P ⊊ NP, or P = NP.

The best known method for solving NP problems in general is the one employed in the proof of Theorem 7.3.4: a brute-force search over all possible witnesses, giving each as input to the verifier. Since we require that each witness for a string of length n has length at most p(n) for some polynomial p, there are at most 2^{p(n)} potential witnesses, which implies that every NP problem can be solved in exponential time by searching through all possible witnesses; that is,

NP ⊆ EXP = ⋃_{k=1}^{∞} TIME(2^{n^k}).
No one has proven that NP ≠ EXP. It is known that P ⊆ NP ⊆ EXP, and since the Time Hierarchy Theorem tells us that P ⊊ EXP, it is known that at least one of the inclusions P ⊆ NP or NP ⊆ EXP is proper, though it is not known which one. It is suspected that they both are; i.e., that P ⊊ NP ⊊ EXP.

7.4 NP-Completeness
Reading assignment: Section 7.4 in Sipser.
We now come to the only reason that any computer scientist is concerned with the class NP:
the NP-complete problems. (More justification of that claim in Section 7.5.2)
Intuitively, a problem is NP-complete if it is in NP, and every problem in NP is reducible to it
in polynomial time. This implies that the NP-complete problems are, “within a polynomial error,
the hardest problems” in NP. If any NP-complete problem has a polynomial-time algorithm, then
all problems in NP do, including all the other NP-complete problems. The contrapositive is more
interesting because it is stated in terms of two claims we think are true, rather than two things we
think are false:11 if P ≠ NP, then no NP-complete problem is in P.
This gives us a tool by which to prove that a problem is "probably" (so long as P ≠ NP)
intractable: show that it is NP-complete. Just as with undecidability, it takes some big insights
to prove that one problem is NP-complete, but once this is done, we can use the machinery of
reductions to show that other languages are NP-complete, by showing how they can be used to solve
a problem already known to be NP-complete. Unlike with undecidability, in which we established
11 Statements such as "(Some NP-complete problem is in P) =⇒ P = NP" are often called "If pigs could whistle, then donkeys could fly"-theorems.
that A_TM is undecidable before doing anything else, we will first define some problems and show reductions between them, before finally proving that one of them is NP-complete.
The first problem concerns Boolean formulas. We represent True with 1, False with 0, And with ∧, Or with ∨, and Not with ¬ (or with an overbar, writing a bar over 0 to mean ¬0):

0 ∨ 0 = 0    0 ∧ 0 = 0    ¬0 = 1
0 ∨ 1 = 1    0 ∧ 1 = 0    ¬1 = 0
1 ∨ 0 = 1    1 ∧ 0 = 0
1 ∨ 1 = 1    1 ∧ 1 = 1
A Boolean formula is an expression involving Boolean variables and the three operations ∧, ∨, and
¬. For example,
φ = (x ∧ y) ∨ (z ∧ ¬y)
is a Boolean formula. More formally, given a finite set V of variables (we write variables as single letters such as a and b, sometimes subscripted, e.g., x1, . . . , xn for n variables), a Boolean formula over V is either 0) (base case) a variable x ∈ V, or 1) (recursive case 1) ¬φ, where φ is a Boolean formula over V (called the negation of φ), 2) (recursive case 2) φ ∧ ψ, where φ and ψ are Boolean formulas over V (called the conjunction of φ and ψ), or 3) (recursive case 3) φ ∨ ψ, where φ and ψ are Boolean formulas over V (called the disjunction of φ and ψ). The operator ¬ takes precedence over both ∧ and ∨, and ∧ takes precedence over ∨. Parentheses may be used to override default precedence.
The following is a simple implementation of Boolean formulas as a Python class using the
recursive definition.
class Boolean_formula(object):
    """Represents a Boolean formula with AND, OR, and NOT operations."""

    def __init__(self, variable=None, op=None, left=None, right=None):
        if not ((variable == None and op == "not" and left != None and right == None) or
                (variable == None and op == "and" and left != None and right != None) or
                (variable == None and op == "or" and left != None and right != None) or
                (variable != None and op == left == right == None)):
            raise ValueError("Must either set variable for base case " +
                             "or must set op to 'not', 'and', or 'or' " +
                             "and recursive formulas phi and psi")
        self.variable = variable
        self.op = op
        self.left = left
        self.right = right
        if self.variable:
            self.variables = [variable]
        elif op == 'not':
            self.variables = left.variables
        elif op in ['and', 'or']:
            self.variables = list(left.variables)
            self.variables.extend(x for x in right.variables if x not in left.variables)
        self.variables.sort()

    def evaluate(self, assignment):
        """Value of this formula with given assignment, a dict mapping variable
        names to Python booleans.
        Assignment can also be a string of bits, which will be mapped to variables
        in alphabetical order.
        Boolean values are interconverted with integers to make for nicer printing
        (0 and 1 versus True and False)"""
        if type(assignment) is str:
            assignment = dict(zip(self.variables, map(int, assignment)))
        if self.op == None:
            return assignment[self.variable]
        elif self.op == 'not':
            return int(not self.left.evaluate(assignment))
        elif self.op == 'and':
            return int(self.left.evaluate(assignment) and self.right.evaluate(assignment))
        elif self.op == 'or':
            return int(self.left.evaluate(assignment) or self.right.evaluate(assignment))
        else:
            raise ValueError("This shouldn't be reachable")
The following code builds the formula φ = (x ∧ y) ∨ (z ∧ ¬y) and then evaluates it on all 2^3 = 8 assignments to its three variables.
# verbose way to build formula (x and y) or (z and not y)
# since I don't feel like writing a parser for Boolean formulas
# if I want to do that in the future, go here:
#   http://pyparsing.wikispaces.com/file/view/simpleBool.py
x = Boolean_formula(variable="x")
y = Boolean_formula(variable="y")
z = Boolean_formula(variable="z")
x_and_y = Boolean_formula(op="and", left=x, right=y)
not_y = Boolean_formula(op="not", left=y)
z_and_not_y = Boolean_formula(op="and", left=z, right=not_y)
formula = Boolean_formula(op="or", left=x_and_y, right=z_and_not_y)

import itertools
num_variables = len(formula.variables)
for assignment in itertools.product(["0","1"], repeat=num_variables):
    assignment = "".join(assignment)
    value = formula.evaluate(assignment)
    print "formula value = {} on assignment {}".format(value, assignment)
A Boolean formula is satisfiable if some assignment w of 0's and 1's to its variables causes the formula to have the value 1. φ = (x ∧ y) ∨ (z ∧ ¬y) is satisfiable because assigning x = 0, y = 0, z = 1 causes φ to evaluate to 1, written φ(001) = 1. We say the assignment satisfies φ. The Boolean satisfiability problem is to test whether a given Boolean formula is satisfiable:

Sat = { ⟨φ⟩ | φ is a satisfiable Boolean formula } .
Note that Sat ∈ NP, since the language

SatV = { ⟨φ, w⟩ | φ is a Boolean formula and φ(w) = 1 }

has a polynomial time decider, i.e., there is a polynomial-time verifier for Sat: if φ has n input variables, then ⟨φ⟩ ∈ Sat ⇐⇒ (∃w ∈ {0, 1}^n) ⟨φ, w⟩ ∈ SatV. The verifier is essentially the method Boolean_formula.evaluate in the code above.
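Since the verifier is essentially Boolean_formula.evaluate, a brute-force decider for Sat can simply call it on every one of the 2^n assignments, mirroring the witness search from the proof of Theorem 7.3.4. The following is a sketch along those lines (the function name is ours), reusing the formula object built earlier:

import itertools

def sat_brute_force(formula):
    """Decide Sat by trying all 2^n assignments to the n variables of formula,
    using Boolean_formula.evaluate as the verifier."""
    n = len(formula.variables)
    for bits in itertools.product(["0", "1"], repeat=n):
        if formula.evaluate("".join(bits)) == 1:
            return True
    return False

print sat_brute_force(formula)   # the formula (x and y) or (z and not y) built above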
Finally, we state the reason for the importance of Sat (as well as Clique, HamPath, SubsetSum, and hundreds of problems that share this characteristic):
Theorem 7.4.1 (Cook-Levin Theorem). Sat ∈ P if and only if P = NP.
This is considered evidence (though not proof) that Sat 6∈ P.
7.4.1 Polynomial-Time Reducibility
Theorem 7.4.1 is proven by showing, in a certain technical sense, that Sat is “at least as hard” as
every other problem in NP. The concept of reductions is used to formalize what we mean when we
say a problem is at least as hard as another. Let’s discuss a bit what we might mean intuitively by
this. First, by “hard”, we mean with respect to the running time required to solve the problem.
If problem A can be decided in time O(n^3), whereas problem B requires time Ω(n^7) to decide,
this means that B is harder than A: the fastest algorithm for A is faster than the fastest algorithm
for B.
But, we will also do what we have been doing so far and relax our standards of comparison to
ignore polynomial differences. So, suppose that B is actually decidable in time Θ(n^6): there is an O(n^6) time algorithm for B, but no o(n^6) time algorithm. Then, since n^6 is within a polynomial factor of n^3, because (n^3)^2 = n^6 (i.e., n^6 is "only" quadratically larger than n^3), we won't consider
this difference significant, and we will say that the hardness of A and B are close enough to be
“equivalent”, since they are within a polynomial factor of each other.12
However, suppose there is a third language C, and we could prove it requires time Ω(2^n) to decide. (We have a hard time proving this for particular natural languages, but the Time Hierarchy Theorem assures us that such problems must exist.) Then we will say that C really is harder than A or B, since 2^n is not within a polynomial factor of either n^3 or n^6. Even multiplying n^6 by a huge polynomial like n^{1000} gives the polynomial n^{1006} = o(2^n).
But, as we said, it is often difficult to take a particular natural problem that we care about and
prove that no algorithm with a certain running time can solve it. Obtaining techniques for doing
this sort of thing may well be the most important open problem in theoretical computer science,
and after decades we still have very few tools to do so. That is to say, it is difficult to pin down the
absolute hardness of a problem, the running time of the most efficient algorithm for the problem.
What we do have, on the other hand, is a way to compare the relative hardness of two problems.
What reducibility allows us to do is to say, okay, I don't know what's the best algorithm for A, and I
12 By this metric, all polynomial-time algorithms are within a polynomial factor of each other. But there are others: a 2^n-time algorithm is within a polynomial factor of an n^5 · 2^n-time algorithm, or even a 4^n-time algorithm, since 4^n = (2^n)^2. However, a 2^n-time algorithm is not within a polynomial factor of a 2^{n^2}-time algorithm, since 2^{n^2} is not bounded by a polynomial function of 2^n; i.e., there is no polynomial p such that 2^{n^2} = O(p(2^n)).
also don’t know what’s the best algorithm for B, but what I do know is that, if A is “reducible” to
B, this means that any algorithm for B can be used to solve A with only a polynomial amount of
extra time. Therefore, whatever is the fastest algorithm for B (even if I don’t know what it is, or
how fast it is), I know that the fastest algorithm for A is “almost” as fast (perhaps a polynomial
factor slower). So in particular, if B is decidable in polynomial time, then so is A (perhaps with a
larger exponent in the polynomial).
Example 7.4.2. Recall a clique in a graph G = (V, E) is a subset C ⊆ V such that, for all pairs u, v ∈ C of distinct nodes in C, {u, v} ∈ E. An independent set in G is a similar concept: I ⊆ V is an independent set if, for all pairs u, v ∈ I of distinct nodes in I, {u, v} ∉ E. Suppose we have an algorithm that can decide, on input ⟨G′, k⟩, whether G′ has an independent set of size k or not. How can we use this to decide the problem, given input ⟨G, k⟩, whether G has a clique of size k?
Figure 7.4: A way to "reduce" the problem of finding a clique of some size in a graph G to finding an independent set of the same size. We transform the graph G on the left to G′ on the right by taking the complement of the set of edges. Each clique in G (such as the four shaded nodes) becomes an independent set in G′.
The idea is shown in Figure 7.4. We map the graph G = (V, E) to the graph G′ = (V, E′), where E′ is the complement of E (i.e., for each pair of nodes in V, add an edge if there wasn't one, and otherwise remove the edge if there was one), and then we give ⟨G′, k⟩ to the decider for the independent set problem. Since a clique in G becomes an independent set in G′, for each k, G has a clique of size k if and only if G′ has an independent set of size k. Thus, the decider for the independent set problem can be used to answer our question about the clique problem, by transforming the input ⟨G, k⟩ into ⟨G′, k⟩, and then passing the transformed input to the decider for independent set.
The next definition formalizes this idea.
Definition 7.4.3. A function f : {0, 1}* → {0, 1}* is polynomial-time computable if there is a polynomial-time TM that, on input x, halts with f(x) on its tape, and blanks everywhere else.
Definition 7.4.4. Let A, B ⊆ {0, 1}*. We say A is polynomial-time mapping reducible to B (polynomial-time many-one reducible, polynomial-time reducible), written A ≤ᴾₘ B, if there is a polynomial-time computable function f : {0, 1}* → {0, 1}* such that, for all x ∈ {0, 1}*,

x ∈ A ⇐⇒ f(x) ∈ B.

f is called a (polynomial-time) reduction of A to B.
We interpret A ≤P_m B to mean that “B is at least as hard as A, to within a polynomial-time
error.”
Figure 7.5: A mapping reduction f that reduces decision problem A ⊆ {0, 1}∗ to B ⊆ {0, 1}∗ .
A pictorial representation of a mapping reduction is shown in Figure 7.5. To implement the
mapping reduction of Example 7.4.2, we can use this:
import itertools

def reduction_from_clique_to_independent_set(G):
    V, E = G
    # complement the edge set: {u,v} is an edge of Gc iff it is not an edge of G
    Ec = [{u, v} for (u, v) in itertools.combinations(V, 2)
          if {u, v} not in E and u != v]
    Gc = (V, Ec)
    return Gc
Mathematically, if we define IndependentSet = {hG, ki | G has an independent set of size k},
then we showed that Clique ≤P_m IndependentSet.
Note that for any mapping reduction f computable in time p(n), and all x ∈ {0, 1}∗ , |f (x)| ≤
p(|x|).
Theorem 7.4.5. Suppose A ≤P_m B. If B ∈ P, then A ∈ P.
Proof. The idea of how to use a polynomial-time mapping reduction is shown in Figure 7.6.
Let MB be the algorithm deciding B in time p(n) for some polynomial p, and let f be the
reduction from A to B, computable in time q(n) for some polynomial q. Define the algorithm
def f(x):
    # TODO: code for f, reducing A to B, goes here
    raise NotImplementedError()

def M_B(y):
    # TODO: code for M_B, deciding B, goes here
    raise NotImplementedError()
Figure 7.6: How to compose a mapping reduction f : {0, 1}∗ → {0, 1}∗ that reduces decision
problem A to decision problem B, with an algorithm MB deciding B, to create an algorithm MA
deciding A. If MB and f are both computable in polynomial time, then their composition MA is
computable in polynomial time as well.
def M_A(x):
    """ Compose reduction f from A to B, with algorithm M_B for B,
    to get algorithm M_A for A. """
    y = f(x)
    output = M_B(y)
    return output
MA correctly decides A because x ∈ A ⇐⇒ f (x) ∈ B. Furthermore, on input x, MA runs in time
q(|x|) + p(|f (x)|) ≤ q(|x|) + p(q(|x|)). MA is then polynomial-time by the closure of polynomials
under composition.13
This is where things begin to get a bit abstract compared to previous definitions. The main
way that we use reductions is to show that certain efficient algorithms do not exist, by invoking
the contrapositive of Theorem 7.4.5:
Corollary 7.4.6. Suppose A ≤P_m B. If A ∉ P, then B ∉ P.
Theorem 7.4.5 tells us that if the fastest algorithm for B takes time p(n), then the fastest
algorithm for A takes no more than time q(n) + p(q(n)), where q is the running time of f ; i.e., q
is the “polynomial error” when claiming that “A is no harder than B within a polynomial error”.
Since we ignore polynomial differences when defining P, we conclude that A ∉ P =⇒ B ∉ P.
How to remember which direction reductions go: The most common mistake made
when using a reduction to show that a problem is hard is to switch the order of the problems in the
reduction, in other words, to mistakenly attempt to show (for example) that HamPath ≤P_m Sat
by showing that we can transform instances of Sat into instances of HamPath (a mistake because
we should instead go the other way and transform instances of HamPath into instances of Sat).
The ≤ sign is intended to be evocative of “comparing hardness” of the problems. So if we write
A ≤P_m B, this means B is at least as hard as A, or A is at least as easy as B. How do you show a
problem is easy? You give an algorithm for solving it! So the way to remember which problem the
reduction is helping you solve, is to ask, “which is the easier problem here?” The easier problem
13 I.e., because r(n) ≡ p(q(n)) is a polynomial whenever p and q are polynomials.
is the one on the left of the ≤P_m, namely A. So if the reduction is intended to be used with an existing
algorithm for one problem, in order to show that there is an algorithm for the other problem, then
the problem for which you are showing an algorithm exists is the easier one.
Of course, there’s a simpler mnemonic, which doesn’t really convey the why of reductions, but
that is easy to remember. We always write reductions in the same left-to-right order A ≤P_m B.
If you picture the computation of the reduction itself starting with x on the left, then processing
through f to end up with f (x) on the right (x ↦ f (x), as in Figure 7.6), then this will remind you
that you should start with an instance x of A, and end with an instance f (x) of B.
Unlike the previous cases, it is difficult to give concrete examples of MA , MB , and f above,
because for the types of problems we consider using these reductions, we believe that no efficient
algorithms MA and MB exist for the two problems A and B, respectively. There is, however, a
concrete, efficiently computable reduction f from A to B for many important choices of A and B.
We cover one of these choices next.
We now use ≤P_m-reductions to show that Clique is “at least as hard” (modulo polynomial
differences in running times) as a useful version of the Sat problem known as 3Sat.
A literal is a Boolean variable or negated Boolean variable, such as x or ¬x. A clause is several
literals connected with ∨’s, such as (x1 ∨ x2 ∨ x3 ∨ x4 ). A conjunction is several subformulas
connected with ∧’s, such as (x1 ∧ (x2 ∨ x7 ) ∧ x3 ∧ x4 ). A Boolean formula φ is in conjunctive normal
form, called a CNF-formula, if it consists of a conjunction of clauses,14 such as
φ = (x1 ∨ x2 ∨ x3 ∨ x4 ) ∧ (x3 ∨ x5 ∨ x6 ) ∧ (x3 ∨ x6 ).
φ is a 3CNF-formula if all of its clauses have exactly three literals, such as
φ = (x1 ∨ x2 ∨ x3 ) ∧ (x3 ∨ x5 ∨ x6 ) ∧ (x3 ∨ x6 ∨ x4 ) ∧ (x4 ∨ x5 ∨ x6 ).
Note that any CNF formula with at most 3 literals per clause can be converted easily to 3CNF by
duplicating literals; for example, (x3 ∨ x5 ) is equivalent to (x3 ∨ x5 ∨ x5 ).
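This conversion is easy to sketch in Python; representing a clause as a list of literal strings is an assumption made only for this illustration, not notation used elsewhere in these notes:

def pad_clause(clause):
    """ clause is a list of 1 to 3 literals (here, strings; an assumed encoding).
    Returns an equivalent clause with exactly three literals. """
    clause = list(clause)
    while len(clause) < 3:
        clause.append(clause[-1])  # duplicating a literal does not change the clause's meaning
    return clause

# e.g., pad_clause(["x3", "x5"]) returns ["x3", "x5", "x5"]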
Define
3Sat = { hφi | φ is a satisfiable 3CNF-formula } .
We will later show that 3Sat is NP-complete, but for now we will simply show that it is reducible
to Clique.
Theorem 7.4.7. 3Sat ≤P_m Clique.
LECTURE: end of day 16
14 The obvious dual of CNF is disjunctive normal form (DNF), which is an OR of conjunctions, such as the formula one would derive by applying the sum-of-products rule to the truth table of a Boolean function, but 3DNF formulas do not have the same nice properties that 3CNF formulas have, so we do not discuss them further.
Figure 7.7: Example of the ≤P_m-reduction from 3Sat to Clique when the input is the 3CNF
formula φ = (x1 ∨ x1 ∨ x2 ) ∧ (x1 ∨ x2 ∨ x2 ) ∧ (x1 ∨ x2 ∨ x2 ). Observe that x1 = 0, x2 = 1 is a satisfying
assignment. Furthermore, if we pick exactly one node in each triple, corresponding to a literal that
satisfies the clause associated to that triple, it is a k-clique: since we are picking the literal that
satisfies each clause, we never pick both a literal and its negation, so every node we pick has an
edge to every other node.
Proof. Given a 3CNF formula φ, we convert it in polynomial time (via reduction f ) into a pair
f (hφi) = hG, ki, a graph G and integer k so that
φ is satisfiable ⇐⇒ G has a k-clique.
Let k = # of clauses in φ. Write φ as
φ = (a1 ∨ b1 ∨ c1 ) ∧ (a2 ∨ b2 ∨ c2 ) ∧ . . . ∧ (ak ∨ bk ∨ ck ).
G will have 3k nodes, one for each literal appearing in φ.15
See Figure 7.7 for an example of the reduction.
The nodes of G are organized into k groups of three nodes each, called triples. Each triple
corresponds to a clause in φ.
G has edges between every pair of nodes u and v, unless:
(1) u and v are in the same triple, or
(2) u is labeled xi and v is labeled ¬xi (a literal and its negation), or vice-versa.
Each of these conditions can be checked in polynomial time, and there are (|V| choose 2) = O(|V|^2) such
conditions, so this is computable in polynomial time.

15 Note that φ could have many more (or fewer) literals than variables, so G may have many more nodes than φ has variables. But G will have exactly as many nodes as φ has literals. Note also that literals can appear more than once, e.g., (x1 ∨ x1 ∨ x2) has two copies of the literal x1.
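As an illustration only (not part of the formal proof), here is a sketch of this construction in Python, under an assumed encoding of a 3CNF formula as a list of clauses, each a list of three nonzero integers, where i stands for xi and -i for its negation:

import itertools

def reduction_from_3sat_to_clique(clauses):
    """ Sketch of the reduction: clauses is a list of k clauses, each a list of
    three literals; literal i means x_i and -i means its negation (an assumed
    encoding). Returns ((V, E), k). """
    k = len(clauses)
    # one node per literal occurrence, identified by (clause index, position in clause)
    V = [(j, p) for j in range(k) for p in range(3)]
    E = []
    for u, v in itertools.combinations(V, 2):
        (j1, p1), (j2, p2) = u, v
        lit1, lit2 = clauses[j1][p1], clauses[j2][p2]
        if j1 != j2 and lit1 != -lit2:  # not in the same triple, not complementary literals
            E.append({u, v})
    return (V, E), k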
We now show that
φ is satisfiable ⇐⇒ G has a k-clique.
( =⇒ ): Suppose that φ has a satisfying assignment. Then at least one literal is true in every
clause. To construct a k-clique C, select from each triple exactly one node labeled by a true
literal (breaking ties arbitrarily); this implies condition (1) is false for every pair of nodes in
C. Since xi and ¬xi cannot both be true, condition (2) is false for every pair of nodes in C.
Therefore every pair of nodes in C is connected by an edge; i.e., C is a k-clique.
( ⇐= ): Suppose there is a k-clique C in G. No two of the clique’s nodes are in the same triple by
condition (1), so each triple contains exactly one node of C. If a node labeled xi is in C, assign xi = 1 to make
the corresponding clause true. If a node labeled ¬xi is in C, assign xi = 0 to make the corresponding clause true.
Since no two nodes labeled xi and ¬xi are both in C by condition (2), this assignment is well-defined
(we will not attempt to assign xi to be both 0 and 1). The assignment makes every clause
true, thus satisfies φ.
Theorems 7.4.5 and 7.4.7 tell us that if Clique is decidable in polynomial time, then so is
3Sat. Speaking in terms of what we believe is actually true, Corollary 7.4.6 and Theorem 7.4.7
tell us that if 3Sat is not decidable in polynomial time, then neither is Clique.
7.4.2 Definition of NP-Completeness
Definition 7.4.8. A language B is NP-hard if, for every A ∈ NP, A ≤P_m B.
Definition 7.4.9. A language B is NP-complete if
1. B ∈ NP, and
2. B is NP-hard.
Theorem 7.4.10. If B is NP-complete and B ∈ P, then P = NP.
Proof. Assume the hypothesis and let A ∈ NP. Since B is NP-hard, A ≤P_m B. Since B ∈ P, by
Theorem 7.4.5 (the closure of P under ≤P_m-reductions), A ∈ P. Since A was arbitrary, NP ⊆ P.
Since P ⊆ NP, P = NP.
Corollary 7.4.11. If P ≠ NP, then no NP-complete problem is in P.16
Theorem 7.4.12. If B is NP-hard and B ≤P_m C, then C is NP-hard.
Proof. Let A ∈ NP. Then A ≤P_m B since B is NP-hard. Since ≤P_m is transitive and B ≤P_m C, it
follows that A ≤P_m C. Since A ∈ NP was arbitrary, C is NP-hard.
Corollary 7.4.13. If B is NP-complete, C ∈ NP, and B ≤P_m C, then C is NP-complete.
16 Since it is generally believed that P ≠ NP, Corollary 7.4.11 implies that showing a problem is NP-complete is evidence of its intractability.
That is, NP-complete problems can, in many ways, act as “representatives” of the hardness of
NP, in the sense that black-box access to an algorithm for solving an NP-complete problem is as
good as access to an algorithm for any other problem in NP.
Corollary 7.4.13 is our primary tool for proving a problem is NP-complete: show that some
existing NP-complete problem reduces to it. This should remind you of our tool for proving that
a language is undecidable: show that some undecidable language reduces to it. We will eventually
show that 3Sat is NP-complete; from this, it will immediately follow that Clique is NP-complete
by Corollary 7.4.13 and Theorem 7.4.7.
7.4.3 The Cook-Levin Theorem
Theorem 7.4.14. Sat and 3Sat are NP-complete.17
We won’t prove Theorem 7.4.14 in this course (take ECS 220 for that).
7.5 Additional NP-Complete Problems
Reading assignment: Section 7.5 in Sipser.
To construct a polynomial-time reduction from 3Sat to another language, we transform the
variables and clauses in 3Sat into structures in the other language. These structures are called
gadgets. For example, to reduce 3Sat to Clique, nodes “simulate” variables and triples of nodes
“simulate” clauses. 3Sat is not the only NP-complete language that can be used to show other
problems are NP-complete, but its regularity and structure make it convenient for this purpose.
Theorem 7.5.1. Clique and IndependentSet are NP-complete.
Proof. We have already shown that Clique ∈ NP. 3Sat ≤P_m Clique by Theorem 7.4.7, and 3Sat
is NP-complete, so Clique is NP-hard.
Recall that we also showed Clique ≤P_m IndependentSet via the reduction h(V, E), ki ↦ h(V, E′), ki,
where E′ is the complement of E, so IndependentSet is NP-hard as well. We also
showed both Clique and IndependentSet are in NP, so they are NP-complete.
7.5.1 The Vertex Cover Problem
If G is an undirected graph, a vertex cover of G is a subset of nodes, where each edge in G is
connected to at least one node in the vertex cover. Define
VertexCover = { hG, ki | G is an undirected graph with a vertex cover of size k } .
Note that adding nodes to a vertex cover cannot remove its ability to touch every edge; hence if
G = (V, E) has a vertex cover of size k, then it has a vertex cover of each size k′ where k ≤ k′ ≤ |V |.
Therefore it does not matter whether we say “of size k” or “of size at most k” in the definition of
VertexCover.
17 By Theorem 7.4.10, Theorem 7.4.14 implies Theorem 7.4.1. In other words, because Sat is NP-complete, if Sat ∈ P, then since all A ∈ NP are reducible to Sat, A ∈ P as well, i.e., P = NP. Conversely, if P = NP, then Sat ∈ P (and so is every other problem in NP).
Figure 7.8: An example of the reduction from 3Sat to VertexCover for the 3CNF formula
φ = (x1 ∨ x1 ∨ x2 ) ∧ (x1 ∨ x2 ∨ x2 ) ∧ (x1 ∨ x2 ∨ x2 ).
Theorem 7.5.2. VertexCover is NP-complete.
Proof. We must show that VertexCover is in NP, and that some NP-complete problem reduces
to it.
(in NP): The language

VertexCoverV = { hhG, ki , Ci | G is an undirected graph and C is a vertex cover for G of size k }

is a verification language for VertexCover, and it is in P: if G = (V, E), the verifier tests
whether C ⊆ V , |C| = k, and for each {u, v} ∈ E, either u ∈ C or v ∈ C.
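A minimal sketch of such a verifier in Python, assuming the same representation of graphs as pairs (V, E) with E a collection of 2-element sets used in the earlier reduction code (the function name is just for illustration):

def vertex_cover_verifier(G, k, C):
    """ Polynomial-time check that C is a vertex cover of G = (V, E) of size k. """
    V, E = G
    C = set(C)
    if not C <= set(V) or len(C) != k:
        return False
    # every edge must touch at least one node of C
    return all(u in C or v in C for (u, v) in (tuple(e) for e in E))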
(NP-hard): One elegant way to see this is that IndependentSet ≤P_m VertexCover by a particularly simple reduction hG, ki ↦ hG, n − ki, where n is the number of nodes. This works
because I is an independent set in G = (V, E) if and only if V \ I is a vertex cover. To
see the forward direction, note that no pair of nodes in I has an edge that needs to be
covered. Thus, if we pick all nodes in V \ I, they cover all edges between nodes in V \ I,
and since I is an independent set, all remaining edges must be between a node in I and a
node in V \ I, so the set V \ I covers these edges as well. Conversely, if V \ I is a vertex
cover, then no pair of nodes in I can have an edge between them (since it would be uncovered by V \ I), so I must be an independent set. In fact, this very same reduction also shows
VertexCover ≤P_m IndependentSet; they are nearly equivalent problems, in the same way
that Clique is nearly equivalent to IndependentSet (recall the reduction from Clique to
IndependentSet simply takes the complement of E).
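For illustration, this simple reduction is essentially one line of Python (again a sketch, using the graph representation from earlier in this chapter):

def reduction_from_independent_set_to_vertex_cover(G, k):
    """ Maps <G, k> to <G, n - k>, where n is the number of nodes of G. """
    V, E = G
    return G, len(V) - k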
To give an idea for how to do more sophisticated reductions, and in particular how to use the
structure of the 3Sat problem to do NP-hardness reductions, we show a more complex reduction from 3Sat to VertexCover. Given a 3CNF-formula φ, we show how to (efficiently)
transform φ into a pair hG, ki, where G = (V, E) is an undirected graph and k ∈ N, such that
φ is satisfiable if and only if G has a vertex cover of size k.
See Figure 7.8 for an example of the reduction.
For each variable xi in φ, add two nodes labeled xi and ¬xi to V , and connect them by an
edge; call this set of gadget nodes V1 . For each literal in each clause, we add a node labeled
with the literal’s value; call this set of gadget nodes V2 .
Connect nodes u and v by an edge if they

1. are in the same variable gadget,

2. are in the same clause gadget, or

3. have the same label.
If φ has m input variables and l clauses, then V has |V1 |+|V2 | = 2m+3l nodes. Let k = m+2l.
Since |V | = 2m + 3l and |E| ≤ |V|^2, G can be computed in O(| hφi |^2) time.
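For concreteness, here is a sketch of this construction in Python; the encoding of the formula (a list of clauses, each a list of three nonzero integers, i for xi and -i for ¬xi) and the node names are assumptions for illustration only:

def reduction_from_3sat_to_vertex_cover(num_vars, clauses):
    """ Sketch of the construction above. Returns ((V, E), k) with k = m + 2l. """
    m, l = num_vars, len(clauses)
    # variable gadgets: two nodes per variable, labeled by the literal they represent
    V1 = [('var', i, s * i) for i in range(1, m + 1) for s in (1, -1)]
    # clause gadgets: one node per literal occurrence
    V2 = [('clause', j, p) for j in range(l) for p in range(3)]
    E = []
    for i in range(1, m + 1):                       # variable gadget edges
        E.append({('var', i, i), ('var', i, -i)})
    for j in range(l):                              # clause gadget edges (triangles)
        for p in range(3):
            for q in range(p + 1, 3):
                E.append({('clause', j, p), ('clause', j, q)})
    for j in range(l):                              # edge to the variable node with the same label
        for p in range(3):
            lit = clauses[j][p]
            E.append({('clause', j, p), ('var', abs(lit), lit)})
    return (V1 + V2, E), m + 2 * l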
Now we show that φ is satisfiable ⇐⇒ G has a vertex cover of size k:
( =⇒ ): Suppose (∃x1 x2 . . . xm ∈ {0, 1}^m) φ(x1 x2 . . . xm ) = 1. If xi = 1 (resp. 0), put the
node in V1 labeled xi (resp. ¬xi ) in the vertex cover C; then every variable gadget edge is
covered. In each clause gadget, pick one node labeled by a literal that the assignment makes
true (arbitrarily if there is more than one), and put the other two nodes of the gadget in C; then
all clause gadget edges are covered.18 All edges between variable and clause gadgets are
covered, by the variable node if it was included in C, and by a clause node otherwise,
since some other literal satisfies the clause. Since |C| = m + 2l = k, G has a vertex cover
of size k.
( ⇐= ): Suppose G has a vertex cover C with k nodes. Then C contains at least one of
each variable node to cover the variable gadget edges, and at least two clause nodes to
cover the clause gadget edges. This is ≥ k nodes, so C must have exactly one node per
variable gadget and exactly two nodes per clause gadget to have exactly k nodes. Let
xi = 1 ⇐⇒ the variable node labeled xi ∈ C. Each node in a clause gadget has an
external edge to a variable node; since only two nodes of the clause gadget are in C,
the external edge of the third clause gadget node is covered by the variable-gadget node at its
other endpoint, which is therefore in C; that literal is set true by the assignment, whence the assignment satisfies the corresponding clause.
7.5.2 The Subset Sum Problem
Theorem 7.5.3. SubsetSum is NP-complete.
skip in lecture
Proof. Theorem 7.3.6 tells us that SubsetSum ∈ NP. We show SubsetSum is NP-hard by reduction from 3Sat.
18 Any two clause gadget nodes will cover all three clause gadget edges.
Let φ be a 3CNF formula with variables x1 , . . . , xm and clauses c1 , . . . , cl . Construct the pair
hS, ti, where S = {y1 , z1 , . . . , ym , zm , g1 , h1 , . . . , gl , hl } is a collection of 2(m + l) integers and t is an
integer, whose decimal expansions are based on φ as shown by example:
(x1 ∨ ¬x2 ∨ x3 ) ∧ (¬x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ ¬x3 ∨ x4 )

           x1 x2 x3 x4   c1 c2 c3
    y1 =    1  0  0  0    1  0  1
    z1 =    1  0  0  0    0  1  0
    y2 =       1  0  0    0  1  0
    z2 =       1  0  0    1  0  0
    y3 =          1  0    1  1  0
    z3 =          1  0    0  0  1
    y4 =             1    0  0  1
    z4 =             1    0  0  0
    g1 =                  1  0  0
    h1 =                  1  0  0
    g2 =                     1  0
    h2 =                     1  0
    g3 =                        1
    h3 =                        1
     t =    1  1  1  1    3  3  3
The upper-left and bottom-right of the table contain exactly one 1 per row as shown, the
bottom-left is all empty (leading 0’s), and t is m 1’s followed by l 3’s. The upper-right of the table
has 1’s to indicate which literals (yi for xi and zi for ¬xi ) are in which clause. Thus each column in
the upper-right has exactly three 1’s.
The table has size O((m + l)^2), so the reduction can be computed in time O(n^2), since m, l ≤ n.
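The table translates directly into Python; the clause encoding below (literal i for xi, -i for ¬xi, with the number of variables passed explicitly) is an assumption for illustration, and the integers are built digit by digit exactly as in the table:

def reduction_from_3sat_to_subset_sum(num_vars, clauses):
    """ Sketch of the reduction illustrated above. Returns (S, t) as Python ints
    whose decimal digits follow the columns x_1..x_m, c_1..c_l. """
    m, l = num_vars, len(clauses)
    def column(c):
        # decimal value of a 1 in the c-th column (columns 0..m-1 are variables, m..m+l-1 clauses)
        return 10 ** (m + l - 1 - c)
    S = []
    for i in range(1, m + 1):
        y = column(i - 1) + sum(column(m + j) for j, cl in enumerate(clauses) if i in cl)
        z = column(i - 1) + sum(column(m + j) for j, cl in enumerate(clauses) if -i in cl)
        S += [y, z]
    for j in range(l):            # the "slack" numbers g_j and h_j
        S += [column(m + j), column(m + j)]
    t = sum(column(i) for i in range(m)) + sum(3 * column(m + j) for j in range(l))
    return S, t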
We now show that

φ is satisfiable ⇐⇒ (∃S′ ⊆ S) t = Σ_{n∈S′} n.
( =⇒ ): Suppose (∃x1 x2 . . . xm ∈ {0, 1}^m) φ(x1 x2 . . . xm ) = 1. Select elements S′ ⊆ S as follows.
For each 1 ≤ i ≤ m, yi ∈ S′ if xi = 1, and zi ∈ S′ if xi = 0. Since every clause cj of
φ is satisfied, for each column cj , at least one row with a 1 in column cj is selected in the
upper-right. For each 1 ≤ j ≤ l, if needed gj and/or hj are placed in S′ to make column cj
on the right side of the table sum to 3.19 Then S′ sums to t.
( ⇐= ): Suppose (∃S′ ⊆ S) t = Σ_{n∈S′} n. All the digits in elements of S are either 0 or 1, and each
column in the table contains at most five 1’s; hence there are no carries when summing S′.
To get a 1 in the first m columns, S′ must contain exactly one of each pair yi and zi . The
19 One or both will be needed if the corresponding clause has some false literals, but at least one true literal must be present to satisfy the clause, so no more than two extra 1’s are needed from the bottom-right to sum the whole column to 3.
assignment x1 x2 . . . xm ∈ {0, 1}^m is given by letting xi = 1 if yi ∈ S′, and letting xi = 0 if
zi ∈ S′. Let 1 ≤ j ≤ l; in each column cj , to sum to 3 in the bottom row, S′ contains at least
one row with a 1 in column j in the upper-right, since only two 1’s are in the lower-right.
Hence cj is satisfied for every j, whence φ(x1 x2 . . . xm ) = 1.
A Brief History of the P versus NP Problem
There is a false folklore of the history of the P versus NP problem that is not uncommon to
encounter,20 and the story goes like this: First, computer scientists defined the class P (that part
is true) and the class NP (that part is false), and raised this profound question asking whether
P = NP. Then Stephen Cook and Leonid Levin discovered the NP-complete problems (that part is
true), and this gave everyone the tools to now finally attack the P ≠ NP problem, since by showing
that one of the NP-complete problems is in P, this would show that P = NP.
There is a minor flaw and a major flaw to this story. The minor flaw is that it is widely believed
that in fact P ≠ NP. Furthermore, the NP-complete problems are not needed to show this. The
existence of a single problem in NP that is not in P suffices to show that P ≠ NP. The problem does
not need to be NP-complete for this to work.21 Cook and Levin were not after a way to resolve
P versus NP, which leads to the major flaw in the story: the NP-complete problems came first,
not the class NP. The class NP was only defined to capture a large class of problems that Sat is
complete for (and hopefully large enough to be bigger than P).
No one would care about the class NP if not for the existence of the NP-complete problems.
NP is a totally unrealistic model of computation. Showing that a problem is in NP does not
automatically lead to a better way to decide it, in any realistic sense.
Beginning in the late 1940s, researchers in the new field of computer science spent a lot of time
searching for algorithms for important optimization problems encountered in engineering, physics,
and other fields that wanted to make use of the new invention of the computer to automate some
of their calculation needs.22
Some of these problems, such as sorting and searching, have obvious efficient algorithms,23
although clever tricks like divide-and-conquer recursion could reduce the time even further for
certain problems such as sorting. Some problems, such as linear programming and maximum flow,
seemed to have exponential brute-force algorithms as their only obvious solution, although clever
tricks (Dantzig’s simplex algorithm in the case of linear programming, Ford-Fulkerson in the case
20 The first sentence of Section 7.4 of Sipser’s textbook appears to propagate this folklore.
21 In fact, Andrei Kolmogorov suggested that perhaps the problems that are now known to be NP-complete might contain too much structure to facilitate a proof of intractability, and suggested less natural, but more “random-looking” languages, as candidates for provably intractable problems, whose intractability would imply the intractability of the more natural problems.
22 Inspired in part by the algorithmic wartime work of some of those same researchers in World War II; Dantzig’s simplex algorithm for linear programming was the result of thinking about how to solve problems with linear constraints, such as: “If transportation with trucks costs $10/soldier, $4/ton of food, subject to the constraint that each truck can carry weight subject to the linear tradeoff of 1 ton of food for every 10 soldiers, and each soldier requires 20 pounds of food/week, how can we move the maximum number of soldiers in 30 trucks, for less than $50,000?”
23 Though it would not be until the late 1960’s that anyone would even think of polynomial time as the default definition of “efficient”.
of network flow)24 would allow for a much faster algorithm than brute-force search. Still other
problems, such as the traveling salesman problem and vertex cover, resisted all efforts to find an
efficient exact algorithm.
After decades of effort by hundreds of researchers to solve such problems failed to produce
efficient algorithms, many suspected that these problems had no efficient algorithms. The theory of
NP-completeness provides an explanation for why these problems are not feasibly solvable. They are
important problems regardless of whether they are contained in NP; what makes them most likely
intractable is the fact that they are NP-hard. In other words, they are intractable because they are
at least as hard as every problem in a class that apparently (though unfortunately not yet provably)
defines a larger class of problems than P. In better-understood models of computation such as finite
automata, a simulation of a nondeterministic machine by a deterministic machine necessitates an
exponential blowup (in states in the case of finite automata). Our intuition is that, though this has
not been proven, the same exponential blowup (in running time now instead of states) is necessary
when simulating an NTM with a deterministic TM. The NP-complete problems are believed to
be difficult to the extent that this intuition is correct. While this is not a proof that those problems
are intractable, it does mean that if those problems are tractable, then something is seriously
wrong with our current understanding and intuition of computation: magical nondeterministic
computers could always be simulated by real computers for negligible extra cost.25
Prior to the theory of NP-completeness, algorithms researchers – and engineers and scientists
in need of those algorithms – were lost in the dark, only able to conclude that a decidable problem
was probably too difficult to have a fast algorithm if considerable cost invested in attempting to
solve it had failed to produce results. The theory of NP-completeness provides a systematic method
of, if not proving, providing evidence for the intractability of a problem: show that the problem is
NP-complete by reducing an existing NP-complete problem to it.26
Scott Aaronson’s paper NP-complete Problems and Physical Reality 27 contains some of the
most convincing arguments that I have seen justifying the intuition that NP contains fundamentally
hard problems. Anyone who would consider taking the position that “The lack of a proof that
P ≠ NP implies that there is a non-negligible chance that NP-complete problems are tractable”
owes it to themselves to read that paper.
24 Although technically the original versions of both of those methods were exponential time in the worst case, they nonetheless avoided brute-force search in ingenious ways, and led eventually to provably polynomial-time algorithms.
25 We have deeper reasons for thinking that P ≠ NP besides the hand-waving argument, “C’mon! How could they be equal?!” For example, cryptography as we know it would be impossible, and the discovery of proofs of mathematical theorems could be automated. In a depressing philosophical sense, creativity itself could be automated: as Scott Aaronson said, “Everyone who could appreciate a symphony would be Mozart; everyone who could follow a step-by-step argument would be Gauss; everyone who could recognize a good investment strategy would be Warren Buffett.” http://www.scottaaronson.com/blog/?p=122
26 For this we probably have Richard Karp to thank more than Cook or Levin, who, one year after the publication of Cook’s 1971 paper showing Sat is NP-complete, published a paper listing 21 famous, important and practical decision problems, involving graphs, job scheduling, circuits, and other combinatorial objects, showing them all to be NP-complete by reducing Sat to them (or reducing them to each other). After that, the floodgates opened, and by the mid-1970’s, hundreds of natural problems had been shown to be NP-complete.
27 http://arxiv.org/abs/quant-ph/0502072
Chapter 4
Decidability
4.1 Decidable Languages
Reading assignment: Section 4.1 in Sipser.
A decidable language is a language decided by some Turing machine (or equivalently, by some
C++ program). Sipser gives extensive examples of decidable languages, all of which are related to
finite automata or pushdown automata in some way. We won’t do this, for two reasons:
1. It gives the impression that even now that we are studying computability, the entire field is just
an elaborate ruse for asking more sophisticated questions about finite-state and pushdown
automata. This is not at all the case; almost everything interesting that we know about
computability has nothing to do with finite automata or pushdown automata.
2. You have extensive experience with decidable languages, because every time that you wrote
a C++ function with a bool return type, that algorithm was recognizing some c.e. language:
specifically, the set of all input arguments that cause the function to return true. If it had
no infinite loops, then it was deciding that language.
Because years of programming have already developed in you an intuition for what constitutes a
decidable language, we will not spend much time on this. It is sufficient that Chapter 3 convinced
you that Turing machines have the same fundamental computational capabilities as C++ programs,
and that therefore all the intuition you have developed in programming, algorithms, and other
classes in which you wrote algorithms, applies to Turing machines as well.
We now concentrate on the primary utility of Turing machines: using their apparent simplicity
to prove limitations on the fundamental capabilities of algorithms.
It is common to write algorithms in terms of the data structures they are operating on, even
though these data structures must be encoded in binary before delivering them to a computer (or
in some alphabet Σ before delivering them to a TM). Given any finite object O, such as a string,
graph, tuple, or even a Turing machine, we use the notation hOi to denote the encoding of O as a
string in the input alphabet of the Turing machine we are using. To encode multiple objects, we
use the notation hO1 , O2 , . . . , Ok i to denote the encoding of the objects O1 through Ok as a single
string.
4.2 The Halting Problem
Reading assignment: Section 4.2 in Sipser.
Sipser defines something he calls the halting problem ATM , but confusingly it is not a question
about halting, but about accepting. He then later defines a problem HALTTM that really is
a question about halting. We swap their roles so that we define HALTTM to be “the halting
problem”.
Define the Halting Problem (or Halting Language)
HALTTM = { hM, wi | M is a TM and M halts on input w } .
HALTTM is sometimes denoted K or 0′.
Note that HALTTM is c.e., via the following Turing machine Hr :
Hr = “On input hM, wi, where M is a TM and w is a string:
1. Simulate M on input w.
2. If M ever halts, accept.”
This can be implemented in Python via
def halt_recognizer(M_src, w):
    """ M_src is the source code of a Python function named M,
    and w is an input to M. """
    exec(M_src)  # this defines the function M so we can execute it
    M(w)         # this actually executes the function on input w
    return True  # this is only reached if M(w) halts
Here is halt_recognizer in action on a few strings that are the source code of Python functions.
M_src = """
def M(w):
    '''Indicates if w has > 5 occurrences of the symbol "a".'''
    count = 0
    for c in w:
        if c == "a":
            count += 1
    return count > 5
"""
w = "abcabcaaabcaa"
halt_recognizer(M_src, w)  # will halt since M always halts

M_src = """
def M(w):
    '''Has a potential infinite loop.'''
    count = 0
    while count < 5:
        c = w[0]
        if c == "a":
            count += 1
    return count > 5
"""
w = "abcabcaaabcaa"
halt_recognizer(M_src, w)  # will halt since w[0] == "a"

# WARNING: will not halt!
# After running this, you'll need to kill the interpreter
w = "bcabcaaabcaa"
halt_recognizer(M_src, w)  # won't halt since w[0] != "a"
LECTURE: end of day 17
A note about programs versus source code of programs. We have taken the convention
of representing “programs/algorithms” as Python functions. Above, we showed that there is a
Python function halt_recognizer that recognizes HALTTM taking as input two strings M_src
and w. The built-in Python function exec gives a way to start from Python source code defining a
function and convert it into an actual Python function that can be called. It is worth noting that
these are two different types of objects, similarly to the fact that a string hGi representing a graph
G = (V, E) is a different type of object than G itself. G is a pair consisting of a set V and a set
E of pairs of elements from V , whereas hGi is a sequence of symbols from the alphabet {0, 1}∗ .
However, the encoding function we defined allows us to interconvert between these two, and it is
sometimes more convenient to speak of G as though it is a graph, and sometimes more convenient
to speak of G as though it is a string representing a graph.
Similarly, Python gives an easy way to interconvert between Python functions that can be
called, and strings that contain properly formatted Python source code describing those functions.
To convert source code to a function, use exec as above. To convert a function to source code,
use the inspect library. The following code snippet shows how to do this, and even how to do a
simple modification on the source code that changes its behavior, which is reflected if the code is
“re-compiled” using exec.
def M(w):
    """ Indicates if |w| > 3 """
    return len(w) > 3

print 'M("0101") = {}'.format(M("0101"))

# use inspect.getsource to convert function to source code
import inspect
M_src = inspect.getsource(M)
print "source code of M:\n{}".format(M_src)

M_src = M_src.replace("len(w) > 3", "len(w) > 5")

# use exec to convert source code to function
exec(M_src)

print 'M("0101") = {}'.format(M("0101"))
print "source code of M:\n{}".format(M_src)
From now on, we will only deal directly with source code strings when it is more convenient.
Since Python allows functions to be “first-class” objects that can be passed as data to other
functions, it is often more convenient to deal with them directly. This is analogous to our choice
in the time complexity chapter to encode an object such as a graph as a pair of lists, rather than
dealing with string encodings of graphs.
With this in mind, we can rewrite the above code for halt_recognizer to deal directly with
Python functions rather than source code, knowing that if we had source code instead, we could
simply use exec to convert it to a Python function.
def halt_recognizer(M, w):
    """ M is a Python function, and w is an input to M. """
    M(w)         # this executes the function M on input w
    return True  # this is only reached if M(w) halts
def M(w):
    """ Indicates if w has > 5 occurrences of the symbol "a". """
    count = 0
    for c in w:
        if c == "a":
            count += 1
    return count > 5

w = "abcabcaaabcaa"
halt_recognizer(M, w)  # will halt since M always halts

def M(w):
    """ Has a potential infinite loop. """
    count = 0
    while count < 5:
        c = w[0]
        if c == "a":
            count += 1
    return count > 5

w = "abcabcaaabcaa"
halt_recognizer(M, w)  # will halt since w[0] == "a"

# WARNING: will not halt!
# After running this, you'll need to shut down the interpreter
w = "bcabcaaabcaa"
halt_recognizer(M, w)  # won't halt since w[0] != "a"
We will show that HALTTM is not decidable:
Theorem 4.2.1. HALTTM is not decidable.
The proof technique, surprisingly, is 120 years old.
But before getting to it, we will first use the machinery of reductions, together with Theorem 4.2.1, to show that other problems are undecidable, by showing that if they could be decided,
then so could HALTTM , contradicting Theorem 4.2.1.
Chapter 5
Reducibility
Now that we know the halting problem is undecidable, we can use that as a shoehorn to prove
other languages are undecidable, without having to repeat the full diagonalization.
Although the textbook eventually gets into mapping reductions as a formal tool to prove problems are undecidable, we don’t formally use them in this chapter. For the curious, they are the
same as the mapping reductions used for NP-completeness, but without the polynomial-time constraint. We also allow more general reductions, in which problem A is reducible to problem B if
“an algorithm MB for B can be used as a subroutine to create an algorithm MA for A”. A mapping
reduction f reducing A to B is the special case when the algorithm MA is of the form “on input x,
return MB (f (x)).” We allow the reduction to change the answer, or to call MB more than once,
for example.
5.1 Undecidable Problems from “Language Theory” (Computability)
Reading assignment: Section 5.1 in Sipser.
Recall
HALTTM = { hM, wi | M is a TM and halts on input w } .
Define
ATM = { hM, wi | M is a TM and accepts w } .
Theorem 5.1.1. ATM is undecidable.
Proof. Suppose for the sake of contradiction that ATM is decidable by TM A. For any TM M , let
MO be the TM that is identical to M , but swaps accept and reject states (MO returns the Opposite
Boolean value that M returns). (See after the proof for a Python implementation.)
Define the TM H as follows on input hM, wi:
1. If A accepts hM, wi or hMO , wi, accept.
2. Otherwise, reject.
89
90
CHAPTER 5. REDUCIBILITY
If M (w) halts, then it either accepts or rejects. If M (w) rejects, then MO (w) accepts. Thus A
accepts at least one of hM, wi or hMO , wi, and H(hM, wi) accepts.
Conversely, if M (w) does not halt, then neither does MO (w), so in particular neither accepts,
so A rejects both hM, wi and hMO , wi, and H(hM, wi) rejects.
But H decides HALTTM , a contradiction since HALTTM is undecidable. Hence A does not
exist and ATM is undecidable.
We can implement this in Python with
1 def A (M , w ) :
2
""" Function that supposedly decides whether M ( w ) accepts ,
3
where M is a function and w an input to M . """
4
raise N o t Im p l em e n te d E rr o r ()
5
6 def opposite ( M ) :
7
""" Given M , a Python function returning a Boolean , returns
8
function M_O that acts like M , but returns the opposite value
9
( but also loops whenever M loops ) . """
10
def M_O ( w ) :
11
return not M ( w )
12
return M_O
13
14 def H (M , w ) :
15
""" Decider for HALT_TM that works assuming A works ,
16
giving a contradiction and proving A cannot be implemented . """
17
M_O = opposite ( M )
18
return A (M , w ) or A ( M_O , w )
The functions A and H do not exist, but we can see what the function opposite does:
def M(x):
    count = 0
    for c in x:
        if c == "a":
            count += 1
    return count > 5

M_O = opposite(M)

print "M('abababab')   = {}".format(M('abababab'))
print "M_O('abababab') = {}".format(M_O('abababab'))
which prints
M(’abababab’)
= False
M_O(’abababab’) = True
Essentially, HALTTM and ATM are undecidable “for the same reason”.
Define the empty string halting problem HALT^ε_TM = { hM i | M is a TM and M (ε) halts }.

Theorem 5.1.2. HALT^ε_TM is undecidable.
First, suppose we knew already that HALT^ε_TM was undecidable, but not whether HALTTM was
undecidable. Then we could easily use the undecidability of HALT^ε_TM to prove that HALTTM is
undecidable, using the same pattern we have used already.
That is, suppose for the sake of contradiction that HALTTM was decidable by TM H. Then to
decide whether M (ε) halts, we use the decider H to determine whether hM, εi ∈ HALTTM . That
is, an instance of HALT^ε_TM is a simple special case of an instance of HALTTM ,1 so HALTTM is at
least as difficult to decide as HALT^ε_TM . Proving that HALT^ε_TM is undecidable means showing that
HALTTM is not actually any more difficult to decide than HALT^ε_TM .
Proof. Suppose for the sake of contradiction that HALT^ε_TM is decidable by TM Hε . Define the TM
H as follows on input hM, wi:

1. Define the TM NM,w based on M and w as follows:
   (a) On every input x,
   (b) Run M (w) and halt if M (w) does.

2. Run Hε (hNM,w i) and echo its answer.

Given M and w, we define NM,w so that, if M (w) halts, then NM,w halts on all inputs (including ε),
and if M (w) does not halt, then NM,w halts on no inputs (including ε). Then H decides HALTTM ,
a contradiction.
The following Python code implements this idea:
def H_e(M):
    """ Function that supposedly decides whether M("") halts. """
    raise NotImplementedError()

def H(M, w):
    """ Decider for HALT_TM that works assuming H_e works, giving
    a contradiction and proving H_e cannot be implemented. """
    def N_Mw(x):
        """ Halts on every string if M halts on w, and loops on
        every string if M loops on w. """
        M(w)
    return H_e(N_Mw)
1 Just as, for instance, determining whether two nodes are connected in a graph is a special case of finding the length of the shortest path between them (the length being finite if and only if there is a path), which involves determining whether there is a path as a special case of the general task of finding the optimal length of the path.
Define
ETM = { hM i | M is a TM and L(M ) = ∅ } .
Theorem 5.1.3. ETM is undecidable.
Proof. For any TM M and string w, define the TM NM,w as follows:
NM,w on input x:
1. Run M (w)
2. accept
If M halts on input w, then L(NM,w ) = {0, 1}∗ ≠ ∅. If M loops on input w, then L(NM,w ) = ∅.
Suppose for the sake of contradiction that ETM is decidable by TM E. Define the TM H as
follows on input hM, wi:
1. Run E(hNM,w i)
2. If E accepts hNM,w i, reject
3. If E rejects hNM,w i, accept
Therefore, for TM M and string w,

M halts on input w ⇐⇒ L(NM,w ) ≠ ∅          (defn of NM,w )
                    ⇐⇒ E rejects hNM,w i      (since E decides ETM )
                    ⇐⇒ H accepts hM, wi .     (lines (2) and (3) of H)

Since H always halts, it decides HALTTM , a contradiction.
The following Python code implements this idea:
def E(M):
    """ Function that supposedly decides whether L(M) is empty. """
    raise NotImplementedError()

def H(M, w):
    """ Decider for HALT_TM that works assuming E works,
    giving a contradiction and proving E cannot be implemented. """
    def N_Mw(x):
        """ Accepts every string if M halts on w, and loops on
        every string if M loops on w. """
        M(w)
        return True
    return not E(N_Mw)
In general, the following is true: any question about the “eventual behavior” of an algorithm is
undecidable.
The following are all questions about the eventual behavior of an algorithm, which can be
proven undecidable by showing that the halting problem can be reduced to them. This includes
questions of the form, given a program P and input x:
• Does P accept x?
• Does P reject x?
• Does P loop on x?
• Does P (x) ever execute line 320?
• Does P (x) ever call subroutine f ?
• Does P (x) print the string 42?
• Does P (x) ever print a string?
Similar questions about general program behavior over all strings are also undecidable, i.e.,
given a program P :
• Does P accept at least one input?
• Does P accept all inputs?
• Is P total? (i.e., does it halt on all inputs?)
• Is there an input x such that P (x) returns the value 42? (This is undecidable even when we
restrict to total programs P ; so even if we knew in advance that P halts on all inputs, there
is no algorithm that can find an input x such that P (x) returns the value 42)
• Does some input cause P to call subroutine f ?
• Do all inputs cause P to call subroutine f ?
• Does P decide the same language as some other program Q?
• Does P accept any inputs that represent a prime number?
• Does P decide the language {hni | n is prime}?
• Is P a polynomial-time machine? (this is not decidable even if we are promised that P is
total)
• Is P ’s running time O(n^3)?
However, there are certainly questions one can ask about a program P that are decidable. For
example, if the question is about the syntax, and not the eventual behavior, then the problem can
be decidable:
• Does P have at least 100 lines of code? (Turing machine equivalent of this question would
be, “Does M have at least 100 states?”)
• (Interpreting P as a TM) Does P have 2 and 3 in its input alphabet?
• (Interpreting P as a TM) Does P halt immediately because its start state is equal to the
accept or reject state?
• Does P have a subroutine named f ?
• (Interpreting P as a TM) Do P ’s transitions always move the tape head right? (If so, then
we could decide whether P halts on a given input, because it either halts after moving the
tape head beyond the input upon reading enough blanks, or repeats a non-halting state while
reading blanks and will therefore move the tape head right forever.)
• Does P “clock itself” to halt after n^3 steps on inputs of length n, guaranteeing that its running
  time is O(n^3)? One way to do this is to use a worktape to write the string 0^(n^3) and reset the
  tape head to the left before doing anything else, then move that tape head right once per
  step (in addition to the computation being done with all the other tapes), and immediately
  halt when that worktape reaches the blank to the right of the last 0.2
Furthermore, many questions about the eventual behavior of P on input x are decidable as long
as the question contains a “time bound” that restricts the search, for example, given P and x:
• Does P (x) halt before |x|^3 steps? Note this is not the same as asking whether P is an
  O(n^3)-time program. It could take longer than 2^n on some other inputs and therefore not be
  O(n^3)-time-bounded, even though it halts in n^3 steps on this one input x.
• Is q4 the fifth state P (x) visits?3 Equivalently, is 320 the fifth line of code P (x) executes?
• Does P (x) (interpreted as a single-tape TM) ever move its tape head beyond position n^3?
  (where n = |x|) This one takes a bit more work to see.
  Suppose P (x) does not move its tape head beyond position n^3. Then P (x) has a finite
  number of reachable configurations, specifically, |Q| · n^3 · |Γ|^(n^3), since there are |Q| states, n^3
  possible tape head positions, and |Γ|^(n^3) possible values for the tape content. If one of these
  configurations repeats, then P runs forever. Otherwise, by the pigeonhole principle, P (x) can
  visit no more than |Q| · n^3 · |Γ|^(n^3) configurations before halting.
  So to decide if P (x) ever moves its tape head beyond position n^3, run it until it either moves
  its tape head beyond position n^3 (so the answer is “yes”) or repeats a configuration (so the
  answer is “no”). If neither of these happens, it will halt and the answer is “no”.
2 Note this is not the same as asking if P has running time O(n^3); it is asking whether this one particular method is employed to guarantee that the running time is O(n^3). If P simply had three nested for loops, each counting up to n, but no “self-clocking” as described, then the answer to this question would be “no” even though P is an O(n^3) program.
3 Note that even this question is decidable: given P , is q4 the fifth state visited on every input? Although there are an infinite number of inputs, the fifth transition can only read some bounded portion of the input, so by searching over the finite number of possibilities for the scanned portion of the input, we can answer the question without having to worry about what the input looks like outside the scanned portion.
5.1.1 A Non-c.e. Language
The language HALTTM is c.e., but not decidable. Since it is not decidable, its complement is not
decidable. Is its complement c.e.? In other words, is HALTTM co-c.e.?
We will show it is not, using the following theorem.
Theorem 5.1.4. A language is decidable if and only if it is c.e. and co-c.e.
Proof. We prove each direction separately.
(decidable =⇒ c.e. and co-c.e.): Any decidable language is c.e., and the complement of any
decidable language is decidable, hence also c.e.
(c.e. and co-c.e. =⇒ decidable): Let A be c.e. and co-c.e., let M1 be a TM recognizing A,
and let M2 be a TM recognizing A. Define the TM M as follows. On input w, M runs M1 (w)
and M2 (w) in parallel. One of them will accept since either w ∈ A or w ∈ A. If M1 accepts
first, then M accepts w, and if M2 accepts w first, then M (w) rejects. So M decides A.
The second part, running two TMs in parallel, can be implemented via the Python code:
import threading, Queue

def decider_from_recognizers(recognizer, comp_recognizer, w):
    """ recognizer is function recognizing some language A, and
    comp_recognizer is function recognizing complement of A. """
    outputs_queue = Queue.Queue()

    def recognizer_add_to_queue():
        output = recognizer(w)
        if output == True:
            outputs_queue.put("w in A")

    def comp_recognizer_add_to_queue():
        output = comp_recognizer(w)
        if output == True:
            outputs_queue.put("w not in A")

    t1 = threading.Thread(target=recognizer_add_to_queue)
    t2 = threading.Thread(target=comp_recognizer_add_to_queue)
    t1.daemon = t2.daemon = True
    t1.start()
    t2.start()

    # exactly one of the threads will put a message in the queue
    message = outputs_queue.get()
    if message == "w in A":
        return True
    elif message == "w not in A":
        return False
    else:
        raise AssertionError("should not be reachable")

def create_decider_from_recognizers(recognizer, comp_recognizer):
    def decider(w):
        return decider_from_recognizers(recognizer, comp_recognizer, w)
    return decider
We can test it as follows. Here, we make recognizers for the problem A = “is a given number
prime?”, and its complement “is a given number composite?”, and we intentionally let the
recognizers run forever when the answer is no, in order to demonstrate that the decider returned
by create_decider_from_recognizers still halts.4
def loop():
    """ Loop forever. """
    while True:
        pass

def prime_recognizer(n):
    """ Check if n is prime. """
    if n < 2:  # 0 and 1 are not primes
        loop()
    for x in xrange(2, int(n**0.5) + 1):
        if n % x == 0:
            loop()
    return True

def composite_recognizer(n):
    if n < 2:  # 0 and 1 are not primes
        return True
    for x in xrange(2, int(n**0.5) + 1):
        if n % x == 0:
            return True
    loop()

# WARNING: in ipython notebook, this continues to consume resources
# after it halts because it starts up threads that run forever
prime_decider = create_decider_from_recognizers(prime_recognizer,
                                                composite_recognizer)
for n in range(2, 20):
    print "{:2} is prime? {}".format(n, prime_decider(n))

4 If you run this from the command line, it will halt as expected. Unfortunately, if you run this code within ipython notebook, the threads continue to run even after the notebook cell has been executed, since the threads run as long as the program that created them is still going, and the “program” is considered the whole notebook, not an individual cell. To shut down these threads and keep them from continuing to run and consume resources, you will have to shut down the ipython notebook.
Corollary 5.1.5. The complement of HALTTM is not c.e.

Proof. HALTTM is c.e. If its complement were also c.e., then HALTTM would be decidable by Theorem 5.1.4,
a contradiction.
Note the key fact used in the proof of Corollary 5.1.5 is that the class of c.e. languages is not
closed under complement. Hence any language that is c.e. but not decidable (such as HALTTM )
has a complement that is not c.e.
LECTURE: end of day 18
Chapter 4
Decidability
4.2 The Halting Problem

4.2.2 Diagonalization
Cantor considered the question: given two infinite sets, how can we decide which one is “bigger”?
We might say that if A ⊆ B, then |A| ≤ |B|.1 But this is a weak notion, as the converse does not
hold even for finite sets: {1, 2} is smaller than {3, 4, 5}, but {1, 2} ⊈ {3, 4, 5}.
Cantor noted that two finite sets have equal size if and only if there is a bijection between them.
Extending this slightly, one can say that a finite set A is strictly smaller than B if and only if there
is no onto function f : A → B.
Figure 4.1: All the functions f : {1, 2} → {3, 4, 5}. None of them are onto.
Figure 4.1 shows that there is no onto function f : {1, 2} → {3, 4, 5}: there are only two values
of f , f (1) and f (2), but there are three values 3,4,5 that must be mapped to, so at least one of 3,
4, or 5 will be left out (will not be an output of f ), so f will not be onto. We conclude the obvious
fact that |{1, 2}| < |{3, 4, 5}|.
Conversely, if there is an onto function f : A → B, then we can say that |A| ≥ |B|. For instance,
f (3) = f (4) = 1 and f (5) = 2 is an onto function f : {3, 4, 5} → {1, 2}, so we conclude the obvious
fact that |{3, 4, 5}| ≥ |{1, 2}|.
Since the notion of onto functions is just as well defined for infinite sets as for finite sets, this
gives a reasonable notion of how to compare the cardinality of infinite sets.
Let’s consider two infinite sets: N = {0, 1, 2, . . .} and Z = {. . . , −2, −1, 0, 1, 2, . . .}.

1 For instance, we think of the integers as being at least as numerous as the even integers.
What’s an onto function from Z to N? The absolute value function works: f (n) = |n|. In fact,
for any sets A, B where A ⊆ B, we have |A| ≤ |B|, i.e., there is an onto function f : B → A. What
is it? One choice is this: pick some fixed element a0 ∈ A. Then define f for all x ∈ B by f (x) = x
if x ∈ A, and f (x) = a0 otherwise. For f : Z → N, one possibility is to let f (n) = n if n ≥ 0 and
let f (n) = 0 if n < 0.
How about going the other way? What’s an onto function from N to Z? This doesn’t look quite
as easy; it seems like there are “more” integers than nonnegative integers, since N ( Z. But, we
can find an onto function:
f (0) = 0
f (1) = 1
f (2) = −1
f (3) = 2
f (4) = −2
f (5) = 3
f (6) = −3
...
Symbolically, we can express f as f (n) = −n/2 if n is even and ⌈n/2⌉ if n is odd. But functions
don’t always need to be expressed symbolically. The partial enumeration above is a perfectly clear
way to express the same function.
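As a quick sanity check, here is a minimal Python sketch of this function (not part of the formal development):

def f(n):
    """ The onto function f: N -> Z described above. """
    return -n // 2 if n % 2 == 0 else (n + 1) // 2

# f(0), f(1), f(2), f(3), f(4), f(5) = 0, 1, -1, 2, -2, 3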
What about N and Q+ (the positive rational numbers)? Since the positive integers are a subset of Q+ , |N| ≤ |Q+|. What about
the other way? Is there an onto function f : N → Q+ ?
With N, there’s one nice way to think about onto functions f with N as the domain. We can
think of defining f : N → Q+ via r0 = f (0), r1 = f (1), . . ., where each rn ∈ Q+ . In other words,
we can describe f by describing a way to enumerate all the rational numbers in order r0 , r1 , . . ..
What order should we pick?
Each rational number r is defined by a pair of integers p, q ∈ Z+ , where r = p/q. The following
order doesn’t work: r0 = 1/1, r1 = 1/2, r2 = 1/3, r3 = 1/4, . . . , r? = 2/1, r? = 2/2, r? = 2/3, . . . There’s infinitely
many possible denominators q, so we can’t get through them all and then switch the numerator.
We need some way of changing both numerator and denominator to make sure all pairs of integers
get “visited” in this list.
Here’s one way to do it: 1/1, 1/2, 2/1, 3/1, 2/2, 1/3, 1/4, 2/3, 3/2, 4/1, 1/5, 2/4, 3/3, 4/2, 5/1, 1/6, . . . In other words, enumerate all
p, q ∈ Z+ where p + q = 2 (which is just p = q = 1), then all p, q ∈ Z+ where p + q = 3 (of which
there are 2), then all p, q ∈ Z+ where p + q = 4 (of which there are 3), then all p, q ∈ Z+ where
p + q = 5 (of which there are 4), etc.
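This enumeration is easy to write as a Python generator; the sketch below is only an illustration of the ordering just described (repetitions such as 1/1 and 2/2 are fine, since an onto function need not be one-to-one):

from fractions import Fraction

def positive_rationals():
    """ Enumerate Q+ by increasing p + q: for each s = 2, 3, 4, ...,
    yield p/(s - p) for p = 1, ..., s - 1. Every positive rational appears. """
    s = 2
    while True:
        for p in range(1, s):
            yield Fraction(p, s - p)
        s += 1

# e.g., itertools.islice(positive_rationals(), 6) gives 1, 1/2, 2, 3, 1, 1/3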
So it seems with all these infinite sets, we can find an onto function from one to the other.
Perhaps |A| = |B| for all infinite sets?
The following theorem, proved by Cantor in 1891, changed the course of science.
Theorem 4.2.1. Let X be any set. Then there is no onto function f : X → P(X).
Proof. Let X be any set, and let f : X → P(X). It suffices to show that f is not onto.
Define the set

D = { a ∈ X | a ∉ f (a) } .

Let a ∈ X be arbitrary. Since D ∈ P(X), it suffices to show that D ≠ f (a). By the definition of
D,

a ∈ D ⇐⇒ a ∉ f (a),

so D ≠ f (a).
The interpretation is that |X| < |P(X)|, even if X is infinite.
This proof works for any set X at all. In the special case where X = N, we can visualize why
this technique is called “diagonalization”. Suppose for the sake of contradiction that there is an onto
function f : N → P(N); then we can enumerate the subsets of N in order S0 = f (0), S1 = f (1),
S2 = f (2), . . .. Each set S ⊆ N can be represented by an infinite binary sequence χS , where the
n’th bit χS [n] = 1 ⇐⇒ n ∈ S. Those sequences are the rows of the following infinite matrix:
                              0   1   2   3   . . .   a   . . .
S0 = {1, 3, . . .}            0   1   0   1   . . .
S1 = {0, 1, 2, 3, . . .}      1   1   1   1   . . .
S2 = {2, . . .}               0   0   1   0   . . .
S3 = {0, 2, . . .}            1   0   1   0   . . .
...
Sa = D = {0, 3, . . .}        1   0   0   1   . . .   ?
If D is in the range of f , then Sa = D for some a ∈ N. But D is defined so that χD is the bitwise
negation of the diagonal of the above matrix. So if D appears as row a, this gives a contradiction
when we ask what is the bit at row a and column a.
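To see the diagonal construction in action, here is a small Python sketch (not from the original text) using the finite fragment of the matrix above: whatever function f we are handed, the set D it builds disagrees with f(a) at element a, so D is missed by f.

def diagonal_set(f, X):
    """Return D = { a in X : a not in f(a) }, which differs from f(a) for every a in X."""
    return {a for a in X if a not in f(a)}

# The rows S0, ..., S3 of the matrix above, restricted to columns {0, 1, 2, 3}:
S = {0: {1, 3}, 1: {0, 1, 2, 3}, 2: {2}, 3: {0, 2}}
D = diagonal_set(lambda a: S[a], S.keys())
print(D)                              # {0, 3}
print(all(D != S[a] for a in S))      # True: D equals no row of the matrix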
For two sets X, Y , we write |X| < |Y | if there is no onto function f : X → Y . We write
|X| ≥ |Y | if it is not the case that |X| < |Y |; i.e., if there is an onto function f : X → Y . We write
|X| = |Y | if there is a bijection (a 1-1 and onto function) f : X → Y .2
We say X is countable if |X| ≤ |N|;3 i.e., if X is a finite set or if |X| = |N|.4 We say X is
uncountable if it is not countable; i.e., if |X| > |N|.5
Stating that a set X is countable is equivalent to saying that its elements can be listed; i.e.,
that it can be written X = {x0 , x1 , x2 , . . .}, where every element of X will appear somewhere in
the list.6
Footnote 2: By a result known as the Cantor-Bernstein Theorem, this is equivalent to saying that there is an onto
function f : X → Y and another onto function g : Y → X; i.e., |X| = |Y | if and only if |X| ≤ |Y | and |X| ≥ |Y |.
Footnote 3: Some textbooks define countable only for infinite sets, but here we consider finite sets to be countable,
so that uncountable will actually be the negation of countable.
Footnote 4: It is not difficult to show that every infinite countable set has the same cardinality as N; i.e., there is
no infinite countable set X with |X| < |N|.
Footnote 5: The relations <, >, ≤, ≥, and = are transitive: for instance, (|A| ≤ |B| and |B| ≤ |C|) =⇒ |A| ≤ |C|.
Footnote 6: This is because the order in which we list the elements implicitly gives us the bijection between N and
X: f (0) = x0 , f (1) = x1 , etc.
Observation 4.2.2. |N| < |R|; i.e., R is uncountable.
Proof. By Theorem 4.2.1, |N| < |P(N)|, so it suffices to prove that |P(N)| ≤ |R|;7 i.e., that there is
an onto function f : R → P(N).
Define f : R → P(N) as follows. Each real number r ∈ R has an infinite decimal expansion.8
For all n ∈ N, let rn ∈ {0, 1, . . . , 9} be the nth digit of the decimal expansion of r. Define f (r) ⊆ N
as follows. For all n ∈ N,
n ∈ f (r) ⇐⇒ rn = 0.
That is, if the nth digit of r’s decimal expansion is 0, then n is in the set f (r), and n is not in the
set otherwise. Given any set A ⊆ N, there is some number rA ∈ R whose decimal expansion
has 0’s exactly at the positions n ∈ A, so f (rA ) = A, whence f is onto.
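As a finite illustration of this function (not from the original text), here is how f reads off a set from the digits of r, using only a prefix of the decimal expansion:

def zero_positions(digits):
    """Given a prefix of the decimal expansion of r (a string of digits),
    return the positions in that prefix holding the digit 0,
    i.e., a finite glimpse of f(r)."""
    return {n for n, d in enumerate(digits) if d == "0"}

# If r = 0.3041500..., its digits after the decimal point begin "3041500":
print(zero_positions("3041500"))   # {1, 5, 6}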
Continuum Hypothesis. There is no set A such that |N| < |A| < |P(N)|.
More concretely, this is stating that for every set A, either there is an onto function f : N → A,
or there is an onto function g : A → P(N).
Interesting fact: Remember earlier when we stated that Gödel proved that there are true
statements that are not provable? The Continuum Hypothesis is a concrete example of a statement
that, if it is true, is not provable; and if it is false, its negation is true but not provable. Either
way, it will forever remain a hypothesis; we can never hope to prove it either true or false.
Theorem 4.2.1 has immediate consequences for the theory of computing.
Observation 4.2.3. There is an undecidable language L ⊆ {0, 1}∗ .
Proof. The set of all TMs is countable, as is {0, 1}∗ . Since each TM decides at most one language,
the set of decidable languages is also countable. By Theorem 4.2.1, P({0, 1}∗ ), the set of all
binary languages, is uncountable. Therefore there is a language that is not decided by any TM.
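For intuition about the first claim (a sketch, not from the original text): {0, 1}∗ is countable because its elements can be listed in length-lexicographic order, so every binary string (and hence every TM description) appears at some finite position in the list.

from itertools import count, product

def binary_strings():
    """List {0,1}* in length-lexicographic order: "", 0, 1, 00, 01, 10, 11, 000, ..."""
    for n in count(0):
        for bits in product("01", repeat=n):
            yield "".join(bits)

gen = binary_strings()
print([next(gen) for _ in range(7)])   # ['', '0', '1', '00', '01', '10', '11']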
4.2.3 The Halting Problem is Undecidable
Observation 4.2.3 shows that some undecidable language must exist. However, it is more challenging to exhibit a particular undecidable language. In the next theorem, we use the technique of
diagonalization directly to show that
HALTTM = { hM, wi | M is a TM and M halts on input w }
is undecidable.
Theorem 4.2.4. HALTTM is undecidable.
Proof. Assume for the sake of contradiction that HALTTM is decidable, via the TM H. Then H
accepts hM, wi if M halts on input w, and H rejects hM, wi if M loops on input w.
Define the TM D as follows. On input hM i, where M is a TM, D runs H(hM, M i),9 and does the opposite.10
Now consider running D(hDi). Does it halt or not? We have

    D(hDi) halts ⇐⇒ H accepts hD, Di        (defn of H)
                 ⇐⇒ D(hDi) does not halt,   (defn of D)

a contradiction. Therefore no such TM H exists.

Footnote 7: Actually they are equal, but we need not show this for the present observation.
Footnote 8: This expansion need not be unique, as expansions such as 0.03000000 . . . and 0.02999999 . . . both
represent the number 3/100. But whenever this happens, exactly one representation will end in an infinite sequence
of 0’s, so take this as the “standard” decimal representation of r.
Footnote 9: That is, D runs H to determine if M halts on the string that is the binary description of M itself.
Footnote 10: In other words, halt if H rejects, and loop if H accepts. Since H is a decider (by our assumption), D
will do one of these.
The following Python code implements this idea:
def H(M, w):
    """Function that supposedly decides whether M(w) halts."""
    raise NotImplementedError()

def D(M):
    """"Diagonalizing" function; given a function M, if M halts
    when given itself as input, run forever; otherwise halt."""
    if H(M, M):
        while True:   # loop forever
            pass
    else:
        return        # halt

D(D)   # behavior not well-defined! So H cannot be implemented.
It is worth examining the proof of Theorem 4.2.4 to see the diagonalization explicitly.
We can give any string as input to a program. Some of those strings represent other programs,
and some don’t. We just want to imagine what happens when we run a program Mi on an input
hMj i that is the source code of another program Mj . If Mi is not even intended to handle program
inputs (for instance if it tests for primality, or Hamiltonian paths in graphs), the behavior of Mi
on source code inputs may not be interesting, but nonetheless, Mi either halts on input hMj i, or it
doesn’t.
We then get an infinite matrix describing all these possible behaviors.
        hM0 i   hM1 i   hM2 i   hM3 i   . . .   hDi   . . .
M0      loop    halt    loop    halt    . . .
M1      halt    halt    halt    halt    . . .
M2      loop    loop    halt    loop    . . .
M3      halt    loop    halt    loop    . . .
...
D       halt    loop    loop    halt    . . .    ?
If the halting problem were decidable, then the program D would be implementable, and its row
would be the element-wise opposite of the diagonal of this matrix. But this gives a contradiction:
the entry at row D and column hDi lies on the diagonal, so it would have to be the opposite of
itself. Since D can be implemented if HALTTM is decidable, this establishes that HALTTM is not
decidable.
LECTURE: end of day 19