Complexity Theory
(Note that the set of optimal solutions could be empty, since the maximum
need not exist. The minimum always exists, since mP only attains values
in N+. In the following we will assume that there are always optimal
solutions provided that SP(x) ≠ ∅.) The objective value of any y ∈ S∗(x) is
denoted by OPTP(x).²

¹ Some authors also assume that for all x ∈ IP, SP(x) ≠ ∅. In this case the class NPO
defined in the next section would be equal to the class exp-APX (defined somewhere else).
Given an optimization problem P, there are (at least) three things one
could do given a valid instance x: construct an optimal solution, compute
the optimal objective value, or decide whether the optimal value meets a
given bound.
The first task seems to be the most natural one. Its precise formalization
is however a subtle task. One could compute the function F : I → P({0, 1}∗ )
mapping each x to its set of optimal solutions S∗ (x). However S∗ (x) could
be very large (or even infinite). Moreover, one is almost always content with
only one optimal solution. A cut of F is any function f : I → Σ∗ that maps
every x to some y ∈ S∗ (x). We say that we solve the construction problem
associated with P if there is a cut of F that we can compute efficiently.³
It turns out to be very useful to call such a cut again P and assume that
positive statements containing P are implicitly ∃-quantified and negative
statements are ∀-quantified. (Do not worry too much now, everything will
become clear.)
The second task is easy to model. We want to compute the function
x ↦ OPT(x). We denote this function by Peval.
The third task can be modelled as a decision problem. Let

    Pdec = {⟨x, bin(B)⟩ | OPT(x) ≥ B}   if goal = max,
           {⟨x, bin(B)⟩ | OPT(x) ≤ B}   if goal = min.
Definition 1.2. NPO is the class of all optimization problems P = (I, S, m, goal)
such that
² The name m∗P(x) would be more consistent, but OPTP(x) is so intuitive and con-
venient.
³ Note that not every cut is computable, even for very simple optimization problems
like computing minimum spanning trees, and even on very simple instances. Consider a
complete graph Kn, all edges with weight one. Then every spanning tree is optimal. But
a cut that maps Kn to a line if the nth Turing machine halts on the empty word and to
a star otherwise is certainly not computable.
It is easy to see that M indeed decides Pdec and that its running time is
polynomial.
• The set of all valid instances is the set (of suitable encodings) of all
edge-weighted complete loopless graphs G. In the special case ∆-TSP,
the edge weights should also fulfill the triangle inequality (which can
be easily checked).
Exercise 1.1. Assume that there is a polynomial time algorithm that given
an instance x of TSP, returns a Hamiltonian tour whose weight is at most
2p(n) · OPT for some polynomial p, where n is the number of nodes of the
given graph. Then P = NP. (Hint: Show that under this assumption, one
can decide whether a graph has a Hamiltonian circuit.)
1. Pdec ≤T_P Peval and Peval ≤T_P Pdec.

2. Peval ≤T_P P. (Since this is a negative statement about P, it means
that Peval ≤T_P P holds for all cuts P.)
Proof. We start with the first statement. Pdec ≤T_P Peval is seen easily:
On input ⟨x, bin(B)⟩, we can compute OPT(x) using the oracle Peval and
compare it with B.
Peval ≤T_P Pdec is a little trickier: Since m is polynomial time computable,
OPT(x) ≤ 2^(q(|x|)) for some polynomial q. Using binary search, we can find
OPT(x) with q(|x|) oracle queries.
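To make the binary search concrete, here is a small Python sketch (an illustration, not part of the text): it assumes a maximization problem, a hypothetical decision oracle p_dec(x, B) answering whether OPT(x) ≥ B, and the bound q = q(|x|) from the proof.

```python
def opt_via_dec(x, p_dec, q):
    """Compute OPT(x) for a maximization problem with roughly q
    queries to the decision oracle p_dec(x, B) = [OPT(x) >= B].
    Assumes 0 <= OPT(x) <= 2**q - 1."""
    lo, hi = 0, 2**q - 1          # OPT(x) lies in this range
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop terminates
        if p_dec(x, mid):         # OPT(x) >= mid?
            lo = mid
        else:
            hi = mid - 1
    return lo
```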
For the second statement, note that once we have an optimum solution y,
we can compute OPT(x) = m(x, y).
If Pdec is NP-complete, then the optimization problem is not harder than
the decision problem.
Theorem 1.7. Let P ∈ NPO such that Pdec is NP-complete. Then
P ≤T_P Pdec.
Proof. Assume that P is a maximization problem; the minimization
case is symmetric. Let q be a polynomial such that for every x ∈ I and
y ∈ S(x), |y| ≤ q(|x|) and m(x, y) ≤ 2^(q(|x|)).
For given x, fix some polynomial time computable total order on the
set {0, 1}^(≤q(|x|)). For y ∈ {0, 1}^(≤q(|x|)), let λ(y) be the rank that y has
with respect to this order.
We derive a new problem P̂ from P by defining a new objective function.
The objective function of P̂ is given by

    m̂(x, y) = 2^(q(n)+1) · m(x, y) + λ(y),

where n = |x|. Note that the first summand is always bigger than
2^(q(n)+1) > λ(y). This implies that for all x and y1, y2 ∈ S(x), m̂(x, y1) ≠
m̂(x, y2). Furthermore, if m̂(x, y1) ≥ m̂(x, y2), then m(x, y1) ≥ m(x, y2).
Thus if y ∈ Ŝ∗(x), then y ∈ S∗(x), too. (Here Ŝ∗(x) is the set of optimum
solutions of x as an instance of P̂.)
An optimal solution y ∈ Ŝ∗(x) can easily be derived from OPTP̂(x): We
compute the remainder of the division of OPTP̂(x) by 2^(q(n)+1). This
remainder is λ(y), from which we can obtain y. Thus P ≤T_P P̂ ≤T_P P̂eval.
By Theorem 1.6, P̂eval ≤T_P P̂dec. Since P̂dec ∈ NP and Pdec is NP-complete
by assumption, P̂dec ≤P Pdec. Using transitivity, we get P ≤T_P Pdec.
Proof. There is a problem P in NPO such that Pdec is NP-hard, for in-
stance ∆-TSP. If P belonged to PO, then also Pdec ∈ P by Theorem 1.7,
a contradiction.
2 Approximation algorithms and
approximation classes
2. APX := O(1)-APX.
(I hope that the elegant definition above clarifies why the performance
ratio was defined for maximization problems as it is.) There is a well-known
2-approximation algorithm for ∆-TSP that is based on minimum spanning
trees, thus ∆-TSP ∈ APX.
Exercise 2.1. 1. Show that there is an algorithm for Knapsack with run-
ning time polynomial in n and P := max_{1≤ν≤n} pν. (Compute by dy-
namic programming sets of indices I(i, p) ⊆ {1, . . . , i} with total profit p
and minimum total weight.)
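A sketch of the dynamic program behind this exercise, in Python under the assumption spelled out above (profits p, weights w, capacity W; the sets I(i, p) are represented implicitly by minimum weights):

```python
def knapsack_min_weight(p, w, W):
    """Dynamic program for Knapsack, polynomial in n and P = max(p).
    mw[q] = minimum weight of a subset of the items seen so far
    with total profit exactly q (infinity if impossible)."""
    n, P = len(p), max(p)
    INF = float("inf")
    mw = [0] + [INF] * (n * P)            # profit 0 costs weight 0
    for i in range(n):
        for q in range(n * P, p[i] - 1, -1):  # descending: each item used once
            if mw[q - p[i]] + w[i] < mw[q]:
                mw[q] = mw[q - p[i]] + w[i]
    # largest profit achievable within the weight capacity W
    return max(q for q in range(n * P + 1) if mw[q] <= W)
```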
Remark 2.7. The running time of the PTAS constructed in the previous
exercise is also polynomial in 1/ε. Such a scheme is called a fully polynomial
time approximation scheme (FPTAS). The corresponding complexity class
is denoted by FPTAS.
    x ∈ L     ⟹ f(x) ∈ L′ and
    x ∈ U \ L ⟹ f(x) ∈ U′ \ L′
    U = {x | OPT(x) ≤ a or OPT(x) ≥ b}

and

    L = {x | OPT(x) ≥ b}   if goal = max,
        {x | OPT(x) ≤ a}   if goal = min.
That is, we get an instance x together with the promise that the objective
value is at most a or at least b, and we shall decide which of these two
options is the case. There is a difference in the definition of L for maximization
and minimization problems because the yes-instances shall be the inputs with
solutions that have a “good” objective value. We will also allow a and b to
be functions N → N that depend on |x|.
Let us see why this algorithm is correct. If φ ∈ SAT, then OPTP(x) ≤
a(|x|) and

    mP(x, y) ≤ α(|x|) · OPTP(x) < b(|x|).

If φ ∉ SAT, then OPTP(x) ≥ b(|x|) and mP(x, y) ≥ OPTP(x) ≥ b(|x|).
We can always approximate TSP within 2^O(|x|), where x is the given in-
stance, since with |x| symbols we can encode integers up to 2^O(|x|). Thus
TSP is contained in the class exp-APX, as defined below.
1. How does one get a clique of G from a vertex cover of the complement?
g(x, y, β) ∈ SP (x),
Exercise 2.6. Essentially the same idea as in Exercise 2.4 gives a reduction
from Clique to IS. Show that this is an AP-reduction.
1. Obvious.
2. Obvious, too.
4. Obvious.
• Knapsack ∈ PTAS \ PO
    Pr_y[V^π(x, y) = 1] = 1.

2. For any x ∉ L and for all proofs π,

    Pr_y[V^π(x, y) = 1] ≤ 1/2.
In the theorem above, we do not use the randomness at all. The next
result, the celebrated PCP theorem [ALM+98], shows that allowing a little
bit of randomness reduces the number of queries dramatically.
Exercise 3.1. Show that PCP[O(log n), O(1)] ⊆ NP. (Hint: How many
random strings are there?)
The other direction is much more complicated; we will spend the next few
lectures on its proof. We will not present the original proof by Arora et
al. [ALM+98] but a more recent and, at least compared to the first one,
elegant proof by Irit Dinur [Din07].
the verifier V to find out which bits he will query. Then we can give him all
the possible answers to the bits he queried to compute the sets Ay .
If x ∈ L, then there will be a proof π such that V π (x, y) = 1 for every
random string y. Therefore, if we set the variables of φ as given by this
proof π, then φ will be satisfied.
If x ∉ L, then for any proof π, there are at least 2^(r(n))/2 random strings
y for which V^π(x, y) = 0. For each such y, one clause corresponding to a
tuple in Ay will not be satisfied. In other words, for any assignment, 2^(r(n))/2
clauses will not be satisfied. The total number of clauses is bounded by
(q − 2)·2^(q+r(n)). The fraction of unsatisfied clauses therefore is

    ≥ (2^(r(n))/2) / ((q − 2)·2^(q+r(n))) = 2^(−q−1)/(q − 2),

which is a constant.
“⇐=”: By Exercise 3.1, it suffices to show that NP ⊆ PCP[O(log n), O(1)].
Let L ∈ NP. By assumption, there is a polynomial time computable function
f such that
    x ∈ L ⟹ f(x) is a satisfiable formula in 3-CNF,
    x ∉ L ⟹ f(x) is a formula in 3-CNF such that every assignment
            satisfies at most a (1 − ε) fraction of the clauses.
We construct a probabilistic verifier as follows:
Input: input x, proof π
1. Compute f (x).
2. Randomly select a clause c from f (x).
3. Interpret π as an assignment to f (x) and read the bits that belong to
the variables in c.
4. Accept if the selected clause c is satisfied. Reject otherwise.
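A Python sketch of this verifier (an illustration, not from the text): clauses is assumed to represent f(x) as a list of clauses over signed variable indices, and pi is the proof, read as an assignment.

```python
import random

def verify(clauses, pi):
    """Probabilistic verifier: pick a random clause, read the (at most
    three) proof bits for its variables, accept iff the clause is
    satisfied. Uses O(log n) random bits and 3 queries."""
    c = random.choice(clauses)            # step 2: the random clause
    for lit in c:
        v = abs(lit) - 1                  # variable index of the literal
        bit = pi[v]                       # step 3: query the proof
        if (bit == 1) == (lit > 0):       # literal satisfied?
            return True                   # step 4: accept
    return False                          # reject: clause unsatisfied
```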
In this chapter, we will strengthen the result of the previous one by showing
that Max-3-SAT is in fact APX-hard. We do this in several steps. First, we
show that any maximization problem in APX is AP-reducible to Max-3-SAT.
Second, we show that for every minimization problem P, there is a maxi-
mization problem P′ such that P ≤AP P′. This will conclude the proof.
Our proof of the PCP-Theorem will also yield the following variant,
which we will use in the following.
2. Else let κ0 be the largest κ such that gPCP(φ_{x,µ(x)r^κ}, a) satisfies
φ_{x,µ(x)r^κ}. (We restrict a to the variables of φ_{x,µ(x)r^κ}.)

    OPT_Max-3-SAT(ψx) − m_Max-3-SAT(ψx, a) ≤ ((β − 1)/β)·OPT_Max-3-SAT(ψx) ≤ kc·(β − 1)/β.
Let βκ denote the performance ratio of a with respect to ψ_{x,κ}, i.e., we view
a as an assignment of ψ_{x,κ}. We have

The last inequality follows from the fact that any formula in CNF has an
assignment that satisfies at least half of the clauses. This yields

    (c/2)·(βκ − 1)/βκ ≤ kc·(β − 1)/β

and finally

    βκ ≤ 1 / (1 − 2k(β − 1)/β).
Exploiting (A.1), we get, after some routine calculations,

    βκ ≤ 1 + α(β − 1).
where k = ⌈b⌉. We have µ(x) ≤ OPT_P′(x) ≤ (k + 1)µ(x). This means that
A is a (k + 1)-approximation algorithm for P′. Hence, P′ ∈ APX.
The AP-reduction (f, g, α) from P to P′ is defined as follows: f(x, β) = x
for all x ∈ IP. (Note that we do not need any dependence on β here.) Next,
we set

    g(x, y, β) = y      if mP(x, y) ≤ µ(x),
                 A(x)   otherwise,

and finally α = k + 1.
Let y be a β-approximate solution to x under mP′, that is, RP′(x, y) =
OPT_P′(x)/mP′(x, y) ≤ β. We have to show that RP(x, y) ≤ 1 + α(β − 1).
We distinguish two cases. The first one is mP(x, y) ≤ µ(x). In this case,

    mP(x, y) = ((k + 1)µ(x) − mP′(x, y)) / k
             ≤ ((k + 1)µ(x) − OPT_P′(x)/β) / k
             ≤ ((k + 1)µ(x) − (1 − (β − 1))·OPT_P′(x)) / k
             ≤ OPT_P(x) + ((β − 1)/k)·OPT_P′(x)
             ≤ OPT_P(x) + (β − 1)(k + 1)µ(x)/k
             ≤ (1 + α(β − 1))·OPT_P(x).
In the second case, mP(x, y) > µ(x), and

    mP(x, g(x, y, β)) = mP(x, A(x)) ≤ b·OPT_P(x) ≤ (1 + α(β − 1))·OPT_P(x).
Thus, P ≤AP P 0 .
Now Theorems A.2 and A.3 imply the following result.
2. Use this to show that if Clique ∈ APX, then Clique ∈ PTAS. Now
apply Exercise A.1.
The next chapters follow the ideas of Dinur quite closely. The Diplomarbeit
by Stefan Senitsch [Sen07] contains a polished proof with many additional
details which was very helpful in preparing the next chapters.
Let Bn denote the set of all Boolean functions {0, 1}^n → {0, 1} and B_n^−
the set of all functions {−1, 1}^n → {−1, 1}. These are essentially the same
objects: x ↦ −2x + 1 (or x ↦ (−1)^x) is a bijection that maps 1 to −1 and
0 to 1. Note that we identify 1 (true) with −1 and 0 (false) with 1. For our
purposes, it is more convenient to work with B_n^− and this interpretation of
true and false.
Definition 4.1. Let x ∈ {−1, 1}^n. The long code of x is the function
LC_x : B_n^− → {−1, 1} given by LC_x(f) = f(x) for all f ∈ B_n^−.
The long code was invented by Bellare, Goldreich, and Sudan [BGS98].
By ordering the functions in B_n^−, we can view LC_x as a vector in
{−1, 1}^(2^(2^n)), and we will tacitly switch between these two views.
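For intuition, a tiny Python sketch that materializes LC_x by enumerating B_n^− as truth tables (only feasible for very small n, since the vector has length 2^(2^n)):

```python
from itertools import product

def long_code(x, n):
    """Return LC_x as a dict mapping each f in B_n^- to f(x).
    A function f: {-1,1}^n -> {-1,1} is represented by its truth
    table, a tuple of 2^n values in {-1,1}."""
    points = list(product([-1, 1], repeat=n))   # domain {-1,1}^n
    idx = points.index(tuple(x))                # position of x
    return {f: f[idx]                           # LC_x(f) = f(x)
            for f in product([-1, 1], repeat=2**n)}

# Example: n = 2, so LC_x has 2^(2^2) = 16 positions.
lc = long_code((-1, 1), 2)
```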
The relative distance between two elements A, B ∈ {−1, 1}^(2^(2^n)) is

i.e., it is the probability that the vectors A and B differ at a random position.
Furthermore, we define a scalar product on {−1, 1}^(2^(2^n)) by

    ⟨A, B⟩ = E_{f∈B_n^−}[AB] = 2^(−2^n) Σ_{f∈B_n^−} A(f) · B(f).
Second,

    ⟨χS, χT⟩ = 2^(−2^n) Σ_{f∈B_n^−} χS(f)χT(f)
             = 2^(−2^n) Σ_{f∈B_n^−} Π_{x∈S} f(x) · Π_{x∈T} f(x)
             = 2^(−2^n) Σ_{f∈B_n^−} Π_{x∈S∆T} f(x).

Thus, the χS form an orthonormal family. Since its size equals the
dimension of {−1, 1}^(2^(2^n)), it is spanning, too.
Once we have an orthonormal family, we can look at Fourier expansions.
The Fourier coefficients of a function A : B_n^− → {−1, 1} are given by

    ÂS = ⟨A, χS⟩ = 2^(−2^n) Σ_{f∈B_n^−} A(f) Π_{x∈S} f(x).
In what follows, we will usually consider folded strings. A ∈ {−1, 1}^(2^(2^n))
is called folded over true if for all f, A(−f) = −A(f). Let ψ : {−1, 1}^n →
{−1, 1}. A is called folded over ψ if for all f, A(f) = A(f ∧ ψ). A is simply
called folded if it is folded over true and over ψ. (This assumes that ψ is
clear from the context.) If a string is folded, then we only need to specify
it on a smaller set of positions Dψ defined as follows: Let D be a set of
functions that contains exactly one function of every pair f and −f,¹ and let
Dψ = {f ∈ D | f = f ∧ ψ}.
¹ To achieve this, pick a function, put it into the one set and its image under the
involution into the other. This is possible, since the involution has no fixed points. Repeat
until all functions are put into one of the two sets. I am sorry that you had to read this,
but I was puzzled.
and

    A(−f) = LC_a(−f) = −f(a) = −LC_a(f) = −A(f).
Lemma 4.4. Let ψ ∈ B_n^−, let A ∈ Vn be folded, and let S ⊆ {−1, 1}^n. We
have:

1. E_{f∈B_n^−}[A(f)] = 0.

Input: folded string A : B_n^− → {−1, 1}, ψ : {−1, 1}^n → {−1, 1}.
1. Let τ = 1/100.
4. Let h = µ · g.
The following lemma shows that if the test T accepts, then A is close to
the long code of a satisfying assignment of ψ or its negation.

Lemma 5.1. There exists a constant K∗ such that the following holds:
If Pr[T rejects (A, ψ)] ≤ ε for small enough ε > 0, then there is an a ∈
{−1, 1}^n with ψ(a) = −1 such that either δ(A, LC_a) < K∗·ε or
δ(A, −LC_a) < K∗·ε.
Proof. T accepts iff not all of A(f), A(g) and A(h) equal 1. This is
equivalent to 1 − (1/8)(1 + A(f))(1 + A(g))(1 + A(h)) = 1 (and not 0; the
The pairs (f, g) and (f, h) are independent ((g, h) is, however, not!), thus
So we should analyze the terms E[χS(g)χT(h)]. We start with the case
S ≠ T. Let z ∈ S \ T; the other case is symmetric. We have

    E[χS(g)χT(h)] = E[ Π_{x∈S} g(x) · Π_{y∈T} h(y) ]
                  = E[ g(z) · Π_{x∈S\{z}} g(x) · Π_{y∈T} h(y) ]
                  = E[g(z)] · E[ Π_{x∈S\{z}} g(x) · Π_{y∈T} h(y) ]
                  = 0,
since g(z) and the remaining product are independent and E[g(z)] = 0,
because g is random. If T = S, then

    E[χS(g)χS(h)] = E[ Π_{x∈S} g(x)h(x) ]
                  = E[ Π_{x∈S} g²(x)µ(x) ]
                  = E[ Π_{x∈S} µ(x) ]
                  = Π_{x∈S} E[µ(x)]
                  = Π_{x∈S} (Pr[µ(x) = 1] − Pr[µ(x) = −1])
                  = Π_{x∈S} (−τ),

since Pr[µ(x) = 1] = (1 − τ)/2 and Pr[µ(x) = −1] = (1 + τ)/2,
and

    |W| ≤ Σ_{R⊆S⊆{−1,1}^n} Â_S²·|Â_R|·(1 − τ)^|R|·τ^|S\R|
        ≤ Σ_S Â_S² Σ_{R⊆S} |Â_R|·(1 − τ)^|R|·τ^|S\R|.    (5.5)
since ÂT = 0 for even T. We get a better bound for the first sum by
analyzing (5.5), since Â_R² ≤ 1.
Set ε = Pr[T rejects]. Then (5.1) yields −1 + τ + 8ε ≥ W. For small
enough ε, this yields W < 0, and therefore we get

    ρ ≤ 8ε / ((1 − τ)(1 − √(1 − τ))) ≤ K·ε

with K = 8 / ((1 − τ)(1 − √(1 − τ))), which is constant.
Finally, we will apply Theorem 5.2. For small enough ε, 1 − Lρ will
be greater than 0. Since Â∅ = 0, the first case in the theorem cannot
happen. Hence Â{a}² ≥ 1 − Lρ for some a ∈ {−1, 1}^n. Thus, either Â{a} ≥
√(1 − Lρ) ≥ 1 − Lρ or −Â{a} ≥ 1 − Lρ. In the first case, we get
In the second case, we get δ(A, −LC_a) ≤ (KL/2)·ε in the same way. By
Lemma 4.4, ψ(a) = −1, since Â{a} ≠ 0.
Theorem 5.3. The long code test T has the following properties:

1. If a ∈ {−1, 1}^n with ψ(a) = −1, then T accepts LC_a and ψ with
probability 1.
Proof. We first prove 1: Let a ∈ {−1, 1}^n with ψ(a) = −1 and let
A = LC_a. By Lemma 4.3, A is folded. If A(f) = f(a) = −1, then T accepts.
If A(f) = f(a) = 1, then µ(a) = −1. Hence, A(h) = h(a) ≠ g(a) = A(g).
Thus one of these two values equals −1 and T accepts, too.
Now we come to 2: Assume that the assertion does not hold. Then for
all c > 0, there is a δ_c such that δ(A, LC_a) ≥ δ_c for all a ∈ {−1, 1}^n with
ψ(a) = −1, and T rejects with probability < c·δ_c.
We choose c < 1/K∗ small enough and apply Lemma 5.1. There is an
a ∈ {−1, 1}^n such that ψ(a) = −1 and δ(A, LC_a) < K∗·ε < K∗·c·δ_c < δ_c or
δ(A, −LC_a) < δ_c. The first possibility is ruled out by the assumption about
δ_c. Hence δ(A, −LC_a) < δ_c.
We have

    Pr[T rejects (−LC_a, ψ)] = Pr[LC_a(f) = LC_a(g) = LC_a(h) = −1]
        = Pr[f(a) = g(a) = h(a) = −1]
        = Pr[f(a) = −1] · Pr[g(a) = −1] · Pr[h(a) = −1 | f(a) = g(a) = −1]
        = (1/4) · Pr[µ(a) = 1 | f(a) = −1]
        = (1/4)(1 − τ).
This implies that
2. There is a c > 0 such that for all 0 < δ ≤ 1, the following holds: If
δ(a, a′) ≥ δ for all a′ with ψ(a′) = −1, then for all folded A ∈ B_n^−,
Pr[T′ rejects (a, A, ψ)] ≥ c · δ.
The last inequality is true since we can bound both probabilities by δ/4:
the first one by assumption, and the second one as well, since f ↦ f·e_j is a
bijection of B_n^−.
6 Assignment Tester
7 Expander graphs
Lemma 7.2. Let G be a d-regular graph with adjacency matrix A and eigen-
values λ1 ≥ λ2 ≥ · · · ≥ λn .
and

    A · 1n = d · 1n.

Thus d is an eigenvalue and 1n is an associated eigenvector.
Let λ be any eigenvalue and b an associated eigenvector. We can scale
b in such a way that the largest entry of b is 1. Let this entry be b_v. Then

    λ = λ·b_v = Σ_{u∈V} a_{v,u}·b_u ≤ Σ_{u∈V} a_{v,u} = d.
    λ(G) = max_{b⊥1n} ‖Ab‖ / ‖b‖.
    λ(G) ≤ d − h(G)²/(2d).
To prove the theorem, it is sufficient to prove
by Theorem 7.4. The proofs of both inequalities are very similar; we will
only show the first one. Let

    B = dI − A   and   B′ = dI + A.
Proof. We have

    f^T B f = d Σ_{v∈V} f_v² − f^T A f
            = Σ_{e={u,v}} (f_u² + f_v²) + Σ_{e={v}} f_v² − ( Σ_{e={u,v}} 2f_u f_v + Σ_{e={v}} f_v² )
            = Σ_{e={u,v}} (f_u − f_v)².
¹ We have to exclude bipartite graphs, which have λn = −d but can have edge expansion
> 0. Our proof will break down if λn = −d, because (d + λ) must not be zero when proving
the counterpart of (7.3).
Let β0 < β1 < · · · < βr be the different values that f attains. Let

    U_j  = {u ∈ V | f_u ≥ β_j},
    U′_j = {u ∈ V | f_u ≤ β_j}

be the sets of all nodes whose value f_u is at least or at most β_j, respectively.

Lemma 7.9.

    F = Σ_{j=1}^{r} |E(U_j, Ū_j)|·(β_j² − β_{j−1}²)

    F = Σ_{j=0}^{r−1} |E(U′_j, Ū′_j)|·(β_{j+1}² − β_j²)
by the Cauchy–Schwarz inequality and Lemma 7.8. We can bound the
second factor by

    √( Σ_{e={u,v}} (f_u + f_v)² ) ≤ √( 2 Σ_{e={u,v}} (f_u² + f_v²) )
                                  ≤ √( 2d Σ_{v∈V} f_v² )
                                  = √(2d)·‖f‖.
Proof. We only show the statement for f_v ≥ 0; the other case is com-
pletely similar. Since |supp(f)| ≤ n/2, we have β0 = 0 and |U_j| ≤ n/2 for
j > 0. We have |E(U_j, Ū_j)| ≥ h(G)·|U_j|. By Lemma 7.9,

    F = Σ_{j=1}^{r} |E(U_j, Ū_j)|·(β_j² − β_{j−1}²)
      ≥ h(G) Σ_{j=1}^{r} |U_j|·(β_j² − β_{j−1}²)
      = h(G) ( Σ_{j=1}^{r−1} β_j²·(|U_j| − |U_{j+1}|) + β_r²·|U_r| )
      = h(G)·‖f‖²,

where |U_j| − |U_{j+1}| = |{v | f_v = β_j}|.
Finally, we will now prove (7.1) and (7.2). We only show (7.1); (7.2) is
proven in the same manner. Let λ < d be an eigenvalue of A. Then d − λ is
an eigenvalue of B = dI − A, and every eigenvector of A associated with λ is
an eigenvector of B associated with d − λ. Let g be such an eigenvector. g
is orthogonal to 1n. We can assume that g has at most n/2 entries that are
≥ 0, otherwise we consider −g instead. We define

    f_v = g_v   if g_v ≥ 0,
          0     otherwise.
Since f_v = 0 for v ∉ W, this implies

    f^T B f = Σ_{v∈V} f_v·(Bf)_v
            ≤ (d − λ) Σ_{v∈V} f_v·g_v
            ≤ (d − λ) Σ_{v∈V} f_v²
            = (d − λ)·‖f‖².    (7.3)
Let W_ℓ be the set of all ℓ-walks in G. We have |W_ℓ| = n·d^ℓ, since a walk
is uniquely specified by its start node (n choices) and a vector in {1, . . . , d}^ℓ
which tells us which of the d edges of the current node is the next in the
given walk. For this, we number the d edges incident with a node arbitrarily
from 1 to d. It is now clear that method RW generates the walks according
to the uniform distribution on W_ℓ.

Lemma 8.1. Method RW returns each walk W ∈ W_ℓ with a probability of
1/(n·d^ℓ).
Instead of choosing a start node, we can choose a node in the middle.
This modified method RW′ also generates the uniform distribution on all
walks of length ℓ.

Input: d-regular graph G, ℓ ∈ N, 0 ≤ j ≤ ℓ

1. Randomly choose a vertex v_j.
2. For λ = j − 1, . . . , 0 choose v_λ ∈ N(v_{λ+1}) uniformly at random.
3. For λ = j + 1, . . . , ℓ choose v_λ ∈ N(v_{λ−1}) uniformly at random.
4. Return (v_0, . . . , v_ℓ).
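A direct Python rendering of method RW′ (an illustrative sketch; the graph is assumed to be given by neighbour lists N with |N(v)| = d, listing self loops and parallel edges with multiplicity):

```python
import random

def rw_prime(N, ell, j):
    """Generate a uniformly random walk (v_0, ..., v_ell) in a
    d-regular graph by first choosing the j-th node and then
    extending the walk backwards and forwards."""
    n = len(N)
    walk = [None] * (ell + 1)
    walk[j] = random.randrange(n)                   # random middle node
    for lam in range(j - 1, -1, -1):                # backwards to v_0
        walk[lam] = random.choice(N[walk[lam + 1]])
    for lam in range(j + 1, ell + 1):               # forwards to v_ell
        walk[lam] = random.choice(N[walk[lam - 1]])
    return walk
```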
By symmetry, this is also the probability that v is the second node in a walk
that starts with an edge in F .
Second, we estimate the probability y_v that if we choose a random edge
incident to v, we have chosen an edge in F. This is simply

    y_v = (1/d)·|{e = {u, v} | e ∈ F}|
        = (2|F|/d)·(1/(2|F|))·|{e = {u, v} | e ∈ F}|
        ≤ (2|F|/d)·x_v.    (8.2)

Now the probability distribution on the nodes after performing a walk
of length j that starts in F is given by Ã^(j−1)·x. (Note that x_v is also the
probability that v is the second node in the walk.) The probability that the
(j + 1)th edge is in F is then given by

    ⟨y, Ã^(j−1)·x⟩.    (8.3)
Finally, we can start with the proof of the PCP theorem. We begin with
the observation that gap(1 − 1/|E|, 1)-Max-CGS is NP-hard (over the alphabet
Σ = {0, 1}³). Let G be a given constraint graph. We apply three procedures
to G:

    G
    ↓ Preprocessing (G becomes an expander)
    G_prep
    ↓ Amplification (UNSAT value gets larger, but so does Σ)
    G_amp
    ↓ Alphabet reduction (Σ = {0, 1}³ again)
    G_red

If we do this O(log |E|) times, then we bring the (relative) size of the
gap from 1/|E| to constant and we are done.
9.1 Preprocessing
Throughout this chapter, d0 and h0 will be “global” constants that come
out of the construction of a constant degree d0 expander Xn with constant
edge expansion h0 , see Theorem 7.6.
3. Let E_int be the edges within the copies Y_v and E_ext be the edges
between two different copies. For all e ∈ E_int, c1(e) is an equality
constraint that is satisfied iff both nodes have the same value (“internal
constraints”). For all e ∈ E_ext, c1(e) is the same constraint as the
original edge has (“external constraints”).
We have |V1| ≤ Σ_{v∈V} d(v) ≤ 2|E| and |E1| ≤ |V1|·(d0 + 1) ≤ 2|E|·(d0 + 1).
Thus size(G1) = O(size(G)).
Next, we show that UNSAT(G1) ≤ UNSAT(G). Choose an assignment
σ : V → Σ with UNSAT(G) = UNSAT_σ(G) (i.e., an optimal assignment).
We define σ1 : V1 → Σ by σ1(u) = σ(v) if u belongs to V(Y_v), the vertex
set of the copy Y_v that replaces v. In this way, all internal constraints
are fulfilled by construction. Every external constraint is fulfilled iff it was
fulfilled under σ in G. Therefore,
where the second equation follows from the fact that |E| ≤ |E1 |.
The interesting case is γ · UNSAT(G) ≤ UNSAT(G1). Let σ1 : V1 → Σ
be an optimum assignment. We define σ : V → Σ by a majority vote:
σ(v) is the value a ∈ Σ that is the most common among all values σ1(u)
with u ∈ V(Y_v). Ties are broken arbitrarily. Let F ⊆ E be the set of
all unsatisfied constraints under σ and F1 ⊆ E1 the set of all unsatisfied
constraints under σ1. Let S = {u ∈ V(Y_v) | v ∈ V, σ1(u) ≠ σ(v)} and
S^v = S ∩ V(Y_v), i.e., S consists of all the loser nodes that voted for a
different value than σ(v). Let α := |F|/|E| = UNSAT_σ(G). We have

    (α/2)·|E| + |S| > |F1| + |S| ≥ α·|E|.

Thus |S| ≥ (α/2)·|E|. Consider some v ∈ V and let S_a^v = {u ∈ S^v | σ1(u) = a}.
We have S^v = ⋃_{a≠σ(v)} S_a^v. Because we took a majority vote, |S_a^v| ≤
(1/2)·|V(Y_v)| for all a ≠ σ(v). As Y_v is an expander,
where the complement is taken “locally”, i.e., S̄_a^v = V(Y_v) \ S_a^v. Since we
have equality constraints on all internal edges, all edges in E(S_a^v, S̄_a^v) are
unsatisfied. Thus,
    |F1| ≥ (1/2) Σ_{v∈V} Σ_{a≠σ(v)} |E(S_a^v, S̄_a^v)|
         ≥ (1/2)·h0 Σ_{v∈V} Σ_{a≠σ(v)} |S_a^v|
         ≥ (1/2)·h0 Σ_{v∈V} |S^v|
         = (1/2)·h0·|S|
         > (α/4)·h0·|E|.
Thus

    UNSAT(G1) = |F1| / |E1|
              > (α·h0/4) · (|E| / |E1|)
              ≥ α·h0 / (8(d0 + 1))
              ≥ (h0 / (8(d0 + 1))) · UNSAT(G).
• G2 is (d + d0 + 1)-regular,

• size(G2) = O(size(G)),

• (d / (2 + 2(d0 + 1))) · UNSAT(G) ≤ UNSAT(G2) ≤ UNSAT(G).
2. UNSAT(G) = 0 ⇒ UNSAT(G^t) = 0.

    min{ UNSAT_σ(G), 1/(2t) } ≤ |F|/|E| ≤ 1/t.
By linearity of expectation,

    E[N(W) | N_j(W) = 1] = Σ_{k∈I} E[N_k(W) | N_j(W) = 1].

Thus

    Pr[N(W) > 0] ≥ Σ_{j∈I} Ω(|F|/|E|) / ( 2/√t + 2/(1 − λ/d) ) ≥ Ω( √t · |F|/|E| ).
Exercise 9.1. Show that for every c > 0, there is a constant a such that

    x / (2/x + c) ≥ a·x

for all x ≥ 1.
Lemma 9.7. For all j ∈ I, Pr_{W∈W_t}[N_j(W) = 1] = Ω(|F|/|E|).

    Pr_{W∈W_t}[N_j = 1] = p·q·|F|/|E|,

where p = Pr_{W∈W_t}[σ̂(v_0)_{v_{j−1}} = σ(v_{j−1})] and q = Pr_{W∈W_t}[σ̂(v_t)_{v_j} =
σ(v_j)]. We are done if we can show that p and q are constant. Since both
cases are symmetric, we will only present a proof for p.
Let X_j be the random variable generated by the following process. We
start in v_{j−1} and perform a random walk of length j − 1. Let u be the node
that we reach. We output the opinion σ(u)_{v_{j−1}} of u about v_{j−1}. If u has no
opinion about v_{j−1} (this can happen, since j can be greater than t/2, but
it will not happen too often), then we output some dummy value not in
Σ. Obviously, p = Pr[X_{j−1} = σ(v_{j−1})].
If j = t/2, then we reach all the nodes that have an opinion about vj−1 .
We start with the case j = t/2. In this case, both nodes v_0 and v_t have
an opinion about v_{j−1} and v_j. Since σ(v_{j−1}) is chosen by a majority vote,
p ≥ 1/|Σ| in this case.
We will now show that for all j ∈ I, the probability Pr[X_{j−1} = σ(v_{j−1})]
cannot differ by too much from this; in particular, it is Ω(1/|Σ|). The self
loops play a crucial role here, since they ensure that a random walk
with ℓ edges visits not more than (1 − 1/d)ℓ different nodes. We leave the
rest of this proof as an exercise.
Lemma 9.8. There is a constant β_red such that for all constraint graphs
G = ((V, E), Σ̂, c), we can construct in polynomial time a constraint graph
G_red = ((V′, E′), {0, 1}³, c′) such that
Proof. Let k = log|Σ̂|. We identify every element of Σ̂ with a string in
{0, 1}^k. Then we map each string to {0, 1}^ℓ with ℓ = O(k) using
the code from the beginning of this section. We replace every node v ∈ V by
a sequence of nodes v_1, . . . , v_ℓ. With every edge e = (u, v) ∈ E, we identify
a function φ_e : {0, 1}^ℓ × {0, 1}^ℓ → {0, 1}. φ_e(x, y) is true iff x and y are
code words corresponding to values a, b ∈ Σ̂ such that (a, b) ∈ c(e). For
each such φ_e, we construct an assignment tester (see Theorem 6.3) G_e =
((V_e, E_e), {0, 1}³, c_e). The graph G_red is the union of all these assignment
testers. The ℓ nodes v_1, . . . , v_ℓ representing v, v ∈ V, are shared by all
assignment testers corresponding to an edge that contains v. The constraints
of each edge in G_red are the constraints of the G_e. We can assume that each
G_e has the same number of edges, say, r. Thus G_red has r·|E| edges.
Each assignment tester is a constant size graph whose size only depends
on |Σ̂|. This immediately yields the upper bound on the size of G_red.
For the second statement of the theorem, consider an optimal assignment σ
of G. We construct an assignment σ′ for G′ as follows: If σ satisfies the
constraint c(e), then, by the properties of assignment testers, we can extend
σ in such a way that all constraints of G_e are satisfied. If σ does not satisfy
c(e), then we extend σ in any way. In the worst case, no constraints of G_e are
satisfied. Thus for every constraint satisfied in G, at least r constraints are
with a = β_pre²·β_amp²·β_red / 4.
With this lemma, the proof of the PCP theorem follows easily. We start
with the observation that the decision version of constraint graph satisfac-
tion is NP-complete, i.e., gap(1 − 1/|E|, 1)-Max-CGS is NP-hard. Let G be
an input graph. If we now apply the above lemma log|E| times, we get
a graph G′ that can be computed in time polynomial in size(G) with the
property that

    UNSAT(G′) ≥ min{ 2^(log|E|) · (1/|E|), a } = a.
Being intractable, e.g., NP-complete, does not completely reflect the dif-
ficulty of a problem. Approximability is one way of refining the notion of
intractability: We have seen some NP-hard optimization problems, for which
finding a close-to-optimal solution is easy, and others, for which finding even
a very weak approximation is as hard as solving an NP-complete problem.
Average-case complexity is another way of refining the intractability of
problems: Unless P = NP, no efficient algorithm solves an NP-complete
problem on all instances. However, we may still hope
that we can solve the problem efficiently on most instances or on typical
instances, where “typical” here means something like “sampled from a dis-
tribution that reflects practical instances.”
Another motivation for studying average-case complexity is cryptogra-
phy: A cryptographic system is secure only if any efficient attempt to break
it succeeds only with a very small probability. Thus, it does not help if a
cryptographic system is hard to break only in the worst case. In fact, most
of cryptography assumes that NP problems exist that are intractable not
only in the worst but even in the average case, i.e., on random inputs.
Bogdanov and Trevisan give a well-written survey about average-case
complexity [BT06]. They also cover connections of average-case complexity
to areas like cryptography.
Since {0, 1}* is an infinite set, there is no “uniform” distribution on {0, 1}*.
Instead, the distribution that is commonly called the “uniform distribution”
assigns probability (6/π²)·n^(−2)·2^(−n) to every string of length n ≥ 1 (or, to
get rid of the factor 6/π², we assign probability (1/(n(n+1)))·2^(−n) to strings
of length n). We will use ensembles of distributions as it will be more
convenient. However, most results hold independent of which possibility
we choose. The uniform distribution U = (U_n)_{n∈N} is then given by
U_n(x) = 2^(−n) for x ∈ {0, 1}^n.
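As an illustration (not from the text), a Python sampler for the second variant, which first draws the length n with probability 1/(n(n+1)) and then a uniform string of that length:

```python
import random

def sample_uniform():
    """Sample a string under the 'uniform' distribution on {0,1}^+:
    Pr[length = n] = 1/(n(n+1)) (these sum to 1 over n >= 1),
    then all 2^n strings of that length are equally likely."""
    u, n, acc = random.random(), 1, 0.0
    while True:
        acc += 1.0 / (n * (n + 1))   # cumulative length distribution
        if u < acc:
            break
        n += 1
    return "".join(random.choice("01") for _ in range(n))
```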
In order to allow for different probability distributions, we do not fix
one, but we will consider distributional problems in the following.
Exercise 10.1. Show that there exists an ensemble D = (Dn )n∈N such that
the density functions Dn are polynomial-time computable but D ∉ PComp
unless P = NP. The latter means that if the functions fDn are computable
in polynomial time, then P = NP.
One can argue that PSamp is the class of natural distributions: A distri-
bution is natural if we can efficiently sample from it. This, however, does not
mean that their density or distribution function is efficiently computable.
Exercise 10.5. Prove that Definition 10.4 and Levin’s definition are indeed
equivalent.
Exercise 10.6. Actually, Levin's original definition uses a single distribu-
tion D : {0, 1}+ → [0, 1]. Given an ensemble D = (D_n)_{n≥1} with supp(D_n) ⊆
{0, 1}^n, we obtain a single distribution D : {0, 1}+ → [0, 1] by setting

    D(x) = (6/(π²·|x|²))·D_{|x|}(x).

A function t : {0, 1}+ → N is called polynomial on D-average if there
exist constants k and c such that

    Σ_{x∈{0,1}+} D(x)·t(x)^(1/k)/|x| ≤ c.
    3COL = {G | G is 3-colorable},

let G_{n,1/2} be the uniform distribution on graphs with n vertices, and let
G_{1/2} = (G_{n,1/2})_{n∈N}. Show that (3COL, G_{1/2}) ∈ AvgP.
Note: If we include each of the (n choose 2) possible edges independently with
probability p, i.e., the probability of getting G = (V, E) is
p^|E|·(1 − p)^((n choose 2)−|E|), then this probability distribution is called G_{n,p}.
Instead of having δ as part of the input, we can also have a function
δ : N → (0, 1] of failure probabilities. This leads to the following definition.
Definition 10.7. Let Π = (L, D) be a distributional problem, and let δ :
N → (0, 1]. An algorithm A is an errorless heuristic algorithm for Π with
failure probability at most δ if the following properties are fulfilled:
• For every n ∈ N and every x ∈ supp(Dn ), A(x, n) outputs either L(x)
or ⊥.
Exercise 10.9. Prove the following: For every constant c, AvgP ⊆ Avg_{n^(−c)}P
and HeurP ⊆ Heur_{n^(−c)}P.
So far, we have defined classes of problems that are tractable (to various
degrees) on average. Now we define the average-case analog of NP, which is
called DistNP.
For the proof of this theorem, we need Kolmogorov complexity (see also
the script of the lecture “Theoretical Computer Science”). To define Kol-
mogorov complexity, let us fix a universal Turing machine U. The Kol-
mogorov complexity of a string x ∈ {0, 1}* is the length of the shortest
string c such that U on input c = ⟨g, y⟩ outputs x. Here, g is a Gödel num-
ber of a Turing machine and y is an input for the Turing machine M_g. We
denote the length |c| by K(x).
The conditional Kolmogorov complexity K(x|z) is the length of the short-
est string c = ⟨g, y⟩ such that M_g on input y and z outputs x.
Example 10.12. We have K(0^n) = log n + O(1) and K(0^n | bin(n)) = O(1)
and K(x) ≤ |x| + O(1).
This lower bound for the Kolmogorov complexity holds for all strings on
which A fails.
Now let x0 be the lexicographically first string of length n on which
A fails. Since L is decidable, x0 can be computed given n. This implies
K(x0 |n) = O(1). To conclude, we observe that for sufficiently large n, no
string exists on which A fails. Thus, A fails only on finitely many strings,
which proves L ∈ P.
Exercise 10.9 immediately yields the following result.
The goal of this section is to prove that DistNP = (NP, PComp) contains a
complete problem. There are three issues that we have to deal with: First,
we need an appropriate notion of reduction. Second, we have to take care
of the different probability distributions, which can differ vastly. Third, and
this is the easiest task, we need an appropriate problem with an appropriate
probability distribution.
11.1 Reductions
For reductions between distributional problems, the usual many-one reduc-
tions do not suffice: A feature that a suitable reduction should have is that
if (A, D) reduces to (A′, D′) and (A′, D′) ∈ AvgP, then (A, D) ∈ AvgP. Let
us consider the following example: We use the identity mapping as reduc-
tion between (A, D) and (A, D′). But D assigns high probability to the hard
instances, whereas D′ assigns small probability to hard instances. Then
(A, D′) is tractable on average, but (A, D) is not.
What we learn is that a meaningful notion of reduction must take into
account the probability distributions.
The first condition is usual for many-one reductions. The second condi-
tion forbids the scenario sketched above: Drawing strings according to D_n
and then applying f(·, n) yields a probability distribution of instances for L′.
(Of course, we just get binary strings again. But we view them as instances
for L′.) Then the second property makes sure that no string y is generated
with a much larger probability than if y had been drawn according to D′_{m(n)}.
The inequality holds because the reduction must fulfill the domination con-
dition. Thus, Π ∈ AvgP.
It is also not hard to show that ≤AvgP is transitive.
Lemma 11.3. Let Π = (L, D), Π′ = (L′, D′), and Π″ = (L″, D″) be distri-
butional problems with Π ≤AvgP Π′ and Π′ ≤AvgP Π″. Then Π ≤AvgP Π″.

Proof. Let f be a reduction from Π to Π′, and let g be a reduction from Π′
to Π″. Let p and m be the polynomials of Definition 11.1 for f, and let p′ and
m′ be the polynomials for g. Obviously, h given by h(x, n) = g(f(x, n), m(n))
is polynomial-time computable and a many-one reduction from L to L″.
It remains to prove that h fulfills domination. To this end, let
q(n) = p(n) · p′(m(n)), and let ℓ(n) = m′(m(n)). The functions q and ℓ
are obviously polynomials. Now let n be arbitrary, and let z ∈ supp(D″_{ℓ(n)}).
Then

    Σ_{x:h(x,n)=z} D_n(x) = Σ_{y:g(y,m(n))=z} Σ_{x:f(x,n)=y} D_n(x)
                          ≤ Σ_{y:g(y,m(n))=z} p(n)·D′_{m(n)}(y)
                          ≤ p(n)·p′(m(n))·D″_{m′(m(n))}(z) = q(n)·D″_{ℓ(n)}(z).
Now log|C(x, n)| ≤ log(m(n)) + 1 and |C(x, n)| ≤ log(1/D_n(x)) + 1 yield

    U^BH_N(⟨g, C(x, n), 1^(q(n))⟩) ≥ 2^(−(2 log ℓ + ℓ)) · Ω(1) ≥ (m(n) + 1)^(−2) · D_n(x) · Ω(1),

where ℓ = |C(x, n)|.
(A′ immediately rejects inputs that are not syntactically correct.) Note that

    U^BH_N(I) = U^BH_{N+1/δ}(⟨g, x, 1^(t+1/δ)⟩)

by the definition of U^BH_N. On inputs from U^BH_{N+1/δ}, algorithm A outputs ⊥
with a probability of at most 1/(N + 1/δ) < δ. Thus, A′ outputs ⊥ on at
most a δ fraction of the instances obtained from U^BH_N.
Tiling
A tile is a square with a symbol on each of its four sides. Tiles must not be
rotated or turned over. If we have a set T of tiles, we assume that we have
an infinite number of tiles of any kind in T. A tiling of an n × n square is
an arrangement of n² tiles that cover the square such that the symbols of
the adjacent sides of the tiles agree. The size of a tile is the length of the
binary representation of its four symbols.
Post correspondence
The Post correspondence problem is one of the better known undecidable
problems. In a restricted variant, it becomes NP-complete. Together with
an appropriate probability distribution, it becomes DistNP-complete.
for all n and x. (This might look confusing at first glance since the
strings x0, z and f(x0) are exponentially long, but it is just the domi-
nation condition.)
Now we have E_{2^(p(n))}(x0) = 2^(−n). Thus, domination implies that

    D_{m(2^(p(n)))}(f(x0, 2^(p(n)))) ≥ 2^(−n) / r(2^(p(n))) = 2^(−s(n))    (12.1)

for some polynomial s. Since D is flat, there exists an ε > 0 such that

    D_{m(2^(p(n)))}(f(x0, 2^(p(n)))) ≤ 2^(−(m(2^(p(n))))^ε).    (12.2)

Now we are almost done. Since the images f(x0, 2^(p(n))) are in supp(D_{m(2^(p(n)))}),
they are polynomially bounded in m(2^(p(n))). By (12.3), this quantity is
bounded from above by a polynomial s(n). Hence, A ∈ EXP: (1) x ∈ A if
and only if f(x0, 2^(p(n))) ∈ L. (2) We can compute y = f(x0, 2^(p(n))) in time
2^(q(n)). (3) We can decide whether y ∈ L in time 2^(p(|y|)) ≤ 2^(p(m(2^(p(n))))) ≤
2^(p(s(n))) = 2^(poly(n)), where the second inequality holds because of (12.3).
Since the uniform distribution U = (U_n)_{n∈N} is flat, we immediately get
the following result as a special case.
Corollary 12.3. There is no L ∈ NP such that (L, U) is DistNP-complete
unless NEXP = EXP.
Lemma 12.5. Let Q = (Q_n)_{n∈N} be given by Q_n(1^n) = 1. Then, for every
unary language L ⊆ {1}*, we have L ∈ P if and only if (L, Q) ∈ AvgP.

Proof. Clearly, if L ∈ P, then (L, Q) ∈ AvgP. To see that the converse
also holds, consider any algorithm A that witnesses (L, Q) ∈ AvgP. Let t
be the running time of A. Then we have, for some ε > 0, E_{x∼Q_n}(t^ε(x, n)) =
O(n). Since supp(Q_n) = {1^n}, this is equivalent to t(1^n, n) = O(n^(1/ε)).
Thus, A runs in worst-case polynomial time.
With these two lemmas, the main theorem of this section can be proved
easily.
Theorem 12.6 (Ben-David, Chor, Goldreich, Luby [BDCGL92]). If E ≠ NE,
then DistNP ⊄ AvgP.
Show that the existence of one-way permutations implies that there are prob-
lems for which search is harder than decision.
for all a > 0. By symmetry, we have the same bound for Pr(X < E(X) − a).
There are many variants of Chernoff bounds. Sometimes, they lead to slightly
different bounds. For most applications, however, it does not matter which version
we use.
    Pr_{A′}[ A′(x, n, δ) ∉ {L(x), ⊥} ] ≤ 2^(−Ω(k(n)))

and

    Pr_{x∼D_n}[ Pr_{A′}[ A′(x, n, δ) = ⊥ ] ≥ 2^(−Ω(k(n))) ] ≤ δ.
2. For every n, every δ > 0, and every x ∈ supp(Dn ), A(x, n, δ, a(n, δ))
runs in time p(n, 1/δ).
3. For every n and every δ > 0, we have Prx∼Dn (A(x, n, δ, a(n, δ)) =
⊥) ≤ δ.
(One might prefer to define AvgP/poly in terms of circuits rather than Tur-
ing machines that take advice. But we want to have one circuit for each n
and δ, and supp(Dn ) can contain strings of different lengths. This technical
problem can be solved, but why bother?)
Prove that AvgBPP ⊆ AvgP/poly.
1. For every n and δ > 0, A runs in time p(n, 1/δ) and outputs either a
string w or ⊥.
What does this definition mean? For any x ∈ L, A may output a non-
witness w. According to item (2), this happens with bounded probability.
This is an internal failure of the algorithm A and not due to x being a hard
instance. Item (3) bounds the probability that A outputs ⊥. Intuitively, A
outputs ⊥ not because of internal failure, but because x is a hard instance.
querying the above for i ∈ {1, 2, . . . , p(|x|)}, we can find the witness. In the
following, let |x| = n and p = p(n).
Of course, we cannot assume in general that witnesses are unique. But
we know tools to make witnesses unique (recall the Valiant–Vazirani theorem
from the lecture “Computational Complexity Theory” [VV86]). We use a
family H of pairwise independent hash functions h : {0, 1}^p → {0, 1}^p: H
consists of all functions x ↦ Ax + b, where A ∈ {0, 1}^(p×p) and b ∈ {0, 1}^p.
By restricting h ∈ H to the first j bits, we obtain h|_j. Also {h|_j | h ∈ H} is
a family of pairwise independent hash functions.
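A Python sketch of this hash family over GF(2) (illustrative; a hash function is represented as the pair (A, b), and h|_j simply truncates the output):

```python
import random

def sample_hash(p):
    """Draw h(x) = Ax + b over GF(2) with A in {0,1}^{p x p}, b in {0,1}^p."""
    A = [[random.randrange(2) for _ in range(p)] for _ in range(p)]
    b = [random.randrange(2) for _ in range(p)]
    return A, b

def apply_hash(h, x, j=None):
    """Evaluate h on the bit vector x; with j given, return h|_j(x),
    the restriction to the first j output bits."""
    A, b = h
    y = [(sum(a & xi for a, xi in zip(row, x)) + bi) % 2
         for row, bi in zip(A, b)]
    return y if j is None else y[:j]
```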
Now we consider the language
We build the quadruple ⟨x, h, i, j⟩ such that, for |x| = n, it always has length
q = q(n) for some polynomial q. Furthermore, we make sure that x, h, i
and j are independent. This means that we can equivalently draw x ∈ {0, 1}^n,
h ∈ H as well as i, j ∈ {1, . . . , p} uniformly and independently at random
and pair them to ⟨x, h, i, j⟩. In this way, we get the same distribution. This
can be done since the lengths of x and h are fixed once n is known. (We can,
for instance, assume that p is a power of 2. Then we can write i and j as
binary strings of length log p, possibly with leading 0s.)
It can be shown (see again the script of the lecture “Computational
Complexity Theory”) that if j is the logarithm of the number of witnesses
for x, then, with at least constant, positive probability (taken over the choice
of h ∈ H), there is a unique witness w for x that also satisfies h|j (w) = 0.
Now we proceed as follows:
2. If, for some j ∈ {1, . . . , p}, the sequence of answers to the queries
⟨x, h, 1, j⟩, . . . , ⟨x, h, p, j⟩ yields a witness w for x ∈ L, then we output
this witness. (Note that ⟨x, w⟩ ∈ V can be checked in polynomial
time.)
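The whole procedure, as a hedged Python sketch: oracle(x, h, i, j) is a hypothetical stand-in for the decision queries just described (the i-th bit of the isolated witness, if any), and check(x, w) tests whether ⟨x, w⟩ ∈ V.

```python
def find_witness(x, p, sample_hash, oracle, check):
    """Search-to-decision via Valiant-Vazirani: hash to isolate a
    unique witness, read it out bit by bit with p oracle queries
    per candidate value of j, and verify the result."""
    h = sample_hash(p)                       # random pairwise independent hash
    for j in range(1, p + 1):                # guess log(#witnesses)
        w = [oracle(x, h, i, j) for i in range(1, p + 1)]
        if check(x, w):                      # <x, w> in V?
            return w                         # witness found
    return None                              # failed (retry with a fresh h)
```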
Proof. We have already done the lion’s share of the work. It remains to
estimate the failure probabilities. Let L ∈ NP be arbitrary such that wit-
nesses for L have length p for some polynomial p, and let A be a randomized
errorless heuristic scheme for (W′, U) with

    Pr_A[ A(y, q, α) ∉ {W′(y), ⊥} ] ≤ β.    (13.3)
be the set of bad strings. Let us analyze the probability Pr_{x∼U_n}(x ∈ Z) that
a random x is bad. We have

    Pr_{x,h}[ ∃i, j : Pr_A[A(⟨x, h, i, j⟩, q, α) = ⊥] ≥ γ ] ≥ φ · Pr_{x∼U_n}(x ∈ Z).
Thus,

    Pr_{y=⟨x,h,i,j⟩}[ Pr_A[A(y, q, α) = ⊥] ≥ γ ] ≥ (φ/p²) · Pr_{x∼U_n}(x ∈ Z),

and the left-hand side is at most α by (13.3). From this, we learn
Pr_{x∼U_n}(x ∈ Z) ≤ α·p²/φ. We want
    Pr_{x∼U_n}(x ∈ Z) ≤ δ,    (13.6)

thus we put the constraint α·p²/φ ≤ δ. For x ∉ Z, we have

    Pr_B[ B(x, n, δ) = ⊥ ] ≤ φ + (1 − φ)·p²·γ.

Now we choose φ = 1/40 and γ = 1/(40p²). This also specifies α to α = δ·φ/p²,
which satisfies our constraint α·p²/φ ≤ δ. This specification of φ and γ yields

    Pr_B[ B(x, n, δ) = ⊥ ] ≤ 1/20    (13.7)

for x ∉ Z. We set c′ = 1/20 < c. Then item (3) of Definition 13.3 follows
from (13.6) and (13.7).
14 Hardness amplification
We will prove Yao’s XOR lemma in Section 14.2. There are several
different proofs of this lemma. An elegant (and quite intuitive) proof is
via Impagliazzo’s hard-core set lemma (Section 14.1). This lemma is also
interesting in its own right and a somewhat surprising result. There are at
least two different proofs of this lemma. An elegant (and quite intuitive)
proof is via von Neumann’s min-max theorem. Impagliazzo attributes this
proof to Nisan.
Lemma 14.3. Let f : {0, 1}n → {0, 1} be an (s, δ)-hard function with
respect to the uniform distribution on {0, 1}n , and let ε > 0. Then there is
a probability distribution D on {0, 1}n with the following properties:
1. f is (s · (−ε²)/(8·log(εδ)), 1/2 − ε)-hard with respect to D.

2. D(x) ≤ (1/δ)·2^(−n) for all x ∈ {0, 1}^n.
This is the same as the probability that f(x) = C(x) if x is drawn according
to the following distribution D: First, draw T ∼ D′. Second, draw x ∈ T
uniformly at random. This probability distribution D is as stated in the
lemma: For each x0 ∈ {0, 1}^n, we have Pr_{x∼D}(x = x0) = Pr_{T∼D′}(x0 ∈
T) · 1/(δ·2^n) ≤ (1/δ)·2^(−n). Thus, f is (s, 1/2 − ε)-hard with respect to D.
The lemma follows for this case since s′ ≤ s.
Now consider the second case. There exists a probability distribution C
on circuits of size s′ such that, for every subset T ⊆ {0, 1}^n of cardinality
δ·2^n, we have

    Pr_{x∼T, C∼C}(C(x) = f(x)) ≥ 1/2 + ε,

which corresponds to an average advantage of 2ε.
Let U be the set of inputs x for which the distribution C on circuits
achieves an advantage of at most ε in computing f . (“Advantage” is gener-
alized to distributions over circuits in the obvious way.)
Proof of Claim 14.4. Assume to the contrary that |U| > δ(1 − ε)·2^n. If
|U| ≥ δ·2^n, then U would give rise to a strategy of player D that keeps the
payoff at most (1/2)(1 + ε), which contradicts the assumption.
Otherwise, consider a set T′ ⊇ U of cardinality δ·2^n for which C
achieves the smallest advantage. Since |U| > δ(1 − ε)·2^n, we have |T′ \ U| <
ε·δ·2^n. Then the advantage of C on T′ would be smaller than

    (1/(δ·2^n)) · ( |T′ ∩ U|·ε + |T′ \ U| ) < (1/(δ·2^n)) · (ε·δ·2^n + ε·δ·2^n) = 2ε.

This contradicts the average advantage of at least 2ε on this set T′ (which
was the assumption of the second case).
Now we construct a circuit C̃ of size s that gets f (x) right for more than
a 1 − δ fraction of the inputs, which contradicts the assumption that f is
(s, δ)-hard. The idea is as follows: On U , we might have only little chance
to get the right answer. Thus, we ignore inputs from U . For inputs from
{0, 1}n \ U , however, we have a non-trivial chance of computing the right
answer if we sample circuits according to the distribution C, which is an
optimal strategy for player C. Then we amplify probabilities by sampling
more than one circuit and taking the majority outcome.
More precisely, let us draw t independent random circuits according
to the distribution C. Our new circuit C̃ outputs the majority output
of these t circuits. Fix any x ∉ U. We can bound the probability that
C̃ gets f(x) wrong using Chernoff bounds: C̃ errs only if at most t/2 of
its subcircuits give the correct output. The expected number of correct
outputs is at least (1/2)(1 + ε)·t. Thus, the Chernoff bound yields an upper
bound of exp(−2(tε/2)²/t) = exp(−tε²/2). We set t = −4·log(εδ)/ε² ≥
−2·log(εδ/2)/ε². This gives a probabilistic construction of C̃ that errs on
only an εδ/2 fraction of the inputs not from U. (Note that we do not need
to construct C̃ explicitly. Its existence suffices.) Since |U| ≤ δ(1 − ε)·2^n,
C̃ errs with a probability of at most δ(1 − ε) + εδ/2 ≤ δ for random
x ∈ {0, 1}^n.
Since C̃ consists of t circuits of size s′, its size is at most 2ts′ = s. This
contradicts the assumption that f is (s, δ)-hard.
    min_{p∈[0,1]^n, Σ_j p_j = 1}  max_{q∈[0,1]^m, Σ_i q_i = 1}  q^T A p
        =  max_{q∈[0,1]^m, Σ_i q_i = 1}  min_{p∈[0,1]^n, Σ_j p_j = 1}  q^T A p.

The number min_p max_q q^T A p = max_q min_p q^T A p is called the value of the game.
The goal of the next lemma is to get a hard-core set from the hard-core
distribution just constructed.
Lemma 14.5. Let D : {0, 1}^n → [0, 1] be a probability distribution such that
D(x) ≤ (1/δ)·2^(−n) for all x ∈ {0, 1}^n. Let f : {0, 1}^n → {0, 1} be a function
such that f is (s, 1/2 − ε/2)-hard with respect to D for 2n < s < 2^n·(εδ)²/(16n).
Then there exists a set H ⊆ {0, 1}^n such that f is (s, 1/2 − ε)-hard with
respect to the uniform distribution on H.

• |H| ≥ δ·2^n · (1 − ε/4).
If |H| = δ·2^n, then we are done. If |H| < δ·2^n, we add δ·2^n − |H| arbitrary
elements to H and call the new set again H. No circuit gets more than
(1/2 + ε)·δ·2^n strings of H right. If |H| > δ·2^n, then we remove |H| − δ·2^n
elements from H and again call the new set H. Since we only remove
elements, no circuit gets more strings of the new set right than it got for
the old set. Thus, this set also meets our requirements.
From the two lemmas above, the main result of this section follows easily.
Theorem 14.6. Let f be (s, δ)-hard, and let ε, η > 0. Then there exists a set
H ⊆ {0, 1}^n of cardinality at least (1 − η)·δ·2^n such that f is
(s · poly(ε, δ, η), 1/2 − ε)-hard with respect to the uniform distribution.
Prove this!
Hint: Modify Lemma 14.3, then the rest follows.
This is close to optimal: Assume that there is a circuit C that errs only
with a probability of δ, and consider any set H of cardinality significantly
larger than 2δ·2^n. Then the probability that C errs on a random input
from H is significantly smaller than 1/2.
The problem with this proof idea is that a circuit for computing g does
not necessarily proceed by first computing f(x1), . . . , f(xk). It is allowed
to do anything. Nevertheless, this idea can be turned into a proof of the
XOR lemma.
Proof. Let H be a hard-core set for f of cardinality at least δ·2^n as in
Theorem 14.6. Assume to the contrary that there exists a circuit C of size
s′ = s · (−ε²)/(100·log(εδ)) such that

    Pr_{x1,...,xk}[ C(x1, . . . , xk) = g(x1, . . . , xk) ] > 1/2 + (1 − δ)^k + ε.    (14.1)
Let D be the uniform distribution over (x1, . . . , xk) with xi ∈ {0, 1}^n,
conditioned on at least one xi being in the hard-core set H. This yields

    Pr_{(x1,...,xk)∼D}[ C(x1, . . . , xk) = g(x1, . . . , xk) ]    (14.2)
        ≥ Pr_{x1,...,xk}[ C(x1, . . . , xk) = g(x1, . . . , xk) ] − Pr_{x1,...,xk}[ ∀i : xi ∉ H ]
        > 1/2 + ε.
Let us take a different view on the distribution D: First, we pick a non-
empty set T ⊆ {1, . . . , k} with an appropriate distribution. Then, we choose
xi ∈ H for i ∈ T uniformly at random and xi ∈ {0, 1}^n \ H for i ∉ T. Let
the latter distribution be D_T. Thus, we can rewrite (14.2) as

    E_T[ Pr_{(x1,...,xk)∼D_T}[ C(x1, . . . , xk) = g(x1, . . . , xk) ] ] > 1/2 + ε.

Fix a set T that maximizes the inner probability. Without loss of generality,
we assume that 1 ∈ T. Then we can further rewrite the probability as

    E_{x2,...,xk}[ Pr_{x1∼H}[ C(x1, . . . , xk) = g(x1, . . . , xk) ] ] > 1/2 + ε,

where, by abusing notation, x1 ∼ H means that x1 is drawn uniformly
at random from H. Now let a_j for j > 1 be the assignment for x_j that
maximizes the above probability. This yields

    Pr_{x1∼H}[ C(x1, a2, . . . , ak) ⊕ f(a2) ⊕ · · · ⊕ f(ak) = f(x1) ] > 1/2 + ε,

where we have rearranged terms to isolate f(x1). To get a circuit for f
from C, we replace x2, . . . , xk by the constants a2, . . . , ak. Then we observe
that f(a2) ⊕ · · · ⊕ f(ak) is a constant. Thus, we possibly have to negate the
output of C. This increases the size by at most 1. Thus, we have a circuit
C′ of size s′ + 1 ≤ s · (−ε²)/(64·log(εδ)) for f that has success probability
greater than 1/2 + ε on the hard-core set H for f. This would contradict
Theorem 14.6.
Using Exercise 14.2, we can even improve the hardness almost to
1/2 − ε − (1 − 2δ)^k.
Analogously to Exercise 13.3, we can generalize Heur_δ P to non-uniform
circuits of polynomial size in a straightforward way. In this way, we obtain
Heur_δ P/poly.
Corollary 14.8. Let C be a class of Boolean functions with the following
property: If f = (f_n)_{n∈N} ∈ C, then also g ∈ C with g(x1, . . . , xk) =
⊕_{i=1}^{k} f(xi), where k can be a function of |xi|.
Suppose there exists a family of functions f = (f_n)_{n∈N} ∈ C such that
f ∉ Heur_{1/p(n)} P/poly. Then, for every constant c > 0, there exists a family
g of functions such that g ∈ C and g ∉ Heur_{1/2−n^(−c)} P/poly.
We can phrase the XOR lemma and Corollary 14.8 also the other way
round: Assume that a class C of functions is closed under ⊕, and suppose
that every function in C can be computed with a success probability of
at least 1/2 + ε for some not too small ε > 0. (Say, ε = 1/poly(n).) Then we
can reduce the error probability to 1/poly(n). (If this were not the case,
then we would be able to amplify the hardness of 1/poly(n) to 1/2 − ε, a
contradiction.) So if we are able to compute a function with a non-trivial
advantage, then we can bring the advantage close to 1. This is closely
related to boosting, a concept in computational learning theory. Klivans
and Servedio [KS03] explain the connections between boosting and hard-core
sets.
15 Amplification within NP
Our goal in this section is to show a statement of the form “if there is a
language in NP that is mildly hard on average, then there is a language in
NP that is very hard on average.” Unfortunately, the XOR lemma does not
yield such a result: If L ∈ NP, then it is unclear if computing L(x) ⊕ L(y) is
also in NP. For instance, if L is co-NP-complete, then L(x) ⊕ L(y) can only
be computed in NP if NP = co-NP.
We circumvent this problem by replacing parity by a monotone function
g : {0, 1}k → {0, 1}. If L ∈ NP, then computing g(L(x1 ), . . . , L(xk )) from
x1 , . . . , xk is also in NP.
Exercise 15.1. Prove the above statement. More precisely, prove the fol-
lowing stronger statement:
Assume that NP 6= co-NP. Prove that the following two statements are
equivalent for any function g : {0, 1}k → {0, 1}:
In fact, it is not the bias of g that plays a role, but the expected bias of g
with respect to a random restriction.
• Pr(ρ(i) = ?) = δ.

• Pr(ρ(i) = 0) = Pr(ρ(i) = 1) = (1 − δ)/2.
Exercise 15.3. Use Theorem 15.3 and Exercise 15.2 to derive a weaker
form of the XOR lemma (Theorem 14.7), which holds only for balanced
functions.
The quantity

    NSens_δ(h) = 1 − NStab_δ(h) = Pr_{x∼{0,1}^m, y∼N_δ(x)}[ h(x) ≠ h(y) ]
by linearity of expectation.
Since bias∗(h_ρ) ∈ [0, 1], we have bias∗(h_ρ)² ≤ bias∗(h_ρ), which yields the
first inequality. The second inequality follows from Jensen's inequality, since
squaring is a convex function:

    √(NStab∗_δ(h)) = √( E_{ρ∼P^m_{2δ}}[ bias∗(h_ρ)² ] )
                   ≥ E_{ρ∼P^m_{2δ}}[ √(bias∗(h_ρ)²) ] = E_{ρ∼P^m_{2δ}}[ bias∗(h_ρ) ]
                   = EBias∗_{2δ}(h).
The following lemma will be very useful to compute the noise stability
and noise sensitivity of the functions g that we use to amplify hardness.
Lemma 15.6. Let h : {0, 1}^m → {0, 1} be a balanced Boolean function, and
let g : {0, 1}^k → {0, 1} be an arbitrary Boolean function. Then

    NSens_δ(g ⊗ h) = Pr[ g(h(x1), . . . , h(xk)) ≠ g(h(y1), . . . , h(yk)) ],

where x1, . . . , xk ∈ {0, 1}^m are uniform and y1 ∼ N_δ(x1), . . . , yk ∼ N_δ(xk).

Let zi = h(xi) and z′i = h(yi). Then Pr(zi = 1) = Pr(zi = 0) = 1/2 since h
is balanced. Furthermore, the probability that z′i ≠ zi is just NSens_δ(h) =
1 − NStab_δ(h). Thus,

    NSens_δ(g ⊗ h) = Pr_{z∈{0,1}^k, z′∼N_{NSens_δ(h)}(z)}[ g(z) ≠ g(z′) ],

as claimed.
Lemma 15.7. For ℓ ≥ log_{1.1}(1/δ), we have NStab∗_δ(M_ℓ) ≤ δ^(−1.1) · (3^ℓ)^(−0.15).

Exercise 15.6. Prove Lemma 15.7. You can also show a slightly weaker
variant: Prove that there exist constants a > 1, b ≥ 1, c > 0 such that for
ℓ ≥ log_a(1/δ), we have NStab∗_δ(M_ℓ) ≤ δ^(−b)·(3^ℓ)^(−c).
Hint: Calculate NSens_δ(M_1) explicitly. Then use Lemma 15.6. Make a
case distinction whether NSens_δ(M_k) is large or small for k ≤ ℓ.
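For experimentation with the hint, a small Python sketch, assuming M_ℓ denotes the ℓ-fold recursive majority-of-three on 3^ℓ bits (the usual definition; the text's definition of M_ℓ is not repeated here):

```python
def rec_maj3(bits):
    """M_ell: recursive majority-of-three on 3**ell input bits."""
    if len(bits) == 1:
        return bits[0]
    third = len(bits) // 3
    votes = [rec_maj3(bits[k * third:(k + 1) * third]) for k in range(3)]
    return 1 if sum(votes) >= 2 else 0
```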
    T_n(x1, . . . , xn) = ⋁_{i=0}^{n/w−1} ⋀_{j=1}^{w} x_{iw+j}.
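The tribes function in Python (illustrative; the tribe size w is a parameter, usually chosen so that T_n is nearly balanced):

```python
def tribes(x, w):
    """T_n: OR of n/w disjoint ANDs ('tribes') of w bits each."""
    n = len(x)
    return int(any(all(x[i * w + j] for j in range(w))
                   for i in range(n // w)))
```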
Lemma 15.8. For every constant η > 0, there is a constant r > 0 such that
EBias_{1−r}(T_n) ≤ 1/2 + n^(−1/2+η).
Exercise 15.7. Let f : {0, 1}^n → {0, 1} be a Boolean function. The influ-
ence of the j-th variable is defined as

    I_j = Pr_{x∼{0,1}^n}[ f(x) ≠ f(x^(j)) ],

    NSens_δ(f) ≤ δ · I(f).
    EBias_δ(M_ℓ) ≤ 1/2 + (1/2)·δ^(−0.55)·3^(−0.075ℓ).
This assumes that ` is sufficiently large, which can be ensured by choosing
C large.
Now we observe that

    (1/2)·δ^(−0.55)·3^(−0.075ℓ) ≤ (1/2)·(2n^c)^(0.55)·n^(−0.075C) ≤ n^(−0.074C)

for large enough (but still constant) C. Finally, n^(−0.074C) ≤ m^(−0.07) for large
enough C since m = n^(C+1).
Using the tribes function, we can further improve the hardness.
Theorem 15.10. Suppose that there is a family f = (f_n)_{n∈N} of functions in
NP which is infinitely often balanced and (poly(n), 1/poly(n))-hard. (This
means that f_n is 1/poly(n)-hard for circuits of polynomial size.) Then there
is a family of functions in NP which is infinitely often
(poly(n), 1/2 − n^(−1/2+ε))-hard for any ε > 0.
Exercise 15.9. Prove Theorem 15.10 using Lemma 15.8, Lemma 15.9,
and Theorem 15.3.
At the expense of a small loss in the final hardness, we can even get rid
of the requirement that the initial function is balanced.
Theorem 15.11. Suppose that there is a family of functions in NP which is infinitely often (poly(n), 1/poly(n))-hard. Then there is a family of functions in NP which is infinitely often (poly(n), 1/2 − n^{−1/3+ε})-hard.
Theorem 15.11′. Suppose that (L, U) ∈ Heur_{1/2 − n^{−0.33}} P/poly for all L ∈ NP. Then (L, U) ∈ Heur_{1/p} P/poly for every polynomial p and every language L ∈ NP.
The problem UCONN is defined like CONN, except that the given graph G is undirected: given G and two nodes s and t, decide whether there is a path from s to t. It is of course in NL, but the NL-hardness proof for CONN does not work for undirected G, since the configuration graph of a nondeterministic Turing machine is a directed graph.
In this chapter, we will show that UCONN can be decided in randomized
logarithmic space. We define RSpace(s(n)) to be the class of all languages
that can be decided by an s(n)-space bounded probabilistic Turing machine
with one-sided error. The Turing machine has a separate input tape and the
space used on the random tape is not counted. In the same way, we define BPSpace(s(n)); the only difference is that we allow two-sided error.
We let RL = RSpace(log n); clearly, RL ⊆ NL.
Consider the following randomized algorithm for UCONN:
1. Let v := s.
2. For i := 1 to poly(n) do: choose a random neighbour u of v and set v := u; if v = t, then accept.
3. Reject.
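In code, with the graph stored explicitly, this reads as follows (a sketch; the point of the chapter is of course that the walk itself needs only logarithmic space, namely the current node and a step counter):

    import random

    def uconn_random_walk(adj, s, t, steps, rng=random):
        # Random walk starting at s; accept as soon as t is reached.
        # adj: dict mapping each node to a nonempty list of neighbours
        # (parallel edges and self loops are simply repeated entries).
        v = s
        for _ in range(steps):
            if v == t:
                return True
            v = rng.choice(adj[v])
        return v == t

As the analysis below shows, steps = O(dn³ log n) suffices to hit t with probability at least 1/2 whenever s and t lie in the same component.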
We decompose p = α1̃ + q with q ⊥ 1̃. Here q ⊥ 1̃ means that ⟨q, 1̃⟩ = 0, that is, Σ_{i=1}^n q_i = 0. This means that α = 1, since p is a probability distribution. Thus
    Ã^t p = Ã^t (1̃ + q) = 1̃ + Ã^t q.
We have ‖p‖² = ‖1̃‖² + ‖q‖², since 1̃ and q are orthogonal. Therefore, ‖q‖ ≤ ‖p‖. Since p is a probability distribution, ‖p‖ ≤ ‖p‖₁ ≤ 1. Hence,
    ‖Ã^t p − 1̃‖ = ‖Ã^t q‖ ≤ λ^t ‖q‖ ≤ λ^t.
Next we will show that many graphs are “slight” expanders, that is,
λ(G)/d is bounded away from 1 by 1/ poly(n).
We claim that
    Σ_{i,j} Ã_{i,j} (x_j − y_i)² ≥ 1/(4dn³).
Since the sum above only contains nonnegative terms, this will also be a lower bound for 1 − ‖y‖². We sort the nodes (indices) such that x_1 ≥ x_2 ≥ ··· ≥ x_n. Since Σ_{i=1}^n x_i = 0, we have x_1 ≥ 0 ≥ x_n. Because ‖x‖² = 1, we have x_1 ≥ 1/√n or x_n ≤ −1/√n. Thus
    x_1 − x_n ≥ 1/√n.
Thus there is an i_0 such that x_{i_0} − x_{i_0+1} ≥ 1/n^{1.5}. Set U = {1, ..., i_0} and Ū = {i_0 + 1, ..., n}. Since G is connected, there is an edge {j, i} with j ∈ U and i ∈ Ū. The term of this edge in the sum above already yields the claimed bound. Thus
    ‖y‖² ≤ 1 − 1/(4dn³)
and, since √(1 − a) ≤ 1 − a/2 for a ∈ [0, 1],
    ‖y‖ ≤ 1 − 1/(8dn³).
Since this holds for all y = Ãx with kxk = 1 and x⊥1̃, this is also an upper
bound for λ(G)/d.
Assume G = (V, E) is a connected d-regular graph such that every node has a self loop. Let p be any probability distribution on V. By Lemmas 16.3 and 16.4,
    ‖Ã^t p − 1̃‖ ≤ (1 − 1/(8dn³))^t ≤ e^{−t/(8dn³)} ≤ 1/(2n^{1.5})
for t ≥ 12dn³ ln n + 8dn³ ln 2. Thus, since ‖z‖₁ ≤ √n · ‖z‖ for every z ∈ R^n,
    ‖Ã^t p − 1̃‖₁ ≤ 1/(2n)
and (Ã^t p)_i ≥ 1/n − 1/(2n) = 1/(2n) for every node i. Thus the probability that we hit any particular node i is at least 1/(2n). If we repeat this 2n times, the probability that we hit i is at least 1 − 1/e ≥ 1/2.
This proves the correctness of the algorithm in the beginning of our chapter. The input graph G need not be regular or have self loops at every node. But we can make it regular with self loops by attaching an appropriate number of self loops to each node. The degree of the resulting graph is at most n. This does not change the connectivity properties of the graph. (In fact, we do not even have to do this preprocessing: if a node is hit in the new graph, then it is hit at least as early in the old graph.) Then we apply the analysis above to the connected component that contains s. If t is in this component, too, then we hit it with probability at least 1/2. Note that instead of restarting the random walk, we can perform one longer random walk, since the analysis does not make any assumption on the starting distribution except that all its mass should be in the component of s.
17 Explicit constructions of expanders
Matrix product
            nodes    degree    λ̃
  G         n        d         λ
  G^k       n        d^k       λ^k

The normalized adjacency matrix of the tensor product G ⊗ G′ is the Kronecker product
    Ã ⊗ Ã′ = ( a_{1,1} Ã′   ···   a_{1,n} Ã′
                  ⋮           ⋱        ⋮
               a_{n,1} Ã′   ···   a_{n,n} Ã′ ),
where Ã = (a_{i,j}). The new graph has nn′ nodes and its degree is dd′.
If Ãx = λx and Ã′y = μy, then for z = x ⊗ y,
    (Ã ⊗ Ã′)z = Ãx ⊗ Ã′y = λμz.
These are all eigenvalues, since one can show that if x_1, ..., x_m and y_1, ..., y_n are bases, then the x_i ⊗ y_j, 1 ≤ i ≤ m, 1 ≤ j ≤ n, form a basis, too.
From the lemma, it follows that λ̃(G ⊗ G′) = max{λ̃(G), λ̃(G′)}, since 1 · λ̃(G′) and λ̃(G) · 1 are eigenvalues of Ã ⊗ Ã′, but the eigenvalue 1 · 1 is excluded in the definition of λ̃(G ⊗ G′).
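This maximum rule is easy to confirm numerically. A small numpy sketch (helper names are ours; λ̃ is computed as the second-largest absolute eigenvalue of the normalized adjacency matrix):

    import numpy as np

    def lam_tilde(A):
        # second-largest absolute eigenvalue of a symmetric
        # normalized adjacency matrix (the largest is always 1)
        ev = np.sort(np.abs(np.linalg.eigvalsh(A)))
        return ev[-2]

    def K(n):
        # normalized adjacency matrix of K_n, which is (n-1)-regular
        return (np.ones((n, n)) - np.eye(n)) / (n - 1)

    A, B = K(3), K(4)
    T = np.kron(A, B)   # normalized adjacency matrix of the tensor product
    print(lam_tilde(T), max(lam_tilde(A), lam_tilde(B)))   # both print 0.5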
Tensor product
            nodes    degree    λ̃
  G         n        d         λ
  G′        n′       d′        λ′
  G ⊗ G′    nn′      dd′       max{λ, λ′}
• For every edge {u, v} of G, there are d parallel edges between node i in H_u and node j in H_v, where v is the i-th neighbour of u and u is the j-th neighbour of v.
We assume that the nodes of H are the numbers 1 to D and that the neighbours of each node of G are ordered. Such an ordering can, for instance, be induced by an ordering of the nodes of G.
We can think of G r H as having an inner and an outer structure. The inner structures are the copies of H, and the outer structure is given by G. For every edge of G, we put d parallel edges into G r H. This ensures that when we choose a random neighbour of some node v, the probability that we stay in H_v is the same as the probability that we go to another H_u. In other words, with probability 1/2 we perform an inner step, and with probability 1/2 we perform an outer step. The normalized adjacency matrix of G r H is given by
is given by
1 1
à + I ⊗ B,
2 2
where I is the n × n-identity matrix. The nD × nD-matrix  is defined as
follows: Think of the rows and columns labeled with pairs (v, j), v is a node
of G and j is a node of H. Then there is a 1 in the position ((u, i), (v, j)) if v
is the ith neighbour of u and u is the jth neighbour of v. Â is a permutation
matrix.
Obviously, G r H has nD nodes and it is 2d-regular.
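For completeness, here is a sketch constructing this matrix for a simple D-regular graph G with ordered neighbour lists (our own code; the index call plays the role of the inverse neighbour numbering):

    import numpy as np

    def replacement_matrix(nbr, B):
        # nbr[v]: ordered list of the D neighbours of v in G (simple graph);
        # B: normalized adjacency matrix of H on D nodes.
        n, D = len(nbr), len(B)
        Ahat = np.zeros((n * D, n * D))
        for u in range(n):
            for i, v in enumerate(nbr[u]):
                j = nbr[v].index(u)                # u is the j-th neighbour of v
                Ahat[u * D + i, v * D + j] = 1.0   # a 1 at ((u,i), (v,j))
        return 0.5 * Ahat + 0.5 * np.kron(np.eye(n), B)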
    ‖A‖ = sup_{x ≠ 0} ‖Ax‖ / ‖x‖ = max_{‖x‖ = 1} ‖Ax‖.
It is the “smallest” norm that is compatible with the given vector norm. For the Euclidean norm ‖·‖₂ on R^n, the induced norm is the so-called spectral norm: the square root of the largest absolute value of the eigenvalues of A^H A. If A is symmetric, then this is just the largest of the absolute values of the eigenvalues of A. In particular,
    λ(G) ≤ ‖A‖₂.
Replacement product
            nodes    degree    λ̃
  G         n        D         1 − ε
  H         D        d         1 − δ
  G r H     nD       2d        1 − εδ²/24
3. For k ≥ 3, let
    G_k := ((G_{⌊(k−1)/2⌋} ⊗ G_{⌈(k−1)/2⌉})^{50}) r H.

Theorem 17.4. Every G_k is a 2d-regular (1 − 1/50)-expander with (2d)^{100k} nodes. Furthermore, the mapping that, given k, a node v of G_k, and an index i, outputs the i-th neighbour of v in G_k is computable in time polynomial in k.
Definition 18.1.
• a lot of other interesting problems are contained in SL, see the com-
pendium by [AG00].
Proof. If a node v in G has degree m > 3, we replace it by a cycle on m new nodes and attach each of the m edges incident to v to its own node of the cycle. For every node v with degree > 3, we identify one of the new nodes of the cycle with v. Let the resulting graph be G′. By construction, G′ is cubic, and if there is a path between s and t in G, then there is one between s and t in G′, and vice versa.
With a little care, the construction can be done in logarithmic space. (Recall that the Turing machine has a separate output tape that is write-only and one-way, so once it has decided to output an edge, this decision is not reversible.) We process each node in the order given by the representation of G. For each node v, we count the number m of its neighbours. If m ≤ 3, then we just copy the edges containing v to the output tape and output the additional self loops. If m > 3, then we output the edges {(v, i), (v, i+1)}, 1 ≤ i < m, and {(v, m), (v, 1)}. Then we go through all neighbours of v. If u is the i-th neighbour of v, then we determine which neighbour of u the node v is, say the j-th, and output the edge {(v, i), (u, j)}. (We only need to do this if v is processed before u, because otherwise the edge has already been output.)
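Disregarding the space bound, the whole transformation reads as follows in code (a sketch under the assumptions above: simple graph, adjacency lists in a fixed order; we keep all cycle nodes (v, i) instead of identifying one of them with v, which only renames a node, and we omit the padding self loops for nodes of degree < 3):

    def make_cubic(adj):
        # adj: dict mapping each node to its ordered neighbour list
        edges = set()
        deg = {v: len(adj[v]) for v in adj}
        def port(v, u):
            # endpoint representing v on the edge {v, u}
            return v if deg[v] <= 3 else (v, adj[v].index(u) + 1)
        for v in adj:
            m = deg[v]
            if m > 3:   # the cycle (v,1), ..., (v,m)
                for i in range(1, m + 1):
                    edges.add(frozenset(((v, i), (v, i % m + 1))))
            for u in adj[v]:
                edges.add(frozenset((port(v, u), port(u, v))))
        return edges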
Let d be large enough such that there is a d/2-regular 0.01-expander H with d^{50} nodes. (Again, the constants are chosen in such a way that the proof works; they are fairly arbitrary, and we have taken them from the book by Arora and Barak.) We can make our cubic graph G d^{50}-regular by adding d^{50} − 3 self loops per node. Recursively define
    G_0 := G,
    G_k := (G_{k−1} r H)^{50}.
2. G_k is d^{50}-regular,
3. λ̃(G_k) ≤ 1 − ε_k, where ε_k = min{1/20, 1.5^k/(8 d^{50} n³)}.
If ε_k = 1/20, then λ̃(G_k) ≤ 1 − 1/20. If ε_k = 1.5^k/(8 d^{50} n³) < 1/20, then
Note that we can view (v, δ_1, ..., δ_k) as a stack, and all the recursive calls operate on the same stack. Thus we only have to store one node at a time.
Extractors are a useful tool for randomness efficient error probability am-
plification. To define extractors, we first have to be able to measure the
closeness of probability distributions.
Definition 19.1. Let X and Y be two random variables with range S. The statistical difference of X and Y is
    Diff(X, Y) = max_{T ⊆ S} | Pr[X ∈ T] − Pr[Y ∈ T] |.
X and Y are called ε-close if Diff(X, Y) ≤ ε.
In the same way, we can define the statistical difference of two probability distributions.
We can think of T as a statistical test which tries to distinguish the distributions of X and Y. The L₁-distance of X and Y is defined as
    |X − Y|₁ = Σ_{s ∈ S} | Pr[X = s] − Pr[Y = s] |.
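Both quantities are easy to compute for explicitly given distributions, and doing so illustrates the standard fact that Diff(X, Y) = (1/2)|X − Y|₁, since the maximizing test T collects the outcomes where the first distribution puts more mass (code and example are ours):

    def l1_distance(p, q):
        # p, q: dicts mapping outcomes to probabilities
        support = set(p) | set(q)
        return sum(abs(p.get(s, 0.0) - q.get(s, 0.0)) for s in support)

    def statistical_difference(p, q):
        # max_T |P(T) - Q(T)|, attained by T = {s : p(s) > q(s)}
        support = set(p) | set(q)
        return sum(max(p.get(s, 0.0) - q.get(s, 0.0), 0.0) for s in support)

    p = {0: 0.5, 1: 0.5}
    q = {0: 0.25, 1: 0.25, 2: 0.5}
    assert abs(statistical_difference(p, q) - 0.5 * l1_distance(p, q)) < 1e-12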
and
    ‖Ã^ℓ p − 1̃‖ ≤ 2^{−n/2 + k/2 − log(1/ε) − 2} · 2^{−k/2 + 1} ≤ ε · 2^{−n/2 − 1}.
One can (quite easily) find AC0 circuits for addition. Multiplication seems a little harder, but there are constant-depth circuits with unbounded fan-in for multiplication if we use not only and, or, and not gates, but also threshold gates. For a long time, however, it was not even known how to divide in logspace, let alone with constant-depth circuits of polynomial size. This has changed. The goal of this and the following section is to develop threshold circuits for division.
It will sometimes be more convenient to work in the framework of logic. Thus, in this section we will show the equivalence of constant-depth circuits and first-order logic, which was proved by Barrington and Immerman [BI90]. In the next section, we will then show that division can be
performed by constant-depth circuits.
This chapter is far from being a complete introduction to complexity
theory in terms of logic, which is called descriptive complexity. We will only
cover first-order logic with some extensions. For a more detailed introduc-
tion, we refer to Immerman [Imm99].
    succ(i, j) ≡ (i < j) ∧ ∀k : ((k < j) → (k ≤ i)).
(We can compute i − 1 using succ.) But this does not work: C is not
a predicate that we are allowed to use. Instead, it is a placeholder for
something, and we are only allowed to replace it by a first-order formula.
Thus, we replace C(i) by the first-order sentence
We call something like C, which looks like a predicate and will be used like a predicate but is not a predicate, a pseudo predicate in the following.
In the same way, we can add numbers from {0, 1, . . . , n − 1} using BIT.
We call the corresponding ternary predicate +, written in the usual way
x + y = z.
Just to get a better feeling, assume that our input consists of three parts representing n/3-bit numbers a, b, and c. We want to test whether a + b = c. First, we compute m = n/3 and m′ = 2n/3 = 2m, which can be done by
    ∃m ∃m′ : (m + m = m′) ∧ (m + m′ = n).
Exercise 20.1. Give a first-order sentence for the language COPY = {ww | w ∈ {0, 1}*}.
Exercise 20.2. Give a first-order sentence for the following variant of parity:
    { x ∈ {0, 1}* | x = x_1 ... x_n ∧ ⊕_{i=1}^{⌊log n⌋} x_{2^i} = 1 }.
(We will soon prove that FO equals AC0 (with appropriate uniformity). We already know that parity is not in AC0 (not even in non-uniform AC0). Indeed, every constant-depth circuit for parity has to be of size 2^{n^ε} for some constant ε > 0. Why is this not a contradiction?)
Do the proof without using the following Lemma 20.2.
Now we encode an array containing the prefix sums within each block
into a single variable. Furthermore, we have a variable that represents an
array containing s1 , s2 , . . .
The number x consists of at most ⌈log₂ n⌉ bits. Let L be the smallest power of two larger than ⌈log₂ n⌉. The number L can be expressed as follows:
    ∃L : Pow2(L) ∧ BIT(L − 1, n) = 1.
We have to add fewer than L bits. The result is a number with at most ⌈log₂ L⌉ bits. Let L′ be the smallest power of two larger than ⌈log₂ L⌉. The number L′ can be expressed in the same way as L.
The idea is to keep a running sum: Using one existentially quantified variable s (which represents L bits), we can guess (roughly) L/L′ numbers s_1, s_2, ..., s_{L/L′} such that s_i = s_{i−1} + t_i, where t_i is the number of 1s of x_{(i−1)L′+1}, ..., x_{iL′}. Furthermore, s_1 = t_1. Given i, the bits of s_i are the bits s_{(i−1)L′}, ..., s_{iL′−1}. We can address them since L′ is a power of two. Thus, for instance, we can add them or compare them. We assume for the moment that the numbers t_i are given. Then we can express this as
    ∃s : s_1 = t_1 ∧ ∀i : (i > 1) → s_i = t_i + s_{i−1}.
We call the k-th partial sums t_{kj}. We know how to address them since L′ is a power of two:
(We have used + with a Boolean value BIT(), but it should be clear how to
interpret this.)
Noting that we can deal with not-large-enough values of n by hard-wiring
the results completes the proof.
Remark 20.4. Instead of BIT and ≤, we can directly use + and ×. This
is equivalent: We can implement BIT using + and ×, and we have already
seen that BIT suffices to implement + and ×. Thus, FO = FO[≤, BIT] =
FO[+, ×].
    Hx : P(x) ≡ ∃x′ : ( Mx : (P(x) ∨ x = x′) ) ∧ ¬( Mx : P(x) ).
2. S(x, y) P(x): “There are exactly y values of x with x > ⌊n/2⌋ and P(x).”
Lemma 20.5. ITADD, and hence also MULT, can be expressed in FOM.
20.3 Uniformity
The circuit complexity classes NCi and ACi all come in different flavors
corresponding to different conditions on uniformity.
Recall: A family C = (Cn )n∈N of circuits is called polynomial-time uni-
form, if the mapping 1n 7→ Cn (where we identify Cn and an appropriate
encoding of Cn ) can be computed in polynomial time. C is log-space uniform
if the mapping can be computed in logarithmic space.
From the two uniformity conditions, we get ptime-u ACi and ptime-u NCi as well as logspace-u ACi and logspace-u NCi. However, both uniformity conditions have drawbacks if we want to analyze subclasses of L or NC1. The reason is that, for instance, for ptime-u AC0, constructing the circuit can be much harder than evaluating it. Thus, in order to study subclasses of NC1, we need a more restrictive variant of uniformity.
It turns out that a good choice is DLOGTIME uniformity:
Proof. We only prove the first statement. The second follows with
an almost identical proof, where we have to replace the quantifier M by
threshold gates and vice versa. (Since threshold gates can take more than
n inputs, we need Lemma 20.6 for TC0 = FOM.)
Let us first prove FO ⊆ AC0 . Let ϕ be any first-order formula of quantifier
depth d. Without loss of generality, we can assume that ϕ is in prenex
normal form. This means that ϕ = Q1 y1 Q2 y2 . . . Qd yd : ψ(y1 , . . . , yd ), where
ψ is quantifier-free and Q1 , . . . , Qd are any quantifiers.
For such a ϕ, there is a canonical constant-depth circuit Cn for every n.
A tree of fan-out n and depth d corresponds to the quantifiers. At each leaf,
there is a constant-size circuit corresponding to ψ(y1 , . . . , yd ). This circuit
consists of Boolean operators, input nodes, and constants corresponding to
the value of atomic formulas (=, ≤, and BIT), where the constants depend
on y1 , . . . , yd .
What remains to be done is to show that this canonical circuit family is
indeed DLOGTIME uniform. The address of a node will consist of log n bits
for each quantifier (this is needed to specify the respective value of yi ) as well
as a constant number of bits specifying which node of the respective copy
of the constant-size circuit we are considering. In order to answer queries
for the connection language, our DLOGTIME machine has to be able to (i)
compare O(log n) bit numbers and do arithmetic with them (like dividing
them into their parts for the several quantifiers) and (ii) compute from the
numbers y1 , . . . , yd to which input nodes the respective constant-size circuit
has to be connected. The latter is possible because the DLOGTIME machine
can perform BIT and all the other operations on O(log n) bit numbers that
a first-order formula can do. It is not difficult but tedious to work out the
details.
To prove AC0 ⊆ FO, we first observe that the connection language is in
FO by Lemma 20.9. Let C = (Cn )n∈N be a DLOGTIME uniform family
of constant-depth, polynomial-size circuits. Since Cn is of polynomial size,
we can refer to its nodes by tuples of (a constant number of) variables. In
order to devise a first-order formula for Cn , we will express the predicate
AccGate(a), which is true if and only if gate a accepts, i.e., outputs 1. If we
have AccGate, then we just have to evaluate it for the output gate.
To get AccGate, we inductively define predicates AccGate_d(a) with the meaning “gate a on level d outputs 1”. For level 0, AccGate_0(a) is true if and only if (i) a is the number of a gate on level 0 and (ii) a is connected to some input x_i = 1.
To express AccGate_d(a), we first have to check whether a is a gate on level d at all. Since d is constant, this can be expressed. If gate a is indeed on level d, then we proceed as follows. If a is a NOT gate, then AccGate_d(a) = ¬AccGate_{d−1}(b) for some gate b on level d − 1. (We can easily find out which gate b we need using the connection language, which is in FO by assumption.) If a is an AND gate, we use ∀b : ξ(b) to range over
all other gates b. The expression ξ(b) is true if and only if (i) b is not a gate on level d − 1, or (ii) b is not a predecessor of a, or (iii) b is a predecessor of a and AccGate_{d−1}(b) = 1.
For addition and multiplication, and also for subtraction, it is not too hard to come up with AC0 circuits. But division seems to be much harder. It is fairly easy to see that division can be done in polynomial time. But for a long time, it was unknown whether division is possible in logarithmic space. In the 1980s, it was shown that there are polynomial-time uniform TC0 circuits for division. (We will define below what TC0 means.) However, even this does not prove that division can be done in logarithmic space. This was shown by Chiu et al. [CDL01], who proved that division lies in log-space uniform TC0 (which is a subclass of L). Finally, it was shown that division is in DLOGTIME uniform TC0, which is optimal: the problem is complete for DLOGTIME uniform TC0. (We will also define below what DLOGTIME uniform means. Let us just remark that it is even weaker than log-space uniformity.)
The goal of this chapter is to prove that division is in DLOGTIME
uniform TC0 = FOM.
We will do so in three steps: First, we reduce division to iterated multi-
plication. Second, we will introduce a new predicate POW (see below). Let
us write FOMP = FOM[BIT, <, POW] and FOP = FO[BIT, <, POW] for
short. We will show that division can be described in FOMP. This places
division in L since FOM = TC0 ⊆ L and POW can easily be seen to be in L.
Third, we will show that in fact POW can also be expressed in FOM, which
places division in FOM = TC0 .
In the remainder of this chapter, variables written with capital letters, such as X and Y, denote numbers of polynomial length. We also call them long numbers. Small letters represent short numbers, which have length O(log n). If numbers of length poly(log n) occur, we will mention their lengths explicitly.
If we want to compute X/Y and have 1/Y with sufficient precision, then
division reduces to multiplication. And we already know how to multiply in
TC0 . Now observe that
    Σ_{i=0}^∞ (1 − α)^i = 1/α
for α ∈ (0, 1). If we assume further that α ∈ [1/2, 1), then we have
    1/α = Σ_{i=0}^n (1 − α)^i + O(2^{−n}).
Now let j = ⌈log₂ Y⌉ be roughly the number of bits of Y. Then use 2^{−j} Y ∈ [1/2, 1) as α in the preceding equation. This yields
    2^{nj} · X/Y = X · Σ_{i=0}^n (2^j − Y)^i · (2^j)^{n−i} + O(X · 2^{nj−n}).    (21.1)
This is equivalent to
    X/Y = X · Σ_{i=0}^n (2^j − Y)^i · 2^{−ij} + O(X · 2^{−n}).
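As a sanity check of this reduction, the truncated series can be evaluated with plain big-integer arithmetic (a sketch; the point of the chapter is that each ingredient, namely powering, iterated multiplication, and iterated addition, is computable by threshold circuits, not that this sequential loop is efficient):

    def approx_div(X, Y, n):
        # approximate X/Y via X * sum_{i=0}^{n} (2^j - Y)^i * 2^{-ij}
        # with j = ceil(log2 Y), carried out in fixed-point arithmetic
        j = max(1, (Y - 1).bit_length())   # 2^(j-1) <= Y <= 2^j
        e = 2 ** j - Y                     # the "error" term 2^j - Y
        acc, term = 0, 2 ** (n * j)        # term_i ~ (e * 2^-j)^i, scaled
        for _ in range(n + 1):
            acc += term
            term = (term * e) >> j
        # undo the scaling and the leading factor 2^-j of 1/Y; the shifts
        # only round down, so the result may undershoot floor(X/Y) by one
        return (X * acc) >> (n * j + j)

    print(approx_div(10**12, 37, 64), 10**12 // 37)   # should (nearly) agree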
Even more,
    X = Σ_{i=1}^k x_i h_i C_i − rM
for some number r = rank_M(X), called the rank of X with respect to M. Note that r is a short number: it is the sum of the integer parts of the x_i h_i C_i/M = x_i h_i/m_i, each of which is in {0, 1, ..., m_i − 1}.
How does CRR help? We have reduced iterated multiplication to iterated multiplication of short numbers, which is considerably easier.
The algorithm for iterated multiplication is now easy to describe:
    POW(a, i, b, p) ≡ a^i ≡ b (mod p).
Proof. For each m_i and each j < n, we can compute 2^j mod m_i using POW. In this way, we obtain values y_{i,j} ∈ {0, 1, ..., m_i − 1} for 1 ≤ i ≤ k and 0 ≤ j < n. Then we add up those y_{i,j} for which the j-th bit of X is 1, using iterated addition (which is in FOM), and take the sum modulo m_i to obtain x_i.
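With big integers this conversion is of course trivial, but spelled out bit by bit as in the proof (Python's three-argument pow playing the role of the POW predicate) it looks as follows:

    def to_crr(X, primes):
        # the CRR digit x_i is the sum of 2^j mod m_i over the set bits j of X
        digits = []
        for m in primes:
            s = sum(pow(2, j, m) for j in range(X.bit_length()) if (X >> j) & 1)
            digits.append(s % m)
        return digits

    print(to_crr(1234567, [3, 5, 7, 11]))   # [1234567 % m for m in (3,5,7,11)]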
In the lemma above, the prime numbers m1 , . . . , mk are given. You
might wonder how we actually get them since, of course, they are not part
of the input. This is not very difficult, but we will nevertheless deal with it
later on.
Recall that X = Σ_{i=1}^k x_i h_i C_i − rank_M(X) · M and that C_i = M/m_i. Thus,
    X/M = Σ_{i=1}^k x_i h_i / m_i − rank_M(X).
The numbers x_1, ..., x_k are given to us since CRR_M(X) is part of the input. The numbers C_1, ..., C_k can be computed in FOMP: for C_i, we add the discrete logarithms of the m_j for j ≠ i. And h_1, ..., h_k are the inverses of C_1, ..., C_k, respectively, which can also easily be computed in FOMP.
By Lemma 21.2, each summand can be computed to polynomially many bits of accuracy. We know that iterated addition is in FOM, thus we can compute polynomially many bits of the binary representation of
    Σ_{i=1}^k x_i h_i / m_i = X/M + rank_M(X).
Since rankM (X) is an integer, X/M is just the fractional part of this sum,
of which we have sufficiently many bits.
A useful consequence of being able to compare numbers in CRR is that it allows us to change the CRR basis: if we have primes p_1, ..., p_ℓ with Π_{i=1}^ℓ p_i = P, then we can get CRR_P(X) from CRR_M(X). The crucial ingredient for this is that, given CRR_M(X), we can compute X mod p for a short prime p.
Lemma 21.4. Given CRR_M(X) and a short prime p, we can compute X mod p in FOMP.
Proof. If p = m_i for some i, then we know the answer from the input. Thus, we can assume that p does not divide M. Let P = Mp. If we can compute CRR_P(X), then this gives us X mod p.
We resort to brute force: we try all p possible values q for X mod p (note that p ≤ poly(n)). This gives us the CRR_P of numbers X_0, ..., X_{p−1}. One of these numbers is X. Moreover, X is the only number among X_0, ..., X_{p−1} that is smaller than M. (This follows since numbers smaller than M differ in CRR_M and all X_0, ..., X_{p−1} have a unique representation in CRR_P.)
We can compute CRRP (M ) by adding the discrete logarithms of the
primes m1 , . . . , mk modulo p. We carry out comparisons with M in CRRP .
Thus, we can compute X mod p by finding the unique Xi that is smaller
than M . All of this can be done in FOMP.
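With ordinary integers standing in for the CRR arithmetic, the brute-force base extension looks as follows (crt is our own small helper; in the lemma, the comparison with M is of course carried out directly in CRR_P):

    from functools import reduce

    def crt(residues, moduli):
        # the unique X in [0, prod moduli) with X = r_i (mod m_i) for all i
        M = reduce(lambda a, b: a * b, moduli)
        x = 0
        for r, m in zip(residues, moduli):
            C = M // m
            x = (x + r * C * pow(C, -1, m)) % M
        return x

    def mod_small_prime(digits, moduli, p):
        # given CRR_M(X) with 0 <= X < M, recover X mod p for a prime p
        # not dividing M, by trying all p candidate values
        M = reduce(lambda a, b: a * b, moduli)
        for q in range(p):
            if crt(digits + [q], moduli + [p]) < M:   # only q = X mod p works
                return q

    X, ms = 1000, [3, 5, 7, 11]        # M = 1155 > X
    print(mod_small_prime([X % m for m in ms], ms, 13), X % 13)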
The last lemma towards implementing the third step of our division algorithm concerns dividing by products of short primes.
Theorem 21.6. Given CRR_M(X) with 0 ≤ X < M, we can compute the binary representation of X in FOMP.
Proof. It suffices to compute ⌊X/2^s⌋ for any s. Then the s-th bit of X is given as ⌊X/2^s⌋ − 2 · ⌊X/2^{s+1}⌋. We get this number in CRR_M, but it is easy to distinguish 0 from 1, even in CRR.
First, we create numbers A_1, ..., A_s. Each A_j is the product of polynomially many short distinct primes that do not divide M, and we want A_j > M. Recall that M = Π_{i=1}^k m_i for short primes m_i. Let m_1 < m_2 < ... < m_k be the first k odd primes. (We can assume this without loss of generality.) Then we set A_j = Π_{i=1}^k p_{jk+i}, where p_ℓ is the ℓ-th smallest prime number. The prime number theorem guarantees that there are enough short primes for our purposes, and these A_j fulfill our conditions. Furthermore, a list of all (short) primes smaller than poly(n) can easily be computed by a TC0 circuit, and this also works in FOM. Thus, we know how to get these primes.
Assume that the A_j are very large. Then (1 + A_j)/(2A_j) ≈ 1/2. Thus,
    X/2^s ≈ X · Π_{j=1}^s (1 + A_j)/(2A_j).
It might look as if we are complicating the problem, but it turns out that, on the one hand, this quantity involving the A_j is easier to compute and, on the other hand, it is precise enough to give us ⌊X/2^s⌋.
Let P = M · Π_{j=1}^s A_j. We extend the basis to get CRR_P(X). Since M < A_j for all j, we have
    Π_{j=1}^s (1 + A_j)/A_j < (1 + 1/M)^s.
By (21.3), we have
    X · ( Π_{j=1}^s (1 + A_j)/2 ) / ( Π_{j=1}^s A_j ) < (X/2^s) · (1 + (s+1)/M) < (X/2^s) · (1 + 1/X) < X/2^s + 1.
Exercise 21.1. Using Lemma 21.1 and Theorem 21.6, one can convert numbers from any base to any other base in FOMP. Prove this!
Corollary 21.7. Division, iterated multiplication, and powering can be ex-
pressed in FOMP.
21.3 POW is in FO
21.3.1 Two special cases in FO
The first step towards proving POW ∈ FO will be to show that POW as well as division and iterated multiplication of very short numbers can be performed in FO. We start by showing this for POW.
Lemma 21.8. POW(a, r, b, p), where a, r, b, and p have O(log log n) bits
each, is in FO.
Proof. Let us assume that a, b, p, and r have k log log n bits each. We can compute a^r mod p in FO by repeated squaring. To do this, we consider the sequence r_0, r_1, ..., r_{k log log n} of exponents with r_i = ⌊r/2^i⌋. Thus, r_0 = r and r_{k log log n} = 0. Moreover, r_i = 2r_{i+1} or r_i = 2r_{i+1} + 1, depending on the corresponding bit of r.
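Sequentially, this scheme reads as follows (the FO formula instead guesses the whole sequence of intermediate results and checks local consistency, since FO has no notion of iteration):

    def pow_mod(a, r, p):
        # compute a^r mod p along the exponent sequence r_i = floor(r / 2^i)
        rs = [r >> i for i in range(r.bit_length() + 1)]   # r_0 = r, ..., 0
        result = 1                                         # a^0
        for ri in reversed(rs[:-1]):
            result = (result * result) % p   # from a^(r_{i+1}) to a^(2 r_{i+1})
            if ri & 1:                       # r_i = 2 r_{i+1} + 1
                result = (result * a) % p
        return result

    assert pow_mod(7, 123, 1009) == pow(7, 123, 1009)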
Theorem 21.9. ItMult and Division, where the inputs have (log n)^{O(1)} bits, are in FO.
Proof. We know from Corollary 21.7 that division and iterated multiplication with inputs of length r can be done in FOMP over the universe 0, ..., r − 1. We set r = (log n)^k. Then Division and ItMult can be expressed in FOMP over the universe 0, ..., (log n)^k − 1. We will show that such FOMP formulas can be expressed in FO over the universe 0, 1, ..., n − 1.
Note that all uses of POW in these formulas are called with inputs of O(log((log n)^k)) = O(log log n) bits. Thus, we can replace POW by FO formulas according to Lemma 21.8. In the same way, the threshold quantifier can be replaced by an FO formula, since the range of the quantified variables is 0, ..., (log n)^k − 1. This is because BSUM can be expressed in FO as long as there are at most (log n)^{O(1)} ones to count, which is the result of Exercise 21.2. This completes the proof.
Remark 21.10. Beyond being a tool for showing that division is in FOM, this theorem is also interesting in its own right: it gives tight bounds on the size of the numbers for which Division and ItMult are in FO. On the one hand, the theorem shows that this is the case for numbers consisting of (log n)^{O(1)} bits. On the other hand, we have FO = AC0, and any circuit of constant depth d that computes the parity of m bits must be of size 2^{Ω(m^{1/(2d)})}. For parity of m bits to be in FO, we need 2^{Ω(m^{1/(2d)})} ≤ poly(n), which implies m ≤ poly(log n).
21.3.2 POW is in FO
What remains to be done is to show that POW is in FO. In order to prove this, we first show something slightly more general: powering in groups of order n.
Exercise 21.3. Show that for any group that can be represented in FO, the
inverse and the neutral element can be defined in FO.
Lemma 21.11. Finding small powers in any group of order n, i.e., computing a^r for a group element a and a small number r, is FO Turing reducible to finding products of log n elements.
Proof. We will find group elements a_1, ..., a_k and exponents u, u_1, ..., u_k with a^r = a_1^{u_1} ⋯ a_k^{u_k} · a^u, where all exponents are less than 2(log n)².
Given these elements and numbers, we can easily compute a^r: computing a_i^{u_i} for 1 ≤ i ≤ k as well as a^u amounts to computing products of small numbers of group elements. For a_i^{u_i}, this follows from u_i < 2 log n. And for a^u, we use two rounds of multiplying at most 2 log n elements. The result a^r is then also just a product of k + 1 group elements.
We will choose the group elements a_i to be d_i-th roots of unity for small primes d_i. The numbers u_i can then be computed using Chinese remaindering. Our first step consists of finding a CRR basis D consisting of primes, each of which is at most O(log n). More precisely, we choose a set of k = o(log n) primes d_1, ..., d_k such that d_i < 2 log n for each i and each d_i is relatively prime to n. Furthermore, we want n < D = Π_{i=1}^k d_i < n².
Since d_i is relatively prime to n, it has an inverse d_i^{−1} modulo n, say d_i^{−1} d_i = mn + 1. Then for every group element x,
    x = x^{mn+1} = (x^{d_i^{−1}})^{d_i}.
Note that we can compute x^{d_i} using multiplication of O(log n) elements, but we cannot compute a^{⌊n/d_i⌋} directly, since it might happen that d_i^{−1} is not O(log n).
Our third step consists of finding the exponents u, u_1, ..., u_k. By the choice of the a_i in the second step, we have
    a_1^{u_1} · ... · a_k^{u_k} = a^{Σ_{i=1}^k u_i ⌊n/d_i⌋}.
Hence we need
    u ≡ r − Σ_{i=1}^k u_i · ⌊n/d_i⌋  (mod n).    (21.4)
In order to get a small value for u, we have to choose Σ_{i=1}^k u_i · ⌊n/d_i⌋ close to r. To achieve this, we approximate r as a linear combination of the ⌊n/d_i⌋: compute f = ⌊rD/n⌋. (We can compute this since r has only O(log n) bits, by Theorem 21.9.) Let D_i = D/d_i. Then we compute u_i = f · D_i^{−1} mod d_i. This gives us
    Σ_{i=1}^k u_i D_i ≡ f  (mod D).
This yields
    u = (n/D) · (rD/n − ⌊rD/n⌋) + Σ_{i=1}^k u_i · (n/d_i − ⌊n/d_i⌋).
For any number x, we have x − ⌊x⌋ ∈ [0, 1). Furthermore, n/D < 1 by our choice of D, u_i < 2 log n for each i, and k = o(log n). Thus, we have u < 2(log n)², which finishes the proof.
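The bookkeeping in this proof can be verified numerically. The sketch below (all concrete choices are ours; it needs Python 3.8 for pow with exponent −1) works in Z*_p with p = 101, so the group order is n = 100, picks the coprime basis d_1, ..., d_4 = 3, 7, 11, 13 with n < D = 3003 < n², and checks both the identity a^r = a_1^{u_1} ⋯ a_k^{u_k} · a^u and the bound on u:

    import math
    from functools import reduce

    p = 101                        # Z_p* has order n = p - 1 = 100
    n = p - 1
    ds = [3, 7, 11, 13]            # primes coprime to n
    D = reduce(lambda a, b: a * b, ds)
    assert n < D < n * n and all(math.gcd(d, n) == 1 for d in ds)

    a, r = 2, 77
    f = r * D // n                                  # f = floor(rD/n)
    us = [f * pow(D // d, -1, d) % d for d in ds]   # u_i = f * D_i^{-1} mod d_i
    u = (r - sum(ui * (n // d) for ui, d in zip(us, ds))) % n   # (21.4)

    rhs = pow(a, u, p)
    for ui, d in zip(us, ds):
        rhs = rhs * pow(pow(a, n // d, p), ui, p) % p   # a_i = a^floor(n/d_i)
    assert rhs == pow(a, r, p)                 # the decomposition is correct
    assert u < 2 * math.log2(n) ** 2           # and u is indeed small
    print(u, us)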
Now we note that, first, FO is closed under polynomial changes of the input size and, second, the product of log(n^k) = k log n group elements is FO Turing reducible to the product of log n group elements. This yields that finding powers in any group of order n^k is FO Turing reducible to finding the product of log n elements.
We now apply the above result that powering reduces to iterated multiplication to the groups of integers modulo p for a prime p = O(n^k). The multiplicative group Z*_p consists of the integers 1, ..., p − 1. Multiplication in Z*_p is FO-definable, since integer multiplication is FO-definable.
For evaluating POW(a, r, b, p), we now proceed as follows: if a = 0, then we just have to check whether also b = 0. Otherwise, we can find a^r in Z*_p, provided that the product of log n group elements can be computed with inputs of size log² n. This can be done according to Theorem 21.9. This immediately yields the main results of this section and of this chapter.
[ALM+98] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan, and Mario Szegedy. Proof verification and hardness of approximation problems. J. ACM, 45(3):501–555, 1998.
[BGS98] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits, PCPs, and nonapproximability—towards tight results. SIAM J. Comput., 27(3):804–915, 1998.