# Two recent papers by Cui Peng

Cui Peng of Renmin University in Beijing has recently released two preprints, one claiming a proof of P=NP and one claiming a refutation of the Unique Games Conjecture; I will call them the “NP paper” and the “UG paper,” respectively.

Of all the papers I have seen claiming a resolution of the P versus NP problem, and, believe me, I have seen a lot of them, these are by far the most legit. On Scott Aaronson’s checklist of signs that a claimed mathematical breakthrough is wrong, they score only two.

Unfortunately, both papers violate known impossibility results.

The two papers follow a similar approach: a certain constraint satisfaction problem is proved to be approximation resistant (under the assumption that P${\neq}$NP, or under the UGC, depending on the paper) and then a Semidefinite Programming approximation algorithm is developed that breaks approximation resistance. (Recall that a constraint satisfaction problem is approximation resistant if there is no polynomial time algorithm that has a worst-case approximation ratio better than the algorithm that picks a random assignment.)
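As a concrete illustration of the random-assignment baseline (a hypothetical toy computation, not taken from either paper), the fraction ${p}$ for a given predicate can be computed by enumerating assignments; for the Max 3LIN predicate it comes out to ${1/2}$, and for a 3SAT clause to ${7/8}$:

```python
from itertools import product

def random_assignment_ratio(predicate, arity):
    """Fraction of assignments in {0,1}^arity that satisfy the predicate."""
    sat = sum(1 for a in product((0, 1), repeat=arity) if predicate(a))
    return sat / 2 ** arity

# Max 3LIN constraint x1 + x2 + x3 = 0 (mod 2): half of all assignments satisfy it.
p_3lin = random_assignment_ratio(lambda a: (a[0] + a[1] + a[2]) % 2 == 0, 3)

# A 3SAT clause (x1 or x2 or x3): all assignments except all-zeros satisfy it.
p_3sat = random_assignment_ratio(lambda a: bool(a[0] or a[1] or a[2]), 3)
```

An algorithm breaks approximation resistance for such a problem exactly when it beats these baselines by a fixed ${\delta > 0}$ in the worst case.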

In both papers, the approximation algorithm is by Hast, and it is based on a semidefinite programming relaxation studied by Charikar and Wirth.

The reason why the results cannot be correct is that, in both cases, if the hardness result is correct, then it implies an integrality gap for the Charikar-Wirth relaxation, which makes it unsuitable to break the approximation resistance as claimed.

Suppose that we have a constraint satisfaction problem in which every constraint is satisfied by a ${p}$ fraction of assignments. Then, for such a problem to not be approximation resistant, we have to devise an algorithm that, for some fixed ${\delta>0}$, returns a solution whose cost (the number of constraints that it satisfies) is at least ${p+\delta}$ times the optimum. The analysis of such an algorithm needs to include some technique to prove upper bounds for the true optimum; this is because, if you are given an instance in which the optimum satisfies at most a ${p+o(1)}$ fraction of constraints, as is the case for a random instance, then the algorithm will satisfy at most a ${p+o(1)}$ fraction of constraints, and so the execution of the algorithm, together with the proof of correctness, will give a (polynomial-time computable and polynomial-time checkable) certificate that the optimum satisfies at most a ${(p+o(1))/(p+\delta) < 1 - \delta + o(1)}$ fraction of constraints.
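A one-line check of the last inequality: since a ${p+\delta}$ approximation guarantee is only meaningful when ${p+\delta \leq 1}$, we have

$\displaystyle \frac{p+o(1)}{p+\delta} = 1 - \frac{\delta-o(1)}{p+\delta} \leq 1-\delta+o(1)$

where the last step uses ${p + \delta \leq 1}$.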

For algorithms that are based on relaxations, such certificates come from the relaxation itself: one shows that the algorithm satisfies a number of constraints that is at least ${p+\delta}$ times the optimum of the relaxation, and the optimum of the relaxation is at least the optimum of the constraint satisfaction problem. But if there are instances for which the optimum is ${p+o(1)}$ and the optimum of the relaxation is ${1-o(1)}$, then one cannot use such a relaxation to design an algorithm that breaks approximation resistance. (Because, on such instances, the algorithm will not be able to satisfy a number of constraints equal to ${p+\delta}$ times the optimum of the relaxation.)

In the UG paper, the approximation resistance result relies on a result of Austrin and Håstad. Like all UGC-based inapproximability results that I am aware of, the hardness results of Austrin and Håstad are based on a long code test. A major result of Raghavendra is that for every constraint satisfaction problem one can write a certain SDP relaxation such that the integrality gap of the relaxation is equal to the ratio between soundness and completeness in the best possible long code test that uses predicates from the constraint satisfaction problem. In particular, in Section 7.7 of his thesis, Prasad shows that if you have a long code test with completeness ${c}$ and soundness ${s}$ for a constraint satisfaction problem, then for every ${\epsilon > 0}$ there is an instance of the problem in which no solution satisfies more than an ${s+\epsilon}$ fraction of constraints, but there is a feasible SDP solution whose cost is at least a ${c-\epsilon}$ fraction of the number of constraints. The SDP relaxation of Charikar and Wirth is the same as the one studied by Prasad. This means that if you prove, via a long code test, that a certain problem is approximation resistant, then you also show that the SDP relaxation of Charikar and Wirth cannot be used to break approximation resistance.

The NP paper adopts a technique introduced by Siu On Chan to prove inapproximability results by starting from a version of the PCP theorem and then applying a “hardness amplification” reduction. Tulsiani shows that if one proves a hardness-of-approximation result via a “local” approximation-preserving reduction from Max 3LIN, then the hardness-of-approximation result is matched by an integrality gap for Lasserre SDP relaxations up to a super-constant number of rounds. The technical sense in which the reduction has to be “local” is as follows. A reduction from Max 3LIN (the same holds for other problems, but we focus on starting from Max 3LIN for concreteness) to another constraint satisfaction problem has two parameters: a “completeness” parameter ${c}$ and a “soundness” parameter ${s}$, and its properties are that:

• (Completeness Condition) the reduction maps instances of 3LIN in which the optimum is ${1-o(1)}$ to instances of the target problem in which the optimum is at least ${c-o(1)}$;
• (Soundness Condition) the reduction maps instances of 3LIN in which the optimum is ${1/2 +o(1)}$ to instances of the target problem in which the optimum is at most ${s+o(1)}$.

Since we know that it’s NP-hard to distinguish Max 3LIN instances in which the optimum is ${1-o(1)}$ from instances in which the optimum is ${1/2 +o(1)}$, such a reduction shows that, in the target problem, it is NP-hard to distinguish instances in which the optimum is ${c-o(1)}$ from instances in which the optimum is ${s+o(1)}$. The locality condition studied by Tulsiani is that the Completeness Condition is established by describing a mapping from solutions satisfying a ${1-o(1)}$ fraction of the Max 3LIN constraints to solutions satisfying a ${c-o(1)}$ fraction of the target problem constraints, such that the assignment to each variable of the target problem can be computed by looking at a sublinear (in the size of the Max 3LIN instance) number of Max 3LIN variables. Reductions that follow the Chan methodology are local in the above sense. This means that if one proves that a problem is approximation resistant using the Chan methodology, starting from the PCP theorem, then one has a local reduction from Max 3LIN to the problem with completeness ${1-o(1)}$ and soundness ${p+o(1)}$, where, as before, ${p}$ is the fraction of constraints of the target problem satisfied by a random assignment. In turn, this implies that, not just for the Charikar-Wirth relaxation, but for all relaxations obtained from a constant number of rounds of the Lasserre hierarchy, there are instances of the target problem that have optimum ${p+o(1)}$ and SDP optimum ${1-o(1)}$, so that the approximation resistance cannot be broken using such SDP relaxations.

# On the Unique Games Conjecture (part 2)

[I am preparing a survey talk on Unique Games for a mathematics conference, and writing a survey paper for a booklet that will be distributed at the conference. My first instinct was to write a one-line paper that would simply refer to Subhash’s own excellent survey paper. Fearing that I might come off as lazy, I am instead writing my own paper. Here are some fragments. Comments are very welcome.]

1. Why is the Unique Games Conjecture Useful?

In a previous post we stated the Unique Games Conjecture and made the following informal claim, here rephrased in abbreviated form:

To reduce Label Cover to a graph optimization problem like Max Cut, we map variables to collections of vertices and we map equations to collections of edges; then we show how to “encode” assignments to variables as 2-colorings of vertices which cut a ${\geq c_1}$ fraction of edges, and finally (this is the hardest part of the argument) we show that given a 2-coloring that cuts a ${\geq c_2}$ fraction of edges, then

1. the given 2-coloring must be somewhat “close” to a 2-coloring coming from the encoding of an assignment and
2. if we “decode” the given 2-coloring to an assignment to the variables, such an assignment satisfies a noticeable fraction of equations.

Starting our reduction from a Unique Game instead of a Label Cover problem, we only need to prove (1) above, and (2) more or less follows for free.

To verify this claim, we “axiomatize” the properties of a reduction that only achieves (1): we describe a reduction mapping a single variable to a graph, such that assignments to the variable are mapped to good cuts, and somewhat good cuts can be mapped back to assignments to the variable. The reader can then go back to our analysis of the Max Cut inapproximability proof in the previous post, and see that the properties below are sufficient to implement the reduction.

# Survey on Hardness of Approximation

In 2004 I wrote a survey on hardness of approximation as a book chapter for a book on approximation algorithms. I have just prepared a revised version for a new edition of the book.

While it would have been preferable to rethink the organization and start from scratch, because of time constraints I was only able to add sections on integrality gaps and on unique games, and to add references to more recent work (e.g. the combinatorial proof of the PCP theorem, the new 2-query PCPs, the tight results on minimizing congestion in directed networks, and so on).

If your (or your friend’s) important results are not cited, do let me know. The deadline to submit the chapter has long passed, but the threats from the editor haven’t yet escalated to the point where I feel that I have to submit it or else.

# How Good is the Leighton-Rao Relaxation?

Continuing our seemingly never-ending series on approximations to edge expansion, we come to the question of how good the Leighton-Rao relaxation is.

Given a graph $G=(V,E)$, we are trying to approximate the sparsest cut problem (which, in turn, approximates the edge expansion problem within a factor of two) defined as

$\displaystyle \phi(G) := \min_{S\subseteq V} \frac {edges(S,V-S)} {\frac 1n |S| \cdot |V-S|}$

And we have seen that an equivalent characterization of sparsest cut is:

$(1) \ \ \ \displaystyle \phi(G) = \min_{x\in {\mathbb R}^V} \frac{ \sum_{i,j} A(i,j) | x_i - x_j | } {\frac 1n \sum_{i,j} |x_i - x_j | }$
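As a quick sanity check of this characterization (a small, hypothetical example, not from the post): for the 0/1 indicator vector of a set $S$, the objective in (1) collapses exactly to the sparsest-cut ratio of $S$, so minimizing over indicator vectors recovers $\phi(G)$; brute force over a 4-cycle gives $\phi(C_4)=2$.

```python
from itertools import combinations

# Adjacency matrix of the 4-cycle C_4 (undirected, 0/1 entries).
n = 4
A = [[0] * n for _ in range(n)]
for i, j in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    A[i][j] = A[j][i] = 1

def ratio(x):
    """The objective of (1) for x in R^V, summing over ordered pairs (i, j)."""
    num = sum(A[i][j] * abs(x[i] - x[j]) for i in range(n) for j in range(n))
    den = sum(abs(x[i] - x[j]) for i in range(n) for j in range(n)) / n
    return num / den

# For the indicator of S, ratio equals edges(S, V-S) / ((1/n) |S| |V-S|),
# so the minimum over all proper nonempty S is the sparsest cut value.
best = min(
    ratio([1 if v in S else 0 for v in range(n)])
    for k in range(1, n)
    for S in combinations(range(n), k)
)
```

For $C_4$ the best cut splits the cycle into two paths of two vertices each, giving ratio $2/((1/4)\cdot 2\cdot 2)=2$.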

In the Leighton-Rao relaxation, we have

$(2) \ \ \ \displaystyle LR(G) := \min_{d\in {\cal M}(V)} \frac{ \sum_{i,j} A(i,j) d(i,j) }{\frac 1n \sum_{i,j} d(i,j)}$

where ${\cal M}(V)$ is the set of all semi-metrics $d: V \times V \rightarrow {\mathbb R}$, that is, of all functions such that $d(i,j) = d(j,i) \geq 0$, $d(i,i)=0$, and that obey the “triangle inequality”

$d(i,j) \leq d(i,k) + d(k,j)$

for all $i,j,k \in V$.

This is clearly a relaxation of the sparsest cut problem, because for every $x \in {\mathbb R}^V$ the “distances” $|x_i - x_j|$ indeed define a semi-metric over $V$.

How bad can the relaxation be? Take a family of regular constant-degree expander graphs, that is, graphs where $\phi$ is a constant and the degree $k$ is a constant independent of $n=|V|$. Define $d(i,j)$ to be the shortest-path distance between $i$ and $j$ in the graph. This is clearly a semi-metric, and we have

$\displaystyle \sum_{i,j} A(i,j) d(i,j) = kn = O(n)$

because along every edge the distance is precisely one, and

$\displaystyle \sum_{i,j} d(i,j) = \Omega (n^2 \log_k n)$

because, for every vertex $i$, at most $k^t$ other vertices can be within
distance $t$, and so at least half the vertices are at distance $\Omega( \log_k n)$.

So we have

$\displaystyle LR(G) = O \left( \frac 1 {\log n} \right)$

even though $\phi(G) = \Omega(1)$, so the error in the approximation can be as bad as a factor of $\log n$. We shall prove that this is as bad as it gets.
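To see the shortest-path-metric computation in (2) concretely, here is a small sketch (a hypothetical toy instance: the Petersen graph, rather than a member of an expander family); every edge contributes distance exactly one to the numerator, as in the argument above.

```python
from collections import deque

# Petersen graph: 3-regular, 10 vertices (outer 5-cycle, inner pentagram, spokes).
n = 10
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0),
         (0, 5), (1, 6), (2, 7), (3, 8), (4, 9),
         (5, 7), (7, 9), (9, 6), (6, 8), (8, 5)]
adj = [[] for _ in range(n)]
for u, v in edges:
    adj[u].append(v)
    adj[v].append(u)

def bfs(src):
    """Shortest-path distances from src by breadth-first search."""
    dist = [None] * n
    dist[src] = 0
    q = deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if dist[w] is None:
                dist[w] = dist[u] + 1
                q.append(w)
    return dist

d = [bfs(i) for i in range(n)]

# Objective of (2) with d = shortest-path metric (ordered pairs, as in the post).
num = 2 * sum(d[u][v] for u, v in edges)   # each edge has distance 1, so num = 2|E|
den = sum(d[i][j] for i in range(n) for j in range(n)) / n
lr_shortest_path = num / den
```

Here every pair is at distance 1 or 2, so the ratio is a constant; on an expander family the same computation drives the ratio down to $O(1/\log n)$ because average distances grow like $\log_k n$.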

Our task is then: given a feasible solution to (2), that is, a distance function defined on the vertices of $G$, find a feasible solution to sparsest cut whose cost is as small as possible, and no more than a $O(\log n)$ factor off from the cost of the solution to (2) that we started from. Instead of directly looking for a cut, we shall look for a solution to (1), which is a more flexible formulation.

The general strategy will be, given $d$, to find a distribution $X$ over vectors $x \in {\mathbb R}^V$ such that for every two vertices $i,j$ we have

$(3) \ \ \ \displaystyle \Omega\left( \frac {1}{\log n} \right) \cdot d(i,j) \leq {\mathbb E} | X_i - X_j | \leq d(i,j)$

Suppose that we start from a solution to (2) of cost $c$, that is, we have a semimetric $d(\cdot,\cdot)$ such that

$\displaystyle \sum_{i,j} A_{i,j} d(i,j) = c \cdot \frac 1n \sum_{i,j} d(i,j)$

Then our distribution $X$ is such that

$\displaystyle {\mathbb E} \sum_{i,j} A_{i,j} |X_i - X_j| \leq \sum_{i,j} A_{i,j} d(i,j)$

and

$\displaystyle \sum_{i,j} d(i,j) \leq O \left( {\log n} \right) {\mathbb E} \sum_{i,j}|X_i - X_j|$

so

$\displaystyle {\mathbb E} \left[ \sum_{i,j} A_{i,j} |X_i - X_j| - O(c\log n) \cdot \frac 1n \sum_{i,j} |X_i - X_j| \right] \leq 0$

and there must exist a vector $x$ in the support of the distribution $X$ such that

$\displaystyle \sum_{i,j} A_{i,j} |x_i - x_j| - O(c\log n) \cdot \frac 1n \sum_{i,j} |x_i - x_j| \leq 0$

and now we are done because the cost of $x$ in (1) is at most $O(\log n)$ times the optimum of the Leighton-Rao relaxation.

Though this works, it seems overkill to require condition (3) for all pairs of vertices. It seems sufficient to require

${\mathbb E} |X_i-X_j| \leq d(i,j)$

only for edges, and to require

$d(i,j) \leq O(\log n) \cdot {\mathbb E} |X_i-X_j|$

only “on average” over all pairs of vertices. It can be shown, however, that if one wants to round a generalization of the Leighton-Rao relaxation to sparsest cut with “non-uniform demands,” then any rounding algorithm can be turned into a rounding algorithm that satisfies (3).

So how do we start from an arbitrary metric and come up with a probabilistic linear order of the vertices so that their distances in the original metric are well approximated in the linear ordering? We will describe a method of Bourgain, whose applicability in this setting is due to Linial, London, and Rabinovich.

The starting point is to see that if $A$ is a set of vertices, and we define

$\displaystyle X_i := d(A,i) := \min_{a\in A} d(a,i)$

then, no matter how we choose $A$, we always have

$|X_i - X_j | = |d(A,i) - d(A,j) | \leq d(i,j)$

[Hint: use the triangle inequality to see that $d(A,j) \leq d(A,i) + d(i,j)$.]

This means that “all we have to do” is to find a distribution over sets $A$ such that for every $i,j$ we have

$(4) \ \ \ \displaystyle {\mathbb E} | d(A,i) - d(A,j) | \geq \Omega \left( \frac 1 {\log n} \right) \cdot d(i,j)$

If you want to make it sound more high-brow, you may call such a distribution a Fréchet embedding of $d$ into $\ell_1$. Constructing such a distribution will be the subject of the next post.
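The non-expansion bound $|d(A,i) - d(A,j)| \leq d(i,j)$ from the hint above is easy to check numerically. A quick sketch (hypothetical setup: Euclidean distances on random points in the unit square, standing in for an arbitrary metric):

```python
import random

random.seed(0)
n = 30
pts = [(random.random(), random.random()) for _ in range(n)]

def d(i, j):
    """Euclidean distance between points i and j: a metric."""
    return ((pts[i][0] - pts[j][0]) ** 2 + (pts[i][1] - pts[j][1]) ** 2) ** 0.5

def dist_to_set(A, i):
    """d(A, i) = min over a in A of d(a, i)."""
    return min(d(a, i) for a in A)

# Check |d(A,i) - d(A,j)| <= d(i,j) for random sets A and all pairs i, j.
ok = all(
    abs(dist_to_set(A, i) - dist_to_set(A, j)) <= d(i, j) + 1e-12
    for _ in range(20)
    for A in [random.sample(range(n), random.randint(1, n))]
    for i in range(n)
    for j in range(n)
)
```

The check passes for every choice of $A$, which is exactly the triangle-inequality argument of the hint: $d(A,j) \leq d(A,i) + d(i,j)$ and symmetrically.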

# The Limitations of the Spectral Partitioning Algorithm

We continue to talk about the problem of estimating the expansion of a graph $G=(V,E)$, focusing on the closely related sparsest cut, defined as

$\displaystyle \phi(G):= \min_{S} \frac {edges(S,V-S) } { \frac 1n \cdot |S| \cdot |V-S| }$

The spectral partitioning algorithm first finds a vector $x \in {\mathbb R}^n$ minimizing

$(1) \ \ \ \displaystyle \frac{ \sum_{i,j} A(i,j) |x(i)-x(j)|^2 } { \frac 1n \cdot \sum_{i,j} |x(i)-x(j)|^2}$

(where $A$ is the adjacency matrix of $G$) and then finds the best cut $(S,V-S)$ where $S$ is of the form $\{ i \in V: \ x(i) \geq t \}$ for a threshold $t$.

We proved that if the quantity in (1) is $\epsilon$ and $G$ is $d$-regular, then the algorithm will find a cut of sparsity at most $\sqrt{8 \epsilon d}$, and that if $x$ is an eigenvector of the second eigenvalue, then it is an optimal solution to (1), and the cost of an optimal solution to (1) is a lower bound on $\phi(G)$. This means that the algorithm finds a cut of sparsity at most $\sqrt{8 \phi(G) d}$.
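A minimal sketch of the whole pipeline (a hypothetical toy instance: the cycle $C_{12}$; note that for $x$ orthogonal to the all-ones vector, the quantity in (1) equals the Rayleigh quotient of the Laplacian, which the second eigenvector minimizes):

```python
import numpy as np

n, deg = 12, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0   # cycle C_12, 2-regular

L = np.diag(A.sum(axis=1)) - A   # Laplacian; for x orthogonal to all-ones,
                                 # the objective (1) equals x^T L x / x^T x
vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
lam2, x = vals[1], vecs[:, 1]    # second eigenvalue and an eigenvector for it

def sparsity(S):
    """edges(S, V-S) / ((1/n) |S| |V-S|), the sparsest-cut objective."""
    S = set(int(v) for v in S)
    cut = sum(1 for i in range(n) for j in range(i + 1, n)
              if A[i, j] and ((i in S) != (j in S)))
    return cut / (len(S) * (n - len(S)) / n)

# Sweep cut: try every threshold along the sorted eigenvector entries.
order = np.argsort(x)
best = min(sparsity(order[:k]) for k in range(1, n))
# Guarantee from the post: best <= sqrt(8 * lam2 * deg).
```

On the cycle, every threshold set is an arc, the sweep finds the cut into two halves (sparsity $2/3$), and the bound $\sqrt{8 \lambda_2 d}$ holds with plenty of room.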

Can the analysis be improved?

# The limitations of linear and semidefinite programming

The fact that Linear Programming (LP) can be solved in polynomial time (and, also, efficiently in practice) and that it has such a rich geometric theory and such remarkable expressive power makes LP a powerful unifying concept in the theory of algorithms. It “explains” the existence of polynomial time algorithms for problems such as Shortest Paths and Min Cut, and if one thinks of the combinatorial algorithms for such problems as algorithms for the corresponding linear programs, one gains additional insights.

When looking for algorithms for a new combinatorial problem, a possible approach is to express the problem as a 0/1 integer program, then relax it to a linear program, by letting variables range between 0 and 1, and then hope for the best. “The best” being the lucky event that the value of the optimum of the relaxation is the same as that of the combinatorial program, or at least a close approximation. If one finds that, instead, the relaxation has optimal fractional solutions of cost very different from the combinatorial optimum, one may try to add further inequalities that are valid for 0/1 solutions and that rule out the problematic fractional solutions.

Many “P=NP” papers follow this approach, usually by presenting a polynomial-size linear programming relaxation of TSP and then “proving” that the optimum of the relaxation is the same as the combinatorial optimum. One can find recent examples here and here.

Similar results were “proved” in a notorious series of papers by Ted Swart in the mid 1980s. After counterexamples were found, he would revise the papers, adding more inequalities to rule out the counterexamples.

Finally, Mihalis Yannakakis took matters into his own hands and proved that all “symmetric” relaxations of TSP of sub-exponential size have counterexamples on which the optimum of the relaxation is different from the combinatorial optimum. (All of the LPs suggested by Swart were “symmetric” according to Yannakakis’s definition.)

This is actually one of the few known lower bounds that actually applies to a “model of computation” capturing a general (and otherwise promising) class of algorithms.

In the theory of approximation algorithms, we have plenty of problems that are believed to be intractable but that are not known to be NP-hard, such as approximating Vertex Cover within a factor of 1.9, or approximating Sparsest Cut within a factor of 10. LP and Semidefinite Programming (SDP) approaches are more or less the only promising tools we have to prove that such problems are tractable and, while we wait for NP-hardness results (for now, we only have “Unique-Games-hardness”), it is good to see whether certain candidate LP and SDP algorithms have any chance, or if they admit counterexamples showing large gaps between the optimum of the relaxation and the combinatorial optimum.

The problem with this approach is the nearly endless variety of relaxations that one can consider: what happens when we add triangle inequalities? and pentagonal inequalities? and so on. As in the case of Yannakakis’s result, it would be great to have a result that says “all SDP relaxations of Vertex Cover of type X fail to achieve an approximation ratio smaller than 2,” where “type X” is a general class of sub-exponential size SDP relaxations that include the type of inequalities that people use “in practice.”

Lovasz and Schrijver describe a method, denoted LS+, that starts from an LP relaxation of a problem (actually it can start from any convex relaxation), and then turns it into tighter and tighter SDP relaxations, by adding auxiliary variables and linear and semidefinite constraints. A weaker version of the method, denoted LS, only adds auxiliary variables and linear constraints.

A nice thing about the method is that, after you apply it to your initial relaxation, thus getting a tighter relaxation, you can then apply it again to the tighter one, thus getting an even better relaxation, and so on. Starting from an LP relaxation with $n$ variables and poly($n$) constraints, $k$ applications of the method yield a relaxation solvable in $n^{O(k)}$ time, which is polynomial for all fixed $k$ and sub-exponential for $k=o(n/\log n)$. Lovasz and Schrijver prove that, after $k$ applications (or “rounds”), the resulting relaxation enforces all inequalities over $k$-tuples of variables that are valid for 0/1 solutions. (In particular, one gets the combinatorial optimum after $n$ rounds.) Typical approaches in the design of approximation algorithms are SDP with local inequalities (triangle inequalities etc.), and this is all captured after a few rounds of LS+.

It would be great to show that no constant (ideally, no sublinear) number of rounds of LS+ starting from the basic LP relaxation gives a $2-\epsilon$ approximation for vertex cover. Arora, Bollobas, and Lovasz considered related questions in a FOCS 2002 paper that has inspired a considerable amount of later work. (See the introduction of the journal version.) Unfortunately the question remains open even for two rounds of LS+. After one round, one gets an SDP relaxation equivalent to (number of vertices minus) the Lovasz Theta function, and Goemans and Kleinberg prove that such an SDP does not achieve an approximation better than 2. Beyond that, it is pretty much terra incognita. Charikar proves that a relaxation with triangle inequalities (which is incomparable with two rounds of LS+ and is weaker than three rounds) does not achieve an approximation better than 2. Also, a sublinear number of rounds of LS+ does not achieve an approximation better than 7/6. For LS, which, I remind you, generates linear programming relaxations rather than semidefinite programming ones, we know that no sublinear number of rounds leads to an approximation better than 2.

I will illustrate the main idea in the LS and LS+ method using the example of Vertex Cover. In the linear programming formulation, we have variables $x_i$, one for each vertex $i$, and the constraints that $x_i + x_j \geq 1$ for each edge $(i,j)$ and that $0\leq x_i \leq 1$. We would like to add constraints that are only satisfied by 0/1 solutions, and the constraint $x_i^2 = x_i$ would work beautifully except that it is not linear (nor convex). Instead, Lovasz and Schrijver add new variables $y_{i,j}$, one for each pair of vertices, with the idea that we would like to have $y_{i,j}=x_i * x_j$; then they add the requirement $y_{i,i} = x_i$ and various inequalities that make the $y_{i,j}$ be “sort of like” $x_i*x_j$. In particular, we would like to require that if $x_k \neq 0$, then $x_i = y_{i,k}/x_k$. This is again a non-linear constraint, but, at least, we can check whether, for each fixed $k$, $y_{i,k}/x_k$ is a fractional vertex cover: we just need to check the inequalities $y_{i,k} + y_{j,k} \geq x_k$ for each edge $(i,j)$.

Similarly, if $x_k \neq 1$, we expect to have $x_i = (x_i-y_{i,k})/(1-x_k)$. We cannot check it directly, but we can check if
$x_i - y_{i,k} + x_j - y_{j,k} \geq 1-x_k$
hold for each edge $(i,j)$. Finally, we obviously want $y_{i,j} = y_{j,i}$. This describes the LS method. For LS+, we add the requirement that the symmetric matrix $(y_{i,j})$ be positive semidefinite. Being positive semidefinite means that there are vectors $b_1,\ldots,b_n$ such that
$y_{i,j} = \langle b_i,b_j \rangle$
where $\langle \cdot , \cdot \rangle$ denotes inner product. If $y_{i,j} = x_i *x_j$ then $(y_{i,j})$ is clearly positive semidefinite.
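To make the last observation concrete, here is a quick numerical check (hypothetical toy instance: a triangle with the cover $x=(1,1,0)$) that the rank-one matrix $y_{i,j} = x_i * x_j$ built from a 0/1 vertex cover satisfies the LS constraints and the LS+ semidefiniteness condition described above:

```python
import numpy as np

# Triangle graph; x = (1, 1, 0) is a (0/1) vertex cover.
edges = [(0, 1), (1, 2), (0, 2)]
x = np.array([1.0, 1.0, 0.0])
Y = np.outer(x, x)                            # intended meaning: y_ij = x_i * x_j

assert np.allclose(np.diag(Y), x)             # y_ii = x_i
assert np.allclose(Y, Y.T)                    # y_ij = y_ji
for k in range(3):
    for i, j in edges:
        # LS check that y_{.,k}/x_k "looks like" a fractional vertex cover:
        assert Y[i, k] + Y[j, k] >= x[k] - 1e-9
        # ...and likewise for (x - y_{.,k})/(1 - x_k):
        assert x[i] - Y[i, k] + x[j] - Y[j, k] >= 1 - x[k] - 1e-9

# LS+ check: Y is rank one, hence positive semidefinite.
assert np.linalg.eigvalsh(Y).min() >= -1e-9
```

Here the vectors $b_i$ witnessing semidefiniteness are just the scalars $x_i$ viewed as one-dimensional vectors, matching the final remark above.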