Ran Canetti has written a post on the Berkeley Simons Institute blog concerning new assumptions used in recent cryptographic work on problems such as obfuscation, and concerning how the theory community should view such work.
Cui Peng of Renmin University in Beijing has recently released two preprints, one claiming a proof of P=NP and one claiming a refutation of the Unique Games Conjecture; I will call them the “NP paper” and the “UG paper,” respectively.
Of all the papers I have seen claiming a resolution of the P versus NP problem, and, believe me, I have seen a lot of them, these are by far the most legit. On Scott Aronson’s checklist of signs that a claimed mathematical breakthrough is wrong, they score only two.
Unfortunately, both papers violate known impossibility results.
The two papers follow a similar approach: a certain constraint satisfaction problem is proved to be approximation resistant (under the assumption that PNP, or under the UGC, depending on the paper) and then a Semidefinite Programming approximation algorithm is developed that breaks approximation resistance. (Recall that a constraint satisfaction problem is approximation resistant if there is no polynomial time algorithm that has a worst-case approximation ratio better than the algorithm that picks a random assignment.)
In both papers, the approximation algorithm is by Host, and it is based on a semidefinite programming relaxation studied by Charikar and Wirth.
The reason why the results cannot be correct is that, in both cases, if the hardness result is correct, then it implies an integrality gap for the Charikar-Wirth relaxation, which makes it unsuitable to break the approximation resistance as claimed.
Suppose that we have a constraint satisfaction problem in which every constraint is satisfied by a fraction of assignment. Then for such a problem to not be approximation resistant, we have to devise an algorithm that, for some fixed positive , returns a solution whose cost (the number of constraints that it satisfies) is at least times the optimum. The analysis of such an algorithm needs to include some technique to prove upper bounds for the true optimum; this is because if you are given an instance in which the optimum satisfies at most a fraction of constraints, as is the case for a random instance, then the algorithm will satisfy at most a fraction of constraints, but then the execution of the algorithm and the proof of correctness will give a (polynomial-time computable and polynomial-time checkable) certificate that the optimum satisfies at most a fraction of constraints.
For algorithms that are based on relaxations, such certificates came from the relaxation itself: one shows that the algorithm satisfies a number of constraints that is at least times the optimum of the relaxation, and the optimum of the relaxation is at least the optimum of the constraint satisfaction problem. But if there are instances for which the optimum is and the optimum of the relaxation is , then one cannot use such a relaxation to design an algorithm that breaks approximation-resistance. (Because on, such instances, the algorithm will not be able to satisfy a number of constraint equal to times the optimum of the relaxation.)
In the UG paper, the approximation resistance relies on a result of Austrin and Håstad. Like all UGC-based inapproximability results that I am aware of, the hardness results of Austrin and Håstad are based on a long code test. A major result of Raghavendra is that for every constraint satisfaction problem one can write a certain SDP relaxation such that the integrality gap of the relaxation is equal to the ratio between soundness and completeness in the best possible long code test that uses predicates from the constraint satisfaction problem. In particular, in Section 7.7 of his thesis, Prasad shows that if you have a long code test with soundness and completeness for a constraint satisfaction problem, then for every there is an instance of the problem in which no solution satisfies more than fraction of constraints, but there is a feasible SDP solution whose cost is at least a fraction of the number of constraints. The SDP relaxation of Charikar and Wirth is the same as the one studied by Prasad. This means that if you prove, via a long code test, that a certain problem is approximation resistant, then you also show that the SDP relaxation of Charikar and Wirth cannot be used to break approximation resistance.
The NP paper adopts a technique introduced by Siu On Chan to prove inapproximability results by starting from a version of the PCP theorem and then applying a “hardness amplification” reduction. Tulsiani proves that if one proves a hardness-of-approximation result via a “local” approximation-reduction from Max 3LIN, then the hardness-of-approximation result is matched by an integrality gap for Lasserre SDP relaxations up to a super-constant number of rounds. The technical sense in which the reduction has to be “local” is as follows. A reduction from Max 3LIN (the same holds for other problems, but we focus on starting from Max 3LIN for concreteness) to another constraint satisfaction problems has two parameters: a “completeness” parameter and a “soundness” parameter , and its properties are that:
- (Completeness Condition) the reduction maps instances of 3LIN in which the optimum is to instances of the target problem in which the optimum is at least ;
- (Soundness Condition) the reduction maps instances of 3LIN in which the optimum is to instances of the target problem in which the optimum is at most .
Since we know that it’s NP-hard to distinguish Max 3LIN instances in which the optimum is from instances in which the optimum is , such a reduction shows that, in the target problem, it is NP-hard to distinguish instances in which the optimum is from instances in which the optimum is . The locality condition studied by Tulsiani is that the Completeness Condition is established by describing a mapping from solutions satisfying a fractions of the Max 3LIN constraints to solutions satisfying a fraction of the target problem constraints, and the assignment to each variable of the target problem can be computed by looking at a sublinear (in the size of the Max 3LIN instance) number of Max 3LIN variables. Reductions that follows the Chan methodology are local in the above sense. This means that if one proves that a problem is approximation-resistant using the Chan methodology starting from the PCP theorem, then one has a local reduction from Max 3LIN to the problem with completeness and soundness , where, as before, is the fraction of constraints of the target problem satisfied by a random assignment. In turn, this implies that not just the Charikar-Wirth relaxation, but that, for all relaxations obtained in a constant number of rounds of Lasserre relaxations, there are instances of the target problem that have optimum and SDP optimum , so that the approximation resistance cannot be broken using such SDP relaxations.
Sufficiently many to start a soccer team.
Some constraint satisfaction problems are approximation resistant, in the sense that, unless P=NP, there is no polynomial time algorithm that achieves a better approximation ratio than the trivial algorithm that picks a random assignment. For example, a random assignment satisfies (on average) a fraction of the clauses of a given Max 3SAT instance, and, for every , it is NP-hard to achieve approximation . Max 3LIN is the variant of Max 3SAT in which every constraint is a XOR instead of an OR of variables; it is another example of an approximation resistant problem, because a random assignment satisfies (on average) half of the constraints, and approximation is NP-hard for every . (These, and more, hardness of approximation results were proved by Håstad in 1997, in a paper with a notably understated title.)
In 2000, Håstad proved that if we restrict constraint satisfaction problems to instances in which every variable occurs in (at most) a fixed constant number of constraints, then the problem is never approximation resistant. If we have a constraint satisfaction problem in which each constraint is satisfied with probability by a random assignment, and each variable appears in at most constraint, then there is a simple polynomial time algorithm that achieves an approximation ratio . The following year, I showed that if we have a constraint satisfaction problem that is NP-hard to approximate within a factor of , then it is also NP-hard to approximate within a factor , where is a constant (whose value depends on the specific constraint satisfaction problem), when restricted to instances in which every variable occurs at most times.
Thus, for example, if we restrict to instances in which every variable occurs in at most constraints, Max 3SAT can be approximated within a factor but not , and Max 3LIN can be approximated within a factor but not in polynomial time, unless , where are constants.
Last Fall, Prasad Raghavendra and I worked for a while on the problem of bridging this gap. The difficulty with Max 3SAT is that there are instances derived from Max Cut such that every variable occurs in at most clauses, there is no “trivial contradiction” (such as 8 clauses over the same 3 variables, which have a fixed contribution to the cost function and can be eliminated without loss of generality), and every assignment satisfies at most clauses. If we want an approximation ratio , we need our algorithm to certify that such instances are unsatisfiable. It is probably possible to show that there are simple LP or SDP relaxations of Max 3SAT such that a polynomial time algorithm can find assignments that satisfies a number of clauses which is at least a times the optimum of the relaxation, but we could not find any way to reason about it, and we gave up. Also, we wrongly assumed that there was the same issue with Max 3LIN.
Meanwhile, Farhi, Goldstone and Gutmann, who had successfully developed a quantum algorithm to approximate Max Cut on bounded degree graphs, were looking for another problem to tackle, and asked Madhu Sudan what was known about NP-hard and Unique Games-hard problems on bounded degree instances. They were particularly interested in Max 3SAT in bounded-degree instances. Madhu referred them to me, Sam Gutmann happened to be in the Bay Area, and so we met in November and I pointed them to the known literature and suggested that they should try Max 3LIN instead.
A month later, I heard back from them, and they had a approximate algorithm for Max 3LIN. That sounded amazing, so I went looking into the paper for the section in which they discuss their upper bound technique, and there is none. They show that, for every instance that does not have trivial contradictions (meaning two constraints that are the negation of each other), there is an assignment that satisfies a fraction of constraints, and they describe a distribution that, on average, satisfies at least as many. The distribution is samplable by a quantum computer, so the approximation, in their paper, is achieved by a quantum algorithm.
After realizing that we had been wrong all along on the need for non-trivial upper bounds for Max 3LIN, Prasad and I tried to find a way to replicate the result of Farhi et al. with a classical algorithm, and we found a way to satisfy a fraction of constraints in instances of constraint satisfaction problems “without triangles” (a result of this form is also in the paper of Farhi et al.), and then a fraction of constraints in all Max 3LIN instances.
The day before submitting our paper to ICALP (from which it would have been rejected without consideration anyways), I saw a comment by Boaz Barak on Scott Aronson’s blog announcing the same results, so we got in contact with Boaz, who welcomed us to the club of people who had, meanwhile, gotten those results, which also included Ankur Moitra, Ryan O’Donnell, Oded Regev, David Steurer, Aravindan Vijayaraghavan, David Witmer, and John Wright. Later, Johan Håstad also discovered the same results. If you kept count, that’s eleven theoreticians.
The paper is now online (with only 10 authors, Johan
may write posted a separate note); we show that a fraction of constraints can be satisfied in all Max kLIN instances, with odd , and a advantage over the random assignment can be achieved in all “triangle-free” instances of constraint satisfaction problems. It remains an open problem to improve Håstad’s approximation for Max 3SAT.
The argument for Max 3LIN is very simple. Khot and Naor prove that, given a Max 3LIN instance , one can construct a bipartite Max 3LIN instance such that an assignment satisfying a fraction of constraints in can be easily converted into an assignment satisfying a fraction of constraints of ; furthermore, if every variable occurs in at most constraints of , then every variable occurs in at most constraints of .
An instance is bipartite if we can partition the set of variables into two subsets and , such that each constraint involves two variables from and one variable from . The reduction creates two new variables and for each variable of ; every constraint of is replaced by the three constraints
Given an assignment to the and variables that satisfies a fraction of the constraints of , Khot and Naor show that either , or , or an assignment obtained by choosing to be with probability or with probability , satisfies at least a fraction of constraints of .
It remains to show that, given a bipartite instance of Max 3LIN in which every variable occurs in at most constraints (and which does not contain two equations such that one is the negation of the other), we can find an assignment that satisfies a fraction of constraints.
The idea is to first pick the variables at random, and then to pick the variables greedily given the choice of the variables.
When we pick the variables at random, the instance reduces to a series of constraints of the form . Each variable belongs to (at most, but let’s assume exactly, which is actually the worst case for the analysis) such constraints; on average, half of those constraints will be and half will be . If the fixings of clauses of were mutually independent (which would be the case in “triangle-free” instances), then we would expect that the difference between 0s and 1s be about , so the greedy fixing has a advantage over the random assignment.
In general instances, although we do not have mutual independence, we do have pairwise independence and “almost four-wise independence.” Fix a variable , and let us call the set of pairs such that constraint is part of the instance, for some , and let us call the random variable which is if and otherwise, for a random choice of the variables. We want to argue that, with constant probability, .
First, we see that the are unbiased, and they are pairwise independent, so . The fourth moment of is plus the number of 4-cycles in the graph that has vertices and the edges in . Now, contains edges, a four-cycle is completely described by the first and third edge of the cycle, so the fourth moment is . Finally, it is a standard fact that if we have a sum of unbiased random variables, and the second moment of their sum is and the fourth moment of their sum is , then the absolute value of the sum is, on average (and with constant probability) .
The algorithm for general CSPs on triangle-free instances is similarly based on the idea or randomly fixing some variables and then greedily fixing the remaining variables. Without the reduction to the “bipartite” case, which does not hold for problems like Max 3SAT, it is more technically involved to argue the advantage over the random assignment.
Is there a polynomial time algorithm that achieves a approximation ratio for Max 3SAT in instances such that each variable occurs at most times? This remains an open question.
This year, the chair of ICALP decided to play an April Fool’s prank three weeks early, and I received the following message:
“Dear author, we regret to inform you that the margins in your submission are too small, and hence we are rejecting it without review”
I was almost fooled. In my defense, the second time that I applied for a position in Italy, the hiring committee judged all my publications to be non-existent, because the (multiple) copies I had sent them had not been authenticated by a notary. So I am trained not to consider it too strange that a paper could be evaluated based on the width of its margins (or the stamps on its pages) rather than on the content of its theorem.
[Update 10/24/14: there was a bug in the code I wrote yesterday night, apologies to the colleagues at Rutgers!]
[Update 10/24/14: a reaction to the authoritative study of MIT and the University of Maryland. Also, coincidentally, today Scott Adams comes down against reputation-based rankings]
Saeed Seddighin and MohammadTaghi Hajiaghayi have proposed a ranking methodology for theory groups based on the following desiderata: (1) the ranking should be objective, and based only on quantitative information and (2) the ranking should be transparent, and the methodology openly revealed.
Inspired by their work, I propose an alternative methodology that meets both criteria, but has some additional advantages, including having an easier implementation. Based on the same Brown University dataset, I count, for each theory group, the total number of letters in the name of each faculty member.
Here are the results (apologies for the poor formatting):
1 ( 201 ) Massachusetts Institute of Technology
2 ( 179 ) Georgia Institute of Technology
3 ( 146 ) Rutgers – State University of New Jersey – New Brunswick
4 ( 142 ) University of Illinois at Urbana-Champaign
5 ( 141 ) Princeton University
6 ( 139 ) Duke University
7 ( 128 ) Carnegie Mellon University
8 ( 126 ) University of Texas – Austin
9 ( 115 ) University of Maryland – College Park
10 ( 114 ) Texas A&M University
11 ( 111 ) Northwestern University
12 ( 110 ) Stanford University
13 ( 108 ) Columbia University
14 ( 106 ) University of Wisconsin – Madison
15 ( 105 ) University of Massachusetts – Amherst
16 ( 105 ) University of California – San Diego
17 ( 98 ) University of California – Irvine
18 ( 94 ) New York University
19 ( 94 ) State University of New York – Stony Brook
20 ( 93 ) University of Chicago
21 ( 91 ) Harvard University
22 ( 91 ) Cornell University
23 ( 87 ) University of Southern California
24 ( 87 ) University of Michigan
25 ( 85 ) University of Pennsylvania
26 ( 84 ) University of California – Los Angeles
27 ( 81 ) University of California – Berkeley
28 ( 78 ) Dartmouth College
29 ( 76 ) Purdue University
30 ( 71 ) California Institute of Technology
31 ( 67 ) Ohio State University
32 ( 63 ) Brown University
33 ( 61 ) Yale University
34 ( 54 ) University of Rochester
35 ( 53 ) University of California – Santa Barbara
36 ( 53 ) Johns Hopkins University
37 ( 52 ) University of Minnesota – Twin Cities
38 ( 49 ) Virginia Polytechnic Institute and State University
39 ( 48 ) North Carolina State University
40 ( 47 ) University of Florida
41 ( 45 ) Rensselaer Polytechnic Institute
42 ( 44 ) University of Washington
43 ( 44 ) University of California – Davis
44 ( 44 ) Pennsylvania State University
45 ( 40 ) University of Colorado Boulder
46 ( 38 ) University of Utah
47 ( 36 ) University of North Carolina – Chapel Hill
48 ( 33 ) Boston University
49 ( 31 ) University of Arizona
50 ( 30 ) Rice University
51 ( 14 ) University of Virginia
52 ( 12 ) Arizona State University
53 ( 12 ) University of Pittsburgh
I should acknowledge a couple of limitations of this methodology: (1) the Brown dataset is not current, but I believe that the results would not be substantially different even with current data, (2) it might be reasonable to only count the letters in the last name, or to weigh the letters in the last name by 1 and the letters in the first name by 1/2. If there is sufficient interest, I will post rankings according to these other methodologies.
STOC 2015 will be in Portland (yay!), and part of FCRC (boo!). The call for papers is out.
A conjectural approaches to the Riemann hypothesis is to find a correspondence between the zeroes of the zeta function and the eigenvalues of a linear operator. This was first conceived by Hilbert and by Pólya in the 1910s. (See this correspondence with Odlyzko.) In the subsequent century, there have been a number of rigorous results relating the zeroes of zeta functions in other settings to eigenvalues of linear operators. There is also a connection between the distribution of known zeroes of the Riemann zeta function, as derived from explicit calculations, and the distribution of eigenvalues of the Gaussian unitary ensemble.
In a previous post we talked about the definition of the Ihara zeta function of a graph, and Ihara’s explicit formula for it, in terms of a determinant that is closely related to the characteristic polynomial of the adjacency matrix of the graph, so that the zeroes of the zeta function determine the eigenvalue of the graph. And if the zeta function of the graph satisfies a “Riemann hypothesis,” meaning that, after a change of variable, all zeroes and poles of the zeta function are in absolute value, then the graph is Ramanujan. Conversely, if the graph is Ramnujan then its Ihara zeta function must satisfy the Riemann hypothesis.
As I learned from notes by Pete Clark, the zeta functions of curves and varieties in finite fields also involve a determinant, and the “Riemann hypothesis” for such curves is a theorem (proved by Weil for curves and by Deligne for varieties), which says that (after an analogous change of variable) all zeroes and poles of the zeta function must be in absolute value, where is the size of the finite field. This means that one way to prove that a family of -regular graphs is Ramanujan is to construct for each graph in the family a variety over such that the determinant that comes up in the zeta function of is the “same” (up to terms that don’t affect the roots, and to the fact that if is a root for one characteristic polynomial, then has to be a root for the other) as the determinant that comes up in the zeta function of the variety . Clark shows that one can constructs such families of varieties (in fact, curves are enough) for all the known algebraic constructions of Ramanujan graphs, and one can use families of varieties for which the corresponding determinant is well understood to give new constructions. For example, for every , if is the number of distinct prime divisors of (which would be one if is a prime power) Clark gets a family of graphs with second eigenvalue . (The notes call the number of distinct prime divisors of , but it must be a typo.)
So spectral techniques underlie fundamental results in number theory and algebraic geometry, which then give us expander graphs. Sounds like something that we (theoretical computer scientists) should know more about.
The purpose of these notes is to explore a bit more the statements of these results, although we will not even begin to discuss their proofs.
One more reason why I find this material very interesting is that all the several applications of polynomials in computer science (to constructing error-correcting codes, secret sharing schemes, -wise independent generators, expanders of polylogarithmic degree, giving the codes used in PCP constructions, self-reducibility of the permanent, proving combinatorial bounds via the polynomial method, and so on), always eventually rely on three things:
- A multivariate polynomial restricted to a line can be seen as a univariate polynomial (and restricting to a -dimensional subspace gives a -variate polynomial); this means that results about multivariate polynomials can often be proved via a “randomized reduction” to the univariate case;
- A univariate polynomial of degree has at most roots, which follows from unique factorization
- Given desired roots we can always find an univariate polynomial of degree which has those roots, by defining it as .
This is enough to explain the remarkable pseudo-randomness of constructions that we get out of polynomials, and it is the set of principles that underlie much of what is below, except that we are going to restrict polynomials to the set of zeroes of another polynomial, instead of a line, and this is where things get much more complicated, and interesting.
Before getting started, however, I would like to work out a toy example (which is closely related to what goes on with the Ihara zeta function) showing how an expression that looks like the one defining zeta functions can be expressed as a characteristic polynomial of a linear operator, and how its roots (which are then the eigenvalues of the operator) help give bound to a counting problem, and how one can get such bounds directly from the trace of the operator.
If we have quantities that we want to bound, then the “zeta function” of the sequence is
For example, and this will be discussed in more detail below, if if a bivariate polynomial over , we may be interested in the number of solutions of the equation over , as well as the number of solutions over the extension . In this case, the zeta function of the curve defined by is precisely
and Weil proved that is a rational function, and that if are the zeroes and poles of , that is, the roots of the numerator and the denominator, then they are all at most in absolute value, and one has
Here is our toy example: suppose that we have a graph and that is the number of cycles of the graph of length . Here by “cycle of length ” we mean a sequence of vertices such that and, for , . So, for example, in a graph in which the vertices form a triangle, and are two distinct cycles of length 3, and is a cycle of length 2.
We could define the function
and hope that its zeroes have to do with the computation of , and that can be defined in terms of the characteristic polynomial.
Indeed, we have (remember, is a complex number, not a matrix):
and, if are the zeroes and poles of , then
How did we get these bounds? Well, we cheated because we already knew that , where are the eigenvalues of . This means that
Where we used . Now we see that the poles of are precisely the inverses of the eigenvalues of .
Now we are ready to look more closely at various zeta functions.
I would like to thank Ken Ribet and Jared Weinstein for the time they spent answering questions. Part of this post will follow, almost word for word, this post of Terry Tao. All the errors and misconceptions below, however, are my own original creation.
After my lectures in the “boot camp” of the spectral graph theory program at the Simons Institute, I promised I would post some references, because I stated all results without attribution.
Here is a a first draft.
If you notice that work that you know of (for example, your work) is misrepresented or absent, please let me know and I will edit the document. (If possible, when you do so, do not compare me to Stalin and cc your message to half a dozen prominent people — true story.)
I am still in shock at the news that Microsoft decided to shut down MSR SVC and fire, literally from one day to the next, almost all the scientific staff.
It is shocking that the company decided that it was a good idea to destroy an investment they had made over several years; I am only familiar with the theory side of MSR SVC, which had great success in nurturing young talent and developing breakthrough ideas and results. The work on privacy led by Cynthia Dwork, for example, informed the thinking on privacy of higher management, and on how one could find new ways to balance European privacy law with the data collection necessary for advertising. This is one of the returns of having academic research in a company: not so that they can implement in C++ their algorithms, but so that they can generate and communicate new ways of thinking about problems. (Cynthia is one of the few people retained by Microsoft, but she has lost all her collaborators.) Microsoft’s loss will be other places’ win as the MSR SVC diaspora settles down elsewhere.
It is also shocking that, instead of planning an orderly shutdown, they simply threw people out overnight, which shows a fundamental indifference to the way the academic job market works (it can take a full year for an academic job offer to materialize).
I am not shocked by the class demonstrated by Omer Reingold; the moral stature of people is best seen in difficult moments. Omer has written a beautiful post about the lab, whose comment section has become a memorial to the lab, with people posting their personal remembrances.
Here at Berkeley and Stanford we will do our best to help, and we will make sure that everybody has some space to work. There will also be some kind of community-wide response, but it will take some time to figure out what we can do. Meanwhile, I urge everybody to reach out to their friends formerly at MSR SVC, make them feel the love of their community, and connect them to opportunities for both short term and long term positions, as they weigh their options.
Suppose that we want to construct a very good family of -regular expander graphs. The Alon-Boppana theorem says that the best we can hope for, from the point of view of spectral expansion, is to have , and we would certainly be very happy with a family of graphs in which .
Known constructions of expanders produce Cayley graphs (or sometimes Schreier graphs, which is a related notion), because it is easier to analyze the spectra of such graphs. If is a group with operation and is the inverse of element , and is a symmetric set of generators, then the Cayley graph is the graph whose vertices are the elements of and whose edges are the pairs such that .
When the group is Abelian, there is good news and bad news. The good news is that the eigenvectors of the graphs are completely characterized (they are the characters of ) and the eigenvalues are given by a nice formula. (See here and here.) The bad news is that constant-degree Cayley graphs of Abelian groups cannot be expanders.
That’s very bad news, but it is still possible to get highly expanding graphs of polylogarithmic degree as Cayley graphs of Abelian groups.
Here we will look at the extreme case of a family of graphs of degree , where is the number of vertices. Even with such high degree, the weak version of the Alon-Boppana theorem still implies that we must have , and so we will be happy if we get a graph in which . Highly expanding graphs of degree are interesting because they have some of the properties of random graphs from the distribution. In turn, graphs from have all kind of interesting properties with high probabilities, including being essentially the best known Ramsey graphs and having the kind of discrepancy property that gives seedless extractors for two independent sources. Unfortunately, these properties cannot be certified by spectral methods. The graph that we will study today is believed to have such stronger properties, but there is no known promising approach to prove such conjectures, so we will content ourselves with proving strong spectral expansion.
The graph is the Paley graph. If is a prime, is the group of addition modulo , and is the set of elements of of the form , then the graph is just . That is, the graph has a vertex for each , and two vertices are adjacent if and only if there is an such that .