Beyond Worst-Case Analysis: Lecture 8

Scribed by Luowen Qian

In which we use spectral techniques to find certificates of unsatisfiability for random {k}-SAT formulas.

1. Introduction

Given a random {k}-SAT formula with {m} clauses and {n} variables, we want to find a certificate of unsatisfiability of such formula within polynomial time. Here we consider {k} as fixed, usually equal to 3 or 4. For fixed {n}, the more clauses you have, the more constraints you have, so it becomes easier to show that these constraints are inconsistent. For example, for 3-SAT,

  1. In the previous lecture, we have shown that if {m > c_3 \cdot n} for some large constant {c_3}, almost surely the formula is not satisfiable. But it’s conjectured that there is no polynomial time, or even subexponential time algorithms that can find the certificate of unsatisfiability for {m = O(n)}.
  2. If {m > c \cdot n^2} for some other constant {c}, we’ve shown in the last time that we can find a certificate within polynomial time with high probability that the formula is not satisfiable.

    The algorithm for finding such certificate is shown below.

    • Algorithm 3SAT-refute({f})
    • for {b_1 \in \{0,1\}}
      • if 2SAT-satisfiable({f} restricted to clauses that contains {x_1= \overline b_1}, with {x:= \overline b_1})
        • return {\bot}
    • return UNSATISFIABLE

    We know that we can solve 2-SATs in linear time, and approximately

    \displaystyle \frac{\binom{n - 1} 2 \cdot m}{\binom n 3 \cdot 2} = \frac{3m}{2n + O(1)} > \frac 3 2 cn - O(1)

    clauses contains {x_1 = \overline{b_1}}. Similarly when {c} is sufficiently large, the 2-SATs will almost surely be unsatisfiable. When a subset of the clauses is not satisfiable, the whole 3-SAT formula is not satisfiable. Therefore we can certify unsatisfiability for 3-SATs with high probability.

In general for {k}-SAT,

  1. If {m > c_k \cdot n} for some large constant {c_k}, almost surely the formula is not satisfiable.
  2. If {m > c'_k \cdot n^{k - 1}} for some other constant {c'_k}, we can construct a very similar algorithm, in which we check all assignments to the first {k-2} variables, and see if the 2SAT part of the restricted formula is unsatisfiable.

    Since for every fixed assignments to the first {k - 2} variables, approximately

    \displaystyle \frac{\binom{n - k + 2} 2}{\binom n k 2^{k - 2}} = \frac{k!}{(n^{k - 2} + O(n^{k - 3})) 2^{k - 1}}

    portion of the {m} clauses remains, we expect the constant {c'_k = \Omega\left(\frac{2^k}{k!}\right)} and the running time is {O(2^k m)}.

So what about {m}‘s that are in between? It turns out that we can do better with spectral techniques. And the reason that spectral techniques work better is that unlike the previous method, it does not try all the possible assignments and fails to find a certificate of unsatisfiability.

2. Reduce certifying unsatisfiability for k-SAT to finding largest independent set

2.1. From 3-SAT instances to hypergraphs

Given a random 3-SAT formula {f}, which is an and of {m} random 3-CNF-SAT clauses over {n} variables {x_1, x_2, ..., x_n} (abbreviated as vector {{\bf x}}), i.e.

\displaystyle  f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \right),

where {\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}, {\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3}} and no two {(\sigma_{i,1}, b_{i,1}, \sigma_{i,2}, b_{i,2}, \sigma_{i,3}, b_{i,3})} are exactly the same. Construct hypergraph {H_f = (X, E)}, where

\displaystyle X = \left\{(i, b) \middle| i \in [n], b \in \{0, 1\}\right\}

is a set of {2n} vertices, where each vertex means an assignment to a variable, and

\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline{b_{j,1}}), (\sigma_{j,2}, \overline{b_{j,2}}), (\sigma_{j,3}, \overline{b_{j,3}})\}

is a set of {m} 3-hyperedges. The reason we’re putting in the negation of {b} is that a 3-CNF clause evaluates to false if and only if all three subclauses evaluate to false. This will be useful shortly after.

First let’s generalize the notion of independent set for hypergraphs.

An independent set for hypergraph {H = (X, E)} is a set {S \subseteq X} that satisfies {\forall e \in E, e \not \subseteq S}.

If {f} is satisfiable, {H_f} has an independent set of size at least {n}. Equivalently if the largest independent set of {H_f} has size less than {n}, {f} is unsatisfiable. Proof: Assume {f} is satisfiable, let {{\bf x} \leftarrow {\bf y}} be a satisfiable assignment, where {{\bf y} \in \{0, 1\}^n}. Then {S = \{ (x_i, y_i) | i \in [n] \}} is an independent set of size {n}. If not, it means some hyperedge {e_j \subseteq S}, so {\sigma_{j,1} = \overline{b_{j,1}} \land \sigma_{j,2} = \overline{b_{j,2}} \land \sigma_{j,3} = \overline{b_{j,3}}} and the {j}-th clause in {f} evaluates to false. Therefore {f} evaluates to false, which contradicts the fact that {{\bf y}} is a satisfiable assignment. \Box

We know that if we pick a random graph that’s sufficiently dense, i.e. the average degree {d > \ln n}, by spectral techniques we will have a certifiable upper bound on the size of the largest independent set of {O\left(\frac n{\sqrt d}\right)} with high probability. So if a random graph has {\Omega(n \log n)} random edges, we can prove that there’s no large independent set with high probability.

But if we have a random hypergraph with {\Omega(n \log n)} random hyperedges, we don’t have any analog of spectral theories for hypergraphs that allow us to do this kind of certification. And from the fact that the problem of certifying unsatisfiability of random formula of {\Omega(n \log n)} clauses is considered to be hard, we conjecture that there doesn’t exist a spectral theory for hypergraphs able to replicate some of the things we are able to do on graphs.

However, what we can do is possibly with some loss, to reduce the hypergraph to a graph, where we can apply spectral techniques.

2.2. From 4-SAT instances to graphs

Now let’s look at random 4-SATs. Similarly we will write a random 4-SAT formula {f} as:

\displaystyle  f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \lor x_{\sigma_{i,4}} = b_{i,4} \right),

where {\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}, {\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3} < \sigma_{i,4}} and no two {(\sigma_{i,1}, b_{i,1}, ..., \sigma_{i,4}, b_{i,4})} are exactly the same. Similar to the previous construction, but instead of constructing another hypergraph, we will construct just a graph {G_f = (V, E)}, where

\displaystyle V = \left\{(i_1, b_1, i_2, b_2) \middle| i_1, i_2 \in [n], b_1, b_2 \in \{0, 1\}\right\}

is a set of {4n^2} vertices and

\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline {b_{j,1}}, \sigma_{j,2}, \overline {b_{j,2}}), (\sigma_{j,3}, \overline {b_{j,3}}, \sigma_{j,4}, \overline {b_{j,4}})\}

is a set of {m} edges.

If {f} is satisfiable, {G_f} has an independent set of size at least {n^2}. Equivalently if the largest independent set of {H_f} has size less than {n^2}, {f} is unsatisfiable. Proof: The proof is very similar to the previous one. Assume {f} is satisfiable, let {{\bf x} \leftarrow {\bf y}} be a satisfiable assignment, where {{\bf y} \in \{0, 1\}^n}. Then {S = \{ (x_i, y_i, x_j, y_j) | i, j \in [n] \}} is an independent set of size {n^2}. If not, it means some edge {e_j \subseteq S}, so {\sigma_{j,1} = \overline {b_{j,1}} \land \sigma_{j,2} = \overline {b_{j,2}} \land \sigma_{j,3} = \overline {b_{j,3}} \land \sigma_{j,4} = \overline {b_{j,4}}} and the {j}-th clause in {f} evaluates to false. Therefore {f} evaluates to false, which contradicts the fact that {{\bf y}} is a satisfiable assignment. \Box

From here, we can observe that {G_f} is not a random graph because some edges are forbidden, for example when the two vertices of the edge has some element in common. But it’s very close to a random graph. In fact, we can apply the same spectral techniques to get a certifiable upper bound on the size of the largest independent set if the average degree {d > \ln n}, i.e. if {m = \Omega(n^2 \log n)}, we can certify unsatisfiability with high probability, by upper bounding the size of the largest independent set in the constructed graph.

We can generalize this results for all even {k}‘s. For random {k}-SAT where {k} is even, if {m > c_k n^{k/2} \log n}, we can certify unsatisfiability with high probability, which is better than the previous method which requires {m = \Omega(n^{k - 1})}. The same {n^{k/2}(\log n)^{O(1)}} is achievable for odd {k}, but the argument is significantly more complicated.

2.3. Certifiable upper bound for independent sets in modified random sparse graphs

Despite odd {k}‘s, another question is that in this setup, can we do better and get rid of the {\log n} term? This term is coming from the fact that spectral norm break down when the average degree {d < \ln n}. However it’s still true that random graph doesn’t have any large independent sets even when the average degree {d} is constant. It’s just that the spectral norm isn’t giving us good bounds any more, since the spectral norm is at most {O\left(\sqrt{\max d}\right) = O\left(\sqrt \frac{\log n}{\log \log n}\right)}. So is there something tighter than spectral bounds that could help us get rid of the {\log n} term? Could we fix this by removing all the high degree vertices in the random graph?

This construction is due to Feige-Ofek. Given random graph {G \sim G_{n, p}}, where the average degree {d = np} is some large constant. Construct {G'} by taking {G} and removing all edges incident on nodes with degree higher than {2\bar d} where {\bar d} is the average degree of {G}. We denote {A} for the adjacency matrix of {G} and {A'} for that of {G'}. And it turns out,

With high probability, {\left\lVert A' - \frac d n J \right\rVert \le O\left(\sqrt d\right)}.

It turns out to be rather difficult to prove. Previously we saw spectral results on random graphs that uses matrix traces to bound the largest eigenvalue. In this case, it’s hard to do so because the contribution to the trace of a closed walk is complicated by the fact that edges have dependencies. The other approach is that given random matrix {M}, we will try to upper bound {\left\lVert M \right\rVert = \max\limits_x \frac {x^T M x} {\lVert x \rVert^2}}. A standard way for this is to that for every solution, count the instances of {M} in which the fixed solution is good, and argue that the number of the fixed solutions is small, which tells us that there’s no good solution. The problem here is that the set of solutions is infinitely large. So Feige-Ofek discretize the set of vectors, and then reduce the bound on the quadratic form of a discretized vector to a sum of several terms, each of which has to be carefully bounded.

We always have

\displaystyle  \max \textrm{IndSet}(G) \le \max \textrm{IndSet}(G') \le \frac n d \left\lVert A' - \frac d n J \right\rVert

and so, with high probability, we get an {O\left(\frac n {\sqrt d}\right)} polynomial time upper bound certificate to the size of the independent set for a {G_{n,d/n}} random graph. This removes the extra {\log n} term from our analysis of certificates of unsatisfiability for random {k}-SAT when {k} is even.

3. SDP relaxation of independent sets in random sparse graphs

In order to show a random graph has no large independent sets, a more principled way is to argue that there is some polynomial time solvable relaxation of the problem whose solution is an upper bound of the problem.

Let SDPIndSet{(G)} be the optimum of the following semidefinite programming relaxation of the Independent Set problem, which is due to Lovász:

\displaystyle  \begin{array}{rcl}  \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \langle {\bf x}_0, {\bf x}_i \rangle = ||{\bf x}_i ||^2 \ \ \ \forall i\in V\\ && \langle {\bf x}_i, {\bf x}_j \rangle = 0 \ \ \ \forall (i,j)\in E \end{array}

Since it’s the relaxation of the problem of finding the maximum independent set, {\max \textrm{IndSet}(G) \le \textrm{SDPIndSet}(G)} for any graph {G}. And this relaxation has a nice property.

For every {0 < p < 1}, and for every graph {G}, we have \begin{equation*} {\rm SDPIndSet}(G) \leq \frac 1p \cdot || pJ – A || \end{equation*} where {J} is the all-one matrix and {A} is the adjacency matrix of {G}.

Proof: First we note that SDPIndSet{(G)} is at most

\displaystyle  \begin{array}{rcl}  \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

and this is equal to

\displaystyle  \begin{array}{rcl}  \max && \frac { \left( \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle \right) ^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

which is at most

\displaystyle  \begin{array}{rcl}  \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

because

\displaystyle  \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle = \left\langle \sum_{i\in V} {\bf x}_i , {\bf x}_0\right \rangle \leq \left \| \sum_{i\in V} {\bf x}_i \right \| \cdot || {\bf x}_0 || = \left \| \sum_{i\in V} {\bf x}_i \right \|

Finally, the above optimization is equivalent to the following

\displaystyle  \begin{array}{rcl}  \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

which is at most the unconstrained problem

\displaystyle \begin{aligned} \max \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} &= \max \frac { \sum_{i,j} \left( J - \frac 1p A\right)_{i,j} \langle {\bf x}_i,{\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} \\ &= \lambda_{\max} \left (J - \frac 1p A \right) \\ &\leq \frac 1p || pJ - A||. \end{aligned}

\Box

Recall from the previous section that we constructed {G'} by removing edges from {G}, which corresponds to removing constraints in our semidefinite programming problem, so {\textrm{SDPIndSet}(G) \le \textrm{SDPIndSet}(G') \le \left\lVert J - \frac 1 p A' \right\rVert}, which is by theorem 3 at most {O\left(\frac n{\sqrt d}\right)} with high probability.

4. SDP relaxation of random k-SAT

From the previous section, we get an idea that we can use semidefinite programming to relax the problem directly and find a certificate of unsatisfiability for the relaxed problem.

Given a random {k}-SAT formula {f}:

\displaystyle  \begin{array}{rcl}  f({\bf x}) &= & \bigwedge\limits_{i = 1}^m \bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\overline{\bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j}}} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\bigwedge\limits_{j = 1}^k x_{\sigma_{i,j}} = \overline{b_{i,j}}}. \end{array}

The satisfiability of {f} is equivalent of the satisfiability of the following equations:

\displaystyle  \begin{array}{rcl}  && x_i^2 = x_i \forall i \in [n] \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}x_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \end{array}

Notice that if we expand the polynomial on the left side, there are some of the monomials having degree higher than 2 which prevents us relaxing these equations to a semidefinite programming problem. In order to resolve this, {\forall A \subseteq {\bf x}} and {|A| \le k/2} we introduce {x_A = \prod_{i \in A} x_i}. Then we can relax all variables to be vectors, i.e.

\displaystyle  \begin{array}{rcl}  && \lVert {\bf x}_\emptyset \rVert^2 = 1 \\ && \langle {\bf x}_A, {\bf x}_B \rangle = \langle {\bf x}_C, {\bf x}_D \rangle \ \ \ \forall A \cup B = C \cup D \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}{\bf x}_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \ \ \ \textrm{rewritten as quadratic forms of } {\bf x}_A \end{array}

For example, if we have a 4-SAT clause

\displaystyle  x_3 \lor \overline{x_4} \lor x_7 \lor \overline{x_{10}},

we can rewrite it as

\displaystyle  \begin{array}{rcl}  1 - (1 - {\bf x}_3) \cdot {\bf x}_4 \cdot (1 - {\bf x}_7) \cdot {\bf x}_{10} &= &1 - {\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_7{\bf x}_{10} - {\bf x}_3{\bf x}_4{\bf x}_7{\bf x}_{10} \\ &= &1 - {\bf x}_{\{4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,7\}}{\bf x}_{\{10\}} - {\bf x}_{\{3,4\}}{\bf x}_{\{7,10\}}. \end{array}

For this relaxation, we have:

  1. If {m < c(k, n) n^{k/2}}, the SDP associated with the formula is feasible with high probability, where {c(k, n) = 1/n^{o(1)}} for every fixed {k}.
  2. If {m > c'(k) n^{k/2}}, the SDP associated with the formula is not feasible with high probability, where {c'(k, n)} is a constant for every fixed even {k}, and {c'(k, n) = \textrm{poly}(\log n)} for every fixed odd {k}.

Advertisements

412346_252840668164070_176511152_o

(Photo from facebook.com)

 

 Michael Cohen, one the most brilliant young minds of our field, recently passed away in Berkeley.

After going to MIT for college, Michael worked for Facebook and was a graduate student at MIT. This semester, he was at Berkeley as Simons Fellow in connection with the program on optimization at the Simons Institute.

In a few short years, Michael left his mark on a number of problems that are close to the heart of in theory‘s readers.

He was part of the team that developed the fastest algorithm for solving systems of linear equations in which the matrix of constraints is a graph Laplacian (or, more generally, is symmetric and diagonally dominated), running in time O(m \sqrt {\log n}) where m is the number of non-zero entries of the matrix and n is the number of variables.

He also worked on matrix approximation via subsampling, on algorithms that approximate random walk properties, on algorithms for flow and shortest paths, and on geometric algorithms.

My favorite result is his single-author paper giving a polynomial time construction of bipartite Ramanujan graphs of all degree and all sizes, making the approach of Marcus, Spielman and Srivastava constructive.

Michael was a unique person, who gave a lot to our community and had touched several lives. His loss is an unspeakable tragedy that I still find very hard to process.

Coincidence?

“Art imitates life, but life imitates bad TV” (Woody Allen)

The mention for a major alumni award given by U.C. Berkeley is for excellence in achievement.

Meanwhile, in the episode “Brother, can you spare two dimes?”, Mr. Burns has to come up on the spot with the name for a fake awards, and he comes up with an award for outstanding achievement in the field of excellence.

 

(You’ll note that the dancers in the video are wearing gold and blue)

Louis CK on the 2016 presidential campaign

From an interview for New York Magazine:

It’s like if you were on a plane and you wanted to choose a pilot. You have one person, Hillary, who says, “Here’s my license. Here’s all the thousands of flights that I’ve flown. Here’s planes I’ve flown in really difficult situations. I’ve had some good flights and some bad flights, but I’ve been flying for a very long time, and I know exactly how this plane works.” Then you’ve got Bernie, who says, “Everyone should get a ride right to their house with this plane.” “Well, how are you going to do that?” “I just think we should. It’s only fair that everyone gets to use the plane equally.” And then Trump says, “I’m going to fly so well. You’re not going to believe how good I’m going to fly this plane, and by the way, Hillary never flew a plane in her life.” “She did, and we have pictures.” “No, she never did it.”

End-of-year traditions

Having spent some time in Japan, I have learnt of the tradition of holding a Bōnenkai, literally a party to forget the year. Held either as a company end-of-year party, or by groups of friends, it’s a get-together in which people drink a lot and forget the bad things that happened to them during the year.

It occurred to me that this is the complement of Thanksgiving, in which you get together to remember the good things that happened during the year.

I don’t think there is anything else left to say about the difference between Japanese and American culture.

Interestingly, there are a couple more possibilities. One could remember the bad things that happened during the year, as in the airing of grievances during Festivus.

Finally, one could forget the good things, which is very much the Italian attitude.

Edited to add: I don’t know how I forgot (ah!) but there is a famous Neapolitan folk song that goes

Chi ha avuto, ha avuto, ha avuto
Chi ha dato, ha dato, ha dato,
Scurdammuce ‘o passato,
simm’e Napule, paisa’

which is roughly

Who has received, has received
Who has given, has given,
Let’s forget the past
We are [all] from Naples

Which one would you get?

It is time to get a new laptop, and I would like it to be as light as possible subject to having a full-size keyboard and a not-too-small (at least 12”) display. So it is down to the MacBook Air versus the Lenovo X300.

I have been planning to move to OS X, and I appreciate superior design, so the MacBook is the default choice, but I also like to be able to connect a computer to other devices, which leads to the problem described in the video.