Beyond Worst-Case Analysis: Lecture 11

Scribed by Neng Huang

In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximation reconstruction of the stochastic block model.

1. A Brief Review of the Model

First, let’s briefly review the model. We have a random graph {G = (V, E)} with an unknown partition of the vertices into two equal parts {V_1} and {V_2}. Edges across the partition are generated independently with probability {q}, and edges inside the partition are generated independently with probability {p}. To abbreviate the notation, we let {a = pn/2}, which is the average internal degree, and {b = qn/2}, which is the average external degree. Intuitively, the closer are {a} and {b}, the more difficult it is to reconstruct the partition. We assume {a > b}, although there are also similar results in the complementary model where {b} is larger than {a}. We also assume {b > 1} so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

  1. For every {\epsilon > 0}, there exists a constant {c_\epsilon} such that if {a - b > c_\epsilon\sqrt{a+b}}, then we can reconstruct the partition up to less than {\epsilon n} misclassified vertices.
  2. There exists a constant {c} such that if {a-b \geq c \sqrt{\log n}\sqrt{a+b}}, then we can do exact reconstruct.

We note that the first result is essentially tight in the sense that for every {\epsilon > 0}, there also exists a constant {c_\epsilon'} such that if {a-b < c_\epsilon'\sqrt{a+b}}, then it will be impossible to reconstruct the partition even if an {\epsilon} fraction of misclassified vertices is allowed. Also, the constant {c_\epsilon} will go to infinity as {\epsilon} goes to 0, so if we want more and more accuracy, {a-b} needs to be a bigger and bigger constant times {\sqrt{a+b}}. When the constant becomes {O(\sqrt{\log n})}, we will get an exact reconstruction as stated in the second result.

Continue reading

Advertisements

Beyond Worst-Case Analysis: Lecture 10

Scribed by David Dinh

In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the infinity-to-one norm, and find an efficient way to approximate it using SDP.

Continue reading

Beyond Worst-Case Analysis: Lecture 9

Scribed by Chinmay Nirkhe

In which we explore the Stochastic Block Model.

1. The {G_{n,p,q}} problem

The Stochastic Block Model is a generic model for graphs generated by some parameters. The simplest model and one we will consider today is the {G_{n,p,q}} problem.

Definition 1 ({G_{n,p,q}} graph distribution) The {G_{n,p,q}} distribution is a distribution on graphs of {n} vertices where {V} is partitioned into two 2 subsets of equal size: {V = V_1 \sqcup V_2}. Then for {\{i,j\}} pair of vertices in the same subset, {\Pr( (i,j) \in E) = p} and otherwise {\Pr( (i,j) \in E) = q}.

We will only consider the regime under which {p > q}. If we want to find the partition {V = V_1 \sqcup V_2}, it is intuitive to look at the problem of finding the minimum balanced cut. The cut {(V_1, V_2)} has expected size {q n^2 / 4} and any other cut will have greater expected size.

Our intuition should be that as {p \rightarrow q}, the problem only gets harder. And for fixed ratio {p/q}, as {p,q \rightarrow 1}, the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for {p,q} then we can also solve it for {cp, cq} where {c > 1}, by keeping only {1/c} edges and reducing to the case we can solve.

Recall that for the {k}-planted clique problem, we found the eigenvector {{\bf x}} corresponding to the largest eigenvalue of {A - \frac{1}{2} J}. We then defined {S} as the vertices {i} with the {k} largest values of {|x_i|} and cleaned up {S} a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of {A - \left( \frac{p + q}{2} \right) J}. Note this is intuitive as the average degree of the graph is {p(n/2 - 1) + q(n/2) \approx \frac{n}{2} (p+q)}. The idea is simple: Solve {{\bf x}} the largest eigenvector corresponding to the largest eigenvalue and define

\displaystyle  V_1 = \{ i : x_i > 0\}, \qquad V_2 = \{ i : x_i \leq 0 \} \ \ \ \ \ (1)

Continue reading

Beyond Worst-Case Analysis: Lecture 8

Scribed by Luowen Qian

In which we use spectral techniques to find certificates of unsatisfiability for random {k}-SAT formulas.

1. Introduction

Given a random {k}-SAT formula with {m} clauses and {n} variables, we want to find a certificate of unsatisfiability of such formula within polynomial time. Here we consider {k} as fixed, usually equal to 3 or 4. For fixed {n}, the more clauses you have, the more constraints you have, so it becomes easier to show that these constraints are inconsistent. For example, for 3-SAT,

  1. In the previous lecture, we have shown that if {m > c_3 \cdot n} for some large constant {c_3}, almost surely the formula is not satisfiable. But it’s conjectured that there is no polynomial time, or even subexponential time algorithms that can find the certificate of unsatisfiability for {m = O(n)}.
  2. If {m > c \cdot n^2} for some other constant {c}, we’ve shown in the last time that we can find a certificate within polynomial time with high probability that the formula is not satisfiable.

    The algorithm for finding such certificate is shown below.

    • Algorithm 3SAT-refute({f})
    • for {b_1 \in \{0,1\}}
      • if 2SAT-satisfiable({f} restricted to clauses that contains {x_1= \overline b_1}, with {x:= \overline b_1})
        • return {\bot}
    • return UNSATISFIABLE

    We know that we can solve 2-SATs in linear time, and approximately

    \displaystyle \frac{\binom{n - 1} 2 \cdot m}{\binom n 3 \cdot 2} = \frac{3m}{2n + O(1)} > \frac 3 2 cn - O(1)

    clauses contains {x_1 = \overline{b_1}}. Similarly when {c} is sufficiently large, the 2-SATs will almost surely be unsatisfiable. When a subset of the clauses is not satisfiable, the whole 3-SAT formula is not satisfiable. Therefore we can certify unsatisfiability for 3-SATs with high probability.

In general for {k}-SAT,

  1. If {m > c_k \cdot n} for some large constant {c_k}, almost surely the formula is not satisfiable.
  2. If {m > c'_k \cdot n^{k - 1}} for some other constant {c'_k}, we can construct a very similar algorithm, in which we check all assignments to the first {k-2} variables, and see if the 2SAT part of the restricted formula is unsatisfiable.

    Since for every fixed assignments to the first {k - 2} variables, approximately

    \displaystyle \frac{\binom{n - k + 2} 2}{\binom n k 2^{k - 2}} = \frac{k!}{(n^{k - 2} + O(n^{k - 3})) 2^{k - 1}}

    portion of the {m} clauses remains, we expect the constant {c'_k = \Omega\left(\frac{2^k}{k!}\right)} and the running time is {O(2^k m)}.

So what about {m}‘s that are in between? It turns out that we can do better with spectral techniques. And the reason that spectral techniques work better is that unlike the previous method, it does not try all the possible assignments and fails to find a certificate of unsatisfiability.

2. Reduce certifying unsatisfiability for k-SAT to finding largest independent set

2.1. From 3-SAT instances to hypergraphs

Given a random 3-SAT formula {f}, which is an and of {m} random 3-CNF-SAT clauses over {n} variables {x_1, x_2, ..., x_n} (abbreviated as vector {{\bf x}}), i.e.

\displaystyle  f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \right),

where {\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}, {\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3}} and no two {(\sigma_{i,1}, b_{i,1}, \sigma_{i,2}, b_{i,2}, \sigma_{i,3}, b_{i,3})} are exactly the same. Construct hypergraph {H_f = (X, E)}, where

\displaystyle X = \left\{(i, b) \middle| i \in [n], b \in \{0, 1\}\right\}

is a set of {2n} vertices, where each vertex means an assignment to a variable, and

\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline{b_{j,1}}), (\sigma_{j,2}, \overline{b_{j,2}}), (\sigma_{j,3}, \overline{b_{j,3}})\}

is a set of {m} 3-hyperedges. The reason we’re putting in the negation of {b} is that a 3-CNF clause evaluates to false if and only if all three subclauses evaluate to false. This will be useful shortly after.

First let’s generalize the notion of independent set for hypergraphs.

An independent set for hypergraph {H = (X, E)} is a set {S \subseteq X} that satisfies {\forall e \in E, e \not \subseteq S}.

If {f} is satisfiable, {H_f} has an independent set of size at least {n}. Equivalently if the largest independent set of {H_f} has size less than {n}, {f} is unsatisfiable. Proof: Assume {f} is satisfiable, let {{\bf x} \leftarrow {\bf y}} be a satisfiable assignment, where {{\bf y} \in \{0, 1\}^n}. Then {S = \{ (x_i, y_i) | i \in [n] \}} is an independent set of size {n}. If not, it means some hyperedge {e_j \subseteq S}, so {\sigma_{j,1} = \overline{b_{j,1}} \land \sigma_{j,2} = \overline{b_{j,2}} \land \sigma_{j,3} = \overline{b_{j,3}}} and the {j}-th clause in {f} evaluates to false. Therefore {f} evaluates to false, which contradicts the fact that {{\bf y}} is a satisfiable assignment. \Box

We know that if we pick a random graph that’s sufficiently dense, i.e. the average degree {d > \ln n}, by spectral techniques we will have a certifiable upper bound on the size of the largest independent set of {O\left(\frac n{\sqrt d}\right)} with high probability. So if a random graph has {\Omega(n \log n)} random edges, we can prove that there’s no large independent set with high probability.

But if we have a random hypergraph with {\Omega(n \log n)} random hyperedges, we don’t have any analog of spectral theories for hypergraphs that allow us to do this kind of certification. And from the fact that the problem of certifying unsatisfiability of random formula of {\Omega(n \log n)} clauses is considered to be hard, we conjecture that there doesn’t exist a spectral theory for hypergraphs able to replicate some of the things we are able to do on graphs.

However, what we can do is possibly with some loss, to reduce the hypergraph to a graph, where we can apply spectral techniques.

2.2. From 4-SAT instances to graphs

Now let’s look at random 4-SATs. Similarly we will write a random 4-SAT formula {f} as:

\displaystyle  f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \lor x_{\sigma_{i,4}} = b_{i,4} \right),

where {\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}, {\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3} < \sigma_{i,4}} and no two {(\sigma_{i,1}, b_{i,1}, ..., \sigma_{i,4}, b_{i,4})} are exactly the same. Similar to the previous construction, but instead of constructing another hypergraph, we will construct just a graph {G_f = (V, E)}, where

\displaystyle V = \left\{(i_1, b_1, i_2, b_2) \middle| i_1, i_2 \in [n], b_1, b_2 \in \{0, 1\}\right\}

is a set of {4n^2} vertices and

\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline {b_{j,1}}, \sigma_{j,2}, \overline {b_{j,2}}), (\sigma_{j,3}, \overline {b_{j,3}}, \sigma_{j,4}, \overline {b_{j,4}})\}

is a set of {m} edges.

If {f} is satisfiable, {G_f} has an independent set of size at least {n^2}. Equivalently if the largest independent set of {H_f} has size less than {n^2}, {f} is unsatisfiable. Proof: The proof is very similar to the previous one. Assume {f} is satisfiable, let {{\bf x} \leftarrow {\bf y}} be a satisfiable assignment, where {{\bf y} \in \{0, 1\}^n}. Then {S = \{ (x_i, y_i, x_j, y_j) | i, j \in [n] \}} is an independent set of size {n^2}. If not, it means some edge {e_j \subseteq S}, so {\sigma_{j,1} = \overline {b_{j,1}} \land \sigma_{j,2} = \overline {b_{j,2}} \land \sigma_{j,3} = \overline {b_{j,3}} \land \sigma_{j,4} = \overline {b_{j,4}}} and the {j}-th clause in {f} evaluates to false. Therefore {f} evaluates to false, which contradicts the fact that {{\bf y}} is a satisfiable assignment. \Box

From here, we can observe that {G_f} is not a random graph because some edges are forbidden, for example when the two vertices of the edge has some element in common. But it’s very close to a random graph. In fact, we can apply the same spectral techniques to get a certifiable upper bound on the size of the largest independent set if the average degree {d > \ln n}, i.e. if {m = \Omega(n^2 \log n)}, we can certify unsatisfiability with high probability, by upper bounding the size of the largest independent set in the constructed graph.

We can generalize this results for all even {k}‘s. For random {k}-SAT where {k} is even, if {m > c_k n^{k/2} \log n}, we can certify unsatisfiability with high probability, which is better than the previous method which requires {m = \Omega(n^{k - 1})}. The same {n^{k/2}(\log n)^{O(1)}} is achievable for odd {k}, but the argument is significantly more complicated.

2.3. Certifiable upper bound for independent sets in modified random sparse graphs

Despite odd {k}‘s, another question is that in this setup, can we do better and get rid of the {\log n} term? This term is coming from the fact that spectral norm break down when the average degree {d < \ln n}. However it’s still true that random graph doesn’t have any large independent sets even when the average degree {d} is constant. It’s just that the spectral norm isn’t giving us good bounds any more, since the spectral norm is at most {O\left(\sqrt{\max d}\right) = O\left(\sqrt \frac{\log n}{\log \log n}\right)}. So is there something tighter than spectral bounds that could help us get rid of the {\log n} term? Could we fix this by removing all the high degree vertices in the random graph?

This construction is due to Feige-Ofek. Given random graph {G \sim G_{n, p}}, where the average degree {d = np} is some large constant. Construct {G'} by taking {G} and removing all edges incident on nodes with degree higher than {2\bar d} where {\bar d} is the average degree of {G}. We denote {A} for the adjacency matrix of {G} and {A'} for that of {G'}. And it turns out,

With high probability, {\left\lVert A' - \frac d n J \right\rVert \le O\left(\sqrt d\right)}.

It turns out to be rather difficult to prove. Previously we saw spectral results on random graphs that uses matrix traces to bound the largest eigenvalue. In this case, it’s hard to do so because the contribution to the trace of a closed walk is complicated by the fact that edges have dependencies. The other approach is that given random matrix {M}, we will try to upper bound {\left\lVert M \right\rVert = \max\limits_x \frac {x^T M x} {\lVert x \rVert^2}}. A standard way for this is to that for every solution, count the instances of {M} in which the fixed solution is good, and argue that the number of the fixed solutions is small, which tells us that there’s no good solution. The problem here is that the set of solutions is infinitely large. So Feige-Ofek discretize the set of vectors, and then reduce the bound on the quadratic form of a discretized vector to a sum of several terms, each of which has to be carefully bounded.

We always have

\displaystyle  \max \textrm{IndSet}(G) \le \max \textrm{IndSet}(G') \le \frac n d \left\lVert A' - \frac d n J \right\rVert

and so, with high probability, we get an {O\left(\frac n {\sqrt d}\right)} polynomial time upper bound certificate to the size of the independent set for a {G_{n,d/n}} random graph. This removes the extra {\log n} term from our analysis of certificates of unsatisfiability for random {k}-SAT when {k} is even.

3. SDP relaxation of independent sets in random sparse graphs

In order to show a random graph has no large independent sets, a more principled way is to argue that there is some polynomial time solvable relaxation of the problem whose solution is an upper bound of the problem.

Let SDPIndSet{(G)} be the optimum of the following semidefinite programming relaxation of the Independent Set problem, which is due to Lovász:

\displaystyle  \begin{array}{rcl}  \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \langle {\bf x}_0, {\bf x}_i \rangle = ||{\bf x}_i ||^2 \ \ \ \forall i\in V\\ && \langle {\bf x}_i, {\bf x}_j \rangle = 0 \ \ \ \forall (i,j)\in E \end{array}

Since it’s the relaxation of the problem of finding the maximum independent set, {\max \textrm{IndSet}(G) \le \textrm{SDPIndSet}(G)} for any graph {G}. And this relaxation has a nice property.

For every {0 < p < 1}, and for every graph {G}, we have \begin{equation*} {\rm SDPIndSet}(G) \leq \frac 1p \cdot || pJ – A || \end{equation*} where {J} is the all-one matrix and {A} is the adjacency matrix of {G}.

Proof: First we note that SDPIndSet{(G)} is at most

\displaystyle  \begin{array}{rcl}  \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

and this is equal to

\displaystyle  \begin{array}{rcl}  \max && \frac { \left( \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle \right) ^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

which is at most

\displaystyle  \begin{array}{rcl}  \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

because

\displaystyle  \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle = \left\langle \sum_{i\in V} {\bf x}_i , {\bf x}_0\right \rangle \leq \left \| \sum_{i\in V} {\bf x}_i \right \| \cdot || {\bf x}_0 || = \left \| \sum_{i\in V} {\bf x}_i \right \|

Finally, the above optimization is equivalent to the following

\displaystyle  \begin{array}{rcl}  \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

which is at most the unconstrained problem

\displaystyle \begin{aligned} \max \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} &= \max \frac { \sum_{i,j} \left( J - \frac 1p A\right)_{i,j} \langle {\bf x}_i,{\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} \\ &= \lambda_{\max} \left (J - \frac 1p A \right) \\ &\leq \frac 1p || pJ - A||. \end{aligned}

\Box

Recall from the previous section that we constructed {G'} by removing edges from {G}, which corresponds to removing constraints in our semidefinite programming problem, so {\textrm{SDPIndSet}(G) \le \textrm{SDPIndSet}(G') \le \left\lVert J - \frac 1 p A' \right\rVert}, which is by theorem 3 at most {O\left(\frac n{\sqrt d}\right)} with high probability.

4. SDP relaxation of random k-SAT

From the previous section, we get an idea that we can use semidefinite programming to relax the problem directly and find a certificate of unsatisfiability for the relaxed problem.

Given a random {k}-SAT formula {f}:

\displaystyle  \begin{array}{rcl}  f({\bf x}) &= & \bigwedge\limits_{i = 1}^m \bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\overline{\bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j}}} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\bigwedge\limits_{j = 1}^k x_{\sigma_{i,j}} = \overline{b_{i,j}}}. \end{array}

The satisfiability of {f} is equivalent of the satisfiability of the following equations:

\displaystyle  \begin{array}{rcl}  && x_i^2 = x_i \forall i \in [n] \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}x_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \end{array}

Notice that if we expand the polynomial on the left side, there are some of the monomials having degree higher than 2 which prevents us relaxing these equations to a semidefinite programming problem. In order to resolve this, {\forall A \subseteq {\bf x}} and {|A| \le k/2} we introduce {x_A = \prod_{i \in A} x_i}. Then we can relax all variables to be vectors, i.e.

\displaystyle  \begin{array}{rcl}  && \lVert {\bf x}_\emptyset \rVert^2 = 1 \\ && \langle {\bf x}_A, {\bf x}_B \rangle = \langle {\bf x}_C, {\bf x}_D \rangle \ \ \ \forall A \cup B = C \cup D \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}{\bf x}_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \ \ \ \textrm{rewritten as quadratic forms of } {\bf x}_A \end{array}

For example, if we have a 4-SAT clause

\displaystyle  x_3 \lor \overline{x_4} \lor x_7 \lor \overline{x_{10}},

we can rewrite it as

\displaystyle  \begin{array}{rcl}  1 - (1 - {\bf x}_3) \cdot {\bf x}_4 \cdot (1 - {\bf x}_7) \cdot {\bf x}_{10} &= &1 - {\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_7{\bf x}_{10} - {\bf x}_3{\bf x}_4{\bf x}_7{\bf x}_{10} \\ &= &1 - {\bf x}_{\{4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,7\}}{\bf x}_{\{10\}} - {\bf x}_{\{3,4\}}{\bf x}_{\{7,10\}}. \end{array}

For this relaxation, we have:

  1. If {m < c(k, n) n^{k/2}}, the SDP associated with the formula is feasible with high probability, where {c(k, n) = 1/n^{o(1)}} for every fixed {k}.
  2. If {m > c'(k) n^{k/2}}, the SDP associated with the formula is not feasible with high probability, where {c'(k, n)} is a constant for every fixed even {k}, and {c'(k, n) = \textrm{poly}(\log n)} for every fixed odd {k}.

Beyond Worst-Case Analysis: Lecture 7

Scribed by Jeff Xu

In which we discussed planted clique distribution, specifically, we talked about how to find a planted clique in a random graph. We heavily relied upon our material back in lecture 2 and lecture 3 in which we covered the upper bound certificate for max clique in {G_{n,\frac{1}{2}}}. At the end of this class, we wrapped up this topic and started the topic of {k}-SAT.

1. Planted Clique

To start with, we describe a distribution of graphs with a planted clique. Suppose that we sample {G} from {G_{n,\frac{1}{2}}} and we want to modify {G} s.t. it has a size {k} clique, i.e., we have a clique {S \subseteq V} with {\left|S\right|=k}. The following code describes a sampler for the distribution.

  • {G \leftarrow G_{n,\frac{1}{2}}}
  • Pick a subset of vertices {S} from {V} s.t. {|S|=k}
  • Independently for each pair {\{u,v\}}, make {\{u,v\}} an edge with probability
    • {1} if {u,v \in S}
    • {\frac 12} otherwise

Note: We are only interested in the case {k \geq 2\log n}, which is the case in which the planted clique is, with high probability, larger than any pre-existing clique

Continue reading

Beyond Worst Case Analysis: Lecture 5

Scribed by Haaris Khan

In which we study the SDP relaxation of Max Cut in random graphs.

1. Quick Review of Chernoff Bounds

Suppose {X_1, ..., X_n} are mutually independent random variables with values {{0, 1}}. \newline Let {X := \sum_{i=1}^{n} X_i}. The Chernoff Bounds claim the following: \newline

1. { \textrm{ } \forall \textrm{ } \epsilon \textrm{ such that } 0 \leq \epsilon \leq 1, }

\displaystyle  \mathop{\mathbb P}(\vert X - \mathop{\mathbb E}[X] \vert) > \epsilon \cdot \mathop{\mathbb E}[X]) \leq \exp(\Omega(\epsilon^2 \cdot \mathop{\mathbb E}[X]))

2. { \forall \textrm{ } t \textrm{ } > 1, }

\displaystyle  \mathop{\mathbb P} (\vert X - \mathop{\mathbb E}[X] \vert \geq t \cdot \mathop{\mathbb E}[X]) \leq \exp(-\Omega((t\log(t)) \cdot \mathop{\mathbb E}[X]))

3. When we do not know {\mathop{\mathbb E}[X]}, we can bound as follows:

\displaystyle  \mathop{\mathbb P}(\vert X - \mathop{\mathbb E}[X] \vert \geq \epsilon \cdot n) \leq \exp(- \Omega(\epsilon^2 \cdot n))

2. Cutting a Near-Optimal Number of Edges in {G_{n,p}} Via SDP Rounding

Consider {G_{n, p}} where {p > \frac{\log(n)}{n}}. We show that with {1 - o(1)} probability, the max-degree will be {O(d)}

  • Fix v
  • For some constant c,

    \displaystyle \mathop{\mathbb P}(\textrm{v has degree} > c \cdot d) = \mathop{\mathbb P}(\vert deg(v) - \mathop{\mathbb E}[v] \vert > (c - 1) \mathop{\mathbb E}[deg(v)])

    \displaystyle \leq \exp(- \Omega((c - 1)\log(c - 1) \cdot d)) \textrm{ (by Chernoff Bounds)}

    \displaystyle  \leq \exp(-\Omega((c - 1)\log(c - 1) \cdot \log(n))

    \displaystyle  \leq \frac{1}{n^2}, \textrm{ for some choice of constant c}

So {\mathop{\mathbb P}(\exists \text{ } v \textrm{ with degree } > c \cdot d) \leq n \cdot \frac{1}{n^2} \leq \frac{1}{n}}

Next, we compute the number of vertices that participate in a triangle. Recall that degree {d} can be bounded by {o(n^{\frac{1}{3}})}

\displaystyle \mathop{\mathbb E}[\textrm{number vertices in triangles}] = n \cdot \mathop{\mathbb P}(\textrm{v participates in a triangle})

If a vertex participates in a triangle, there are {\binom{n - 1}{2}} ways of choosing the other two vertices that participate with v in the triangle. \newline So the expected number of vertices in triangles can be bounded by

\displaystyle  \mathop{\mathbb E}[\textrm{number vertices in triangles}] \leq n \cdot p^3 \cdot \binom{n - 1}{2}

\displaystyle  \leq n^3 \cdot p^3

\displaystyle  = o(n) \textrm{ if } p = o \left(\frac{1}{n^{\frac{2}{3}}}\right), \textrm{ } d = o(n^{\frac{1}{3}})

So with {o(1)} probability,

  • All vertices have degree {O(d)}
  • {o(n)} vertices participate in triangles.

3. Eigenvalue Computations and SDP

Problems like finding the largest / smallest eigenvalue can be solved using SDP

Let {M} be symmetric, {\lambda_{\max}} be the largest eigenvalue of M: {\lambda_{\max} = \max_x \frac{\boldsymbol{x}^T M \boldsymbol{x}}{\|\boldsymbol{x} \|^2}} We can formulate this as Quadratic Programming:

\displaystyle  \begin{array}{rcl}  \max_{i, j} & & \sum_{i, j} M_{i, j} x_i y_j{\rm s.t.} \\ && \sum_i x_i^2 = 1 \\ \end{array}

We showed previously that we can relax a Quadratic Program to SDP:

\displaystyle  \begin{array}{rcl}  \max_{i, j} & & \sum_{i, j} M_{i, j} \langle \boldsymbol{x_i}, \boldsymbol{x_j} \rangle{\rm s.t.} \\ && \sum_i \|\boldsymbol{x_i}\|^2 = 1 \\ \end{array}

In fact, it happens that these two are equivalent. To show this, we must show that a vector solution {x} of SDP can hold as a solution to the QP and vice versa.

Proving {x} for QP is valid for SDP: Trivial. Any solution {x} to our Quadratic Program must be a solution for our SDP since it is a relaxation of the problem; then the optimum of our QP must be less than or equal to the optimum of our SDP

Proving {x} for SDP is valid for QP: Consider {x := \text{vector solution of cost c}}. We note that our SDP can be transformed into an unconstrained optimization problem as follows:

\displaystyle  \begin{array}{rcl}  \max_{i, j} & & \frac{\sum_{i, j} M_{i, j} \langle \boldsymbol{x_i}, \boldsymbol{x_j} \rangle}{\sum_i \|\boldsymbol{x_i}\|^2} \end{array}

The cost c can be defined as the value of our solution:

\displaystyle  c = \frac{\sum_{i, j} M_{i, j} \sum_k \boldsymbol{x_k}^i \boldsymbol{x_k}^j}{\sum_i \sum_k \|\boldsymbol{x_k}^i|^2}

\displaystyle  \leq \max_k \frac{\sum_{i, j} M_{i, j} \boldsymbol{x_k}^i \boldsymbol{x_k}^j}{\sum_i \|\boldsymbol{x_k}^i\|^2}

We get a one-dimensional solution when we use the {k^{th}} element of {x}, and wish to find the {k} that maximizes this.

We use the following inequality:

\displaystyle \frac{a_1 + ... + a_m}{b_1 + ... + b_m} \leq \max_{k = 1, ..., m} \frac{a_k}{b_k}, b_k > 0

Proof:

\displaystyle \sum_i a_i = \sum_i b_i \cdot \frac{a_i}{b_i} \leq \sum_i b_i \cdot \max_k \frac{a_k}{b_k}

\displaystyle  = \max_k \text{ } \frac{a_k}{b_k} \cdot \sum_i b_i

4. SDP Max-Cut: Spectral Norm as a SDP Certificate

Consider the SDP relaxation of Max-Cut on Graph {G}:

\displaystyle  \begin{array}{rcl}  \max & & \sum_{(i, j) \in E} \frac{1}{4} \|\boldsymbol{X_i} - \boldsymbol{X_j}\|^2 \\ {\rm s.t.} \\ && \forall v \in V, \|\boldsymbol{X_v}^2\| = 1 \\ \end{array}

Let the optimum value for this SDP be {SDPMaxCut(G)}. It’s obvious that {MaxCut(G) \leq SDPMaxCut(G)}. Under our constraints, we can rewrite our SDP as

\displaystyle  \sum_{(i, j) \in E} \frac{1}{2} - \frac{1}{2} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle

So our new optimization problem is

\displaystyle  \begin{array}{rcl}  \max & & \frac{\vert E \vert}{2} - \sum_{(i, j) \in E} \frac{1}{2} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle \\ {\rm s.t.} \\ && \forall i \in V, \| \boldsymbol{X_i} \|^2 = 1 \\ \end{array}

We can relax our constraint to the following: {\forall i \in V, \sum_i \| \boldsymbol{X_i} \|^2 = n}. Relaxing our constraint will yield an optimization problem with a solution less than the stricter constraint (call this {SDP'MaxCut(G)}):

\displaystyle  \begin{array}{rcl}  \max & & \frac{\vert E \vert}{2} - \sum_{(i, j) \in E} \frac{1}{2} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle \\ {\rm s.t.} \\ && \sum_v \|\boldsymbol{X_v}\|^2 = n \\ \end{array}

Clearly, we have the following inequalities: { MaxCut(G) \leq SDPMaxCut(G) \leq SDP'MaxCut(G) }. We can rewrite {SDP'MaxCut(G)} as

\displaystyle  \begin{array}{rcl}  \max & & \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \sum_{i, j} \frac{-A_{i, j} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle}{\sum_i \|\boldsymbol{X_i}\|^2} \\ && \sum_v \|\boldsymbol{X_v}\|^2 = n \\ \end{array}

Note that our objective function computes the largest eigenvalue of {-A}:

\displaystyle  = \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \lambda_{\max}(-A)

For every graph {G_{n, p}} with {0 \leq p \leq 1},

\displaystyle  MaxCut(G) \leq SDPMaxCut(G) \leq \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \lambda_{\max}(-A)

\displaystyle  \leq \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \lambda_{\max}(pJ - A)

\displaystyle  \leq \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \|pJ - A\|

Recall from previous lectures that for {p > \frac{\log(n)}{n}}, the adjacency matrix of {A} sampled from {G_{n, p}} has {\|pJ - A\| \leq O(\sqrt{np})} with high probability. This implies that {SDPMaxCut(G) \leq \frac{\vert E \vert}{2} + O(n \cdot \sqrt{d})}. Semantically, this means that { SDPMaxCut(G) } computes in poly-time a correct upper-bound of {MaxCut(G)}.

5. Trace and Eigenvalues

Suppose matrix {M} is symmetric with eigenvalues {\lambda_1 \hdots \lambda_n}. The following are true:

  • {M^k} eigenvalues are {\lambda_1^k \hdots \lambda_n^k}
  • {trace(M) := \sum_{i, i} M_{i, i} } ; {trace(M) = \sum_i \lambda_i}

Then, for {M^{2k}, trace(M^{2k}) = \lambda_1^{2k} + \hdots + \lambda_n^{2k}}.

\displaystyle  (\max_i \vert \lambda_i \vert)^{2k} \leq trace(M^{2k}) \leq n \cdot (\max_i \vert \lambda_i \vert)^{2k}

Also,

\displaystyle  \|M\| \leq (trace(M^{2k})^{\frac{1}{2k}} \leq n^{\frac{1}{2k}} \cdot \|M\|

{A_{i, j}} is defined as the number of expected paths from {i} to {j} that take {k} steps (not necessarily simple paths in a graph)

{ = \sum_{\text{paths from i to j}} M_{i, a_1} \hdots M_{a_n, j}}

Our goal with this is to compute the eigenvalues {\lambda}. Since traces relates the sum of the diagonal and the sum of eigenvalues for symmetric {M}, we can use this to provide an upper bound for symmetric {M}.

Beyond Worst-Case Analysis: Lecture 6

Scribed by Theo McKenzie

In which we study the spectrum of random graphs.

1. Overview

When attempting to find in polynomial time an upper bound certificate on the max cut and maximum independent set of a graph, we have used the following property. If {G\sim G_{n,\frac{1}{2}}}, then with high probability {\|A-\mathop{\mathbb E} (A)\|\leq O(\sqrt{n})}, where {\|\cdot\|} is the spectral norm. Generally, if {G\sim G_{n,p}} and {p>\frac{\log n}{n}} then w.h.p.

\displaystyle \|A-\mathop{\mathbb E} (A)\|\leq O(\sqrt{np}).

Today we will prove how to obtain the bound in Proposition 1 with an extra term of {\log n}, as well as show an outline of the method of finding the bound in Proposition 1. We will also show how when {p} is small this bound breaks down, namely how when {p=\Theta(\frac 1 n)},

\displaystyle \|A-\mathop{\mathbb E} (A)\|\geq\Omega \left(\sqrt{\frac{\log n}{\log\log n}} \right).

2. Introducing the Trace

Henceforth {M_{ij}^k} signifies {(M^k)_{ij}}. Take {M} symmetric and real. All eigenvalues of this matrix are real, and we can enumerate them {\lambda_1,\lambda_2,\ldots,\lambda_n} such that {|\lambda_1|\geq|\lambda_2|\geq\ldots\geq |\lambda_n|}.

The trace {\textnormal{Tr}(A)} is defined to be {\textnormal{Tr}(A)=\sum_{i=1}^n A_{ii}} where {A} is an {n\times n} matrix.

Moreover we know that \textnormal{Tr}{(A)=\sum_{1=1}^n \lambda_i}. If we take {k} large and even, the eigenvalues of {M^k} are {|\lambda_1|^k\geq|\lambda_2|^k\geq\ldots\geq |\lambda_n|^k}. Therefore we have

\displaystyle  \sum_i M^k_{ii}=\text{Tr}(M^k)=\sum_i |\lambda_i|^k\geq |\lambda_1|^k=\|M\|^k.

Moreover we have

\displaystyle \textnormal{Tr}(M^k)^{\frac{1}{k}}=\left(\sum_i |\lambda_i|^k\right)^\frac 1 k\leq n^{\frac{1}{k}}|\lambda_1|= n^{\frac{1}{k}}\|M\|.

This gives us an estimation of the norm, {\|M\|\leq (\sum_i M^k_{ii})^\frac{1}{k}\leq n^{\frac1 k}\|M\|}, which for {k>\log n} gives a constant factor approximation of {\|M\|}.

3. Using the Trace to Bound the Spectral Norm

Assume that {G\sim G_{n,\frac{1}{2}}} and {A} is the adjacency matrix of {G}. We will prove the following. {\displaystyle{\mathop{\mathbb E}_{G\sim G_{n,\frac{1}{2}}}}(\textnormal{Tr}((A-\mathop{\mathbb E} (A))^k))} is bounded above by {2^{O(k)}n^{1+k/2}k^{k/2}}. If {k>\log n}, by taking the {k}th root we achieve a bound of {O(\sqrt{n\log n})} on {\|A-\mathop{\mathbb E} (A)\|}.

3.1. Expected Value of Matrix Entries

First, we examine the matrix {M=2A-2\mathop{\mathbb E} (A)}. We have {M_{ii}=0} and {M_{ij}\in\{\pm1\}} with equal probability of each when {i\neq j}. Moreover {M_{ij}=M_{ji}}. If {i\neq j,\mathop{\mathbb E}( M_{ij}^k)=0} if {k} is odd and {\mathop{\mathbb E} (M_{ij}^k)=1} for {k} even.

{\mathop{\mathbb E} (\sum_i M^k_{ii})=n\mathop{\mathbb E} (M^k_{11})} by the linearity of expectation and symmetry between the entries. We evalute {M^k_{11}}.

\displaystyle M^k_{11}=\sum_{\{i_1,\ldots,i_{k-1}\}} M_{1i_1}M_{i_1i_2}\cdots M_{i_{k-1,1}}

where {{i_1,\ldots i_{k-1}}} represents the intermediate steps on a “path” between vertices that starts at 1 and returns to 1. For example, {M_{11}^2=\sum M_{1i}M_{i1}}. Note that we can repeat edges in these paths. By the linearity of expectation

\displaystyle  \mathop{\mathbb E} (M_{11}^k)=\sum_{\{i_1,\ldots,i_{k-1}\}}\mathop{\mathbb E} (M_{1i_1}M_{i_1i_2}\cdots M_{i_{k-1,1}}).

If any pair {\{i,j\}} occurs {\ell} times in the sequence of pairs {\{ 1,i_1\}, \{ i_1, i_2 \}, \ldots, \{ i_{k-1}, 1 \}}, where {\ell} is odd, then as the value of this term is independent from all other terms and {\mathop{\mathbb E} M^\ell_{ij}=0} for odd {\ell}, then {\mathop{\mathbb E} (M_{1i_1}M_{i_1i_2}\cdots M_{i_{k-1}1})=0}. If all pairs occur an even number of times, their product’s expectation is 1. Therefore {\mathop{\mathbb E} (M_{11}^k)} is the number of sequences {i_1,\ldots,i_{k-1}\in V^{k-1}} such that, in the sequence of pairs {\{ 1,i_1\}, \{ i_1, i_2 \}, \ldots, \{ i_{k-1}, 1 \}}, each pair occurs an even number of times.

3.2. Encoding argument

In order to give an upper bound on the number of such sequences, we will show how to encode a sequence where there are {m} distinct edges. In the sequence {i_1,\ldots i_{k-1}}, the element {i_j} is represented either as {(0,i_j)}, which takes {1 + \log n} bits, if {i_ji} appears for the first time in the sequence at location {j}, and as {(0,\ell)} otherwise, where {\ell < j} is such that {i_\ell = i_j}, which requires {1 + \log k} bits. Notice that, if {i_j} occurs for the first time at location {j}, then the pair {\{ i_{j-1}, i_j \}} also occurs for the first time at the location {j-1} and {j}. Thus the number of times that we encounter a vertex for the first time is at most the number of distinct edges. If we have {t} distinct vertices (other than vertex 1), then we are using {k + t \log n + (k-t) \log k}; for {k<n}, this value increases with {t}, but we have {t \leq m \leq k/2} (because every edge has to appear an even number of times and so there can be at most {k/2} distinct edges. This means that we use at most {k + \frac k2 \log n + \frac k2 \log k} bits in the encoding. The number of strings that can be encoded using at most {L} bits is {2^{L+1} }. If we assume {k<n}, we have the bound {\mathop{\mathbb E}(M_{11}^k)\leq k^\frac{k}{2}n^{\frac{k}{2}}2^{k+1}}, meaning

\displaystyle \textnormal{Tr}(M)=n\mathop{\mathbb E}( M_{11}^k)\leq n^{1+\frac{k}{2}}2^{k+1} k^\frac{k}{2}.

Therefore using suitable {k} and {t} we achieve our bound on {\|M\|}. For example, choose {k=\log n} and {t=10\sqrt{n}\sqrt{\log n}}. We use Markov’s inequality to obtain

\displaystyle  \mathbf{P}(\|M\|>t)=\mathbf{P}(\|M\|^k>t^k)\leq \frac{\mathop{\mathbb E}\|M\|^k}{t^k}\leq\left(\frac{2n^{\frac{1}{k}}\sqrt n \sqrt k }{t}\right)^k\leq e^{-\Omega(\log n)}\rightarrow 0.

4. Tightening the Bound

To obtain the sharper bound of {O(\sqrt n)}, we need to count the number of pairs more sharply and remove the {k^{\frac{k}{2}}} term, namely improve the way we talk about repetitions. Here we give an outline for how to find a tighter bound.

The worst case in the above analysis is when the number of distinct vertices (not counting vertex {1}) is maximal, which is {k/2}. In that case, the number of distinct “edges” {\{ i_j, i_{j+1} \}} is {k/2}, and they must form a connected graph over {1 + k/2} vertices, that is, they have to form a tree. Furthermore, each edges is repeated exactly twice in the closed walk, otherwise we would not have enough distinct edges to connect {1+k/2} distinct vertices.

If the pairs form a tree, then the only way we can have closed walk in which every edge is repeated twice is that the closed walk is a depth-first visit of the tree. In this case, we can improve our encoding in the following way. In a depth-first visit of a tree only two events are possible at each step: either we discover a new vertex, or we backtrack on the edge between the current node and the parent node. Thus we only need to pay {1 + \log n} bits to encode a new node in the sequence and {1} bit to encode an already seen node, and we obtain a bound of {2^{\frac{k}{2}+\frac k 2 \log n + \frac k 2}= 2^kn^\frac k 2}. By taking the {k}th root we obtain a bound on {\|M\|} of {O(\sqrt n)}.

5. Generalizing to any {p}

Now assume {G\sim G_{n,p}} and {A} is the adjacency matrix of {G}. We also assume {p<\frac{1}{2}}. We define

\displaystyle M=A-\mathop{\mathbb E}(A).

In this matrix {M_{ii}=0} and if {i\neq j, M_{i,j}=1-p} with probability {p} and {-p} with probability {1-p}. Therefore {\mathop{\mathbb E} (M_{ij})=0, \mathop{\mathbb E} (M_{ij}^2)=p-p^2\leq p}. In fact, {\mathop{\mathbb E} (M_{ij}^k)\leq p} for all {k\geq 1}.

From this we see we need to sum over sequences such that the multiset has each pair occuring at least two times, as if any pair occurs once, the expectation is {0}.

Therefore the bound is

\displaystyle  \mathop{\mathbb E} (M_{11}^k)\leq \sum_{i_1,\ldots i_{k-1}} p^\ell

where {\ell} is the number of distinct pairs and the sum is taken over multisets where each pair occurs at least twice. For large {\ell}, the number of sequences where each pair occurs at least twice with {\ell} distinct pairs is approximately {2^{O(\ell)}n^\ell}. This would give us

\displaystyle  \sum_{i_1,\ldots i_{k-1}}p^\ell=\sum_\ell p^\ell 2^{O(\ell)}n^\ell\leq O(p^\frac{k}{2}2^{O(k)}n^{\frac{k}{2}})

so the bound on {\|M\|} is {O(\sqrt{np})}. However, the bound on the number of sequences with {\ell} distict pairs breaks down when {\ell} is much smaller than {k}. In a full proof much more complicated calculations must be done.

6. Problems with sparse graphs

If {p=\Theta(\frac{1}{n})} , then {\|A-\mathop{\mathbb E}(A)\|\geq \Omega\left(\sqrt\frac{\log n}{\log\log n}\right)} w.h.p.

This breaks down the nice bound we obtained in section 5. This follows from the irregularity of sparse graphs. There will be isolated vertices and vertices with degree much higher than average.

Lemma 1 If {p=\Theta(\frac{1}{n})} then w.h.p. the highest degree vertex of {G} is of order {\Theta\left(\frac{\log n}{\log \log n}\right)}.

If G has a node of degree {\geq d}, then, for every {p< \frac 1 {4\sqrt d}}, {\lambda_{\max} (A-pJ) \geq\Omega(\sqrt d)}. This implies that {\forall 0< p < . \frac 1 {4\sqrt d}, \ \|A-pJ\|\geq \Omega(\sqrt d)}.

Proof: We have

\displaystyle  \lambda_{\max}(A-pJ)=\max_{{\bf x}}\frac{{\bf x}^T(A-pJ){\bf x}}{\|{\bf x}\|^2}

where the maximum is taken over all nonzero vectors {{\bf x}}. Call {v} a node of degree {\geq d} and call {d} of its neighbors {u_1,\ldots, u_d}.

Consider the vector {{\bf x}} such that {x_v=1, x_{u_i}=\frac{1}{\sqrt d}} and {x_w=0} for other vertices {w}. We have

\displaystyle {\bf x}^TA{\bf x}\geq 2 \sqrt d

\displaystyle  {\bf x}^TpJ{\bf x}=p\cdot \left(\sum x_i\right)^2=p\cdot (1+\sqrt d)^2\leq 4pd

\displaystyle  || {\bf x}||^2=2

Therefore if {p\leq \frac 1{4\sqrt d}},

\displaystyle \frac{{\bf x}^T(A-pJ){\bf x}}{\|{\bf x}\|^2}\geq \sqrt d - \frac 12 \sqrt d=\Omega(\sqrt d)

yielding the desired bound.

\Box

Theorem 4 proceeds immediately from Proposition 5 and Lemma 1.

Beyond Worst Case Analysis: Lecture 4

Scribed by Rachel Lawrence

In which we introduce semidefinite programming and apply it to Max Cut.

1. Overview

We begin with an introduction to Semidefinite Programming (SDP). We will then see that, using SDP, we can find a cut with the same kind of near-optimal performance for Max Cut in random graphs as we got from the greedy algorithm — that is,

\displaystyle cut > \frac{|E|}{2} + \Omega(n\cdot\sqrt[]{d})

in random graphs {G_n, \frac{d}{n}}. More generally, we will prove that you can always find a cut at least this large in the case that G is triangle-free and with maximum vertex degree {\geq d}, which will imply the bound in random graphs. We will also see how to use SDP to certify an upper bound:

\displaystyle max\ cut < \frac{|E|}{2} + O(n\cdot \sqrt[]{d})

with high probability in {G_{n, \frac{d}{n}}}

Methods using SDP will become particularly helpful in future lectures when we consider planted-solution models instead of fully random graphs: greedy algorithms will fail on some analogous problems where methods using SDP can succeed.

2. Semidefinite Programming

Semidefinite Programming (SDP) is a form of convex optimization, similar to linear programming but with the addition of a constraint stating that, if the variables in the linear program are considered as entries in a matrix, that matrix is positive semidefinite. To formalize this, we begin by recalling some basic facts from linear algebra.

2.1. Linear algebra review

Definition 1 (Positive Semidefinite) A matrix {M\in {\mathbb R}^{n \times n}} is positive semidefinite (abbreviated PSD and written {M \succeq {\bf 0}}) if it is symmetric and all its eigenvalues are non-negative.

We will also make use of the following facts from linear algebra:

  1. If {M \in {\mathbb R}^{n \times n}} is a symmetric matrix, then all the eigenvalues of {M} are real, and, if we call {\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n} the eigenvalues of {M} with repetition, we have

    \displaystyle  M = \sum_i \lambda_i {\bf v}^{(i)} ({\bf v}^{(i)})^T

    where the {{\bf v}^{(i)}} are orthonormal eigenvectors of the {\lambda_i}.

  2. The smallest eigenvalue of {M} has the characterization

    \displaystyle  \lambda_1 = \min_{{\bf y} \neq {\bf 0}} \frac{{\bf y}^T M {\bf y}}{||{\bf y}||^2}

    and the optimization problem in the right-hand side is solvable up to arbitrarily good accuracy

This gives us the following lemmas:

Lemma 2 {M \succeq {\bf 0}} if and only if for every vector {{\bf y}} we have {{\bf y}^T M {\bf y} \geq 0}.

Proof: From part (2) above, the smallest eigenvalue of M is given by

\displaystyle  \lambda_1 = \min_{{\bf y} \neq {\bf 0}} \frac{{\bf y}^T M {\bf y}}{||{\bf y}||^2}

Noting that we always have {||{\bf y}||^2 \geq 0}, then {\lambda_1 \geq 0} if and only if the numerator {{\bf y}^T M {\bf y}} on the right is always non-negative. \Box

Lemma 3 If {A, B \succeq {\bf 0}}, then {A + B \succeq {\bf 0}}

Proof: {\forall {\bf y}}, {{\bf y}^T (A+B) {\bf y} = {\bf y}^T A {\bf y} + {\bf y}^T B {\bf y} \geq 0}. By Lemma 2, this implies {A+B \succeq 0}. \Box

Lemma 4 If {A \succeq 0} and {a \geq 0}, then {aA \succeq 0}

Proof: {\forall y}, {{\bf y}^T a A {\bf y} = a({\bf y}^T A {\bf y}) \geq 0}. By Lemma 2, this implies {aA \succeq 0}. \Box

2.2. Formulation of SDP

With these characterizations in mind, we define a semidefinite program as an optimization program in which we have {n^2} real variables {X_{i,j}}, with {1 \leq i,j \leq n}, and we want to maximize, or minimize, a linear function of the variables such that linear constraints over the variables are satisfied (so far this is the same as a linear program) and subject to the additional constraint that the matrix {X} is PSD. Thus, a typical semidefinite program (SDP) looks like

\displaystyle  \begin{array}{rcl}  \max && \sum_{i,j} C_{i,j} X_{i,j} \\ s.t.\\ && \sum_{i,j} A^{(1)}_{i,j} X_{i,j} \leq b_1\\ && \vdots\\ && \sum_{i,j} A^{(m)}_{i,j} X_{i,j} \leq b_m\\ && X \succeq {\bf 0} \end{array}

where the matrices {C,A^{(1)},\ldots, A^{(m)}} and the scalars {b_1,\ldots,b_m} are given, and the entries of {X} are the variables over which we are optimizing.

We will also use the following alternative characterization of PSD matrices

Lemma 5 A matrix {M\in {\mathbb R}^{n \times n}} is PSD if and only if there is a collection of vectors {{\bf x}^{(1)},\ldots, {\bf x}^{(n)}} such that, for every {i,j}, we have {M_{i,j} = \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle }.

Proof: Suppose that {M} and {{\bf x}^{(1)},\ldots, {\bf x}^{(n)}} are such that {M_{i,j} = \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle } for all {i} and {j}. Then {M} is PSD because for every vector {{\bf y}} we have

\displaystyle  {\bf y}^T M {\bf y} = \sum_{i,j} y_i y_j M_{i,j} = \sum_{i,j} y_iy_j \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle = \left\|\sum_i y_i {\bf x}^{(i)} \right\|^2 \geq 0

Conversely, if {M} is PSD and we write it as

\displaystyle  M = \sum_k \lambda_k {\bf v}^{(k)} ({\bf v}^{(k)})^T

we have

\displaystyle  M_{i,j} = \sum_k \lambda_k v^{(k)}_i v_j^{(k)}

and we see that we can define {n} vectors {{\bf x}^{(1)},\cdots,{\bf x}^{(n)}} by setting

\displaystyle  x^{(i)}_k := \sqrt {\lambda_k} \cdot v^{(k)}_i

and we do have the property that

\displaystyle  M_{i,j} = \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle

\Box

This leads to the following equivalent formulation of the SDP optimization problem:

\displaystyle  \begin{array}{rcl}  \max && \sum_{i,j} C_{i,j}\langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle \\ s.t.\\ && \sum_{i,j} A^{(1)}_{i,j} \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle \leq b_1\\ && \vdots\\ && \sum_{i,j} A^{(m)}_{i,j} \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle \leq b_m\\ \end{array}

where our variables are vectors {{\bf x}^{(1)},\cdots,{\bf x}^{(n)}}. This is the statement of the optimization problem that we will most commonly use.

2.3. Polynomial time solvability

From lemmas 3 and 4, we recall that if {A} and {B} are two matrices such that {A\succeq {\bf 0}} and {B \succeq {\bf 0}}, and if {a\geq 0} is a scalar, then {a \cdot A \succeq {\bf 0}} and {A+B \succeq 0}. This means that the set of PSD matrices is a convex subset of {{\mathbb R}^{n \times n}}, and that the above optimization problem is a convex problem.

Using the ellipsoid algorithm, one can solve in polynomial time (up to arbitrarily good accuracy) any optimization problem in which one wants to optimize a linear function over a convex feasible region, provided that one has a separation oracle for the feasible region: that is, an algorithm that, given a point,

  1. Checks whether it is feasible and, if not,
  2. Constructs an inequality that is satisfied by all feasible point but not satisfied by the given point.

In order to construct a separation oracle for a SDP, it is enough to solve the following problem: given a matrix {M}, decide if it is PSD or not and, if not, construct an inequality {\sum_{ij}a_{ij}x_{ij} \geq 0} that is satisfied by the entries of all PSD matrices but that is not satisfied by {M}. In order to do so, recall that the smallest eigenvalue of {M} is

\displaystyle  \min_{{\bf y}} \frac {{\bf y}^T M {\bf y} }{|| {\bf y}||^2 }

and that the above minimization problem is solvable in polynomial time (up to arbitrarily good accuracy). If the above optimization problem has a non-negative optimum, then {M} is PSD. If it is a negative optimum {{\bf y}^*}, then the matrix is not PSD, and the inequality

\displaystyle  \sum_{i,j} X_{i,j} y^*_i y^*_j \geq 0

is satisfied for all PSD matrices {X} but fails for {X:= M}. Thus we have a separation oracle and we can solve SDPs in polynomial time up to arbitrarily good accuracy.

3. SDP Relaxation of Max Cut and Random Hyperplane Rounding

The Max Cut problem in a given graph {G=(V,E)} has the following equivalent characterization, as a quadratic optimization problem over real variables {x_1,\ldots,x_n}, where {V = \{ 1,\ldots,n\}}:

\displaystyle  \begin{array}{rcl}  {\rm MaxCut} (G) =& \max & \sum_{(i,j) \in E} \frac 14 (x_i - x_j)^2 \\ & s.t.\\ && x_i^2 = 1 \ \ \ \ \ \forall i \in V \end{array}

We can interpret this as associating every vertex {v} with a value {x_v = \pm 1}, so that the cut edges are those with one vertex of value {+1} and one of value {-1}.

While quadratic optimization is NP-hard, we can instead use a relaxation to a polynomial-time solvable problem. We note that any quadratic optimization problem has a natural relaxation to an SDP, in which we relax real variables to take vector values and we change multiplication to inner product:

\displaystyle  \begin{array}{rcl}  {\rm MaxCut} (G) \leq & \max & \sum_{(i,j) \in E} \frac 14 || {\bf x}_i - {\bf x}_j ||^2 \\ & s.t.\\ && || {\bf x}_i|| ^2 = 1 \ \ \ \ \ \forall i \in V \end{array}

Figure 1: The hyperplane through the origin defines a cut partitioning the vertices into sets {\{x_1, x_2\}} and {\{x_3, x_4\}}.

Solving the above SDP, which is doable in polynomial time up to arbitrarily good accuracy, gives us a unit vector {{\bf x}_i} for each vertex {i}. A simple way to convert this collection to a cut {(S,V-S)} is to take a random hyperplane through the origin, and then define {S} to be the set of vertices {i} such that {{\bf x}_i} is above the hyperplane. Equivalently, we pick a random vector {{\bf g}} according to a rotation-invariant distribution, for example a Gaussian distribution, and let {S} be the set of vertices {i} such that {\langle {\bf g}, {\bf x}_i \rangle \geq 0}.

Let {(i,j)} be an edge: One sees that if {\theta} is the angle between {{\bf x}_i} and {{\bf x}_j}, then the probability {(i,j)} is cut is proportional to {\theta}:

\displaystyle  \mathop{\mathbb P} [ (i,j) \mbox{ is cut } ] = \frac {\theta}{\pi}

and the contribution of {(i,j)} to the cost function is

\displaystyle  \frac 14 || {\bf x}_i - {\bf x}_j ||^2 = \frac 12 - \frac 12 \langle {\bf x}_i , {\bf x}_j \rangle = \frac 12 - \frac 12 \cos \theta

Some calculus shows that for every {0 \leq \theta \leq \pi} we have

\displaystyle  \frac {\theta}{\pi} > .878 \cdot \left( \frac 12 - \frac 12 \cos \theta \right)

and so

\displaystyle  \mathop{\mathbb E} [ \mbox{ number of edges cut by } (S,V-S) ] \geq .878 \cdot \sum_{(i,j) \in E} \frac 14 || {\bf x}_i - {\bf x}_j ||^2

\displaystyle  = .878 \cdot {\rm SDPMaxCut}(G) \geq .878 \cdot {\rm MaxCut} (G)

so we have a polynomial time approximation algorithm with worst-case approximation guarantee {.878}.

Next time, we will see how the SDP relaxation behaves on random graphs, but first let us how it behaves on a large class of graphs.

4. Max Cut in Bounded-Degree Triangle-Free Graphs

Theorem 6 If {G= (V,E)} is a triangle-free graph in which every vertex has degree at most {d}, then

\displaystyle  MaxCut(G) \geq \left( \frac 12 +\Omega \left( \frac 1 {\sqrt d} \right) \right) \cdot |E|

Proof: Consider the following feasible solution for the SDP: we associate to each node {i} an {n}-dimensional vector {{\bf x}^{(i)}} such that {x^{(i)}_i = \frac 1{\sqrt 2}}, {x^{(i)}_j = -1/\sqrt{2deg(i)}} if {(i,j) \in E}, and {x^{(i)}_j = 0} otherwise. We immediately see that {||{\bf x}^{(i)} ||^2 = 1} for every {i} and so the solution is feasible.

For example, if we have a graph such that vertex 1 is adjacent to vertices 3 and 5:

{1} 2 3 4 5 {\cdots}
{x^{(1)}: } {\frac 1{\sqrt 2}} 0 {-\frac{1}{\sqrt[]{2deg(1)}}} 0 {-\frac{1}{\sqrt[]{2deg(1)}}}
{x^{(2)}: } 0 {\frac 1{\sqrt 2}} 0 0 0
{x^{(3)}: } {-\frac{1}{\sqrt[]{2deg(3)}}} 0 {\frac 1{\sqrt 2}} 0 0
{\vdots} {\vdots}
{x^{(n)}: } 0 0 0 0 0 {\cdots}

Let us transform this SDP solution into a cut {(S,V-S)} using a random hyperplane.

We see that, for every edge {(i,j)} we have

\displaystyle  \langle {\bf x}^{(i)}, {\bf x}^{(j)} \rangle = - \frac 1 {\sqrt{2d(i)}} - \frac 1 {\sqrt{2d(j)}} \leq - \frac 1 {\sqrt d}

The probability that {(i,j)} is cut by {(S,V-S)} is

\displaystyle  \frac { \arccos \left( \frac 12 - \frac 1 {2 \sqrt d} \right ) }{\pi}

and

\displaystyle  \frac { \arccos \left( \frac 12 - \frac 1 {2 \sqrt d} \right )}{\pi } = \frac 12 + \frac {\arcsin \left( \frac 1 {2 \sqrt d} \right) }{\pi} \geq \frac 12 + \Omega \left( \frac 1 {\sqrt d} \right)

so that the expected number of cut edges is at least {\left( \frac 12 + \Omega \left( \frac 1 {\sqrt d} \right) \right) \cdot |E|}. \Box

Beyond Worst Case Analysis: Lecture 3

Scribed by Keyhan Vakil

In which we complete the study of Independent Set and Max Cut in {G_{n,p}} random graphs.

1. Maximum Independent Set

Last time we proved an upper bound of {O\left( \frac 1p \log np \right)} to the probable value of the maximum independent set in a {G_{n,p}} random graph. This bound also holds if {p} is a function of {n}. There is a simple greedy algorithm which can be shown to achieve an independent set of size {\Omega(n/d)} where {d} is the average degree of the graph. For a {G_{n,p}} random graph, this gives us an independent of size {\Omega(1/p)}. However we will see how to specialize this analysis to sparse {G_{n,p}} random graphs, and close the remaining gap between the probable value and the greedy algorithm.

Consider the greedy algorithm below.

  • {S:= \emptyset}
  • for each {v\in V}
    • if {v} has no neighbors in {S} then {S:= S \cup \{ v \}}
  • return {S}

1.1. First attempt

We might try to model our analysis of this algorithm based on our discussion from Lecture~2.

To wit, let {R} be the set of vertices not in {S} which have no neighbors in {S}. Let {R_i} be the size of {R} when {S} contains {i} vertices. If {R_k = 0}, then our algorithm outputs an independent set of size {k}. Therefore we can determine the expected size of the algorithm’s output (up to a constant factor) by determining {k} such that {\mathop{\mathbb E}[R_k] = O(1)}.

Now we determine {\mathop{\mathbb E}[R_{i+1} \mid R_i]}. A proportion of {p} vertices are connected to the {(i+1)}th vertex in expectation. Of the {R_i} vertices, we expect that {1-p} of them will remain unconnected to all the vertices in {S}. This gives us that {\mathop{\mathbb E}[R_{i+1} \mid R_i] = (1-p)R_i}, and by induction {\mathop{\mathbb E}[R_k] = (1-p)^k n}.

Let {k} be such that {\mathop{\mathbb E}[R_k] = 1}. Then:

\displaystyle  \mathop{\mathbb E}[R_k] = (1-p)^k n = 1 \implies k = \log_{\frac 1{1-p}} n \approx \frac 1p \ln n

We conclude that our independent set has expected size {\Theta(\frac1p \log n)}. However if we take {p = \Theta(1/n)}, that would lead us to believe that we could get an independent set of size {\Theta(n \log n)} in a graph with only {n} vertices, which is impossible.

The error is that {\mathop{\mathbb E}[R_{i+1} \mid R_i]} should be {(1-p)(R_i - 1)}, not {(1-p)R_i}. Note that once we add the {(i+1)}th vertex to {S}, it can no longer be in {R} by definition. When {p} is a constant, the difference is negligible, but when {p} is small then the difference becomes more significant.

It is possible to salvage this analysis, but the result is less elegant. Instead we will now present a different analysis, which will also let us conclude more about higher moments as well.

1.2. Analysis of the greedy algorithm

To analyze the algorithm, consider the following random variables: let {t_i} be the number of for-loop iterations between the time the {i}-th element is added to {S} and the time the {(i+1)}-th element is added to {S}. We leave {t_i} undefined if the algorithm terminates with a set {S} of size less than {i+1}. Thus the size of the independent set found by the algorithm is the largest {i} such that {t_{i-1}} is defined. Consider the following slightly different probabilistic process: in addition to our graph over {n} vertices {\{1,\ldots , n \}}, we also consider a countably infinite number of other vertices {n+1,n+2,\ldots}. We sample an infinite super-graph of our graph over this larger vertex set, so that each possible edge has probability {p} of being generated.

We continue to run the greedy algorithm for every vertex of this infinite graph, and we call {t_i} the (now, always defined) number of for-loop iterations between the {i}-th and the {(i+1)}-th time that we add a node to {S}. In this revised definition, the size of the independent set found by algorithm in our actual graph is the largest {k} such that {t_0 + t_1 + \cdots + t_{k-1} \leq n}.

Now we will reason about the distribution of {t_i}. Say that we have {i} vertices in {S} and we are trying to determine if we should add some vertex {v} to {S}. Note that the probability of {v} being disconnected from all of {S} is {(1-p)^i}. So we add a vertex at each iteration with probability {(1-p)^i}, which shows that {t_i} is geometrically distributed with success probability {(1-p)^i}.

Based on this, we can find the expected value and variance of our sum from before

\displaystyle \mathop{\mathbb E} \left[ t_0 + t_1 + \cdots t_{k-1} \right] = \frac { \frac 1 {(1-p)^k} - 1 }{\frac 1 {1-p} - 1} \leq \frac { \frac 1 {(1-p)^k}}{\frac 1 {1-p} - 1} = \frac 1 {p\cdot (1-p)^{k-1}}

and likewise

\displaystyle  \begin{array}{rcl}  \mathop{\bf Var}[t_0 + t_1 + \cdots t_{k-1}] & \leq & \sum_{i=0}^{k-1} \frac 1 {(1-p)^{2i}} \\ &= & \frac { \frac 1 {(1-p)^{2k}} - 1 }{\frac 1 {(1-p)^2} - 1} \\ & \leq & \frac 1 {(1 - (1-p)^2 ) \cdot (1-p)^{2k-2 } } \\ & \leq & \frac 1 {p \cdot (1-p)^{2k - 2} } \\ & = & p \left( \mathop{\mathbb E}[t_0 + \cdots + t_{k-1}] \right)^2. \end{array}

We want to choose {k} so that the sum is at most {n} with high probability. Let

\displaystyle  k = \log_{\frac {1}{1-p}} \frac {pn}2 \approx \frac 1p \ln pn .

This makes the expected value of the sum {\le n/2} and the standard deviation {\le \sqrt{p}n / 2}. Thus, if {p(n) \rightarrow 0} sufficiently fast, the greedy algorithm has a {1-o(1)} probability of finding an independent set of size {\Omega( p^{-1} \log pn ) = \Omega\left( \frac nd \log d \right)}, where {d := np} is a measure of the average degree.

1.3. Certifiable upper bound

We now derive a polynomial time computable upper bound certificate for maximum independent set in {G_{n,p}}. We use the following lemma without proof. Note its similarity to Lemma~2 from Lecture~1.

Lemma 1 If {p = p(n) \ge \frac {\log n}n}, {G} is sampled from {G_{n,p}}, {A} is the adjacency matrix of {G}, and {J} is the matrix of all ones, then there is a {1-o(1)} probability that

\displaystyle  \lVert A - p J \rVert \leq O( \sqrt {pn })

Since {A - pJ} is a real symmetric matrix its spectral norm can be computed as:

\displaystyle  \lVert A - pJ \rVert = \max_{{\bf x} \neq {\bf 0}} \frac{|{\bf x}^T(A - pJ){\bf x}|}{{\bf x}^T {\bf x}} \;.

If {S} is an independent set of size {k}, then {{\bf 1}_S^T A {\bf 1}_S = 0}, {{\bf 1}_S^T J {\bf 1}_S = k^2}, and {{\bf 1}_S^T {\bf 1}_S = k}, so that

\displaystyle  \begin{array}{rcl}  \lVert A - pJ \rVert &\geq & \frac{|{\bf 1}_S^T(A - pJ){\bf 1}_S|}{{\bf 1}_S^T {\bf 1}_S} \\ &= & pk. \end{array}

This bound holds for any independent set, so it also holds for the largest one. If we denote by {\alpha(G)} the size of the largest independent set in {G}, we have that

\displaystyle  \alpha(G) \leq \frac 1p \lVert A - p J \rVert .

For a {G_{n,p}} random graph, the above upper bound is {O(\sqrt{n/p}) = O(n/\sqrt d)} with high probability.

2. Max Cut

We will now reconsider Max Cut for the general case {G_{n,p}}. In Lecture~2, we dealt with the special case of {p=\frac12}. Unlike maximum independent set, our arguments for the case {p=\frac12} apply to Max Cut without much modification.

2.1. High probability upper bound

Let {G} be a random graph from {G_{n,p}}, and define {d := pn} as a measure of its average degree. We will prove that the size of a maximum cut of {G} is at most {dn/4 + O(\sqrt d n)} with high probability. The proof of this statement is nearly identical to the version in Lecture~2, where it was presented for the case {p=\frac12}. We know that the expected value of a cut {S} is {|S| \cdot |V-S| \le dn / 4}. By a Chernoff bound, the probability that any particular cut exceeds expectation by an additive factor of {O(\epsilon n)} is exponentially decreasing by a factor of {\epsilon^2 dn}. By taking {\epsilon = 1/\sqrt{d}} and taking a union bound over all {2^n} possible cuts {S}, we have that our expected cut has value at most {dn / 4 + O(\sqrt d n)} with probability {1 - 2^{-\Omega(n)}}.

2.2. Greedy algorithm

Consider the greedy algorithm

  • {A:= \emptyset}
  • {B:= \emptyset}
  • for each {v\in V}
    • if {v} has more neighbors in {B} than in {A} then {A:= A \cup \{ v \}}
    • else {B:= B \cup \{ v\}}
  • return {(A,B)}.

Label {V = \{ 1,\ldots,n \}}. Let {A_i} and {B_i} be the sets {A} and {B} when vertex {i} is considered in the for-loop. For the purpose of analysis, we delay the random decisions in {G} until a vertex is considered. In particular, we delay the choice of which of {1, 2, \ldots, i - 1} is a neighbor until {i} is vertex {i} is considered. Note that no edge needs to be considered twice, and so we can treat each one as an independent biased coin flip.

Let {a_i} and {b_i} be the neighbors of {i} in {A_i} and {B_i} respectively. We can show that {|a_i - b_i| = \max(a_i, b_i) - \frac12 (a_i + b_i)}, and so {\sum_i |a_i - b_i|} is the gain our algorithm achieves over cutting half the edges.

Now {|a_i - b_i|} has expectation {\Omega( \sqrt {pi} )} and variance {O(pi)}. Adding over all {i}, the sum of the differences has mean {\Omega( n \sqrt{pn} )} and variance {O(pn^2)}. This gives us an expected gain of {\Omega( n \sqrt {pn}) = \Omega( n \sqrt d)} with {1-o(1)} probability. The value of cutting half the edges is approximately {dn / 4}. This gives a final value of {dn/4 + \Omega(n\sqrt d)} w.h.p. as stated.

2.3. Certifiable upper bound

Again, we will derive a certifiable upper bound by looking at the spectral norm. If {(S,V-S)} is a cut with value {\frac {dn}4 + C}, then we have

\displaystyle  {\bf 1}_S^T A {\bf 1}_{V-S} = \frac {dn}4 + C

\displaystyle  {\bf 1}_S^T p J {\bf 1}_{V-S} = p \cdot |S| \cdot |V-S| \leq p \cdot \frac {n^2} 4 = \frac {dn}4

\displaystyle  \lVert {\bf 1}_S \rVert \cdot \lVert {\bf 1}_{V-S} \rVert = \sqrt { |S| \cdot |V-S| } \leq \sqrt { \frac {n^2}4 }

so

\displaystyle  C \leq 2n \cdot \lVert {\bf 1}_S \rVert \cdot \lVert {\bf 1}_{V-S} \rVert .

This means that, in every graph, the maximum cut is upper bounded by

\displaystyle  \frac {dn}4 + \frac n2 \left\lVert A - \frac dn J \right\rVert

which if {d \ge \log n} is with high probability at most {\frac {dn}4 + O( n \sqrt d)} (by Lemma~1).

3. Conclusion

We conclude with the following table, which summarizes our results for a random graph sampled from {G_{n, d/n}}.

Problem Expected Value Greedy Algorithm Certifiable Upper Bound
Independent Set {O\left(\frac nd \log d\right)} {\Omega\left(\frac nd \log d\right)} w.h.p. {O\left(\frac n{\sqrt{d}} \right)} w.h.p.*
Max Cut {\frac{dn}4 + O(n \sqrt d)} {\frac {dn}4 + \Omega(n \sqrt d)} w.h.p. {\frac {dn} 4 + O(n \sqrt d)} w.h.p.*

* Note that both certifiable upper bounds require {d \ge \log n}.

Both greedy algorithms perform very well in comparison to the probable value. In Max~Cut, our greedy algorithm is particularly strong, matching our certifiable upper bound up to a lower order term. This supports one of our major theses: while greedy algorithms exhibit poor worst-case performance, they tend to do well over our given distribution.

Beyond Worst Case Analysis: Lecture 2

Scribe: Mahshid Montazer

In this lecture, we study the Max Cut problem in random graphs. We compute the probable value of its optimal solution, we give a greedy algorithm which is nearly optimal on random graphs and we compute a polynomial time upper bound certificate for it using linear algebra methods. We also study the problem of Maximum Independent Set in random graphs and we compute an upper bound to the probable value for its optimal solution.

1. Max Cut

Definition 1 Max Cut: In an un-weighted graph {G=(V,E)}, a cut is defined as a partition of its vertices into two sets {V_1} and {V_2}. Let {E(V_1, V_2)} be the size of the cut {(V_1, V_2)} which is the number of the edges with one endpoint in {V_1} and one endpoint in {V_2}. Max Cut is the the problem of finding a cut of largest size.

To give a clear example, in every bipartite graph, a bipartition is a maximum cut. It is easy to show that the size of the maximum cut would be at least half of the number of the graph edges. One question that arises here is that how much more than half of the edges can we cut. The answer is: not that much in random graphs. We will show this claim in the following section.

2. Probable Value of Max Cut Optimal Solution

In this section, we compute the probable value of Max Cut optimal solution in random graphs. Our result is for samples of {G_{n,\frac{1}{2}}}, but the analysis will generalize to {G_{n,p}}.

Lemma 2 For every fixed cut {(S,V-S)}, {\mathop{\mathbb E} [E(S, V\setminus S)] \leq \frac{n^2}{8}}.

Proof: {\mathop{\mathbb E} [E(S, V\setminus S)] = \left\vert S \right\vert \left\vert V\setminus S \right\vert \frac{1}{2} = \frac{n^2}{8}.} \Box

Lemma 3 {\mathop{\mathbb P} [E(S, V\setminus S) \geq \frac{n^2}{8} + \epsilon \frac{n^2}{4}] \leq e^{-\Omega(\epsilon^2 n^2)}} where {0 \leq \epsilon \leq \frac{1}{2}}.

Proof: The proof is by applying Chernoff bounds on the result of lemma 2. \Box

Lemma 4 There is a constant {c>0} such that

\displaystyle  \mathop{\mathbb P} [\exists (S,V \setminus S) \mid E(S,V \setminus S) \geq \frac{n^2}{8} + \epsilon \frac{n^2}{4}] \leq 2^{-n}

where {\epsilon = \frac{c}{\sqrt{n}}} and the probability is taken over the choice of {G=(V,E)} from the distribution {G_{n,\frac 12 }}.

Proof:

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P} [\exists (S,V \setminus S) \mid E(S,V \setminus S) \geq \frac{n^2}{8} + \epsilon \frac{n^2}{4}] & \leq & 2^n \cdot e^ {-\Omega(\epsilon^2 n^2)} \\  & \leq & 2^{-n}. \end{array}

for an appropriate choice of {c}. \Box

The above lemma clearly leads us to the following theorem.

Theorem 5 There is a constant {c} such that w.h.p. Max Cut in {G_{n,\frac{1}{2}}} is of size at most {\frac{n^2}{8} + c \cdot n^{1.5}.}

Thus, we showed that in {G_{n,1/2}}, the probable value of Max Cut is at most {\frac{n^2}{8} + c \cdot n^{1.5}}.

3. Greedy Algorithm for Max Cut

Consider the following greedy algorithm for Max Cut:

  • {A \leftarrow \emptyset , B \leftarrow \emptyset}
  • for {v \in V}
    • if {v} has more neighbors in {A} than in {B}, then {B \leftarrow B \cup \{v\}}
    • else {A \leftarrow A \cup \{v\}}
  • return {A} and {B}

The above algorithm can be applied to any graph, but we will analyze it on random graphs. A naive analysis of the algorithm guarantees that our greedy algorithm cuts at least half of the edges, giving us an approximation ratio of 2. The reason is that at each step, we add at least half of the processing vertex’s incident edges to the cut. However, a more careful analysis of the algorithm shows that it is near-optimal for random graphs. Below, we prove our claim for {G_{n,\frac{1}{2}}}.

Lemma 6 With high probability over the choice of {G} from {G_{n,\frac{1}{2}}}, the greedy algorithm finds a cut of size {\frac {n^2}8 + \Omega(n^{1.5})}.

Proof: Let {G(V,E) \sim G_{n,\frac{1}{2}}} be the given graph and let {v_1, v_2 , \cdots , v_n} be the order in which we process the vertices. Note that at the time of processing {v_i} {(1 \leq i <n)}, we do not need to know the edges that connect {v_i} to any vertex {v_j} {(j>i)}. Let { a_i = |A|} and {b_i = |B|} be the size of sets {A} and {B} before processing {v_i}, respectively. Although {G} is given before we run the algorithm, for the sake of the analysis, we can assume that we are building it on the go and while processing each of the vertices. Remember that each edge of the graph would exists independently with probability {\frac{1}{2}}. For deciding where to put {v_i}, we generate {a_i} random bits and call their summation {X_i}. We also generate {b_i} random bits and call their summation {Y_i}. We put {v_i} in set {A} (respectively, {B}) if {X_i \leq Y_i} (respectively, {Y_i < X_i}). Note that the more balanced {A} and {B} get, the worse it gets for the analysis. Also, note that the extra edges that the algorithm cuts other than half of the edges would be:

\displaystyle \sum_{1\leq i \leq n} {|X_i-Y_i|} = E(A, B) -\frac{|E|}{2}.

We know that

\displaystyle X_i-Y_i = \frac{a_i-b_i}{2}.

Note that

\displaystyle \mathop{\mathbb E}[|X_i - Y_i|] = \Omega(\sqrt{i})

and

\displaystyle \mathop{\bf Var}(|X_i - Y_i|) = O(i).

Thus, we have that {\sum_{1\leq i \leq n} {|X_i-Y_i|} } has mean {\Omega(n^{1.5})} and standard deviation {O(n)}. Thus, with {1-O(1)} probability we have:

\displaystyle  \sum_{1\leq i \leq n} {|X_i-Y_i|} = \sum_{1\leq i \leq n} {\Omega(\sqrt{i})} \geq \Omega(n^{1.5}).

\displaystyle \Rightarrow E(A,B) \geq \frac{n^2}{8} + \Omega(n^{1.5}).

\Box

4. Polynomial Time Upper Bound for Max Cut

In this section, we find polynomial time upper bound certificates for Max Cut in random graphs using linear algebra techniques.

Lemma 7 Let {G=(V,E)} be a graph, {A} be its adjacency matrix, {J} be the matrix all whose entries are 1 and {(S, V\setminus S)} be the Max Cut of {G}. Then

\displaystyle  E(S, V \setminus S) \leq \frac{n^2}{8} + \frac n2 || A - J/2 ||

Proof: we have:

\displaystyle  \begin{array}{rcl}  E(S, V \setminus S) - \frac{n^2}{8} & \leq & {\bf 1}^T_S \cdot ( A - J/2) \cdot {\bf 1}_{V\setminus S} \\ & \leq & || A - J/2 || \cdot || {\bf 1}_S || \cdot ||{\bf 1}_{V\setminus S} || \\ & \leq & || A - J/2 || \cdot \sqrt{|S|} \cdot \sqrt{|V \setminus S|} \\ & \leq & || A - J/2 || \cdot \frac{n}{2}\\ \end{array}

\Box

Recall that, with high probability over the choice of a graph {G} from {G_{n,\frac 12}}, if {A} is the adjacency matrix of {G} then we have {||A - J/2|| \leq O(\sqrt n)} with high probability.

We conclude that, with high probability over the choice of {G} from {G_{n,\frac 12}} we can find in polynomial time a certificate the max cut optimum of {G} is at most {\frac {n^2} 8 + O(n^{1.5})}.

5. Maximum Independent Set

In this section, we discuss the Maximum Independent Set problem for {G_{n,p}} (especially {G_{n,\frac{1}{2}}}) and we show its close connection with Max Clique problem. Finally, we compute its optimal solution’s probable value.

Definition 8 Maximum Independent Set: In a graph {G(V,E)}, an independent set is a set of vertices that are mutually disconnected. A Maximum Independent Set in {G} is an independent set of largest possible size. The Maximum Independent Set problem is the problem of finding such a set.

Note that the Maximum Independent Set in {G_{n,p}} corresponds to the Maximum Clique in {G_{n,1-p}}. Thus, for {p = \frac{1}{2}}, everything that we argued for Max Clique is usable for Maximum Independent Set as well.

In this section, we compute an upper bound to the probable value of Maximum Independent Set’s optimal solution in {G_{n,p}}.

Fix a set {S \subset V} of size {k}. We have

\displaystyle \mathop{\mathbb P} [S \text{ is an independent set in } G] = (1-p)^{\binom{k}{2}}

where the probability is over the choice of {G\sim G_{n,p}}.

The following lemma holds.

Lemma 9 {\mathop{\mathbb P}[\exists \text{ Independent Set of size } k] \leq e^{-\frac{k}{2} \left( ((k-1) \cdot \ln{\frac{1}{1-p}} - 2\ln{\frac{n}{k}} \right) }}

Proof:

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P}[\exists \text{ Independent Set of size } k] & \leq & \mathop{\mathbb E}[\text{\#Independent Sets of size k}]\nonumber \\ &= & \binom{n}{k} \cdot (1-p)^{\binom{k}{2}} \nonumber \\ & \leq & \left(\frac{n}{k} \right)^k \cdot (1-p)^{\frac{k^2}{2} - \frac{k}{2}} \nonumber \\ & =& e^{k \cdot \ln{\frac{n}{k}} - \left( \frac{k^2}{2} - \frac k2 \right) \cdot \ln{\frac{1}{1-p}}} \nonumber \\ & = & e^{-\frac{k}{2} ((k-1) \cdot \ln{\frac{1}{1-p}} - 2\ln{\frac{n}{k}})}.  \end{array}

\Box

Now, what would be the maximum value of {k} such that with high probability we can still make sure that there exists an independent set of size {k}? Note that the value of (0) goes to 0 when {k \geq 2\log_\frac{1}{1-p} \frac{n}{k} +2}.

A sufficient condition for {k \geq 2\log_\frac{1}{1-p} \frac{n}{k} +2} is to have {k = 2\log_\frac{1}{1-p} n +2}, showing us that there is a high probability that maximum independent set in {G_{n,p}} is at most {O\left ( \log_\frac{1}{1-p} n \right) = O \left ( \frac 1p \log n \right)}. A more careful bound is that we can have {k \geq 2\log_\frac{1}{1-p} \frac{n}{k} +2} provided, say, {k \geq 3 \log_{\frac 1{1-p}} np + 100}, and so with high probability the maximum independent set in {G_{n,p}} is at most {O \left ( \frac 1p \log p n \right)}. If we call {d=pn}, then the bound is {O \left ( \frac nd \log d \right)}