Beyond Worst Case Analysis: Lecture 5

Scribed by Haaris Khan

In which we study the SDP relaxation of Max Cut in random graphs.

1. Quick Review of Chernoff Bounds

Suppose {X_1, \ldots, X_n} are mutually independent random variables taking values in {\{0, 1\}}, and let {X := \sum_{i=1}^{n} X_i}. The Chernoff bounds state the following:

1. For all {\epsilon} such that {0 \leq \epsilon \leq 1},

\displaystyle  \mathop{\mathbb P}(\vert X - \mathop{\mathbb E}[X] \vert > \epsilon \cdot \mathop{\mathbb E}[X]) \leq \exp(-\Omega(\epsilon^2 \cdot \mathop{\mathbb E}[X]))

2. For all {t > 1},

\displaystyle  \mathop{\mathbb P} (\vert X - \mathop{\mathbb E}[X] \vert \geq t \cdot \mathop{\mathbb E}[X]) \leq \exp(-\Omega((t\log(t)) \cdot \mathop{\mathbb E}[X]))

3. When we do not know {\mathop{\mathbb E}[X]}, we can bound as follows:

\displaystyle  \mathop{\mathbb P}(\vert X - \mathop{\mathbb E}[X] \vert \geq \epsilon \cdot n) \leq \exp(- \Omega(\epsilon^2 \cdot n))
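As a quick numerical illustration (a sketch assuming the numpy package; the values of {n}, {p}, and {\epsilon} are arbitrary), one can compare the empirical tail probability with the exponential decay predicted by the first bound:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 0.3           # X is a sum of n independent Bernoulli(p) variables
trials, eps = 20_000, 0.05   # arbitrary illustration parameters
mean = n * p

# Empirical estimate of P(|X - E[X]| > eps * E[X])
X = rng.binomial(n, p, size=trials)
empirical = np.mean(np.abs(X - mean) > eps * mean)

# The bound predicts decay like exp(-c * eps^2 * E[X]) for some constant c;
# c = 1/3 is a common explicit choice.
print(f"empirical tail probability: {empirical:.2e}")
print(f"exp(-eps^2 E[X] / 3)      : {np.exp(-eps ** 2 * mean / 3):.2e}")
```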

2. Cutting a Near-Optimal Number of Edges in {G_{n,p}} Via SDP Rounding

Consider {G_{n, p}} where {p > \frac{\log(n)}{n}}, and let {d := np}. We show that with probability {1 - o(1)} the maximum degree is {O(d)}.

  • Fix v
  • For some constant c,

    \displaystyle \mathop{\mathbb P}(\textrm{v has degree} > c \cdot d) \leq \mathop{\mathbb P}(\vert deg(v) - \mathop{\mathbb E}[deg(v)] \vert > (c - 1) \mathop{\mathbb E}[deg(v)])

    \displaystyle \leq \exp(- \Omega((c - 1)\log(c - 1) \cdot d)) \textrm{ (by Chernoff Bounds)}

    \displaystyle  \leq \exp(-\Omega((c - 1)\log(c - 1) \cdot \log(n)))

    \displaystyle  \leq \frac{1}{n^2}, \textrm{ for some choice of constant c}

So {\mathop{\mathbb P}(\exists \text{ } v \textrm{ with degree } > c \cdot d) \leq n \cdot \frac{1}{n^2} \leq \frac{1}{n}}

Next, we bound the number of vertices that participate in a triangle. Here we assume that {d} is {o(n^{\frac{1}{3}})}, equivalently {p = o(n^{-\frac 23})}.

\displaystyle \mathop{\mathbb E}[\textrm{number of vertices in triangles}] = n \cdot \mathop{\mathbb P}(\textrm{v participates in a triangle})

If a vertex {v} participates in a triangle, there are {\binom{n - 1}{2}} ways of choosing the other two vertices of the triangle, and each such triangle is present with probability {p^3}. So the expected number of vertices in triangles can be bounded by

\displaystyle  \mathop{\mathbb E}[\textrm{number of vertices in triangles}] \leq n \cdot p^3 \cdot \binom{n - 1}{2}

\displaystyle  \leq n^3 \cdot p^3

\displaystyle  = o(n) \textrm{ if } p = o \left(\frac{1}{n^{\frac{2}{3}}}\right), \textrm{ } d = o(n^{\frac{1}{3}})

So with probability {1 - o(1)},

  • All vertices have degree {O(d)}
  • {o(n)} vertices participate in triangles.
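The following sketch (assuming numpy; {n} and {d} are small arbitrary values, so the asymptotic statements are only loosely visible at this scale) samples a graph from {G_{n,p}} and reports the maximum degree and the number of vertices lying in triangles:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 3000, 10                  # d > log(n) ~ 8 and d < n^{1/3} ~ 14.4 for this n
p = d / n

# Sample the adjacency matrix of G_{n,p}
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(np.int64)

degrees = A.sum(axis=1)
# ((A @ A) * A).sum(axis=1) is the diagonal of A^3: it counts closed walks of
# length 3 through each vertex (two per triangle), so it is positive exactly
# for the vertices that participate in a triangle.
in_triangle = int((((A @ A) * A).sum(axis=1) > 0).sum())

print(f"d = {d}, maximum degree = {degrees.max()}")
print(f"vertices in triangles = {in_triangle} out of n = {n}")
```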

3. Eigenvalue Computations and SDP

Problems like finding the largest or smallest eigenvalue of a symmetric matrix can be solved using SDP.

Let {M} be a symmetric matrix and let {\lambda_{\max}} be its largest eigenvalue, so that {\lambda_{\max} = \max_{\boldsymbol{x} \neq \boldsymbol{0}} \frac{\boldsymbol{x}^T M \boldsymbol{x}}{\|\boldsymbol{x} \|^2}}. We can formulate this computation as a quadratic program:

\displaystyle  \begin{array}{rcl}  \max & & \sum_{i, j} M_{i, j} x_i x_j \\ {\rm s.t.} \\ && \sum_i x_i^2 = 1 \\ \end{array}

We showed previously that we can relax a Quadratic Program to SDP:

\displaystyle  \begin{array}{rcl}  \max & & \sum_{i, j} M_{i, j} \langle \boldsymbol{x_i}, \boldsymbol{x_j} \rangle \\ {\rm s.t.} \\ && \sum_i \|\boldsymbol{x_i}\|^2 = 1 \\ \end{array}

In fact, the two are equivalent: their optima coincide. To show this, we must show that a solution of the QP yields a solution of the SDP of the same cost, and that a vector solution of the SDP yields a solution of the QP of at least the same cost.

From QP to SDP: trivial. Any solution {x} of the quadratic program is also a solution of the SDP (viewing each {x_i} as a one-dimensional vector), since the SDP is a relaxation; hence the optimum of the QP is at most the optimum of the SDP.

From SDP to QP: consider a vector solution {\boldsymbol{x_1}, \ldots, \boldsymbol{x_n}} of cost {c}. We note that our SDP can be transformed into the following (scale-invariant) unconstrained optimization problem:

\displaystyle  \begin{array}{rcl}  \max & & \frac{\sum_{i, j} M_{i, j} \langle \boldsymbol{x_i}, \boldsymbol{x_j} \rangle}{\sum_i \|\boldsymbol{x_i}\|^2} \end{array}

Writing the inner products coordinate-by-coordinate, with {x^{(i)}_k} denoting the {k}-th coordinate of {\boldsymbol{x_i}}, the cost {c} of our solution is

\displaystyle  c = \frac{\sum_{i, j} M_{i, j} \sum_k x^{(i)}_k x^{(j)}_k}{\sum_i \sum_k (x^{(i)}_k)^2}

\displaystyle  \leq \max_k \frac{\sum_{i, j} M_{i, j} x^{(i)}_k x^{(j)}_k}{\sum_i (x^{(i)}_k)^2}

That is, restricting to the {k}-th coordinate of each vector gives a one-dimensional solution, i.e. a solution of the QP, and for the best choice of {k} its value is at least {c}. Hence the optimum of the QP is at least the optimum of the SDP.

We use the following inequality:

\displaystyle \frac{a_1 + ... + a_m}{b_1 + ... + b_m} \leq \max_{k = 1, ..., m} \frac{a_k}{b_k}, b_k > 0

Proof:

\displaystyle \sum_i a_i = \sum_i b_i \cdot \frac{a_i}{b_i} \leq \sum_i b_i \cdot \max_k \frac{a_k}{b_k}

\displaystyle  = \max_k \text{ } \frac{a_k}{b_k} \cdot \sum_i b_i
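Numerically, the equivalence can be checked with a few lines of Python (a sketch assuming numpy; power iteration plays the role of the maximizer of the quadratic form):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
M = (B + B.T) / 2                      # a random symmetric matrix

# Largest eigenvalue, computed directly
lam_max = np.linalg.eigvalsh(M).max()

# Maximize the Rayleigh quotient x^T M x / ||x||^2 by power iteration on a
# shifted matrix (the shift makes all eigenvalues non-negative, so the power
# method converges to the eigenvector of the largest eigenvalue of M).
shift = np.abs(M).sum(axis=1).max()    # crude upper bound on |eigenvalues|
Mshift = M + shift * np.eye(n)
x = rng.standard_normal(n)
for _ in range(2000):
    x = Mshift @ x
    x /= np.linalg.norm(x)
rayleigh = (x @ M @ x) / (x @ x)

print(f"lambda_max = {lam_max:.6f}, max Rayleigh quotient ~ {rayleigh:.6f}")
```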

4. SDP Max-Cut: Spectral Norm as a SDP Certificate

Consider the SDP relaxation of Max-Cut on Graph {G}:

\displaystyle  \begin{array}{rcl}  \max & & \sum_{(i, j) \in E} \frac{1}{4} \|\boldsymbol{X_i} - \boldsymbol{X_j}\|^2 \\ {\rm s.t.} \\ && \forall v \in V, \ \|\boldsymbol{X_v}\|^2 = 1 \\ \end{array}

Let the optimum value of this SDP be {SDPMaxCut(G)}. Clearly {MaxCut(G) \leq SDPMaxCut(G)}, since the SDP is a relaxation of Max Cut. Using the constraint {\|\boldsymbol{X_i}\|^2 = 1}, we can rewrite the objective of our SDP as

\displaystyle  \sum_{(i, j) \in E} \frac{1}{2} - \frac{1}{2} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle

So our new optimization problem is

\displaystyle  \begin{array}{rcl}  \max & & \frac{\vert E \vert}{2} - \sum_{(i, j) \in E} \frac{1}{2} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle \\ {\rm s.t.} \\ && \forall i \in V, \| \boldsymbol{X_i} \|^2 = 1 \\ \end{array}

We can relax the constraints to the single constraint {\sum_i \| \boldsymbol{X_i} \|^2 = n}. Since relaxing the constraints of a maximization problem can only increase the optimum, the new optimum (call it {SDP'MaxCut(G)}) is at least the previous one:

\displaystyle  \begin{array}{rcl}  \max & & \frac{\vert E \vert}{2} - \sum_{(i, j) \in E} \frac{1}{2} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle \\ {\rm s.t.} \\ && \sum_v \|\boldsymbol{X_v}\|^2 = n \\ \end{array}

Clearly, we have the following inequalities: { MaxCut(G) \leq SDPMaxCut(G) \leq SDP'MaxCut(G) }. We can rewrite {SDP'MaxCut(G)} as

\displaystyle  \begin{array}{rcl}  \max & & \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \frac{\sum_{i, j} -A_{i, j} \langle \boldsymbol{X_i}, \boldsymbol{X_j} \rangle}{\sum_i \|\boldsymbol{X_i}\|^2} \\ {\rm s.t.} \\ && \sum_v \|\boldsymbol{X_v}\|^2 = n \\ \end{array}

Note that the ratio in the objective is invariant under scaling all the vectors, so the constraint can be dropped; by the argument of Section 3, the maximum of the ratio is the largest eigenvalue of {-A}. Hence {SDP'MaxCut(G)}

\displaystyle  = \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \lambda_{\max}(-A)

For every graph {G} and every {0 \leq p \leq 1} (the step from {\lambda_{\max}(-A)} to {\lambda_{\max}(pJ - A)} below uses that {pJ \succeq {\bf 0}}),

\displaystyle  MaxCut(G) \leq SDPMaxCut(G) \leq \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \lambda_{\max}(-A)

\displaystyle  \leq \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \lambda_{\max}(pJ - A)

\displaystyle  \leq \frac{\vert E \vert}{2} + \frac{n}{4} \cdot \|pJ - A\|

Recall from previous lectures that, for {p > \frac{\log(n)}{n}}, the adjacency matrix {A} of a graph sampled from {G_{n, p}} satisfies {\|pJ - A\| \leq O(\sqrt{np})} with high probability. This implies that {SDPMaxCut(G) \leq \frac{\vert E \vert}{2} + O(n \cdot \sqrt{d})} with high probability. In other words, {SDPMaxCut(G)} is a polynomial-time computable upper bound certificate for {MaxCut(G)}, and on random graphs it is off by at most a lower-order term.
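A sketch of this certificate in Python (assuming numpy; {n} and {d} are arbitrary choices with {p > \log(n)/n}):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1500, 20
p = d / n

# Sample the adjacency matrix of G_{n,p}
upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)
m = A.sum() / 2                               # number of edges

# Spectral norm of pJ - A (largest absolute eigenvalue, since it is symmetric)
norm = np.max(np.abs(np.linalg.eigvalsh(p * np.ones((n, n)) - A)))

certificate = m / 2 + (n / 4) * norm
print(f"|E| = {m:.0f}, |E|/2 = {m/2:.0f}")
print(f"certified upper bound on MaxCut: {certificate:.0f}")
print(f"scale of the error term n*sqrt(d): {n * np.sqrt(d):.0f}")
```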

5. Trace and Eigenvalues

Suppose matrix {M} is symmetric with eigenvalues {\lambda_1 \hdots \lambda_n}. The following are true:

  • {M^k} eigenvalues are {\lambda_1^k \hdots \lambda_n^k}
  • {trace(M) := \sum_{i} M_{i, i} } ; {trace(M) = \sum_i \lambda_i}

Then, for {M^{2k}, trace(M^{2k}) = \lambda_1^{2k} + \hdots + \lambda_n^{2k}}.

\displaystyle  (\max_i \vert \lambda_i \vert)^{2k} \leq trace(M^{2k}) \leq n \cdot (\max_i \vert \lambda_i \vert)^{2k}

Also,

\displaystyle  \|M\| \leq (trace(M^{2k}))^{\frac{1}{2k}} \leq n^{\frac{1}{2k}} \cdot \|M\|

For the adjacency matrix {A} of a graph, {(A^k)_{i, j}} counts the number of walks of length {k} from {i} to {j} (paths that are not necessarily simple). More generally, for any matrix {M},

{ (M^k)_{i, j} = \sum_{a_1, \ldots, a_{k-1}} M_{i, a_1} M_{a_1, a_2} \cdots M_{a_{k-1}, j}}

Our goal is to bound the eigenvalues {\lambda_i}. Since, for symmetric {M}, the trace is both the sum of the diagonal entries and the sum of the eigenvalues, computing {trace(M^{2k})} (for example, by counting closed walks) gives an upper bound on the spectral norm {\|M\|}.
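A small sketch (assuming numpy; the matrix and the values of {k} are arbitrary) showing how {(trace(M^{2k}))^{1/2k}} squeezes the spectral norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300
B = rng.standard_normal((n, n))
M = (B + B.T) / 2                    # random symmetric matrix

spectral_norm = np.max(np.abs(np.linalg.eigvalsh(M)))

for k in (1, 2, 4, 8):
    # trace(M^{2k}) = sum_i lambda_i^{2k}; its (2k)-th root lies between
    # ||M|| and n^{1/(2k)} * ||M||.
    tr = np.trace(np.linalg.matrix_power(M, 2 * k))
    print(f"k={k}:  trace(M^(2k))^(1/2k) = {tr ** (1 / (2 * k)):8.3f},  "
          f"||M|| = {spectral_norm:.3f},  n^(1/2k)*||M|| = {n ** (1 / (2 * k)) * spectral_norm:.3f}")
```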

Beyond Worst Case Analysis: Lecture 4

Scribed by Rachel Lawrence

In which we introduce semidefinite programming and apply it to Max Cut.

1. Overview

We begin with an introduction to Semidefinite Programming (SDP). We will then see that, using SDP, we can find a cut with the same kind of near-optimal performance for Max Cut in random graphs as we got from the greedy algorithm — that is,

\displaystyle {\rm cut} > \frac{|E|}{2} + \Omega(n\cdot\sqrt{d})

in random graphs {G_{n, \frac{d}{n}}}. More generally, we will prove that one can always find a cut at least this large when {G} is triangle-free with maximum vertex degree at most {d}, which will imply the bound in random graphs. We will also see how to use SDP to certify an upper bound:

\displaystyle {\rm max\ cut} < \frac{|E|}{2} + O(n\cdot \sqrt{d})

with high probability in {G_{n, \frac{d}{n}}}

Methods using SDP will become particularly helpful in future lectures when we consider planted-solution models instead of fully random graphs: greedy algorithms will fail on some analogous problems where methods using SDP can succeed.

2. Semidefinite Programming

Semidefinite Programming (SDP) is a form of convex optimization, similar to linear programming but with the addition of a constraint stating that, if the variables in the linear program are considered as entries in a matrix, that matrix is positive semidefinite. To formalize this, we begin by recalling some basic facts from linear algebra.

2.1. Linear algebra review

Definition 1 (Positive Semidefinite) A matrix {M\in {\mathbb R}^{n \times n}} is positive semidefinite (abbreviated PSD and written {M \succeq {\bf 0}}) if it is symmetric and all its eigenvalues are non-negative.

We will also make use of the following facts from linear algebra:

  1. If {M \in {\mathbb R}^{n \times n}} is a symmetric matrix, then all the eigenvalues of {M} are real, and, if we call {\lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n} the eigenvalues of {M} with repetition, we have

    \displaystyle  M = \sum_i \lambda_i {\bf v}^{(i)} ({\bf v}^{(i)})^T

    where the {{\bf v}^{(i)}} are orthonormal eigenvectors of the {\lambda_i}.

  2. The smallest eigenvalue of {M} has the characterization

    \displaystyle  \lambda_1 = \min_{{\bf y} \neq {\bf 0}} \frac{{\bf y}^T M {\bf y}}{||{\bf y}||^2}

    and the optimization problem in the right-hand side is solvable up to arbitrarily good accuracy

This gives us the following lemmas:

Lemma 2 {M \succeq {\bf 0}} if and only if for every vector {{\bf y}} we have {{\bf y}^T M {\bf y} \geq 0}.

Proof: From part (2) above, the smallest eigenvalue of M is given by

\displaystyle  \lambda_1 = \min_{{\bf y} \neq {\bf 0}} \frac{{\bf y}^T M {\bf y}}{||{\bf y}||^2}

Noting that we always have {||{\bf y}||^2 \geq 0}, then {\lambda_1 \geq 0} if and only if the numerator {{\bf y}^T M {\bf y}} on the right is always non-negative. \Box

Lemma 3 If {A, B \succeq {\bf 0}}, then {A + B \succeq {\bf 0}}

Proof: {\forall {\bf y}}, {{\bf y}^T (A+B) {\bf y} = {\bf y}^T A {\bf y} + {\bf y}^T B {\bf y} \geq 0}. By Lemma 2, this implies {A+B \succeq 0}. \Box

Lemma 4 If {A \succeq 0} and {a \geq 0}, then {aA \succeq 0}

Proof: {\forall y}, {{\bf y}^T a A {\bf y} = a({\bf y}^T A {\bf y}) \geq 0}. By Lemma 2, this implies {aA \succeq 0}. \Box

2.2. Formulation of SDP

With these characterizations in mind, we define a semidefinite program as an optimization program in which we have {n^2} real variables {X_{i,j}}, with {1 \leq i,j \leq n}, and we want to maximize, or minimize, a linear function of the variables such that linear constraints over the variables are satisfied (so far this is the same as a linear program) and subject to the additional constraint that the matrix {X} is PSD. Thus, a typical semidefinite program (SDP) looks like

\displaystyle  \begin{array}{rcl}  \max && \sum_{i,j} C_{i,j} X_{i,j} \\ s.t.\\ && \sum_{i,j} A^{(1)}_{i,j} X_{i,j} \leq b_1\\ && \vdots\\ && \sum_{i,j} A^{(m)}_{i,j} X_{i,j} \leq b_m\\ && X \succeq {\bf 0} \end{array}

where the matrices {C,A^{(1)},\ldots, A^{(m)}} and the scalars {b_1,\ldots,b_m} are given, and the entries of {X} are the variables over which we are optimizing.

We will also use the following alternative characterization of PSD matrices

Lemma 5 A matrix {M\in {\mathbb R}^{n \times n}} is PSD if and only if there is a collection of vectors {{\bf x}^{(1)},\ldots, {\bf x}^{(n)}} such that, for every {i,j}, we have {M_{i,j} = \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle }.

Proof: Suppose that {M} and {{\bf x}^{(1)},\ldots, {\bf x}^{(n)}} are such that {M_{i,j} = \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle } for all {i} and {j}. Then {M} is PSD because for every vector {{\bf y}} we have

\displaystyle  {\bf y}^T M {\bf y} = \sum_{i,j} y_i y_j M_{i,j} = \sum_{i,j} y_iy_j \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle = \left\|\sum_i y_i {\bf x}^{(i)} \right\|^2 \geq 0

Conversely, if {M} is PSD and we write it as

\displaystyle  M = \sum_k \lambda_k {\bf v}^{(k)} ({\bf v}^{(k)})^T

we have

\displaystyle  M_{i,j} = \sum_k \lambda_k v^{(k)}_i v_j^{(k)}

and we see that we can define {n} vectors {{\bf x}^{(1)},\cdots,{\bf x}^{(n)}} by setting

\displaystyle  x^{(i)}_k := \sqrt {\lambda_k} \cdot v^{(k)}_i

and we do have the property that

\displaystyle  M_{i,j} = \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle

\Box

This leads to the following equivalent formulation of the SDP optimization problem:

\displaystyle  \begin{array}{rcl}  \max && \sum_{i,j} C_{i,j}\langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle \\ s.t.\\ && \sum_{i,j} A^{(1)}_{i,j} \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle \leq b_1\\ && \vdots\\ && \sum_{i,j} A^{(m)}_{i,j} \langle {\bf x}^{(i)}, {\bf x}^{(j)}\rangle \leq b_m\\ \end{array}

where our variables are vectors {{\bf x}^{(1)},\cdots,{\bf x}^{(n)}}. This is the statement of the optimization problem that we will most commonly use.
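As an illustration of how such a program is set up with an off-the-shelf solver, here is a minimal sketch assuming the cvxpy package is available (the data {C}, {A^{(1)}}, {b_1} are arbitrary toy values, not taken from any particular application):

```python
import numpy as np
import cvxpy as cp

n = 3
# Arbitrary symmetric data for the objective and one linear constraint
C = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])
A1 = np.eye(n)      # constraint: sum_{i,j} A1_{ij} X_{ij} = trace(X) <= b1
b1 = float(n)

X = cp.Variable((n, n), symmetric=True)
constraints = [
    cp.sum(cp.multiply(A1, X)) <= b1,   # linear constraint on the entries of X
    X >> 0,                             # X is positive semidefinite
]
problem = cp.Problem(cp.Maximize(cp.sum(cp.multiply(C, X))), constraints)
problem.solve()                         # uses whichever SDP-capable solver is installed
print("optimal value:", problem.value)
```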

2.3. Polynomial time solvability

From lemmas 3 and 4, we recall that if {A} and {B} are two matrices such that {A\succeq {\bf 0}} and {B \succeq {\bf 0}}, and if {a\geq 0} is a scalar, then {a \cdot A \succeq {\bf 0}} and {A+B \succeq 0}. This means that the set of PSD matrices is a convex subset of {{\mathbb R}^{n \times n}}, and that the above optimization problem is a convex problem.

Using the ellipsoid algorithm, one can solve in polynomial time (up to arbitrarily good accuracy) any optimization problem in which one wants to optimize a linear function over a convex feasible region, provided that one has a separation oracle for the feasible region: that is, an algorithm that, given a point,

  1. Checks whether it is feasible and, if not,
  2. Constructs an inequality that is satisfied by all feasible points but not satisfied by the given point.

In order to construct a separation oracle for a SDP, it is enough to solve the following problem: given a matrix {M}, decide if it is PSD or not and, if not, construct an inequality {\sum_{ij}a_{ij}x_{ij} \geq 0} that is satisfied by the entries of all PSD matrices but that is not satisfied by {M}. In order to do so, recall that the smallest eigenvalue of {M} is

\displaystyle  \min_{{\bf y}} \frac {{\bf y}^T M {\bf y} }{|| {\bf y}||^2 }

and that the above minimization problem is solvable in polynomial time (up to arbitrarily good accuracy). If the above optimization problem has a non-negative optimum, then {M} is PSD. If the optimum is negative and is attained at {{\bf y}^*}, then the matrix is not PSD, and the inequality

\displaystyle  \sum_{i,j} X_{i,j} y^*_i y^*_j \geq 0

is satisfied for all PSD matrices {X} but fails for {X:= M}. Thus we have a separation oracle and we can solve SDPs in polynomial time up to arbitrarily good accuracy.
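The separation oracle is straightforward to implement given an eigenvalue routine; here is a sketch assuming numpy (the matrix {M} used in the example is arbitrary):

```python
import numpy as np

def psd_separation_oracle(M, tol=1e-9):
    """Return (True, None) if M is PSD (up to tol); otherwise return
    (False, Y) where Y[i, j] = y*_i y*_j defines the violated inequality
    sum_{i,j} X_{i,j} Y[i,j] >= 0, which holds for all PSD X but not for M."""
    M = (M + M.T) / 2                      # symmetrize for numerical safety
    eigvals, eigvecs = np.linalg.eigh(M)
    if eigvals[0] >= -tol:
        return True, None
    y = eigvecs[:, 0]                      # eigenvector of the smallest eigenvalue
    return False, np.outer(y, y)

# Example: a symmetric matrix that is not PSD (eigenvalues 3 and -1)
M = np.array([[1.0, 2.0], [2.0, 1.0]])
feasible, Y = psd_separation_oracle(M)
print("PSD?", feasible)
if not feasible:
    print("value of the violated inequality at M:", np.sum(M * Y))  # negative
```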

3. SDP Relaxation of Max Cut and Random Hyperplane Rounding

The Max Cut problem in a given graph {G=(V,E)} has the following equivalent characterization, as a quadratic optimization problem over real variables {x_1,\ldots,x_n}, where {V = \{ 1,\ldots,n\}}:

\displaystyle  \begin{array}{rcl}  {\rm MaxCut} (G) =& \max & \sum_{(i,j) \in E} \frac 14 (x_i - x_j)^2 \\ & s.t.\\ && x_i^2 = 1 \ \ \ \ \ \forall i \in V \end{array}

We can interpret this as associating every vertex {v} with a value {x_v = \pm 1}, so that the cut edges are those with one vertex of value {+1} and one of value {-1}.

While quadratic optimization is NP-hard, we can instead use a relaxation to a polynomial-time solvable problem. We note that any quadratic optimization problem has a natural relaxation to an SDP, in which we relax real variables to take vector values and we change multiplication to inner product:

\displaystyle  \begin{array}{rcl}  {\rm MaxCut} (G) \leq & \max & \sum_{(i,j) \in E} \frac 14 || {\bf x}_i - {\bf x}_j ||^2 \\ & s.t.\\ && || {\bf x}_i|| ^2 = 1 \ \ \ \ \ \forall i \in V \end{array}

Figure 1: The hyperplane through the origin defines a cut partitioning the vertices into sets {\{x_1, x_2\}} and {\{x_3, x_4\}}.

Solving the above SDP, which is doable in polynomial time up to arbitrarily good accuracy, gives us a unit vector {{\bf x}_i} for each vertex {i}. A simple way to convert this collection to a cut {(S,V-S)} is to take a random hyperplane through the origin, and then define {S} to be the set of vertices {i} such that {{\bf x}_i} is above the hyperplane. Equivalently, we pick a random vector {{\bf g}} according to a rotation-invariant distribution, for example a Gaussian distribution, and let {S} be the set of vertices {i} such that {\langle {\bf g}, {\bf x}_i \rangle \geq 0}.

Let {(i,j)} be an edge: One sees that if {\theta} is the angle between {{\bf x}_i} and {{\bf x}_j}, then the probability {(i,j)} is cut is proportional to {\theta}:

\displaystyle  \mathop{\mathbb P} [ (i,j) \mbox{ is cut } ] = \frac {\theta}{\pi}

and the contribution of {(i,j)} to the cost function is

\displaystyle  \frac 14 || {\bf x}_i - {\bf x}_j ||^2 = \frac 12 - \frac 12 \langle {\bf x}_i , {\bf x}_j \rangle = \frac 12 - \frac 12 \cos \theta

Some calculus shows that for every {0 \leq \theta \leq \pi} we have

\displaystyle  \frac {\theta}{\pi} > .878 \cdot \left( \frac 12 - \frac 12 \cos \theta \right)

and so

\displaystyle  \mathop{\mathbb E} [ \mbox{ number of edges cut by } (S,V-S) ] \geq .878 \cdot \sum_{(i,j) \in E} \frac 14 || {\bf x}_i - {\bf x}_j ||^2

\displaystyle  = .878 \cdot {\rm SDPMaxCut}(G) \geq .878 \cdot {\rm MaxCut} (G)

so we have a polynomial time approximation algorithm with worst-case approximation guarantee {.878}.
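Here is a sketch of the random hyperplane rounding step in Python (assuming numpy; the "SDP vectors" in the example are a hand-built feasible solution for the 5-cycle rather than the output of an actual solver):

```python
import numpy as np

def random_hyperplane_cut(vectors, rng):
    """vectors: (n, k) array whose rows are the unit vectors x_i.
    Returns a boolean array: True for vertices on the positive side of a
    random hyperplane through the origin."""
    g = rng.standard_normal(vectors.shape[1])   # rotation-invariant direction
    return vectors @ g >= 0

def cut_size(edges, side):
    return sum(1 for i, j in edges if side[i] != side[j])

# Toy example: a 5-cycle, with vectors placed by hand on the unit circle so
# that adjacent vertices are far apart (a feasible SDP solution).
rng = np.random.default_rng(0)
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
angles = 2 * np.pi * np.arange(n) * 2 / 5
vectors = np.stack([np.cos(angles), np.sin(angles)], axis=1)

cuts = [cut_size(edges, random_hyperplane_cut(vectors, rng)) for _ in range(5000)]
print("average cut over random hyperplanes:", np.mean(cuts), "out of", len(edges))
```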

Next time, we will see how the SDP relaxation behaves on random graphs, but first let us see how it behaves on a large class of graphs.

4. Max Cut in Bounded-Degree Triangle-Free Graphs

Theorem 6 If {G= (V,E)} is a triangle-free graph in which every vertex has degree at most {d}, then

\displaystyle  MaxCut(G) \geq \left( \frac 12 +\Omega \left( \frac 1 {\sqrt d} \right) \right) \cdot |E|

Proof: Consider the following feasible solution for the SDP: we associate to each node {i} an {n}-dimensional vector {{\bf x}^{(i)}} such that {x^{(i)}_i = \frac 1{\sqrt 2}}, {x^{(i)}_j = -1/\sqrt{2deg(i)}} if {(i,j) \in E}, and {x^{(i)}_j = 0} otherwise. We immediately see that {||{\bf x}^{(i)} ||^2 = 1} for every {i} and so the solution is feasible.

For example, if we have a graph such that vertex 1 is adjacent to vertices 3 and 5:

coordinate: {1} | {2} | {3} | {4} | {5} | {\cdots}
{x^{(1)}: } {\frac 1{\sqrt 2}} | {0} | {-\frac{1}{\sqrt{2\deg(1)}}} | {0} | {-\frac{1}{\sqrt{2\deg(1)}}} | {\cdots}
{x^{(2)}: } {0} | {\frac 1{\sqrt 2}} | {0} | {0} | {0} | {\cdots}
{x^{(3)}: } {-\frac{1}{\sqrt{2\deg(3)}}} | {0} | {\frac 1{\sqrt 2}} | {0} | {0} | {\cdots}
{\vdots}
{x^{(n)}: } {0} | {0} | {0} | {0} | {0} | {\cdots}

Let us transform this SDP solution into a cut {(S,V-S)} using a random hyperplane.

We see that, for every edge {(i,j)}, the only nonzero contributions to the inner product come from coordinates {i} and {j} (since, {G} being triangle-free, {i} and {j} have no common neighbor), so

\displaystyle  \langle {\bf x}^{(i)}, {\bf x}^{(j)} \rangle = - \frac 1 {2\sqrt{d(i)}} - \frac 1 {2\sqrt{d(j)}} \leq - \frac 1 {\sqrt d}

The probability that {(i,j)} is cut by {(S,V-S)} is

\displaystyle  \frac { \arccos \left( \langle {\bf x}^{(i)}, {\bf x}^{(j)} \rangle \right ) }{\pi} \geq \frac { \arccos \left( - \frac 1 {\sqrt d} \right ) }{\pi}

and

\displaystyle  \frac { \arccos \left( - \frac 1 {\sqrt d} \right )}{\pi } = \frac 12 + \frac {\arcsin \left( \frac 1 {\sqrt d} \right) }{\pi} \geq \frac 12 + \Omega \left( \frac 1 {\sqrt d} \right)

so that the expected number of cut edges is at least {\left( \frac 12 + \Omega \left( \frac 1 {\sqrt d} \right) \right) \cdot |E|}. \Box
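A sketch (assuming numpy, with the 5-cycle as a small triangle-free example) that builds this feasible solution and checks that every vector has unit norm and that every edge has inner product at most {-1/\sqrt d}:

```python
import numpy as np

# Triangle-free example graph: the 5-cycle
n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
deg = np.zeros(n, dtype=int)
for i, j in edges:
    deg[i] += 1
    deg[j] += 1
d = deg.max()

# x^{(i)}_i = 1/sqrt(2),  x^{(i)}_j = -1/sqrt(2 deg(i)) for (i,j) in E, else 0
X = np.zeros((n, n))
for i in range(n):
    X[i, i] = 1 / np.sqrt(2)
for i, j in edges:
    X[i, j] = -1 / np.sqrt(2 * deg[i])
    X[j, i] = -1 / np.sqrt(2 * deg[j])

print("norms:", np.round(np.linalg.norm(X, axis=1), 6))         # all equal to 1
for i, j in edges:
    print(f"<x^({i}), x^({j})> = {X[i] @ X[j]:.4f}  (bound: {-1 / np.sqrt(d):.4f})")
```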

Beyond Worst Case Analysis: Lecture 3

Scribed by Keyhan Vakil

In which we complete the study of Independent Set and Max Cut in {G_{n,p}} random graphs.

1. Maximum Independent Set

Last time we proved an upper bound of {O\left( \frac 1p \log np \right)} to the probable value of the maximum independent set in a {G_{n,p}} random graph. This bound also holds if {p} is a function of {n}. There is a simple greedy algorithm which can be shown to achieve an independent set of size {\Omega(n/d)} where {d} is the average degree of the graph. For a {G_{n,p}} random graph, this gives us an independent set of size {\Omega(1/p)}. However, we will see how to specialize this analysis to sparse {G_{n,p}} random graphs, and close the remaining gap between the probable value and the greedy algorithm.

Consider the greedy algorithm below.

  • {S:= \emptyset}
  • for each {v\in V}
    • if {v} has no neighbors in {S} then {S:= S \cup \{ v \}}
  • return {S}
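A direct Python implementation of this greedy procedure (a sketch assuming numpy; the graph is sampled from {G_{n,p}} with arbitrary {n} and {d = np}):

```python
import numpy as np

def greedy_independent_set(adj):
    """adj: list of sets, adj[v] = neighbors of v. Scans vertices in order and
    adds v to S whenever v has no neighbor already in S."""
    S = set()
    for v in range(len(adj)):
        if adj[v].isdisjoint(S):
            S.add(v)
    return S

# Sample G_{n,p} with d = np a small constant (the sparse regime)
rng = np.random.default_rng(0)
n, d = 5000, 10
p = d / n
adj = [set() for _ in range(n)]
for i in range(n):
    for j in np.nonzero(rng.random(n - i - 1) < p)[0]:
        u = i + 1 + int(j)
        adj[i].add(u)
        adj[u].add(i)

S = greedy_independent_set(adj)
print(f"greedy independent set size: {len(S)}")
print(f"benchmark scale (n/d) * ln(d) ~ {n / d * np.log(d):.0f}")
```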

1.1. First attempt

We might try to model our analysis of this algorithm based on our discussion from Lecture~2.

To wit, let {R} be the set of vertices not in {S} which have no neighbors in {S}. Let {R_i} be the size of {R} when {S} contains {i} vertices. If {R_k = 0}, then our algorithm outputs an independent set of size {k}. Therefore we can determine the expected size of the algorithm’s output (up to a constant factor) by determining {k} such that {\mathop{\mathbb E}[R_k] = O(1)}.

Now we determine {\mathop{\mathbb E}[R_{i+1} \mid R_i]}. Each vertex is connected to the {(i+1)}-th vertex added to {S} with probability {p}, so of the {R_i} vertices we expect a {(1-p)} fraction to remain unconnected to all the vertices in {S}. This gives us that {\mathop{\mathbb E}[R_{i+1} \mid R_i] = (1-p)R_i}, and by induction {\mathop{\mathbb E}[R_k] = (1-p)^k n}.

Let {k} be such that {\mathop{\mathbb E}[R_k] = 1}. Then:

\displaystyle  \mathop{\mathbb E}[R_k] = (1-p)^k n = 1 \implies k = \log_{\frac 1{1-p}} n \approx \frac 1p \ln n

We conclude that our independent set has expected size {\Theta(\frac1p \log n)}. However if we take {p = \Theta(1/n)}, that would lead us to believe that we could get an independent set of size {\Theta(n \log n)} in a graph with only {n} vertices, which is impossible.

The error is that {\mathop{\mathbb E}[R_{i+1} \mid R_i]} should be {(1-p)(R_i - 1)}, not {(1-p)R_i}. Note that once we add the {(i+1)}th vertex to {S}, it can no longer be in {R} by definition. When {p} is a constant, the difference is negligible, but when {p} is small then the difference becomes more significant.

It is possible to salvage this analysis, but the result is less elegant. Instead we will now present a different analysis, which will also let us conclude more about higher moments as well.

1.2. Analysis of the greedy algorithm

To analyze the algorithm, consider the following random variables: let {t_i} be the number of for-loop iterations between the time the {i}-th element is added to {S} and the time the {(i+1)}-th element is added to {S}. We leave {t_i} undefined if the algorithm terminates with a set {S} of size less than {i+1}. Thus the size of the independent set found by the algorithm is the largest {i} such that {t_{i-1}} is defined. Consider the following slightly different probabilistic process: in addition to our graph over {n} vertices {\{1,\ldots , n \}}, we also consider a countably infinite number of other vertices {n+1,n+2,\ldots}. We sample an infinite super-graph of our graph over this larger vertex set, so that each possible edge has probability {p} of being generated.

We continue to run the greedy algorithm for every vertex of this infinite graph, and we call {t_i} the (now, always defined) number of for-loop iterations between the {i}-th and the {(i+1)}-th time that we add a node to {S}. In this revised definition, the size of the independent set found by algorithm in our actual graph is the largest {k} such that {t_0 + t_1 + \cdots + t_{k-1} \leq n}.

Now we will reason about the distribution of {t_i}. Say that we have {i} vertices in {S} and we are trying to determine if we should add some vertex {v} to {S}. Note that the probability of {v} being disconnected from all of {S} is {(1-p)^i}. So we add a vertex at each iteration with probability {(1-p)^i}, which shows that {t_i} is geometrically distributed with success probability {(1-p)^i}.

Based on this, we can find the expected value and variance of our sum from before

\displaystyle \mathop{\mathbb E} \left[ t_0 + t_1 + \cdots t_{k-1} \right] = \frac { \frac 1 {(1-p)^k} - 1 }{\frac 1 {1-p} - 1} \leq \frac { \frac 1 {(1-p)^k}}{\frac 1 {1-p} - 1} = \frac 1 {p\cdot (1-p)^{k-1}}

and likewise

\displaystyle  \begin{array}{rcl}  \mathop{\bf Var}[t_0 + t_1 + \cdots t_{k-1}] & \leq & \sum_{i=0}^{k-1} \frac 1 {(1-p)^{2i}} \\ &= & \frac { \frac 1 {(1-p)^{2k}} - 1 }{\frac 1 {(1-p)^2} - 1} \\ & \leq & \frac 1 {(1 - (1-p)^2 ) \cdot (1-p)^{2k-2 } } \\ & \leq & \frac 1 {p \cdot (1-p)^{2k - 2} } \\ & = & p \left( \mathop{\mathbb E}[t_0 + \cdots + t_{k-1}] \right)^2. \end{array}

We want to choose {k} so that the sum is at most {n} with high probability. Let

\displaystyle  k = \log_{\frac {1}{1-p}} \frac {pn}2 \approx \frac 1p \ln pn .

This makes the expected value of the sum {\le n/2} and the standard deviation {\le \sqrt{p}n / 2}. Thus, if {p(n) \rightarrow 0} sufficiently fast, the greedy algorithm has a {1-o(1)} probability of finding an independent set of size {\Omega( p^{-1} \log pn ) = \Omega\left( \frac nd \log d \right)}, where {d := np} is a measure of the average degree.

1.3. Certifiable upper bound

We now derive a polynomial time computable upper bound certificate for maximum independent set in {G_{n,p}}. We use the following lemma without proof. Note its similarity to Lemma~2 from Lecture~1.

Lemma 1 If {p = p(n) \ge \frac {\log n}n}, {G} is sampled from {G_{n,p}}, {A} is the adjacency matrix of {G}, and {J} is the matrix of all ones, then there is a {1-o(1)} probability that

\displaystyle  \lVert A - p J \rVert \leq O( \sqrt {pn })

Since {A - pJ} is a real symmetric matrix its spectral norm can be computed as:

\displaystyle  \lVert A - pJ \rVert = \max_{{\bf x} \neq {\bf 0}} \frac{|{\bf x}^T(A - pJ){\bf x}|}{{\bf x}^T {\bf x}} \;.

If {S} is an independent set of size {k}, then {{\bf 1}_S^T A {\bf 1}_S = 0}, {{\bf 1}_S^T J {\bf 1}_S = k^2}, and {{\bf 1}_S^T {\bf 1}_S = k}, so that

\displaystyle  \begin{array}{rcl}  \lVert A - pJ \rVert &\geq & \frac{|{\bf 1}_S^T(A - pJ){\bf 1}_S|}{{\bf 1}_S^T {\bf 1}_S} \\ &= & pk. \end{array}

This bound holds for any independent set, so it also holds for the largest one. If we denote by {\alpha(G)} the size of the largest independent set in {G}, we have that

\displaystyle  \alpha(G) \leq \frac 1p \lVert A - p J \rVert .

For a {G_{n,p}} random graph, the above upper bound is {O(\sqrt{n/p}) = O(n/\sqrt d)} with high probability.
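A sketch of this certificate (assuming numpy; {n} and {d} are arbitrary choices with {p \ge \log(n)/n}):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1500, 30
p = d / n

upper = np.triu(rng.random((n, n)) < p, k=1)
A = (upper | upper.T).astype(float)

# Certified upper bound on the maximum independent set: ||A - pJ|| / p
norm = np.max(np.abs(np.linalg.eigvalsh(A - p * np.ones((n, n)))))
print(f"certified alpha(G) <= {norm / p:.0f}")
print(f"greedy lower bound scale (n/d) log d ~ {n / d * np.log(d):.0f}")
print(f"n / sqrt(d) ~ {n / np.sqrt(d):.0f}")
```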

2. Max Cut

We will now reconsider Max Cut for the general case {G_{n,p}}. In Lecture~2, we dealt with the special case of {p=\frac12}. Unlike maximum independent set, our arguments for the case {p=\frac12} apply to Max Cut without much modification.

2.1. High probability upper bound

Let {G} be a random graph from {G_{n,p}}, and define {d := pn} as a measure of its average degree. We will prove that the size of a maximum cut of {G} is at most {dn/4 + O(\sqrt d n)} with high probability. The proof of this statement is nearly identical to the version in Lecture~2, where it was presented for the case {p=\frac12}. We know that the expected value of a cut {(S, V-S)} is {p \cdot |S| \cdot |V-S| \le dn / 4}. By a Chernoff bound, the probability that any particular cut exceeds its expectation by an additive term {\epsilon d n} is at most {e^{-\Omega(\epsilon^2 dn)}}. By taking {\epsilon = c/\sqrt{d}} for a suitable constant {c} and taking a union bound over all {2^n} possible cuts {S}, we have that every cut has value at most {dn / 4 + O(\sqrt d n)} with probability {1 - 2^{-\Omega(n)}}.

2.2. Greedy algorithm

Consider the greedy algorithm

  • {A:= \emptyset}
  • {B:= \emptyset}
  • for each {v\in V}
    • if {v} has more neighbors in {B} than in {A} then {A:= A \cup \{ v \}}
    • else {B:= B \cup \{ v\}}
  • return {(A,B)}.

Label {V = \{ 1,\ldots,n \}}. Let {A_i} and {B_i} be the sets {A} and {B} when vertex {i} is considered in the for-loop. For the purpose of analysis, we delay the random decisions in {G} until a vertex is considered. In particular, we delay the choice of which of {1, 2, \ldots, i - 1} are neighbors of {i} until vertex {i} is considered. Note that no edge needs to be considered twice, and so we can treat each one as an independent biased coin flip.

Let {a_i} and {b_i} be the number of neighbors of {i} in {A_i} and {B_i} respectively. We can show that {\frac 12 |a_i - b_i| = \max(a_i, b_i) - \frac12 (a_i + b_i)}, and so {\frac 12 \sum_i |a_i - b_i|} is the gain our algorithm achieves over cutting half the edges.

Now {|a_i - b_i|} has expectation {\Omega( \sqrt {p i} )} and variance {O(p i)}. Adding over all {i}, the sum of the differences has mean {\Omega( n \sqrt{pn} )} and variance {O(pn^2)}; since the standard deviation is of lower order than the mean, the gain is {\Omega( n \sqrt {pn}) = \Omega( n \sqrt d)} with probability {1-o(1)}. The value of cutting half the edges is approximately {dn / 4}. This gives a final value of {dn/4 + \Omega(n\sqrt d)} w.h.p. as stated.

2.3. Certifiable upper bound

Again, we will derive a certifiable upper bound by looking at the spectral norm. If {(S,V-S)} is a cut with value {\frac {dn}4 + C}, then we have

\displaystyle  {\bf 1}_S^T A {\bf 1}_{V-S} = \frac {dn}4 + C

\displaystyle  {\bf 1}_S^T p J {\bf 1}_{V-S} = p \cdot |S| \cdot |V-S| \leq p \cdot \frac {n^2} 4 = \frac {dn}4

\displaystyle  \lVert {\bf 1}_S \rVert \cdot \lVert {\bf 1}_{V-S} \rVert = \sqrt { |S| \cdot |V-S| } \leq \sqrt { \frac {n^2}4 }

so

\displaystyle  C \leq \lVert A - pJ \rVert \cdot \lVert {\bf 1}_S \rVert \cdot \lVert {\bf 1}_{V-S} \rVert \leq \frac n2 \cdot \lVert A - pJ \rVert .

This means that, in every graph, the maximum cut is upper bounded by

\displaystyle  \frac {dn}4 + \frac n2 \left\lVert A - \frac dn J \right\rVert

which if {d \ge \log n} is with high probability at most {\frac {dn}4 + O( n \sqrt d)} (by Lemma~1).

3. Conclusion

We conclude with the following table, which summarizes our results for a random graph sampled from {G_{n, d/n}}.

  • Independent Set: expected value {O\left(\frac nd \log d\right)}; greedy algorithm {\Omega\left(\frac nd \log d\right)} w.h.p.; certifiable upper bound {O\left(\frac n{\sqrt{d}} \right)} w.h.p.*
  • Max Cut: expected value {\frac{dn}4 + O(n \sqrt d)}; greedy algorithm {\frac {dn}4 + \Omega(n \sqrt d)} w.h.p.; certifiable upper bound {\frac {dn} 4 + O(n \sqrt d)} w.h.p.*

* Note that both certifiable upper bounds require {d \ge \log n}.

Both greedy algorithms perform very well in comparison to the probable value. In Max~Cut, our greedy algorithm is particularly strong, matching our certifiable upper bound up to a lower order term. This supports one of our major theses: while greedy algorithms exhibit poor worst-case performance, they tend to do well over our given distribution.

Beyond Worst Case Analysis: Lecture 2

Scribe: Mahshid Montazer

In this lecture, we study the Max Cut problem in random graphs. We compute the probable value of its optimal solution, we give a greedy algorithm which is nearly optimal on random graphs and we compute a polynomial time upper bound certificate for it using linear algebra methods. We also study the problem of Maximum Independent Set in random graphs and we compute an upper bound to the probable value for its optimal solution.

1. Max Cut

Definition 1 Max Cut: In an unweighted graph {G=(V,E)}, a cut is defined as a partition of its vertices into two sets {V_1} and {V_2}. Let {E(V_1, V_2)} be the size of the cut {(V_1, V_2)}, that is, the number of edges with one endpoint in {V_1} and one endpoint in {V_2}. Max Cut is the problem of finding a cut of largest size.

To give a clear example, in every bipartite graph, the bipartition is a maximum cut. It is easy to show that the size of the maximum cut is at least half of the number of edges of the graph. One question that arises here is how much more than half of the edges we can cut. The answer is: not that much in random graphs. We will show this claim in the following section.

2. Probable Value of Max Cut Optimal Solution

In this section, we compute the probable value of Max Cut optimal solution in random graphs. Our result is for samples of {G_{n,\frac{1}{2}}}, but the analysis will generalize to {G_{n,p}}.

Lemma 2 For every fixed cut {(S,V-S)}, {\mathop{\mathbb E} [E(S, V\setminus S)] \leq \frac{n^2}{8}}.

Proof: {\mathop{\mathbb E} [E(S, V\setminus S)] = \left\vert S \right\vert \left\vert V\setminus S \right\vert \frac{1}{2} \leq \frac{n^2}{8}.} \Box

Lemma 3 {\mathop{\mathbb P} [E(S, V\setminus S) \geq \frac{n^2}{8} + \epsilon \frac{n^2}{4}] \leq e^{-\Omega(\epsilon^2 n^2)}} where {0 \leq \epsilon \leq \frac{1}{2}}.

Proof: The proof is by applying Chernoff bounds on the result of lemma 2. \Box

Lemma 4 There is a constant {c>0} such that

\displaystyle  \mathop{\mathbb P} [\exists (S,V \setminus S) \mid E(S,V \setminus S) \geq \frac{n^2}{8} + \epsilon \frac{n^2}{4}] \leq 2^{-n}

where {\epsilon = \frac{c}{\sqrt{n}}} and the probability is taken over the choice of {G=(V,E)} from the distribution {G_{n,\frac 12 }}.

Proof:

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P} [\exists (S,V \setminus S) \mid E(S,V \setminus S) \geq \frac{n^2}{8} + \epsilon \frac{n^2}{4}] & \leq & 2^n \cdot e^ {-\Omega(\epsilon^2 n^2)} \\  & \leq & 2^{-n}. \end{array}

for an appropriate choice of {c}. \Box

The above lemma clearly leads us to the following theorem.

Theorem 5 There is a constant {c} such that w.h.p. Max Cut in {G_{n,\frac{1}{2}}} is of size at most {\frac{n^2}{8} + c \cdot n^{1.5}.}

Thus, we showed that in {G_{n,1/2}}, the probable value of Max Cut is at most {\frac{n^2}{8} + c \cdot n^{1.5}}.

3. Greedy Algorithm for Max Cut

Consider the following greedy algorithm for Max Cut:

  • {A \leftarrow \emptyset , B \leftarrow \emptyset}
  • for {v \in V}
    • if {v} has more neighbors in {A} than in {B}, then {B \leftarrow B \cup \{v\}}
    • else {A \leftarrow A \cup \{v\}}
  • return {A} and {B}

The above algorithm can be applied to any graph, but we will analyze it on random graphs. A naive analysis of the algorithm guarantees that our greedy algorithm cuts at least half of the edges, giving us an approximation ratio of 2. The reason is that at each step, we cut at least half of the edges between the vertex being processed and the previously processed vertices. However, a more careful analysis of the algorithm shows that it is near-optimal for random graphs. Below, we prove our claim for {G_{n,\frac{1}{2}}}.
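A direct Python implementation of the algorithm (a sketch assuming numpy; the graph is sampled from {G_{n,\frac 12}}):

```python
import numpy as np

def greedy_max_cut(A):
    """A: 0/1 adjacency matrix. Processes vertices in order, placing each one
    on the side containing fewer of its already-placed neighbors."""
    n = A.shape[0]
    side = np.zeros(n, dtype=bool)       # False = side A, True = side B
    for v in range(1, n):
        nbrs = A[v, :v].astype(bool)
        in_A = np.count_nonzero(nbrs & ~side[:v])
        in_B = np.count_nonzero(nbrs & side[:v])
        side[v] = in_A > in_B            # more neighbors in A -> put v in B
    return side

rng = np.random.default_rng(0)
n = 1000
upper = np.triu(rng.random((n, n)) < 0.5, k=1)
A = (upper | upper.T).astype(int)

side = greedy_max_cut(A)
cut = np.count_nonzero(A[side][:, ~side])    # edges with one endpoint per side
m = A.sum() // 2
print(f"|E| = {m}, |E|/2 = {m // 2}, greedy cut = {cut}, gain = {cut - m // 2}")
```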

Lemma 6 With high probability over the choice of {G} from {G_{n,\frac{1}{2}}}, the greedy algorithm finds a cut of size {\frac {n^2}8 + \Omega(n^{1.5})}.

Proof: Let {G(V,E) \sim G_{n,\frac{1}{2}}} be the given graph and let {v_1, v_2 , \cdots , v_n} be the order in which we process the vertices. Note that at the time of processing {v_i} {(1 \leq i <n)}, we do not need to know the edges that connect {v_i} to any vertex {v_j} {(j>i)}. Let { a_i = |A|} and {b_i = |B|} be the sizes of the sets {A} and {B} before processing {v_i}, respectively. Although {G} is given before we run the algorithm, for the sake of the analysis we can assume that we are building it on the fly, while processing each of the vertices. Remember that each edge of the graph exists independently with probability {\frac{1}{2}}. For deciding where to put {v_i}, we generate {a_i} random bits and call their sum {X_i}; we also generate {b_i} random bits and call their sum {Y_i}. We put {v_i} in set {A} (respectively, {B}) if {X_i \leq Y_i} (respectively, {Y_i < X_i}). Note that the more balanced {A} and {B} get, the worse it gets for the analysis. Also, note that the number of edges that the algorithm cuts beyond half of the edges is:

\displaystyle \frac 12 \sum_{1\leq i \leq n} {|X_i-Y_i|} = E(A, B) -\frac{|E|}{2}.

We know that

\displaystyle \mathop{\mathbb E}[X_i-Y_i] = \frac{a_i-b_i}{2}.

Note that

\displaystyle \mathop{\mathbb E}[|X_i - Y_i|] = \Omega(\sqrt{i})

and

\displaystyle \mathop{\bf Var}(|X_i - Y_i|) = O(i).

Thus, {\sum_{1\leq i \leq n} {|X_i-Y_i|} } has mean {\Omega(n^{1.5})} and standard deviation {O(n)}, so with probability {1-o(1)} we have:

\displaystyle  \sum_{1\leq i \leq n} {|X_i-Y_i|} \geq \Omega(n^{1.5}).

\displaystyle \Rightarrow E(A,B) \geq \frac{n^2}{8} + \Omega(n^{1.5}).

\Box

4. Polynomial Time Upper Bound for Max Cut

In this section, we find polynomial time upper bound certificates for Max Cut in random graphs using linear algebra techniques.

Lemma 7 Let {G=(V,E)} be a graph, {A} be its adjacency matrix, {J} be the matrix all whose entries are 1 and {(S, V\setminus S)} be the Max Cut of {G}. Then

\displaystyle  E(S, V \setminus S) \leq \frac{n^2}{8} + \frac n2 || A - J/2 ||

Proof: we have:

\displaystyle  \begin{array}{rcl}  E(S, V \setminus S) - \frac{n^2}{8} & \leq & {\bf 1}^T_S \cdot ( A - J/2) \cdot {\bf 1}_{V\setminus S} \\ & \leq & || A - J/2 || \cdot || {\bf 1}_S || \cdot ||{\bf 1}_{V\setminus S} || \\ & \leq & || A - J/2 || \cdot \sqrt{|S|} \cdot \sqrt{|V \setminus S|} \\ & \leq & || A - J/2 || \cdot \frac{n}{2}\\ \end{array}

\Box

Recall that, with high probability over the choice of a graph {G} from {G_{n,\frac 12}}, if {A} is the adjacency matrix of {G} then we have {||A - J/2|| \leq O(\sqrt n)}.

We conclude that, with high probability over the choice of {G} from {G_{n,\frac 12}}, we can find in polynomial time a certificate that the max cut optimum of {G} is at most {\frac {n^2} 8 + O(n^{1.5})}.

5. Maximum Independent Set

In this section, we discuss the Maximum Independent Set problem for {G_{n,p}} (especially {G_{n,\frac{1}{2}}}) and we show its close connection with Max Clique problem. Finally, we compute its optimal solution’s probable value.

Definition 8 Maximum Independent Set: In a graph {G(V,E)}, an independent set is a set of vertices that are mutually disconnected. A Maximum Independent Set in {G} is an independent set of largest possible size. The Maximum Independent Set problem is the problem of finding such a set.

Note that the Maximum Independent Set in {G_{n,p}} corresponds to the Maximum Clique in {G_{n,1-p}}. Thus, for {p = \frac{1}{2}}, everything that we argued for Max Clique is usable for Maximum Independent Set as well.

In this section, we compute an upper bound to the probable value of Maximum Independent Set’s optimal solution in {G_{n,p}}.

Fix a set {S \subset V} of size {k}. We have

\displaystyle \mathop{\mathbb P} [S \text{ is an independent set in } G] = (1-p)^{\binom{k}{2}}

where the probability is over the choice of {G\sim G_{n,p}}.

The following lemma holds.

Lemma 9 {\mathop{\mathbb P}[\exists \text{ Independent Set of size } k] \leq e^{-\frac{k}{2} \left( (k-1) \cdot \ln{\frac{1}{1-p}} - 2\ln{\frac{n}{k}} \right) }}

Proof:

\displaystyle  \begin{array}{rcl}  \mathop{\mathbb P}[\exists \text{ Independent Set of size } k] & \leq & \mathop{\mathbb E}[\text{\#Independent Sets of size k}] \\ &= & \binom{n}{k} \cdot (1-p)^{\binom{k}{2}} \\ & \leq & \left(\frac{n}{k} \right)^k \cdot (1-p)^{\frac{k^2}{2} - \frac{k}{2}} \\ & =& e^{k \cdot \ln{\frac{n}{k}} - \left( \frac{k^2}{2} - \frac k2 \right) \cdot \ln{\frac{1}{1-p}}} \\ & = & e^{-\frac{k}{2} ((k-1) \cdot \ln{\frac{1}{1-p}} - 2\ln{\frac{n}{k}})}.  \end{array}

\Box

Now, for which values of {k} does this bound guarantee that, with high probability, there is no independent set of size {k}? Note that the bound of Lemma 9 goes to 0 when {k \geq 2\log_\frac{1}{1-p} \frac{n}{k} +2}.

A sufficient condition for {k \geq 2\log_\frac{1}{1-p} \frac{n}{k} +2} is to have {k = 2\log_\frac{1}{1-p} n +2}, showing us that there is a high probability that the maximum independent set in {G_{n,p}} is at most {O\left ( \log_\frac{1}{1-p} n \right) = O \left ( \frac 1p \log n \right)}. A more careful bound is that we can have {k \geq 2\log_\frac{1}{1-p} \frac{n}{k} +2} provided, say, {k \geq 3 \log_{\frac 1{1-p}} np + 100}, and so with high probability the maximum independent set in {G_{n,p}} is at most {O \left ( \frac 1p \log p n \right)}. If we call {d=pn}, then the bound is {O \left ( \frac nd \log d \right)}.

CS294 Lecture 5: Cheeger-type Inequalities for λn

In which we prove an analog of Cheeger’s inequalities for the largest Laplacian eigenvalue and we show how to use it to develop a spectral approximation algorithm for Max Cut.

1. Cheeger-type Inequalities for {\lambda_n}

Let {G=(V,E)} be an undirected graph (not necessarily regular), {D} its diagonal matrix of degrees, {A} its adjacency matrix, {L = I - D^{-1/2} A D^{-1/2}} its normalized Laplacian matrix, and {0 = \lambda_1 \leq \cdots \leq \lambda_n \leq 2} be the eigenvalues of {L}, counted with multiplicities and listed in non-decreasing order.

In Handout 2, we proved that {\lambda_k = 0} if and only if {G} has at least {k} connected components and {\lambda_n = 2} if and only if there is a connected component of {G} (possibly, all of {G}) that is bipartite.

A special case of the former fact is that {\lambda_2 = 0} if and only if the graph is disconnected, and the Cheeger inequalities give a “robust” version of this fact, showing that {\lambda_2} can be small if and only if the expansion of the graph is small. In these notes we will see a robust version of the latter fact; we will identify a combinatorial parameter that is zero if and only if the graph has a bipartite connected component, and it is small if and only if the graph is “close” (in an appropriate sense) to having a bipartite connected component, and we will show that this parameter is small if and only if {2-\lambda_n} is small.

Recall that

\displaystyle  2- \lambda_n = \min_{{\bf x} \in {\mathbb R}^n - \{ {\bf 0} \}} \ \ \frac { \sum_{\{ u,v \} \in E} (x_u + x_v)^2 } {\sum_{v\in V} d_v x_v^2 }

We will study the following combinatorial problem, which formalizes the task of finding an “almost bipartite connected component:” we are looking for a non-empty subset of vertices {S\subseteq V} (we allow {S=V}) and a bipartition {(A,B)} of {S} such that there is a small number of “violating edges” compared to the number of edges incident on {S}, where an edge {\{ u,v\}} is violating if it is in the cut {(S,V-S)}, if it has both endpoints in {A}, or if it has both endpoints in {B}. (Note that if there are no violating edges, then {S} is a bipartite connected component of {G}.)

It will be convenient to package the information about {A,B,S} as a vector {{\bf y} \in \{ -1,0,1 \}^n}, where the non-zero coordinates correspond to {S}, and the partition of {S} is given by the positive versus negative coordinates. We define the “bipartiteness ratio” of {{\bf y}} as

\displaystyle  \beta( {\bf y} ) := \frac { \sum_{\{ u,v \} \in E} |y_u + y_v| } {\sum_{v\in V} d_v| y_v| }

Note that in the numerator we have the number of violating edges, with edges contained in {A} or in {B} counted with a weight of 2, and edges from {S} to {V-S} counted with a weight of 1. In the denominator we have the sum of the degrees of the vertices of {S} (also called the volume of {S}, and written {vol(S)}) which is, up to a factor of 2, the number of edges incident on {S}.

(Other definitions would have been reasonable, for example in the numerator we could just count the number of violating edges without weights, or we could have the expression {\sum_{\{ u,v \} \in E} (y_u + y_v)^2 }. Those choices would give similar bounds to the ones we will prove, with different multiplicative constants.)

We define the bipartiteness ratio of {G} as

\displaystyle  \beta(G) = \min_{{\bf y} \in \{-1,0,1\}^n - \{ {\bf 0} \} } \ \ \beta({\bf y} )
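For concreteness, a small sketch (assuming numpy) that evaluates {\beta({\bf y})} for a given graph and a given {{\bf y} \in \{-1,0,1\}^n} (the example graph and the choice of {{\bf y}} are arbitrary):

```python
import numpy as np

def bipartiteness_ratio(edges, degrees, y):
    """y: array with entries in {-1, 0, +1}; nonzero entries define S, signs give (A, B).
    Returns beta(y) = sum_{(u,v) in E} |y_u + y_v| / sum_v d_v |y_v|."""
    numerator = sum(abs(y[u] + y[v]) for u, v in edges)
    denominator = np.sum(degrees * np.abs(y))
    return numerator / denominator

# Example: a 4-cycle plus a pendant edge; take S = the 4-cycle, split alternately
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
n = 5
degrees = np.zeros(n)
for u, v in edges:
    degrees[u] += 1
    degrees[v] += 1

y = np.array([1, -1, 1, -1, 0])    # S = {0,1,2,3}, A = {0,2}, B = {1,3}
print("beta(y) =", bipartiteness_ratio(edges, degrees, y))   # one violating edge
```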

We will prove the following analog of Cheeger’s inequalities:

\displaystyle  \frac {2-\lambda_n}{2} \leq \beta(G) \leq \sqrt { 2 \cdot (2 - \lambda_n) }


On the Unique Games Conjecture (part 1)

[I am preparing a survey talk on Unique Games for a mathematics conference, and writing a survey paper for a booklet that will be distributed at the conference. My first instinct was to write a one-line paper that would simply refer to Subhash’s own excellent survey paper. Fearing that I might come off as lazy, I am instead writing my own paper. Here is part 1 of some fragments. Comments are very welcome.]

Khot formulated the Unique Games Conjecture in a remarkably influential 2002 paper. In the subsequent eight years, the conjecture has motivated and enabled a large body of work on the computational complexity of approximating combinatorial optimization problems (the original context of the conjecture) and on the quality of approximation provided by “semidefinite programming” convex relaxations (a somewhat unexpected byproduct). Old and new questions in analysis, probability and geometry have played a key role in this development. Representative questions are:

  • The central limit theorem explains what happens when we sum several independent random variables. What happens if, instead of summing them, we apply a low degree polynomial function to them?
  • In Gaussian space, what is the body of a given volume with the smallest boundary?
  • What are the balanced boolean functions whose value is most likely to be the same when evaluated at two random correlated inputs?
  • What conditions on the Fourier spectrum of a function of several variables imply that the function essentially depends on only a few variables?
  • With what distortion is it possible to embed various classes of finite metric spaces into L1?


Max Cut-Gain and the Smallest Eigenvalue

In June, I wrote about my work on using spectral partitioning to approximate Max Cut.

I have now posted a revised paper with a couple new things.

One is an improved analysis due to Moses Charikar, of the algorithm described in the June paper. Moses shows that if one starts from a graph in which the optimum cuts a 1-\epsilon fraction of edges, then the algorithm cuts at least a 1-4\sqrt \epsilon + 8\epsilon fraction of edges (and also at least half of the edges). Cutting more than a 1-\frac 2 \pi \sqrt \epsilon + o_\epsilon(1) fraction of edges is Unique-Games-hard. Optimizing the fraction

\displaystyle \frac{ \max \{ \ \frac 12  \ , \ (1-4\sqrt \epsilon + 8\epsilon) \ \} }{1-\epsilon}

we see that the approximation ratio of the algorithm is always at least .531.

The second new result answers a question raised by an anonymous commenter in June: what about Max Cut Gain?

Max Cut and the Smallest Eigenvalue

In the Max Cut problem, we are given an undirected graph G=(V,E) and we want to find a partition (L,\bar L) of the set of vertices such that as many edges as possible have one endpoint in L and one endpoint in \bar L, and are hence cut by the partition.

It is easy, as recognized since the 1970s, to find a partition that cuts half of the edges and that, thus, is at least half as good as an optimal solution. No approximation better than 1/2 was known for this problem, until Goemans and Williamson famously proved that one could achieve a .878… approximation using semidefinite programming (SDP).

No other approximation algorithm achieving an approximation asymptotically better than 1/2 is known, and it seems that a fundamental difficulty is the following. Suppose we prove that a certain algorithm achieves approximation > 51\%. Then, given a graph in which the optimum is, say, < 50.4 \%, the algorithm and its analysis must provide a certificate that the optimum cut in the given graph is < 50.4/51 < 99\%, and there is no general technique to prove upper bounds to the Max Cut optimum of a general graph other than Semidefinite Programming. (And see here and here for negative results showing that large classes of linear programming relaxations are unable to give such certificates.)

Spectral techniques can prove upper bounds to Max Cut in certain cases (and can be seen as special cases of the upper bounds provided by the Goemans-Williamson relaxation).

In the simplified case in which G=(V,E) is a d-regular graph, let A be the adjacency matrix of G and \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n be the eigenvalues of A; then it is easy to show that

(1) \ \ \ \displaystyle maxcut(G) \leq \frac 12  + \frac 12 \cdot \frac {|\lambda_n|}{d}

where maxcut(G) is the fraction of edges cut by an optimal solution. Unfortunately (1) does not quite have an approximate converse: there are graphs where |\lambda_n|=d but maxcut(G) = \frac 12 + o_d(1).
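As a quick illustration of (1) (a sketch assuming numpy; cycles are used because they are 2-regular and their maximum cuts are easy to determine):

```python
import numpy as np

def maxcut_spectral_bound(A, d):
    """Right-hand side of (1): 1/2 + |lambda_n| / (2d) for a d-regular graph
    with adjacency matrix A."""
    lam_n = np.linalg.eigvalsh(A).min()
    return 0.5 + abs(lam_n) / (2 * d)

for n in (6, 7):                       # even cycle (bipartite) vs. odd cycle
    A = np.zeros((n, n))
    for i in range(n):
        A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
    true_fraction = 1.0 if n % 2 == 0 else (n - 1) / n
    print(f"C_{n}: spectral bound = {maxcut_spectral_bound(A, 2):.3f}, "
          f"true maxcut fraction = {true_fraction:.3f}")
```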

The following fact, however, is always true and well known:

  • |\lambda_n|=d if and only if G contains a bipartite connected component.

Is there an “approximate” version of the above statement characterizing the cases in which d-|\lambda_n| is small? Surprisingly, as far as I know the question had not been considered before.

For comparison, the starting point of the theory of edge expansion is the related fact

  • \lambda_2=d if and only if G is disconnected.

Which can be rephrased as:

  • \lambda_2=d if and only if there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S)=0.

Cheeger’s inequality characterizes the case in which d-\lambda_2 is small:

  • If there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S) \leq \epsilon \cdot d \cdot |S|, then \lambda_2 \geq d\cdot (1-2\epsilon);
  • If \lambda_2 \geq d\cdot (1-\epsilon) then there is a non-empty S\subseteq V, |S| \leq |V|/2 such that edges(S,V-S) \leq \sqrt{2 \epsilon} \cdot d \cdot |S|.

For a subset S\subseteq V, and a bipartition L,R=S-L of S, we say that an edge (i,j) fails [to be cut by] the bipartition if (i,j) is incident on S but it is not the case that one endpoint is in L and one endpoint is in R. (This means that either both endpoints are in L, or both endpoints are in R, or one endpoint is in S and one endpoint is not in S.) Then we can express the well-known fact about \lambda_n as

  • |\lambda_n|=d if and only if there is S\subseteq V and a bipartition of S with zero failed edges.

In this new paper I prove the following approximate version

  • If there is a non-empty S\subseteq V, and a bipartition of S with at most \epsilon \cdot d \cdot |S| failed edges, then |\lambda_n| \geq d\cdot (1-4\epsilon);
  • If |\lambda_n| \geq d\cdot (1-\epsilon), then there is a non-empty S\subseteq V, and a partition of S with at most \sqrt{2 \epsilon} \cdot d \cdot |S| failed edges.

The following notation makes the similarity with Cheeger’s inequality clearer. Define the edge expansion of a graph G as

\displaystyle h(G) = \min_{S\subseteq V. \ |S| \leq \frac {|V|}{2} } \frac {edges(S,V-S)} {d|S|}

Let us define the bipartiteness ratio of G as

\displaystyle \beta(G) = \min_{S, \ L \subseteq S, \ R=S-L} \frac{edges(L) + edges(R) + edges(S,V-S)}{d|S|}

that is, as the minimum ratio between failed edges of a partition of a set S over d|S|.

Then Cheeger’s inequality gives

\displaystyle \frac 12 \cdot \frac{d-\lambda_2}{d}  \leq h(G) \leq \sqrt{2 \cdot \frac{d-\lambda_2}{d} }

and our results give

\displaystyle \frac 14 \cdot \frac{d-|\lambda_n|}{d} \leq \beta(G) \leq \sqrt{2 \cdot \frac{d-|\lambda_n|}{d}}

This translates into an efficient algorithm that, given a graph G such that maxcut(G) \geq 1-\epsilon, finds a set S and a bipartition of S such that at least a 1- 4\sqrt{\epsilon} fraction of the edges incident on S are cut by the bipartition. Removing the vertices in S and continuing recursively on the residual graph yields a .50769… approximation algorithm for Max Cut. (The algorithm stops making recursive calls, and uses a random partition, when the partition of S found by the algorithm has too many failed edges.)

The paper is entirely a by-product of the ongoing series of posts on edge expansion: the question of the relation between spectral techniques and max cut was asked by a commenter, and the probabilistic view of the proof of Cheeger’s inequality that I wrote up in this post was very helpful in understanding the gap between \lambda_n and -d.