Search vs Decision vs Certification for Planted Problems

Edited 5/7/2018. Thanks to Sam Hopkins for several corrections and suggestions.

I am revising my notes from the course on “better-than-worst-case” analysis of algorithms. With the benefit of hindsight, in this post (and continuing in future posts) I would like to review again how one applies spectral methods and semidefinite programming to problems that involve a “planted” solution, and what is the role of concentration results for random matrices in the analysis of such algorithms.

Continue reading

Beyond Worst-Case Analysis: Lecture 11

Scribed by Neng Huang

In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximation reconstruction of the stochastic block model.

1. A Brief Review of the Model

First, let’s briefly review the model. We have a random graph {G = (V, E)} with an unknown partition of the vertices into two equal parts {V_1} and {V_2}. Edges across the partition are generated independently with probability {q}, and edges inside the partition are generated independently with probability {p}. To abbreviate the notation, we let {a = pn/2}, which is the average internal degree, and {b = qn/2}, which is the average external degree. Intuitively, the closer are {a} and {b}, the more difficult it is to reconstruct the partition. We assume {a > b}, although there are also similar results in the complementary model where {b} is larger than {a}. We also assume {b > 1} so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

  1. For every {\epsilon > 0}, there exists a constant {c_\epsilon} such that if {a - b > c_\epsilon\sqrt{a+b}}, then we can reconstruct the partition up to less than {\epsilon n} misclassified vertices.
  2. There exists a constant {c} such that if {a-b \geq c \sqrt{\log n}\sqrt{a+b}}, then we can do exact reconstruct.

We note that the first result is essentially tight in the sense that for every {\epsilon > 0}, there also exists a constant {c_\epsilon'} such that if {a-b < c_\epsilon'\sqrt{a+b}}, then it will be impossible to reconstruct the partition even if an {\epsilon} fraction of misclassified vertices is allowed. Also, the constant {c_\epsilon} will go to infinity as {\epsilon} goes to 0, so if we want more and more accuracy, {a-b} needs to be a bigger and bigger constant times {\sqrt{a+b}}. When the constant becomes {O(\sqrt{\log n})}, we will get an exact reconstruction as stated in the second result.

Continue reading

Beyond Worst-Case Analysis: Lecture 10

In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the infinity-to-one norm, and find an efficient way to approximate it using SDP.

Continue reading

Beyond Worst-Case Analysis: Lecture 9

Scribed by Chinmay Nirkhe

In which we explore the Stochastic Block Model.

1. The {G_{n,p,q}} problem

The Stochastic Block Model is a generic model for graphs generated by some parameters. The simplest model and one we will consider today is the {G_{n,p,q}} problem.

Definition 1 ({G_{n,p,q}} graph distribution) The {G_{n,p,q}} distribution is a distribution on graphs of {n} vertices where {V} is partitioned into two 2 subsets of equal size: {V = V_1 \sqcup V_2}. Then for {\{i,j\}} pair of vertices in the same subset, {\Pr( (i,j) \in E) = p} and otherwise {\Pr( (i,j) \in E) = q}.

We will only consider the regime under which {p > q}. If we want to find the partition {V = V_1 \sqcup V_2}, it is intuitive to look at the problem of finding the minimum balanced cut. The cut {(V_1, V_2)} has expected size {q n^2 / 4} and any other cut will have greater expected size.

Our intuition should be that as {p \rightarrow q}, the problem only gets harder. And for fixed ratio {p/q}, as {p,q \rightarrow 1}, the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for {p,q} then we can also solve it for {cp, cq} where {c > 1}, by keeping only {1/c} edges and reducing to the case we can solve.

Recall that for the {k}-planted clique problem, we found the eigenvector {{\bf x}} corresponding to the largest eigenvalue of {A - \frac{1}{2} J}. We then defined {S} as the vertices {i} with the {k} largest values of {|x_i|} and cleaned up {S} a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of {A - \left( \frac{p + q}{2} \right) J}. Note this is intuitive as the average degree of the graph is {p(n/2 - 1) + q(n/2) \approx \frac{n}{2} (p+q)}. The idea is simple: Solve {{\bf x}} the largest eigenvector corresponding to the largest eigenvalue and define

\displaystyle  V_1 = \{ i : x_i > 0\}, \qquad V_2 = \{ i : x_i \leq 0 \} \ \ \ \ \ (1)

Continue reading