# Search vs Decision vs Certification for Planted Problems

Edited 5/7/2018. Thanks to Sam Hopkins for several corrections and suggestions.

I am revising my notes from the course on “better-than-worst-case” analysis of algorithms. With the benefit of hindsight, in this post (and continuing in future posts) I would like to review again how one applies spectral methods and semidefinite programming to problems that involve a “planted” solution, and what is the role of concentration results for random matrices in the analysis of such algorithms.

# Beyond Worst-Case Analysis: Lecture 11

Scribed by Neng Huang

In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximation reconstruction of the stochastic block model.

1. A Brief Review of the Model

First, let’s briefly review the model. We have a random graph ${G = (V, E)}$ with an unknown partition of the vertices into two equal parts ${V_1}$ and ${V_2}$. Edges across the partition are generated independently with probability ${q}$, and edges inside the partition are generated independently with probability ${p}$. To abbreviate the notation, we let ${a = pn/2}$, which is the average internal degree, and ${b = qn/2}$, which is the average external degree. Intuitively, the closer are ${a}$ and ${b}$, the more difficult it is to reconstruct the partition. We assume ${a > b}$, although there are also similar results in the complementary model where ${b}$ is larger than ${a}$. We also assume ${b > 1}$ so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

1. For every ${\epsilon > 0}$, there exists a constant ${c_\epsilon}$ such that if ${a - b > c_\epsilon\sqrt{a+b}}$, then we can reconstruct the partition up to less than ${\epsilon n}$ misclassified vertices.
2. There exists a constant ${c}$ such that if ${a-b \geq c \sqrt{\log n}\sqrt{a+b}}$, then we can do exact reconstruct.

We note that the first result is essentially tight in the sense that for every ${\epsilon > 0}$, there also exists a constant ${c_\epsilon'}$ such that if ${a-b < c_\epsilon'\sqrt{a+b}}$, then it will be impossible to reconstruct the partition even if an ${\epsilon}$ fraction of misclassified vertices is allowed. Also, the constant ${c_\epsilon}$ will go to infinity as ${\epsilon}$ goes to 0, so if we want more and more accuracy, ${a-b}$ needs to be a bigger and bigger constant times ${\sqrt{a+b}}$. When the constant becomes ${O(\sqrt{\log n})}$, we will get an exact reconstruction as stated in the second result.

# Beyond Worst-Case Analysis: Lecture 10

In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the infinity-to-one norm, and find an efficient way to approximate it using SDP.

# Beyond Worst-Case Analysis: Lecture 9

Scribed by Chinmay Nirkhe

In which we explore the Stochastic Block Model.

1. The ${G_{n,p,q}}$ problem

The Stochastic Block Model is a generic model for graphs generated by some parameters. The simplest model and one we will consider today is the ${G_{n,p,q}}$ problem.

Definition 1 ( ${G_{n,p,q}}$ graph distribution) The ${G_{n,p,q}}$ distribution is a distribution on graphs of ${n}$ vertices where ${V}$ is partitioned into two 2 subsets of equal size: ${V = V_1 \sqcup V_2}$. Then for ${\{i,j\}}$ pair of vertices in the same subset, ${\Pr( (i,j) \in E) = p}$ and otherwise ${\Pr( (i,j) \in E) = q}$.

We will only consider the regime under which ${p > q}$. If we want to find the partition ${V = V_1 \sqcup V_2}$, it is intuitive to look at the problem of finding the minimum balanced cut. The cut ${(V_1, V_2)}$ has expected size ${q n^2 / 4}$ and any other cut will have greater expected size.

Our intuition should be that as ${p \rightarrow q}$, the problem only gets harder. And for fixed ratio ${p/q}$, as ${p,q \rightarrow 1}$, the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for ${p,q}$ then we can also solve it for ${cp, cq}$ where ${c > 1}$, by keeping only ${1/c}$ edges and reducing to the case we can solve.

Recall that for the ${k}$-planted clique problem, we found the eigenvector ${{\bf x}}$ corresponding to the largest eigenvalue of ${A - \frac{1}{2} J}$. We then defined ${S}$ as the vertices ${i}$ with the ${k}$ largest values of ${|x_i|}$ and cleaned up ${S}$ a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of ${A - \left( \frac{p + q}{2} \right) J}$. Note this is intuitive as the average degree of the graph is ${p(n/2 - 1) + q(n/2) \approx \frac{n}{2} (p+q)}$. The idea is simple: Solve ${{\bf x}}$ the largest eigenvector corresponding to the largest eigenvalue and define $\displaystyle V_1 = \{ i : x_i > 0\}, \qquad V_2 = \{ i : x_i \leq 0 \} \ \ \ \ \ (1)$