Scribed by Neng Huang

*In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximation reconstruction of the stochastic block model.*

**1. A Brief Review of the Model **

First, let’s briefly review the model. We have a random graph with an unknown partition of the vertices into two equal parts and . Edges across the partition are generated independently with probability , and edges inside the partition are generated independently with probability . To abbreviate the notation, we let , which is the average internal degree, and , which is the average external degree. Intuitively, the closer are and , the more difficult it is to reconstruct the partition. We assume , although there are also similar results in the complementary model where is larger than . We also assume so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

- For every , there exists a constant such that if , then we can reconstruct the partition up to less than misclassified vertices.
- There exists a constant such that if , then we can do exact reconstruct.

We note that the first result is essentially tight in the sense that for every , there also exists a constant such that if , then it will be impossible to reconstruct the partition even if an fraction of misclassified vertices is allowed. Also, the constant will go to infinity as goes to 0, so if we want more and more accuracy, needs to be a bigger and bigger constant times . When the constant becomes , we will get an exact reconstruction as stated in the second result.

**2. The Algorithm **

Our algorithm will be based on semi-definite programming. Intuitively, the problem of reconstructing the partition is essentially the same as min-bisection problem, which is to find a balanced cut with the fewest edges. This is because the balanced cut with the fewest expected edges is exactly our hidden cut. Unfortunately, the min-bisection problem is -hard, so we will use semi-definite programming. The min-bisection problem can be stated as the following program: \begin{equation*} & {\text{minimize}} & & \sum_{(u, v) \in E} \frac{1}{4}(x_u – x_v)^2

& \text{subject to} & & x_v^2 = 1, \forall v \in V

&&& \sum_{v \in V}x_v = 0. \end{equation*} Its semi-definite programming relaxation will be

Our algorithm will be as follows.

- Solve the semi-definite programming above.
- Let be the optimal solution and such that .
- Find , which is the eigenvector corresponding to the largest eigenvalue of .
- Let , .
- Output as our partition.

Ideally, we want half of the ‘s pointing to one direction, and the other half pointing to the opposite direction. In this ideal case we will have

Then will be a rank-one matrix and , which is the indicator vector of the hidden cut, will be its eigenvector with eigenvalue . The remaining eigenvalues of will be all zeros. So finding the largest eigenvector of will reveal the hidden cut. In reality, if , then our solution will be almost the same as that in the ideal case, so the cut we get will be almost the same as the hidden cut. Furthermore, if , then the unique optimal solution of the SDP will be the combinatorial solution of min-bisection problem, that is, in the vector language, the one-dimensional solution.\footnote{“A miracle”, said Luca.}

**3. Analysis of the Algorithm **

First, we rearrange the SDP to make it slightly simpler. We have the following SDP:

We note that SDP1 and SDP2 have the same optimal solution, because the cost function of SDP1 is

The first term is a constant and the second is the cost function of SDP2 with a factor of -1/4.

Now, consider the cost of SDP2 of where

The expected cost will be

Since each edge is chosen independently, with high probability our cost will be at least , which implies that the optimal solution of SDP2 will be at least . Let be the optimal solution of the SDP, then we have

n(a-b) – O(n) & \leq cost(**x**_1^\ast, \ldots, **x**_n^\ast)\nonumber

& = \sum_{u,v}A_{uv}\langle**x**_u^\ast, **x**_v^\ast\rangle\nonumber

& = \sum_{u,v}\left(A_{uv} – \frac{a+b}{n}\right)\langle**x**_u^\ast, **x**_v^\ast\rangle

In the last equality we used the fact that

When we used the spectral method last week, we said that the largest eigenvalue of is large, where is the average degree. This is because the hidden cut will give us a vector with large Rayleigh quotient. But has a relatively small spectral norm, so everything should come from , which when simplified will be 1 for entries representing vertices on the same side and -1 for entries representing vertices on different sides. We will redo this argument with SDP norm in place of spectral norm and every step appropriately adjusted.

Recall that the SDP norm of a matrix is defined to be

Let , then by Grothendieck inequality we have

We proved in the previous lecture that with high probability, so we know that the SDP norm with high probability as well. By definition, this means

Substracting 3 from 3, we obtain

where is the all-one matrix and . Plugging 5 into 4, we get

which can be simplified to

For simplicity, in the following analysis the term will be called . Notice that is a matrix with 1 for nodes from the same side of the cut and -1 for nodes from different sides of the cut, and is an inner product of two unit vectors. If is very close to zero, then the sum will be very close to . This means that should be 1 for almost every pair of , which shows that is actually very close to . Now, we will make this argument robust. To achieve this, we introduce the Frobenius norm of a matrix.

Definition 1 (Frobenius norm)Let be a matrix. The Frobenius norm of is

The following fact is a good exercise.

Fact 2Let be a matrix. Then

where denotes the spectral norm.

To see how close are and , we calculate the Frobenius norm of , which will be

This gives us a bound on the spectral norm of , namely

Let be the unit eigenvector of corresponding to its largest eigenvalue, then by Davis-Kahan theorem we have\footnote{When we apply Davis-Kahan theorem, what we get is actually an upper bound on . We have assumed here that the bound holds for , but the exact same proof will also work in the other case.}

For any , if is a large enough constant then we will have . Now we have the following standard argument:

The last inequality is because every with will contribute at least 1 in the sum . This shows that our algorithm will misclassify at most vertices.