Scribed by Chinmay Nirkhe

*In which we explore the Stochastic Block Model.*

**1. The problem **

The *Stochastic Block Model* is a generic model for graphs generated by some parameters. The simplest model and one we will consider today is the problem.

Definition 1 ( graph distribution)The distribution is a distribution on graphs of vertices where is partitioned into two 2 subsets of equal size: . Then for pair of vertices in the same subset, and otherwise .

We will only consider the regime under which . If we want to find the partition , it is intuitive to look at the problem of finding the minimum balanced cut. The cut has expected size and any other cut will have greater expected size.

Our intuition should be that as , the problem only gets harder. And for fixed ratio , as , the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for then we can also solve it for where , by keeping only edges and reducing to the case we can solve.

Recall that for the -planted clique problem, we found the eigenvector corresponding to the largest eigenvalue of . We then defined as the vertices with the largest values of and cleaned up a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of . Note this is intuitive as the average degree of the graph is . The idea is simple: Solve the largest eigenvector corresponding to the largest eigenvalue and define

As we proceed to the analysis of this procedure, we fix . Prior to fixing, the adjacency matrix was .\footnote{The diagonal should be zeroes, but this is close enough.} Upon fixing , the average adjacency matrix looks different. For ease of notation, if we write a bold constant for a matrix, we mean the matrix . It will be clear from context.

Here we have broken up into blocks according to the partition .

Theorem 2If then with high probability, .

*Proof:* Define the graph as the union of a graph on and graph on . Define the graph a a graph. Note that the graph is distributed according to picking a and graph and adding the partition crossing edges of to . Let and be the respective adjacency matrices and define the follow submatrices:

Then the adjacency matrix is defined by

Similarly, we can generate a decomposition for :

Then using triangle inequality we can bound by bounding the difference in the various terms.

The last line follows as the submatrices are adjacency matrices of graphs and we can apply the results we proved in that regime for .

But the difficulty is that we don’t know as . If we knew , then we would know the partition. What we can compute is .\footnote{The rest of this proof actually doesn’t even rely on knowing or . We can estimate by calculating the average vertex degree.} We can rewrite as

Call the matrix on the right . It is clearly rank-one as it has decomposition where . Therefore

Then is close (in operator norm) to the rank 1 matrix . Then their largest eigenvalues are close. But since has only one non-zero eigenvalue , finding the corresponding eigenvector to the largest eigenvalue of will be close to the ideal partition as describes the ideal partition. This can be formalized with the Davis-Kaham Theorem.

Theorem 3 (Davis-Kahan)Given matrices with where has eigenvalues and corresponding eigenvectors and has eigenvalues and corresponding eigenvectors , thenEquivalently,

The Davis Kahan Theorem with , and states that

where , the eigenvector associated with the largest eigenvalue of and , the expected degrees of the two parts of the graph. Choose between for the one closer to . Then

Recall that . If and disagree in sign, then this contributes at least to the value of . Equivalently, is at least the number of misclassified vertices. It is simple to see from here that if then we can bound the number of misclassified vertices by . This completes the proof that the proposed algorithm does well in calculating the partition of the Stochastic Block Model.