Scribed by Neng Huang
In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximate reconstruction of the stochastic block model.
1. A Brief Review of the Model
First, let’s briefly review the model. We have a random graph with an unknown partition of the vertices into two equal parts $V_1$ and $V_2$. Edges across the partition are generated independently with probability $q$, and edges inside the partition are generated independently with probability $p$. To abbreviate the notation, we let $a := p\cdot \frac n2$, which is the average internal degree, and $b := q\cdot \frac n2$, which is the average external degree. Intuitively, the closer $a$ and $b$ are, the more difficult it is to reconstruct the partition. We assume $p > q$, although there are also similar results in the complementary model where $q$ is larger than $p$. We also assume that the average degree $a+b$ is at least an absolute constant, so that the graph is not almost empty.
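To make the model concrete, here is a minimal sketch (not part of the original notes) of how one could sample such a graph with numpy; the function name `sample_sbm` and the convention that the hidden partition consists of the first and second halves of the vertex set are our own choices for illustration.

```python
import numpy as np

def sample_sbm(n, a, b, seed=0):
    """Sample an adjacency matrix from the two-community block model.

    The hidden partition is taken to be the first n/2 vertices versus the rest.
    Internal pairs are connected with probability p = 2a/n and external pairs
    with probability q = 2b/n, so that a and b are the average internal and
    external degrees.  Returns the adjacency matrix and the +1/-1 indicator
    vector of the hidden cut.
    """
    rng = np.random.default_rng(seed)
    chi = np.ones(n)
    chi[n // 2:] = -1                                # +1 / -1 indicator of the hidden cut
    p, q = 2 * a / n, 2 * b / n
    prob = np.where(np.outer(chi, chi) > 0, p, q)    # p for internal pairs, q for external
    upper = np.triu(rng.random((n, n)) < prob, k=1)  # sample each pair once, no self-loops
    A = (upper | upper.T).astype(float)              # symmetrize
    return A, chi

# Example: A, chi = sample_sbm(2000, a=20, b=5)
```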
We will prove the following two results, the first of which will be proved using Grothendieck inequality.
- For every $\epsilon > 0$, there exists a constant $c_\epsilon$ such that if $a - b \geq c_\epsilon\sqrt{a+b}$, then we can reconstruct the partition up to less than $\epsilon n$ misclassified vertices.
- There exists a constant $c$ such that if $a - b \geq c\sqrt{(a+b)\log n}$, then we can do exact reconstruction.
We note that the first result is essentially tight in the sense that for every $\epsilon$, there also exists a constant $c'_\epsilon$ such that if $a - b \leq c'_\epsilon\sqrt{a+b}$, then it will be impossible to reconstruct the partition even if an $\epsilon$ fraction of misclassified vertices is allowed. Also, the constant $c_\epsilon$ will go to infinity as $\epsilon$ goes to 0, so if we want more and more accuracy, $a - b$ needs to be a bigger and bigger constant times $\sqrt{a+b}$. When the constant becomes of the order of $\sqrt{\log n}$, we will get an exact reconstruction as stated in the second result.
2. The Algorithm
Our algorithm will be based on semi-definite programming. Intuitively, the problem of reconstructing the partition is essentially the same as the min-bisection problem, which is to find a balanced cut with the fewest edges. This is because the balanced cut with the fewest expected edges is exactly our hidden cut. Unfortunately, the min-bisection problem is NP-hard, so we will use semi-definite programming. The min-bisection problem can be stated as the following program:

\begin{align*}
\text{minimize} \quad & \sum_{(u, v) \in E} \frac{1}{4}(x_u - x_v)^2\\
\text{subject to} \quad & x_v^2 = 1, \ \forall v \in V\\
& \sum_{v \in V}x_v = 0.
\end{align*}

Its semi-definite programming relaxation, which we will call SDP1, will be

\begin{align*}
\text{minimize} \quad & \sum_{(u, v) \in E} \frac{1}{4}\|x_u - x_v\|^2\\
\text{subject to} \quad & \|x_v\|^2 = 1, \ \forall v \in V\\
& \left\|\sum_{v \in V}x_v\right\|^2 = 0,
\end{align*}

where the variables $x_v$ now range over vectors (of arbitrary dimension).
Our algorithm will be as follows; a code sketch illustrating these steps is given right after the list.
- Solve the semi-definite programming above.
- Let $x_1^\ast, \ldots, x_n^\ast$ be the optimal solution and $X$ the matrix such that $X_{uv} = \langle x_u^\ast, x_v^\ast\rangle$.
- Find $z$, which is the eigenvector corresponding to the largest eigenvalue of $X$.
- Let $L = \{v : z_v \geq 0\}$, $R = \{v : z_v < 0\}$.
- Output $(L, R)$ as our partition.
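The following sketch of these steps is ours, not the notes'; it uses cvxpy with the SCS solver, and the function name `sdp_partition` is our own choice. The PSD variable stands for the Gram matrix of the vectors, which is the usual way to hand such a vector program to a generic solver.

```python
import numpy as np
import cvxpy as cp

def sdp_partition(A):
    """Approximately recover the hidden bisection from the adjacency matrix A.

    The PSD variable X stands for the Gram matrix X_{uv} = <x_u, x_v> of the
    vector solution; the objective <lap, X>/4 equals the SDP1 cost
    sum_{(u,v) in E} (1/4) ||x_u - x_v||^2.  The rounding step takes the
    eigenvector of the largest eigenvalue of X and splits vertices by sign.
    """
    n = A.shape[0]
    lap = np.diag(A.sum(axis=1)) - A                # graph Laplacian
    X = cp.Variable((n, n), PSD=True)
    cost = 0.25 * cp.sum(cp.multiply(lap, X))
    constraints = [cp.diag(X) == 1,                 # ||x_v||^2 = 1 for all v
                   cp.sum(X) == 0]                  # ||sum_v x_v||^2 = 0
    cp.Problem(cp.Minimize(cost), constraints).solve(solver=cp.SCS)
    eigenvalues, eigenvectors = np.linalg.eigh(X.value)
    z = eigenvectors[:, -1]                         # top eigenvector of X
    return np.where(z >= 0, 1, -1)                  # +1 for L, -1 for R
```

On a graph sampled as in the earlier sketch, comparing the output labels with $\chi$ (up to a global sign flip) gives the fraction of misclassified vertices; a generic solver like this only handles small $n$, which is enough for illustration.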
Ideally, we want half of the $x_v^\ast$‘s pointing in one direction, and the other half pointing in the opposite direction. In this ideal case we will have

\[ \langle x_u^\ast, x_v^\ast\rangle = \chi_u\chi_v \quad \text{for all } u, v, \]

where $\chi \in \{\pm 1\}^n$ is $1$ on $V_1$ and $-1$ on $V_2$. Then $X = \chi\chi^T$ will be a rank-one matrix and $\chi$, which is the indicator vector of the hidden cut, will be its eigenvector with eigenvalue $n$. The remaining eigenvalues of $X$ will be all zeros. So finding the largest eigenvector of $X$ will reveal the hidden cut. In reality, if $a - b \geq c_\epsilon\sqrt{a+b}$, then our solution will be almost the same as that in the ideal case, so the cut we get will be almost the same as the hidden cut. Furthermore, if $a - b \geq c\sqrt{(a+b)\log n}$, then the unique optimal solution of the SDP will be the combinatorial solution of the min-bisection problem, that is, in the vector language, the one-dimensional solution.\footnote{“A miracle”, said Luca.}
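To spell out the eigenvalue claim, here is the one-line check, writing $\chi \in \{\pm 1\}^n$ for the indicator vector of the hidden cut as above:

\[ \left(\chi\chi^T\right)\chi = \chi\left(\chi^T\chi\right) = n\,\chi, \]

and since $\chi\chi^T$ has rank one, all of its other eigenvalues are zero.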
3. Analysis of the Algorithm
First, we rearrange the SDP to make it slightly simpler. We have the following SDP, which we will call SDP2 (the relaxation of the previous section being SDP1):

\begin{align*}
\text{maximize} \quad & \sum_{u,v} A_{uv}\langle x_u, x_v\rangle\\
\text{subject to} \quad & \|x_v\|^2 = 1, \ \forall v \in V\\
& \left\|\sum_{v \in V}x_v\right\|^2 = 0,
\end{align*}

where $A$ is the adjacency matrix of the graph and the sum ranges over ordered pairs, so that each edge is counted twice. We note that SDP1 and SDP2 have the same optimal solution, because the cost function of SDP1 is

\[ \sum_{(u,v)\in E}\frac 14 \|x_u - x_v\|^2 = \sum_{(u,v)\in E}\frac 14\left(\|x_u\|^2 + \|x_v\|^2 - 2\langle x_u, x_v\rangle\right) = \frac{|E|}{2} - \frac 14\sum_{u,v}A_{uv}\langle x_u, x_v\rangle. \]

The first term is a constant and the second is the cost function of SDP2 with a factor of $-1/4$.
Now, consider the cost of SDP2 of the one-dimensional feasible solution $x_v = \chi_v$, where $\chi\in\{\pm 1\}^n$ is the indicator vector of the hidden cut (note that $\sum_v \chi_v = 0$, so this solution is feasible). The expected cost will be

\[ \mathbb{E}\left[\sum_{u,v}A_{uv}\chi_u\chi_v\right] = n(a-b) - O(a), \]

because in expectation there are about $\frac{na}{2}$ edges inside the partition, each contributing $+2$ to the sum, and about $\frac{nb}{2}$ edges across the partition, each contributing $-2$. Since each edge is chosen independently, with high probability our cost will be at least $n(a-b) - O(n)$, which implies that the optimal solution of SDP2 will be at least $n(a-b) - O(n)$. Let $x_1^\ast, \ldots, x_n^\ast$ be the optimal solution of the SDP, then we have

\begin{align}
n(a-b) - O(n) & \leq cost(x_1^\ast, \ldots, x_n^\ast)\nonumber\\
& = \sum_{u,v}A_{uv}\langle x_u^\ast, x_v^\ast\rangle\nonumber\\
& = \sum_{u,v}\left(A_{uv} - \frac{a+b}{n}\right)\langle x_u^\ast, x_v^\ast\rangle. \tag{1}
\end{align}
In the last equality we used the fact that

\[ \sum_{u,v}\langle x_u^\ast, x_v^\ast\rangle = \left\|\sum_{v}x_v^\ast\right\|^2 = 0, \]

which follows from the constraint of the SDP. When we used the spectral method last week, we said that the largest eigenvalue of $A - \frac dn J$ is large, where $d = a + b$ is the average degree. This is because the hidden cut will give us a vector with large Rayleigh quotient. But $A - \mathbb{E}A$ has a relatively small spectral norm, so everything should come from $\mathbb{E}A - \frac dn J$, which when simplified will be $\frac{a-b}{n}$ times a matrix that is $1$ for entries representing vertices on the same side and $-1$ for entries representing vertices on different sides. We will redo this argument with the SDP norm in place of the spectral norm and every step appropriately adjusted.
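To summarize the decomposition implicit in this argument (stated here for convenience; up to diagonal entries it follows directly from the definition of the model):

\[ A \;=\; \underbrace{\frac{a+b}{n}J}_{\text{constant part}} \;+\; \underbrace{\frac{a-b}{n}\chi\chi^T}_{\text{signal}} \;+\; \underbrace{\left(A - \mathbb{E}A\right)}_{\text{noise}}, \]

since $\mathbb{E}A_{uv} = \frac{2a}{n} = \frac{a+b}{n} + \frac{a-b}{n}$ when $u$ and $v$ are on the same side, and $\frac{2b}{n} = \frac{a+b}{n} - \frac{a-b}{n}$ when they are on different sides.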
Recall that the SDP norm of a matrix $M$ is defined to be

\[ \|M\|_{SDP} := \max_{\|x_1\| = \cdots = \|x_n\| = \|y_1\| = \cdots = \|y_n\| = 1} \ \sum_{u,v} M_{uv}\langle x_u, y_v\rangle. \]

Let $M$ be any matrix, then by Grothendieck inequality we have

\[ \|M\|_{SDP} \leq K_G\cdot\|M\|_{\infty\rightarrow 1}, \]

where $K_G$ is an absolute constant (the Grothendieck constant). We proved in the previous lecture that $\|A - \mathbb{E}A\|_{\infty\rightarrow 1} \leq O(n\sqrt{a+b})$ with high probability, so we know that the SDP norm $\|A - \mathbb{E}A\|_{SDP} \leq O(n\sqrt{a+b})$ with high probability as well. By definition, this means

\[ \sum_{u,v}\left(A_{uv} - \mathbb{E}A_{uv}\right)\langle x_u^\ast, x_v^\ast\rangle \leq O(n\sqrt{a+b}). \tag{2} \]
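As a small sanity check on these two norms (not from the notes), the following brute-force snippet compares them on a tiny random matrix; `inf_to_one_norm` enumerates all sign vectors, `sdp_norm` solves the relaxation with cvxpy, and both function names and the use of the SCS solver are our own choices.

```python
import itertools
import numpy as np
import cvxpy as cp

def inf_to_one_norm(M):
    """||M||_{infty->1} = max over x, y in {-1,+1}^n of x^T M y (brute force)."""
    n = M.shape[0]
    signs = [np.array(s) for s in itertools.product([-1, 1], repeat=n)]
    return max(x @ M @ y for x in signs for y in signs)

def sdp_norm(M):
    """||M||_{SDP} = max of sum_{u,v} M_{uv} <x_u, y_v> over unit vectors."""
    n = M.shape[0]
    Z = cp.Variable((2 * n, 2 * n), PSD=True)   # Gram matrix of (x_1..x_n, y_1..y_n)
    objective = cp.Maximize(cp.sum(cp.multiply(M, Z[:n, n:])))
    problem = cp.Problem(objective, [cp.diag(Z) == 1])
    problem.solve(solver=cp.SCS)
    return problem.value

M = np.random.default_rng(0).standard_normal((5, 5))
# Up to solver tolerance, the SDP norm is at least the infinity-to-one norm
# (sign vectors are one-dimensional unit vectors) and, by Grothendieck
# inequality, at most a constant factor K_G larger.
print(inf_to_one_norm(M), sdp_norm(M))
```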
Subtracting (2) from (1), we obtain

\[ \sum_{u,v}\left(\mathbb{E}A_{uv} - \frac{a+b}{n}\right)\langle x_u^\ast, x_v^\ast\rangle \geq n(a-b) - O(n\sqrt{a+b}). \tag{3} \]

Moreover, up to diagonal entries (which contribute only $O(n)$ to the sums above),

\[ \mathbb{E}A - \frac{a+b}{n}J = \frac{a-b}{n}\chi\chi^T, \tag{4} \]

where $J$ is the all-one matrix and $\chi$ is, as before, the $\pm 1$ indicator vector of the hidden cut. Plugging (4) into (3), we get

\[ \frac{a-b}{n}\sum_{u,v}\chi_u\chi_v\langle x_u^\ast, x_v^\ast\rangle \geq n(a-b) - O(n\sqrt{a+b}), \]

which can be simplified to

\[ \sum_{u,v}\chi_u\chi_v\langle x_u^\ast, x_v^\ast\rangle \geq n^2 - O\left(\frac{n^2\sqrt{a+b}}{a-b}\right). \tag{5} \]

For simplicity, in the following analysis the error term $O\left(\frac{n^2\sqrt{a+b}}{a-b}\right)$ will be called $\delta n^2$; note that under our assumption $a - b \geq c_\epsilon\sqrt{a+b}$ we have $\delta \leq O(1/c_\epsilon)$, so $\delta$ can be made arbitrarily small by choosing $c_\epsilon$ large. Notice that $\chi\chi^T$ is a matrix with $1$ for nodes from the same side of the cut and $-1$ for nodes from different sides of the cut, and $\langle x_u^\ast, x_v^\ast\rangle$ is an inner product of two unit vectors, so every term in the sum is at most $1$. If $\delta$ is very close to zero, then the sum will be very close to $n^2$. This means that $\chi_u\chi_v\langle x_u^\ast, x_v^\ast\rangle$ should be close to $1$ for almost every pair $u, v$, which shows that $X$ is actually very close to $\chi\chi^T$. Now, we will make this argument robust. To achieve this, we introduce the Frobenius norm of a matrix.
Definition 1 (Frobenius norm) Let $M$ be a matrix. The Frobenius norm of $M$ is

\[ \|M\|_F := \sqrt{\sum_{u,v}M_{uv}^2}. \]
The following fact is a good exercise.
Fact 2 Let $M$ be a matrix. Then $\|M\| \leq \|M\|_F$, where $\|M\|$ denotes the spectral norm.
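For the exercise, one route (ours, not spelled out in the notes) is via the eigenvalues $\lambda_1, \ldots, \lambda_n \geq 0$ of $M^TM$:

\[ \|M\|^2 = \max_i \lambda_i \leq \sum_i \lambda_i = \mathrm{tr}\left(M^TM\right) = \sum_{u,v}M_{uv}^2 = \|M\|_F^2. \]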
To see how close $X$ and $\chi\chi^T$ are, we calculate the Frobenius norm of $X - \chi\chi^T$, which will be

\begin{align*}
\|X - \chi\chi^T\|_F^2 &= \sum_{u,v}\left(\langle x_u^\ast, x_v^\ast\rangle - \chi_u\chi_v\right)^2\\
&= \sum_{u,v}\left(\langle x_u^\ast, x_v^\ast\rangle^2 - 2\chi_u\chi_v\langle x_u^\ast, x_v^\ast\rangle + 1\right)\\
&\leq 2n^2 - 2\sum_{u,v}\chi_u\chi_v\langle x_u^\ast, x_v^\ast\rangle\\
&\leq 2\delta n^2.
\end{align*}

This gives us a bound on the spectral norm of $X - \chi\chi^T$, namely

\[ \|X - \chi\chi^T\| \leq \|X - \chi\chi^T\|_F \leq \sqrt{2\delta}\, n. \]
Let $z$ be the unit eigenvector of $X$ corresponding to its largest eigenvalue, then by the Davis-Kahan theorem we have\footnote{When we apply the Davis-Kahan theorem, what we get is actually an upper bound on $\min\left\{\left\|z - \frac{1}{\sqrt n}\chi\right\|, \left\|z + \frac{1}{\sqrt n}\chi\right\|\right\}$. We have assumed here that the bound holds for $\left\|z - \frac{1}{\sqrt n}\chi\right\|$, but the exact same proof will also work in the other case.}

\[ \left\|z - \frac{1}{\sqrt n}\chi\right\| \leq O\left(\frac{\left\|X - \chi\chi^T\right\|}{n}\right) \leq O\left(\sqrt\delta\right). \]
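For reference, the form of the Davis-Kahan theorem being invoked can be stated as follows (a standard statement, quoted here with an unspecified constant rather than taken from these notes): if $M$ and $M + E$ are symmetric matrices, $v$ and $\tilde v$ are unit eigenvectors of their respective largest eigenvalues, and $\gamma > 0$ is the gap between the two largest eigenvalues of $M$, then

\[ \min\left\{\|\tilde v - v\|, \|\tilde v + v\|\right\} \leq O\left(\frac{\|E\|}{\gamma}\right). \]

Here $M = \chi\chi^T$, $E = X - \chi\chi^T$, the gap is $\gamma = n$, and $\|E\| \leq \sqrt{2\delta}\,n$, which gives the $O(\sqrt\delta)$ bound above.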
For any $\epsilon > 0$, if $c_\epsilon$ is a large enough constant then we will have

\[ \left\|z - \frac{1}{\sqrt n}\chi\right\|^2 \leq \epsilon. \]

Now we have the following standard argument:

\[ \epsilon n \;\geq\; n\left\|z - \frac{1}{\sqrt n}\chi\right\|^2 \;=\; \sum_{v\in V}\left(\sqrt n\, z_v - \chi_v\right)^2 \;\geq\; \left|\left\{v : \mathrm{sign}(z_v) \neq \chi_v\right\}\right|. \]

The last inequality is because every $v$ with $\mathrm{sign}(z_v) \neq \chi_v$ will contribute at least 1 in the sum $\sum_v\left(\sqrt n\, z_v - \chi_v\right)^2$. This shows that our algorithm will misclassify at most $\epsilon n$ vertices.