# Beyond Worst-Case Analysis: Lecture 10

Scribed by David Dinh

In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the infinity-to-one norm, and find an efficient way to approximate it using SDP.

1. Fun with Norms

In the past few lectures, we’ve been heavily relying on the spectral norm,

$\displaystyle \Vert M\Vert=\max_{\Vert x\Vert=1}\vert\boldsymbol{x}^{T}M\boldsymbol{x}\vert=\max_{\Vert\boldsymbol{x}\Vert=1,\Vert\boldsymbol{y}\Vert=1}\boldsymbol{x}^{T}M\boldsymbol{y}$

which is efficiently computable and turns out to be pretty handy in a lot of cases.

Unfortunately, high-degree vertices have a disproportionately large influence on the spectral norm’s value, limiting its usefulness in graphs where such outliers exist. Often (as we did in lecture 9), people will try to modify the input so that there are no vertices of overly high degree or add some regularizing term to ameliorate this issue. Unfortunately, this can lead to less useful results – in lecture 9, for instance, we derived a bound that required that all high-degree vertices be excised from the input graph first.

In this lecture, we’ll attack this problem in a different way by introducing a different norm, the infinity-to-one norm, defined as follows:

$\displaystyle \Vert M\Vert_{\infty\rightarrow1}:=\max_{\boldsymbol{x},\boldsymbol{y}\in\{-1,1\}^{n}}\boldsymbol{x}^{T}M\boldsymbol{y}\ .$

It can be shown that

$\displaystyle \Vert M\Vert_{\infty\rightarrow1}\le n\Vert M\Vert\ .$

So spectral norm always gives us a bound for the infinity-to-one norm. However, the infinity-to-one norm can come in even more handy than the spectral norm (if we can actually calculate it):

Theorem 1 Let ${P}$ be a symmetric matrix with entries in ${[0,1]}$. Pick a random graph ${G}$ such that ${\{i,j\}}$ is in ${G}$ w.p. ${P_{ij}}$. Then whp:

$\displaystyle \Vert A-P\Vert_{\infty\rightarrow1}\le O\left(\sqrt{n\sum_{ij}P_{ij}}\right)=O(n\sqrt{d}).$

For context, recall that we proved a similar bound in Lecture 9 for the spectral norm, but we required that all nodes with degree greater than twice the average degree be removed first. The zero-to-one norm allows us to bypass that restriction entirely.

Proof: Fix ${\boldsymbol{x},\boldsymbol{y}\in\{+1,-1\}^{n}}$. Examine the expression

$\displaystyle \boldsymbol{P}\left[\boldsymbol{x}^{T}(A-P)\boldsymbol{y}\ge t\right]=\boldsymbol{P}\left[\sum_{ij}x_{y}y_{j}(A_{ij}-P_{ij})\ge t\right]$

We want to make sure that this probability exponentially decreases w.r.t. ${n}$.

Recall that Bernstein’s inequality states that given ${N}$ independent random variables, absolutely bounded by ${M}$, with expectation ${0}$, we have ${P[\sum_{i}X_{i}>t]\le\exp\left(-\frac{t^{2}/2}{\sum_{i}EX_{i}^{2}+3Mt}\right)}$.

${A_{ij}-P_{ij}}$ is either ${-P_{ij}}$ or ${1-P_{ij}}$ and therefore is bounded by ${[-1,+1]}$. Since ${\boldsymbol{x},\boldsymbol{y}\in\{+1,-1\}^{n}}$, we can have the ${M}$ from Bernstein’s inequality take on value ${1}$. Combining this with the fact that

$\displaystyle \mathbb{E}(x_{i}y_{i}(A_{ij}-P_{ij}))^{2}\le P_{ij},$

Bernstein’s inequality gives us

$\displaystyle \boldsymbol{P}\left[\sum_{ij}x_{y}y_{j}(A_{ij}-P_{ij})\ge t\right]\le\exp(-\frac{t^{2}/2}{\sum_{i}p_{ij}+3t})\le2^{-3n}$

provided that ${t>\sqrt{n\sum_{i,j}p_{ij}}}$, as desired. $\Box$

So this norm allows us to easily sidestep the issue of high-degree vertices. The problem is that it’s NP-hard to compute.

2. Grothendieck’s Inequality

However, it turns out that we can approximate the infinity-to-one norm to within a constant factor using SDP:

Theorem 2 (Grothendieck’s Inequality) There exists some ${c}$ (turns out to be around ${1.7}$, but we won’t worry about its exact value here) such that for all ${M}$,

$\displaystyle \max_{\begin{array}{c} \boldsymbol{x}_{1},...,\boldsymbol{x}_{n}\\ \boldsymbol{y}_{1},...,\boldsymbol{y}_{n}\\ \Vert\boldsymbol{x}_{i}\Vert=1\\ \Vert\boldsymbol{y}_{y}\Vert=1 \end{array}}\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle \le c\max_{\boldsymbol{x},\boldsymbol{y}\in\{-1,1\}^{n}}\sum_{i,j}M_{ij}x_{i}y_{j}=\Vert M\Vert_{\infty\rightarrow1}$

So instead of dealing the combinatorially huge problem of optimizing over all ${\pm1}$ vectors, we can just solve an SDP instead to get a good approximation. For convenience, let’s denote the quantity on the left of the above expression as ${\Vert M\Vert_{SDP}}$.

Let’s start with a warmup lemma. The proof techniques in this lemma – specifically, the trick of replacing a vector from a continuous distribution with a random vector from some discrete distribution, and then taking the expectation to relate the two quantities, will come in handy later on.

Lemma 3 ${\max_{\boldsymbol{x},\boldsymbol{y}\in\{-1,1\}^{n}}\boldsymbol{x}^{T}M\boldsymbol{y}=\max_{\boldsymbol{x},\boldsymbol{y}\in[-1,1]^{n}}\boldsymbol{x}^{T}M\boldsymbol{y}}$. In other words, maximizing over the discrete space of ${\pm1}$ random vectors and maximizing over the continuous space of vectors in that range gives the same result.

Proof: It is obvious that the expression on the left is at least the right (since it’s just a relaxation). In order to show that the right is at least the left: given some continuous vectors ${\boldsymbol{x}_{i},\boldsymbol{y}_{i}}$, we can find discrete ${-1}$,${1}$ vectors ${\boldsymbol{b}_{i},\boldsymbol{c}_{i}}$ such that their expectations are equal to ${\boldsymbol{x}_{i},\boldsymbol{y}_{i}}$. We can do that by having ${\boldsymbol{b}_{i}}$ take on value ${+1}$ w.p. ${1/2+x_{i}/2}$ and ${-1}$ w.p. ${1/2-x_{i}/2}$, likewise for ${\boldsymbol{c}_{i}}$.

That means that:

$\displaystyle \begin{array}{rcl} \mathbb{E}\sum_{i,j}M_{i,j}\boldsymbol{b}_{i}\boldsymbol{c}_{j} & = & \sum_{i,j}M_{i,j}(\mathbb{E}\boldsymbol{b}_{i})(\mathbb{E}\boldsymbol{c}_{j})\\ & = & \sum_{i,j}M_{ij}\boldsymbol{x}_{i}\boldsymbol{y}_{j} \end{array}$

which gives us the desired result. $\Box$

Fact 4 ${\Vert M\Vert_{SDP}}$ is a norm.

Proof: Multiplicative scaling: obvious.

Nonnegativity (except iff ${M=0}$): It is obvious that the SDP norm is zero if ${M}$ is zero.

Now suppose ${M}$ is nonzero.

Notice that we can replace the constraints in the SDP norm requiring that ${\Vert\boldsymbol{x}_{i}\Vert=1}$, ${\Vert\boldsymbol{y}_{i}\Vert=1}$ with ${\Vert\boldsymbol{x}_{i}\Vert\le1}$, ${\Vert\boldsymbol{y}_{i}\Vert\le1}$. Why? We’ll use the same trick as we did in the proof of Lemma 3:

Obviously maximizing over ${\Vert\boldsymbol{x}_{i}\Vert\le1}$, ${\Vert\boldsymbol{y}_{i}\Vert\le1}$ will give us at least as good a result as maximizing over ${\Vert\boldsymbol{x}_{i}\Vert=1}$, ${\Vert\boldsymbol{y}_{i}\Vert=1}$, since it’s a relaxation, so it suffices to show that if we can obtain some value for ${\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle }$ using ${\Vert\boldsymbol{x}_{i}\Vert\le1}$, ${\Vert\boldsymbol{y}_{i}\Vert\le1}$, we can do at least as well using ${\Vert\boldsymbol{x}_{i}\Vert=1}$, ${\Vert\boldsymbol{y}_{i}\Vert=1}$.

Let’s suppose we have some vectors ${\boldsymbol{x}_{i},\boldsymbol{y}_{i}}$ with length at most ${1}$. Now let’s replace them with random vectors ${\boldsymbol{r}_{i},\boldsymbol{s}_{i}}$ of length exactly ${1}$ whose expectation are ${\boldsymbol{x}_{i},\boldsymbol{y}_{i}}$ respectively (just scale the ${\boldsymbol{x}_{i},\boldsymbol{y}_{i}}$ up, and have ${\boldsymbol{r}_{i},\boldsymbol{s}_{i}}$ be either the scaled value or its negative with probability calibrated appropriately so their expectations work out to be ${\boldsymbol{x}_{i},\boldsymbol{y}_{i}}$). Then we can just say:

$\displaystyle \begin{array}{rcl} \mathbb{E}\sum_{i,j}M_{ij}\left\langle \boldsymbol{r}_{i},\boldsymbol{s}_{j}\right\rangle & = & \sum_{i,j}M_{ij}\left\langle \mathbb{E}\boldsymbol{r}_{i},\mathbb{E}\boldsymbol{s}_{j}\right\rangle \\ & = & \sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle \ . \end{array}$

So ${\boldsymbol{r}_{i},\boldsymbol{s}_{j}}$ must take on some values (of length exactly ${1}$) that make ${\sum_{i,j}M_{ij}\left\langle \boldsymbol{r}_{i},\boldsymbol{s}_{j}\right\rangle \ge\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle }$, as desired.

Since we assumed ${M}$ is nonzero, let ${a}$, ${b}$ be such that ${M_{a,b}\ne0}$. If we set ${\boldsymbol{x}_{a}=\pm\boldsymbol{y}_{b}}$ to be some arbitrary vector of unit length – using a plus if ${M_{a,b}}$ is positive and a minus if it’s negative – and all other ${\boldsymbol{x}_{i},\boldsymbol{y}_{j}}$ to zero (which we can do without affecting the value of the max by the fact we just proved), we can immediately see that

$\displaystyle \sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle =M_{ab}\langle\boldsymbol{x}_{a},\boldsymbol{y}_{b}\rangle>0,$

giving a positive lower bound for the maximizer, as desired.

Triangle inequality: Just look at the ${x}$ and ${y}$ that maximize ${\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{i}\right\rangle }$ for ${M=A}$ and ${M=B}$, and observe that

$\displaystyle \max_{\begin{array}{c} \boldsymbol{x}_{1},...,\boldsymbol{x}_{n}\\ \boldsymbol{y}_{1},...,\boldsymbol{y}_{n}\\ \Vert\boldsymbol{x}_{i}\Vert=1\\ \Vert\boldsymbol{y}_{y}\Vert=1 \end{array}}\sum_{i,j}(A+B)_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle \le\max_{\begin{array}{c} \boldsymbol{x}_{1},...,\boldsymbol{x}_{n}\\ \boldsymbol{y}_{1},...,\boldsymbol{y}_{n}\\ \Vert\boldsymbol{x}_{i}\Vert=1\\ \Vert\boldsymbol{y}_{y}\Vert=1 \end{array}}\sum_{i,j}A_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle +\max_{\begin{array}{c} \boldsymbol{x}_{1},...,\boldsymbol{x}_{n}\\ \boldsymbol{y}_{1},...,\boldsymbol{y}_{n}\\ \Vert\boldsymbol{x}_{i}\Vert=1\\ \Vert\boldsymbol{y}_{y}\Vert=1 \end{array}}\sum_{i,j}B_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle$

since we can always match the quantity on the left hand side by choosing the same ${\boldsymbol{x_{i}}}$ and ${\boldsymbol{y_{j}}}$ for both terms on the right-hand-side. $\Box$

2.1. Proof of Grothendieck’s inequality

Now let’s prove Grothendieck’s inequality:

Proof: Observe that, by Lemma 3, maximizing over the choice of ${\boldsymbol{x},\boldsymbol{y}\in\{-1,1\}^{n}}$ in the zero-to-one norm is equivalent to maximizing over the choice of ${\boldsymbol{x},\boldsymbol{y}\in[-1,1]^{n}}$, so we can rewrite our proof goal as

$\displaystyle \max_{\begin{array}{c} \boldsymbol{x}_{1},...,\boldsymbol{x}_{n}\\ \boldsymbol{y}_{1},...,\boldsymbol{y}_{n}\\ \Vert\boldsymbol{x}_{i}\Vert=1\\ \Vert\boldsymbol{y}_{y}\Vert=1 \end{array}}\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle \le c\max_{\boldsymbol{x},\boldsymbol{y}\in[-1,1]^{n}}\sum_{i,j}M_{ij}x_{i}y_{j}$

Let ${\boldsymbol{x}_{1},...,\boldsymbol{x}_{n},\boldsymbol{y}_{1},...,\boldsymbol{y}_{n}}$ be of length ${1}$ and be the optimal choices used in ${\Vert M\Vert_{SDP}}$, i.e. the maximizers of ${\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{i}\right\rangle }$. It suffices to prove that there exist vectors ${\boldsymbol{b},\boldsymbol{c}\in[-B,B]^{n}}$ (for some fixed constant ${B}$, since we can just scale the result by changing ${c}$) to plug into the right-hand side of the above expression such that ${c\sum_{i,j}M_{ij}b_{i}c_{j}\ge\sum_{i,j}M_{ij}\left\langle \boldsymbol{x}_{i},\boldsymbol{y}_{j}\right\rangle }$.

Pick ${\boldsymbol{g}=(g_{1},...,g_{m})}$, with each coordinate being drawn from the normal distribution with mean ${0}$ and variance ${1}$, and let ${b_{i}=\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle}$ and ${c_{i}=\langle\boldsymbol{g},\boldsymbol{y}_{j}\rangle}$. Then

$\displaystyle \begin{array}{rcl} \mathbb{E}b_{i}c_{j} & = & \mathbb{E}_{\boldsymbol{g}}\boldsymbol{x}_{i}^{T}\boldsymbol{g}\boldsymbol{g}^{T}\boldsymbol{y}_{j}\\ & = & \boldsymbol{x}_{i}^{T}(\mathbb{E}_{\boldsymbol{g}}\boldsymbol{g}\boldsymbol{g}^{T})\boldsymbol{y}_{j} \end{array}$

But ${\mathbb{E}_{\boldsymbol{g}}\boldsymbol{g}\boldsymbol{g}^{T}}$ is just the identity, since the diagonal elements are just the expectation of the square of the Gaussian (which is ${1}$, its variance) and the off-diagonals are the expectation of the product of two independent Gaussians, which is zero (since the expectation of each individual Gaussian is zero).

So

$\displaystyle \mathbb{E}b_{i}c_{j}=\boldsymbol{x}_{i}^{T}\boldsymbol{y}_{j}=\langle\boldsymbol{x}_{i},\boldsymbol{y}_{j}\rangle\ .$

At this point we’ve got something that looks a lot like what we want: if the expectation of ${b_{i}c_{j}}$ is equal to ${\langle\boldsymbol{x}_{i},\boldsymbol{y}_{j}\rangle}$, then maximizing over them is definitely going to give us a quantity greater than or equal to ${\langle\boldsymbol{x}_{i},\boldsymbol{y}_{j}\rangle}$. Unfortunately, there’s an issue here: since Gaussian random variables are unbounded, ${b_{i}}$ and ${c_{i}}$ are unbounded; on the other hand, what we’re trying to do is maximize over vectors whose elements are bounded by ${[B,B]}$.

Our approach will be to “clip” the ${b_{i}}$ and ${c_{i}}$ to a finite range, and then bound the error introduced by the clipping process. Formally, fix a constant ${B}$ and pick a Gaussian random vector ${\boldsymbol{g}}$. Now define a ‘clipped’ inner product:

$\displaystyle b_{i}:=\begin{cases} -B & \text{ if }\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle<-B\\ \langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle & \text{ if }-B\le\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle\le B\\ B & \text{ if }\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle>B \end{cases}$

and likewise for ${\boldsymbol{c}_{j}}$. For convenience, let’s define the truncation error as follows, to represent how far the clipped value differs from the actual value.

$\displaystyle t(z):=\begin{cases} z+B & \text{ if }z<-B\\ 0 & \text{ if }-B\le z\le B\\ z-B & \text{ if }z>B \end{cases}$

So we can rewrite ${b}$ as:

$\displaystyle b_{i}:=\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle-t(\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle)$

and similarly we can define:

$\displaystyle c_{j}:=\langle\boldsymbol{y}_{j},\boldsymbol{g}\rangle-t(\langle\boldsymbol{y}_{j},\boldsymbol{g}\rangle)\ .$

So now we have

$\displaystyle \begin{array}{rcl} \mathbb{E}\sum_{ij}M_{ij}b_{i}c_{j} & = & \sum_{ij}M_{ij}\mathbb{E}\left[\left(\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle-t(\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle)\right)\left(\langle\boldsymbol{y}_{j},\boldsymbol{g}\rangle-t(\langle\boldsymbol{y}_{j},\boldsymbol{g}\rangle)\right)\right]\\ & = & \sum_{ij}M_{ij}\mathbb{E}\left[\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle\langle\boldsymbol{g},\boldsymbol{y}_{j}\rangle\right]\\ & & -\sum_{ij}M_{ij}\mathbb{E}\left[\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle t(\langle\boldsymbol{g},\boldsymbol{y}_{j}\rangle)\right]\\ & & -\sum_{ij}M_{ij}\mathbb{E}\left[t(\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle)\langle\boldsymbol{g},\boldsymbol{y}_{j}\rangle\right]\\ & & +\sum_{ij}M_{ij}\mathbb{E}\left[t(\langle\boldsymbol{g},x_{i}\rangle)t(\langle\boldsymbol{g},y_{j}\rangle)\right] \end{array}$

Clearly ${\sum_{ij}M_{ij}\mathbb{E}\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle\langle\boldsymbol{g},\boldsymbol{y}_{j}\rangle}$ is just the SDP norm , since${\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle}$ and ${\langle\boldsymbol{g},\boldsymbol{y}_{i}\rangle}$ were the ‘original’ ${b_{i}}$ and ${c_{i}}$.

What we will show now is that the remaining three terms are bounded by constant factors of SDP norm, so the entire sum of all four terms getting a constant factor approximation of it. The analysis is the same for all three terms so, for brevity, we’ll just look at the last one:

$\displaystyle \sum_{ij}M_{ij}\mathbb{E}t(\langle\boldsymbol{g},x_{i}\rangle)t(\langle\boldsymbol{g},y_{j}\rangle)\ .$

For convenience, let’s define ${f_{i,}h_{j}:\mathbb{R}^{m}\rightarrow\mathbb{R}}$ as:

$\displaystyle f_{i}(\boldsymbol{g}):= t(\langle\boldsymbol{x}_{i},\boldsymbol{g}\rangle)$

$\displaystyle h_{j}(\boldsymbol{g})=t(\langle\boldsymbol{y}_{j},\boldsymbol{g}\rangle)\ .$

Now, it’s convenient to think of the ${f_{i}}$ and ${h_{i}}$ as vectors of infinite dimension indexed by input ${g}$ (we’ll bring the dimensionality down to a finite value at the end of the proof). Let’s define the following inner product and norm in this space:

$\displaystyle \langle f_{1},f_{2}\rangle=\mathbb{E}_{g}f_{1}(\boldsymbol{g})f_{2}(\boldsymbol{g})$

$\displaystyle \Vert f\Vert^{2}=\mathbb{E}_{g}f^{2}(\boldsymbol{g})$

Now, since the Gaussian distribution is rotation-independent (we can just rotate a Gaussian random variable around without changing its distribution), the squared norms of the ${f_{i}}$ and the ${h_{j}}$ are all the same (since all the ${\boldsymbol{x}_{i}}$, ${\boldsymbol{y}_{i}}$ have length ${1}$, and dotting with them can be thought of a rotation). That means that the above norm takes on the same value for all ${f}$, ${h}$, so all we need to do is figure out a constant bound on it.

Fortunately, this value is pretty easy to bound. Notice that the function ${t}$ is zero if the dot product’s absolute value is smaller than ${B}$. If we have ${w\sim N(0,1)}$,

$\displaystyle \vert t(w)\vert^{2}\le\vert w\vert^{2}1_{\{w\ge B,w\le-B\}}\le e^{-\Omega(B^{2})}\le1/(10B)$

for sufficiently large ${B}$, which is a constant.

So if the squared norms are bounded above by ${1/(10B)}$, which means we can substitute this and the norm bound into the above expression to get:

Now, armed with this, we can conclude (with similar results for the other two error terms)

$\displaystyle \sum_{ij}M_{ij}\mathbb{E}t(\langle g,\boldsymbol{x}_{i}\rangle)t(\langle g,\boldsymbol{y}_{j}\rangle)=\sum_{ij}M_{ij}\langle f_{i},h_{j}\rangle\le\Vert M\Vert_{SDP}(1/100B^{2})\ .$

which tells us that ${\mathbb{E}\sum_{ij}M_{ij}b_{i}c_{j}}$ , is within a constant bound of the SDP norm ${\sum_{ij}M_{ij}\mathbb{E}\langle g,\boldsymbol{x}_{i}\rangle\langle g,\boldsymbol{y}_{j}\rangle}$ as desired.

There’s only one slightly fishy bit in the proof we used above, though, and that’s the treatment of functions of infinite-dimensional vectors indexed by a Gaussian vector. Let’s conclude by constructing a (finite-dimensional) solution to the SDP from the functions:

Claim: If ${X_{ij}=\langle f_{i},f_{j}\rangle}$, then ${X_{ij}\succeq0}$.

Proof:

$\displaystyle \begin{array}{rcl} a^{T}Xa & = & \sum_{iji}a_{i}\langle f_{i},f_{j}\rangle a_{j}\\ & = & \sum_{ij}\langle a_{i}f_{i},a_{j}f_{j}\rangle\\ & = & \langle\sum_{i}a_{i}f_{i},\sum_{j}a_{j}f_{j}\rangle\\ & = & \Vert\sum_{i}a_{i}f_{i}\Vert^{2}\\ & \ge & 0 \end{array}$

as desired. ${\square}$

So the matrix ${X_{ij}}$ comprises a nice finite-dimensional solution to the SDP, and we’re done with the proof.

Also, noticing that we have four terms in the expansion of ${\mathbb{E}\sum_{ij}M_{ij}b_{i}c_{j}}$, each one of which is a feasible value for the SDP, we can figure out how much we deviate from the actual optimum, i.e. ${\Vert M\Vert_{SDP}}$. Since each term can’t exceed the value of the SDP optimum at all, ${\mathbb{E}\sum_{ij}M_{ij}b_{i}c_{j}}$ is separated from ${\sum_{ij}M_{ij}\mathbb{E}\left[\langle\boldsymbol{g},\boldsymbol{x}_{i}\rangle\langle\boldsymbol{g},\boldsymbol{y}_{j}\rangle\right]}$ by a factor of four, giving us a bound on the constant ${c}$ in the inequality. $\Box$

To recap, notice that there were two key “tricks” here:

1) Assuming that we were rounding an optimal solution to our SDP (i.e. starting with ${\boldsymbol{x}_{i},\boldsymbol{y}_{j}}$ as optimizers). We don’t get any bounds otherwise!

2) Treating the rounding error itself as a feasible solution of the SDP.

This proof was communicated to us by James Lee.