# The Khot-Naor Approximation Algorithm for 3-XOR

Today I would like to discuss the Khot-Naor approximation algorithm for the 3-XOR problem, and an open question related to it.

In 3-XOR, we have a system of linear equations modulo 2, with three variables per equation, that might look something like $\displaystyle \begin{array}{ll} x_1 + x_2 + x_4 & \equiv 0 \pmod 2\\ x_1 + x_5 + x_6 & \equiv 1 \pmod 2\\ x_2 + x_3 + x_4 & \equiv 1 \pmod 2\\ x_5 + x_3 + x_6 & \equiv 1 \pmod 2 \end{array}$

The above system is not satisfiable (if we add up the left-hand sides we get 0, if we add up the right-hand sides we get 1), but it is possible to satisfy ${3/4}$ of the equations, for example by setting all the variables to 1. In Max 3-XOR problem (which we will simply refer to as “3-XOR” from now on), given a system of equations mod 2 with three variables per equation, we want to find an assignment that satisfies as many equations as possible.

Either setting all the variables to zero or setting all the variables to one will satisfy half of the equations, and the interesting question is how much better than 1/2 it is possible to do on a given instance. Khot and Naor provide an algorithm that, given an instance in which it is possible to satisfy an ${1/2 + \epsilon}$ fraction of equations, finds a solution that satisfies at least ${1/2 + \epsilon \cdot O \left( \sqrt{\frac {\log n} n} \right)}$ fraction of equations, where ${n}$ is the number of variables. The algorithm is randomized, it runs in polynomial time, and it succeeds with high probability. I believe that it is still the state of the art in terms of worst-case approximation guarantee.

Like the approximation algorithm for sparsest cut in Abelian Cayley graphs implied by the result of Bauer et al. that was the subject of the last two posts, the result of Khot and Naor does not prove a bound on the integrality gap of a relaxation of the problem.

I will describe the Khot-Naor algorithm and describe how it manages to use convex optimization to provide an approximation algorithm, but without establishing an integrality gap bound. I thank my student Lucas Pesenti for explaining the algorithm to me and for thinking together about this problem.

If our 3-XOR instance has ${m}$ equations and ${n}$ variables, then the problem of maximizing the number of satisfied equations can be rewritten as $\displaystyle \frac m2 + \max_{x \in \{ -1 , +1 \}^n} \ \sum_{i,j,k} c_{i,j,k} x_i x_j x_k$

so that our goal is to approximate the combinatorial optimization problem $\displaystyle \max_{x \in \{ -1 , +1 \}^n} \ \sum_{i,j,k} c_{i,j,k} x_i x_j x_k$

Up to a constant factor loss in the approximation guarantee, Khot and Naor show that the above is equivalent to $\displaystyle \max_{x,y,z \in \{ -1 , +1 \}^n} \ \sum_{i,j,k} T_{i,j,k} x_i y_j z_k \ \ \ \ \ (1)$

where ${T_{i,j,k}}$ is a symmetric 3-tensor with entries in ${\{ -1,0,1\}}$ and with ${O(m)}$ non-zero entries.

Before continuing, let us recall that if ${M}$ is an ${n \times n}$ matrix, then its ${\ell_\infty}$-to- ${\ell_1}$ operator norm has the characterization $\displaystyle || M ||_{\infty \rightarrow 1 } = \max_{x,y \in [-1,1]^n} x^T M y = \max_{x,y \in \{ -1,1 \}^n} x^T M y \ \ \ \ \ (2)$

We could also define the “Grothendieck norm” ${|| M ||_{Grot}}$ of a matrix as the following natural semidefinite programming relaxation of the ${\ell_\infty}$-to- ${\ell_1}$ norm: $\displaystyle \begin{array}{lll} \max & \sum_{i,j} M_{i,j} \langle x_i , y_j\rangle \\ {\rm s.t.}\\ & || x_i ||^2 = 1 & i = 1,\ldots,n\\ & ||y_j ||^2 = 1 & i = j,\ldots,n \end{array}$

where the ${x_i}$ and ${y_j}$ are arbitrary vectors. The Grothendieck inequality is $\displaystyle || M||_{\infty \rightarrow 1 } \leq || M ||_{Grot} \leq O(1) \cdot || M ||_{\infty \rightarrow 1 }$

where the ${O(1)}$ is an absolute constant, known to be less than ${1.8}$. Furthermore, the above inequality has a constructive proof, and it leads to a polynomial time constant factor approximation for the problem of finding values ${x_i}$ and ${y_i}$ in ${\pm 1}$ that maximize (2).

Basically, we can see problem (1) as the natural generalization of (2) to tensors, and one would like to see a semidefinite programming relaxation of (1) achieving something resembling the Grothendieck inequality, but with a loss of something like ${\tilde O(\sqrt n)}$. As I mentioned above, this remains an open question, as far as I know.

The idea of Khot and Naor is the following. Suppose that we are given an instance ${T}$ of problem (1), and suppose that ${x^*,y^*,z^*}$ is an optimal solution, and let us call $\displaystyle \epsilon = \sum_{i,j,k} T_{i,j,k} x^*_i y^*_j z^*_k$

the value of the optimum (the algorithm will not need to know or guess ${\epsilon}$).

The key step is now to see that if we pick a random ${x \in \{-1,1\}^n}$, there is at least a ${1/n^{O(1)}}$ probability that $\displaystyle \sum_{i,j,k} T_{i,j,k} x_i y^*_j z^*_k \geq \epsilon \cdot \Omega \left(\sqrt{\frac{\log n}{n}} \right)$

This is a bit difficult, but it is really easy to see that with ${1/n^{O(1)}}$ probability we have $\displaystyle \sum_{i,j,k} T_{i,j,k} x_i y^*_j z^*_k \geq \epsilon \cdot \Omega \left(\sqrt{\frac{1}{n}} \right) \ \ \ \ \ (3)$

and we can do that by seeing that by defining a vector ${w}$ such that ${w_i = \sum_{j,k} T_{i,j,k} y^*_j z^*_k}$, so that $\displaystyle \langle x, w \rangle = \sum_{i,j,k} T_{i,j,k} x_i y^*_j z^*_k$

So we have $\displaystyle \langle x^*,w \rangle = \epsilon$

which, using Cauchy-Schwarz, gives $\displaystyle || w||^2 \geq \epsilon^2 /n$

Now, for a random ${x\sim \{ -1,1\}^n}$, we have $\displaystyle \mathop{\mathbb E} \langle x,w \rangle^2 = || w||^2 \geq \epsilon^2 /n$

and $\displaystyle \mathop{\mathbb E} \langle x,w \rangle^4 \leq n^{O(1)}$

so by Paley–Zygmund we have, let’s say $\displaystyle \mathop{\mathbb P} \left[ \langle x,w \rangle^2 \geq \frac {\epsilon^2}{2n} \right] \geq \frac 1 {n^{O(1)}}$

which, together with the definition of ${w}$ and the fact that the distribution of ${\langle x, w\rangle}$ is symmetric around zero, gives us the claim (3).

Now suppose that we have been lucky and that we have found such an ${x}$. We define the matrix $\displaystyle X_{j,k} = \sum_i T_{i,j,k} x_i$

and we see that our claim can be written as $\displaystyle y^{*T} X z^* \geq \epsilon \cdot \Omega \left(\sqrt{\frac{1}{n}} \right)$

At this point we just apply the algorithm implied by the Grothendieck inequality to the matrix ${X}$, and we find ${y}$ and ${z}$ in ${\{ -1,1 \}^n}$ such that $\displaystyle y^T X z \geq \epsilon \cdot \Omega \left(\sqrt{\frac{1}{n}} \right)$

meaning that $\displaystyle \sum_{i,j,k} T_{i,j,k} x_i y_j z_k \geq \epsilon \cdot \Omega \left(\sqrt{\frac{1}{n}}\right)$

Summarizing, our algorithm is to pick a random vector ${x\sim \{-1,1\}}$ and to find a constant-factor approximation for the problem $\displaystyle \max_{y,z \in \{-1,1\}^n} \ \sum_{i,j,k} T_{i,j,k} x_iy_jz_k \ \ \ \ \ (4)$

using semidefinite programming. We do that ${n^{O(1)}}$ times, and take the best solution.

The analysis can be turned into an upper bound certificate in the following way. For the (suboptimal) analysis using Paley-Zygmund, we only need the entries of the random ${x}$ to be 4-wise independent, and there are distributions on ${\{-1,1\}^n}$ where the entries are unbiased and 4-wise independent, and such that the sample space is of size polynomial in ${n}$. Thus, one could write an SDP relaxation of (4) for each ${x}$ in the support of such a distribution, and then take the maximum of these SDPs, multiply it by ${O(\sqrt n)}$, and it would be a certified upper bound. Such an upper bound, however, would not come from a relaxation of the 3-XOR problem, and I find it really strange that it is not clear how to turn these ideas into a proof that, say, the standard degree-4 sum-of-squares semidefinite programming relaxation of 3-XOR has an integrality gap at most ${\tilde O(\sqrt n)}$.

## 1 thought on “The Khot-Naor Approximation Algorithm for 3-XOR”

1. Luca, thanks for the post!
I think this paper (https://arxiv.org/pdf/1611.05998.pdf) does something related to what you are looking for.
They are working over the sphere rather than the hypercube.
But my recollection is that they show integrality gap upper bounds for the usual SoS relaxation.