# Approximate Counting

This week I have been at the IPAM workshop on expander graphs, scheduled so that we could spend Valentine’s day attending lectures, thus symbolically demonstrating our love for math.

My talk was on the problem of checking whether a given hypergraph is an expander, focusing on the expansion property that is known as “quasirandomness.”

Because of the workshop, my Complexity class took a one-week break.

Last week, we talked about one of my favorite “elementary” results in complexity theory: approximate counting with an NP oracle.

Suppose that someone comes up with a polynomial time algorithm for SAT and proves that P=NP; then we know that a number of awesome consequences would follow (and some frightening ones, such as the impossibility of cryptography). We would still, however, not be able to solve combinatorial counting problems such as: given a graph, how many proper 3-colorings does it have, if any?

If the number of colorings is $k$, we can compute it in time polynomial in $k$, because the question “is the number of colorings at least $t$?” can be reduced to a SAT instance of size polynomial in $t$ and in the size of the graph. We might, however, be given a graph with $n$ vertices and a number of 3-colorings in the ballpark of $2^n$. Then, even the awesome power of a polynomial time algorithm for SAT does not seem to help. Indeed it follows from Toda’s Theorem that, if one could solve the 3-coloring counting problem efficiently using a SAT oracle, then the polynomial hierarchy would collapse. (If the previous sentence did not make sense to you, let’s just say that there is good evidence that there is no algorithm that solves the 3-coloring counting problem efficiently by “just relying” on an efficient algorithm for SAT.)
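At toy scale, the "is the number of colorings at least $t$?" question can be mimicked by brute force. The sketch below (function names are my own, for illustration) decides it for a small graph; a comment indicates how the actual polynomial-size SAT reduction would go.

```python
from itertools import product

def is_proper(coloring, edges):
    """A coloring (tuple of values in {0,1,2}, one per vertex) is
    proper if no edge has both endpoints the same color."""
    return all(coloring[u] != coloring[v] for u, v in edges)

def at_least_t_colorings(n, edges, t):
    """Brute-force stand-in for the SAT reduction: decide whether the
    graph on n vertices has at least t proper 3-colorings.

    The actual reduction would write a formula over t disjoint copies
    of the coloring variables, asserting that each copy encodes a
    proper coloring and that the copies are pairwise distinct -- a
    formula of size polynomial in t and in the size of the graph."""
    count = 0
    for coloring in product(range(3), repeat=n):
        if is_proper(coloring, edges):
            count += 1
            if count >= t:
                return True
    return False
```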

Interestingly enough, however, we can approximately count solutions to any NP problem in probabilistic polynomial time, given an oracle for SAT. The idea is cleverly simple. Suppose we have some guess $t$ for the number of 3-colorings (for concreteness I continue to talk about 3-coloring, but the argument works for any NP counting problem), and we want to distinguish the cases in which $t$ is roughly right, or much higher, or much lower than the right count. If we can do that, then we can do binary search on $t$ until we find a “roughly right” value. (In fact, we don’t need to do binary search: we can try $t=3,9,\ldots,3^k,\ldots$, and we only need to make $n$ tries, where $n$ is the number of vertices, in order to get a constant-factor approximation.)
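The geometric search can be written as a short loop around a "ballpark tester." In the sketch below, the interface `in_ballpark` is a hypothetical stand-in for the hashing-based test that the rest of the post describes:

```python
def approx_count(n, in_ballpark):
    """Constant-factor estimate of a count in [1, 3^n] by geometric
    search over guesses t = 3, 9, 27, ...

    `in_ballpark(t)` is a hypothetical subroutine that returns True
    when the true count is within a constant factor of t (in the
    post, it is implemented with a hash function and an NP oracle)."""
    t = 3
    for _ in range(n + 1):   # at most n+1 guesses suffice
        if in_ballpark(t):
            return t
        t *= 3
    return 0   # the count is very small; handle exactly instead
```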

Here is the idea: identify colorings with elements of $\{0,1,2\}^n$, where $n$ is the number of vertices, pick a random function $h: \{0,1,2\}^n \rightarrow \{1,\ldots, t/100\}$, and consider the question of how many $c \in \{0,1,2\}^n$ are: (i) proper colorings, and (ii) such that $h(c) = 1$. If the number of proper 3-colorings is roughly $t$, then the expected number of proper 3-colorings satisfying (ii) is around 100, with a standard deviation around 10, so there is a good probability that it is between 70 and 130; otherwise it would be more than 3 standard deviations from its mean. If the number of proper 3-colorings is much more than $t$, say more than $2t$, then very likely the number of proper 3-colorings mapping to 1 is more than 130, as otherwise it would be 4 standard deviations or more away from its mean. Similarly, if the number of proper 3-colorings is less than $t/2$, then it is likely that fewer than 70 will map to 1.

Since we only need Chebyshev’s inequality in the above calculations, it is not necessary that $h$ be picked as a random function $h: \{0,1,2\}^n \rightarrow \{1,\ldots, t/100\}$: it is enough if $h$ is selected from a distribution of “pairwise independent hash functions,” that is, a distribution such that, for every two distinct inputs $x,y$, if we pick $h$ at random from the family, then the values $h(x),h(y)$ are independent and uniformly distributed in $\{ 1,\ldots, t/100\}$. It is not hard to see that if we pick a random $m \times n$ matrix $A\in {\mathbb F}^{m \times n}_3$ and a random vector $b\in {\mathbb F}^m_3$, then the function $h_{A,b} : {\mathbb F}^n_3 \rightarrow {\mathbb F}^m_3$ defined as

$h_{A,b} (x) := Ax + b$

is a pairwise independent hash function.
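As a quick sanity check, here is a small Python sketch (the dimensions and inputs are illustrative) that samples $h_{A,b}$ as above and empirically verifies that, for one pair of distinct inputs $x, y$, the pair $(h(x), h(y))$ is close to uniform over all $3^m \cdot 3^m$ possible value pairs:

```python
import random

q = 3        # the field F_3
n, m = 4, 2  # domain F_3^4, range F_3^2 (illustrative sizes)

def sample_hash():
    """Draw h_{A,b}(x) = Ax + b for a uniformly random m x n matrix A
    and vector b over F_3."""
    A = [[random.randrange(q) for _ in range(n)] for _ in range(m)]
    b = [random.randrange(q) for _ in range(m)]
    def h(x):
        return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % q
                     for i in range(m))
    return h

# Empirically check pairwise independence on one pair of distinct inputs:
# the joint value (h(x), h(y)) should be roughly uniform over 3^m * 3^m
# = 81 possibilities.
x = (0, 1, 2, 0)
y = (1, 1, 0, 2)
trials = 100_000
counts = {}
for _ in range(trials):
    h = sample_hash()
    hx, hy = h(x), h(y)
    counts[(hx, hy)] = counts.get((hx, hy), 0) + 1
```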

Our algorithm to check if the number of 3-colorings in a given graph is in the ballpark of $t$ is then to pick a pairwise independent hash function $h_{A,b} : {\mathbb F}^n_3 \rightarrow {\mathbb F}^m_3$, where $3^m \approx t/100$, and then check whether the number of proper 3-colorings $c$ in the graph such that $h_{A,b}(c) = (0,\ldots,0)$ is around 100. This is a problem that can be solved with an NP oracle.

(There are a few more details one needs to work out: e.g., the above works well if $t$ is a large enough constant. For small $t$ we can do an exact calculation using the NP oracle as discussed at the beginning of the post. So far we are only getting a constant-factor approximation, but there are simple ways – which the reader is invited to reconstruct – to convert an algorithm that achieves an $O(1)$-factor approximation for the number of proper 3-colorings into an algorithm that achieves a $1+\epsilon$, or even $1+ 1/n$, approximation. Alternatively, the analysis can be sharpened to directly give a $1+\epsilon$, or even $1+ 1/n$, approximation.)
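To see the whole scheme in action at toy scale, the following sketch replaces the NP oracle with brute-force enumeration and uses a scaling factor much smaller than the $t/100$ above (the graph, parameters, and function names are all illustrative). Since each coloring hashes to $(0,\ldots,0)$ with probability exactly $3^{-m}$, the quantity $(\text{number of hits}) \cdot 3^m$ is an unbiased estimate of the count, and averaging over many draws of $(A,b)$ concentrates it:

```python
import random
from itertools import product

def proper_colorings(n, edges):
    """All proper 3-colorings of a graph on n vertices (brute force,
    playing the role of the NP oracle at toy scale)."""
    return [c for c in product(range(3), repeat=n)
            if all(c[u] != c[v] for u, v in edges)]

def hash_estimate(n, edges, m, trials=2000):
    """Estimate the number of proper 3-colorings: for a random
    h_{A,b}: F_3^n -> F_3^m, count the colorings hashing to the zero
    vector and scale by 3^m; average over `trials` draws of (A, b)."""
    colorings = proper_colorings(n, edges)   # stand-in for the oracle
    total = 0
    for _ in range(trials):
        A = [[random.randrange(3) for _ in range(n)] for _ in range(m)]
        b = [random.randrange(3) for _ in range(m)]
        hits = sum(
            1 for c in colorings
            if all((sum(A[i][j] * c[j] for j in range(n)) + b[i]) % 3 == 0
                   for i in range(m)))
        total += hits * 3 ** m
    return total / trials

# Path on 5 vertices: exactly 3 * 2^4 = 48 proper 3-colorings.
path_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
```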

Is there a family of pairwise independent hash functions from $\mathbb Z_m$ to $\mathbb Z_n$ for any (sufficiently large) positive integers $m$ and $n$?