CS254 Lecture 8 – Approximate Counting

Counting Problems

Today we describe counting problems and the class {{\bf \#P}} that they define, and we show that every counting problem in {{\bf \#P}} can be approximately solved in randomized polynomial time given access to an {{\bf NP}} oracle.

1. Counting Classes

[Note: we did not cover most of the material of this section in class.]

Recall that {R} is an NP-relation if there is a polynomial time algorithm {A} such that {(x,y) \in R \Leftrightarrow A(x,y)=1}, and there is a polynomial {p} such that {(x,y) \in R \Rightarrow |y| \leq p(|x|)}.

Definition 1 If {R} is an {{\bf NP}} relation, then {\#R} is the problem that, given {x}, asks how many {y} satisfy {(x,y) \in R}.

{{\bf \#P}} is the class of all problems of the form {\#R}, where {R} is an NP-relation.

Observe that an NP-relation {R} naturally defines an {{\bf NP}} language {L_R}, where {L_R = \{ x : \exists y. (x,y) \in R \}}, and every {{\bf NP}} language can be defined in this way. Therefore problems in {{\bf \#P}} can always be seen as the problem of counting the number of witnesses for a given instance of an {{\bf NP}} problem.

Unlike for decision problems, there is no canonical way to define reductions between counting problems. There are two common definitions.

Definition 2 We say there is a parsimonious reduction from {\#A} to {\#B} (written {\#A \leq_{\text{par}} \#B}) if there is a polynomial time transformation {f} such that for all {x}, {|\{y : (x,y)\in A \}| = |\{z:(f(x),z)\in B\}|}.

Often this definition is a little too restrictive and we use the following definition instead.

Definition 3 {\#A \leq \#B} if there is a polynomial time algorithm for {\#A} given an oracle that solves {\#B}.

{\#CIRCUITSAT} is the problem where given a circuit, we want to count the number of inputs that make the circuit output 1.

Theorem 4 {\#CIRCUITSAT} is {{\bf \#P}}-complete under parsimonious reductions.

Proof: Let {\#R} be in {{\bf \#P}}, and let {A} and {p} be as in the definition. Given {x}, we want to construct a circuit {C} such that {|\{z:C(z)=1\}| = |\{y:|y|\leq p(|x|), A(x,y)=1\}|}. We first construct a circuit {\hat{C}} that on input {x,y} simulates {A(x,y)}. From earlier arguments we know that this can be done with a circuit of size about the square of the running time of {A}. Thus {\hat{C}} has size polynomial in the running time of {A}, and so polynomial in {|x|}. Then let {C(y) = \hat{C}(x,y)}, with {x} hard-wired into the circuit. \Box

Theorem 5 {\#3SAT} is {{\bf \#P}}-complete.

Proof: We show that there is a parsimonious reduction from {\#CIRCUITSAT} to {\#3SAT}. That is, given a circuit {C} we construct a Boolean formula {\phi} such that the number of satisfying assignments for {\phi} is equal to the number of inputs for which {C} outputs 1. Suppose {C} has inputs {x_1,\ldots,x_n} and gates {1,\ldots,m}; then {\phi} has variables {x_1,\ldots,x_n,g_1,\ldots,g_m}, where {g_i} represents the output of gate {i}. Each gate has at most two input variables and one output variable, so a gate can be completely described by writing down, for each of the (at most) 4 possible inputs, a clause forcing the correct output. Thus each gate can be simulated using at most 4 clauses. Finally, we add the clause {(g_m)}, which asserts that the output gate evaluates to 1. In this way we have reduced {C} to a formula {\phi} with {n+m} variables and at most {4m+1} clauses, and every input accepted by {C} extends to exactly one satisfying assignment of {\phi}. So there is a parsimonious reduction from {\#CIRCUITSAT} to {\#3SAT}. \Box
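To make the gate clauses concrete: if gate {i} is an AND gate with inputs {g_j} and {g_k}, its behavior on the four possible input pairs is captured by the clauses

\displaystyle  (g_j \vee g_k \vee \neg g_i) \wedge (g_j \vee \neg g_k \vee \neg g_i) \wedge (\neg g_j \vee g_k \vee \neg g_i) \wedge (\neg g_j \vee \neg g_k \vee g_i),

where the first three force {g_i=0} whenever some input is 0, and the last forces {g_i=1} when both inputs are 1.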

Notice that if a counting problem {\#R} is {{\bf \#P}}-complete under parsimonious reductions, then the associated language {L_R} is NP-complete, because {\#3SAT \leq_{par} \#R} implies {3SAT \leq L_R}. On the other hand, with the less restrictive definition of reducibility, even some counting problems whose decision version is in {{\bf P}} are {{\bf \#P}}-complete. For example, the problem of counting the number of satisfying assignments of a given 2CNF formula and the problem of counting the number of perfect matchings in a given bipartite graph are both {{\bf \#P}}-complete.

2. Complexity of counting problems

We will prove the following theorem:

Theorem 6 For every counting problem {\#\mbox{A}} in {{\bf \#P}}, there is a probabilistic algorithm {C} that, on input {x} and {\epsilon>0}, computes with high probability a value {v} such that

\displaystyle  (1-\epsilon)\#A(x) \leq v \leq (1+\epsilon)\#A(x) \ \ \ \ \ (1)

in time polynomial in {|x|} and in {\tfrac{1}{\epsilon}}, using an oracle for {{\bf NP}}.

The theorem says that {{\bf \#P}} can be approximated in {{\bf BPP}^{{\bf NP}}}. We remark that approximating {\#3SAT} is {{\bf NP}}-hard, so to compute an approximation we need at least the power of {{\bf NP}}; Theorem 6 states that the power of {{\bf NP}} and randomization is also sufficient.

Another remark concerns the following result.

Theorem 7 (Toda) For every {k}, {\Sigma_k \subseteq {\bf P}^{{\bf \#P}}}.

This implies that {\#3SAT} is {\Sigma_k}-hard for every {k}, i.e., {\#3SAT} lies outside the polynomial hierarchy, unless the hierarchy collapses. Recall that {{\bf BPP}} lies inside {\Sigma_2}, and hence approximating {\#3SAT} can be done in {\Sigma_3}. Therefore, approximating {\#3SAT} cannot be equivalent to computing {\#3SAT} exactly, unless the polynomial hierarchy collapses.\footnote{The above discussion was not very rigorous, but it can be correctly formalized. In particular: (i) from the fact that {{\bf BPP} \subseteq \Sigma_2} and that approximate counting is doable in {{\bf BPP}^{{\bf NP}}} it does not necessarily follow that approximate counting is in {\Sigma_3}, although in this case it does, because the proof that {{\bf BPP} \subseteq \Sigma_2} relativizes; (ii) we have defined {{\bf BPP}}, {\Sigma_3}, etc., as classes of decision problems, while approximate counting is not a decision problem (it can be shown, however, to be equivalent to a “promise problem,” and the inclusion {{\bf BPP} \subseteq \Sigma_2} holds also for promise problems).}

We first make some observations so that we can reduce the proof to the task of proving a simpler statement.

  • It is enough to prove the theorem for {\#3SAT}.

    If we have an approximation algorithm for {\#3SAT}, we can extend it to any {\#\mbox{A}} in {{\bf \#P}} using the parsimonious reduction from {\#\mbox{A}} to {\#3SAT}.

  • It is enough to give a polynomial time {O(1)}-approximation for {\#3SAT}.

    Suppose we have an algorithm {C} and a constant {c} such that

    \displaystyle  \frac{1}{c}\#3SAT(\varphi)\leq C(\varphi) \leq c\#3SAT(\varphi). \ \ \ \ \ (2)

    Given {\varphi}, we can construct {\varphi^k=\varphi_1\wedge \varphi_2 \wedge \cdots \wedge \varphi_k}, where each {\varphi_i} is a copy of {\varphi} constructed using fresh variables. If {\varphi} has {t} satisfying assignments, {\varphi^k} has {t^k} satisfying assignments. Then, giving {\varphi^k} to the algorithm we get

    \displaystyle  \begin{aligned} \frac{1}{c}t^k & \leq C(\varphi^k) \leq c t^k \\ \left(\frac{1}{c}\right)^{1/k}t & \leq C(\varphi^k)^{1/k}\leq {c}^{1/k} t. \end{aligned}

    If {c} is a constant and {k=O(\tfrac{1}{\epsilon})}, then {c^{1/k}\leq 1+\epsilon}: it suffices to take {k=\lceil \ln c/\ln(1+\epsilon)\rceil}. (A Python sketch of this amplification step appears at the end of this section.)

  • For a formula {\varphi} that has {O(1)} satisfying assignments, {\#3SAT(\varphi)} can be found in {{\bf P}^{{\bf NP}}}.

    This can be done by iteratively asking the oracle questions of the form: “Are there at least {k} assignments satisfying this formula?” Notice that these are {{\bf NP}} questions, because a witness simply consists of {k} distinct satisfying assignments, which can be checked in polynomial time.
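Here is a minimal Python sketch of the amplification step from the second observation above, under an assumed DIMACS-style representation (a formula is a list of clauses, each a tuple of signed variable indices; both helper names are hypothetical):

```python
import math

def power(phi, n, k):
    """phi: clauses over variables 1..n; returns phi^k, i.e. k copies of phi
    on disjoint blocks of fresh variables, over variables 1..k*n."""
    clauses = []
    for i in range(k):
        for clause in phi:
            # shift variable v to v + i*n, preserving each literal's sign
            clauses.append(tuple(lit + i * n if lit > 0 else lit - i * n
                                 for lit in clause))
    return clauses

def copies_needed(c, eps):
    """Smallest k with c**(1/k) <= 1 + eps; this is O(1/eps) for constant c."""
    return math.ceil(math.log(c) / math.log(1 + eps))

# Example: phi = (x1 or x2) and (not x1 or x2) has 2 satisfying assignments,
# so power(phi, 2, 3) has 2**3 = 8 satisfying assignments over 6 variables.
phi = [(1, 2), (-1, 2)]
phi_cubed = power(phi, 2, 3)
```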

3. An approximate comparison procedure

Suppose that we had available an approximate comparison procedure a-comp with the following properties:

  • If {\#3SAT(\varphi)\geq 2^{k+1}} then {{\it  a-comp}(\varphi,k)=\text{YES}} with high probability;
  • If {\#3SAT(\varphi)< 2^{k}} then {{\it  a-comp}(\varphi,k)=\text{NO}} with high probability.

Given a-comp, we can construct an algorithm that {2}-approximates {\#3SAT}, as described below (a Python sketch follows the pseudocode):

  • Input: {\varphi}
  • compute:
    • {\text{\it  a-comp}(\varphi,0)}
    • {\text{\it  a-comp}(\varphi,1)}
    • {\text{\it  a-comp}(\varphi,2)}
    • {\vdots}
    • {\text{\it  a-comp}(\varphi,n+1)}
  • if {\text{\it  a-comp}} outputs NO the first time (at {t=0}) then
    • // The value is either {0} or {1}, and the answer can be checked by one more query to the {{\bf NP}} oracle.
    • Query the oracle and output the exact value.
  • else
    • Suppose that it outputs YES for {t=0,1,\ldots,i-1} and NO for {t=i}
    • Output {2^i}
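Here is a minimal Python sketch of this search. Both helpers are assumed: a_comp(phi, k) is the procedure above, and exact_count(phi, bound) returns {\min(\#3SAT(\varphi), \text{bound})}, computable with {{\bf NP}}-oracle queries as in the third observation of Section 2.

```python
def approx_count(phi, n, a_comp, exact_count):
    """Return a value a with a/2 <= #3SAT(phi) < 2a, with high probability."""
    for k in range(n + 2):                 # k = 0, 1, ..., n+1
        if not a_comp(phi, k):             # first NO, at threshold k
            if k == 0:
                # whp the count is 0 or 1, so this returns it exactly
                return exact_count(phi, bound=2)
            return 2 ** k
    # a_comp answered YES everywhere; whp this does not happen, since a
    # formula on n variables has at most 2**n satisfying assignments
    return 2 ** n
```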

We need to show that this algorithm approximates {\#3SAT} within a factor of {2}. If a-comp answers NO the first time, the algorithm outputs the right answer because it checks the answer explicitly. Now suppose a-comp says YES for all {t=0,1,\ldots,i-1} and says NO for {t=i}. Since {\text{\it  a-comp}(\varphi,i-1)} outputs YES, {\#3SAT(\varphi)\geq 2^{i-1}}, and since {\text{\it  a-comp}(\varphi,i)} outputs NO, {\#3SAT(\varphi)< 2^{i+1}}. The algorithm outputs {a=2^{i}}. Hence,

\displaystyle  \frac{1}{2}a\leq \#3SAT(\varphi) < 2\cdot a \ \ \ \ \ (3)

and the algorithm outputs the correct answer to within a factor of {2}.

Thus, to establish the theorem, it is enough to give a {{\bf BPP}^{{\bf NP}}} implementation of the a-comp procedure.

4. Constructing a-comp

The procedure and its analysis are similar to the Valiant-Vazirani reduction: for a given formula {\varphi} we pick a hash function {h} from a pairwise independent family, and look at the number of assignments {x} that satisfy {\varphi} and such that {h(x)={\bf 0}}.

In the Valiant-Vazirani reduction, we proved that if {S} is a set of size approximately equal to the size of the range of {h()}, then, with constant probability, exactly one element of {S} is mapped by {h()} into {\bf 0}. Now we use a different result, a simplified version of the “Leftover Hash Lemma” proved by Impagliazzo, Levin, and Luby in 1989, that says that if {S} is sufficiently larger than the range of {h()} then the number of elements of {S} mapped into {\bf 0} is concentrated around its expectation.
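The lecture does not fix a specific family; one standard construction (used below purely for illustration) is the pairwise independent affine family {h(x)=Ax+b} over GF(2), sketched here in Python:

```python
import random

def sample_hash(n, m):
    """Sample h: {0,1}^n -> {0,1}^m from the pairwise independent
    affine family h(x) = Ax + b over GF(2)."""
    A = [[random.randrange(2) for _ in range(n)] for _ in range(m)]
    b = [random.randrange(2) for _ in range(m)]
    def h(x):  # x is a tuple of n bits; the result is a tuple of m bits
        return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2
                     for i in range(m))
    return h
```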

Lemma 8 Let {H} be a family of pairwise independent hash functions {h : \{ 0,1 \}^n \rightarrow \{ 0,1 \}^m}. Let {S\subseteq \{ 0,1 \}^n} with {|S|\geq \tfrac{4\cdot 2^m}{\epsilon^2}}. Then,

\displaystyle  \mathop{\mathbb P}_{h\in H} \left[\left| | \{ a\in S: h(a)=0 \} | -\frac{|S|}{2^m}\right| \geq \epsilon\frac{|S|}{2^m}\right] \leq \frac{1}{4}. \ \ \ \ \ (4)

From this, a-comp can be constructed as follows.

  • input: {\varphi,k}
  • if {k\leq 5} then check exactly whether {\#3SAT(\varphi)\geq 2^k}.
  • if {k\geq 6}
    • pick {h} from a family of pairwise independent hash functions {h : \{ 0,1 \}^n \rightarrow \{ 0,1 \}^{m}}, where {m=k-5}
    • answer YES iff there are at least {48} assignments {a} such that {a} satisfies {\varphi} and {h(a)=0}.
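In Python, the procedure might be sketched as follows, reusing sample_hash from above and assuming two hypothetical oracle helpers: exact_count(phi, bound), which returns {\min(\#3SAT(\varphi), \text{bound})} as in Section 2, and np_query(phi, h, t), a single {{\bf NP}}-oracle query asking whether at least {t} satisfying assignments of {\varphi} are mapped to {0} by {h}:

```python
def a_comp(phi, n, k, np_query, exact_count):
    """YES (whp) if #3SAT(phi) >= 2**(k+1); NO (whp) if #3SAT(phi) < 2**k."""
    if k <= 5:
        # small threshold: decide exactly with the NP oracle
        return exact_count(phi, bound=2 ** k) == 2 ** k
    h = sample_hash(n, k - 5)        # pairwise independent, with m = k - 5
    return np_query(phi, h, 48)      # YES iff at least 48 solutions hash to 0
```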

Notice that the threshold test in the last step can be done with one access to an oracle for {{\bf NP}}. We now show that the procedure meets the two requirements of a-comp. Let {S\subseteq \{0,1\}^n} be the set of satisfying assignments for {\varphi}. There are two cases.

  • If {|S|\geq 2^{k+1}}, by Lemma 8 we have:

    \displaystyle  \mathop{\mathbb P}_{h\in H} \left[\left| \frac{ |S| }{2^m} - | \{a\in S: h(a)=0 \} | \right| \leq \frac{1}{4}\cdot \frac{ |S| }{2^m}\right] \geq \frac{3}{4}

    (set {\epsilon=\tfrac{1}{4}}, and { |S| \geq \tfrac{4\cdot 2^m}{\epsilon^2}=64\cdot 2^m}, because { |S| \geq 2^{k+1} = 2^{m+6}})

    \displaystyle  \begin{aligned} \mathop{\mathbb P}_{h\in H} \left[ | \{ a\in S: h(a)=0 \} | \geq \frac 34 \cdot \frac{ |S| }{2^m}\right] & \geq \frac{3}{4},\\ \mathop{\mathbb P}_{h\in H} \left[ | \{a\in S: h(a)=0 \} | \geq 48 \right] & \geq \frac{3}{4} \end{aligned}

    so in this case the algorithm answers YES with probability at least {3/4} (using {\frac 34 \cdot \frac{|S|}{2^m} \geq \frac 34 \cdot 64 = 48}).

  • If {|S| < 2^k}:

    Let {S'} be a superset of {S} of size {2^k}. We have

    \displaystyle  \begin{aligned} \mathop{\mathbb P}_{h\in H} [\text{answer YES}] & = \mathop{\mathbb P}_{h\in H} [| \{a\in S : h(a)=0\} | \geq 48]\\ & \leq \mathop{\mathbb P}_{h\in H} [| \{ a\in S' : h(a)=0\} | \geq 48]\\ & \leq \mathop{\mathbb P}_{h\in H} \left[\left| | \{a\in S' : h(a)=0\} |-\tfrac{|S'|}{2^m}\right| \geq\tfrac{| S' | }{2\cdot 2^m}\right]\\ & \leq \frac{1}{4} \end{aligned}

    (by Lemma 8 with {\epsilon=1/2} and {|S'|=2^k=32\cdot 2^m}: here {\tfrac{|S'|}{2^m}=32}, so a count of at least {48} means a deviation of at least {16=\tfrac{|S'|}{2\cdot 2^m}}, and {|S'|=32\cdot 2^m \geq \tfrac{4\cdot 2^m}{\epsilon^2}=16\cdot 2^m}.)

Therefore, the algorithm gives the correct answer with probability at least {3/4}, which can then be amplified to, say, {1-\tfrac{1}{4(n+2)}} (so that, by a union bound, all {n+2} invocations of a-comp in the algorithm of Section 3 are likely to be correct) by repeating the procedure {O(\log n)} times and taking the majority answer.

5. The Remaining Proof

We finish the lecture by proving Lemma 8.

Proof: We will use Chebyshev’s Inequality to bound the failure probability. Let {S=\{a_1,\ldots,a_k\}}, and pick a random {h\in H}. We define random variables {X_1,\ldots,X_k} as

\displaystyle  X_i=\left\{ \begin{aligned} 1 & \text{ if } h(a_i)=0 \\ 0 & \text{ otherwise.} \end{aligned} \right. \ \ \ \ \ (5)

Clearly, {| \{ a\in S: h(a)=0 \} | =\sum_i X_i}.

We now calculate the expectations. For each {i}, {\mathop{\mathbb P}[X_i=1]=\tfrac{1}{2^m}} and {\mathop{\mathbb E}[X_i]=\tfrac{1}{2^m}}. Hence,

\displaystyle  \mathop{\mathbb E}\left[\sum_i X_i\right]= \frac{ |S| }{2^m}. \ \ \ \ \ (6)

We also calculate the variance:

\displaystyle  \begin{aligned} \mathop{\bf Var}[X_i] & = \mathop{\mathbb E}[X_i^2]-\mathop{\mathbb E}[X_i]^2 \\ & \leq \mathop{\mathbb E}[X_i^2] \\ & = \mathop{\mathbb E}[X_i]= \frac{1}{2^m}. \end{aligned}

Because {X_1,\ldots,X_k} are pairwise independent,

\displaystyle  \mathop{\bf Var}\left[\sum_i X_i\right]=\sum_i \mathop{\bf Var}[X_i] \leq \frac{ |S| }{2^m}. \ \ \ \ \ (7)

Using Chebyshev’s Inequality, we get

\displaystyle  \begin{aligned} \mathop{\mathbb P}\left[\left| | \{ a\in S: h(a)=0 \} | - \frac{ |S| }{2^m}\right| \geq \epsilon \frac{ |S| }{2^m}\right] & = \mathop{\mathbb P}\left[\left|\sum_i X_i - \mathop{\mathbb E}\left[\sum_i X_i\right]\right| \geq \epsilon \mathop{\mathbb E}\left[\sum_i X_i\right]\right] \\ & \leq \frac{\mathop{\bf Var}\left[\sum_i X_i\right]}{\epsilon^2 \mathop{\mathbb E}\left[\sum_i X_i\right]^2} \leq \frac{ \frac{ |S| }{2^m}}{\epsilon^2 \frac{ |S| ^2}{(2^m)^2}} \\ & = \frac{2^m}{\epsilon^2 |S| } \leq \frac{1}{4}. \end{aligned}

\Box
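As a quick empirical sanity check of Lemma 8 (an illustration, not part of the lecture), we can estimate the failure probability for the affine family sketched in Section 4. With {n=10}, {m=2}, {\epsilon=1/4}, we have {|S|=2^{10}=1024 \geq \tfrac{4\cdot 2^m}{\epsilon^2}=256}, so the deviation event should have probability at most {1/4}:

```python
import random
from itertools import product

def deviation_rate(n=10, m=2, eps=0.25, trials=200):
    S = list(product((0, 1), repeat=n))      # take S = {0,1}^n, |S| = 2**n
    mean = len(S) / 2 ** m                   # expected number hashing to 0
    bad = 0
    for _ in range(trials):
        # fresh affine hash h(x) = Ax + b over GF(2)
        A = [[random.randrange(2) for _ in range(n)] for _ in range(m)]
        b = [random.randrange(2) for _ in range(m)]
        count = sum(1 for a in S
                    if all((sum(A[i][j] * a[j] for j in range(n)) + b[i]) % 2 == 0
                           for i in range(m)))
        if abs(count - mean) >= eps * mean:
            bad += 1
    return bad / trials   # Lemma 8 guarantees at most 1/4; typically far less

print(deviation_rate())
```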

6. Approximate Sampling

The content of this section was not covered in class; it’s here as bonus material. It’s good stuff.

So far we have considered the following question: for an NP-relation {R}, given an input {x}, what is the size of the set {R_x = \{ y : (x,y)\in R\}}? A related question is to be able to sample from the uniform distribution over {R_x}.

Whenever the relation {R} is “downward self reducible” (a technical condition that we won’t define formally), it is possible to prove that there is a probabilistic algorithm running in time polynomial in {|x|} and {1/\epsilon} to approximate within {1+\epsilon} the value {|R_x|} if and only if there is a probabilistic algorithm running in time polynomial in {|x|} and {1/\epsilon} that samples a distribution {\epsilon}-close to the uniform distribution over {R_x}.

We show how the above result applies to 3SAT (the general result uses the same proof idea). For a formula {\phi}, a variable {x} and a bit {b}, let us define by {\phi_{x \leftarrow b}} the formula obtained by substituting the value {b} in place of {x}.\footnote{ Specifically, {\phi_{x \leftarrow 1}} is obtained by removing each occurrence of {\neg x} from the clauses where it occurs, and removing all the clauses that contain an occurrence of {x}; the formula {\phi_{x \leftarrow 0}} is similarly obtained.}

If {\phi} is defined over variables {x_1,\ldots,x_n}, it is easy to see that

\displaystyle  \#\phi = \#\phi_{x_1 \leftarrow 0} + \#\phi_{x_1 \leftarrow 1}

Also, if {S} is the uniform distribution over satisfying assignments for {\phi}, we note that

\displaystyle  \mathop{\mathbb P}_{(x_1,\ldots,x_n) \leftarrow S} [ x_1 = b] = \frac {\#\phi_{x_1 \leftarrow b}} {\#\phi }

Suppose then that we have an efficient sampling algorithm that given {\phi} and {\epsilon} generates a distribution {\epsilon}-close to uniform over the satisfying assignments of {\phi}.

Let us then run the sampling algorithm with approximation parameter {\epsilon/2n} and use it to sample about {\tilde O(n^2/\epsilon^2)} assignments. By computing the fraction of such assignments having {x_1=0} and {x_1=1}, we get approximate values {p_0,p_1} such that {| p_b - \mathop{\mathbb P}_{(x_1,\ldots,x_n) \leftarrow S} [ x_1 = b] | \leq \epsilon/n}. Let {b} be such that {p_b\geq 1/2}; then {\#\phi_{x_1 \leftarrow b}/p_b} is a good approximation, to within a multiplicative factor {(1+2\epsilon/n)}, to {\#\phi}, and we can recurse to compute {\#\phi_{x_1 \leftarrow b}} to within a {(1+2\epsilon/n)^{n-1}} factor.

Conversely, suppose we have an approximate counting procedure. Then we can approximately compute {p_b = \frac {\#\phi_{x_1 \leftarrow b}} {\#\phi }}, generate a value {b} for {x_1} with probability approximately {p_b}, and then recurse to generate a random satisfying assignment for {\phi_{x_1 \leftarrow b}}.
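Here is a minimal Python sketch of this direction, assuming a satisfiable input formula and two hypothetical helpers: count(phi), the (approximate) counter, and substitute(phi, x, b), implementing the operation {\phi_{x \leftarrow b}} from the footnote:

```python
import random

def sample_assignment(phi, variables, count, substitute):
    """Generate a satisfying assignment variable by variable; with an exact
    counter the output is uniform, with an approximate counter it is close
    to uniform. Assumes phi is satisfiable, i.e. count(phi) > 0."""
    assignment = {}
    for x in variables:
        total = count(phi)
        p1 = count(substitute(phi, x, 1)) / total  # ~ P[x = 1] under uniform
        b = 1 if random.random() < p1 else 0
        assignment[x] = b
        phi = substitute(phi, x, b)                # condition on the choice
    return assignment
```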

The same equivalence clearly holds for 2SAT and, among other problems, for the problem of counting the number of perfect matchings in a bipartite graph. It is known that approximate counting for 2SAT is NP-hard, and this result, together with the above reduction, implies that approximate sampling is also hard for 2SAT. The problem of approximately sampling a perfect matching has a probabilistic polynomial time solution, and the reduction implies that approximately counting the number of perfect matchings in a bipartite graph can also be done in probabilistic polynomial time.

The reduction and the results from the previous sections also imply that 3SAT (and any other {{\bf NP}} relation) has an approximate sampling algorithm that runs in probabilistic polynomial time with an {{\bf NP}} oracle. With a careful use of the techniques from last week it is indeed possible to get an exact sampling algorithm for 3SAT (and any other {{\bf NP}} relation) running in probabilistic polynomial time with an {{\bf NP}} oracle. This is essentially best possible: approximate sampling requires randomness by its very definition, and generating satisfying assignments for a 3SAT formula requires at least the power of {{\bf NP}}.
