Search vs Decision vs Certification for Planted Problems

Edited 5/7/2018. Thanks to Sam Hopkins for several corrections and suggestions.

I am revising my notes from the course on “better-than-worst-case” analysis of algorithms. With the benefit of hindsight, in this post (and in future posts) I would like to review how one applies spectral methods and semidefinite programming to problems that involve a “planted” solution, and what role concentration results for random matrices play in the analysis of such algorithms.

In general, a “planted” distribution is one in which we first pick a random “planted” solution and then pick a random instance of our problem for which the planted solution is a good solution. Such problems are interesting for the average-case analysis of algorithms, because they provide a non-trivial testing ground to understand why certain algorithms perform better in practice than in the worst case. Planted problems are also of great interest in complexity theory, because when they are average-case hard they imply the existence of one-way functions. Finally, planted problems are a rich ground for interdisciplinary work: statisticians recognize them as parametric distributions, where the parameter is the planted solution and one wants to approximate the parameter given one sample; information theorists think of them as a channel in which the sender sends a planted solution and the receiver gets an instance of the problem, so that a large mutual information between instance and planted solution is a necessary condition for the problem to be solvable.

For the sake of this post, it is helpful to focus on the motivating examples of planted {k}-clique and planted bisection. In planted {k}-clique, we sample a graph {G} from the {G_{n, \frac 12}} distribution, then we randomly select {k} vertices and we add all the necessary edges to turn those vertices into a clique. In the planted bisection, or stochastic block model, distribution {G_{n,p,q}}, we randomly split a set of {n} vertices into two subsets {A,B} of equal size and then we select edges so that edges within {A} and within {B} have probability {p}, and edges crossing the cut {(A,B)} have probability {q}, with {p > q}.

A graph sampled from the planted clique distribution is a lot like a {G_{n, \frac 12}} graph, except for the extra edges that create a large clique; and a graph sampled from the {G_{n, p, q}} distribution is a lot like a {G_{n, \frac {p+q} 2}} random graph, except that the cut {(A,B)}, which is crossed by roughly {\frac 14 \cdot qn^2} edges, is sparser than a typical balanced cut, which is crossed by about {\frac 18 \cdot (pn^2 + qn^2)} edges.

1. The Search Problem, the Decision Problem, and the Certification Problem

Having defined these distributions, we can state our main problem of interest:

  • The search problem: given a sample from the planted distribution, find the planted solution, or an approximation, in a useful sense, of the planted solution.

Usually, in planted problems, the solution that is planted is of a kind that exists only with negligible, or even exponentially small, probability in the “non-planted” distribution that we start from. Thus, an algorithm that solves the search problem with noticeable probability {p} also solves the following problem with probability close to {p}:

  • The decision problem of distinguishing a sample of the planted distribution from a sample of the analogous “non-planted” distribution.

For small values of {p}, achieving distinguishing probability {p} in the decision problem can be (seemingly) much easier than solving the search problem with probability {p}. For example, if {k= o(\sqrt n)}, we do not know any polynomial time algorithm that solves the search problem for planted {k}-clique with {\geq 1/poly(n)} success probability, but checking whether the number of edges is {\geq \frac 12 {n \choose 2}} solves the distinguishing problem with distinguishing probability of order {k^2/n}. Even if we change the definition of our “non-planted” distribution so that it has {\frac 12 {n \choose 2} + \frac 12 {k \choose 2}} edges on average, the number of triangles can still be used to noticeably distinguish the planted distribution from the non-planted one.
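To make the edge-counting test concrete, here is a minimal Python sketch (my illustration, not from the post; the values of {n}, {k} and the number of trials are arbitrary). Only edge counts matter, so the graphs themselves are never materialized:

```python
import math
import numpy as np

rng = np.random.default_rng(0)

def gnp_edge_count(n):
    # In G(n, 1/2) the edge count is Binomial(C(n,2), 1/2).
    return rng.binomial(math.comb(n, 2), 0.5)

def planted_clique_edge_count(n, k):
    # Planting a k-clique forces the C(k,2) clique pairs to be edges;
    # the rest stay Bernoulli(1/2), so the mean grows by about C(k,2)/2
    # while the standard deviation remains of order n.
    forced = math.comb(k, 2)
    return forced + rng.binomial(math.comb(n, 2) - forced, 0.5)

def looks_planted(edges, n):
    # Threshold at the G(n, 1/2) mean: the advantage over random
    # guessing is of order k^2 / n.
    return edges > 0.5 * math.comb(n, 2)

n, k, trials = 2000, 60, 1000
adv = (np.mean([looks_planted(planted_clique_edge_count(n, k), n) for _ in range(trials)])
       - np.mean([looks_planted(gnp_edge_count(n), n) for _ in range(trials)]))
print(f"empirical distinguishing advantage: {adv:.2f}")
```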

In the stochastic block model {G_{n, p, q}}, if we call {a = pn/2} the expected “internal” degree of a vertex (that is, the number of neighbors on the same side of the planted cut) and {b = qn/2} the expected “external” degree, and if {a} and {b} are absolute constants, then it is known that the search problem is not solvable, even in an approximate sense, when {a-b < \sqrt{a+b}}, as proved by Mossel, Neeman and Sly (note that our definition of {a} and {b} is off from theirs by a factor of 2). Provided that {a} and {b} are absolute constants and {a > b}, however, one can distinguish {G_{n, p, q}} from {G_{n, \frac 12 \cdot (p+ q) }} with constant distinguishing probability. This is because the number of triangles (counted as ordered sequences of three vertices) in {G_{n, \frac 12 \cdot (p+ q) }} is, on average, {(a+b)^3}, while it is, on average, {(a+b)^3 + (a-b)^3} in {G_{n, p, q}}; furthermore, the variances of both distributions are absolute constants (these calculations appear in the above paper of Mossel et al.). Now, if we have two nonnegative integer random variables {X} and {Y} such that the expectations and variances of both are absolute constants and {\mathop{\mathbb E} X - \mathop{\mathbb E} Y \geq \Omega(1)}, then there is a threshold {t} such that testing whether a sample is {\geq t} distinguishes {X} from {Y} with constant probability.
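As a sanity check of the triangle statistic, here is a small numpy sketch (again my illustration, with arbitrary values of {n}, {a}, {b}; the threshold is placed halfway between the two means):

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_sbm(n, p, q):
    # Adjacency matrix of G_{n,p,q}; the first n//2 vertices are side A.
    # Taking p == q gives a plain Erdos-Renyi graph.
    side = np.arange(n) < n // 2
    within = np.equal.outer(side, side)
    upper = np.triu(rng.random((n, n)) < np.where(within, p, q), k=1)
    return (upper | upper.T).astype(float)

def ordered_triangles(adj):
    # trace(A^3) counts closed walks of length 3, i.e. ordered triples
    # of distinct vertices that form a triangle.
    return float(np.trace(adj @ adj @ adj))

n, a, b = 2000, 5.0, 2.0      # expected internal/external degrees
p, q = 2 * a / n, 2 * b / n   # so that a = pn/2 and b = qn/2

planted = ordered_triangles(sample_sbm(n, p, q))
null = ordered_triangles(sample_sbm(n, (p + q) / 2, (p + q) / 2))

# Threshold halfway between the means (a+b)^3 and (a+b)^3 + (a-b)^3;
# a single sample lands on the correct side with constant probability.
t = (a + b) ** 3 + 0.5 * (a - b) ** 3
print(f"planted: {planted:.0f}, null: {null:.0f}, threshold: {t:.1f}")
```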

As far as I am aware, in all the known cases where the decision problem is efficiently solvable with {1-o(1)} distinguishing probability, it is also known how to efficiently solve the search problem (perhaps approximately) with {1-o(1)} probability, although algorithms that solve the distinguishing problem don’t always give insights into how to solve the search problem. Mossel et al., for example, show that the decision problem for the stochastic block model is efficiently solvable with {1-o(1)} distinguishing probability when {a-b > \sqrt{a+b}}, by counting the number of simple cycles of length around {\log^{1/4} n}, and they could use the approach of counting simple cycles to approximate {a} and {b}, but they could not show how to solve the search problem in the {a-b > \sqrt{a+b}} regime, a problem that they solved later with additional techniques (the problem was also solved independently by Massoulie).

In summary, efficiently solving the decision problem is a necessary but not sufficient condition for being able to efficiently solve the search problem, and it is a good first step to understand the complexity of a given planted problem.

Another approach to distinguishing the {G_{n, \frac 12}} distribution from the planted {k}-clique distribution is to be able to certify, given a graph sampled from {G_{n, \frac 12}}, that its maximum clique size is {\leq k-1}, a fact that is going to be true with high probability if {k \gg 2\log n}. Assuming that, as in the planted clique problem, we are discussing instances of an optimization problem, we thus have the following problem:

  • The certification problem: given an instance sampled from the “non-planted” distribution, produce a certificate that the optimum value for the instance is worse than the optimum value of all (or all but a negligible fraction of) instances from the planted distribution.

Note that if you can solve the certification problem with probability {p}, then you can also solve the decision problem with distinguishing probability {p} (or {p} minus a negligible term, if the certified property is allowed to hold with negligible probability in the planted distribution).

In some interesting cases, the certification problem appears to be harder than the decision problem and the search problem.

In the stochastic block model, for example, if one takes the min bisection problem to be the underlying optimization problem, the search problem is solvable (in the sense that one can recover a partition that correlates with the planted partition) whenever {a-b > \sqrt{a+b}}, as discussed above, but for {a-b} very close to {\sqrt{a+b}} it is an open question whether even knowing the value of the min bisection exactly can be used to solve the decision problem.

Another interesting example is “planted Max 3SAT.” Suppose that we construct a satisfiable instance of 3SAT in the following way: we start from a random assignment to {n} variables, and then we pick at random {m} clauses among the {7 {n \choose 3}} clauses that are consistent with the “planted” assignment. If {m} is, say, {100 n \log n}, then it is very easy to solve the search problem (and hence the distinguishing problem): the value of every variable in the planted assignment can be deduced by looking at whether the variable appears more often complemented or not complemented. In the non-planted distribution, in which we pick {100 n\log n} clauses among all the {8 {n\choose 3}} possible clauses, however, we don’t know any efficient algorithm that certifies that the instance is not satisfiable. Even if we pick clauses that are consistent with both the planted assignment and the negation of the planted assignment (thus eliminating correlations between complementations of variables and the values of the variables in the planted assignment), we still introduce pairwise correlations that can be used to find the planted assignment. See the comment by Ryan O’Donnell below for more information.
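Here is a minimal sketch of this majority-vote algorithm (my illustration; parameters are arbitrary). The relevant bias is that, in a uniformly random clause consistent with the planted assignment, each literal agrees with the assignment with probability 4/7 > 1/2:

```python
import math
import random
from collections import Counter

random.seed(0)

def naive_planted_3sat(n, m):
    # Pick a random assignment, then m uniformly random clauses that
    # are consistent with it (rejection sampling over literal patterns).
    planted = [random.randrange(2) for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        vs = random.sample(range(n), 3)
        lits = [(v + 1) if random.randrange(2) else -(v + 1) for v in vs]
        # positive literal v+1 is true iff planted[v] == 1
        if any((lit > 0) == (planted[abs(lit) - 1] == 1) for lit in lits):
            clauses.append(lits)
    return planted, clauses

def majority_vote(n, clauses):
    # Each literal of a consistent clause agrees with the planted value
    # with probability 4/7 > 1/2, so a per-variable majority recovers
    # the assignment whp once each variable appears Omega(log n) times.
    score = Counter()
    for clause in clauses:
        for lit in clause:
            score[abs(lit) - 1] += 1 if lit > 0 else -1
    return [1 if score[i] > 0 else 0 for i in range(n)]

n = 300
m = int(100 * n * math.log(n))
planted, clauses = naive_planted_3sat(n, m)
guess = majority_vote(n, clauses)
print("fraction recovered:", sum(g == p for g, p in zip(guess, planted)) / n)
```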

When we are dealing with a hard optimization problem, a natural approach is to study efficiently solvable convex relaxations of the problem. One can hope to use a relaxation to solve the certification problem, by showing that the relaxation provides a bound on the optimum of non-planted instances that is sufficient to distinguish them from planted instances. One can also hope to use a relaxation to solve the search problem, by showing that the optimum of the relaxation is close, in an appropriate sense, to the planted solution, or even that the planted solution is the unique optimum of the relaxation for most planted instances.

Interestingly, in all the examples that I am aware of, a relaxation has been successfully used to solve the certification problem if and only if it has been successfully used to solve the search problem. Intuitively, if a relaxation does not solve the certification problem, it means that there are feasible solutions in the non-planted case whose cost is already better than the cost of the planted solution, and so the planted solution cannot be the optimum, or close to the optimum, of the relaxation in the planted case. For the other direction, if a relaxation solves the certification problem, then the proof that it does can usually be lifted to a proof that all solutions that don’t “correlate” (in an appropriate sense) with the planted solution cannot be optimal solutions in the planted case, allowing one to conclude that the optimum of the relaxation in the planted case correlates with the planted solution.
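For a concrete example of a relaxation used for certification, consider the Lovasz theta function: {\vartheta(\bar G)} is an efficiently computable upper bound on the clique number of {G}, and for {G} sampled from {G_{n, \frac 12}} it is {\Theta(\sqrt n)} with high probability, so it solves the certification problem for planted {k}-clique when {k \gg \sqrt n}. A minimal sketch using cvxpy (an assumed dependency; practical only for small {n}):

```python
import cvxpy as cp
import numpy as np

def clique_upper_bound(adj):
    # Lovasz theta function of the complement graph: an efficiently
    # computable (and certifiable) upper bound on the clique number.
    n = adj.shape[0]
    X = cp.Variable((n, n), PSD=True)
    constraints = [cp.trace(X) == 1]
    # The standard theta SDP sets X_ij = 0 on the edges of the
    # complement, i.e. on the NON-edges of the input graph.
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j] == 0:
                constraints.append(X[i, j] == 0)
    return cp.Problem(cp.Maximize(cp.sum(X)), constraints).solve()

rng = np.random.default_rng(2)
n = 60
upper = np.triu(rng.random((n, n)) < 0.5, k=1)
adj = (upper | upper.T).astype(int)
# On G(n, 1/2) the bound concentrates around Theta(sqrt(n)), far below
# n, which certifies the absence of a large planted clique.
print("certified clique bound:", round(clique_upper_bound(adj), 2))
```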

In summary, although the certification problem can be harder than the search problem and the decision problem, if one wants to use a relaxation to solve the search problem then it is good to start by understanding whether the relaxation solves the certification problem.

In the next post we will discuss how we can efficiently certify properties of random instances of optimization problems, and how to turn those results into algorithms that find planted solutions. We will see that the key results involve showing that certain matrices associated to the instances are close to their expectations in appropriately defined norms.


7 thoughts on “Search vs Decision vs Certification for Planted Problems”

  1. Thanks for this post Luca! It is quite hard to sort these things out…

    A couple of thoughts:

    1) It is very nice of you to attribute the SBM distinguishing result to me and David Steurer, but I think that result, at least for the 2-community SBM in this post, is implicit in the paper of Mossel, Neeman, and Sly (here: https://arxiv.org/abs/1311.4115) and explicit in the paper of Bordenave, Lelarge, and Massoulie (https://arxiv.org/pdf/1501.06087.pdf). (The paper by me and David Steurer does phrase the result & proof in a way that generalizes easily to other models, like the mixed-membership blockmodel.)

    2) What are the parameter regimes for SBM where search is impossible (information-theoretically) but decision is possible? Somehow I thought the contiguity results in this other Mossel-Neeman-Sly paper (https://arxiv.org/pdf/1202.1499.pdf) say this regime does not exist, but probably I am misinterpreting those results.

    3) Is it provable that, in the regime near the threshold for the blockmodel, “the minimal bisection in the planted instance does not correlate to the planted bisection, so an optimal algorithm for the min bisection problem would not help solving the search problem either”? I think that this is suspected by the physicists, but I’m not sure it’s actually known rigorously — as far as I know it could be possible to solve the search problem by maximum-likelihood estimation all the way down to the threshold for SBM. (I realize that I may have misled you about this at some point, and if so I am sorry for that.)

    4) Do you think there is a way to formalize these relationships between the search, certification, and decision? For example, I think there is a canonical way to associate a certification problem to a decision problem: given a pair of probability distributions (P0, P1), the associated certification problem is to design an algorithm which certifies w.p. 1-o(1) over a sample x ~ P0 that Pr_{P1}(x) < o(1). This certification problem is clearly harder than the decision problem. Perhaps there are other such canonical transformations between problem types.

  2. 5) It seems like the canonical explanation for the MAX 3SAT conundrum is that the planted distribution described is not the “hardest” one. There is a distribution supported on satisfiable instances where we don’t think there is a distinguishing algorithm — you can for example draw clauses which are consistent with both x and not x, where x is the hidden assignment. (For general CSPs there is some method to generate such a distribution based on t-wise independence.)

  3. Hi Sam, thanks for all the comments.

    I’ll update the text about (1).

    Regarding (2), whenever $$p>q$$ you have a slightly different average number of triangles. For example, take $$p = 2a/n$$ and $$q = 2b/n$$ where $$a$$ and $$b$$ are absolute constants such that $$a-b = \sqrt{a+b} / 100$$, so well below the reconstruction threshold. Then, the number of ordered triples of vertices that form a triangle is $$n^3 \cdot (a+b)^3/n^3 = (a+b)^3$$ in the non-planted distribution and a different absolute constant in the planted case. The difference in averages is smaller than the standard deviation, but there is still a test whose distinguishing probability is inverse polynomial (actually, I think the distinguishing probability is a small constant).

    For (3), I think the min-bisection in $$G_{n,p}$$ is now known rigorously, and it is more than the “half-Ramanujan” bound that would be required to make maximum likelihood work; I’ll look for the reference, but maybe another commenter will point it out sooner.

    (4) Yes, indeed certification is harder than distinguishing.

    Finally, for (5), I have a vague memory that one could find the planted assignment even in a formula sampled as you describe, but probably I am confusing it with something else.

  4. Regarding (2) — the existence of a test which succeeds with inverse polynomial probability is quite interesting! I think success with small constant probability should not be possible due to the contiguity result in the Mossel-Neeman-Sly paper.

    For (3) I am not sure I follow — I think Dembo, Montanari, and Sen can tell you the extremal cuts in G(n,p) (https://arxiv.org/abs/1503.03923) but what does this say about whether the min bisection in G(n,p,q) correlates with the planted partition? Or even whether the min bisection in G(n,p,q) is sparser than in G(n,p)? I think all it says is that the min bisection in G(n,p) is sparser than the planted partition in G(n,p,q), but the planted partition could conspire with some fluctuations to create an even sparser bisection.

    (5) Maybe I am not describing the right distribution, but I think that if you get the right planted 3sat distribution there should not be a simple (computable by a low-degree polynomial in the instance variables) way to find the values of variables. (This is usually essential for the “pseudocalibration” approach to sos lower bounds to work.) For 3sat specifically I think you can use the distribution you get from taking a random 3xor instance and pushing it through the usual 3xor -> 3sat reduction.

  5. (5) Yes, if you start from 3XOR and you do the reduction, then you get an instance of 3SAT that is hard for SoS, but it is easy for Gaussian elimination. I think that it is an open question to come up with a distribution of satisfiable instances of 3SAT that can be reasonably described as “random with a planted assignment” and that are hard to satisfy.

    I think I remember what the issue was if you start from an assignment and you pick from the clauses that are consistent with the assignment and its complement: when two variables occur together in a clause, their pattern of complementations is correlated with whether the variables are equal or different in the planted assignment. This already gives you a distinguishing algorithm when m = 1,000*n, because you can set up a 2LIN problem that is random in the random case and roughly 3/4-satisfiable in the planted case. I am guessing that a good approximation of the 2LIN problem will give you an approximation of the planted assignment in the planted case, and then you can find the actual assignment with an appropriate greedy clean-up phase.
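    (To illustrate the pairwise correlations concretely, here is a sketch of a spectral variant of the 2LIN idea; this is my own illustration with arbitrary parameters. For each pair of variables co-occurring in a clause we record whether their literals carry the same complementation, and the top eigenvector of the resulting signed matrix approximates the pattern of equalities in the planted assignment.)

```python
import numpy as np

rng = np.random.default_rng(3)

def sample_planted(n, m):
    # Clauses consistent with both the planted assignment s and its
    # complement: at least one literal is true under s and at least
    # one is false under s (rejection sampling).
    s = rng.integers(0, 2, size=n)
    clauses = []
    while len(clauses) < m:
        vs = rng.choice(n, size=3, replace=False)
        signs = rng.integers(0, 2, size=3)   # 1 = positive literal
        correct = (signs == s[vs])           # literal true under s?
        if correct.any() and (~correct).any():
            clauses.append((vs, signs))
    return s, clauses

def spectral_guess(n, clauses):
    # For each co-occurring pair, add +1 if the two literals have
    # different complementations and -1 if equal; one can check that
    # E[M_ij] is positive iff s_i == s_j, so the top eigenvector of M
    # approximates sigma_i = (-1)^{s_i} (up to a global sign flip).
    M = np.zeros((n, n))
    for vs, signs in clauses:
        for x in range(3):
            for y in range(x + 1, 3):
                c = 1.0 if signs[x] != signs[y] else -1.0
                M[vs[x], vs[y]] += c
                M[vs[y], vs[x]] += c
    _, vecs = np.linalg.eigh(M)
    return np.sign(vecs[:, -1])

n = 300
s, clauses = sample_planted(n, 1000 * n)
sigma = 1 - 2 * s
corr = abs(np.dot(spectral_guess(n, clauses), sigma)) / n
print(f"correlation with planted assignment pattern: {corr:.2f}")
```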

  6. (3) Yes, now I see that all you can rigorously say is that the planted partition is not the optimal min bisection solution, but of course it isn’t, or else we would be in the exact reconstruction regime.

  7. The hard(-seeming) distribution for random 3SAT is as follows:

    Let lambda_2 be a parameter between 0 and 1/6.
    Pick a random planted solution s in {0,1}^n.
    Pick m = alpha n random 3-tuples of variables.
    For each 3-tuple, randomly pick a literal pattern according to the following recipe:
    . the resulting clause should have 1 correct literal (vis-a-vis s) with probability 3/4 - (3/2)lambda_2
    . the resulting clause should have 2 correct literals (vis-a-vis s) with probability 3lambda_2
    . the resulting clause should have 3 correct literals (vis-a-vis s) with probability 1/4 - (3/2)lambda_2

    The point here is that each variable appearance is equally likely to be positive or negative.

    Note that if lambda_2 = 1/6 then you’re doing a naive-planted-3NAE-SAT instance, and if lambda_2 = 0 then you’re doing a naive-planted-3XOR-SAT instance.

    In fact, to make the instance hard you should choose lambda_2 to be strictly positive but small. Jia–Moore–Strain had some reason for suggesting lambda_2 = sqrt(5)/2 - 1 = .118, but in fact this is a bad idea; it seems statistical physics algorithms ‘should’ work for this choice.

    Instead, it seems that any positive lambda_2 at most .09 should be okay, and then you should take the clause density alpha bigger than the “usual” 3SAT threshold, 4.2667. See the paper “Reweighted Belief Propagation and Quiet Planting for Random K-SAT” by Florent Krzakala, Marc Mézard, and Lenka Zdeborová for a full investigation.
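    (A minimal sampler for this recipe, under the natural reading that, given the number of correct literals, the choice of which slots are correct is uniform; this sketch and its parameter values are illustrative only:)

```python
import random

random.seed(0)

def quiet_planted_3sat(n, alpha, lam2):
    # Requires 0 < lam2 <= 1/6 so all three probabilities below are
    # nonnegative; they sum to 1 and give 3/2 correct literals per
    # clause on average, which is why literal signs are unbiased.
    s = [random.randrange(2) for _ in range(n)]
    weights = [3/4 - 1.5 * lam2, 3 * lam2, 1/4 - 1.5 * lam2]
    clauses = []
    for _ in range(int(alpha * n)):
        vs = random.sample(range(n), 3)
        k = random.choices([1, 2, 3], weights=weights)[0]
        correct_slots = set(random.sample(range(3), k))
        lits = []
        for slot, v in enumerate(vs):
            # a "correct" literal evaluates to true under s
            positive = (s[v] == 1) if slot in correct_slots else (s[v] == 0)
            lits.append((v + 1) if positive else -(v + 1))
        clauses.append(lits)
    return s, clauses

# e.g. lam2 = 0.05 with clause density above the ~4.267 threshold:
s, formula = quiet_planted_3sat(n=1000, alpha=5.0, lam2=0.05)
```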

    An interesting (to me) open problem: Suppose you choose a random planted 3SAT instance in the most naive way, with clause density alpha. By that I mean, pick a random planted solution and then pick alpha n random 3-clauses consistent with it. Flaxman showed there is a constant alpha_1 and a polynomial-time algorithm that provably finds a solution (whp) when alpha exceeds alpha_1. On the other hand, it’s not hard to show there’s a positive constant alpha_0 and a polynomial-time algorithm that provably finds a solution when alpha is at most alpha_0. Can one show that for _all_ constant alpha, there is a polynomial-time algorithm that provably finds a solution? (I know how to code a simple algorithm that works in practice, but don’t know how to prove it.)
