Scribed by Jeff Xu
In which we discussed planted clique distribution, specifically, we talked about how to find a planted clique in a random graph. We heavily relied upon our material back in lecture 2 and lecture 3 in which we covered the upper bound certificate for max clique in . At the end of this class, we wrapped up this topic and started the topic of -SAT.
1. Planted Clique
To start with, we describe a distribution of graphs with a planted clique. Suppose that we sample from and we want to modify s.t. it has a size clique, i.e., we have a clique with . The following code describes a sampler for the distribution.
- Pick a subset of vertices from s.t.
- Independently for each pair , make an edge with probability
Note: We are only interested in the case , which is the case in which the planted clique is, with high probability, larger than any pre-existing clique
1.1. Finding the planted clique when
When , finding the planted clique is easy because the vertices in the planted clique are precisely the vertices of higher degree.
Lemma 1 In , w.h.p., for every vertex , deg() .
Proof: For each vertex in a graph , we have = sum of random bits, which is simply a Binomial distribution. By Chernoff bound,
For this probability to be upper bounded, say by , we can fix s.t. and this completes the proof that with high probability, a vertex in random graph has degree .
Now we consider a vertex in the planted clique .
Claim 1 In a graph with a planted clique coming from a random graph to which we add all the edges necessary to make a clique, each node in will receive added edges w.h.p. over the sampling of graph.
Proof: Again, we regard the number of neighbors of a vertex in as a sum of Bernoulli distribution and denote it by . By Chernoff bound, we obtain an upper bound on the probability that a vertex has more than neighbors in in a random graph.
Since the probability is exponentially small in , we can conclude that a node in , with high probability, has less than edges in the original random graph, and thus at least edges will be added to each vertex in .
Corollary 2 In a graph with a planted clique, a vertex in will have degree . (Note: we have in this example)
Therefore, we show that in graph with a large planted clique, we can distinguish it from distribution by the existence of node with large degree, i.e., degree over .
1.2. Distinguish Planted Clique Distribution with
Moving on to the case in which is of the order of , we first show how to distinguish graphs sampled from the planted clique distribution from random graphs.
Say that , and let be the adjacency matrix of a graph from the planted clique distribution. Then
Now, recall he following theorem from Lecture 2.
Theorem 3 If is the adjacency matrix of a graph, w.h.p., .
Therefore, we can identify graph with a planted clique from a random distribution using this method.
1.3. Uniqueness of Maximum Clique in Planted Clique Distribution
In order to show that we can find the planted clique from a graph, we want to prove first that the maximum clique in planted clique Distribution is unique. In other words, we want to prove that the planted clique is the maximum clique in planted clique distribution. We first prove the following lemmas:
Proof: This is largely similar to Lemma 1. We see this as a sum of random bits and by Chernoff bound, we have:
For this probability to be upper bounded, say by , we can choose a s.t. . Therefore, we pick and this completes the proof that, with high probability, each vertex not in the planted clique has less no more than neighbors in the planted clique.
Lemma 5 sampled from , w.h.p., has a largest clique of size . (proved in lecture 2)
Claim 2 Under the above assumptions, is the unique clique of size in .
Proof: Suppose, for the sake of contradiction, we find a clique s.t. . Since are both cliques by assumption, is also clique. has a largest clique of size w.h.p., so since it is a clique in . Consider a vertex : the number of ‘s neighbors in is at least , but this contradicts Lemma 4, which states that should have no more than neighbors in .
1.4. Finding the Planted Clique
Now that we have shown the uniqueness of maximum clique, we want to proceed and show that we can find the planted clique. Let be the adjacency matrix of a random graph with a planted clique of size , and let be the maximizer of
We will show below that is close to the indicator vector of . First, we need to note that we are no longer using the sampling method described earlier to attain a planted clique distribution. Alternatively, we sample our graph from random distribution, pick a subset of vertices from and add to it the necessary edges to make a clique. From this point of view, we have , where is the graph with a planted clique and is a distribution of edges that we need to add. We can then represent the adjacency matrix of as:
where and .
By the theorem shown in lecture 2 (which we just recapped above), we have the following equations with high probability:
Now we combine the equations listed above, and wlog, let .
With from above, we have
Therefore, . With shown above, we can conclude that
and, up to passing to , which has the same set of largest entries in absolute value, we have , for sufficiently large . This means that, up to scaling, and are nearly identical.
Let be the set of largest entries of , and hence of , breaking ties arbitrarily, and let be the threshold value for membership in (that is, for all and for all ). Suppose that there are elements of that are not in , and hence elements not in that are in . Then
And we conclude that , that is, contains at least of the elements of .
We now present an algorithm to find the planted clique. Let be the set of vertices that has largest . We then consider each vertex . If , by our proof above, it should have neighbors in L. If , should have neighbors in . Therefore, we can easily verify a vertex is in by looking at its number of neighbors in , and this gives us an algorithm.
- adjacency matrix of
- eigenvector of largest eigenvalue to
- set of vertices with largest
- clique set of vertices with at least neighbors in
It is still an open problem to find a planted clique of size
2. Random -SAT and Proof of Unsatisfiability
We start the topic on random -SAT formulas. In -SAT problem, we are trying to decide whether a formula in CNF with each clause containing up to literals is satisfiable. We note in class that checking satisfiability for randomly generated equations is hard even in average case. Similar to the model, we generate a -SAT formula on variable with parameter s.t. each of the clauses exists with probability . Besides the model above, we also briefly mention another model, in which we randomly pick of the possible clauses. (Note: these two models are closely related when we have ). To gain more insights, we discussed the example of -SAT problem. It has an expected number of satisfying assignments . We also observe that for any -SAT instance , we have which goes to if .