Scribed by Theo McKenzie
In which we study the spectrum of random graphs.
1. Overview
When attempting to find in polynomial time an upper bound certificate on the max cut and maximum independent set of a graph, we have used the following property: if $G \sim G_{n,\frac12}$, then with high probability $\|A - \mathbb{E} A\| \le O(\sqrt n)$, where $\|\cdot\|$ is the spectral norm and $A$ is the adjacency matrix of $G$. Generally, if $G \sim G_{n,p}$ and $p = \Omega\left(\frac{\log n}{n}\right)$, then w.h.p. $\|A - \mathbb{E} A\| \le O(\sqrt{np})$.
Today we will prove how to obtain the bound in Proposition 1 with an extra term of $O(\sqrt{\log n})$, as well as show an outline of the method of finding the bound in Proposition 1 itself. We will also show how when $p$ is small this bound breaks down, namely how, when $p = o\left(\frac{\log n}{n}\right)$,
$$\|A - \mathbb{E} A\| = \omega(\sqrt{np}).$$
2. Introducing the Trace
Henceforth $\log$ signifies $\log_2$. Take $M$ symmetric and real. All eigenvalues of this matrix are real, and we can enumerate them such that $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$.
The trace is defined to be $\mathrm{Tr}(M) = \sum_{i=1}^n M_{ii}$, where $M$ is an $n \times n$ matrix.
Moreover we know that $\mathrm{Tr}(M) = \sum_{i=1}^n \lambda_i$. If we take $k$ large and even, the eigenvalues of $M^k$ are $\lambda_1^k \ge \lambda_2^k \ge \cdots \ge \lambda_n^k \ge 0$. Therefore we have
$$\|M\|^k = \lambda_1^k \le \mathrm{Tr}(M^k) = \sum_i \lambda_i^k \le n\, \lambda_1^k = n \|M\|^k.$$
Moreover we have
$$\|M\| \le \left(\mathrm{Tr}(M^k)\right)^{1/k} \le n^{1/k} \|M\|.$$
This gives us an estimation of the norm, $\left(\mathrm{Tr}(M^k)\right)^{1/k}$, which for $k \ge \log n$ gives a constant factor approximation of $\|M\|$.
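As a quick numerical illustration (ours, not the notes'; the names and parameters are arbitrary choices), a few lines of numpy confirm that $(\mathrm{Tr}(M^k))^{1/k}$ lands within a factor $n^{1/k}$ of the spectral norm:

```python
# Sanity check: (Tr(M^k))^{1/k} approximates ||M|| for even k ~ log n.
import numpy as np

n, k = 500, 10                          # k even; n^{1/k} is then a small constant (~1.9)
M = np.random.randn(n, n)
M = (M + M.T) / 2                       # symmetrize, so all eigenvalues are real
trace_est = np.trace(np.linalg.matrix_power(M, k)) ** (1 / k)
true_norm = np.max(np.abs(np.linalg.eigvalsh(M)))
print(trace_est / true_norm)            # always in [1, n^{1/k}]
```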
3. Using the Trace to Bound the Spectral Norm
Assume that $G \sim G_{n,\frac12}$ and $A$ is the adjacency matrix of $G$. We will prove the following: for the matrix $M$ defined below, $\mathbb{E}[\mathrm{Tr}(M^k)]$ is bounded above by $2^{O(k)}\, n^{1 + k/2}\, k^{k/2}$. If $k = \log n$, by taking the $k$th root we achieve a bound of $O(\sqrt{n \log n})$ on $\|M\|$.
3.1. Expected Value of Matrix Entries
First, we examine the matrix $M = 2A - J + I$. We have $M_{ij} = 1$ if $\{i,j\} \in E$ and $M_{ij} = -1$ otherwise, with equal probability of each when $i \neq j$. Moreover $M_{ii} = 0$. For $i \neq j$, $\mathbb{E}[M_{ij}^m] = 0$ if $m$ is odd and $\mathbb{E}[M_{ij}^m] = 1$ for $m$ even.
$\mathbb{E}[\mathrm{Tr}(M^k)] = n \cdot \mathbb{E}[(M^k)_{11}]$ by the linearity of expectation and symmetry between the entries. We evaluate $\mathbb{E}[(M^k)_{11}]$.
$$(M^k)_{11} = \sum_{v_1, \ldots, v_{k-1}} M_{1 v_1} M_{v_1 v_2} \cdots M_{v_{k-1} 1},$$
where $v_1, \ldots, v_{k-1}$ represent the intermediate steps on a “path” between vertices that starts at 1 and returns to 1. For example, $(M^2)_{11} = \sum_v M_{1v} M_{v1}$. Note that we can repeat edges in these paths. By the linearity of expectation
$$\mathbb{E}[(M^k)_{11}] = \sum_{v_1, \ldots, v_{k-1}} \mathbb{E}\left[M_{1 v_1} M_{v_1 v_2} \cdots M_{v_{k-1} 1}\right].$$
If any pair $\{u,v\}$ occurs $m$ times in the sequence of pairs $\{1, v_1\}, \{v_1, v_2\}, \ldots, \{v_{k-1}, 1\}$, where $m$ is odd, then, as the value of this term is independent from all other terms and $\mathbb{E}[M_{uv}^m] = 0$ for odd $m$, the expectation of the whole product is $0$. If all pairs occur an even number of times, their product’s expectation is 1. Therefore $\mathbb{E}[(M^k)_{11}]$ is the number of sequences $v_1, \ldots, v_{k-1}$ such that, in the sequence of pairs $\{1, v_1\}, \{v_1, v_2\}, \ldots, \{v_{k-1}, 1\}$, each pair occurs an even number of times.
3.2. Encoding argument
In order to give an upper bound on the number of such sequences, we will show how to encode a sequence $v_1, \ldots, v_{k-1}$ in which there are $\ell$ distinct edges. In the sequence, the element $v_i$ is represented either as $(0, v_i)$, which takes $1 + \log n$ bits, if $v_i$ appears for the first time in the sequence at location $i$, or as $(1, j)$ otherwise, where $j < i$ is such that $v_j = v_i$, which requires $1 + \log k$ bits. Notice that, if $v_i$ occurs for the first time at location $i$, then the pair $\{v_{i-1}, v_i\}$ also occurs for the first time at location $i$. Thus the number of times that we encounter a vertex for the first time is at most the number of distinct edges $\ell$. If we have $t$ distinct vertices (other than vertex 1), then we are using $t(1 + \log n) + (k - 1 - t)(1 + \log k)$ bits; for $k \le n$, this value increases with $t$, but we have $t \le \ell \le k/2$ (because every edge has to appear an even number of times, and so there can be at most $k/2$ distinct edges). This means that we use at most $k + \frac k2 \log n + \frac k2 \log k$ bits in the encoding. The number of strings that can be encoded using at most $b$ bits is $2^{b+1}$. If we assume $k \le n$, we have the bound
$$\mathbb{E}[(M^k)_{11}] \le 2 \cdot 2^k\, n^{k/2}\, k^{k/2},$$
meaning $\mathbb{E}[\mathrm{Tr}(M^k)] \le 2n \cdot 2^k\, n^{k/2}\, k^{k/2}$.
Therefore, using suitable $c$ and $k$, we achieve our bound on $\|M\|$. For example, choose $k = \log n$ and $c$ a sufficiently large constant. We use Markov’s inequality to obtain
$$\Pr\left[\|M\| \ge c\sqrt{n\log n}\right] \le \frac{\mathbb{E}[\mathrm{Tr}(M^k)]}{c^k (n\log n)^{k/2}} \le \frac{2n\cdot 2^k\, n^{k/2}\, k^{k/2}}{c^k\, n^{k/2}\, (\log n)^{k/2}} = 2n\left(\frac2c\right)^{\log n} = o(1).$$
4. Tightening the Bound
To obtain the sharper bound of $O(\sqrt n)$, we need to count the number of sequences more sharply and remove the $k^{k/2}$ term, namely improve the way we talk about repetitions. Here we give an outline of how to find a tighter bound.
The worst case in the above analysis is when the number of distinct vertices (not counting vertex 1) is maximal, which is $k/2$. In that case, the number of distinct “edges” is $k/2$, and they must form a connected graph over $k/2 + 1$ vertices, that is, they have to form a tree. Furthermore, each edge is repeated exactly twice in the closed walk, otherwise we would not have enough distinct edges to connect $k/2 + 1$ distinct vertices.
If the pairs form a tree, then the only way we can have a closed walk in which every edge is repeated twice is that the closed walk is a depth-first visit of the tree. In this case, we can improve our encoding in the following way. In a depth-first visit of a tree only two events are possible at each step: either we discover a new vertex, or we backtrack on the edge between the current node and the parent node. Thus we only need to pay $1 + \log n$ bits to encode a new node in the sequence and $1$ bit to encode an already seen node, and we obtain a bound of $2^{O(k)}\, n^{1 + k/2}$ on $\mathbb{E}[\mathrm{Tr}(M^k)]$. By taking the $k$th root, with $k = \log n$, we obtain a bound on $\|M\|$ of $O(\sqrt n)$.
5. Generalizing to any $p$
Now assume $p \le \frac12$ and $A$ is the adjacency matrix of $G \sim G_{n,p}$. We also assume $p = \Omega\left(\frac{\log n}{n}\right)$. We define
$$M := A - \mathbb{E} A.$$
In this matrix $M_{ii} = 0$ and, if $i \neq j$, $M_{ij} = 1 - p$ with probability $p$ and $M_{ij} = -p$ with probability $1 - p$. Therefore $\mathbb{E}[M_{ij}^2] = p(1-p) \le p$. In fact, $\mathbb{E}[|M_{ij}|^m] \le p$ for all $m \ge 2$.
From this we see we need to sum over sequences such that the multiset of pairs $\{1, v_1\}, \{v_1, v_2\}, \ldots, \{v_{k-1}, 1\}$ has each pair occurring at least two times, since if any pair occurs exactly once, the expectation is $0$.
Therefore the bound is
$$\mathbb{E}[(M^k)_{11}] \le \sum p^{\ell},$$
where $\ell$ is the number of distinct pairs and the sum is taken over sequences whose multiset of pairs has each pair occurring at least twice. For large $\ell$, the number of sequences where each pair occurs at least twice with $\ell$ distinct pairs is approximately $2^{O(k)}\, n^{\ell}$. This would give us
$$\mathbb{E}[\mathrm{Tr}(M^k)] \le n \sum_{\ell \le k/2} 2^{O(k)}\, (np)^{\ell} \le 2^{O(k)}\, n\, (np)^{k/2},$$
so the bound on $\|M\|$ is $O(\sqrt{np})$ for $k = \Theta(\log n)$. However, the bound on the number of sequences with $\ell$ distinct pairs breaks down when $\ell$ is much smaller than $k/2$. In a full proof, much more complicated calculations must be done.
6. Problems with sparse graphs
Theorem 4 If $p = o\left(\frac{\log n}{n}\right)$, then w.h.p. $\|A - \mathbb{E} A\| = \omega(\sqrt{np})$.
This breaks down the nice bound we obtained in Section 5. This follows from the irregularity of sparse graphs. There will be isolated vertices and vertices with degree much higher than average.
Lemma 1 If $p = o\left(\frac{\log n}{n}\right)$, then w.h.p. the highest degree vertex of $G$ has degree $\omega(np)$.
Proposition 5 If $G$ has a node of degree $d$, then $\|A\| \ge \sqrt d$. This implies that, when the maximum degree is $\omega(np)$, $\|A - \mathbb{E} A\| = \omega(\sqrt{np})$.
Proof: We have
$$\|A\| \ge \max_{x \neq 0} \frac{x^T A x}{x^T x},$$
where the maximum is taken over all nonzero vectors $x$. Call $v$ a node of degree $d$ and call $u_1, \ldots, u_d$ its neighbors.
Consider the vector $x$ such that $x_v = \sqrt d$, $x_{u_i} = 1$ for $i = 1, \ldots, d$, and $x_w = 0$ for other vertices $w$. We have
$$x^T A x \ge 2 \sum_{i=1}^d x_v x_{u_i} = 2 d \sqrt d, \qquad x^T x = d + d = 2d,$$
so $\|A\| \ge \sqrt d$. Therefore if $d = \omega(np)$,
$$\|A\| \ge \sqrt d = \omega(\sqrt{np}),$$
yielding the desired bound.
Theorem 4 follows immediately from Proposition 5 and Lemma 1.
In which we introduce semidefinite programming and apply it to Max Cut.
1. Overview
We begin with an introduction to Semidefinite Programming (SDP). We will then see that, using SDP, we can find a cut with the same kind of near-optimal performance for Max Cut in random graphs as we got from the greedy algorithm, that is, a cut with at least
$$\frac{dn}4 + \Omega(n\sqrt d)$$
edges in random graphs $G_{n, \frac dn}$. More generally, we will prove that you can always find a cut at least this large in the case that G is triangle-free and has maximum vertex degree at most $d$, which will imply the bound in random graphs. We will also see how to use SDP to certify an upper bound:
$$\mathrm{maxcut}(G) \le \frac{dn}4 + O(n\sqrt d)$$
with high probability in $G_{n, \frac dn}$.
Methods using SDP will become particularly helpful in future lectures when we consider planted-solution models instead of fully random graphs: greedy algorithms will fail on some analogous problems where methods using SDP can succeed.
2. Semidefinite Programming
Semidefinite Programming (SDP) is a form of convex optimization, similar to linear programming but with the addition of a constraint stating that, if the variables in the linear program are considered as entries in a matrix, that matrix is positive semidefinite. To formalize this, we begin by recalling some basic facts from linear algebra.
2.1. Linear algebra review
Definition 1 (Positive Semidefinite) A matrix $M \in \mathbb{R}^{n \times n}$ is positive semidefinite (abbreviated PSD and written $M \succeq 0$) if it is symmetric and all its eigenvalues are non-negative.
We will also make use of the following facts from linear algebra:
1. If $M$ is a symmetric real matrix, then
$$M = \sum_i \lambda_i v_i v_i^T,$$
where the $\lambda_i$ are the eigenvalues of $M$ and the $v_i$ are orthonormal eigenvectors of $M$.
2. The smallest eigenvalue of $M$ has the characterization
$$\lambda_{\min} = \min_{x \neq 0} \frac{x^T M x}{x^T x},$$
and the optimization problem in the right-hand side is solvable up to arbitrarily good accuracy in polynomial time.
This gives us the following lemmas:
Lemma 2 $M \succeq 0$ if and only if for every vector $x$ we have $x^T M x \ge 0$.
Proof: From part (2) above, the smallest eigenvalue of $M$ is given by
$$\lambda_{\min} = \min_{x \neq 0} \frac{x^T M x}{x^T x}.$$
Noting that we always have $x^T x > 0$ for $x \neq 0$, then $\lambda_{\min} \ge 0$ if and only if the numerator on the right is always non-negative.
Lemma 3 If $A \succeq 0$ and $B \succeq 0$, then $A + B \succeq 0$.
Proof: For every $x$, $x^T (A + B) x = x^T A x + x^T B x \ge 0$. By Lemma 2, this implies $A + B \succeq 0$.
Lemma 4 If $A \succeq 0$ and $a \ge 0$, then $aA \succeq 0$.
Proof: For every $x$, $x^T (aA) x = a \cdot x^T A x \ge 0$. By Lemma 2, this implies $aA \succeq 0$.
2.2. Formulation of SDP
With these characterizations in mind, we define a semidefinite program as an optimization program in which we have $n^2$ real variables $X_{ij}$, with $1 \le i, j \le n$, and we want to maximize, or minimize, a linear function of the variables such that linear constraints over the variables are satisfied (so far this is the same as a linear program) and subject to the additional constraint that the matrix $X$ is PSD. Thus, a typical semidefinite program (SDP) looks like
$$\begin{aligned} \max \quad & \sum_{i,j} C_{ij} X_{ij} \\ \text{s.t.} \quad & \sum_{i,j} A^{(k)}_{ij} X_{ij} \le b_k \qquad k = 1, \ldots, m \\ & X \succeq 0 \end{aligned}$$
where the matrices $C, A^{(1)}, \ldots, A^{(m)}$ and the scalars $b_1, \ldots, b_m$ are given, and the entries of $X$ are the variables over which we are optimizing.
We will also use the following alternative characterization of PSD matrices
Lemma 5 A matrix $M$ is PSD if and only if there is a collection of vectors $x_1, \ldots, x_n$ such that, for every $i, j$, we have $M_{ij} = \langle x_i, x_j \rangle$.
Proof: Suppose that $M$ and $x_1, \ldots, x_n$ are such that $M_{ij} = \langle x_i, x_j \rangle$ for all $i$ and $j$. Then $M$ is PSD because for every vector $z$ we have
$$z^T M z = \sum_{i,j} z_i z_j M_{ij} = \sum_{i,j} z_i z_j \langle x_i, x_j\rangle = \left\| \sum_i z_i x_i \right\|^2 \ge 0.$$
Conversely, if $M$ is PSD and we write it as
$$M = \sum_k \lambda_k v_k v_k^T,$$
we have
$$M_{ij} = \sum_k \lambda_k (v_k)_i (v_k)_j,$$
and we see that we can define vectors $x_1, \ldots, x_n$ by setting
$$(x_i)_k := \sqrt{\lambda_k} \cdot (v_k)_i,$$
and we do have the property that
$$M_{ij} = \langle x_i, x_j \rangle.$$
This leads to the following equivalent formulation of the SDP optimization problem:
$$\begin{aligned} \max \quad & \sum_{i,j} C_{ij} \langle x_i, x_j \rangle \\ \text{s.t.} \quad & \sum_{i,j} A^{(k)}_{ij} \langle x_i, x_j \rangle \le b_k \qquad k = 1, \ldots, m \end{aligned}$$
where our variables are vectors $x_1, \ldots, x_n$. This is the statement of the optimization problem that we will most commonly use.
2.3. Polynomial time solvability
From Lemmas 3 and 4, we recall that if $A$ and $B$ are two PSD matrices, and if $a, b \ge 0$ are scalars, then $aA + bB \succeq 0$. This means that the set of PSD matrices is a convex subset of $\mathbb{R}^{n \times n}$, and that the above optimization problem is a convex problem.
Using the ellipsoid algorithm, one can solve in polynomial time (up to arbitrarily good accuracy) any optimization problem in which one wants to optimize a linear function over a convex feasible region, provided that one has a separation oracle for the feasible region: that is, an algorithm that, given a point, either certifies that the point is in the feasible region, or outputs a linear inequality that is satisfied by the whole feasible region but violated by the point.
In order to construct a separation oracle for an SDP, it is enough to solve the following problem: given a matrix $X$, decide if it is PSD or not and, if not, construct an inequality that is satisfied by the entries of all PSD matrices but that is not satisfied by $X$. In order to do so, recall that the smallest eigenvalue of $X$ is
$$\lambda_{\min} = \min_{z : \|z\| = 1} z^T X z,$$
and that the above minimization problem is solvable in polynomial time (up to arbitrarily good accuracy). If the above optimization problem has a non-negative optimum, then $X$ is PSD. If it has a negative optimum, achieved by a unit vector $z$, then the matrix is not PSD, and the inequality
$$\sum_{i,j} z_i z_j M_{ij} \ge 0$$
is satisfied by all PSD matrices $M$ but fails for $X$. Thus we have a separation oracle and we can solve SDPs in polynomial time up to arbitrarily good accuracy.
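A minimal sketch of such a separation oracle, using only numpy (the function name and tolerance are our illustrative choices):

```python
# Given symmetric M: report that it is PSD, or return a unit vector z whose
# inequality sum_{i,j} z_i z_j Y_{ij} >= 0 (valid for every PSD matrix Y)
# is violated by M.
import numpy as np

def psd_separation_oracle(M, tol=1e-9):
    eigenvalues, eigenvectors = np.linalg.eigh(M)   # eigenvalues in ascending order
    if eigenvalues[0] >= -tol:
        return None                                 # M is (numerically) PSD
    z = eigenvectors[:, 0]                          # witness: z^T M z = lambda_min < 0
    return z
```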
3. SDP Relaxation of Max Cut and Random Hyperplane Rounding
The Max Cut problem in a given graph $G = (V, E)$ has the following equivalent characterization, as a quadratic optimization problem over real variables $x_1, \ldots, x_n$, where $V = \{1, \ldots, n\}$:
$$\mathrm{maxcut}(G) = \max \sum_{(i,j) \in E} \frac{1 - x_i x_j}{2} \quad \text{s.t.} \quad x_i^2 = 1 \;\;\forall i.$$
We can interpret this as associating every vertex with a value $+1$ or $-1$, so that the cut edges are those with one vertex of value $+1$ and one of value $-1$.
While quadratic optimization is NP-hard, we can instead use a relaxation to a polynomial-time solvable problem. We note that any quadratic optimization problem has a natural relaxation to an SDP, in which we relax real variables to take vector values and we change multiplication to inner product:
$$\max \sum_{(i,j) \in E} \frac{1 - \langle x_i, x_j \rangle}{2} \quad \text{s.t.} \quad \|x_i\|^2 = 1 \;\;\forall i.$$
Figure 1: The hyperplane through the origin defines a cut partitioning the vertices into sets $A$ and $B$.
Solving the above SDP, which is doable in polynomial time up to arbitrarily good accuracy, gives us a unit vector $x_i$ for each vertex $i$. A simple way to convert this collection to a cut $(A, B)$ is to take a random hyperplane through the origin, and then define $A$ to be the set of vertices $i$ such that $x_i$ is above the hyperplane. Equivalently, we pick a random vector $r$ according to a rotation-invariant distribution, for example a Gaussian distribution, and let $A$ be the set of vertices $i$ such that $\langle x_i, r \rangle \ge 0$.
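As an illustration, here is a hedged sketch of the whole pipeline, assuming the cvxpy modeling library (the graph encoding and all names are our choices, not the lecture's): solve the relaxation, recover the vectors from the Gram matrix as in Lemma 5, and round with a random hyperplane.

```python
import numpy as np
import cvxpy as cp

def sdp_max_cut(n, edges):
    X = cp.Variable((n, n), PSD=True)        # X[i, j] plays the role of <x_i, x_j>
    objective = cp.Maximize(sum((1 - X[i, j]) / 2 for i, j in edges))
    cp.Problem(objective, [cp.diag(X) == 1]).solve()   # ||x_i||^2 = 1
    w, V = np.linalg.eigh(X.value)           # X = V diag(w) V^T
    vectors = V * np.sqrt(np.maximum(w, 0))  # row i is the vector x_i (Lemma 5)
    r = np.random.randn(n)                   # rotation-invariant random vector
    A = {i for i in range(n) if vectors[i] @ r >= 0}
    return A, set(range(n)) - A

# Example: a 4-cycle, whose maximum cut contains all 4 edges.
print(sdp_max_cut(4, [(0, 1), (1, 2), (2, 3), (3, 0)]))
```

On bipartite instances like the 4-cycle the SDP optimum uses antipodal vectors, so the rounding recovers the full cut with probability 1.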
Let $(i,j)$ be an edge: one sees that if $\theta$ is the angle between $x_i$ and $x_j$, then the probability that $(i,j)$ is cut is proportional to $\theta$:
$$\Pr[(i,j) \text{ is cut}] = \frac{\theta}{\pi} = \frac{\arccos\langle x_i, x_j\rangle}{\pi},$$
and the contribution of $(i,j)$ to the cost function is
$$\frac{1 - \langle x_i, x_j\rangle}{2} = \frac{1 - \cos\theta}{2}.$$
Some calculus shows that for every $0 \le \theta \le \pi$ we have
$$\frac{\theta}{\pi} \ge 0.878 \cdot \frac{1 - \cos\theta}{2},$$
and so
$$\mathbb{E}[\text{number of cut edges}] \ge 0.878 \cdot \mathrm{maxcut}(G),$$
so we have a polynomial time approximation algorithm with worst-case approximation guarantee $0.878$.
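The constant can be verified numerically (an illustrative check, not a proof; the grid resolution is arbitrary):

```python
# Minimize (theta/pi) / ((1 - cos(theta))/2) over theta in (0, pi].
import numpy as np

theta = np.linspace(1e-4, np.pi, 1_000_000)
ratio = (theta / np.pi) / ((1 - np.cos(theta)) / 2)
print(ratio.min())   # approximately 0.87856, attained near theta = 2.33
```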
Next time, we will see how the SDP relaxation behaves on random graphs, but first let us see how it behaves on a large class of graphs.
4. Max Cut in Bounded-Degree Triangle-Free Graphs
Theorem 6 If $G = (V, E)$ is a triangle-free graph in which every vertex has degree at most $d$, then
$$\mathrm{maxcut}(G) \ge \left(\frac12 + \Omega\left(\frac1{\sqrt d}\right)\right) \cdot |E|.$$
Proof: Consider the following feasible solution for the SDP: we associate to each node $v$ an $n$-dimensional vector $x_v$ such that $(x_v)_v = \frac1{\sqrt 2}$, $(x_v)_u = -\frac{1}{\sqrt{2\deg(v)}}$ if $(u,v) \in E$, and $(x_v)_u = 0$ otherwise. We immediately see that $\|x_v\|^2 = 1$ for every $v$ and so the solution is feasible.
For example, if we have a graph on vertex set $\{1, \ldots, 5\}$ such that vertex 1 is adjacent to vertices 3 and 5, then $\deg(1) = 2$ and
$$x_1 = \left(\tfrac1{\sqrt 2},\; 0,\; -\tfrac12,\; 0,\; -\tfrac12\right).$$
Let us transform this SDP solution into a cut using a random hyperplane.
We see that, for every edge $(u, v) \in E$, we have
$$\langle x_u, x_v \rangle = (x_u)_u (x_v)_u + (x_u)_v (x_v)_v = -\frac12\left(\frac1{\sqrt{\deg(u)}} + \frac1{\sqrt{\deg(v)}}\right) \le -\frac1{\sqrt d},$$
where we used the fact that, since $G$ is triangle-free, $u$ and $v$ have no common neighbor, so no other coordinate contributes to the inner product.
The probability that $(u, v)$ is cut by the random hyperplane is
$$\frac{\arccos\langle x_u, x_v\rangle}{\pi} \ge \frac{\arccos\left(-\frac1{\sqrt d}\right)}{\pi} \ge \frac12 + \Omega\left(\frac1{\sqrt d}\right),$$
so that the expected number of cut edges is at least $\left(\frac12 + \Omega\left(\frac1{\sqrt d}\right)\right) \cdot |E|$.
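A numerical sanity check of this construction on the 5-cycle (ours, not the notes'; the 5-cycle is triangle-free with every degree $d = 2$, so every edge should satisfy $\langle x_u, x_v\rangle \le -1/\sqrt 2$):

```python
import numpy as np

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]
adj = {v: set() for v in range(n)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

X = np.zeros((n, n))                       # row v holds the vector x_v
for v in range(n):
    X[v, v] = 1 / np.sqrt(2)
    for u in adj[v]:
        X[v, u] = -1 / np.sqrt(2 * len(adj[v]))

print([round(float(np.linalg.norm(X[v])), 6) for v in range(n)])  # all 1.0
print([round(float(X[u] @ X[v]), 6) for u, v in edges])           # all -0.707107
```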
Scribed by Keyhan Vakil
In which we complete the study of Independent Set and Max Cut in random graphs.
1. Maximum Independent Set
Last time we proved an upper bound of $O\left(\frac nd \ln d\right)$ on the probable value of the maximum independent set in a random graph $G_{n,p}$ with $d = np$. This bound also holds if $d$ is a function of $n$. There is a simple greedy algorithm which can be shown to achieve an independent set of size $\approx \frac n{d+1}$, where $d$ is the average degree of the graph. For a random graph, this gives us an independent set of size $\approx \frac nd$. However we will see how to specialize this analysis to sparse random graphs, and close most of the remaining gap between the probable value and the greedy algorithm.
Consider the greedy algorithm below.
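The pseudocode block did not survive in this version of the notes; the following sketch (names assumed) matches the analysis below: scan the vertices in order and add each vertex to $S$ when it has no neighbor already in $S$.

```python
# `adj` maps each vertex to the set of its neighbors.
def greedy_independent_set(vertices, adj):
    S = set()
    for v in vertices:            # one "for-loop iteration" per vertex considered
        if not (adj[v] & S):      # v has no neighbor in the current S
            S.add(v)
    return S
```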
1.1. First attempt
We might try to model our analysis of this algorithm based on our discussion from Lecture 2.
To wit, let $R$ be the set of vertices not in $S$ which have no neighbors in $S$. Let $r_t$ be the size of $R$ when $S$ contains $t$ vertices. If $r_t > 0$, then our algorithm outputs an independent set of size at least $t + 1$. Therefore we can determine the expected size of the algorithm’s output (up to a constant factor) by determining $t$ such that $\mathbb{E}[r_t] \approx 1$.
Now we determine $\mathbb{E}[r_t]$. A proportion $p$ of vertices are connected to the $t$th vertex in expectation. Of the $r_{t-1}$ vertices, we expect that $(1-p)\, r_{t-1}$ of them will remain unconnected to all the vertices in $S$. This gives us that $\mathbb{E}[r_t] = (1-p)\,\mathbb{E}[r_{t-1}]$, and by induction $\mathbb{E}[r_t] = (1-p)^t\, n$.
Let $t^*$ be such that $\mathbb{E}[r_{t^*}] = 1$. Then:
$$(1-p)^{t^*} n = 1 \iff t^* = \log_{\frac1{1-p}} n \approx \frac{\ln n}p = \frac nd \ln n.$$
We conclude that our independent set has expected size $\approx \frac nd \ln n$. However if we take $d = 1$, that would lead us to believe that we could get an independent set of size $n \ln n$ in a graph with only $n$ vertices, which is impossible.
The error is that $\mathbb{E}[r_t]$ should be $(1-p)(\mathbb{E}[r_{t-1}] - 1)$, not $(1-p)\,\mathbb{E}[r_{t-1}]$. Note that once we add the $t$th vertex to $S$, it can no longer be in $R$ by definition. When $p$ is a constant, the difference is negligible, but when $p$ is small then the difference becomes more significant.
It is possible to salvage this analysis, but the result is less elegant. Instead we will now present a different analysis, which will also let us conclude more about higher moments as well.
1.2. Analysis of the greedy algorithm
To analyze the algorithm, consider the following random variables: let $T_i$ be the number of for-loop iterations between the time the $i$-th element is added to $S$ and the time the $(i+1)$-th element is added to $S$. We leave $T_i$ undefined if the algorithm terminates with a set $S$ of size less than $i + 1$. Thus the size of the independent set found by the algorithm is the largest $k$ such that $T_{k-1}$ is defined. Consider the following slightly different probabilistic process: in addition to our graph over the $n$ vertices $\{1, \ldots, n\}$, we also consider a countably infinite number of other vertices $n+1, n+2, \ldots$. We sample an infinite super-graph of our graph over this larger vertex set, so that each possible edge has probability $p$ of being generated.
We continue to run the greedy algorithm for every vertex of this infinite graph, and we call $T_i$ the (now, always defined) number of for-loop iterations between the $i$-th and the $(i+1)$-th time that we add a node to $S$. In this revised definition, the size of the independent set found by the algorithm in our actual graph is the largest $k$ such that $T_0 + T_1 + \cdots + T_{k-1} \le n$.
Now we will reason about the distribution of $T_i$. Say that we have $i$ vertices in $S$ and we are trying to determine if we should add some vertex $v$ to $S$. Note that the probability of $v$ being disconnected from all of $S$ is $(1-p)^i$. So we add a vertex at each iteration with probability $(1-p)^i$, which shows that $T_i$ is geometrically distributed with success probability $(1-p)^i$.
Based on this, we can find the expected value and variance of our sum from before:
$$\mathbb{E}\left[\sum_{i=0}^{k-1} T_i\right] = \sum_{i=0}^{k-1} (1-p)^{-i} = \frac{(1-p)^{-k} - 1}{(1-p)^{-1} - 1},$$
and likewise
$$\mathrm{Var}\left[\sum_{i=0}^{k-1} T_i\right] \le \sum_{i=0}^{k-1} (1-p)^{-2i} = O\left(\frac{(1-p)^{-2k}}{p}\right).$$
We want to choose $k$ so that the sum is at most $n$ with high probability. Let
$$k = \log_{\frac1{1-p}} \frac{np}2 \approx \frac{\ln(np)}p.$$
This makes the expected value of the sum at most $\approx \frac n2$ and the standard deviation $O(n\sqrt p)$. Thus, if $p \to 0$ sufficiently fast, the greedy algorithm has a $1 - o(1)$ probability of finding an independent set of size $k \approx \frac nd \ln d$, where $d = np$ is a measure of the average degree.
1.3. Certifiable upper bound
We now derive a polynomial time computable upper bound certificate for maximum independent set in $G_{n,p}$. We use the following lemma without proof. Note its similarity to Lemma 2 from Lecture 1.
Lemma 1 If $p > \frac{\log n}n$, $G$ is sampled from $G_{n,p}$, $A$ is the adjacency matrix of $G$, and $J$ is the matrix of all ones, then there is a $1 - o(1)$ probability that
$$\|A - pJ\| \le O(\sqrt{np}).$$
Since $A - pJ$ is a real symmetric matrix, its spectral norm can be computed as:
$$\|A - pJ\| = \max_{x \neq 0} \frac{\left|x^T (A - pJ) x\right|}{x^T x}.$$
If $S$ is an independent set of size $k$ and $x$ is its indicator vector, then $x^T A x = 0$, $x^T J x = k^2$, and $x^T x = k$, so that
$$\|A - pJ\| \ge \frac{\left|x^T (A - pJ) x\right|}{x^T x} = \frac{p k^2}{k} = pk.$$
This bound holds for any independent set, so it also holds for the largest one. If we denote by $\alpha(G)$ the size of the largest independent set in $G$, we have that
$$\alpha(G) \le \frac{\|A - pJ\|}p.$$
For a random graph, the above upper bound is $\frac{O(\sqrt{np})}p = O\left(\frac n{\sqrt d}\right)$ with high probability.
2. Max Cut
We will now reconsider Max Cut for the general case $G_{n,p}$. In Lecture 2, we dealt with the special case of $p = \frac12$. Unlike maximum independent set, our arguments for the case $p = \frac12$ apply to Max Cut without much modification.
2.1. High probability upper bound
Let $G$ be a random graph from $G_{n,p}$, and define $d = np$ as a measure of its average degree. We will prove that the size of a maximum cut of $G$ is at most $\frac{dn}4 + O(n\sqrt d)$ with high probability. The proof of this statement is nearly identical to the version in Lecture 2, where it was presented for the case $p = \frac12$. We know that the expected value of a cut is at most $\frac{pn^2}4 = \frac{dn}4$. By a Chernoff bound, the probability that any particular cut exceeds its expectation by an additive term $t$ is exponentially decreasing in $t^2/\mu$, where $\mu$ is the expectation. By taking $t = O(n\sqrt d)$ and taking a union bound over all $2^n$ possible cuts $(S, V \setminus S)$, we have that every cut has value at most $\frac{dn}4 + O(n\sqrt d)$ with probability $1 - o(1)$.
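In symbols, one standard form of this calculation (a sketch; the constant in the exponent is not optimized):
$$\Pr\left[\mathrm{cut}(S) \ge \mu + t\right] \le e^{-\frac{t^2}{3\mu}}, \qquad \mu = \mathbb{E}[\mathrm{cut}(S)] \le \frac{pn^2}4 = \frac{dn}4,$$
and a union bound over the $2^n$ cuts gives
$$2^n \cdot e^{-\frac{t^2}{3\mu}} = o(1) \qquad \text{for } t = c\sqrt{\mu n} = O(n\sqrt d),$$
once the constant $c$ is chosen large enough.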
2.2. Greedy algorithm
Consider the greedy algorithm that processes the vertices in some order $v_1, \ldots, v_n$ and places each vertex on the side of the cut containing fewer of its already-placed neighbors, as sketched below.
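A sketch (names are ours) matching the delayed-decisions analysis that follows:

```python
def greedy_max_cut(vertices, adj):
    A, B = set(), set()
    for v in vertices:
        a = len(adj[v] & A)            # neighbors of v already in A
        b = len(adj[v] & B)            # neighbors of v already in B
        (B if a >= b else A).add(v)    # cuts max(a, b) of the back-edges
    return A, B
```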
Label the vertices $v_1, \ldots, v_n$. Let $A_i$ and $B_i$ be the sets $A$ and $B$ when vertex $v_i$ is considered in the for-loop. For the purpose of analysis, we delay the random decisions in $G$ until a vertex is considered. In particular, we delay the choice of which of $v_1, \ldots, v_{i-1}$ are neighbors of $v_i$ until $v_i$ is considered. Note that no edge needs to be considered twice, and so we can treat each one as an independent biased coin flip.
Let $a_i$ and $b_i$ be the numbers of neighbors of $v_i$ in $A_i$ and $B_i$ respectively. We can show that the number of edges cut by the algorithm is
$$\sum_i \max(a_i, b_i) = \sum_i \frac{a_i + b_i}2 + \sum_i \frac{|a_i - b_i|}2,$$
and so $\sum_i \frac{|a_i - b_i|}2$ is the gain our algorithm achieves over cutting half the edges.
Now $|a_i - b_i|$ has expectation $\Theta(\sqrt{ip})$ and variance $O(ip)$. Adding over all $i$, the sum of the differences has mean $\Theta(n\sqrt{np}) = \Theta(n\sqrt d)$ and variance $O(n^2 p)$. This gives us a gain of $\Omega(n\sqrt d)$ with probability $1 - o(1)$. The value of cutting half the edges is approximately $\frac{|E|}2 \approx \frac{dn}4$. This gives a final value of $\frac{dn}4 + \Omega(n\sqrt d)$ w.h.p. as stated.
2.3. Certifiable upper bound
Again, we will derive a certifiable upper bound by looking at the spectral norm. If $(S, V \setminus S)$ is a cut with value $c$, and $x \in \{-1, +1\}^n$ is the vector with $x_v = 1$ iff $v \in S$, then we have
$$c = \frac{|E|}2 - \frac{x^T A x}4 \le \frac{|E|}2 + \frac{\left|x^T (A - pJ) x\right|}4 \qquad (\text{since } x^T J x \ge 0),$$
so
$$c \le \frac{|E|}2 + \frac n4 \|A - pJ\|.$$
This means that, in every graph, the maximum cut is upper bounded by
$$\frac{|E|}2 + \frac n4 \|A - pJ\|,$$
which, if $G \sim G_{n, \frac dn}$, is with high probability at most $\frac{dn}4 + O(n\sqrt d)$ (by Lemma 1).
3. Conclusion
We conclude with the following table, which summarizes our results for a random graph sampled from $G_{n,p}$ with $d = np$.
Problem | Expected Value | Greedy Algorithm | Certifiable Upper Bound
Independent Set | $\approx 2\frac nd \ln d$ w.h.p. | $\approx \frac nd \ln d$ w.h.p. | $O\left(\frac n{\sqrt d}\right)$ w.h.p.*
Max Cut | $\frac{dn}4 + \Theta(n\sqrt d)$ w.h.p. | $\frac{dn}4 + \Omega(n\sqrt d)$ w.h.p. | $\frac{dn}4 + O(n\sqrt d)$ w.h.p.*
* Note that both certifiable upper bounds require $p > \frac{\log n}n$.
Both greedy algorithms perform very well in comparison to the probable value. In Max Cut, our greedy algorithm is particularly strong, matching our certifiable upper bound up to a lower order term. This supports one of our major theses: while greedy algorithms exhibit poor worst-case performance, they tend to do well over our given distribution.
Scribe: Mahshid Montazer
In this lecture, we study the Max Cut problem in random graphs. We compute the probable value of its optimal solution, we give a greedy algorithm which is nearly optimal on random graphs and we compute a polynomial time upper bound certificate for it using linear algebra methods. We also study the problem of Maximum Independent Set in random graphs and we compute an upper bound to the probable value for its optimal solution.
1. Max Cut
Definition 1 (Max Cut) In an unweighted graph $G = (V, E)$, a cut is defined as a partition of its vertices into two sets, $A$ and $B$. Let $\mathrm{cut}(A, B)$ be the size of the cut, that is, the number of edges with one endpoint in $A$ and one endpoint in $B$. Max Cut is the problem of finding a cut of largest size.
To give a clear example, in every bipartite graph, a bipartition is a maximum cut. It is easy to show that the size of the maximum cut is at least half of the number of the graph's edges. One question that arises here is how much more than half of the edges we can cut. The answer is: not that much, in random graphs. We will show this claim in the following section.
2. Probable Value of Max Cut Optimal Solution
In this section, we compute the probable value of the Max Cut optimal solution in random graphs. Our result is for samples of $G_{n,\frac12}$, but the analysis will generalize to $G_{n,p}$.
Lemma 2 For every fixed cut $(A, B)$ of the vertices of a graph sampled from $G_{n,\frac12}$, we have $\mathbb{E}[\mathrm{cut}(A, B)] = \frac{|A|\cdot|B|}2 \le \frac{n^2}8$.
Proof: Each of the $|A|\cdot|B| \le \frac{n^2}4$ potential edges between $A$ and $B$ is present independently with probability $\frac12$.
Lemma 3 For every fixed cut $(A, B)$ and every $t > 0$, $\Pr\left[\mathrm{cut}(A, B) \ge \frac{n^2}8 + t\right] \le e^{-8t^2/n^2}$.
Proof: The proof is by applying Chernoff bounds on the result of Lemma 2.
Lemma 4 There is a constant $c$ such that
$$\Pr\left[\mathrm{maxcut}(G) \ge \frac{n^2}8 + c\, n^{1.5}\right] \le 2^{-n},$$
where $\mathrm{maxcut}(G)$ is the size of the maximum cut of $G$ and the probability is taken over the choice of $G$ from the distribution $G_{n,\frac12}$.
Proof: Take a union bound over the $2^n$ possible cuts, applying Lemma 3 with $t = c\, n^{1.5}$ for an appropriate choice of $c$.
The above lemma clearly leads us to the following theorem.
Theorem 5 There is a constant $c$ such that w.h.p. Max Cut in $G_{n,\frac12}$ is of size at most $\frac{n^2}8 + c\, n^{1.5}$.
Thus, we showed that in $G_{n,\frac12}$, the probable value of Max Cut is at most $\frac{n^2}8 + O(n^{1.5})$.
3. Greedy Algorithm for Max Cut
Consider the following greedy algorithm for Max Cut: process the vertices one at a time, adding each vertex to the side of the cut that cuts more of its edges to already-processed vertices.
The above algorithm can be applied to any graph, but we will analyze it on random graphs. A naive analysis of the algorithm guarantees that our greedy algorithm cuts at least half of the edges, giving us an approximation ratio of 2. The reason is that at each step, we add at least half of the processed vertex's back-edges to the cut. However, a more careful analysis of the algorithm shows that it is near-optimal for random graphs. Below, we prove our claim for $G_{n,\frac12}$.
Lemma 6 With high probability over the choice of $G$ from $G_{n,\frac12}$, the greedy algorithm finds a cut of size $\frac{n^2}8 + \Omega(n^{1.5})$.
Proof: Let $G = (V, E)$ be the given graph and let $v_1, \ldots, v_n$ be the order in which we process the vertices. Note that at the time of processing $v_i$, we do not need to know the edges that connect $v_i$ to any vertex $v_j$ with $j > i$. Let $A_i$ and $B_i$ be the sets $A$ and $B$ before processing $v_i$, respectively. Although $G$ is given before we run the algorithm, for the sake of the analysis, we can assume that we are building it on the go, while processing each of the vertices. Remember that each edge of the graph exists independently with probability $\frac12$. For deciding where to put $v_i$, we generate $|A_i|$ random bits and call their sum $X_i$ (the number of neighbors of $v_i$ in $A_i$). We also generate $|B_i|$ random bits and call their sum $Y_i$. We put $v_i$ in set $B$ (respectively, $A$) if $X_i \ge Y_i$ (respectively, $X_i < Y_i$). Note that the more balanced $|A_i|$ and $|B_i|$ get, the worse it gets for the analysis. Also, note that the extra edges that the algorithm cuts, beyond half of the edges, number:
$$\sum_{i=1}^n \frac{|X_i - Y_i|}2.$$
We know that
$$\mathbb{E}\left[|X_i - Y_i|\right] = \Omega(\sqrt i).$$
Note that
$$\mathbb{E}\left[\sum_{i=1}^n \frac{|X_i - Y_i|}2\right] = \Omega(n^{1.5})$$
and
$$\mathrm{Var}\left[\sum_{i=1}^n \frac{|X_i - Y_i|}2\right] = O(n^2).$$
Thus, we have that $\sum_i \frac{|X_i - Y_i|}2$ has mean $\Omega(n^{1.5})$ and standard deviation $O(n)$. Thus, with probability $1 - o(1)$ we have:
$$\mathrm{cut}(A, B) \ge \frac{|E|}2 + \Omega(n^{1.5}) = \frac{n^2}8 + \Omega(n^{1.5}).$$
4. Polynomial Time Upper Bound for Max Cut
In this section, we find polynomial time upper bound certificates for Max Cut in random graphs using linear algebra techniques.
Lemma 7 Let $G = (V, E)$ be a graph, $A$ be its adjacency matrix, $J$ be the matrix all of whose entries are 1, and $\mathrm{maxcut}(G)$ be the Max Cut value of $G$. Then
$$\mathrm{maxcut}(G) \le \frac{|E|}2 + \frac n4 \left\|A - \frac12 J\right\|.$$
Proof: we have, for any cut $(S, V \setminus S)$ with $\pm 1$ indicator vector $x$ (that is, $x_v = 1$ iff $v \in S$):
$$\mathrm{cut}(S, V \setminus S) = \frac{|E|}2 - \frac{x^T A x}4 \le \frac{|E|}2 + \frac{\left|x^T \left(A - \frac12 J\right) x\right|}4 \le \frac{|E|}2 + \frac n4 \left\|A - \frac12 J\right\|,$$
where we used $x^T J x \ge 0$ and $x^T x = n$.
Recall that, with high probability over the choice of a graph $G$ from $G_{n,\frac12}$, if $A$ is the adjacency matrix of $G$, then we have $\left\|A - \frac12 J\right\| \le O(\sqrt n)$.
We conclude that, with high probability over the choice of $G$ from $G_{n,\frac12}$, we can find in polynomial time a certificate that the max cut optimum of $G$ is at most $\frac{n^2}8 + O(n^{1.5})$.
5. Maximum Independent Set
In this section, we discuss the Maximum Independent Set problem for $G_{n,p}$ (especially $p = \frac12$) and we show its close connection with the Max Clique problem. Finally, we compute its optimal solution's probable value.
Definition 8 Maximum Independent Set: In a graph , an independent set is a set of vertices that are mutually disconnected. A Maximum Independent Set in is an independent set of largest possible size. The Maximum Independent Set problem is the problem of finding such a set.
Note that the Maximum Independent Set in $G$ corresponds to the Maximum Clique in the complement graph $\bar G$. Thus, for $G_{n,\frac12}$, everything that we argued for Max Clique is usable for Maximum Independent Set as well.
In this section, we compute an upper bound to the probable value of Maximum Independent Set’s optimal solution in .
Fix a set $S$ of size $k$. We have
$$\Pr[S \text{ is an independent set}] = (1-p)^{\binom k2},$$
where the probability is over the choice of $G \sim G_{n,p}$.
The following lemma holds.
Lemma The expected number of independent sets of size $k$ in $G \sim G_{n,p}$ is
$$\binom nk (1-p)^{\binom k2}. \qquad (1)$$
Proof: Sum the above probability, by linearity of expectation, over all $\binom nk$ sets of $k$ vertices.
Now, what would be the maximum value of $k$ such that with high probability we can still make sure that there exists an independent set of size $k$? Note that the value of (1) goes to 0 when $k \gg \frac2p \ln n$.
A sufficient condition for (1) to go to 0 is to have $n (1-p)^{(k-1)/2} < 1$, showing us that there is a high probability that the maximum independent set in $G_{n,p}$ has size at most $O\left(\frac{\ln n}p\right)$. A more careful bound is that we can have $k = \frac2p \ln(np)$ provided, say, $np \gg 1$, and so with high probability the maximum independent set in $G_{n,p}$ is at most $\frac2p \ln(np)$. If we call $d = np$, then the bound is
$$O\left(\frac nd \ln d\right).$$
In which we describe what this course is about and discuss algorithms for the clique problem in random graphs.
1. About This Course
In this course we will see how to analyze the performance of algorithms (such as running time, approximation ratio, or, in the case of online algorithms, regret and competitive ratio) without resorting to worst-case analysis. The class will assume familiarity with basic combinatorics and discrete probability (as covered in CS70), linear algebra (as covered in Math54), and analysis of algorithms and NP-completeness (as covered in CS170). This course is based on a course by the same name developed by Tim Roughgarden at Stanford, but our choice of topics will be slightly different.
A familiar criticism of the worst-case analysis of algorithms is that it can significantly overestimate the performance of algorithms in practice. For example, quicksort with a fixed pivot choice has worst-case quadratic time, but usually it runs faster than mergesort; hash tables with a deterministic hash function have worst-case linear time per operation, but they usually require constant time per operation; only quadratic-time algorithms are known for edit distance (and sub-quadratic worst-case performance is impossible under standard assumptions) but sub-quadratic running time occurs in practice, especially if one allows approximations; the simplex algorithm for linear programming has worst-case exponential running time in all known implementations but works well in practice; and so on.
In order to make a more predictive, non-worst-case, analysis of algorithms we need to first develop a model of the instances that we will feed into the algorithms, and this will usually be a probabilistic model. In this course we will look at models of various complexity, ranging from simple models involving only one or few (or even zero!) parameters, which are easy to understand but not necessarily a good fit for real-world instances, to more complex models involving a mix of adversarial and probabilistic choices.
We can roughly group the models that we will study in four categories: models in which the input is generated i.i.d. at random, such as $G_{n,p}$; planted-solution models; semi-random models, which mix adversarial and random choices; and smoothed analysis, in which an adversarial instance is perturbed by random noise.
Although these models are very simple to describe, they often lead to deep and fascinating questions, and insights gained in these models can be stepping stones to analyses of more realistic models, or even worst-case analyses of randomized algorithms. For example, the analysis that quicksort with fixed pivot choice runs in expected $O(n \log n)$ time on random sequences naturally leads to an $O(n \log n)$ expected runtime analysis for quicksort with random pivot choice on worst-case sequences. An understanding of properties of random Gaussian matrices is critical to the smoothed analysis of the simplex, and an understanding of properties of random graphs is the starting point to develop algorithms for more realistic graph generative models, and so on.
These models “break the symmetry” of i.i.d. models. While random fluctuations are the only source of structure in i.i.d. models, here we introduce structure by design. In planted-solution models it is interesting to see if an algorithm is able to find not just any good solution, but the particular solution that was created in the generative process. Usually, this is the case because, relying on our understanding of (1), we can establish that any solution that is significantly different from the planted solution would not be a near-optimal (or in some cases even a feasible) solution.
These models capture problems studied in statistics, information theory and machine learning. Generally, if an existing algorithm that works well in practice can be rigorously proved to work well in a “planted-solution” model, then such a proof provides some insight into what makes the algorithm work well in practice. If an algorithm is designed to work well in such a model, however, it may not necessarily work well in practice if the design of the algorithm overfits specific properties of the model.
Usually, performance in these models is a good predictor of real-world performance.
For an algorithm to perform well on semi-random graph models, the algorithm must be robust to the presence of arbitrary local structures, and generally this avoids the possibility of algorithms overfitting a specific generative model and performing poorly in practice.
In numerical optimization problems such as linear programming, the numerical values in the problem instance come from noisy measurements, and so it is appropriate to model them as arbitrary quantities to which Gaussian noise is added, which is exactly the model of smoothed analysis.
(Note that here we are straining the notion of what it means to go “beyond worst-case analysis,” since we are essentially doing a worst-case analysis over a subset of instances.)
We will see examples of all the above types of analysis, and for some problems like min-bisection we will work our way through each type of modeling.
We will study exact algorithms, approximation algorithms and online algorithms, and consider both combinatorial and numerical problems.
At the end of the course we will also do a review of average-case complexity and see how subtle it is to find the “right” definition of efficiency for distributional problems, we will see that there is a distribution of inputs such that, for every problem, the average-case complexity of the problem according to this distribution is the same as the worst-case complexity, and we will see some highlights of Levin’s theory of “average-case NP-hardness,” including the surprising roles that hashing and compressibility play in it.
The course will be more a collection of case studies than an attempt to provide a unified toolkit for average-case analysis of algorithms, but we will see certain themes re-occur, such as the effectiveness of greedy and local search approaches (which often have very poor worst-case performances) and the power of semidefinite programming.
2. Clique in Random Graphs
We will start by studying the Max Clique problem in random graphs, starting from the simplest case of the $G_{n,\frac12}$ distribution, which is the uniform distribution over all undirected graphs on $n$ vertices.
2.1. The typical size of a largest clique
A first fact about this problem is that, with $1 - o(1)$ probability, the size of the maximum clique of a graph sampled from $G_{n,\frac12}$ is $(2 \pm o(1)) \log n$, where the logarithm is to base 2. (This will be our standard convention for logarithms; we will use $\ln$ to denote logarithms in base $e$.)
We will not provide a full proof, but note that the expected number of cliques of size $k$ in a graph sampled from $G_{n,\frac12}$ is
$$\binom nk \cdot 2^{-\binom k2}, \qquad (1)$$
which is at most $n^k \, 2^{-\binom k2} = 2^{k\log n - k(k-1)/2}$ and, if $k \ge 2\log n + 2$, it is at most $2^{-k/2}$. By applying Markov's inequality, we get that there is a $1 - o(1)$ probability that a graph sampled from $G_{n,\frac12}$ has no clique of size larger than $2\log n + 2$. On the other hand, (1) is at least
$$\left(\frac nk\right)^k 2^{-\binom k2} = 2^{k \log \frac nk - k(k-1)/2},$$
and if, for example, we choose $k = 2\log n - 4\log\log n$, we see that the above quantity goes to infinity faster than any polynomial in $n$. Thus there is an expected large number of cliques of size $2\log n - O(\log\log n)$ in a random graph. This is not enough to say that there is at least one such clique with probability tending to 1, but a second-moment calculation would show that the standard deviation of the number of cliques is small, so that we can apply Chebyshev's inequality.
2.2. The greedy algorithm
How about finding cliques in $G_{n,\frac12}$? Consider the following simple greedy algorithm: we initialize a set $S$ to be the empty set, and then, while $V$ is non-empty, we add an (arbitrary) element $v$ of $V$ to $S$, and we delete $v$ and all the non-neighbors of $v$ from $V$. When $V$ is empty, we output $S$.
The algorithm maintains the invariants that $S$ is a clique in $G$ and that all the elements of $V$ are neighbors of all the elements of $S$, so the algorithm always outputs a clique.
Initially, the set $V$ has size $n$ and $S$ is empty and, at every step, $|S|$ increases by 1 and $V$, on average, shrinks by a factor of 2, so that we would expect $S$ to have size $\log n$ at the end. This can be made rigorous and, in fact, the size of the clique found by the algorithm is concentrated around $\log n$.
In terms of implementation, note that there is no need to keep track of the set $V$ (which is only useful in the analysis), and a simple implementation is to start with an empty $S$, scan the nodes in an arbitrary order, and add the current node to $S$ if it is a neighbor of all the elements of $S$. This takes time at most $O(n \cdot k)$, where $k$ is the size of the clique found by the algorithm, and one can see that the expected running time of the algorithm is actually $O(n)$.
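In code, the simple implementation just described might look like this (a sketch; the names are ours, and `adj` is assumed to map each vertex to its set of neighbors):

```python
def greedy_clique(vertices, adj):
    S = set()
    for v in vertices:
        if S <= adj[v]:           # v is adjacent to every element of S
            S.add(v)
    return S
```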
So, with $1 - o(1)$ probability, the greedy algorithm finds a clique of size $(1 \pm o(1))\log n$, and the maximum clique has size at most $(2 + o(1))\log n$, meaning that, ignoring low-probability events and lower-order terms, the greedy algorithm achieves a factor 2 approximation. This is impressive considering that worst-case approximation within a factor $n^{1-\epsilon}$ is NP-hard.
Can we do better in polynomial time? We don't know. So far, there is no known polynomial time (or average-case polynomial time) algorithm able to find, with high probability, cliques of size $(1 + \epsilon)\log n$ for some constant $\epsilon > 0$ in random graphs, and such an algorithm would be considered a breakthrough, and its analysis would probably have something very interesting to say beyond the specific result.
2.3. Certifying an upper bound
Approximation algorithms with a worst-case approximation ratio guarantee have an important property that is lost in an average-case analysis of the approximation ratio like the one we sketched above. Suppose that we have a 2-approximation algorithm for a maximization problem that, for every instance, finds a solution whose cost is at least half the optimum. Then if, on a given instance, the algorithm finds a solution of cost $c$, it follows that the analysis of the algorithm and the steps of its execution provide a polynomial time computable and checkable certificate that the optimum is at most $2c$. Note that the optimum has to be at least $c$, so the certified upper bound to the optimum is off by at most a factor of 2 from the true value of the optimum.
Thus, whenever an algorithm has a worst-case approximation of a factor of $r$, it is also able to find upper bound certificates for the value of the optimum that are off by at most a factor of $r$.
This symmetry between approximate solutions and approximate upper bounds is lost in average-case analysis. We know that, almost always, the optimum of the Max Clique problem in $G_{n,\frac12}$ is about $2\log n$, we know how to find solutions of cost about $\log n$, but we do not know how to find certificates that the optimum is at most $O(\log n)$, or even $n^{0.1}$ or $n^{0.49}$. The best known polynomial time certificates only certify upper bounds of the order of $\sqrt n$, with the difference between various methods being only in the multiplicative constant. There is also some evidence that there is no way to find, in polynomial time, certificates that most graphs from $G_{n,\frac12}$ have Maximum Clique upper bounded by, say, $n^{0.49}$.
We will sketch the simplest way of finding, with high probability, a certificate that the Maximum Clique of a graph sampled from $G_{n,\frac12}$ is at most $O(\sqrt n)$. Later we will see a more principled way to derive such a bound.
Given a graph $G$ sampled from $G_{n,\frac12}$, we will apply linear-algebraic methods to the adjacency matrix $A$ of $G$. A recurrent theme in this course is that $A$ will, with high probability, "behave like" its expectation in several important ways, and that this will be true for several probabilistic generative models of graphs.
To capture the way in which $A$ is "close" to its expectation, we will use the spectral norm, so let us first give a five-minute review of the relevant linear algebra.
If $M$ is a symmetric real valued matrix, then all its eigenvalues are real. If we call them $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, then the largest eigenvalue of $M$ has the characterization
$$\lambda_1 = \max_{x \neq 0} \frac{x^T M x}{x^T x},$$
and the smallest eigenvalue of $M$ has the analogous characterization
$$\lambda_n = \min_{x \neq 0} \frac{x^T M x}{x^T x}.$$
The largest eigenvalue in absolute value can be similarly characterized as
$$\max_i |\lambda_i| = \max_{x \neq 0} \frac{|x^T M x|}{x^T x}.$$
The spectral norm $\|M\|$ of a square matrix $M$ is its largest singular value, and is characterized as
$$\|M\| = \max_{x \neq 0} \frac{\|Mx\|}{\|x\|};$$
if $M$ is symmetric and real valued, then $\|M\|$ is the largest eigenvalue in absolute value, so we have
$$\|M\| = \max_{x \neq 0} \frac{|x^T M x|}{x^T x}.$$
Furthermore, the spectral norm of a symmetric matrix can be determined up to arbitrarily good approximation in polynomial time.
We have the following simple fact.
Lemma 1 Let $G$ be a graph, $A$ its adjacency matrix, $J$ the matrix all of whose entries are 1, and $\omega(G)$ the size of the largest clique in $G$. Then
$$\omega(G) \le 2\left\|A - \frac12 J\right\| + 2.$$
Proof: Let $S$ be a clique of size $k$ and let $x$ be the indicator vector of $S$. Then
$$x^T A x = k^2 - k$$
and
$$x^T J x = k^2,$$
so
$$x^T \left(A - \tfrac12 J\right) x = \frac{k^2}2 - k.$$
Noting that $x^T x = k$, we have
$$\left\|A - \tfrac12 J\right\| \ge \frac{\left|x^T \left(A - \tfrac12 J\right) x\right|}{x^T x} \ge \frac k2 - 1,$$
and the claim follows by rearranging and taking $S$ to be a largest clique.
Note that $\frac12 J$ is essentially the average of $A$ (to be precise, $\mathbb{E} A = \frac12 J - \frac12 I$, but adding or subtracting $\frac12 I$ changes the spectral norm by at most $\frac12$), so it remains to show that $A$ is usually close in spectral norm to its average. The following bound is known, and best possible up to the value of the constant $c$.
Lemma 2 There is a constant $c$ such that, with $1 - o(1)$ probability, if we sample $G$ from $G_{n,\frac12}$ and we let $A$ be the adjacency matrix of $G$, we have
$$\left\|A - \frac12 J\right\| \le c \sqrt n.$$
More specifically, it is known that with high probability we have $\left\|A - \frac12 J\right\| \le (1 + o(1)) \sqrt n$.
Thus, with high probability, we can certify in polynomial time that a graph sampled from $G_{n,\frac12}$ has Max Clique at most $O(\sqrt n)$.
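An illustrative numpy sketch of the certification (the size $n$, the dense solver, and the names are our choices; Lemma 1 turns the computed norm into the clique bound):

```python
import numpy as np

n = 2000
U = np.triu(np.random.rand(n, n) < 0.5, k=1)     # independent fair coin flips
A = (U | U.T).astype(float)                      # adjacency matrix of G ~ G_{n,1/2}
J = np.ones((n, n))
norm = np.linalg.norm(A - J / 2, ord=2)          # spectral norm (largest singular value)
print(norm / np.sqrt(n))                         # about 1, per Lemma 2
print(2 * norm + 2)                              # certified upper bound on the clique number
```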
Yesterday, Norbert Blum posted a preprint with a claimed proof that $P \neq NP$. An incorrect comment that I made last night and corrected this morning caused a lot of confusion, so let me apologize by summarizing the claims in the paper.
Coincidentally, this week, there is an Oberwolfach workshop on proof complexity and circuit complexity, so I am confident that by the end of the week we will hear substantive comments on the technical claims in the paper.
Recall that if a decision problem is solvable by an algorithm running in time $t(n)$ on inputs of length $n$ then, for every $n$, it is also solved, on inputs of length $n$, by a circuit of size polynomial in $t(n)$. Thus, if a problem is solvable in polynomial time, it is also solvable by a family of polynomial size circuits (or, in short, it has polynomial circuit complexity).
The paper claims that Clique has exponential circuit complexity, and hence it has no polynomial time algorithm and $P \neq NP$. The argument leverages known results from monotone circuit complexity.
A monotone circuit is a boolean circuit that has only AND gates and OR gates, but does not have any NOT gates. A decision problem is a monotone problem if, for every fixed input length, it is solvable by a monotone circuit. For example, the problem of deciding if a given $n$-vertex graph has a clique of size at least $k$ is a monotone problem (if the input graph is presented as a boolean adjacency matrix), and so is the problem of deciding if a given graph has a perfect matching.
In the 1980s, Razborov proved that Clique cannot be computed by polynomial size monotone circuits. Later Andreev proved that there is a monotone problem in NP that requires exponential size monotone circuits, and Alon and Boppana proved that Clique itself requires exponential size monotone circuits. At the time, it was conjectured that if a monotone problem is in P, then it is solvable by a family of polynomial size monotone circuits. Under this conjecture, Razborov's result would imply that Clique is not in P, and hence $P \neq NP$.
Unfortunately, Razborov refuted this conjecture, by showing that the perfect matching problem, which is in P, does not have polynomial size monotone circuits. Tardos showed that the Alon-Boppana exponential lower bound for clique holds for any monotone function sandwiched between the clique number and the chromatic number of a graph, including the Lovasz Theta function. Since the Theta function is polynomial time computable, and hence has polynomial size circuits, this shows that the gap between monotone circuit complexity and general circuit complexity can be exponentially large. (See first comment below.)
Razborov’s proof of the Clique lower bound for monotone circuits introduced the powerful approximation method. Roughly speaking, his approach was to start from a hypothetical polynomial size family of monotone circuits for Clique, and, from that, build a family of approximating circuits, which are just DNF formulas. The approximating circuits constructed by his method do not solve the Clique problem in general, but they solve a “promise” version of the problem, that is, they solve Clique on a certain subset of graphs. Then Razborov finishes the argument by showing that the approximating circuits are so simple that it is impossible for them to even solve Clique on that subset of inputs, thus reaching a contradiction to the assumption that there are monotone polynomial size circuits for Clique. The approximation method, variously modified, was also used to prove the lower bounds for Andreev’s problem and for matching.
Tim Gowers wrote a wonderful exposition of Razborov’s method, trying to show how one would come up with the various ideas, rather than just presenting the proof step by step.
Berg and Ulfberg simplify the proofs of Razborov, Andreev and Tardos for Clique and Andreev’s problem (but not for matching) by showing how to construct an approximator that has both small DNF complexity and small CNF complexity. The stronger claim makes an inductive argument in the construction easier to establish.
(At this point, I should clarify that I have never properly studied these results, so I am probably getting some details wrong. Please post corrections in the comments.)
The main claim in Blum’s paper is Theorem 6, which claims that if polynomial-size monotone circuits for a monotone problem admit a “CNF-DNF approximator” for a promise restriction of the problem (like the Berg-Ulfberg one), then also general circuits for the same problem admit such an approximator. Thus, if the approximator does not exist, it not only follows that the monotone complexity of the problem is super-polynomial, but also the general circuit complexity of the problem is superpolynomial.
Together with the Berg-Ulfberg approximator for monotone circuits for Clique, this implies that Clique is not in P.
Now, what could possibly go wrong with this argument?
This argument can only be applied to (certain) monotone problems, and monotone problems are a vanishing fraction of all problems, hence one does not have the “largeness” property of natural proofs, and the argument is not a natural proof. (This is noted in the paper.)
The argument starts from a boolean circuit for a given problem. If one is given an oracle circuit, with gates that answer oracle queries, or give evaluation of a polynomial extension of the problem, the argument cannot be applied.
It is not known how to make the known lower bound for matching work via a “CNF-DNF approximator” and the claims in the paper only concern monotone lower bounds proved in this way. (Edited to add: but what about the Lovasz theta function?)
Like any no-go theorem, Razborov’s impossibility result makes some assumption on what it means to “apply the approximation method to general circuits” and Blum claims that the assumptions do not apply to his argument.
I don’t have a good answer to this question. All the work is done in the proof of Theorem 6, which is the construction of the approximator starting from an arbitrary circuit. Maybe the argument is right and the heavy lifting was in Razborov’s work and subsequent extension and simplifications, and the fact that one can handle NOT gates at the bottom can be handled with an argument that is syntactically similar to previous work. Or, something goes wrong in the construction. Either way we will probably know soon.
Our own Pasin Manurangsi received the Danny Lewin STOC Student Paper Award for his work on the hardness of the dense k-subgraph problem. This is the problem in which we are given a graph and a number k, and we want to find the set of k vertices that induces the most edges. Pasin, who is co-advised by Prasad Raghavendra and me, discovered a new, simple but ingenious reduction that establishes hardness up to almost polynomial factors.
I received the same award exactly twenty years ago, also for a hardness-of-approximation result established via a simple reduction. (Prasad also received it, nine years ago, for a hardness-of-approximation result established via a difficult reduction.) I then spent time at MIT, where Oded Goldreich was, and, partly thanks to his influence, I did my best work there. Pasin is spending this summer at Weizmann, where Oded Goldreich is, so, no pressure, but let’s see what happens. . .
Alistair Sinclair received the ACM SIGACT Distinguished Service prize, for his work setting up and leading the Simons Institute for the Theory of Computing.
Those who have been to the institute, that is, almost the whole theoretical computer science community, have seen that it is a place uniquely conducive to do good work. If you stop at think about what it is that makes it so, Alistair’s hand is behind it. The open layout of the second floor, with the whiteboards dividing the space and absorbing sound? Alistair worked closely with the architect, for a year, during the renovation, to make sure that the design would best fit the needs of our community. The friendly, competent and responsive staff? Alistair sat in all the interviews when the staff was recruited, and participates in their performance review. So many things happening and never a conflict? You know whom to thank.
More substantially, almost all the programs that we have had were, to some extent, solicited, and Alistair led the conversations and negotiations with all prospective organizers, shepherding promising concepts to approved programs.
Alistair has also been relentless in making people do things, and making them do things by prescribed deadlines, something that is notoriously difficult in our community. The Simons Institute programs have succeeded, in part, because of the tremendous amount of volunteer work that the organizers donated to our community, and while they would all have been extremely generous with their time in any case, Alistair made sure that they were extra generous. A personal anecdote: I was one of the organizers of one of the Fall 2013 inaugural programs. At that point, I was at Stanford and we were beginning to discuss the idea that I could come back to Berkeley. At some point, around October, I get a phone call from Alistair, and I assume he wants to talk about it. Instead, he goes “you know, I haven’t been seeing you much at the Institute so far. We expect organizers to be around a lot more.” A few months later, I got the offer to move to Berkeley, with a 50% affiliation at the Institute. Even knowing who my boss would be, I enthusiastically accepted.
Oded Goldreich received the Knuth Prize. I have already said how I feel about Oded, so there is no need to repeat myself, but I will add that I am also really happy for the Knuth Prize itself, that has managed to consistently make really good choices for the past 21 years, which is an outstanding record.
Finally, and I can’t believe that it took so long, the paper of Dwork, McSherry, Nissim and Smith, that introduced differential privacy, has been recognized with the Godel prize. I am very happy for them, especially for my matron of honor and former neighbor Cynthia.
Congratulations to all, and by all I don’t mean just the aforementioned awardees, but also our whole community, that nurtures so many great people, inspires so many good ideas, and makes being part of it such a joy (even when Alistair makes me do things).
Silvio has given his kind permission to share the speech, and he has put it in a pdf form that includes the pictures that he used as slides.
Oded has touched countless lives, with his boundless dedication to mentoring, executed with a unique mix of tough love and good humor. He embodies a purity of vision in the pursuit of the “right” definitions, the “right” conceptual point of view and the “right” proofs in the areas of theoretical computer science that he has transformed with his work and his influence.
A turning point in my own work in theoretical computer science came when I found this paper online in the Spring of 1995. I was a second-year graduate student in Rome, and I was interested in working on PCP-based hardness of approximation, but this seemed like an impossible goal for me. Following the publication of ALMSS, there had been an avalanche of work between 1992 and 1995, mostly in the form of extended abstracts that were impossible to understand without an awareness of a context that was, at that point, purely an oral tradition. The aforementioned paper, instead, was a 100+ page monster, that explained everything. Studying that paper gave me an entrance into the area.
Three years later, while I was a postdoc at MIT and Oded was there on sabbatical, he played a key role in the series of events that led me to prove that one can get extractors from pseudorandom generators, and it was him who explained to me that this was, in fact, what I had proved. (Initially, I thought my argument was just proving a much less consequential result.) For the most part, it was this result that got me a good job and that is paying my mortgage.
Like me, there are countless people who started to work in a certain area of theoretical computer science because of a course that Oded taught or a set of lecture notes that he wrote, and countless people whose work was made possible by Oded nudging, or usually shoving, them along the right path.
The last two days have felt a bit like going to a wedding, and not just because I saw friends that I do not get to see too often and because there was a lot to eat and to drink. A wedding is a celebration of the couple getting married, but it is also a public event in which friends and family, by affirming their bonds to the newlyweds, also affirm their bonds to each other.
I was deeply moved by the speeches given by Silvio and Shafi, and really everybody did a great job at telling Oded stories and bringing to life various aspects of his work and personality. But perhaps the most fittingly weird tribute was Benny Chor presenting the Chor-Goldreich paper (the one that introduced min-entropy as a measure of randomness for weak random sources, and the problem of 2-source extraction) using the original 1985 slides.
Speaking of public celebrations, there is less than a month left to register for STOC 2017, the “Theory Fest” that will take place in Montreal in June.
The mention for a major alumni award given by U.C. Berkeley is for excellence in achievement.
Meanwhile, in the episode “Brother, can you spare two dimes?”, Mr. Burns has to come up on the spot with the name for a fake awards, and he comes up with an award for outstanding achievement in the field of excellence.
(You’ll note that the dancers in the video are wearing gold and blue)