Lies, Damns Lies, and Herbert London

I am grading the final projects of my class, I am trying the clear the backlog of publishing all the class notes, I am way behind on my STOC reviews, and in two days I am taking off for a complicated two-week trips involving planes, trains and a rented automobile, as well as an ambitious plan of doing no work whatsoever from December 20 to December 31.

So, today I was browsing Facebook, and when I saw a post containing an incredibly blatant arithmetic mistake (which none of the several comments seemed to notice) I spent the rest of the morning looking up where it came from.

The goal of the post was to make the wrong claim that people have been paying more than enough money into social security (through payroll taxes) to support the current level of benefits. Indeed, since the beginning, social security has been paying individuals more than they put in, and now that population and salaries have stop growing, social security is also paying out retired people more than it gets from working people, so that the “trust fund” (whether one believes it is a real thing or an accounting fiction) will run out in the 2030s unless some change is made.

This is a complicated matter, but the post included a sentence to the extent that $4,500 a year, with an interest of 1% per year “compounded monthly”, would add up to $1,3 million after 40 years. This is not even in the right order of magnitude (it adds up to about $220k) and it should be obvious without making the calculation. Who would write such a thing, and why?

My first stop was a July 2012 post on snopes, which commented on a very similar viral email. Snopes points out various mistakes (including the rate of social security payroll taxes), but the calculation in the snopes email, while based on wrong assumptions, has correct arithmetic: it says that $4,500 a year, with a 5% interest, become about $890k after 49 years.

So how did the viral email with the wrong assumptions and correct arithmetic morph into the Facebook post with the same wrong assumptions but also the wrong arithmetic?

I don’t know, but here is an August 2012 post on, you can’t make this stuff up, Accuracy in Media, which wikipedia describes as a “media watchdog.”

The post is attributed to Herbert London, who has PhD from Columbia, is a member of the Council on Foreign Relation and used to be the president of a conservative think-tank. Currently, he has an affiliation with King’s College in New York. London’s post has the sentence I saw in the Facebook post:

(…) an employer’s contribution of $375 per month at a modest one percent rate compounded over a 40 year work experience the total would be $1.3 million.

The rest of the post is almost identical to the July 2012 message reported by Snopes.

Where did Dr. London get his numbers? Maybe he compounded this hypothetical saving as 1% per month? No, because that would give more than $4 million. One does get about $1.3 million if one saves $375 a month for thirty years with a return of 1% per month, though.

Perhaps a more interesting question is why this “fake math” is coming back after five years. In 2012, Paul Ryan put forward a plan to “privatize” Social Security, and such a plan is now being revived. The only way to sell such a plan is to convince people that if they saved in a private account the amount of payroll taxes that “goes into” Social Security, they would get better benefits. This may be factually wrong, but that’s hardly the point.

Against a 61% Tax Increase on Berkeley Students

Currently, when graduate students work as teaching assistants, the university waives their tuition and pays them a stipend. Under current tax law, students pay income tax “only” on their stipend. A provision in the tax bill currently under consideration would count the waived tuition as income, on which the student would have to pay taxes as well.

A calculation by a Berkeley physics graduate student (source) finds that a student who work as TA for both semesters and the summer, is payed at “step 1” of the UC Berkeley salary scale, and is a California resident, currently pays ​$2,229 in federal income tax, which would become ​$3,641​ under the proposed tax plan, a 61% increase. The situation for EECS students is a bit different: they are paid at a higher scale, which puts them in a higher bracket, and they are often on a F1 visa, which means that they pay the much-higher non-resident tuition, so they would be a lot worse off (on the other hand, they usually TA at most one semester per year). The same calculation for MIT students shows a 240% tax increase. A different calculation (sorry, no link available) shows a 144% increase for a Berkeley EECS student on a F! visa.

This is one of the tax increases that go to fund the abolition of the estate tax for estates worth more than $10.9 million, a reduction in corporate tax rates, a reduction in high-income tax rates, and other benefits for multi-millionaires.

There is also a vox explainer, and articles in inside higher ed and the chronicle of higher education with more information.

If you are a US Citizen, and if you think that graduate students should not pay for the estate tax of eight-figure estates, you should let you representative know. Usually calling, and asking to speak with the staffer responsible for tax policy, is much better than emailing or sending a physical mail. You can find the phone numbers of your representatives here.

If you have any pull in ACM, this is the kind of matter on which they might want to make a factual statement about the consequences for US computer science education, as they did at the time of the travel ban.

Beyond Worst-Case Analysis: Lecture 11

Scribed by Neng Huang

In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximation reconstruction of the stochastic block model.

1. A Brief Review of the Model

First, let’s briefly review the model. We have a random graph {G = (V, E)} with an unknown partition of the vertices into two equal parts {V_1} and {V_2}. Edges across the partition are generated independently with probability {q}, and edges inside the partition are generated independently with probability {p}. To abbreviate the notation, we let {a = pn/2}, which is the average internal degree, and {b = qn/2}, which is the average external degree. Intuitively, the closer are {a} and {b}, the more difficult it is to reconstruct the partition. We assume {a > b}, although there are also similar results in the complementary model where {b} is larger than {a}. We also assume {b > 1} so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

  1. For every {\epsilon > 0}, there exists a constant {c_\epsilon} such that if {a - b > c_\epsilon\sqrt{a+b}}, then we can reconstruct the partition up to less than {\epsilon n} misclassified vertices.
  2. There exists a constant {c} such that if {a-b \geq c \sqrt{\log n}\sqrt{a+b}}, then we can do exact reconstruct.

We note that the first result is essentially tight in the sense that for every {\epsilon > 0}, there also exists a constant {c_\epsilon'} such that if {a-b < c_\epsilon'\sqrt{a+b}}, then it will be impossible to reconstruct the partition even if an {\epsilon} fraction of misclassified vertices is allowed. Also, the constant {c_\epsilon} will go to infinity as {\epsilon} goes to 0, so if we want more and more accuracy, {a-b} needs to be a bigger and bigger constant times {\sqrt{a+b}}. When the constant becomes {O(\sqrt{\log n})}, we will get an exact reconstruction as stated in the second result.

Continue reading

Corrado Bohm


I was very saddened to hear that Corrado Böhm died today at age 94.

Böhm was one of the founding fathers of Italian computer science. His dissertation, from 1951, was one of the first (maybe the first? I don’t know the history of these ideas very well) examples of a programming language with a compiler written in the language itself. In the 1950s and 1960s he worked at the CNR (an Italian national research institution with its own technical staff), in the IAC (Institute for the Applications of Computing) directed by mathematician Mauro Picone. IAC was the second place in Italy to acquire a computer. In 1970 he moved to the University of Turin, were he was the founding chairman of the computer science department. In 1972 he moved to the Sapienza University of Rome, in the Math department, and in 1989 he was one of the founders of the Computer Science department at Sapienza. He remained at Sapienza until his retirement.

Böhm became internationally known for a 1966 result, joint with Giuseppe Jacopini, in which he showed, roughly speaking, that programs written in a language that includes goto statements (formalized as flow-charts) could be mapped to equivalent programs that don’t. The point of the paper was that the translation was “structural” and the translated program retained much of the structure and the logic of the original program, meaning that programmers could give up goto statements without having to fundamentally change the way they think.

Dijkstra’s famous “Go To Statement Considered Harmful” 1968 letter to CACM had two references, one of which was the Jacopini-Böhm theorem.

Böhm was responsible for important foundational work on lambda calculus, typed functional languages, and the theory of programming languages at large.

He was a remarkable mentor, many of whose students and collaborators (including a notable number of women) became prominent in the Italian community of theory of programming languages, and Italian academia in general.


In the photo above is Böhm with Simona Ronchi, Betti Venneri and Mariangiola Dezani, who all became prominent Italian professors.

You may also recognize the man on the right as a recent recipient of the Turing Award. Silvio Micali went to Sapienza to study math as an undergrad, and he worked with Böhm, who encouraged Silvio to pursue his PhD abroad.

I studied Computer Science at Sapienza, starting the first year that the major was introduced in 1989. I remember that when I first met Böhm he reminded me of Doc Brown from Back to the Future: a tall man with crazy white hair, speaking of wild ideas with incomprehensible technical terms, but with unstoppable enthusiasm.

One year, I tried attending a small elective class that he was teaching. My, probably imprecise, recollection of the first lecture is as follows.

He said that one vertex is a binary tree, and that if you connect two binary trees to a new root you also get a binary tree, then he asked us, how would you prove statements on binary trees by induction? The class stopped until we would say something. After some consultation among us, one of the smart kids proposed “by induction on the number of vertices?” Yes, said Böhm, that would work, but isn’t there a better way? He wanted us to come up by ourselves with the insight that, since binary trees have a recursive definition, one can do induction on the structure of the definition.

In subsequent lectures, we looked (without being told) at how to construct purely functional data structures. I dropped the class after about a month.

(Photo credits: corradobohm.it)

Beyond Worst-Case Analysis: Lecture 10

Scribed by David Dinh

In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the infinity-to-one norm, and find an efficient way to approximate it using SDP.

Continue reading

Beyond Worst-Case Analysis: Lecture 9

Scribed by Chinmay Nirkhe

In which we explore the Stochastic Block Model.

1. The {G_{n,p,q}} problem

The Stochastic Block Model is a generic model for graphs generated by some parameters. The simplest model and one we will consider today is the {G_{n,p,q}} problem.

Definition 1 ({G_{n,p,q}} graph distribution) The {G_{n,p,q}} distribution is a distribution on graphs of {n} vertices where {V} is partitioned into two 2 subsets of equal size: {V = V_1 \sqcup V_2}. Then for {\{i,j\}} pair of vertices in the same subset, {\Pr( (i,j) \in E) = p} and otherwise {\Pr( (i,j) \in E) = q}.

We will only consider the regime under which {p > q}. If we want to find the partition {V = V_1 \sqcup V_2}, it is intuitive to look at the problem of finding the minimum balanced cut. The cut {(V_1, V_2)} has expected size {q n^2 / 4} and any other cut will have greater expected size.

Our intuition should be that as {p \rightarrow q}, the problem only gets harder. And for fixed ratio {p/q}, as {p,q \rightarrow 1}, the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for {p,q} then we can also solve it for {cp, cq} where {c > 1}, by keeping only {1/c} edges and reducing to the case we can solve.

Recall that for the {k}-planted clique problem, we found the eigenvector {{\bf x}} corresponding to the largest eigenvalue of {A - \frac{1}{2} J}. We then defined {S} as the vertices {i} with the {k} largest values of {|x_i|} and cleaned up {S} a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of {A - \left( \frac{p + q}{2} \right) J}. Note this is intuitive as the average degree of the graph is {p(n/2 - 1) + q(n/2) \approx \frac{n}{2} (p+q)}. The idea is simple: Solve {{\bf x}} the largest eigenvector corresponding to the largest eigenvalue and define

\displaystyle  V_1 = \{ i : x_i > 0\}, \qquad V_2 = \{ i : x_i \leq 0 \} \ \ \ \ \ (1)

Continue reading

Beyond Worst-Case Analysis: Lecture 8

Scribed by Luowen Qian

In which we use spectral techniques to find certificates of unsatisfiability for random {k}-SAT formulas.

1. Introduction

Given a random {k}-SAT formula with {m} clauses and {n} variables, we want to find a certificate of unsatisfiability of such formula within polynomial time. Here we consider {k} as fixed, usually equal to 3 or 4. For fixed {n}, the more clauses you have, the more constraints you have, so it becomes easier to show that these constraints are inconsistent. For example, for 3-SAT,

  1. In the previous lecture, we have shown that if {m > c_3 \cdot n} for some large constant {c_3}, almost surely the formula is not satisfiable. But it’s conjectured that there is no polynomial time, or even subexponential time algorithms that can find the certificate of unsatisfiability for {m = O(n)}.
  2. If {m > c \cdot n^2} for some other constant {c}, we’ve shown in the last time that we can find a certificate within polynomial time with high probability that the formula is not satisfiable.

    The algorithm for finding such certificate is shown below.

    • Algorithm 3SAT-refute({f})
    • for {b_1 \in \{0,1\}}
      • if 2SAT-satisfiable({f} restricted to clauses that contains {x_1= \overline b_1}, with {x:= \overline b_1})
        • return {\bot}
    • return UNSATISFIABLE

    We know that we can solve 2-SATs in linear time, and approximately

    \displaystyle \frac{\binom{n - 1} 2 \cdot m}{\binom n 3 \cdot 2} = \frac{3m}{2n + O(1)} > \frac 3 2 cn - O(1)

    clauses contains {x_1 = \overline{b_1}}. Similarly when {c} is sufficiently large, the 2-SATs will almost surely be unsatisfiable. When a subset of the clauses is not satisfiable, the whole 3-SAT formula is not satisfiable. Therefore we can certify unsatisfiability for 3-SATs with high probability.

In general for {k}-SAT,

  1. If {m > c_k \cdot n} for some large constant {c_k}, almost surely the formula is not satisfiable.
  2. If {m > c'_k \cdot n^{k - 1}} for some other constant {c'_k}, we can construct a very similar algorithm, in which we check all assignments to the first {k-2} variables, and see if the 2SAT part of the restricted formula is unsatisfiable.

    Since for every fixed assignments to the first {k - 2} variables, approximately

    \displaystyle \frac{\binom{n - k + 2} 2}{\binom n k 2^{k - 2}} = \frac{k!}{(n^{k - 2} + O(n^{k - 3})) 2^{k - 1}}

    portion of the {m} clauses remains, we expect the constant {c'_k = \Omega\left(\frac{2^k}{k!}\right)} and the running time is {O(2^k m)}.

So what about {m}‘s that are in between? It turns out that we can do better with spectral techniques. And the reason that spectral techniques work better is that unlike the previous method, it does not try all the possible assignments and fails to find a certificate of unsatisfiability.

2. Reduce certifying unsatisfiability for k-SAT to finding largest independent set

2.1. From 3-SAT instances to hypergraphs

Given a random 3-SAT formula {f}, which is an and of {m} random 3-CNF-SAT clauses over {n} variables {x_1, x_2, ..., x_n} (abbreviated as vector {{\bf x}}), i.e.

\displaystyle  f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \right),

where {\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}, {\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3}} and no two {(\sigma_{i,1}, b_{i,1}, \sigma_{i,2}, b_{i,2}, \sigma_{i,3}, b_{i,3})} are exactly the same. Construct hypergraph {H_f = (X, E)}, where

\displaystyle X = \left\{(i, b) \middle| i \in [n], b \in \{0, 1\}\right\}

is a set of {2n} vertices, where each vertex means an assignment to a variable, and

\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline{b_{j,1}}), (\sigma_{j,2}, \overline{b_{j,2}}), (\sigma_{j,3}, \overline{b_{j,3}})\}

is a set of {m} 3-hyperedges. The reason we’re putting in the negation of {b} is that a 3-CNF clause evaluates to false if and only if all three subclauses evaluate to false. This will be useful shortly after.

First let’s generalize the notion of independent set for hypergraphs.

An independent set for hypergraph {H = (X, E)} is a set {S \subseteq X} that satisfies {\forall e \in E, e \not \subseteq S}.

If {f} is satisfiable, {H_f} has an independent set of size at least {n}. Equivalently if the largest independent set of {H_f} has size less than {n}, {f} is unsatisfiable. Proof: Assume {f} is satisfiable, let {{\bf x} \leftarrow {\bf y}} be a satisfiable assignment, where {{\bf y} \in \{0, 1\}^n}. Then {S = \{ (x_i, y_i) | i \in [n] \}} is an independent set of size {n}. If not, it means some hyperedge {e_j \subseteq S}, so {\sigma_{j,1} = \overline{b_{j,1}} \land \sigma_{j,2} = \overline{b_{j,2}} \land \sigma_{j,3} = \overline{b_{j,3}}} and the {j}-th clause in {f} evaluates to false. Therefore {f} evaluates to false, which contradicts the fact that {{\bf y}} is a satisfiable assignment. \Box

We know that if we pick a random graph that’s sufficiently dense, i.e. the average degree {d > \ln n}, by spectral techniques we will have a certifiable upper bound on the size of the largest independent set of {O\left(\frac n{\sqrt d}\right)} with high probability. So if a random graph has {\Omega(n \log n)} random edges, we can prove that there’s no large independent set with high probability.

But if we have a random hypergraph with {\Omega(n \log n)} random hyperedges, we don’t have any analog of spectral theories for hypergraphs that allow us to do this kind of certification. And from the fact that the problem of certifying unsatisfiability of random formula of {\Omega(n \log n)} clauses is considered to be hard, we conjecture that there doesn’t exist a spectral theory for hypergraphs able to replicate some of the things we are able to do on graphs.

However, what we can do is possibly with some loss, to reduce the hypergraph to a graph, where we can apply spectral techniques.

2.2. From 4-SAT instances to graphs

Now let’s look at random 4-SATs. Similarly we will write a random 4-SAT formula {f} as:

\displaystyle  f({\bf x}) = \bigwedge\limits_{i = 1}^m \left( x_{\sigma_{i,1}} = b_{i,1} \lor x_{\sigma_{i,2}} = b_{i,2} \lor x_{\sigma_{i,3}} = b_{i,3} \lor x_{\sigma_{i,4}} = b_{i,4} \right),

where {\sigma_{i,j} \in [n], b_{i,j} \in \{0, 1\}}, {\forall i \in [m], \sigma_{i,1} < \sigma_{i,2} < \sigma_{i,3} < \sigma_{i,4}} and no two {(\sigma_{i,1}, b_{i,1}, ..., \sigma_{i,4}, b_{i,4})} are exactly the same. Similar to the previous construction, but instead of constructing another hypergraph, we will construct just a graph {G_f = (V, E)}, where

\displaystyle V = \left\{(i_1, b_1, i_2, b_2) \middle| i_1, i_2 \in [n], b_1, b_2 \in \{0, 1\}\right\}

is a set of {4n^2} vertices and

\displaystyle E = \left\{ e_j \middle| j \in [m] \right\}, e_j = \{(\sigma_{j,1}, \overline {b_{j,1}}, \sigma_{j,2}, \overline {b_{j,2}}), (\sigma_{j,3}, \overline {b_{j,3}}, \sigma_{j,4}, \overline {b_{j,4}})\}

is a set of {m} edges.

If {f} is satisfiable, {G_f} has an independent set of size at least {n^2}. Equivalently if the largest independent set of {H_f} has size less than {n^2}, {f} is unsatisfiable. Proof: The proof is very similar to the previous one. Assume {f} is satisfiable, let {{\bf x} \leftarrow {\bf y}} be a satisfiable assignment, where {{\bf y} \in \{0, 1\}^n}. Then {S = \{ (x_i, y_i, x_j, y_j) | i, j \in [n] \}} is an independent set of size {n^2}. If not, it means some edge {e_j \subseteq S}, so {\sigma_{j,1} = \overline {b_{j,1}} \land \sigma_{j,2} = \overline {b_{j,2}} \land \sigma_{j,3} = \overline {b_{j,3}} \land \sigma_{j,4} = \overline {b_{j,4}}} and the {j}-th clause in {f} evaluates to false. Therefore {f} evaluates to false, which contradicts the fact that {{\bf y}} is a satisfiable assignment. \Box

From here, we can observe that {G_f} is not a random graph because some edges are forbidden, for example when the two vertices of the edge has some element in common. But it’s very close to a random graph. In fact, we can apply the same spectral techniques to get a certifiable upper bound on the size of the largest independent set if the average degree {d > \ln n}, i.e. if {m = \Omega(n^2 \log n)}, we can certify unsatisfiability with high probability, by upper bounding the size of the largest independent set in the constructed graph.

We can generalize this results for all even {k}‘s. For random {k}-SAT where {k} is even, if {m > c_k n^{k/2} \log n}, we can certify unsatisfiability with high probability, which is better than the previous method which requires {m = \Omega(n^{k - 1})}. The same {n^{k/2}(\log n)^{O(1)}} is achievable for odd {k}, but the argument is significantly more complicated.

2.3. Certifiable upper bound for independent sets in modified random sparse graphs

Despite odd {k}‘s, another question is that in this setup, can we do better and get rid of the {\log n} term? This term is coming from the fact that spectral norm break down when the average degree {d < \ln n}. However it’s still true that random graph doesn’t have any large independent sets even when the average degree {d} is constant. It’s just that the spectral norm isn’t giving us good bounds any more, since the spectral norm is at most {O\left(\sqrt{\max d}\right) = O\left(\sqrt \frac{\log n}{\log \log n}\right)}. So is there something tighter than spectral bounds that could help us get rid of the {\log n} term? Could we fix this by removing all the high degree vertices in the random graph?

This construction is due to Feige-Ofek. Given random graph {G \sim G_{n, p}}, where the average degree {d = np} is some large constant. Construct {G'} by taking {G} and removing all edges incident on nodes with degree higher than {2\bar d} where {\bar d} is the average degree of {G}. We denote {A} for the adjacency matrix of {G} and {A'} for that of {G'}. And it turns out,

With high probability, {\left\lVert A' - \frac d n J \right\rVert \le O\left(\sqrt d\right)}.

It turns out to be rather difficult to prove. Previously we saw spectral results on random graphs that uses matrix traces to bound the largest eigenvalue. In this case, it’s hard to do so because the contribution to the trace of a closed walk is complicated by the fact that edges have dependencies. The other approach is that given random matrix {M}, we will try to upper bound {\left\lVert M \right\rVert = \max\limits_x \frac {x^T M x} {\lVert x \rVert^2}}. A standard way for this is to that for every solution, count the instances of {M} in which the fixed solution is good, and argue that the number of the fixed solutions is small, which tells us that there’s no good solution. The problem here is that the set of solutions is infinitely large. So Feige-Ofek discretize the set of vectors, and then reduce the bound on the quadratic form of a discretized vector to a sum of several terms, each of which has to be carefully bounded.

We always have

\displaystyle  \max \textrm{IndSet}(G) \le \max \textrm{IndSet}(G') \le \frac n d \left\lVert A' - \frac d n J \right\rVert

and so, with high probability, we get an {O\left(\frac n {\sqrt d}\right)} polynomial time upper bound certificate to the size of the independent set for a {G_{n,d/n}} random graph. This removes the extra {\log n} term from our analysis of certificates of unsatisfiability for random {k}-SAT when {k} is even.

3. SDP relaxation of independent sets in random sparse graphs

In order to show a random graph has no large independent sets, a more principled way is to argue that there is some polynomial time solvable relaxation of the problem whose solution is an upper bound of the problem.

Let SDPIndSet{(G)} be the optimum of the following semidefinite programming relaxation of the Independent Set problem, which is due to Lovász:

\displaystyle  \begin{array}{rcl}  \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \langle {\bf x}_0, {\bf x}_i \rangle = ||{\bf x}_i ||^2 \ \ \ \forall i\in V\\ && \langle {\bf x}_i, {\bf x}_j \rangle = 0 \ \ \ \forall (i,j)\in E \end{array}

Since it’s the relaxation of the problem of finding the maximum independent set, {\max \textrm{IndSet}(G) \le \textrm{SDPIndSet}(G)} for any graph {G}. And this relaxation has a nice property.

For every {0 < p < 1}, and for every graph {G}, we have \begin{equation*} {\rm SDPIndSet}(G) \leq \frac 1p \cdot || pJ – A || \end{equation*} where {J} is the all-one matrix and {A} is the adjacency matrix of {G}.

Proof: First we note that SDPIndSet{(G)} is at most

\displaystyle  \begin{array}{rcl}  \max && \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

and this is equal to

\displaystyle  \begin{array}{rcl}  \max && \frac { \left( \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle \right) ^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

which is at most

\displaystyle  \begin{array}{rcl}  \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2}{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}


\displaystyle  \sum_{i\in V} \langle {\bf x}_i, {\bf x}_0 \rangle = \left\langle \sum_{i\in V} {\bf x}_i , {\bf x}_0\right \rangle \leq \left \| \sum_{i\in V} {\bf x}_i \right \| \cdot || {\bf x}_0 || = \left \| \sum_{i\in V} {\bf x}_i \right \|

Finally, the above optimization is equivalent to the following

\displaystyle  \begin{array}{rcl}  \max && \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2}\\ s.t. \\ && ||{\bf x}_0||^2 = 1\\ && \sum_{i\in V} \langle {\bf x}_0, {\bf x}_i \rangle = \sum_{i\in V} ||{\bf x}_i ||^2 \\ && \sum_{(i,j)\in E} \langle {\bf x}_i, {\bf x}_j \rangle = 0 \end{array}

which is at most the unconstrained problem

\displaystyle \begin{aligned} \max \frac { \left \| \sum_{i\in V}{\bf x}_i \right \|^2 - \frac 1p \sum_{i,j} A_{i,j} \langle {\bf x}_i , {\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} &= \max \frac { \sum_{i,j} \left( J - \frac 1p A\right)_{i,j} \langle {\bf x}_i,{\bf x}_j \rangle }{\sum_{i \in V} || {\bf x}_i||^2} \\ &= \lambda_{\max} \left (J - \frac 1p A \right) \\ &\leq \frac 1p || pJ - A||. \end{aligned}


Recall from the previous section that we constructed {G'} by removing edges from {G}, which corresponds to removing constraints in our semidefinite programming problem, so {\textrm{SDPIndSet}(G) \le \textrm{SDPIndSet}(G') \le \left\lVert J - \frac 1 p A' \right\rVert}, which is by theorem 3 at most {O\left(\frac n{\sqrt d}\right)} with high probability.

4. SDP relaxation of random k-SAT

From the previous section, we get an idea that we can use semidefinite programming to relax the problem directly and find a certificate of unsatisfiability for the relaxed problem.

Given a random {k}-SAT formula {f}:

\displaystyle  \begin{array}{rcl}  f({\bf x}) &= & \bigwedge\limits_{i = 1}^m \bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\overline{\bigvee\limits_{j = 1}^k x_{\sigma_{i,j}} = b_{i,j}}} \\ &= &\bigwedge\limits_{i = 1}^m \overline{\bigwedge\limits_{j = 1}^k x_{\sigma_{i,j}} = \overline{b_{i,j}}}. \end{array}

The satisfiability of {f} is equivalent of the satisfiability of the following equations:

\displaystyle  \begin{array}{rcl}  && x_i^2 = x_i \forall i \in [n] \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}x_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \end{array}

Notice that if we expand the polynomial on the left side, there are some of the monomials having degree higher than 2 which prevents us relaxing these equations to a semidefinite programming problem. In order to resolve this, {\forall A \subseteq {\bf x}} and {|A| \le k/2} we introduce {x_A = \prod_{i \in A} x_i}. Then we can relax all variables to be vectors, i.e.

\displaystyle  \begin{array}{rcl}  && \lVert {\bf x}_\emptyset \rVert^2 = 1 \\ && \langle {\bf x}_A, {\bf x}_B \rangle = \langle {\bf x}_C, {\bf x}_D \rangle \ \ \ \forall A \cup B = C \cup D \\ && \sum_{i = 1}^m \left(1 - \prod_{j = 1}^k\left((-1)^{b_{i,j}}{\bf x}_{\sigma_{i,j}} + b_{i,j}\right)\right) = m \ \ \ \textrm{rewritten as quadratic forms of } {\bf x}_A \end{array}

For example, if we have a 4-SAT clause

\displaystyle  x_3 \lor \overline{x_4} \lor x_7 \lor \overline{x_{10}},

we can rewrite it as

\displaystyle  \begin{array}{rcl}  1 - (1 - {\bf x}_3) \cdot {\bf x}_4 \cdot (1 - {\bf x}_7) \cdot {\bf x}_{10} &= &1 - {\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_4{\bf x}_{10} + {\bf x}_3{\bf x}_7{\bf x}_{10} - {\bf x}_3{\bf x}_4{\bf x}_7{\bf x}_{10} \\ &= &1 - {\bf x}_{\{4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,4\}}{\bf x}_{\{10\}} + {\bf x}_{\{3,7\}}{\bf x}_{\{10\}} - {\bf x}_{\{3,4\}}{\bf x}_{\{7,10\}}. \end{array}

For this relaxation, we have:

  1. If {m < c(k, n) n^{k/2}}, the SDP associated with the formula is feasible with high probability, where {c(k, n) = 1/n^{o(1)}} for every fixed {k}.
  2. If {m > c'(k) n^{k/2}}, the SDP associated with the formula is not feasible with high probability, where {c'(k, n)} is a constant for every fixed even {k}, and {c'(k, n) = \textrm{poly}(\log n)} for every fixed odd {k}.

Beyond Worst-Case Analysis: Lecture 7

Scribed by Jeff Xu

In which we discussed planted clique distribution, specifically, we talked about how to find a planted clique in a random graph. We heavily relied upon our material back in lecture 2 and lecture 3 in which we covered the upper bound certificate for max clique in {G_{n,\frac{1}{2}}}. At the end of this class, we wrapped up this topic and started the topic of {k}-SAT.

1. Planted Clique

To start with, we describe a distribution of graphs with a planted clique. Suppose that we sample {G} from {G_{n,\frac{1}{2}}} and we want to modify {G} s.t. it has a size {k} clique, i.e., we have a clique {S \subseteq V} with {\left|S\right|=k}. The following code describes a sampler for the distribution.

  • {G \leftarrow G_{n,\frac{1}{2}}}
  • Pick a subset of vertices {S} from {V} s.t. {|S|=k}
  • Independently for each pair {\{u,v\}}, make {\{u,v\}} an edge with probability
    • {1} if {u,v \in S}
    • {\frac 12} otherwise

Note: We are only interested in the case {k \geq 2\log n}, which is the case in which the planted clique is, with high probability, larger than any pre-existing clique

Continue reading