Happy New Year!

So, today I was browsing Facebook, and when I saw a post containing an incredibly blatant arithmetic mistake (which none of the several comments seemed to notice), I spent the rest of the morning looking up where it came from.

The goal of the post was to make the false claim that people have been paying more than enough money into Social Security (through payroll taxes) to support the current level of benefits. In fact, since the beginning, Social Security has been paying individuals more than they put in, and now that population and salaries have stopped growing, it is also paying retirees more than it collects from workers, so the “trust fund” (whether one believes it is a real thing or an accounting fiction) will run out in the 2030s unless some change is made.

This is a complicated matter, but the post included a sentence to the effect that $4,500 a year, with an interest of 1% per year “compounded monthly”, would add up to $1.3 million after 40 years. This is not even in the right order of magnitude (it adds up to about $220k), and it should be obvious without doing the calculation. Who would write such a thing, and why?

My first stop was a July 2012 post on Snopes, which commented on a very similar viral email. Snopes points out various mistakes (including the rate of Social Security payroll taxes), but the calculation in the Snopes email, while based on wrong assumptions, has correct arithmetic: it says that $4,500 a year, at 5% interest, becomes about $890k after 49 years.

So how did the viral email with the wrong assumptions and correct arithmetic morph into the Facebook post with the same wrong assumptions but also the wrong arithmetic?

I don’t know, but here is an August 2012 post on, you can’t make this stuff up, Accuracy in Media, which Wikipedia describes as a “media watchdog.”

The post is attributed to Herbert London, who has a PhD from Columbia, is a member of the Council on Foreign Relations, and used to be the president of a conservative think-tank. Currently, he has an affiliation with King’s College in New York. London’s post has the sentence I saw in the Facebook post:

(…) an employer’s contribution of $375 per month at a modest one percent rate compounded over a 40 year work experience the total would be $1.3 million.

The rest of the post is almost identical to the July 2012 message reported by Snopes.

Where did Dr. London get his numbers? Maybe he compounded these hypothetical savings at 1% *per month*? No, because that would give more than $4 million. One does get about $1.3 million if one saves $375 a month for *thirty* years with a return of 1% per month, though.
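All of these claims can be checked with the standard future-value-of-an-annuity formula, FV = P((1+r)^n - 1)/r for a deposit of P per period at per-period rate r over n periods. A quick sketch of mine (not from any of the posts):

```python
def fv(monthly_deposit, monthly_rate, months):
    """Future value of an annuity: a fixed monthly deposit at a fixed monthly rate."""
    return monthly_deposit * ((1 + monthly_rate) ** months - 1) / monthly_rate

# $4,500/year = $375/month at 1% per YEAR, compounded monthly, for 40 years:
print(round(fv(375, 0.01 / 12, 480)))  # about $221,000 -- nowhere near $1.3 million

# $375/month at 1% per MONTH for 40 years: over $4 million.
print(round(fv(375, 0.01, 480)))

# $375/month at 1% per month for 30 years: about $1.3 million.
print(round(fv(375, 0.01, 360)))
```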

Perhaps a more interesting question is why this “fake math” is coming back after five years. In 2012, Paul Ryan put forward a plan to “privatize” Social Security, and such a plan is now being revived. The only way to sell such a plan is to convince people that if they saved in a private account the amount of payroll taxes that “goes into” Social Security, they would get better benefits. This may be factually wrong, but that’s hardly the point.

A calculation by a Berkeley physics graduate student (source) finds that a student who works as a TA for both semesters and the summer, is paid at “step 1” of the UC Berkeley salary scale, and is a California resident currently pays $2,229 in federal income tax, which would become $3,641 under the proposed tax plan, a 61% increase. The situation for EECS students is a bit different: they are paid at a higher scale, which puts them in a higher bracket, and they are often on an F1 visa, which means that they pay the much higher non-resident tuition, so they would be a lot worse off (on the other hand, they usually TA at most one semester per year). The same calculation for MIT students shows a 240% tax increase. A different calculation (sorry, no link available) shows a 144% increase for a Berkeley EECS student on an F1 visa.

This is one of the tax increases that go to fund the abolition of the estate tax for estates worth more than $10.9 million, a reduction in corporate tax rates, a reduction in high-income tax rates, and other benefits for multi-millionaires.

There is also a Vox explainer, and articles in Inside Higher Ed and the Chronicle of Higher Education with more information.

If you are a US citizen, and if you think that graduate students should not pay for eliminating the estate tax on eight-figure estates, you should let your representative know. Usually calling, and asking to speak with the staffer responsible for tax policy, is much better than emailing or sending physical mail. You can find the phone numbers of your representatives here.

If you have any pull in ACM, this is the kind of matter on which they might want to make a factual statement about the consequences for US computer science education, as they did at the time of the travel ban.

Scribed by Neng Huang

*In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximate reconstruction of the stochastic block model.*

**1. A Brief Review of the Model **

First, let’s briefly review the model. We have a random graph with an unknown partition of the vertices into two equal parts and . Edges across the partition are generated independently with probability , and edges inside the partition are generated independently with probability . To abbreviate the notation, we let , which is the average internal degree, and , which is the average external degree. Intuitively, the closer are and , the more difficult it is to reconstruct the partition. We assume , although there are also similar results in the complementary model where is larger than . We also assume so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

- For every , there exists a constant such that if , then we can reconstruct the partition up to less than misclassified vertices.
- There exists a constant such that if , then we can do exact reconstruction.

We note that the first result is essentially tight in the sense that for every , there also exists a constant such that if , then it will be impossible to reconstruct the partition even if an fraction of misclassified vertices is allowed. Also, the constant will go to infinity as goes to 0, so if we want more and more accuracy, needs to be a bigger and bigger constant times . When the constant becomes , we will get an exact reconstruction as stated in the second result.

**2. The Algorithm **

Our algorithm will be based on semi-definite programming. Intuitively, the problem of reconstructing the partition is essentially the same as the min-bisection problem, which is to find a balanced cut with the fewest edges. This is because the balanced cut with the fewest expected edges is exactly our hidden cut. Unfortunately, the min-bisection problem is NP-hard, so we will use semi-definite programming. The min-bisection problem can be stated as the following program:

\begin{align*} \text{minimize} \quad & \sum_{(u, v) \in E} \frac{1}{4}(x_u - x_v)^2 \\ \text{subject to} \quad & x_v^2 = 1, \quad \forall v \in V \\ & \sum_{v \in V}x_v = 0. \end{align*}

Its semi-definite programming relaxation will be
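As a sanity check on the integer program above (a toy example of mine, not from the notes): for ±1 variables, each cut edge contributes (1/4)(2)^2 = 1 to the objective and each uncut edge contributes 0, so the objective counts exactly the edges cut by a balanced assignment, and brute-forcing all balanced cuts of a small graph recovers the min bisection.

```python
from itertools import combinations

# Toy graph: two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2,3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
n = 6

def cost(x):
    # (1/4) * sum over edges of (x_u - x_v)^2: counts the cut edges.
    return sum((x[u] - x[v]) ** 2 / 4 for u, v in edges)

# Enumerate all balanced +/-1 assignments (sets S of size n/2 get +1).
best = min(
    (cost([1 if v in S else -1 for v in range(n)]), sorted(S))
    for S in map(set, combinations(range(n), n // 2))
)
print(best)  # minimum cost 1.0, achieved by cutting only the bridge
```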

Our algorithm will be as follows.

- Solve the semi-definite programming above.
- Let be the optimal solution and such that .
- Find , which is the eigenvector corresponding to the largest eigenvalue of .
- Let , .
- Output as our partition.
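Solving the SDP itself requires a solver, but the rounding in the last three steps is just an eigenvector computation. Here is a numpy sketch of that rounding step (my own illustration, not from the notes): instead of an actual SDP optimum, I feed in a synthetic near-ideal solution, the rank-one matrix built from the hidden partition plus symmetric noise.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hidden balanced partition, encoded as chi in {+1,-1}^n.
chi = np.array([1] * (n // 2) + [-1] * (n // 2))

# Stand-in for the SDP optimum: the ideal rank-one solution chi chi^T,
# perturbed by symmetric Gaussian noise to mimic a near-ideal solution.
G = rng.normal(0, 1, (n, n))
X = np.outer(chi, chi) + (G + G.T)

# Rounding: eigenvector of the largest eigenvalue, then read off signs.
eigvals, eigvecs = np.linalg.eigh(X)  # eigh returns eigenvalues in ascending order
guess = np.sign(eigvecs[:, -1])

# The partition is only determined up to a global sign flip.
agree = max(np.sum(guess == chi), np.sum(guess == -chi))
print(f"correctly classified {agree} of {n} vertices")
```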

Ideally, we want half of the ‘s pointing to one direction, and the other half pointing to the opposite direction. In this ideal case we will have

Then will be a rank-one matrix and , which is the indicator vector of the hidden cut, will be its eigenvector with eigenvalue . The remaining eigenvalues of will be all zeros. So finding the largest eigenvector of will reveal the hidden cut. In reality, if , then our solution will be almost the same as that in the ideal case, so the cut we get will be almost the same as the hidden cut. Furthermore, if , then the unique optimal solution of the SDP will be the combinatorial solution of min-bisection problem, that is, in the vector language, the one-dimensional solution.\footnote{“A miracle”, said Luca.}

**3. Analysis of the Algorithm **

First, we rearrange the SDP to make it slightly simpler. We have the following SDP:

We note that SDP1 and SDP2 have the same optimal solution, because the cost function of SDP1 is

The first term is a constant and the second is the cost function of SDP2 with a factor of -1/4.

Now, consider the cost of SDP2 of where

The expected cost will be

Since each edge is chosen independently, with high probability our cost will be at least , which implies that the optimal solution of SDP2 will be at least . Let be the optimal solution of the SDP, then we have

\begin{align*} n(a-b) - O(n) & \leq cost(\mathbf{x}_1^\ast, \ldots, \mathbf{x}_n^\ast) \\ & = \sum_{u,v} A_{uv} \langle \mathbf{x}_u^\ast, \mathbf{x}_v^\ast \rangle \\ & = \sum_{u,v} \left( A_{uv} - \frac{a+b}{n} \right) \langle \mathbf{x}_u^\ast, \mathbf{x}_v^\ast \rangle \end{align*}

In the last equality we used the fact that \sum_{u,v} \langle \mathbf{x}_u^\ast, \mathbf{x}_v^\ast \rangle = \left\| \sum_v \mathbf{x}_v^\ast \right\|^2 = 0, by the SDP constraint.

When we used the spectral method last week, we said that the largest eigenvalue of is large, where is the average degree. This is because the hidden cut will give us a vector with large Rayleigh quotient. But has a relatively small spectral norm, so everything should come from , which when simplified will be 1 for entries representing vertices on the same side and -1 for entries representing vertices on different sides. We will redo this argument with SDP norm in place of spectral norm and every step appropriately adjusted.

Recall that the SDP norm of a matrix is defined to be

Let , then by Grothendieck inequality we have

We proved in the previous lecture that with high probability, so we know that the SDP norm with high probability as well. By definition, this means

Subtracting 3 from 3, we obtain

where is the all-ones matrix and . Plugging 5 into 4, we get

which can be simplified to

For simplicity, in the following analysis the term will be called . Notice that is a matrix with 1 for nodes from the same side of the cut and -1 for nodes from different sides of the cut, and is an inner product of two unit vectors. If is very close to zero, then the sum will be very close to . This means that should be 1 for almost every pair of , which shows that is actually very close to . Now, we will make this argument robust. To achieve this, we introduce the Frobenius norm of a matrix.

Definition 1 (Frobenius norm). Let be a matrix. The Frobenius norm of is

The following fact is a good exercise.

Fact 2. Let be a matrix. Then

where denotes the spectral norm.
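The formulas in Fact 2 were lost to formatting; the standard relation between the two norms, which is what the argument below uses, is ||M|| <= ||M||_F <= sqrt(rank M) * ||M||. A quick numerical confirmation (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(8, 8))
M = M + M.T  # symmetric, as in the lecture

spectral = np.linalg.norm(M, 2)       # largest singular value
frobenius = np.linalg.norm(M, "fro")  # sqrt of the sum of squared entries
rank = np.linalg.matrix_rank(M)

# ||M|| <= ||M||_F <= sqrt(rank(M)) * ||M||
assert spectral <= frobenius <= np.sqrt(rank) * spectral + 1e-9
print(spectral, frobenius, rank)
```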

To see how close are and , we calculate the Frobenius norm of , which will be

This gives us a bound on the spectral norm of , namely

Let be the unit eigenvector of corresponding to its largest eigenvalue, then by Davis-Kahan theorem we have\footnote{When we apply Davis-Kahan theorem, what we get is actually an upper bound on . We have assumed here that the bound holds for , but the exact same proof will also work in the other case.}

For any , if is a large enough constant then we will have . Now we have the following standard argument:

The last inequality is because every with will contribute at least 1 to the sum . This shows that our algorithm will misclassify at most vertices.


I was very saddened to hear that Corrado Böhm died today at age 94.

Böhm was one of the founding fathers of Italian computer science. His dissertation, from 1951, was one of the first (maybe the first? I don’t know the history of these ideas very well) examples of a programming language with a compiler written in the language itself. In the 1950s and 1960s he worked at the CNR (an Italian national research institution with its own technical staff), in the IAC (Institute for the Applications of Computing) directed by mathematician Mauro Picone. IAC was the second place in Italy to acquire a computer. In 1970 he moved to the University of Turin, where he was the founding chairman of the computer science department. In 1972 he moved to the Sapienza University of Rome, in the Math department, and in 1989 he was one of the founders of the Computer Science department at Sapienza. He remained at Sapienza until his retirement.

Böhm became internationally known for a 1966 result, joint with Giuseppe Jacopini, in which he showed, roughly speaking, that programs written in a language that includes goto statements (formalized as flow-charts) could be mapped to equivalent programs that don’t. The point of the paper was that the translation was “structural” and the translated program retained much of the structure and the logic of the original program, meaning that programmers could give up goto statements without having to fundamentally change the way they think.

Dijkstra’s famous “Go To Statement Considered Harmful” 1968 letter to CACM had two references, one of which was the Jacopini-Böhm theorem.

Böhm was responsible for important foundational work on lambda calculus, typed functional languages, and the theory of programming languages at large.

He was a remarkable mentor, many of whose students and collaborators (including a notable number of women) became prominent in the Italian community of theory of programming languages, and Italian academia in general.

In the photo above is Böhm with Simona Ronchi, Betti Venneri and Mariangiola Dezani, who all became prominent Italian professors.

You may also recognize the man on the right as a recent recipient of the Turing Award. Silvio Micali went to Sapienza to study math as an undergrad, and he worked with Böhm, who encouraged Silvio to pursue his PhD abroad.

I studied Computer Science at Sapienza, starting the first year that the major was introduced in 1989. I remember that when I first met Böhm he reminded me of Doc Brown from *Back to the Future*: a tall man with crazy white hair, speaking of wild ideas with incomprehensible technical terms, but with unstoppable enthusiasm.

One year, I tried attending a small elective class that he was teaching. My, probably imprecise, recollection of the first lecture is as follows.

He said that one vertex is a binary tree, and that if you connect two binary trees to a new root you also get a binary tree; then he asked us, how would you prove statements about binary trees by induction? The class came to a halt until someone said something. After some consultation among us, one of the smart kids proposed “by induction on the number of vertices?” Yes, said Böhm, that would work, but isn’t there a better way? He wanted us to come up by ourselves with the insight that, since binary trees have a recursive definition, one can do induction on the structure of the definition.
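That recursive definition translates directly into code, and structural induction mirrors structural recursion. A small sketch of mine (not from the course) of the definition, together with the classic structurally-inductive fact that such a tree has one more leaf than it has internal nodes:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Tree:
    # The definition from the lecture: a single vertex is a binary tree (both
    # children None), and joining two trees under a new root is a binary tree.
    left: Optional["Tree"] = None
    right: Optional["Tree"] = None

def leaves(t: Tree) -> int:
    if t.left is None and t.right is None:  # base case of the induction
        return 1
    return leaves(t.left) + leaves(t.right)  # inductive case

def internal(t: Tree) -> int:
    if t.left is None and t.right is None:
        return 0
    return 1 + internal(t.left) + internal(t.right)

# Structural induction shows leaves(t) == internal(t) + 1: it holds for a
# single vertex (1 == 0 + 1), and joining two trees adds one internal node
# while keeping the leaves of both subtrees.
t = Tree(Tree(), Tree(Tree(), Tree()))
assert leaves(t) == internal(t) + 1
```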

In subsequent lectures, we looked (without being told) at how to construct purely functional data structures. I dropped the class after about a month.

(Photo credits: corradobohm.it)

Scribed by David Dinh

*In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.*

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the *infinity-to-one* norm, and find an efficient way to approximate it using SDP.

**1. Fun with Norms **

In the past few lectures, we’ve been heavily relying on the spectral norm,

which is efficiently computable and turns out to be pretty handy in a lot of cases.

Unfortunately, high-degree vertices have a disproportionately large influence on the spectral norm’s value, limiting its usefulness in graphs where such outliers exist. Often (as we did in lecture 9), people will try to modify the input so that there are no vertices of overly high degree or add some regularizing term to ameliorate this issue. Unfortunately, this can lead to less useful results – in lecture 9, for instance, we derived a bound that required that all high-degree vertices be excised from the input graph first.

In this lecture, we’ll attack this problem in a different way by introducing a different norm, the *infinity-to-one* norm, defined as follows:

It can be shown that

So spectral norm always gives us a bound for the infinity-to-one norm. However, the infinity-to-one norm can come in even more handy than the spectral norm (if we can actually calculate it):

Theorem 1. Let be a symmetric matrix with entries in . Pick a random graph such that is in w.p. . Then whp:

For context, recall that we proved a similar bound in Lecture 9 for the spectral norm, but we required that all nodes with degree greater than twice the average degree be removed first. The infinity-to-one norm allows us to bypass that restriction entirely.

*Proof:* Fix . Examine the expression

We want to make sure that this probability exponentially decreases w.r.t. .

Recall that Bernstein’s inequality states that given independent random variables, absolutely bounded by , with expectation , we have .

is either or and therefore is bounded by . Since , we can have the from Bernstein’s inequality take on value . Combining this with the fact that

Bernstein’s inequality gives us

provided that , as desired.

So this norm allows us to easily sidestep the issue of high-degree vertices. The problem is that it’s NP-hard to compute.
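For concreteness (the displayed definition above was lost to formatting): the standard definition is ||M||_{inf->1} = max over x, y in {-1,+1}^n of sum_{ij} M_{ij} x_i y_j. Here is a brute-force sketch of mine, whose exponential running time is consistent with the NP-hardness just mentioned:

```python
import numpy as np
from itertools import product

def inf_to_one_norm(M):
    # max over x, y in {-1,+1}^n of x^T M y. For a fixed y the best x is
    # sign(My), so it suffices to enumerate the 2^n choices of y.
    n = M.shape[1]
    return max(np.abs(M @ np.array(y)).sum() for y in product([-1, 1], repeat=n))

rng = np.random.default_rng(2)
M = rng.normal(size=(6, 6))

# Cauchy-Schwarz gives |x^T M y| <= ||x||_2 ||M|| ||y||_2 = n ||M||,
# the spectral-norm bound mentioned above.
assert inf_to_one_norm(M) <= M.shape[0] * np.linalg.norm(M, 2) + 1e-9
```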

**2. Grothendieck’s Inequality **

However, it turns out that we can *approximate* the infinity-to-one norm to within a constant factor using SDP:

Theorem 2 (Grothendieck’s Inequality). There exists some (turns out to be around , but we won’t worry about its exact value here) such that for all ,

So instead of dealing with the combinatorially huge problem of optimizing over all vectors, we can just solve an SDP instead to get a good approximation. For convenience, let’s denote the quantity on the left of the above expression as .

Let’s start with a warmup lemma. The proof technique in this lemma – specifically, the trick of replacing a vector from a continuous distribution with a *random* vector from some discrete distribution, and then taking the expectation to relate the two quantities – will come in handy later on.

Lemma 3. In other words, maximizing over the discrete space of random vectors and maximizing over the continuous space of vectors in that range gives the same result.

*Proof:* It is obvious that the expression on the left is at least the right (since it’s just a relaxation). In order to show that the right is at least the left: given some continuous vectors , we can find discrete , vectors such that their expectations are equal to . We can do that by having take on value w.p. and w.p. , likewise for .

That means that:

which gives us the desired result.
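The rounding trick in this proof is just unbiased randomized rounding: replace y in [-1,1] by a random Y in {-1,+1} with Pr[Y = 1] = (1+y)/2, so that E[Y] = y. A quick numerical sketch of mine:

```python
import numpy as np

rng = np.random.default_rng(4)

def round_pm1(y, rng, samples):
    # Each entry becomes +1 with probability (1+y)/2 and -1 otherwise,
    # so the expectation of each rounded entry is exactly y.
    U = rng.random((samples, y.shape[0]))
    return np.where(U < (1 + y) / 2, 1.0, -1.0)

y = rng.uniform(-1, 1, size=5)
samples = round_pm1(y, rng, 200_000)

assert np.all(np.abs(samples) == 1)                     # values land in {-1,+1}
assert np.allclose(samples.mean(axis=0), y, atol=0.01)  # empirical mean matches y
```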

Fact 4. is a norm.

*Proof:* **Multiplicative scaling: **obvious.

**Nonnegativity (except iff ):** It is obvious that the SDP norm is zero if is zero.

Now suppose is nonzero.

Notice that we can replace the constraints in the SDP norm requiring that , with , . Why? We’ll use the same trick as we did in the proof of Lemma 3:

Obviously maximizing over , will give us at least as good a result as maximizing over , , since it’s a relaxation, so it suffices to show that if we can obtain some value for using , , we can do at least as well using , .

Let’s suppose we have some vectors with length at most . Now let’s replace them with *random* vectors of length exactly whose expectations are respectively (just scale the up, and have be either the scaled value or its negative, with probability calibrated appropriately so their expectations work out to be ). Then we can just say:

So must take on some values (of length exactly ) that make , as desired.

Since we assumed is nonzero, let , be such that . If we set to be some arbitrary vector of unit length – using a plus if is positive and a minus if it’s negative – and all other to zero (which we can do without affecting the value of the max by the fact we just proved), we can immediately see that

giving a positive lower bound for the maximizer, as desired.

**Triangle inequality:** Just look at the and that maximize for and , and observe that

since we can always match the quantity on the left hand side by choosing the same and for both terms on the right-hand-side.

** 2.1. Proof of Grothendieck’s inequality **

Now let’s prove Grothendieck’s inequality:

*Proof:* Observe that, by Lemma 3, maximizing over the choice of in the infinity-to-one norm is equivalent to maximizing over the choice of , so we can rewrite our proof goal as

Let be of length and be the optimal choices used in , i.e. the maximizers of . It suffices to prove that there exist vectors (for some fixed constant , since we can just scale the result by changing ) to plug into the right-hand side of the above expression such that .

Pick , with each coordinate being drawn from the normal distribution with mean and variance , and let and . Then

But is just the identity, since the diagonal elements are just the expectation of the square of the Gaussian (which is , its variance) and the off-diagonals are the expectation of the product of two independent Gaussians, which is zero (since the expectation of each individual Gaussian is zero).
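This identity, that the expectation of the outer product of a standard Gaussian vector with itself is the identity matrix, is easy to confirm by simulation (a quick check of mine):

```python
import numpy as np

rng = np.random.default_rng(5)
d, samples = 4, 200_000

g = rng.normal(size=(samples, d))  # rows are independent standard Gaussian vectors
emp = g.T @ g / samples            # empirical estimate of E[g g^T]

# Diagonal entries estimate E[g_i^2] = 1 (the variance); off-diagonal entries
# estimate E[g_i g_j] = 0 for independent mean-zero coordinates.
assert np.allclose(emp, np.eye(d), atol=0.02)
print(np.round(emp, 3))
```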

So

At this point we’ve got something that looks a lot like what we want: if the expectation of is equal to , then maximizing over them is definitely going to give us a quantity greater than or equal to . Unfortunately, there’s an issue here: since Gaussian random variables are unbounded, and are unbounded; on the other hand, what we’re trying to do is maximize over vectors whose elements are bounded by .

Our approach will be to “clip” the and to a finite range, and then bound the error introduced by the clipping process. Formally, fix a constant and pick a Gaussian random vector . Now define a ‘clipped’ inner product:

and likewise for . For convenience, let’s define the *truncation error* as follows, to represent how far the clipped value differs from the actual value.

So we can rewrite as:

and similarly we can define:

So now we have

Clearly is just the SDP norm , since and were the ‘original’ and .

What we will show now is that the remaining three terms are bounded by constant factors of the SDP norm, so the entire sum of all four terms gives a constant-factor approximation of it. The analysis is the same for all three terms so, for brevity, we’ll just look at the last one:

For convenience, let’s define as:

Now, it’s convenient to think of the and as vectors of infinite dimension indexed by input (we’ll bring the dimensionality down to a finite value at the end of the proof). Let’s define the following inner product and norm in this space:

Now, since the Gaussian distribution is rotation-invariant (we can rotate a Gaussian random variable without changing its distribution), the squared norms of the and the are all the same (since all the , have length , and dotting with them can be thought of as a rotation). That means that the above norm takes on the same value for all , , so all we need to do is figure out a constant bound on it.

Fortunately, this value is pretty easy to bound. Notice that the function is zero if the dot product’s absolute value is smaller than . If we have ,

for sufficiently large , which is a constant.

So the squared norms are bounded above by , which means we can substitute this and the norm bound into the above expression to get:

Now, armed with this, we can conclude (with similar results for the other two error terms)

which tells us that , is within a constant bound of the SDP norm as desired.

There’s only one slightly fishy bit in the proof we used above, though, and that’s the treatment of functions of infinite-dimensional vectors indexed by a Gaussian vector. Let’s conclude by constructing a (finite-dimensional) solution to the SDP from the functions:

**Claim:** If , then .

**Proof: **

as desired.

So the matrix comprises a nice finite-dimensional solution to the SDP, and we’re done with the proof.

Also, noticing that we have four terms in the expansion of , each one of which is a feasible value for the SDP, we can figure out how much we deviate from the actual optimum, i.e. . Since each term can’t exceed the value of the SDP optimum, is separated from by a factor of at most four, giving us a bound on the constant in the inequality.

To recap, notice that there were two key “tricks” here:

1) Assuming that we were rounding an optimal solution to our SDP (i.e. starting with as optimizers). We don’t get any bounds otherwise!

2) Treating the rounding error itself as a feasible solution of the SDP.

This proof was communicated to us by James Lee.

Scribed by Chinmay Nirkhe

*In which we explore the Stochastic Block Model.*

**1. The problem **

The *Stochastic Block Model* is a generic model for graphs generated by some parameters. The simplest model, and the one we will consider today, is the problem.

Definition 1 ( graph distribution). The distribution is a distribution on graphs of vertices where is partitioned into two subsets of equal size: . Then for a pair of vertices in the same subset, and otherwise .

We will only consider the regime under which . If we want to find the partition , it is intuitive to look at the problem of finding the minimum balanced cut. The cut has expected size and any other cut will have greater expected size.

Our intuition should be that as , the problem only gets harder. And for fixed ratio , as , the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for then we can also solve it for where , by keeping only edges and reducing to the case we can solve.

Recall that for the -planted clique problem, we found the eigenvector corresponding to the largest eigenvalue of . We then defined as the vertices with the largest values of and cleaned up a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of . Note this is intuitive as the average degree of the graph is . The idea is simple: find the eigenvector corresponding to the largest eigenvalue and define

As we proceed to the analysis of this procedure, we fix . Prior to fixing, the adjacency matrix was .\footnote{The diagonal should be zeroes, but this is close enough.} Upon fixing , the average adjacency matrix looks different. For ease of notation, if we write a bold constant for a matrix, we mean the matrix . It will be clear from context.

Here we have broken up into blocks according to the partition .

Theorem 2. If then with high probability, .

*Proof:* Define the graph as the union of a graph on and a graph on . Define the graph as a graph. Note that the graph is distributed according to picking a and graph and adding the partition-crossing edges of to . Let and be the respective adjacency matrices and define the following submatrices:

Then the adjacency matrix is defined by

Similarly, we can generate a decomposition for :

Then using the triangle inequality we can bound by bounding the difference in the various terms.

The last line follows as the submatrices are adjacency matrices of graphs and we can apply the results we proved in that regime for .

But the difficulty is that we don’t know as . If we knew , then we would know the partition. What we can compute is .\footnote{The rest of this proof actually doesn’t even rely on knowing or . We can estimate by calculating the average vertex degree.} We can rewrite as

Call the matrix on the right . It is clearly rank-one as it has decomposition where . Therefore

Then is close (in operator norm) to the rank-1 matrix , so their largest eigenvalues are close. But since has only one non-zero eigenvalue , the eigenvector corresponding to the largest eigenvalue of will be close to the ideal partition, as describes the ideal partition. This can be formalized with the Davis-Kahan theorem.

Theorem 3 (Davis-Kahan). Given matrices with where has eigenvalues and corresponding eigenvectors and has eigenvalues and corresponding eigenvectors , then

Equivalently,

The Davis-Kahan theorem with , and states that

where , the eigenvector associated with the largest eigenvalue of and , the expected degrees of the two parts of the graph. Choose between for the one closer to . Then

Recall that . If and disagree in sign, then this contributes at least to the value of . Equivalently, is at least the number of misclassified vertices. It is simple to see from here that if then we can bound the number of misclassified vertices by . This completes the proof that the proposed algorithm does well in calculating the partition of the Stochastic Block Model.

Scribed by Luowen Qian

*In which we use spectral techniques to find certificates of unsatisfiability for random -SAT formulas.*

**1. Introduction **

Given a random -SAT formula with clauses and variables, we want to find a certificate of unsatisfiability for such a formula in polynomial time. Here we consider as fixed, usually equal to 3 or 4. For fixed , the more clauses you have, the more constraints you have, so it becomes easier to show that these constraints are inconsistent. For example, for 3-SAT,

- In the previous lecture, we have shown that if for some large constant , almost surely the formula is not satisfiable. But it’s conjectured that there is no polynomial-time, or even subexponential-time, algorithm that can find a certificate of unsatisfiability for .
- If for some other constant , we showed last time that, with high probability, we can find a certificate of unsatisfiability in polynomial time.
The algorithm for finding such a certificate is shown below.

- Algorithm 3SAT-refute()
- for
- if 2SAT-satisfiable( restricted to clauses that contain , with )
- return

- if 2SAT-satisfiable( restricted to clauses that contain , with )
- return UNSATISFIABLE

We know that we can solve 2-SATs in linear time, and approximately

clauses contain . Similarly, when is sufficiently large, the 2-SATs will almost surely be unsatisfiable. When a subset of the clauses is not satisfiable, the whole 3-SAT formula is not satisfiable. Therefore we can certify unsatisfiability for 3-SATs with high probability.
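The reduction leans on the fact that 2-SAT is decidable in linear time. For completeness, here is a sketch of mine (not part of the notes) of the standard implication-graph algorithm: a clause (a or b) yields the implications (not a implies b) and (not b implies a), and the formula is satisfiable if and only if no variable lies in the same strongly connected component as its negation.

```python
def two_sat_satisfiable(n, clauses):
    """n variables named 1..n; each clause is a pair of literals, negative = negated."""
    idx = lambda lit: 2 * (abs(lit) - 1) + (1 if lit < 0 else 0)  # literal -> node
    graph = [[] for _ in range(2 * n)]
    rgraph = [[] for _ in range(2 * n)]
    for a, b in clauses:
        for u, v in ((idx(a) ^ 1, idx(b)), (idx(b) ^ 1, idx(a))):  # node ^ 1 negates
            graph[u].append(v)
            rgraph[v].append(u)

    # Kosaraju's algorithm: order vertices by DFS finish time, then label
    # strongly connected components on the reverse graph.
    order, seen = [], [False] * (2 * n)
    for s in range(2 * n):
        if seen[s]:
            continue
        seen[s] = True
        stack = [(s, iter(graph[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if not seen[v]:
                    seen[v] = True
                    stack.append((v, iter(graph[v])))
                    break
            else:
                order.append(u)
                stack.pop()

    comp = [-1] * (2 * n)
    for c, s in enumerate(reversed(order)):
        if comp[s] != -1:
            continue
        comp[s], stack = c, [s]
        while stack:
            for v in rgraph[stack.pop()]:
                if comp[v] == -1:
                    comp[v] = c
                    stack.append(v)
    return all(comp[2 * i] != comp[2 * i + 1] for i in range(n))
```

For example, `two_sat_satisfiable(2, [(1, 2), (-1, 2)])` is `True` (set the second variable to true), while `two_sat_satisfiable(1, [(1, 1), (-1, -1)])` is `False`.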

In general for -SAT,

- If for some large constant , almost surely the formula is not satisfiable.
- If for some other constant , we can construct a very similar algorithm, in which we check all assignments to the first variables, and see if the 2SAT part of the restricted formula is unsatisfiable.
Since for every fixed assignment to the first variables, approximately

fraction of the clauses remains, we expect the constant and the running time is .

So what about ‘s that are in between? It turns out that we can do better with spectral techniques. And the reason that spectral techniques work better is that, unlike the previous method, they do not need to try all the possible assignments in order to find a certificate of unsatisfiability.

**2. Reducing certifying unsatisfiability of k-SAT to finding a largest independent set **

** 2.1. From 3-SAT instances to hypergraphs **

Given a random 3-SAT formula , which is an AND of random 3-CNF clauses over variables (abbreviated as vector ), i.e.

where , and no two are exactly the same. Construct hypergraph , where

is a set of vertices, where each vertex means an assignment to a variable, and

is a set of 3-hyperedges. The reason we’re putting in the negation of is that a 3-CNF clause evaluates to false if and only if all three of its literals evaluate to false. This will be useful shortly.

First let’s generalize the notion of independent set for hypergraphs.

An independent set for hypergraph is a set that satisfies .

If is satisfiable, has an independent set of size at least . Equivalently if the largest independent set of has size less than , is unsatisfiable. *Proof:* Assume is satisfiable, let be a satisfiable assignment, where . Then is an independent set of size . If not, it means some hyperedge , so and the -th clause in evaluates to false. Therefore evaluates to false, which contradicts the fact that is a satisfiable assignment.
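To make the construction concrete, here is a small Python sketch (not from the notes; the encodings are my own: a literal is a nonzero integer, and a vertex is a `(variable, value)` pair):

```python
def formula_to_hypergraph(clauses):
    """Map each 3-clause to the 3-hyperedge of vertices (variable, value)
    that falsify its literals: the literal x_j contributes (j, 0), and the
    negated literal over x_j contributes (j, 1)."""
    return [frozenset((abs(l), 0 if l > 0 else 1) for l in cl) for cl in clauses]

def assignment_vertices(v):
    """Vertices consistent with a 0/1 assignment v to variables 1..n."""
    return {(j + 1, b) for j, b in enumerate(v)}

def is_independent(edges, vertex_set):
    """A vertex set is independent iff it contains no hyperedge entirely."""
    return all(not e <= vertex_set for e in edges)
```

For example, the clause $(x_1 \vee \bar{x}_2 \vee x_3)$ becomes the hyperedge $\{(1,0), (2,1), (3,0)\}$; the satisfying assignment $(1,0,1)$ yields an independent set, while the falsifying assignment $(0,1,0)$ contains the hyperedge.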

We know that if we pick a random graph that's sufficiently dense, i.e. the average degree $d$ is sufficiently large, then by spectral techniques we have a certifiable upper bound of $O\!\left( \frac{n}{\sqrt d} \right)$ on the size of the largest independent set with high probability. So if a random graph has $cn$ random edges for a large constant $c$, we can prove that there's no large independent set with high probability.

But if we have a random hypergraph with random hyperedges, we don't have any analog of spectral theory for hypergraphs that allows us to do this kind of certification. And from the fact that the problem of certifying unsatisfiability of a random 3-SAT formula with significantly fewer than $n^{3/2}$ clauses is considered to be hard, we conjecture that there doesn't exist a spectral theory for hypergraphs able to replicate some of the things we are able to do on graphs.

However, what we can do, possibly with some loss, is to reduce the hypergraph to a graph, where we can apply spectral techniques.

** 2.2. From 4-SAT instances to graphs **

Now let’s look at random 4-SATs. Similarly we will write a random 4-SAT formula as:

where , and no two are exactly the same. Similar to the previous construction, but instead of constructing another hypergraph, we will construct just a graph , where

is a set of vertices and

is a set of edges.

If is satisfiable, has an independent set of size at least . Equivalently if the largest independent set of has size less than , is unsatisfiable. *Proof:* The proof is very similar to the previous one. Assume is satisfiable, let be a satisfiable assignment, where . Then is an independent set of size . If not, it means some edge , so and the -th clause in evaluates to false. Therefore evaluates to false, which contradicts the fact that is a satisfiable assignment.
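A small Python sketch of this reduction (my own encodings again: literals are signed integers, a graph vertex is a pair of `(variable, value)` pairs):

```python
import itertools

def formula4_to_graph(clauses):
    """Each 4-clause yields one edge of the graph: one endpoint encodes the
    falsifying assignment to the clause's first two literals, the other
    endpoint the falsifying assignment to its last two."""
    def falsify(l):
        return (abs(l), 0 if l > 0 else 1)   # (variable, value) falsifying l
    return [((falsify(a), falsify(b)), (falsify(c), falsify(d)))
            for (a, b, c, d) in clauses]

def consistent_vertices(v):
    """All n^2 pair-vertices consistent with a 0/1 assignment v to vars 1..n."""
    pts = [(j + 1, b) for j, b in enumerate(v)]
    return set(itertools.product(pts, repeat=2))
```

For the clause $(x_1 \vee x_2 \vee \bar{x}_3 \vee \bar{x}_4)$, both endpoints of the resulting edge lie inside the vertex set of the falsifying assignment $(0,0,1,1)$, while the vertex set of a satisfying assignment misses at least one endpoint.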

From here, we can observe that $G$ is not a random graph, because some edges are forbidden, for example when the two endpoints of an edge have a variable in common. But it's very close to a random graph. In fact, we can apply the same spectral techniques to get a certifiable upper bound on the size of the largest independent set if the average degree is at least logarithmic, i.e. if $m \geq c n^2 \log n$, we can certify unsatisfiability with high probability, by upper bounding the size of the largest independent set in the constructed graph.

We can generalize this result to all even $k$'s. For random $k$-SAT where $k$ is even, if $m \geq c n^{k/2} \log n$, we can certify unsatisfiability with high probability, which is better than the previous method, which requires $m \geq c n^{k-1}$. The same is achievable for odd $k$, but the argument is significantly more complicated.

** 2.3. Certifiable upper bound for independent sets in modified random sparse graphs **

Aside from the case of odd $k$'s, another question is whether, in this setup, we can do better and get rid of the $\log n$ term. This term is coming from the fact that the spectral norm bound breaks down when the average degree $d = o(\log n)$. However it's still true that a random graph doesn't have any large independent sets even when the average degree is a constant. It's just that the spectral norm isn't giving us good bounds any more, since the spectral norm is at least the square root of the maximum degree, which is $\omega(\sqrt d)$ when $d$ is constant. So is there something tighter than spectral bounds that could help us get rid of the $\log n$ term? Could we fix this by removing all the high degree vertices in the random graph?

This construction is due to Feige and Ofek. Given a random graph $G \sim G_{n, d/n}$, where the average degree $d$ is some large constant, construct $G'$ by taking $G$ and removing all edges incident on nodes with degree higher than a constant multiple of the average degree $\bar{d}$ of $G$ (say $2\bar{d}$). We denote by $A$ the adjacency matrix of $G$ and by $A'$ that of $G'$. And it turns out:

With high probability, $\left\| A' - \frac{d}{n} J \right\| \leq O(\sqrt d)$.

It turns out to be rather difficult to prove. Previously we saw spectral results on random graphs that use matrix traces to bound the largest eigenvalue. In this case, it's hard to do so because the contribution of a closed walk to the trace is complicated by the fact that the edges have dependencies (whether an edge survives depends on the degrees of its endpoints). The other approach is, given the random matrix $M = A' - \frac dn J$, to upper bound $\max_{\|x\|=1} |x^\top M x|$. A standard way to do this is, for every fixed candidate solution, to bound the probability that the quadratic form is large, and then argue that the number of candidate solutions is small, which tells us that there's no good solution. The problem here is that the set of unit vectors is infinite, so a naive union bound does not apply. So Feige and Ofek discretize the set of vectors, and then reduce the bound on the quadratic form of a discretized vector to a sum of several terms, each of which has to be carefully bounded.

We always have

\begin{equation*} \alpha(G) \leq \alpha(G') \end{equation*}

because removing edges can only make independent sets easier to find, and so, with high probability, we get a polynomial time upper bound certificate of $O\!\left( \frac{n}{\sqrt d} \right)$ on the size of the largest independent set of a random graph. This removes the extra $\log n$ term from our analysis of certificates of unsatisfiability for random $k$-SAT when $k$ is even.
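As a numerical illustration (my own sketch, with arbitrary parameter choices, not part of the notes), the following numpy code samples $G_{n, d/n}$, prunes edges at vertices of degree more than twice the average in the spirit of the Feige-Ofek construction, and evaluates the spectral certificate $\frac 1p \| pJ - A' \|$ with $p = d/n$, which upper bounds the size of the largest independent set:

```python
import numpy as np

def indset_certificate(n=600, d=20, seed=0):
    """Upper-bound certificate for the max independent set of G(n, d/n):
    prune edges at high-degree vertices, then return (1/p) * || pJ - A' ||."""
    rng = np.random.default_rng(seed)
    p = d / n
    A = np.triu((rng.random((n, n)) < p).astype(float), 1)
    A = A + A.T                              # adjacency matrix of G(n, p)
    deg = A.sum(axis=1)
    keep = deg <= 2 * deg.mean()             # drop edges at high-degree nodes
    Ap = A * np.outer(keep, keep)
    J = np.ones((n, n))
    return np.linalg.norm(p * J - Ap, 2) / p  # spectral norm certificate
```

For these parameters the certified bound is on the order of $n/\sqrt d$, well below the trivial bound $n$.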

**3. SDP relaxation of independent sets in random sparse graphs **

In order to show that a random graph has no large independent sets, a more principled way is to argue that there is some polynomial time solvable relaxation of the problem whose optimum is an upper bound on the size of the largest independent set.

Let SDPIndSet be the optimum of the following semidefinite programming relaxation of the Independent Set problem, which is due to Lovász:

Since it’s the relaxation of the problem of finding the maximum independent set, for any graph . And this relaxation has a nice property.

For every , and for every graph , we have \begin{equation*} {\rm SDPIndSet}(G) \leq \frac 1p \cdot || pJ – A || \end{equation*} where is the all-one matrix and is the adjacency matrix of .

*Proof:* First we note that SDPIndSet is at most

and this is equal to

which is at most

because

Finally, the above optimization is equivalent to the following

which is at most the unconstrained problem

Recall from the previous section that we constructed $G'$ by removing edges from $G$, which corresponds to removing constraints in our semidefinite programming problem, so ${\rm SDPIndSet}(G) \leq {\rm SDPIndSet}(G') \leq \frac 1p \| pJ - A' \|$ with $p = \frac dn$, which is by Theorem 3 at most $O\!\left( \frac{n}{\sqrt d} \right)$ with high probability.

**4. SDP relaxation of random k-SAT **

From the previous section, we get the idea that we can use semidefinite programming to relax the problem directly, and find a certificate of unsatisfiability for the relaxed problem.

Given a random -SAT formula :

The satisfiability of $f$ is equivalent to the satisfiability of the following equations:

Notice that if we expand the polynomial on the left side, some of the monomials have degree higher than 2, which prevents us from relaxing these equations to a semidefinite programming problem. In order to resolve this, we introduce auxiliary variables standing for products of pairs of the original variables. Then we can relax all variables to be vectors, i.e.

For example, if we have a 4-SAT clause

we can rewrite it as

For this relaxation, we have:

- If , the SDP associated with the formula is feasible with high probability, where for every fixed .
- If , the SDP associated with the formula is not feasible with high probability, where is a constant for every fixed even , and for every fixed odd .


Scribed by Jeff Xu

*In which we discussed the planted clique distribution; specifically, we talked about how to find a planted clique in a random graph. We relied heavily upon our material from lectures 2 and 3, in which we covered the upper bound certificate for the max clique in $G_{n, \frac12}$. At the end of this class, we wrapped up this topic and started the topic of $k$-SAT.*

**1. Planted Clique **

To start with, we describe a distribution of graphs with a planted clique. Suppose that we sample $G$ from $G_{n, \frac12}$ and we want to modify it s.t. it has a size-$t$ clique, i.e., we have a clique $S \subseteq V$ with $|S| = t$. The following code describes a sampler for the distribution.

- Pick a subset $S$ of vertices from $V$ s.t. $|S| = t$
- Independently for each pair $\{u, v\}$, make an edge $(u, v)$ with probability $p_{u,v}$
- $p_{u,v} = 1$ if $u, v \in S$
- $p_{u,v} = \frac12$ otherwise

Note: We are only interested in the case $t \gg \log n$, which is the case in which the planted clique is, with high probability, larger than any pre-existing clique
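The sampler above can be written in a few lines of numpy (a sketch; the adjacency-matrix representation and function name are my choices):

```python
import numpy as np

def sample_planted_clique(n, t, seed=None):
    """Sample the adjacency matrix of G(n, 1/2) with a planted t-clique S.
    Returns (A, S): edges inside S exist with probability 1, others 1/2."""
    rng = np.random.default_rng(seed)
    A = np.triu((rng.random((n, n)) < 0.5).astype(int), 1)
    A = A + A.T                       # symmetric 0/1 matrix, zero diagonal
    S = rng.choice(n, size=t, replace=False)
    A[np.ix_(S, S)] = 1               # make S a clique ...
    A[S, S] = 0                       # ... keeping the diagonal zero
    return A, np.sort(S)
```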

** 1.1. Finding the planted clique when $t = \Omega(\sqrt{n \log n})$ **

When $t \geq c\sqrt{n \log n}$ for a large enough constant $c$, finding the planted clique is easy, because the vertices in the planted clique are precisely the vertices of highest degree.

Lemma 1 In $G_{n, \frac12}$, w.h.p., for every vertex $v$, deg($v$) $\leq \frac n2 + O(\sqrt{n \log n})$.

*Proof:* For each vertex $v$ in a graph $G$ sampled from $G_{n, \frac12}$, we have deg($v$) = sum of $n-1$ unbiased random bits, which is simply a binomial distribution. By a Chernoff bound,

\begin{equation*} \Pr\left[ \deg(v) > \frac n2 + k \right] \leq e^{-\Omega(k^2 / n)} \end{equation*}

For this probability to be upper bounded, say by $\frac 1{n^2}$ (so that we can take a union bound over all $n$ vertices), we can fix $k = c\sqrt{n \log n}$ for a suitable constant $c$, and this completes the proof that, with high probability, every vertex in the random graph has degree at most $\frac n2 + O(\sqrt{n \log n})$.

Now we consider a vertex in the planted clique $S$.

Claim 1 In a graph with a planted clique coming from a random graph to which we add all the edges necessary to make $S$ a clique, each node in $S$ will receive at least $\frac t2 - O(\sqrt{t \log n})$ added edges w.h.p. over the sampling of the graph.

*Proof:* Again, we regard the number of neighbors that a vertex of $S$ has inside $S$ in the original random graph as a sum of $t-1$ Bernoulli random variables. By a Chernoff bound, we obtain an upper bound on the probability that a vertex has more than $\frac t2 + O(\sqrt{t \log n})$ neighbors in $S$ in the original random graph.

Since this probability is small enough to take a union bound over the vertices of $S$, we can conclude that each node in $S$, with high probability, had fewer than $\frac t2 + O(\sqrt{t \log n})$ neighbors in $S$ in the original random graph, and thus at least $\frac t2 - O(\sqrt{t \log n})$ edges will be added to each vertex in $S$.

Corollary 2 In a graph with a planted clique, a vertex in $S$ will have degree at least $\frac n2 + \Omega(\sqrt{n \log n})$. (Note: we have $t = c\sqrt{n \log n}$ for a large constant $c$ in this example)

Therefore, we have shown that, in a graph with a large planted clique, we can distinguish it from the distribution $G_{n, \frac12}$ by the existence of nodes with large degree, i.e., degree over $\frac n2 + \Omega(\sqrt{n \log n})$.
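As a sanity check (my own illustration, with arbitrary constants), the degree test can be run against both distributions:

```python
import numpy as np

def max_degree_test(A, c=1.5):
    """Report a planted clique when some degree exceeds n/2 + c*sqrt(n log n)."""
    n = A.shape[0]
    return A.sum(axis=1).max() > n / 2 + c * np.sqrt(n * np.log(n))

# sample G(n, 1/2), then plant a clique of size ~ 4 sqrt(n log n) in a copy
rng = np.random.default_rng(0)
n = 2000
A = np.triu((rng.random((n, n)) < 0.5).astype(int), 1)
A = A + A.T
t = int(4 * np.sqrt(n * np.log(n)))
S = rng.choice(n, size=t, replace=False)
B = A.copy()
B[np.ix_(S, S)] = 1
B[S, S] = 0
```

With these parameters the clique vertices have degree roughly $\frac n2 + \frac t2$, which clears the threshold, while the plain random graph stays below it.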

** 1.2. Distinguishing the Planted Clique Distribution when $t = \Omega(\sqrt n)$ **

Moving on to the case in which $t$ is of the order of $\sqrt n$, we first show how to distinguish graphs sampled from the planted clique distribution from random graphs.

Say that $t \geq c\sqrt n$ for a large constant $c$, let $S$ be the planted clique, and let $A$ be the adjacency matrix of a graph from the planted clique distribution. Then, testing with the indicator vector ${\bf 1}_S$ of $S$,

\begin{equation*} \left\| A - \frac J2 \right\| \geq \frac{ {\bf 1}_S^\top \left( A - \frac J2 \right) {\bf 1}_S }{ \| {\bf 1}_S \|^2 } = \frac{ t(t-1) - \frac{t^2}{2} }{ t } = \frac t2 - 1 \end{equation*}

Now, recall the following theorem from Lecture 2.

Theorem 3 If $A$ is the adjacency matrix of a graph sampled from $G_{n, \frac12}$, then, w.h.p., $\left\| A - \frac J2 \right\| \leq O(\sqrt n)$.

Therefore, for a sufficiently large constant $c$, we can distinguish a graph with a planted clique from a random graph using this method.
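The spectral test is a one-liner in numpy (again my own sketch with arbitrary constants; the threshold $1.5\sqrt n$ sits between the random value $\approx \sqrt n$ and the planted value $\approx t/2$):

```python
import numpy as np

def spectral_clique_test(A, c=1.5):
    """Report a planted clique when || A - J/2 || exceeds c * sqrt(n)."""
    n = A.shape[0]
    M = A - 0.5 * np.ones((n, n))
    return np.abs(np.linalg.eigvalsh(M)).max() > c * np.sqrt(n)

# compare G(n, 1/2) against the same graph with a planted 6*sqrt(n) clique
rng = np.random.default_rng(0)
n = 900
A = np.triu((rng.random((n, n)) < 0.5).astype(int), 1)
A = A + A.T
t = int(6 * np.sqrt(n))
S = rng.choice(n, size=t, replace=False)
B = A.copy()
B[np.ix_(S, S)] = 1
B[S, S] = 0
```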

** 1.3. Uniqueness of Maximum Clique in Planted Clique Distribution **

In order to show that we can find the planted clique in a given graph, we first want to prove that the maximum clique in the planted clique distribution is unique. In other words, we want to prove that the planted clique $S$ is the unique maximum clique. We first prove the following lemmas:

Lemma 4 For each vertex not in the planted clique, i.e., $v \notin S$, w.h.p. $v$ has at most $\frac t2 + O(\sqrt{t \log n})$ of its neighbors in $S$.

*Proof:* This is largely similar to Lemma 1. We see the number of neighbors of $v$ in $S$ as a sum of $t$ random bits and, by a Chernoff bound, we have:

\begin{equation*} \Pr\left[ |N(v) \cap S| > \frac t2 + k \right] \leq e^{-\Omega(k^2 / t)} \end{equation*}

For this probability to be upper bounded, say by $\frac 1{n^2}$, we can choose $k$ s.t. $e^{-\Omega(k^2/t)} \leq \frac 1{n^2}$. Therefore, we pick $k = c\sqrt{t \log n}$, and this completes the proof that, with high probability, each vertex not in the planted clique has no more than $\frac t2 + O(\sqrt{t \log n})$ neighbors in the planted clique.

Lemma 5 $G$ sampled from $G_{n, \frac12}$, w.h.p., has a largest clique of size $O(\log n)$. (proved in lecture 2)

Claim 2 Under the above assumptions, $S$ is the unique clique of size $t$ in $G$.

*Proof:* Suppose, for the sake of contradiction, we find a clique $C \neq S$ s.t. $|C| = t$. Since $C$ and $S$ are both cliques by assumption, $C \setminus S$ is also a clique. The underlying random graph has a largest clique of size $O(\log n)$ w.h.p., so $|C \setminus S| \leq O(\log n)$ since $C \setminus S$ is a clique in it. Consider a vertex $v \in C \setminus S$: the number of $v$'s neighbors in $S$ is at least $|C \cap S| \geq t - O(\log n)$, but this contradicts Lemma 4, which states that $v$ should have no more than $\frac t2 + O(\sqrt{t \log n})$ neighbors in $S$.

** 1.4. Finding the Planted Clique **

Now that we have shown the uniqueness of the maximum clique, we want to proceed and show that we can find the planted clique. Let $A$ be the adjacency matrix of a random graph with a planted clique of size $t$, and let $x$ be the unit vector maximizing $x^\top \left( A - \frac J2 \right) x$, i.e. the eigenvector of the largest eigenvalue of $A - \frac J2$.

We will show below that $x$ is close to the indicator vector of $S$. First, we need to note that we are no longer using the sampling method described earlier to attain a planted clique distribution. Alternatively, we sample our graph from the distribution $G_{n, \frac12}$, pick a subset $S$ of $t$ vertices from $V$, and add to it the necessary edges to make $S$ a clique. From this point of view, the graph with a planted clique is the union of the original random graph and the set of edges that we need to add. We can then represent the adjacency matrix of $G$ as:

where and .

By the theorem shown in lecture 2 (which we just recapped above), we have the following equations with high probability:

Now we combine the equations listed above, and wlog, let .

With from above, we have

Therefore, . With shown above, we can conclude that

That is,

and, up to passing to , which has the same set of largest entries in absolute value, we have , for sufficiently large . This means that, up to scaling, and are nearly identical.

Let be the set of largest entries of , and hence of , breaking ties arbitrarily, and let be the threshold value for membership in (that is, for all and for all ). Suppose that there are elements of that are not in , and hence elements not in that are in . Then

And we conclude that , that is, contains at least of the elements of .

We now present an algorithm to find the planted clique. Let $L$ be the set of $t$ vertices $v$ with the largest $|x_v|$. We then consider each vertex $v$. If $v \in S$, by our proof above, it should have at least $\frac 34 t$ neighbors in $L$. If $v \notin S$, it should have fewer than $\frac 34 t$ neighbors in $L$. Therefore, we can easily verify whether a vertex is in $S$ by looking at its number of neighbors in $L$, and this gives us an algorithm.

- Algorithm FindPlantedClique($G$):
- $A \leftarrow$ adjacency matrix of $G$
- $x \leftarrow$ eigenvector of the largest eigenvalue of $A - \frac J2$
- $L \leftarrow$ set of $t$ vertices with largest $|x_v|$
- clique $\leftarrow$ set of vertices with at least $\frac 34 t$ neighbors in $L$
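A compact numpy version of the algorithm (a sketch under my own parameter choices; the $\frac 34 t$ cleanup threshold is one reasonable choice, not the only one):

```python
import numpy as np

def find_planted_clique(A, t):
    """Spectral recovery: take the top eigenvector x of A - J/2, let L be
    the t coordinates of largest |x_v|, then keep the vertices that have
    at least 3t/4 neighbors in L."""
    n = A.shape[0]
    M = A - 0.5 * np.ones((n, n))
    w, V = np.linalg.eigh(M)
    x = V[:, -1]                        # eigenvector of the largest eigenvalue
    L = np.argsort(-np.abs(x))[:t]
    in_L = np.zeros(n, dtype=bool)
    in_L[L] = True
    return {v for v in range(n) if A[v, in_L].sum() >= 0.75 * t}

# recover a planted clique of size 8*sqrt(n)
rng = np.random.default_rng(0)
n = 900
A = np.triu((rng.random((n, n)) < 0.5).astype(int), 1)
A = A + A.T
t = int(8 * np.sqrt(n))
S = rng.choice(n, size=t, replace=False)
A[np.ix_(S, S)] = 1
A[S, S] = 0
```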

It is still an open problem to find a planted clique of size $t = o(\sqrt n)$ in polynomial time.

**2. Random -SAT and Proof of Unsatisfiability **

We start the topic of random $k$-SAT formulas. In the $k$-SAT problem, we are trying to decide whether a formula in CNF, with each clause containing up to $k$ literals, is satisfiable. We note in class that checking satisfiability for randomly generated formulas is hard even in the average case. Similar to the $G_{n,p}$ model, we generate a $k$-SAT formula on $n$ variables with parameter $p$ s.t. each of the $2^k \binom nk$ possible clauses exists independently with probability $p$. Besides the model above, we also briefly mention another model, in which we randomly pick $m$ of the possible clauses. (Note: these two models are closely related when we have $m \approx p \cdot 2^k \binom nk$). To gain more insight, we discussed the example of the 3-SAT problem: a random 3-SAT formula with $m$ clauses has an expected number of satisfying assignments $2^n \left( \frac 78 \right)^m$. We also observe that, for a random $k$-SAT instance with $m$ clauses, the expected number of satisfying assignments is $2^n \left( 1 - 2^{-k} \right)^m$, which goes to $0$ if $m = c \cdot 2^k n$ for a sufficiently large constant $c$.
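The expected-count computation can be checked directly on tiny instances (a quick sketch; the signed-integer literal encoding is my own):

```python
import itertools

def expected_sat_count(n, m, k):
    """E[# satisfying assignments] = 2^n * (1 - 2^-k)^m: a fixed assignment
    falsifies a uniformly random k-clause with probability exactly 2^-k."""
    return 2 ** n * (1 - 2 ** (-k)) ** m

def sat_count(n, clauses):
    """Brute-force count of satisfying assignments; a literal is a nonzero
    integer, positive for x_j and negative for its negation."""
    return sum(
        all(any((l > 0) == bits[abs(l) - 1] for l in cl) for cl in clauses)
        for bits in itertools.product([False, True], repeat=n)
    )
```

For a single 3-clause over 3 variables, exactly 7 of the 8 assignments satisfy it, matching the expectation $2^3 \cdot \frac 78 = 7$.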
