More on Taiwan in a later post. Today I would like to give a couple of updates on the survey paper on average-case complexity theory that Andrej Bogdanov and I wrote in 2006: apologies to the readers for a number of missing references, and news on the front of worst-case to average-case reductions.

**1. Addendum and missing references**

In Sections 2 and 3, we discuss Levin’s proof of the analog of NP-completeness for the theory of average-case complexity. Levin proves that there is a decision problem in NP, let’s call it $C$, such that if there is an “average polynomial time” algorithm for $C$ with respect to the uniform distribution, then for every decision problem $A$ in NP and for every distribution $D$ of instances that is “polynomial time computable” in a certain technical sense, $A$ also admits an “average polynomial time” algorithm with respect to the distribution $D$.

To prove such a result, we want to take an instance $x$ sampled from $D$ and transform it into one (or more) instance $y$, such that if we feed $y$ into an “average polynomial time” algorithm for problem $C$ then

- the algorithm will also run in “average polynomial time” and
- from the answer of the algorithm we can decide whether $x \in A$ or not.

This is different from the Cook-Levin theorem in that we need the instance $y$ produced by the reduction to be approximately uniformly distributed, or else we are not guaranteed that the algorithm for $C$ that runs in “average polynomial time” with respect to the uniform distribution will also run in “average polynomial time” in whatever is the distribution of $y$.

The key idea in Levin’s proof is to use compression: for the class of “polynomial time computable” distributions that he considers, there exist essentially optimal compression algorithms, such that if $x$ is sampled from $D$ and then mapped to a compressed (but efficiently invertible) representation $z$, then $z$ is “uniformly distributed.” Then one has to map $z$ to $y$ in a way that does not mess up the distribution too much. (Above, “uniformly distributed” is in quotes because $z$ will be a string of variable length, so the sense in which it is approximately uniformly distributed has to be defined carefully.)

In order to turn this intuition into a proof, one has to give a precise definition of “average polynomial time” and of reductions between distributional problems, and prove that reductions preserve the existence of average polynomial time algorithms. These matters are quite subtle, because the first definitions that come to mind don’t quite work well, and the definitions that do work well are a bit complicated.

The paper with Andrej lacks references to several important works that clarified, simplified, and generalized Levin’s original presentation. In particular, our definition of average polynomial time, and the proof that it is preserved by reductions, is due to Russell Impagliazzo, in his famous “five worlds” paper. The presentation of the completeness result follows notes by Oded Goldreich. As in Oded’s notes, we present the completeness of a Bounded Halting problem, which we erroneously attribute to Levin, while it is due to Gurevich. After one proves the existence of one complete problem, one would like to see a series of reductions to natural problems, establishing more completeness results, that is, one would like the average-case complexity analog of Karp’s paper! This has not happened yet, although there have been a number of important average-case completeness results, which we also fail to cite.

We have submitted to the publisher an addendum to correct all the above omissions. We have already privately apologized to the authors of the above works, but we would also like to publicly apologize to the readers of the survey.

**2. Worst-case to average-case reductions**

In Section 7, Andrej and I discuss the problem of constructing “worst-case to average-case reductions” for NP-complete problems. Ideally, we would like to “collapse” Levin’s theory to the standard theory of NP-completeness, and to prove that if $P \neq NP$, or something morally equivalent like $NP \not\subseteq BPP$, then Levin’s complete problem does not admit an average polynomial time algorithm with respect to the uniform distribution (same for other complete distributional problems). More ambitiously, one may hope to base symmetric-key cryptography, or even public-key cryptography, on the assumption that $NP \not\subseteq BPP$, which, short of settling the $P$ versus $NP$ question, would be the soundest possible basis on which to base cryptography. (One would also get high confidence that the resulting cryptosystems are quantum-resistant!)

Indeed, there are problems, like the Permanent and (in a limited sense) the Discrete Log problem, in which one can take a worst-case instance $x$, and map it into a sequence of instances $y_1, \ldots, y_k$, each of which is uniformly distributed and such that solving the problem on the $y_i$ gives a solution for $x$. Thus, an average polynomial time algorithm for such problems, which are called random self-reducible, gives a worst-case (randomized) polynomial time algorithm.

Feigenbaum and Fortnow, unfortunately, prove that if a problem in NP is random self-reducible (or even randomly reducible, in a similar sense, to another problem in NP), then it is, essentially, in coNP, and so such techniques cannot apply to NP-complete problems.

In 2003, Andrej Bogdanov and I generalized this to show that if one has any non-adaptive reduction from a problem $L$ to a distributional problem $(A, D)$ where $A$ is in $NP$ and $D$ is samplable, then $L$ is in $NP/poly \cap coNP/poly$ and thus unlikely to be NP-complete. Our definition of reduction allows any non-adaptive procedure that, given an oracle that solves $A$ in average polynomial time with respect to $D$, is able to solve $L$ in worst-case polynomial time.

It remains open whether such a result holds for adaptive reductions, and my guess is that it does. I actually had a much stronger guess, namely that for every NP problem $A$ and every samplable distribution $D$, solving $A$ well on average with respect to $D$ is doable with a computational power in the ballpark of $NP \cap coNP$, perhaps in randomized polynomial time with oracle access to a language in $NP \cap coNP$. If so there would be a fundamental gap between the complexity of solving NP-complete problems in the worst case versus the average case.

A recent, very exciting paper by Shuichi Hirahara shows that my guess was wrong, and that there are distributional problems in NP with respect to samplable distributions, whose complexity is fundamentally higher than $NP \cap coNP$.

Hirahara considers an NP problem called MINKT, which is the problem of computing a resource-bounded version of Kolmogorov complexity. He shows that if MINKT has an average polynomial time algorithm, then it has a ZPP worst-case algorithm (the latter is an approximation algorithm with additive error). Furthermore, under reasonable assumptions, MINKT is *not* in coNP, or coNP/poly, or in any morally equivalent class.

So did Hirahara come up with an adaptive reduction, in order to get around the result of Andrej and myself? No, remarkably he comes up with an argument that *is not a reduction*! (More modestly, he says that he comes up with a *non-black-box* reduction.)

To clarify terminology, I prefer to call a *reduction* (what most other people prefer to call a “black box” reduction, in this setting) a way of showing how, given an oracle for a problem $B$, to solve a problem $A$; hence an algorithm for $B$ implies an algorithm for $A$.

This seems to be the only possible way of showing that if we have an algorithm for $B$ then we have an algorithm for $A$, but consider the following example. Say that $A$ is a complete problem for $\Sigma_2$, the second level of the polynomial time hierarchy; we know that if there is a polynomial time algorithm for 3SAT then there is an algorithm for $A$, however the proof does not involve a (black box, if you must) reduction from $A$ to 3SAT. In fact, we do not know how to solve $A$ in polynomial time given an oracle for 3SAT and, assuming that the polynomial time hierarchy does not collapse, *there is no way to do that*!

So how do we prove that if 3SAT is in polynomial time then $A$ is in polynomial time? We take the “algorithm” for $A$, which is an NP machine with oracle an NP machine, we replace calls to the oracle with executions of the 3SAT algorithm, and now we have a regular NP machine, which we can reduce to 3SAT. Note that, in the first step, we had to *use the code of the algorithm for 3SAT*, and use the assumption that it runs in polynomial time.

Hirahara’s argument, likewise, after assuming that MINKT has an average polynomial time algorithm, uses the code of the algorithm and the assumption about efficiency.

At this point, I do not have a good sense of the computational power that is necessary or sufficient to solve NP distributional problems well on average, but Hirahara’s paper makes me think it’s conceivable that one could base average-case hardness on $NP \not\subseteq BPP$, a possibility that I previously considered hopeless.


After Apple made history by becoming the first publicly traded company to be worth more than $1 Trillion in the US stock market, several news sources reported that, *adjusting for inflation,* the Dutch East India Company was worth, at one point in the late 1600s or early 1700s, more than $7 Trillion. Here, for example, is a Time.com article putting its adjusted value at $8 Trillion.

There were only about 600 million people in the world at that point (source), mostly living off subsistence agriculture, and that would be more than $10,000 of today’s dollars per person! Maddison estimates that the world’s GDP at that time was about $90 Billion in 1990 dollars (source) or about $180 Billion in 2018 dollars (source for 1990-2018 adjustment) and the GDP of the Netherlands at that time was about $4 Billion in 1990 dollars (source) or about $8 Billion in 2018 dollars. Could a company really be worth a thousand times the GDP of its country and 40 times the world’s GDP? Maddison’s estimates are controversial, but even Bairoch’s higher estimates put the combined GDP of North America, Europe, Russia and Japan at about $300 Billion in 2018 dollars (source).

So how much would the 1700s value of the Dutch East India Company be worth in 2018 dollars, and where did the crazy $7 Trillion come from? The answer to the first question is about $1 Billion, and the answer to the second question is not clear, but most likely someone named Alex Planes came up with that number with some creative methodology, and then the factoid made it to more and more prestigious publications, each one citing the previous occurrence, and this includes publications with high standards for fact-checking like the Atlantic.

Trying to answer the second question shows how completely broken fact-checking is whenever numbers are involved.

So let’s start with the second question: where does the $7-8 Trillion figure come from?

The Time.com article cites a Motley Fool article from 2012, which says that the company was worth, at its peak, 78 million guilders, or $7.4 Trillion.

Business Insider also mentions the factoid in 2017, attributing it to Visual Capitalist, which attributes it (in the comments!) to Motley Fool. The Independent mentions it without attribution. Several other publications have credited either Business Insider or Visual Capitalist, thus ultimately referring back to Motley Fool. For example, earlier this year, Wikipedia had the claim, referring it to Digg, which referred it to Visual Capitalist.

The Atlantic mentions it in 2015, and it references a 2010 Bloomberg article that is behind a paywall but mirrored here, which says that a recently discovered share of the company could be worth more than $700,000 at auction, because it is only the fourth such to have been preserved.

The Atlantic link is bizarre, but Motley Fool also cites Bloomberg (not a specific article, just “Bloomberg”) and this is the only Bloomberg article published before 2012 that I could find mentioning the Dutch East India Company. Furthermore, I could not find any reference to the factoid before the Motley Fool article. So my guess is that it originated in the Motley Fool article, and that it is related to the Bloomberg article. Perhaps the company had 10 million shares, and Alex Planes, the author of the Motley Fool article, simply multiplied what the certificate would sell for today by the number of shares? This is also the theory favored in this reddit thread.

What about fact-checking? The Atlantic and the Independent are the kind of publications that would freak out if someone’s last name is misspelled and issue a correction, and if someone’s name was spelled Rxsfgw in the original draft, the copy editor would probably get back to the author and tell them that that’s probably not a real name.

I would argue that the notion that a company could be worth $7 Trillion of today’s dollars in the 1600s or 1700s should look more unlikely than someone’s name being Rxsfgw, and inspire the googling that shows the number to be off by three or four orders of magnitude.

So, how much was the company worth? The Motley Fool article (and all the follow-ups) says that at its peak it was worth 78 million guilders. I could not find any reference to this claim before August 2012, so this could have also been made up in the Motley Fool article, but, taking it at face value, this currency converter puts one guilder in 1700 as being about $17 in 2018 dollars (it returns a value in 2015 dollars, which I further adjusted) so that would be $1.3 Billion (the reddit thread linked above reaches similar conclusions using other sources, and the value of a guilder is estimated as being between $9 and $20).
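The arithmetic in this post is simple enough to check in a few lines. The sketch below only replays the figures quoted above (78 million guilders, $9 to $20 per guilder with $17 as the central value, 600 million people, a $700,000 certificate); the 10 million share count is the guess made above, not a documented figure, and the function name is made up for illustration.

```python
import math

def guilders_to_usd(guilders, usd_per_guilder):
    """Convert a guilder amount to 2018 dollars at a given exchange estimate."""
    return guilders * usd_per_guilder

PEAK_GUILDERS = 78e6   # peak valuation quoted by the Motley Fool article
CLAIMED_USD = 7.4e12   # the viral "adjusted for inflation" figure

central = guilders_to_usd(PEAK_GUILDERS, 17)  # central estimate, about $1.3 Billion
low = guilders_to_usd(PEAK_GUILDERS, 9)       # low end of the guilder estimates
high = guilders_to_usd(PEAK_GUILDERS, 20)     # high end of the guilder estimates

# How many orders of magnitude separate the claim from the central estimate?
orders_off = math.log10(CLAIMED_USD / central)

# Sanity check: $7 Trillion spread over a world population of 600 million.
per_person = 7e12 / 600e6

# Conjectured origin of the factoid: certificate price times a guessed share count.
shares_guess = 700_000 * 10_000_000

print(f"central estimate: ${central:,.0f} (range ${low:,.0f} to ${high:,.0f})")
print(f"claim is off by about {orders_off:.1f} orders of magnitude")
print(f"claimed value per person alive at the time: ${per_person:,.0f}")
print(f"certificate price times guessed share count: ${shares_guess:,}")
```

Multiplying a collectible certificate's auction price by a share count reproduces the $7 Trillion figure exactly, which is at least consistent with the guess about where the number came from.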


Excellence is the sole criterion of evaluation

Here are the review criteria for the US National Science Foundation:

Reviewers evaluate all NSF proposals through the use of two National Science Board approved merit review criteria: Intellectual Merit and Broader Impacts, which are based upon Merit Review Principles. Reviewers are asked to consider five elements in the review for both criteria. For more information on merit review principles and criteria, see PAPPG Chapter III.A.

(If you are keeping track, that’s two criteria and ten principles)

*Edited 5/7/2018. Thanks to Sam Hopkins for several corrections and suggestions.*

I am revising my notes from the course on “better-than-worst-case” analysis of algorithms. With the benefit of hindsight, in this post (and continuing in future posts) I would like to review again how one applies spectral methods and semidefinite programming to problems that involve a “planted” solution, and what is the role of concentration results for random matrices in the analysis of such algorithms.

In general, a “planted” distribution is one where we start by picking a random “planted” solution and then we pick an instance of our problem in which the planted solution is good. Such problems are interesting for the average-case analysis of algorithms, because they provide a non-trivial testing ground to understand why certain algorithms perform better in practice than in the worst case. Planted problems are also of great interest in complexity theory, because when they are average-case hard they imply the existence of one-way functions. They are also a rich ground for interdisciplinary work: statisticians recognize them as parametric distributions, where the parameter is the planted solution, for which one wants to approximate the parameter given one sample, and information theorists think of them as a channel where the sender sends a planted solution and the receiver gets an instance of the problem, so that one recognizes that having a large mutual information between instance and planted solution is a necessary condition for the problem to be solvable.

For the sake of this post, it is helpful to focus on the motivating examples of planted $k$-clique and planted bisection. In planted $k$-clique, we sample a graph from the $G_{n,1/2}$ distribution, then we randomly select $k$ vertices and we add all the necessary edges to make those vertices a clique. In the planted bisection, or stochastic block model, distribution, we randomly split a set $V$ of vertices into two subsets $A$ and $B$ of equal size and then we select edges so that edges within $A$ and within $B$ have probability $p$, and edges crossing the cut $(A,B)$ have probability $q$, with $p > q$.
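The two distributions just described can be sketched as samplers. This is a minimal illustration, not code from the course notes: the function names are made up, the parameter names `n`, `k`, `p`, `q` are my own labels for the number of vertices, the clique size, and the within-side and across-cut edge probabilities, and the background graph for the planted clique is assumed to have edge probability 1/2.

```python
import itertools
import random

def sample_gnp(n, p, rng):
    """Edge set of a random graph on vertices 0..n-1, each edge present with probability p."""
    return {(u, v) for u, v in itertools.combinations(range(n), 2) if rng.random() < p}

def sample_planted_clique(n, k, rng, p=0.5):
    """Sample a random graph, then plant a clique on k randomly chosen vertices."""
    edges = sample_gnp(n, p, rng)
    clique = sorted(rng.sample(range(n), k))
    # Add every edge among the chosen vertices (pairs come out with u < v).
    edges.update(itertools.combinations(clique, 2))
    return edges, clique

def sample_planted_bisection(n, p, q, rng):
    """Split the vertices into two equal halves; edges inside a half have
    probability p, edges across the cut have probability q (n assumed even)."""
    vertices = list(range(n))
    rng.shuffle(vertices)
    side = {v: (i < n // 2) for i, v in enumerate(vertices)}
    edges = set()
    for u, v in itertools.combinations(range(n), 2):
        prob = p if side[u] == side[v] else q
        if rng.random() < prob:
            edges.add((u, v))
    return edges, side

rng = random.Random(0)
clique_edges, planted_clique = sample_planted_clique(30, 6, rng)
cut_edges, planted_side = sample_planted_bisection(20, 0.9, 0.1, rng)
```

Each sampler returns the instance together with the planted solution, which is what one needs in order to test a search algorithm against the ground truth.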

A graph sampled from the planted clique distribution is a lot like a graph, except for the extra edges that create a large clique, and a graph sampled from the distribution is a lot like a random graph, except for the cut , which is crossed by roughly edges, is sparser than a typical balanced cut, which is crossed by about edges.

**1. The Search Problem, the Decision Problem, and the Certification Problem**

Having defined these distributions, our problem of interest is:

- The *search* problem: given a sample from the planted distribution, find the planted solution, or an approximation, in a useful sense, of the planted solution.

Usually, in planted problems, the solution that is planted is of a kind that exists only with negligible, or even exponentially small, probability in the “non-planted” distribution that we start from. Thus, an algorithm that solves the search problem with noticeable probability also solves the following problem with probability close to :

- The *decision* problem of distinguishing a sample of the planted distribution from a sample of the analogous “non-planted” distribution.

For small values of , achieving distinguishing probability in the decision problem can be (seemingly) much easier than solving the search problem with probability . For example, if , we do not know any polynomial time algorithm that solves the search problem for planted -clique with success probability, but checking whether the number of edges is solves the distinguishing problem with distinguishing probability order of . Even if we change our definition of our “non-planted” distribution so that it has edges on average, the number of triangles can still be used to noticeably distinguish it from the non-planted distribution.

In the stochastic block model if we call the expected “internal” degree of a vertex (that is, the number of neighbors on the same side of the planted cut) and the expected “external” degree, and if and are absolute constants, then it is known that the search problem is not solvable, even in an approximate sense, when , as proved by Mossel, Neeman and Sly (note that our definition of and is off from theirs by a factor of 2). Provided that and are absolute constants and , however, one can distinguish from with constant distinguishing probability. This is because the number of triangles (counted as ordered sequences of three vertices) in is, on average, , while it is, on average, in ; furthermore, the variances of both distributions are absolute constants (those calculations appear in the above paper of Mossel et al.). Now if we have two nonnegative integer random variables and such that the expectations and variances of both are absolute constants and , then there is a threshold such that testing whether a number is is a test that distinguishes and with constant probability.

As far as I am aware, in all the known cases where the decision problem is efficiently solvable with distinguishing probability then it is also known how to efficiently solve the search problem (perhaps approximately) with probability, although algorithms that solve the distinguishing problem don’t always give insights into how to solve the search problem. Mossel et al., for example, show that the decision problem for the stochastic block model is efficiently solvable with distinguishing probability when , by counting the number of simple cycles of length around , and they could use the approach of counting simple cycles to approximate and , but they could not show how to solve the search problem in the regime, a problem that they solved later with additional techniques (the problem was also solved by Massoulie independently).

In summary, efficiently solving the decision problem is a necessary but not sufficient condition for being able to efficiently solve the search problem, and it is a good first step to understand the complexity of a given planted problem.

Another approach to distinguish the distribution from the planted -clique distribution is to be able to certify, given a graph sampled from , that its maximum clique size is , a fact that is going to be true with high probability if . Assuming that, like in the planted clique problem, we are discussing instances of an optimization problem, we thus have the following problem:

- The *certification* problem: given an instance sampled from the “non-planted” distribution, produce a certificate that the optimum value for the instance is worse than the optimum value of all (or all but a negligible fraction of) instances from the planted distribution.

Note that if you can solve the certification problem with probability , then you can also solve the decision problem with distinguishing probability (or minus a negligible term, if the certified property is allowed to hold with negligible probability in the planted distribution).

In some interesting cases, the certification problem appears to be harder than the decision problem and the search problem.

In the stochastic block model, for example, if one takes the min bisection problem to be the underlying optimization problem, the search problem is solvable (in the sense that one can recover a partition that correlates to the planted partition) whenever , as discussed above, but for very close to it is an open question whether even knowing the value of the min bisection exactly can be used to solve the decision problem.

Another interesting example is “planted Max 3SAT.” Suppose that we construct a satisfiable instance of 3SAT in the following way: we start from a random assignment to variables, and then we pick at random clauses among the clauses that are consistent with the “planted” assignment. If is, say , then it is very easy to solve the search problem (and hence the distinguishing problem): the value of every variable in the planted assignment can be deduced by looking at whether the variable appears more often complemented or not complemented. In the non-planted distribution in which we pick clauses among all the possible clauses, however, we don’t know any efficient algorithm that certifies that an instance is not satisfiable. Even if we pick clauses that are consistent with both the planted assignment and the negation of the planted assignment (thus eliminating correlations between complementations of variables and the values of the variables in the planted assignment), we still introduce pairwise correlations that can be used to find the planted assignment. See the comment by Ryan O’Donnell below for more information.
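The majority-vote idea for planted 3SAT can be sketched as follows. This is a toy illustration under my own assumptions: the function names are made up, clauses are stored as lists of `(variable, is_positive)` pairs, and the number of clauses (20,000 for 30 variables) is simply chosen large enough for the bias to show; it is not the threshold discussed above. Among the 8 sign patterns on three variables, 7 are consistent with the planted assignment, and in 4 of those 7 a given variable appears with the sign that agrees with its planted value, so the agreeing sign shows up more often.

```python
import random

def sample_planted_3sat(n, m, rng):
    """Pick a random assignment, then m random 3-clauses that it satisfies."""
    assignment = [rng.random() < 0.5 for _ in range(n)]
    clauses = []
    while len(clauses) < m:
        variables = rng.sample(range(n), 3)
        clause = [(v, rng.random() < 0.5) for v in variables]  # (variable, is_positive)
        # Rejection sampling: keep the clause only if some literal is true
        # under the planted assignment.
        if any(assignment[v] == positive for v, positive in clause):
            clauses.append(clause)
    return assignment, clauses

def majority_vote(n, clauses):
    """Guess each variable by whether it occurs more often positive or complemented."""
    score = [0] * n
    for clause in clauses:
        for v, positive in clause:
            score[v] += 1 if positive else -1
    return [s > 0 for s in score]

rng = random.Random(0)
planted, clauses = sample_planted_3sat(30, 20000, rng)
recovered = majority_vote(30, clauses)
agreement = sum(a == b for a, b in zip(planted, recovered))
print(f"{agreement} of 30 variables recovered by majority vote")
```

With this many clauses per variable, the sign bias is large compared to the random fluctuations, so one should expect the vote to recover the whole planted assignment.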

When we are dealing with a hard optimization problem, a natural approach is to study efficiently solvable convex *relaxations* of the problem. One can hope to use a relaxation to solve the *certification* problem, by showing the relaxation provides a bound to the optimum in non-planted instances that is sufficient to distinguish them from planted instances. One can also hope to use a relaxation to solve the *search* problem, by showing that the optimum of the relaxation is close, in an appropriate sense, to the planted solution, or even that the planted solution is the unique optimum of the relaxation for most planted instances.

Interestingly, in all the examples that I am aware of, a relaxation has been successfully used to solve the certification problem if and only if it has been successfully used to solve the search problem. Intuitively, if a relaxation does not solve the certification problem, it means that there are feasible solutions in the non-planted case whose cost is already better than the cost of the planted solution, and so the planted solution cannot be the optimum, or close to the optimum, of the relaxation in the planted case. For the other direction, if a relaxation solves the certification problem, then the proof that it does can usually be lifted to a proof that all solutions that don’t “correlate” (in an appropriate sense) with the planted solution cannot be optimal solutions in the planted case, allowing one to conclude that the optimum of the relaxation in the planted case correlates with the planted solution.

In summary, although the certification problem can be harder than the search problem and the decision problem, if one wants to use a relaxation to solve the search problem then it is good to start by understanding whether the relaxation solves the certification problem.

In the next post we will discuss how we can efficiently certify properties of random instances of optimization problems, and how to turn those results into algorithms that find planted solutions. We will see that the key results involve showing that certain matrices associated to the instances are close to their expectations in appropriately defined norms.

*[I was delighted to receive the following guest post by Chris Brzuska about a meeting that took place last week during Eurocrypt in Tel Aviv. This piece will also appear in Omer Reingold’s blog. Let me take this opportunity for a couple of shoutouts. Next week it’s going to be two years since Italy, last among Western European countries, has instituted same-sex civil unions (yay!) and the parties that opposed it now have an absolute majority after the last elections (boo!). The Berkeley EECS department has an LGBT+ graduate student organization called QiCSE that organizes a very visible breakfast meeting during the visit days for prospective grad students and regular meetings during the school year – as much as I value Berkeley exceptionalism, think about creating something like this in your own school. It would be great if there was a LGBT+ meeting at STOC this year; I am not going to STOC this year, but maybe someone else can take the lead. And now, on to Chris’s beautiful essay. Congratulations, Chris!. — Luca]*

I gender-transitioned two years ago, and Eurocrypt 2018 in Tel-Aviv is the first major conference I attend since then. I am a bit nervous. How much time does it take for 400 people to update my name and pronouns to use “Chris” and he/him? Two years feels like an eternity to me, but surely, some people will not have heard about my gender-transition. I will need to come out to some people.

Coming-out is very empowering, but after two years and uncountable coming-outs, I really wish that everyone knows that I am trans and gay.

A gay friend of mine remarks that when being bisexual/lesbian/gay, coming out is really never over, and one needs to come out again and again, to each new person. And really, he says, there is rarely a good time to bring it up.

“How come you didn’t know I am lesbian/gay?”, I heard from several friends, in shock, worried I might have wrongly assumed they are heterosexual.

How many LGBTQIA people are in our communities? I know some LGBTQIA people in the community, but how many more are there, and how can I find them?

This simple question leads to something which would become more important to me than I expected initially.

In the rump session, I give a coming-out talk, combined with an announcement for an LGBTQIA cryptographers meeting during the rump session break ( https://eurocrypt.2018.rump.cr.yp.to/4f756d069387ee90de62454a828a3b9b.pdf).

Giving this talk in itself was very nice. I enjoyed sharing my happiness with the community, and seeing my happiness reflected in other people’s eyes. I enjoyed the many positive comments I received during the hours and days that followed, and the recognition of daring to be visible.

During the break, I am excited and nervous. How many people will come to the meeting? And who? More than 10 people come, most of whom I knew without knowing they are LGBTQIA. We walk into the room, one by one, each with light in our eyes. We came out to each other, all of us, in that moment. It’s intimate, moving, exciting. Coming out remains deeply personal. It can be daunting, even in a warm, progressive environment such as our research community and even to an LGBTQIA subgroup.

After the rump session, we go to the gay-lesbian bar Shpagat in Tel-Aviv, in happy excitement. We are the last customers that night. The next day, during the breaks, we often find ourselves with a majority of LGBTQIA people in a conversation, we sit next to each other during talks. Something important happened.

In light of our increased visibility (to each other and to the community at large), there were more opportunities for coming outs the next days (or so was my impression, although I am only conscious of 2 explicit cases…). It was very liberating for me to share many of the following conference moments with LGBTQIA cryptographers who would add additional views to a heterosexual, cissexual perspective, and who would help me explain the sensitive issue of coming out to other caring members of our research community.

The research community is my permanent country of residence, my frame of reference, the source of almost all my long-term friendships – and enfin, in this country, there live quite a few LGBTQIA people, and the research community encourages us and shares our happiness.

We are going to organize more LGBTQIA meetings alongside cryptography-related conferences. I hope there will be more such meetings inside and outside of CS. And we look forward to seeing the number of LGBTQIA researchers (that we are aware of) grow.

If you are an LGBTQIA researcher who wants to get in touch with us more discreetly than at a public meeting (to talk to one of us, e.g., in the beginning of your PhD etc.), you can send an eMail to queercrypt@gmail.com. You can also use that eMail address to join our mailing list (for event announcements) and/or our WhatsApp group (include your phone number if you want to join the WhatsApp group). While the group centers around cryptography-related events, the group is not limited to researchers in cryptography.

Happy New Year!

So, today I was browsing Facebook, and when I saw a post containing an incredibly blatant arithmetic mistake (which none of the several comments seemed to notice) I spent the rest of the morning looking up where it came from.

The goal of the post was to make the wrong claim that people have been paying more than enough money into social security (through payroll taxes) to support the current level of benefits. Indeed, since the beginning, social security has been paying individuals more than they put in, and now that population and salaries have stopped growing, social security is also paying out retired people more than it gets from working people, so that the “trust fund” (whether one believes it is a real thing or an accounting fiction) will run out in the 2030s unless some change is made.

This is a complicated matter, but the post included a sentence to the effect that $4,500 a year, with an interest of 1% per year “compounded monthly”, would add up to $1.3 million after 40 years. This is not even in the right order of magnitude (it adds up to about $220k) and it should be obvious without making the calculation. Who would write such a thing, and why?

My first stop was a July 2012 post on snopes, which commented on a very similar viral email. Snopes points out various mistakes (including the rate of social security payroll taxes), but the calculation in the snopes email, while based on wrong assumptions, has correct arithmetic: it says that $4,500 a year, with a 5% interest, become about $890k after 49 years.

So how did the viral email with the wrong assumptions and correct arithmetic morph into the Facebook post with the same wrong assumptions but also the wrong arithmetic?

I don’t know, but here is an August 2012 post on, you can’t make this stuff up, Accuracy in Media, which Wikipedia describes as a “media watchdog.”

The post is attributed to Herbert London, who has a PhD from Columbia, is a member of the Council on Foreign Relations and used to be the president of a conservative think-tank. Currently, he has an affiliation with King’s College in New York. London’s post has the sentence I saw in the Facebook post:

(…) an employer’s contribution of $375 per month at a modest one percent rate compounded over a 40 year work experience the total would be $1.3 million.

The rest of the post is almost identical to the July 2012 message reported by Snopes.

Where did Dr. London get his numbers? Maybe he compounded this hypothetical saving as 1% *per month*? No, because that would give more than $4 million. One does get about $1.3 million if one saves $375 a month for *thirty* years with a return of 1% per month, though.
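The figures in this post can be rechecked with a few lines of Python. This is a minimal sketch, assuming equal deposits made at the end of each period (the function name `future_value` is just an illustrative choice); depositing at the start of each period changes the totals slightly but not the orders of magnitude.

```python
def future_value(deposit, rate_per_period, periods):
    """Value of `periods` equal end-of-period deposits at a fixed per-period rate."""
    growth = (1 + rate_per_period) ** periods
    return deposit * (growth - 1) / rate_per_period

# $375 a month for 40 years at 1% per year, compounded monthly
one_percent_yearly = future_value(375, 0.01 / 12, 40 * 12)
# $4,500 a year for 49 years at 5% per year (the email quoted by Snopes)
snopes_email = future_value(4500, 0.05, 49)
# $375 a month for 40 years at 1% per month
one_percent_monthly_40 = future_value(375, 0.01, 40 * 12)
# $375 a month for 30 years at 1% per month
one_percent_monthly_30 = future_value(375, 0.01, 30 * 12)

for label, value in [
    ("1%/year, 40 years", one_percent_yearly),
    ("5%/year, 49 years", snopes_email),
    ("1%/month, 40 years", one_percent_monthly_40),
    ("1%/month, 30 years", one_percent_monthly_30),
]:
    print(f"{label}: ${value:,.0f}")
```

Only the last scenario lands near $1.3 million.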

Perhaps a more interesting question is why this “fake math” is coming back after five years. In 2012, Paul Ryan put forward a plan to “privatize” Social Security, and such a plan is now being revived. The only way to sell such a plan is to convince people that if they saved in a private account the amount of payroll taxes that “goes into” Social Security, they would get better benefits. This may be factually wrong, but that’s hardly the point.

]]>A calculation by a Berkeley physics graduate student (source) finds that a student who works as a TA for both semesters and the summer, is paid at “step 1” of the UC Berkeley salary scale, and is a California resident, currently pays $2,229 in federal income tax, which would become $3,641 under the proposed tax plan, a 61% increase. The situation for EECS students is a bit different: they are paid at a higher scale, which puts them in a higher bracket, and they are often on an F1 visa, which means that they pay the much-higher non-resident tuition, so they would be a lot worse off (on the other hand, they usually TA at most one semester per year). The same calculation for MIT students shows a 240% tax increase. A different calculation (sorry, no link available) shows a 144% increase for a Berkeley EECS student on an F1 visa.

This is one of the tax increases that go to fund the abolition of the estate tax for estates worth more than $10.9 million, a reduction in corporate tax rates, a reduction in high-income tax rates, and other benefits for multi-millionaires.

There is also a Vox explainer, and articles in Inside Higher Ed and the Chronicle of Higher Education with more information.

If you are a US Citizen, and if you think that graduate students should not pay for the estate tax of eight-figure estates, you should let your representative know. Usually calling, and asking to speak with the staffer responsible for tax policy, is much better than emailing or sending physical mail. You can find the phone numbers of your representatives here.

If you have any pull in ACM, this is the kind of matter on which they might want to make a factual statement about the consequences for US computer science education, as they did at the time of the travel ban.

]]>Scribed by Neng Huang

*In which we use the SDP relaxation of the infinity-to-one norm and the Grothendieck inequality to give an approximate reconstruction of the stochastic block model.*

**1. A Brief Review of the Model **

First, let’s briefly review the model. We have a random graph $G = (V, E)$ on $n$ vertices with an unknown partition of the vertices into two equal parts $V_1$ and $V_2$. Edges across the partition are generated independently with probability $q$, and edges inside the partition are generated independently with probability $p$. To abbreviate the notation, we let $a = pn/2$, which is the average internal degree, and $b = qn/2$, which is the average external degree. Intuitively, the closer $a$ and $b$ are, the more difficult it is to reconstruct the partition. We assume $a > b$, although there are also similar results in the complementary model where $b$ is larger than $a$. We also assume $b \geq 1$ so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using the Grothendieck inequality.

- For every $\epsilon > 0$, there exists a constant $c_\epsilon$ such that if $a - b > c_\epsilon\sqrt{a+b}$, then we can reconstruct the partition up to fewer than $\epsilon n$ misclassified vertices.
- There exists a constant $c$ such that if $a - b > c\sqrt{\log n}\sqrt{a+b}$, then we can do exact reconstruction.

We note that the first result is essentially tight in the sense that for every $\epsilon > 0$, there also exists a constant $c'_\epsilon$ such that if $a - b < c'_\epsilon\sqrt{a+b}$, then it will be impossible to reconstruct the partition even if an $\epsilon$ fraction of misclassified vertices is allowed. Also, the constant $c_\epsilon$ will go to infinity as $\epsilon$ goes to 0, so if we want more and more accuracy, $a - b$ needs to be a bigger and bigger constant times $\sqrt{a+b}$. When the constant becomes $O(\sqrt{\log n})$, we will get an exact reconstruction as stated in the second result.
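To make the model concrete, here is a minimal sketch of a sampler in plain Python. It assumes the parametrization in which the internal and external edge probabilities are $2a/n$ and $2b/n$, where $a$ and $b$ are the average internal and external degrees; the function name `sample_sbm` and the $\pm 1$ encoding of the two sides are illustrative choices, not part of the lecture.

```python
import random

def sample_sbm(n, a, b, seed=None):
    """Sample a graph with a hidden balanced partition.

    Returns (edges, side), where side[v] is +1 or -1 (assumed encoding).
    """
    if n % 2 != 0:
        raise ValueError("n must be even so that the two parts are equal")
    rng = random.Random(seed)
    p, q = 2 * a / n, 2 * b / n          # internal / external edge probabilities
    vertices = list(range(n))
    rng.shuffle(vertices)                # hide the partition among the labels
    side = {v: (1 if i < n // 2 else -1) for i, v in enumerate(vertices)}
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if side[u] == side[v] else q
            if rng.random() < prob:
                edges.append((u, v))
    return edges, side

n = 200
edges, side = sample_sbm(n, a=20, b=5, seed=0)
internal = sum(1 for u, v in edges if side[u] == side[v])
external = len(edges) - internal
# Empirical average internal and external degrees, to compare with a and b
avg_internal_degree = 2 * internal / n
avg_external_degree = 2 * external / n
print(avg_internal_degree, avg_external_degree)
```

With these parameters the empirical averages should come out close to $a$ and $b$, up to sampling noise.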

**2. The Algorithm **

Our algorithm will be based on semi-definite programming. Intuitively, the problem of reconstructing the partition is essentially the same as the min-bisection problem, which is to find a balanced cut with the fewest edges. This is because the balanced cut with the fewest expected edges is exactly our hidden cut. Unfortunately, the min-bisection problem is NP-hard, so we will use semi-definite programming. The min-bisection problem can be stated as the following program:

\begin{equation*}
\begin{aligned}
& \text{minimize} & & \sum_{(u, v) \in E} \frac{1}{4}(x_u - x_v)^2 \\
& \text{subject to} & & x_v^2 = 1, \ \forall v \in V \\
&&& \sum_{v \in V}x_v = 0.
\end{aligned}
\end{equation*}

Its semi-definite programming relaxation, which we call SDP1, will be

\begin{equation*}
\begin{aligned}
& \text{minimize} & & \sum_{(u, v) \in E} \frac{1}{4}\|\mathbf{x}_u - \mathbf{x}_v\|^2 \\
& \text{subject to} & & \|\mathbf{x}_v\|^2 = 1, \ \forall v \in V \\
&&& \Big\|\sum_{v \in V}\mathbf{x}_v\Big\|^2 = 0.
\end{aligned}
\end{equation*}
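The combinatorial min-bisection program can be checked by brute force on a tiny graph. The sketch below is only an illustration of the objective (it takes exponential time, which is exactly why the relaxation is needed); the function name `min_bisection` and the example graph are assumptions made for the example, and $n$ is assumed to be even.

```python
from itertools import combinations

def min_bisection(n, edges):
    """Exhaustively minimize sum over edges of (x_u - x_v)^2 / 4
    over all balanced assignments x in {-1, +1}^n (n even)."""
    best_cost, best_x = None, None
    for part in combinations(range(n), n // 2):
        if 0 not in part:
            continue  # fix vertex 0 on the +1 side to skip mirror images
        chosen = set(part)
        x = [1 if v in chosen else -1 for v in range(n)]
        # (x_u - x_v)^2 is 4 for a cut edge and 0 otherwise
        cost = sum((x[u] - x[v]) ** 2 // 4 for u, v in edges)
        if best_cost is None or cost < best_cost:
            best_cost, best_x = cost, x
    return best_cost, best_x

# Two triangles joined by a single edge
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
cost, x = min_bisection(6, edges)
print(cost, x)
```

On this graph the only balanced cut of cost 1 separates the two triangles.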

Our algorithm will be as follows.

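- Solve the semi-definite program above.
- Let $\mathbf{x}_1^\ast, \ldots, \mathbf{x}_n^\ast$ be the optimal solution and $X = (X_{uv})$ such that $X_{uv} = \langle \mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle$.
- Find $\mathbf{z} = (z_1, \ldots, z_n)$, which is the eigenvector corresponding to the largest eigenvalue of $X$.
- Let $S = \{v : z_v > 0\}$, $V - S = \{v : z_v \leq 0\}$.
- Output $(S, V - S)$ as our partition.

Ideally, we want half of the $\mathbf{x}_v^\ast$’s pointing to one direction, and the other half pointing to the opposite direction. In this ideal case we will have

\begin{equation*}
X_{uv} = \begin{cases} 1 & \text{if } u \text{ and } v \text{ are on the same side of the hidden cut,} \\ -1 & \text{otherwise.} \end{cases}
\end{equation*}

Then $X$ will be a rank-one matrix and $\mathbf{1}_{V_1} - \mathbf{1}_{V_2}$, which is the indicator vector of the hidden cut (with $V_1$ and $V_2$ the two sides of the hidden partition), will be its eigenvector with eigenvalue $n$. The remaining eigenvalues of $X$ will be all zeros. So finding the eigenvector of the largest eigenvalue of $X$ will reveal the hidden cut. In reality, if $a - b > c_\epsilon\sqrt{a+b}$, then our solution will be almost the same as that in the ideal case, so the cut we get will be almost the same as the hidden cut. Furthermore, if $a - b > c\sqrt{\log n}\sqrt{a+b}$, then the unique optimal solution of the SDP will be the combinatorial solution of the min-bisection problem, that is, in the vector language, the one-dimensional solution.\footnote{“A miracle”, said Luca.}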
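The rounding steps of the algorithm (top eigenvector, then split by sign) can be sketched as follows. This is a minimal illustration in plain Python, not the lecture's implementation: solving the SDP itself is omitted, the Gram matrix passed in is the hypothetical ideal-case matrix, and power iteration is swapped in as a simple way to approximate the top eigenvector of a positive semidefinite matrix. The names `top_eigenvector` and `round_to_partition` are illustrative.

```python
import random

def top_eigenvector(X, iterations=200, seed=0):
    """Approximate the eigenvector of the largest eigenvalue of a PSD matrix
    by power iteration from a random Gaussian start."""
    rng = random.Random(seed)
    n = len(X)
    z = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(X[u][v] * z[v] for v in range(n)) for u in range(n)]
        norm = sum(t * t for t in w) ** 0.5
        z = [t / norm for t in w]
    return z

def round_to_partition(X):
    """Split the vertices by the sign of the top eigenvector's entries."""
    z = top_eigenvector(X)
    S = [v for v in range(len(X)) if z[v] > 0]
    rest = [v for v in range(len(X)) if z[v] <= 0]
    return S, rest

# Ideal case: X_uv = +1 on the same side of the hidden cut, -1 otherwise
chi = [1, 1, 1, 1, -1, -1, -1, -1]
X = [[chi[u] * chi[v] for v in range(8)] for u in range(8)]
S, rest = round_to_partition(X)
print(S, rest)
```

Since an eigenvector is only defined up to sign, the output may name either side of the cut as $S$; the partition itself is the same.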

**3. Analysis of the Algorithm **

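First, we rearrange the SDP to make it slightly simpler. We have the following SDP, which we call SDP2:

\begin{equation*}
\begin{aligned}
& \text{maximize} & & \sum_{u, v} A_{uv}\langle \mathbf{x}_u, \mathbf{x}_v\rangle \\
& \text{subject to} & & \|\mathbf{x}_v\|^2 = 1, \ \forall v \in V \\
&&& \Big\|\sum_{v \in V}\mathbf{x}_v\Big\|^2 = 0,
\end{aligned}
\end{equation*}

where $A$ is the adjacency matrix of the graph and the sum is over all ordered pairs of vertices. We note that SDP1 and SDP2 have the same optimal solution, because the cost function of SDP1 is

\begin{equation*}
\sum_{(u, v) \in E} \frac{1}{4}\|\mathbf{x}_u - \mathbf{x}_v\|^2 = \sum_{(u,v)\in E}\left(\frac{1}{2} - \frac{1}{2}\langle \mathbf{x}_u, \mathbf{x}_v\rangle\right) = \frac{|E|}{2} - \frac{1}{4}\sum_{u,v}A_{uv}\langle \mathbf{x}_u, \mathbf{x}_v\rangle .
\end{equation*}

The first term is a constant and the second is the cost function of SDP2 with a factor of $-1/4$.

Now, consider the cost in SDP2 of the solution $\mathbf{x}_1, \ldots, \mathbf{x}_n$ where

\begin{equation*}
\mathbf{x}_v = \begin{cases} (1, 0, \ldots, 0) & \text{if } v \text{ is on the first side of the hidden cut,} \\ (-1, 0, \ldots, 0) & \text{if } v \text{ is on the second side.} \end{cases}
\end{equation*}

Each internal edge contributes $+2$ to the cost and each crossing edge contributes $-2$, so, writing $p = 2a/n$ and $q = 2b/n$ for the internal and external edge probabilities, the expected cost will be

\begin{equation*}
2\left(p\cdot\frac{n}{2}\left(\frac{n}{2} - 1\right) - q\cdot\frac{n^2}{4}\right) = n(a-b) - 2a .
\end{equation*}

Since each edge is chosen independently, with high probability our cost will be at least $n(a-b) - O(n)$, which implies that the optimal solution of SDP2 will have cost at least $n(a-b) - O(n)$. Let $\mathbf{x}_1^\ast, \ldots, \mathbf{x}_n^\ast$ be the optimal solution of the SDP, then we have

\begin{equation*}
\begin{aligned}
n(a-b) - O(n) & \leq \mathrm{cost}(\mathbf{x}_1^\ast, \ldots, \mathbf{x}_n^\ast) \\
& = \sum_{u,v}A_{uv}\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle \\
& = \sum_{u,v}\left(A_{uv} - \frac{a+b}{n}\right)\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle .
\end{aligned}
\end{equation*}

In the last equality we used the fact that

\begin{equation*}
\sum_{u,v}\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle = \Big\|\sum_{v}\mathbf{x}_v^\ast\Big\|^2 = 0 .
\end{equation*}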

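When we used the spectral method last week, we said that the largest eigenvalue of $A - \frac{d}{n}J$ is large, where $d = a+b$ is the average degree and $J$ is the all-one matrix. This is because the hidden cut will give us a vector with large Rayleigh quotient. But $A - R$, where $R$ is the matrix of edge probabilities ($R_{uv} = 2a/n$ if $u$ and $v$ are on the same side of the hidden cut and $R_{uv} = 2b/n$ otherwise), has a relatively small spectral norm, so everything should come from $R - \frac{d}{n}J$, which when simplified will be $\frac{a-b}{n}$ times the matrix with 1 for entries representing vertices on the same side and -1 for entries representing vertices on different sides. We will redo this argument with SDP norm in place of spectral norm and every step appropriately adjusted.

Recall that the SDP norm of a matrix $M$ is defined to be

\begin{equation*}
\|M\|_{SDP} := \max_{\|\mathbf{x}_u\| = \|\mathbf{y}_v\| = 1} \ \sum_{u,v}M_{uv}\langle\mathbf{x}_u, \mathbf{y}_v\rangle .
\end{equation*}

Let $M = A - R$, then by the Grothendieck inequality we have

\begin{equation*}
\|M\|_{SDP} \leq O\left(\|M\|_{\infty\rightarrow 1}\right) .
\end{equation*}

We proved in the previous lecture that $\|M\|_{\infty\rightarrow 1} \leq O(n\sqrt{a+b})$ with high probability, so we know that the SDP norm $\|M\|_{SDP} \leq O(n\sqrt{a+b})$ with high probability as well. By definition, this means

\begin{equation*}
\sum_{u,v}\left(A_{uv} - R_{uv}\right)\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle \leq O(n\sqrt{a+b}) .
\end{equation*}

Subtracting this from the lower bound $n(a-b) - O(n) \leq \sum_{u,v}\left(A_{uv} - \frac{a+b}{n}\right)\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle$ derived above, we obtain

\begin{equation*}
\sum_{u,v}\left(R - \frac{a+b}{n}J\right)_{uv}\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle \geq n(a-b) - O(n\sqrt{a+b}) ,
\end{equation*}

where $J$ is the all-one matrix and the $O(n)$ term is absorbed because the graph is not almost empty, so $a + b \geq 1$. Now $R - \frac{a+b}{n}J = \frac{a-b}{n}C$, where $C_{uv} = 1$ if $u$ and $v$ are on the same side of the hidden cut and $C_{uv} = -1$ otherwise. Plugging this into the previous inequality, we get

\begin{equation*}
\frac{a-b}{n}\sum_{u,v}C_{uv}\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle \geq n(a-b) - O(n\sqrt{a+b}) ,
\end{equation*}

which can be simplified to

\begin{equation*}
\sum_{u,v}C_{uv}\langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle \geq n^2 - O\left(\frac{\sqrt{a+b}}{a-b}\right)\cdot n^2 .
\end{equation*}

For simplicity, in the following analysis the term $O\left(\frac{\sqrt{a+b}}{a-b}\right)$ will be called $\delta$, so that the sum is at least $(1-\delta)n^2$. Notice that $C$ is a matrix with 1 for nodes from the same side of the cut and -1 for nodes from different sides of the cut, and $X_{uv} = \langle\mathbf{x}_u^\ast, \mathbf{x}_v^\ast\rangle$ is an inner product of two unit vectors. If $\delta$ is very close to zero, then the sum will be very close to $n^2$. This means that $C_{uv}X_{uv}$ should be close to 1 for almost every pair $(u, v)$, which shows that $X$ is actually very close to $C$. Now, we will make this argument robust. To achieve this, we introduce the Frobenius norm of a matrix.

Definition 1 (Frobenius norm) Let $M = (M_{ij})$ be a matrix. The Frobenius norm of $M$ is

\begin{equation*}
\|M\|_F := \sqrt{\sum_{i,j}M_{ij}^2} .
\end{equation*}

The following fact is a good exercise.

Fact 2 Let $M$ be a matrix. Then

\begin{equation*}
\|M\| \leq \|M\|_F ,
\end{equation*}

where $\|\cdot\|$ denotes the spectral norm.

To see how close $X$ and $C$ are, we calculate the Frobenius norm of $X - C$. Using $|X_{uv}| \leq 1$ and $C_{uv}^2 = 1$, it will be

\begin{equation*}
\|X - C\|_F^2 = \sum_{u,v}X_{uv}^2 + \sum_{u,v}C_{uv}^2 - 2\sum_{u,v}C_{uv}X_{uv} \leq n^2 + n^2 - 2(1-\delta)n^2 = 2\delta n^2 .
\end{equation*}

This gives us a bound on the spectral norm of $X - C$, namely

\begin{equation*}
\|X - C\| \leq \|X - C\|_F \leq \sqrt{2\delta}\cdot n .
\end{equation*}

Let $\mathbf{z}$ be the unit eigenvector of $X$ corresponding to its largest eigenvalue, and let $\chi$ be the indicator vector of the hidden cut, with $\chi_v = 1$ on one side and $\chi_v = -1$ on the other, so that $\frac{1}{\sqrt n}\chi$ is the unit eigenvector of $C$ with eigenvalue $n$ and all other eigenvalues of $C$ are zero. Then by the Davis-Kahan theorem we have\footnote{When we apply the Davis-Kahan theorem, what we get is actually an upper bound on $\min\left\{\left\|\mathbf{z} - \frac{1}{\sqrt n}\chi\right\|, \left\|\mathbf{z} + \frac{1}{\sqrt n}\chi\right\|\right\}$. We have assumed here that the bound holds for $\left\|\mathbf{z} - \frac{1}{\sqrt n}\chi\right\|$, but the exact same proof will also work in the other case.}

\begin{equation*}
\left\|\mathbf{z} - \frac{1}{\sqrt n}\chi\right\| \leq O\left(\frac{\|X - C\|}{n}\right) \leq O\left(\sqrt{\delta}\right) .
\end{equation*}

For any $\epsilon > 0$, if $\frac{a-b}{\sqrt{a+b}}$ is a large enough constant then we will have $\|\sqrt{n}\,\mathbf{z} - \chi\|^2 \leq \epsilon n$. Now we have the following standard argument:

\begin{equation*}
\epsilon n \geq \|\sqrt{n}\,\mathbf{z} - \chi\|^2 = \sum_{v}\left(\sqrt{n}\,z_v - \chi_v\right)^2 \geq \left|\{v : \chi_v z_v \leq 0\}\right| .
\end{equation*}

The last inequality is because every $v$ with $\chi_v z_v \leq 0$ will contribute at least 1 in the sum $\sum_{v}(\sqrt{n}\,z_v - \chi_v)^2$, and every misclassified vertex satisfies $\chi_v z_v \leq 0$. This shows that our algorithm will misclassify at most $\epsilon n$ vertices.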
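The final counting argument can be checked numerically. The sketch below is only an illustration under assumed notation: `chi` is the $\pm 1$ indicator vector of the hidden cut, `z` is a unit vector standing in for the top eigenvector, and the noisy vector `raw` is hypothetical data chosen so that one entry has the wrong sign.

```python
def misclassified(z, chi):
    """Count vertices whose rounded side (z_v > 0 -> +1, else -1) disagrees with chi."""
    return sum(1 for zv, cv in zip(z, chi) if (1 if zv > 0 else -1) != cv)

def squared_distance(z, chi):
    """|| sqrt(n) * z - chi ||^2 for a unit vector z and a +-1 vector chi."""
    n = len(chi)
    return sum((n ** 0.5 * zv - cv) ** 2 for zv, cv in zip(z, chi))

chi = [1, 1, 1, -1, -1, -1]
raw = [0.9, 1.1, -0.2, -1.0, -0.8, -1.2]   # hypothetical; the third entry has the wrong sign
norm = sum(t * t for t in raw) ** 0.5
z = [t / norm for t in raw]                # normalize to a unit vector

errors = misclassified(z, chi)
distance = squared_distance(z, chi)
print(errors, distance)
```

Each misclassified vertex contributes at least 1 to the squared distance, so the count can never exceed it.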

]]>