Search vs Decision vs Certification for Planted Problems

Edited 5/7/2018. Thanks to Sam Hopkins for several corrections and suggestions.

I am revising my notes from the course on “better-than-worst-case” analysis of algorithms. With the benefit of hindsight, in this post (and continuing in future posts) I would like to review again how one applies spectral methods and semidefinite programming to problems that involve a “planted” solution, and what is the role of concentration results for random matrices in the analysis of such algorithms.

Guest post by Chris Brzuska: LGBTQIA Meeting at Eurocrypt

[I was delighted to receive the following guest post by Chris Brzuska about a meeting that took place last week during Eurocrypt in Tel Aviv. This piece will also appear in Omer Reingold’s blog. Let me take this opportunity for a couple of shoutouts. Next week it’s going to be two years since Italy, last among Western European countries, has instituted same-sex civil unions (yay!) and the parties that opposed it now have an absolute majority after the last elections (boo!). The Berkeley EECS department has an LGBT+ graduate student organization called QiCSE that organizes a very visible breakfast meeting during the visit days for prospective grad students and regular meetings during the school year – as much as I value Berkeley exceptionalism, think about creating something like this in your own school. It would be great if there was a LGBT+ meeting at STOC this year; I am not going to STOC this year, but maybe someone else can take the lead. And now, on to Chris’s beautiful essay. Congratulations, Chris!. — Luca]

I gender-transitioned two years ago, and Eurocrypt 2018 in Tel-Aviv is the first major conference I attend since then. I am a bit nervous. How much time does it take for 400 people to update my name and pronouns to use “Chris” and he/him? Two years feels like an eternity to me, but surely, some people will not have heard about my gender-transition. I will need to come out to some people.

Coming-out is very empowering, but after two years and uncountable coming-outs, I really wish that everyone knows that I am trans and gay.

A gay friend of mine remarks that when being bisexual/lesbian/gay, coming out is really never over, and one needs to come out again and again, to each new person. And really, he says, there is rarely a good time to bring it up.

“How come you didn’t know I am lesbian/gay?”, I heard from several friends, in shock, worried I might have wrongly assumed they are heterosexual.

How many LGBTQIA people are in our communities? I know some LGBTQIA people in the community, but how many more are there, and how can I find them?

This simple question leads to something which would become more important to me than I expected initially.

In the rump session, I give a coming-out talk, combined with an announcement for an LGBTQIA cryptographers meeting during the rump session break ( https://eurocrypt.2018.rump.cr.yp.to/4f756d069387ee90de62454a828a3b9b.pdf).

Giving this talk in itself was very nice. I enjoyed sharing my happiness with the community, see my happiness reflected in other people’s eyes. I enjoyed the many positive comments I received during the hours and days that followed, and the recognition of daring to be visible.

During the break, I am excited and nervous. How many people will come to the meeting? And who? More than 10 people come, most of which I knew without knowing they are LGBTQIA. We walk into the room, one by one, each with light in our eyes. We came out to each other, all of us, in that moment. It’s intimate, moving, exciting. Coming out remains deeply personal. It can be daunting, even in a warm, progressive environment such as our research community and even to an LGBTQIA subgroup.

After the rump session, we go to the gay-lesbian bar Shpagat in Tel-Aviv, in happy excitement. We are the last customers that night. The next day, during the breaks, we often find ourselves with a majority of LGBTQIA people in a conversation, we sit next to each other during talks. Something important happened.

In light of our increased visibility (to each other and to the community at large), there were more opportunities for coming outs the next days (or so was my impression, although I am only conscious of 2 explicit cases…). It was very liberating for me to share many of the following conference moments with LGBTQIA cryptographers who would add additional views to a heterosexual, cissexual perspective, and who would help me explain the sensitive issue of coming out to other caring members of our research community.

The research community is my permanent country of residence, my frame of reference, the source of almost all my long-term friendships – and enfin, in this country, there live quite a few LGBTQIA people, and the research community encourages us and shares our happiness.

We are going to organize more LGBTQIA meetings alongside cryptography-related conferences. I hope, there will be more such meetings inside and outside of CS. And we look forward to see the number of LGBTQIA researchers (that we are aware of) grow.

If you are an LGBTQIA researcher who wants to get in touch with us more discretely than at a public meeting (to talk to one of us, e.g., in the beginning of your PhD etc.), you can send an eMail to queercrypt@gmail.com. You can also use that eMail address to join our mailing list (for event announcements) and/or our WhatsApp group (include your phone number if you want to join the WhatsApp group). While the group centers around cryptography-related events, the group is not limited to researchers in cryptography.

Lies, Damns Lies, and Herbert London

I am grading the final projects of my class, I am trying the clear the backlog of publishing all the class notes, I am way behind on my STOC reviews, and in two days I am taking off for a complicated two-week trips involving planes, trains and a rented automobile, as well as an ambitious plan of doing no work whatsoever from December 20 to December 31.

So, today I was browsing Facebook, and when I saw a post containing an incredibly blatant arithmetic mistake (which none of the several comments seemed to notice) I spent the rest of the morning looking up where it came from.

The goal of the post was to make the wrong claim that people have been paying more than enough money into social security (through payroll taxes) to support the current level of benefits. Indeed, since the beginning, social security has been paying individuals more than they put in, and now that population and salaries have stop growing, social security is also paying out retired people more than it gets from working people, so that the “trust fund” (whether one believes it is a real thing or an accounting fiction) will run out in the 2030s unless some change is made.

This is a complicated matter, but the post included a sentence to the extent that $4,500 a year, with an interest of 1% per year “compounded monthly”, would add up to$1,3 million after 40 years. This is not even in the right order of magnitude (it adds up to about $220k) and it should be obvious without making the calculation. Who would write such a thing, and why? My first stop was a July 2012 post on snopes, which commented on a very similar viral email. Snopes points out various mistakes (including the rate of social security payroll taxes), but the calculation in the snopes email, while based on wrong assumptions, has correct arithmetic: it says that$4,500 a year, with a 5% interest, become about $890k after 49 years. So how did the viral email with the wrong assumptions and correct arithmetic morph into the Facebook post with the same wrong assumptions but also the wrong arithmetic? I don’t know, but here is an August 2012 post on, you can’t make this stuff up, Accuracy in Media, which wikipedia describes as a “media watchdog.” The post is attributed to Herbert London, who has PhD from Columbia, is a member of the Council on Foreign Relation and used to be the president of a conservative think-tank. Currently, he has an affiliation with King’s College in New York. London’s post has the sentence I saw in the Facebook post: (…) an employer’s contribution of$375 per month at a modest one percent rate compounded over a 40 year work experience the total would be $1.3 million. The rest of the post is almost identical to the July 2012 message reported by Snopes. Where did Dr. London get his numbers? Maybe he compounded this hypothetical saving as 1% per month? No, because that would give more than$4 million. One does get about $1.3 million if one saves$375 a month for thirty years with a return of 1% per month, though.

Perhaps a more interesting question is why this “fake math” is coming back after five years. In 2012, Paul Ryan put forward a plan to “privatize” Social Security, and such a plan is now being revived. The only way to sell such a plan is to convince people that if they saved in a private account the amount of payroll taxes that “goes into” Social Security, they would get better benefits. This may be factually wrong, but that’s hardly the point.

Against a 61% Tax Increase on Berkeley Students

Currently, when graduate students work as teaching assistants, the university waives their tuition and pays them a stipend. Under current tax law, students pay income tax “only” on their stipend. A provision in the tax bill currently under consideration would count the waived tuition as income, on which the student would have to pay taxes as well.

A calculation by a Berkeley physics graduate student (source) finds that a student who work as TA for both semesters and the summer, is payed at “step 1” of the UC Berkeley salary scale, and is a California resident, currently pays ​$2,229 in federal income tax, which would become ​$3,641​ under the proposed tax plan, a 61% increase. The situation for EECS students is a bit different: they are paid at a higher scale, which puts them in a higher bracket, and they are often on a F1 visa, which means that they pay the much-higher non-resident tuition, so they would be a lot worse off (on the other hand, they usually TA at most one semester per year). The same calculation for MIT students shows a 240% tax increase. A different calculation (sorry, no link available) shows a 144% increase for a Berkeley EECS student on a F! visa.

This is one of the tax increases that go to fund the abolition of the estate tax for estates worth more than \$10.9 million, a reduction in corporate tax rates, a reduction in high-income tax rates, and other benefits for multi-millionaires.

There is also a vox explainer, and articles in inside higher ed and the chronicle of higher education with more information.

If you are a US Citizen, and if you think that graduate students should not pay for the estate tax of eight-figure estates, you should let you representative know. Usually calling, and asking to speak with the staffer responsible for tax policy, is much better than emailing or sending a physical mail. You can find the phone numbers of your representatives here.

If you have any pull in ACM, this is the kind of matter on which they might want to make a factual statement about the consequences for US computer science education, as they did at the time of the travel ban.

Beyond Worst-Case Analysis: Lecture 11

Scribed by Neng Huang

In which we use the SDP relaxation of the infinity-to-one norm and Grothendieck inequality to give an approximation reconstruction of the stochastic block model.

1. A Brief Review of the Model

First, let’s briefly review the model. We have a random graph ${G = (V, E)}$ with an unknown partition of the vertices into two equal parts ${V_1}$ and ${V_2}$. Edges across the partition are generated independently with probability ${q}$, and edges inside the partition are generated independently with probability ${p}$. To abbreviate the notation, we let ${a = pn/2}$, which is the average internal degree, and ${b = qn/2}$, which is the average external degree. Intuitively, the closer are ${a}$ and ${b}$, the more difficult it is to reconstruct the partition. We assume ${a > b}$, although there are also similar results in the complementary model where ${b}$ is larger than ${a}$. We also assume ${b > 1}$ so that the graph is not almost empty.

We will prove the following two results, the first of which will be proved using Grothendieck inequality.

1. For every ${\epsilon > 0}$, there exists a constant ${c_\epsilon}$ such that if ${a - b > c_\epsilon\sqrt{a+b}}$, then we can reconstruct the partition up to less than ${\epsilon n}$ misclassified vertices.
2. There exists a constant ${c}$ such that if ${a-b \geq c \sqrt{\log n}\sqrt{a+b}}$, then we can do exact reconstruct.

We note that the first result is essentially tight in the sense that for every ${\epsilon > 0}$, there also exists a constant ${c_\epsilon'}$ such that if ${a-b < c_\epsilon'\sqrt{a+b}}$, then it will be impossible to reconstruct the partition even if an ${\epsilon}$ fraction of misclassified vertices is allowed. Also, the constant ${c_\epsilon}$ will go to infinity as ${\epsilon}$ goes to 0, so if we want more and more accuracy, ${a-b}$ needs to be a bigger and bigger constant times ${\sqrt{a+b}}$. When the constant becomes ${O(\sqrt{\log n})}$, we will get an exact reconstruction as stated in the second result.

Theory of Computing: the Book

Over the past four decades, Avi Wigderson, figuratively, wrote the book on theoretical computer science. Now he has literally done so. I can’t wait for the movie adaptation.

I was very saddened to hear that Corrado Böhm died today at age 94.

Böhm was one of the founding fathers of Italian computer science. His dissertation, from 1951, was one of the first (maybe the first? I don’t know the history of these ideas very well) examples of a programming language with a compiler written in the language itself. In the 1950s and 1960s he worked at the CNR (an Italian national research institution with its own technical staff), in the IAC (Institute for the Applications of Computing) directed by mathematician Mauro Picone. IAC was the second place in Italy to acquire a computer. In 1970 he moved to the University of Turin, were he was the founding chairman of the computer science department. In 1972 he moved to the Sapienza University of Rome, in the Math department, and in 1989 he was one of the founders of the Computer Science department at Sapienza. He remained at Sapienza until his retirement.

Böhm became internationally known for a 1966 result, joint with Giuseppe Jacopini, in which he showed, roughly speaking, that programs written in a language that includes goto statements (formalized as flow-charts) could be mapped to equivalent programs that don’t. The point of the paper was that the translation was “structural” and the translated program retained much of the structure and the logic of the original program, meaning that programmers could give up goto statements without having to fundamentally change the way they think.

Dijkstra’s famous “Go To Statement Considered Harmful” 1968 letter to CACM had two references, one of which was the Jacopini-Böhm theorem.

Böhm was responsible for important foundational work on lambda calculus, typed functional languages, and the theory of programming languages at large.

He was a remarkable mentor, many of whose students and collaborators (including a notable number of women) became prominent in the Italian community of theory of programming languages, and Italian academia in general.

In the photo above is Böhm with Simona Ronchi, Betti Venneri and Mariangiola Dezani, who all became prominent Italian professors.

You may also recognize the man on the right as a recent recipient of the Turing Award. Silvio Micali went to Sapienza to study math as an undergrad, and he worked with Böhm, who encouraged Silvio to pursue his PhD abroad.

I studied Computer Science at Sapienza, starting the first year that the major was introduced in 1989. I remember that when I first met Böhm he reminded me of Doc Brown from Back to the Future: a tall man with crazy white hair, speaking of wild ideas with incomprehensible technical terms, but with unstoppable enthusiasm.

One year, I tried attending a small elective class that he was teaching. My, probably imprecise, recollection of the first lecture is as follows.

He said that one vertex is a binary tree, and that if you connect two binary trees to a new root you also get a binary tree, then he asked us, how would you prove statements on binary trees by induction? The class stopped until we would say something. After some consultation among us, one of the smart kids proposed “by induction on the number of vertices?” Yes, said Böhm, that would work, but isn’t there a better way? He wanted us to come up by ourselves with the insight that, since binary trees have a recursive definition, one can do induction on the structure of the definition.

In subsequent lectures, we looked (without being told) at how to construct purely functional data structures. I dropped the class after about a month.

Beyond Worst-Case Analysis: Lecture 10

Scribed by David Dinh

In which we go over a more powerful (but difficult to compute) alternative to the spectral norm, and discuss how to approximate it.

Today we’ll discuss a solution to the issue of high-degree vertices distorting spectral norms, which will prepare us for next lecture’s discussion on community detection in the stochastic block model using SDP. We’ll discuss a new kind of norm, the infinity-to-one norm, and find an efficient way to approximate it using SDP.

Beyond Worst-Case Analysis: Lecture 9

Scribed by Chinmay Nirkhe

In which we explore the Stochastic Block Model.

1. The ${G_{n,p,q}}$ problem

The Stochastic Block Model is a generic model for graphs generated by some parameters. The simplest model and one we will consider today is the ${G_{n,p,q}}$ problem.

Definition 1 (${G_{n,p,q}}$ graph distribution) The ${G_{n,p,q}}$ distribution is a distribution on graphs of ${n}$ vertices where ${V}$ is partitioned into two 2 subsets of equal size: ${V = V_1 \sqcup V_2}$. Then for ${\{i,j\}}$ pair of vertices in the same subset, ${\Pr( (i,j) \in E) = p}$ and otherwise ${\Pr( (i,j) \in E) = q}$.

We will only consider the regime under which ${p > q}$. If we want to find the partition ${V = V_1 \sqcup V_2}$, it is intuitive to look at the problem of finding the minimum balanced cut. The cut ${(V_1, V_2)}$ has expected size ${q n^2 / 4}$ and any other cut will have greater expected size.

Our intuition should be that as ${p \rightarrow q}$, the problem only gets harder. And for fixed ratio ${p/q}$, as ${p,q \rightarrow 1}$, the problem only gets easier. This can be stated rigorously as follows: If we can solve the problem for ${p,q}$ then we can also solve it for ${cp, cq}$ where ${c > 1}$, by keeping only ${1/c}$ edges and reducing to the case we can solve.

Recall that for the ${k}$-planted clique problem, we found the eigenvector ${{\bf x}}$ corresponding to the largest eigenvalue of ${A - \frac{1}{2} J}$. We then defined ${S}$ as the vertices ${i}$ with the ${k}$ largest values of ${|x_i|}$ and cleaned up ${S}$ a little to get our guess for the planted clique.

In the Stochastic Block Model we are going to follow a similar approach, but we are instead going to find the largest eigenvalue of ${A - \left( \frac{p + q}{2} \right) J}$. Note this is intuitive as the average degree of the graph is ${p(n/2 - 1) + q(n/2) \approx \frac{n}{2} (p+q)}$. The idea is simple: Solve ${{\bf x}}$ the largest eigenvector corresponding to the largest eigenvalue and define

$\displaystyle V_1 = \{ i : x_i > 0\}, \qquad V_2 = \{ i : x_i \leq 0 \} \ \ \ \ \ (1)$