A major reason for the exodus of the middle class from San Francisco, demographers say, is the high cost of housing, the highest in the mainland United States. Last month, the median cost of a dwelling in the San Francisco Standard Metropolitan Statistical Area was $129,000, according to the Federal Home Loan Bank Board in Washington, D.C. The comparable figure for New York, Newark and Jersey City was $90,400, and for Los Angeles, the second most expensive city, $118,400.

“This city dwarfs anything I’ve ever seen in terms of housing prices,” said Mr. Witte. Among the factors contributing to the high cost of housing, according to Mr. Witte and others, are its relative scarcity, since the number of housing units has not grown significantly in a decade; the influx of Asians, whose first priority is usually to buy a home; the high incidence of adults with good incomes and no children, particularly homosexuals who pool their incomes to buy homes; and the desirability of San Francisco as a place to live.

$129,000 in 1981 dollars is $360,748 in 2019 dollars.
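For the record, that adjustment is a one-line computation; below is a small Python sketch where the conversion factor is back-derived from the two quoted figures (so it is an implied factor, not an official CPI series), applied to the article’s other 1981 medians for illustration.

```python
# Back out the 1981 -> 2019 inflation factor implied by the quoted pair of
# figures ($129,000 in 1981 dollars = $360,748 in 2019 dollars), then apply
# it to the other 1981 medians from the article. Illustrative only.
FACTOR = 360_748 / 129_000  # about 2.80

def to_2019_dollars(dollars_1981: float) -> int:
    """Convert 1981 dollars to 2019 dollars using the implied factor."""
    return round(dollars_1981 * FACTOR)

print(to_2019_dollars(129_000))  # San Francisco area median
print(to_2019_dollars(118_400))  # Los Angeles median
print(to_2019_dollars(90_400))   # New York / Newark / Jersey City median
```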

Again, the algorithm has to come up with a solution *without knowing what cost functions it is supposed to be optimizing*. Furthermore, we will think of the sequence of cost functions not as being fixed in advance and unknown to the algorithm, but as being dynamically generated by an adversary, after seeing the solutions provided by the algorithm. (This resilience to adaptive adversaries will be important in most of the applications.)

The *offline optimum* after $T$ steps is the total cost that the best possible fixed solution would have incurred when evaluated against the cost functions seen by the algorithm, that is, it is the solution of

$\displaystyle \mathrm{opt}_T := \min_{x \in K} \ \sum_{t=1}^T f_t(x)$

where $K$ is the set of feasible solutions and $f_1, \ldots, f_T$ are the cost functions.

The *regret* after $T$ steps is the difference between the loss suffered by the algorithm and the offline optimum, that is,

$\displaystyle \mathrm{Regret}_T := \sum_{t=1}^T f_t(x_t) \ - \ \min_{x \in K} \sum_{t=1}^T f_t(x)$

where $x_1, \ldots, x_T$ are the solutions produced by the algorithm.

The remarkable results that we will review give algorithms that achieve regret

$\displaystyle \mathrm{Regret}_T \leq O_{K, f}(\sqrt{T})$

that is, for a fixed feasible set $K$ and fixed bounds on the cost functions $f_t$, the regret-per-time-step goes to zero with the number of steps, as $O(1/\sqrt{T})$. It is intuitive that our bounds will have to depend on how big the “diameter” of $K$ is and how large the “magnitude” and “smoothness” of the functions $f_t$ are, but depending on how we choose to formalize these quantities we will be led to define different algorithms.
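To make the flavor of these guarantees concrete, here is a minimal Python sketch (my own illustration, not from the post) of the multiplicative weights algorithm in the simplest “experts” setting, where $K$ is the set of distributions over $n$ experts and each $f_t$ is linear; with the learning rate below it guarantees regret at most $2\sqrt{T \ln n}$ against *any* loss sequence in $[0,1]$. The random losses are just a stand-in for the adversary.

```python
import math
import random

def multiplicative_weights(losses, eta):
    """Run multiplicative weights on a T x n matrix of per-expert losses
    in [0,1]; return the algorithm's total expected loss."""
    n = len(losses[0])
    weights = [1.0] * n
    total = 0.0
    for round_losses in losses:
        s = sum(weights)
        probs = [w / s for w in weights]          # play the current distribution
        total += sum(p * l for p, l in zip(probs, round_losses))
        weights = [w * math.exp(-eta * l)         # penalize lossy experts
                   for w, l in zip(weights, round_losses)]
    return total

random.seed(0)
T, n = 2000, 10
losses = [[random.random() for _ in range(n)] for _ in range(T)]
eta = math.sqrt(math.log(n) / T)                  # standard learning rate
alg_loss = multiplicative_weights(losses, eta)
best_fixed = min(sum(row[i] for row in losses) for i in range(n))
regret = alg_loss - best_fixed
print(regret / T)  # regret per step vanishes as O(sqrt(log(n) / T))
```

The $2\sqrt{T \ln n}$ bound holds even against adaptively chosen losses, since the analysis only uses that each round’s losses lie in $[0,1]$.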


- The Barak-Hardt-Kale proof of the Impagliazzo hard-core lemma.
- The online convex optimization viewpoint on the Frieze-Kannan weak regularity lemma, on the dense model theorem of (RTTV), and on the abstract weak regularity lemma of (TTV) that were described to me by Madhur Tulsiani a few years ago. Furthermore, I wanted to see if Russell Impagliazzo’s subsequent improvements to the dense model theorem and to the abstract weak regularity lemma could be recovered from this point of view.
- The Arora-Kale algorithms for semidefinite programming, including their nearly linear-time algorithm for approximating the Goemans-Williamson relaxation of Max Cut.
- The meaning of the sentence “multiplicative weights and gradient descent are both special cases of follow-the-regularized-leader, using negative entropy and squared Euclidean norm as regularizer, respectively.”
- The Allen-Zhu-Liao-Orecchia online optimization proof of the Batson-Spielman-Srivastava sparsification result.

I am happy to say that, except for the “furthermore” part of (2), I achieved my goals. To digest this material a bit better, I came up with the rather ambitious plan of writing a series of posts, in which I would alternate between (i) explaining a notion or theorem from online convex optimization (at a level that someone learning about optimization or machine learning might find useful) and (ii) explaining a complexity-theoretic application. Now that a very intense Spring semester is almost over, I plan to get started on this plan, although it is not clear that I will see it through to the end. So stay tuned for the forthcoming first episode, which will be about the good old multiplicative weights algorithm.

*in theory*’s first ever book reviews! The books are

Giorgio Garuzzo

*Quando in Italia si facevano i computer*

Available for free at Amazon.com and Amazon.it.

Giorgio Ausiello

*The Making of a New Science*

Available from Springer, as a DRM-free PDF through your academic library.

Both books talk about the early years of computing in Italy, on the industrial and academic side, respectively. They briefly intersect with the story of Olivetti’s Elea computer.

Olivetti was a company that was founded in 1908 to make typewriters, and then branched out to other office/business machines and avionics. In the 1930s, Adriano Olivetti, a son of the founder Camillo Olivetti, took over the company. Adriano Olivetti was an unusual entrepreneur, deeply interested in the arts, humanities and social sciences, with a utopian vision of a company reinvesting its profits in its community. In the 1950s, he led the company to develop the Elea, the first Italian computer. The Elea was made with transistors, and it came out before IBM had built its own first transistor-based computer.

The development of Elea was led by Mario Tchou. Mario Tchou was a Chinese-Italian born and raised in Rome, who studied electrical engineering at the Sapienza University of Rome and then at Brooklyn Polytechnic, eventually becoming an assistant professor at Columbia University. Olivetti persuaded Tchou to move back to Italy and lead the development of Elea, whose first prototype came out in 1957.

As production was ramping up, tragedy struck: Adriano Olivetti died in 1960, and Mario Tchou died in 1961. To shore up the finances of the company, the new CEO Roberto Olivetti brought in a series of new investors, who pushed to spin off the computer business.

At that point, Olivetti was working on another revolutionary machine, the P101, a programmable desktop calculator billed as the “first desktop computer,” which came out in 1964, attracting huge interest. Nonetheless, the company spun off its “computer” division into a joint venture with GE, eventually divesting it completely. Fortunately, Olivetti kept control of the P101 project, because those working on it were careful to brand it internally as a “calculator” (not part of the deal with GE) rather than a “computer.”

These events are narrated, with a fascinating insider view, in Garuzzo’s book.

Giorgio Ausiello is one of the founding fathers of academic computer science in Italy. His book is a professional memoir that starts in the 1960s, at the time in which he started working on his undergraduate thesis at the Istituto Nazionale per le Applicazioni del Calcolo (INAC, later renamed IAC) at the National Research Council in Rome. At that point INAC had one of Italy’s few computers, a machine bought in 1954 from the Ferranti company in Manchester (when it was installed, it was Italy’s *second* computer).

As narrated in a previous post, Mauro Picone, the mathematician who was leading INAC, brought Corrado Bohm to Rome to work on this computer, and Ausiello started to work with Bohm at the time when Bohm was just beginning to think about models of computation and the lambda-calculus.

Later, Ausiello visited Berkeley in the 1968-69 academic year, when Manuel Blum and Dick Karp had just joined the faculty. Ausiello took part in the first STOC, which was held in Marina del Rey in May 1969, and, later that month, he witnessed the occupation of People’s Park in Berkeley.

The Fall of 1969 marks the start of the first Italian undergraduate programs in Computer Science, in just four universities: Bari, ~~Milan,~~ Pisa and Torino. Back in Italy from Berkeley, Ausiello continued to work at the National Research Council in Rome.

The book continues with a behind-the-scenes account of the events that led to the founding of the EATCS professional society, the ICALP conference and the TCS journal. There is also another trip to Berkeley in the 1980s, featuring Silvio Micali and Vijay Vazirani working on their matching algorithm, and Shafi Goldwasser just arriving in Berkeley.

Methodically documented and very detail-oriented, the book is a fascinating read, although it sometimes leaves you wanting to hear more about the personalities and stories of the people involved, and less about the attendance lists of certain meetings.

Even when it comes to the drier details, however, I am happy that the book documents them and makes them available to future generations that will not have any living memory of the 1960s and 1970s.

I should also mention that Alon Rosen has recently interviewed Christos Papadimitriou and Avi Wigderson, and those (*long*) interviews are full of good stories. Finally, the Simons Foundation site has an interview with Laszlo Lovasz in conversation with Avi Wigderson, which I very highly recommend to everybody.

Congratulations to the Knuth prize committee chaired by Avrim Blum for the excellent choice of awarding the 2019 Knuth prize to Avi Wigderson.

Avi has worked on all aspects of computational complexity theory, and he has had a transformative influence on the way theoretical computer science relates to pure mathematics. I will not repeat what I wrote about his work on the occasion of his 60th birthday here and here. Long-term readers of *in theory* will remember that I consider him one of the saints of computational complexity.

The organizers of the upcoming FOCS would like me to remind you that the deadline is this Friday, and that someone, for some reason, has set up a fake submission site (on the domain aconf dot org); the true submission site (which, to be honest, looks less legit than the fake one) is at focs19.cs.utexas.edu.

Also, the deadline to submit nominations for the inaugural FOCS test of time award is in three weeks. There will be three awards: one for papers that appeared in FOCS 1985-89, one for FOCS 1995-99, and one for FOCS 2005-09.

On an unrelated note, GMW appeared in FOCS 1986 and the Nisan-Wigderson “Hardness versus randomness” paper appeared in FOCS 1988.

Like a certain well-known Bay Area institution, Bocconi is a private university that was endowed by a rich merchant in memory of his dead son. Initially characterized by an exclusive focus on law, economics and business, it has had for a while a high domestic recognition for the quality of teaching and, more recently, a good international profile both in teaching and research. Despite its small size, compared to Italy’s giant public universities, in 2017 Bocconi was the Italian university which had received the most ERC grants during the first ten years of existence of the European Research Council (in second place was my Alma Mater, the Sapienza University of Rome, which has about nine times more professors) (source).

About three years ago, Bocconi started planning for a move in the space of computing, in the context of their existing efforts in data science. As a first step, they recruited Riccardo Zecchina. You may remember Riccardo from his work providing a non-rigorous calculation of the threshold of random 3-SAT, his work on the “survey propagation” algorithm for SAT and other constraint satisfaction problems, as well as other work that brought statistical physics techniques to computer science. Currently, Riccardo and his group are doing very exciting work on the theory of deep learning.

Though I knew of his work, I had never met Riccardo until I attended a 2017 workshop at the Santa Fe Institute on “Thermodynamics and computation,” an invitation that I had accepted on a whim, mostly based on the fact that I had never been to New Mexico and I had really liked Breaking Bad. Riccardo had just moved to Bocconi, he told me about their future plans, and he asked me if I was interested. I initially politely declined, but one thing led to another, and now here I am putting up my San Francisco house for sale.

Last August, as I was considering this move, I applied for an ERC grant from the European Union, and I just learned that the grant has been approved. This grant is approximately the same amount as the total of all the grants that I have received from the NSF over the past twenty years, and it will support several postdoc positions, as well as visitors ranging from people coming for a week to give a talk and meet with my group, to people coming for a full-year sabbatical visit.

Although it’s a bit late for that, I am looking for postdocs starting as early as this September: if you are interested please contact me. The postdoc positions will pay a highly competitive salary, which will be free of Italian income tax (~~although American citizens will owe federal income tax to the IRS~~ correction: American citizens would not owe anything to IRS either). As a person from Rome, I am not allowed to say good things about Milan or else I will have to return my Roman card (it’s kind of a NY versus LA thing), but I think that the allure of the city speaks for itself.

Likewise, if you are a senior researcher, and you have always wanted to visit me and work together on spectral methods, approximation algorithms, graph theory or graph algorithms, but you felt that Berkeley had insufficiently many Leonardo mural paintings and opera houses, and that it was too far from the Alps, then now you are in luck!


Like public-key cryptography, deep learning was ahead of its time when first studied, but, thanks to the pioneering efforts of its founders, it was ready to be used when the technology caught up.

Mathematical developments take a long time to mature, so it is essential that applied mathematical research be done ahead of the time of its application, that is, at a time when it is basic research. Maybe quantum computing will be the next example to teach this lesson.

By the way, this summer the Simons Institute will host a program on the foundations of deep learning, co-organized by Samy Bengio, Aleks Madry, Elchanan Mossel and Matus Telgarsky.

Sometimes, it is not just the practical applications of a mathematical advance that take time to develop: the same can be true even for its *theoretical* applications! Which brings me to the next announcement of this post, namely that the call for nominations for the FOCS test of time award is out. Nominations are due in about four weeks.

Happy New Year! (新年快乐)

More on Taiwan in a later post. Today I would like to give a couple of updates on the survey paper on average-case complexity theory that Andrej Bogdanov and I wrote in 2006: apologies to the readers for a number of missing references, and news on the front of worst-case to average-case reductions.

**1. Addendum and missing references **

In Sections 2 and 3, we discuss Levin’s proof of the analog of NP-completeness for the theory of average-case complexity. Levin proves that there is a decision problem in NP, let’s call it $U$, such that if there is an “average polynomial time” algorithm for $U$ with respect to the uniform distribution, then for every decision problem $L$ in NP and for every distribution $D$ of instances that is “polynomial time computable” in a certain technical sense, $L$ also admits an “average polynomial time” algorithm with respect to the distribution $D$.

To prove such a result, we want to take an instance $x$ sampled from $D$ and transform it into one (or more) instance $y$, such that if we feed $y$ into an “average polynomial time” algorithm for problem $U$ then

- the algorithm will also run in “average polynomial time,” and
- from the answer of the algorithm we can decide whether $x \in L$ or not.

This is different from the Cook-Levin theorem in that we need the instance $y$ produced by the reduction to be approximately uniformly distributed, or else we are not guaranteed that the algorithm for $U$ that runs in “average polynomial time” with respect to the uniform distribution will also run in “average polynomial time” with respect to whatever the distribution of $y$ is.

The key idea in Levin’s proof is to use compression: for the class of “polynomial time computable” distributions that he considers, there exist essentially optimal compression algorithms, such that if $x$ is sampled from $D$ and then mapped to a compressed (but efficiently invertible) representation $z$, then $z$ is “uniformly distributed.” Then one has to map $z$ to an instance $y$ of $U$ in a way that does not mess up the distribution too much. (Above, “uniformly distributed” is in quotes because $z$ will be a string of variable length, so the sense in which it is approximately uniformly distributed has to be defined carefully.)

In order to turn this intuition into a proof, one has to give a precise definition of “average polynomial time” and of reductions between distributional problems, and prove that reductions preserve the existence of average polynomial time algorithms. These matters are quite subtle, because the first definitions that come to mind don’t quite work well, and the definitions that do work well are a bit complicated.
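These subtleties deserve a formula. One formulation that does work, essentially Levin’s definition, is the following: an algorithm with running time $t(x)$ runs in *average polynomial time* with respect to an ensemble of distributions $\{ D_n \}_n$ if there is an $\epsilon > 0$ such that

$\displaystyle \mathop{\mathbb{E}}_{x \sim D_n} \left[ \, t(x)^{\epsilon} \, \right] = O(n)$

The exponent $\epsilon$ is what makes the definition robust: the naive requirement that $\mathbb{E}[t(x)]$ be polynomial is not even closed under squaring the running time (if $t(x) = 2^n$ on a $2^{-n}$ fraction of inputs and is linear elsewhere, then $\mathbb{E}[t]$ is linear but $\mathbb{E}[t^2]$ is about $2^n$), while the condition above is preserved under polynomial slowdowns, by adjusting $\epsilon$.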

The paper with Andrej lacks references to several important works that clarified, simplified, and generalized Levin’s original presentation. In particular, our definition of average polynomial time, and the proof that it is preserved by reductions, is due to Russell Impagliazzo, in his famous “five worlds” paper. The presentation of the completeness result follows notes by Oded Goldreich. As in Oded’s notes, we present the completeness of a Bounded Halting problem, which we erroneously attribute to Levin, while it is due to Gurevich. After one proves the existence of one complete problem, one would like to see a series of reductions to natural problems, establishing more completeness results, that is, one would like the average-case complexity analog of Karp’s paper! This has not happened yet, although there have been a number of important average-case completeness results, which we also fail to cite.

We have submitted to the publisher an addendum to correct all the above omissions. We have already privately apologized to the authors of the above works, but we would also like to publicly apologize to the readers of the survey.

**2. Worst-case to average-case reductions **

In Section 7, Andrej and I discuss the problem of constructing “worst-case to average-case reductions” for NP-complete problems. Ideally, we would like to “collapse” Levin’s theory to the standard theory of NP-completeness, and to prove that if $P \neq NP$, or something morally equivalent like $NP \not\subseteq BPP$, then Levin’s complete problem does not admit an average polynomial time algorithm with respect to the uniform distribution (and the same for other complete distributional problems). More ambitiously, one may hope to base symmetric-key cryptography, or even public-key cryptography, on the assumption that $P \neq NP$, which, short of settling the $P$ versus $NP$ question, would be the soundest possible basis on which to base cryptography. (One would also get high confidence that the resulting cryptosystems are quantum-resistant!)

Indeed, there are problems, like the Permanent and (in a limited sense) the Discrete Log problem, in which one can take a worst-case instance $x$ and map it to a sequence of instances $y_1, \ldots, y_k$, each of which is uniformly distributed, and such that solving the problem on the $y_i$ gives a solution for $x$. Thus, an average polynomial time algorithm for such problems, which are called random self-reducible, gives a worst-case (randomized) polynomial time algorithm.

Feigenbaum and Fortnow, unfortunately, prove that if a problem in NP is random self-reducible (or even randomly reducible, in a similar sense, to another problem in NP), then it is, essentially, in coNP, and so such techniques cannot apply to NP-complete problems.

In 2003, Andrej Bogdanov and I generalized this to show that if one has any non-adaptive reduction from a problem $L$ to a distributional problem $(L', D)$, where $L'$ is in NP and $D$ is samplable, then $L$ is in $NP/poly \cap coNP/poly$, and is thus unlikely to be NP-complete. Our definition of reduction allows any non-adaptive procedure that, given an oracle that solves $L'$ in average polynomial time with respect to $D$, is able to solve $L$ in worst-case polynomial time.

It remains open whether such a result holds for adaptive reductions, and my guess is that it does. I actually had a much stronger guess, namely that for every NP problem $L$ and every samplable distribution $D$, solving $L$ well on average with respect to $D$ is doable with a computational power in the ballpark of $NP \cap coNP$, perhaps in randomized polynomial time with oracle access to a language in $NP \cap coNP$. If so, there would be a fundamental gap between the complexity of solving NP-complete problems in the worst case versus the average case.

A recent, very exciting paper by Shuichi Hirahara shows that my guess was wrong, and that there are distributional problems in NP, with respect to samplable distributions, whose complexity is fundamentally higher than $NP \cap coNP$.

Hirahara considers an NP problem called MINKT, which is the problem of computing a resource-bounded version of Kolmogorov complexity. He shows that if MINKT has an average polynomial time algorithm, then it has a worst-case ZPP algorithm (where the latter is an approximation algorithm with a small additive error). Furthermore, under reasonable assumptions, MINKT is *not* in coNP, or coNP/poly, or in any morally equivalent class.

So did Hirahara come up with an adaptive reduction, in order to get around the result of Andrej and myself? No, remarkably he comes up with an argument that *is not a reduction*! (More modestly, he says that he comes up with a *non-black-box* reduction.)

To clarify terminology, I prefer to call a *reduction* (what most other people prefer to call a “black box” reduction, in this setting) a way of showing that, given an oracle for a problem $B$, we can solve a problem $A$; hence an algorithm for $B$ implies an algorithm for $A$.

This seems to be the only possible way of showing that if we have an algorithm for $B$ then we have an algorithm for $A$, but consider the following example. Say that $A$ is a complete problem for $\Sigma_2$, the second level of the polynomial time hierarchy; we know that if there is a polynomial time algorithm for 3SAT then there is a polynomial time algorithm for $A$; however, the proof does not involve a (black box, if you must) reduction from $A$ to 3SAT. In fact, we do not know how to solve $A$ in polynomial time given an oracle for 3SAT and, assuming that the polynomial time hierarchy does not collapse, *there is no way to do that*!

So how do we prove that if 3SAT is in polynomial time then $A$ is in polynomial time? We take the “algorithm” for $A$, which is an NP machine with oracle access to another NP machine, we replace the calls to the oracle with executions of the 3SAT algorithm, and now we have a regular NP machine, which we can reduce to 3SAT. Note that, in the first step, we had to *use the code of the algorithm for 3SAT*, and use the assumption that it runs in polynomial time.

Hirahara’s argument, likewise, after assuming that MINKT has an average polynomial time algorithm, uses the code of the algorithm and the assumption about efficiency.

At this point, I do not have a good sense of the computational power that is necessary or sufficient to solve NP distributional problems well on average, but Hirahara’s paper makes me think it’s conceivable that one could base average-case hardness on a worst-case assumption such as $P \neq NP$, a possibility that I previously considered hopeless.


After Apple made history by becoming the first publicly traded company to be worth more than $1 Trillion in the US stock market, several news sources reported that, *adjusting for inflation,* the Dutch East India Company was worth, at one point in the late 1600s or early 1700s, more than $7 Trillion. Here, for example, is a Time.com article putting its adjusted value at $8 Trillion.

There were only about 600 million people in the world at that point (source), mostly living off subsistence agriculture, and that would be more than $10,000 of today’s dollars per person! Maddison estimates that the world’s GDP at that time was about $90 Billion in 1990 dollars (source), or about $180 Billion in 2018 dollars (source for the 1990-2018 adjustment), and that the GDP of the Netherlands at that time was about $4 Billion in 1990 dollars (source), or about $8 Billion in 2018 dollars. Could a company really be worth a thousand times the GDP of its country and 40 times the world’s GDP? Maddison’s estimates are controversial, but even Bairoch’s higher estimates put the combined GDP of North America, Europe, Russia and Japan at about $300 Billion in 2018 dollars (source).
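These back-of-the-envelope comparisons can be spelled out; here is a small Python sketch using the rounded figures quoted above (so the outputs are rough by construction).

```python
# Rough sanity checks on the $7 Trillion claim, using the figures quoted above.
claimed_value = 7e12           # claimed inflation-adjusted valuation
world_population = 600e6       # world population circa 1700
world_gdp_2018 = 180e9         # Maddison's world GDP estimate, in 2018 dollars
netherlands_gdp_2018 = 8e9     # Netherlands GDP estimate, in 2018 dollars

print(claimed_value / world_population)      # over $10,000 per living person
print(claimed_value / world_gdp_2018)        # roughly 40 times the world's GDP
print(claimed_value / netherlands_gdp_2018)  # nearly a thousand times the Dutch GDP
```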

So how much would the 1700s value of the Dutch East India Company be worth in 2018 dollars, and where did the crazy $7 Trillion come from? The answer to the first question is about $1 Billion, and the answer to the second question is not clear, but most likely someone named Alex Planes came up with that number with some creative methodology, and then the factoid made it to more and more prestigious publications, each one citing the previous occurrence, and this includes publications with high standards for fact-checking like the Atlantic.

Trying to answer the second question shows how completely broken fact-checking is whenever numbers are involved.

So let’s start with the second question: where does the $7-8 Trillion figure come from?

The Time.com article cites a Motley Fool article from 2012, which says that the company was worth, at its peak, 78 million guilders, or $7.4 Trillion.

Business Insider also mentions the factoid in 2017, attributing it to Visual Capitalist, which attributes it (in the comments!) to Motley Fool. The Independent mentions it without attribution. Several other publications have credited either Business Insider or Visual Capitalist, thus ultimately referring back to Motley Fool. For example, earlier this year, Wikipedia had the claim, referring it to Digg, which referred it to Visual Capitalist.

The Atlantic mentions it in 2015, and it references a 2010 Bloomberg article that is behind a paywall but mirrored here, which says that a recently discovered share of the company could be worth more than $700,000 at auction, because it is only the fourth such share known to have been preserved.

The Atlantic link is bizarre, but Motley Fool also cites Bloomberg (not a specific article, just “Bloomberg”), and this is the only article in Bloomberg published before 2012 that I could find mentioning the Dutch East India Company. Furthermore, I could not find any reference to the factoid before the Motley Fool article. So my guess is that it originated in the Motley Fool article, and that it is related to the Bloomberg article. Perhaps the company had 10 million shares, and Alex Planes, the author of the Motley Fool article, simply multiplied what the certificate would sell for today by the number of shares? This is also the theory favored in this reddit thread.

What about fact-checking? The Atlantic and the Independent are the kind of publications that would freak out if someone’s last name is misspelled and issue a correction, and if someone’s name was spelled Rxsfgw in the original draft, the copy editor would probably get back to the author and tell them that that’s probably not a real name.

I would argue that the notion that a company could be worth $7 Trillion of today’s dollars in the 1600s or 1700s should look more unlikely than someone’s name being Rxsfgw, and should inspire the googling that shows the number to be off by three or four orders of magnitude.

So, how much was the company worth? The Motley Fool article (and all the follow-ups) say that at its peak it was worth 78 million guilders. I could not find any reference to this claim before August 2012, so it could also have been made up in the Motley Fool article but, taking it at face value, this currency converter puts one guilder in 1700 at about $17 in 2018 dollars (it returns a value in 2015 dollars, which I further adjusted), so that would be about $1.3 Billion (the reddit thread linked above reaches similar conclusions using other sources, with the value of a guilder estimated at between $9 and $20).
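The conversion is simple enough to check mechanically; a minimal sketch, with the $17 point estimate and the $9-$20 range taken from the sources just mentioned:

```python
# Convert the claimed peak valuation of 78 million guilders into 2018 dollars.
guilders = 78e6
point, low, high = 17, 9, 20     # 2018-dollar estimates for one 1700 guilder

print(guilders * point)  # about $1.3 Billion
print(guilders * low)    # about $0.7 Billion
print(guilders * high)   # about $1.6 Billion

# Even the high end is more than three orders of magnitude below $7 Trillion.
print(7e12 / (guilders * high))
```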
