Forthcoming workshops at Bocconi

For Italy-based readers, and others who are able to make it to Milan on short notice, I would like to publicize that next week, at Bocconi, there will be a very exciting interdisciplinary workshop on average-case problems, seen from several perspectives: (1) evidence for and against the existence of efficient algorithms in different regimes coming from statistical physics, (2) evidence for and against the existence of efficient algorithms in different regimes coming from convex optimization techniques, (3) strong evidence of hardness coming from worst-case to average-case reductions, (4) applications to cryptography.

On May 31, we are going to hold a Theory Day with a very diverse range of talks by an excellent collection of invited speakers that includes Cynthia Dwork.

Participation is free for both events, but registration is required.

Feige’s Conjecture and the Magic of Kikuchi Graphs

A question that I am very interested in is whether it is possible to study hypergraphs with techniques that are in the spirit of spectral graph theory.

It is generally possible to “flatten” the adjacency tensor of a hypergraph into a matrix, especially if the hypergraph is {k}-uniform with {k} even, and spectral properties of this matrix give information about the hypergraph; usually, however, a large amount of information is lost in the process, and the approach can only be applied to rather dense hypergraphs.

If we have a 4-uniform hypergraph with {n} vertices, for example, meaning a hypergraph in which each hyperedge is a set of four elements, its {n \times n \times n \times n} adjacency tensor can be flattened to an {n^2 \times n^2} matrix. Unless the hypergraph has a number of hyperedges that is at least quadratic in {n}, however, such a matrix is too sparse to provide any useful information via basic spectral techniques.
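To make the flattening concrete, here is a minimal sketch of the 4-uniform case (my own illustration with made-up function names, not code from any reference): rows and columns are indexed by ordered pairs of vertices, and each hyperedge {\{a,b,c,d\}} contributes an entry for every way of splitting it into two pairs.

```python
import numpy as np

def flatten_4_uniform(n, hyperedges):
    """Flatten the adjacency tensor of a 4-uniform hypergraph on n
    vertices into an n^2 x n^2 matrix indexed by ordered pairs."""
    A = np.zeros((n * n, n * n))
    for e in hyperedges:
        a, b, c, d = sorted(e)
        # the three ways of splitting {a,b,c,d} into two unordered pairs
        for p, q in [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]:
            for u, v in (p, p[::-1]):
                for w, z in (q, q[::-1]):
                    A[u * n + v, w * n + z] = 1
                    A[w * n + z, u * n + v] = 1
    return A

# toy example: 6 vertices, two hyperedges; the matrix is symmetric,
# so basic spectral bounds apply to its (real) eigenvalues
A = flatten_4_uniform(6, [(0, 1, 2, 3), (1, 2, 4, 5)])
print(np.linalg.eigvalsh(A)[-1])  # largest eigenvalue
```

With {m} hyperedges this matrix has only {O(m)} nonzero entries spread over {n^2} rows, which is why, as noted above, basic spectral techniques need {m} to be at least roughly quadratic in {n}.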

Recently, a number of results of a “spectral hypergraph theory” flavor have appeared that, instead, apply spectral graph theory techniques to the Kikuchi graph associated to a hypergraph, leading to very impressive applications such as new lower bounds for locally correctable codes.
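The papers below have the exact definitions, which depend on the parity of {k} and on a level parameter {\ell}; as a rough sketch of the object in the even-{k} case, the level-{\ell} Kikuchi graph has a vertex for every {\ell}-element subset of the vertices of the hypergraph, and an edge {\{S,T\}} whenever the symmetric difference {S \oplus T} is a hyperedge. In code (a toy illustration of mine, not from the papers):

```python
from itertools import combinations

def kikuchi_graph(n, ell, hyperedges):
    """Level-ell Kikuchi graph of a k-uniform hypergraph on {0,...,n-1},
    k even: vertices are the ell-subsets, and S ~ T whenever the symmetric
    difference of S and T is a hyperedge (which forces S and T to each
    contain exactly half of that hyperedge)."""
    vertices = [frozenset(S) for S in combinations(range(n), ell)]
    hyperedge_set = {frozenset(C) for C in hyperedges}
    edges = [(S, T) for S, T in combinations(vertices, 2)
             if S ^ T in hyperedge_set]
    return vertices, edges

# a tiny 4-uniform example at level ell = 2
vertices, edges = kikuchi_graph(6, 2, [(0, 1, 2, 3), (1, 2, 4, 5)])
print(len(vertices), len(edges))  # 15 6 (three edges per hyperedge here)
```

Roughly speaking, choosing the level {\ell} appropriately makes the Kikuchi graph dense enough for spectral arguments to apply even when the hypergraph itself is sparse.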

In this post I would like to show a simple but rather magical use of this approach, which gives a proof of Feige’s conjecture concerning a “Moore bound for hypergraphs”.

In an undirected graph, the girth is the length of the shortest simple cycle, and in the previous post we told the story of trade-offs between the density of a graph and its girth, such as the Moore bound.

In a hypergraph, an interesting analog to the notion of girth is the size of the smallest even cover, where an even cover is a set of hyperedges such that every vertex belongs to an even number of hyperedges in the set. The reader should spend a minute to verify that if the hypergraph is a graph, this definition is indeed equivalent to the girth of the graph.

To see why this is a useful property, note that the hyperedges of a {k}-uniform hypergraph with vertex set V can be represented as vectors in {{\mathbb F}_2^V} in a standard way: a vector {x\in {\mathbb F}_2^V} represents the set {\{ v : x_v \neq 0 \}}. Under this representation, an even cover is a collection of hyperedges whose corresponding vectors have a linear dependency, so a {k}-uniform hypergraph with {n} vertices, {m} hyperedges, and no even cover of size {\leq L} corresponds to a construction of {m} vectors in {{\mathbb F}_2^n}, each of Hamming weight {k}, such that any {L} of them are linearly independent. Having large collections of sparse vectors that don’t have small linear dependencies is useful in several applications.
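To make the correspondence concrete, here is a toy sketch (mine, just for illustration) that checks the even cover condition by summing indicator vectors over {{\mathbb F}_2}, and finds a smallest even cover by brute force; in the 2-uniform case it finds exactly a shortest cycle, as in the exercise above.

```python
from itertools import combinations

def is_even_cover(hyperedges):
    """True if every vertex belongs to an even number of the given
    hyperedges, i.e. if their F_2 indicator vectors sum to zero."""
    parity = {}
    for e in hyperedges:
        for v in e:
            parity[v] = parity.get(v, 0) ^ 1
    return all(p == 0 for p in parity.values())

def smallest_even_cover(hyperedges):
    """Smallest even cover, by brute force (exponential; toy use only)."""
    for size in range(1, len(hyperedges) + 1):
        for subset in combinations(hyperedges, size):
            if is_even_cover(subset):
                return subset
    return None

# 2-uniform case: the smallest even cover of a graph is a shortest cycle
print(smallest_even_cover([(0, 1), (1, 2), (0, 2), (2, 3), (3, 0)]))
# -> ((0, 1), (1, 2), (0, 2)), the triangle
```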

It is easy to study the size of even covers in random hypergraphs, and a number of results about CSP refutations and SoS lower bounds rely on such calculations. Feige made the following worst-case conjecture:

Conjecture 1 (Moore Bound for Hypergraphs) If a {k}-uniform hypergraph has {n} vertices and {n\left( \frac{n}{r} \right) ^{\frac k2 -1}} hyperedges, then there must exist an even cover of size {\tilde O( r )}.

Here the “tilde” hides a {{\rm polylog}(n)} multiplicative factor. For {k=3}, for example, the conjecture asserts that a hypergraph with {n} vertices and {m} hyperedges must contain an even cover of size {\tilde O(n^3/m^2)}. For {k=4}, the bound is {\tilde O(n^2/m)}.
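To see where these expressions come from, it is enough to solve the density condition of the conjecture for {r}:

\displaystyle  m = n \left( \frac nr \right)^{\frac k2 - 1} \ \Longleftrightarrow \ \left( \frac nr \right)^{\frac {k-2}2} = \frac mn \ \Longleftrightarrow \ r = n \left( \frac nm \right)^{\frac 2{k-2}}

which gives {r = n^3/m^2} for {k=3} and {r = n^2/m} for {k=4}.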

The Feige conjecture was recently proved by Guruswami, Kothari and Manohar using Kikuchi graphs and their associated matrices. In this post, we will see a simplified proof by Hsieh, Kothari and Mohanty (we will only see the case of even {k}, which is much easier to analyze; the case of odd {k} is considerably more difficult).


The Moore Bound for Irregular Graphs

Guruswami, Kothari and Manohar recently proved a conjecture of Feige about even covers in hypergraphs, with beautiful techniques that have several applications. I would like to describe some of their ideas in a subsequent post or series of posts.

Feige’s conjecture is about a hypergraph analog of the question of how large the girth of a graph can be relative to its density, and I would like to start by sharing a “proof from the book” for the latter problem.

In an undirected graph, the girth is the length of the shortest simple cycle. If a graph has girth {g}, with {g} odd, and we perform a BFS in it starting from any vertex, stopping after no more than {(g-1)/2} hops, then we are guaranteed to keep finding new vertices at every step of the exploration: if we had two distinct paths of length at most {(g-1)/2} from the same source to the same destination, we would actually have a cycle of length at most {g-1}, contradicting the assumption on the girth. If the graph is {d}-regular, this means that, in this truncated BFS, we see at least

\displaystyle   n \geq 1 + d \cdot \sum_{i = 0}^{\frac {g-1}2 -1} (d-1)^i \ \ \ \ \ (1)

different vertices: what we are seeing are the first {\frac {g-1}2} levels of a tree in which the root has {d} children and every other internal vertex has {d-1} children. There is a similar expression for even {g}.

This remarkable property of high-girth graphs, that of looking “locally tree-like”, is very interesting and useful in certain applications, but it clearly forces a trade-off between the number of nodes and the number of edges in high-girth regular graphs. The total number of vertices of the graph has to be at least the bound (1), meaning that if {n} is the number of vertices and {d} is the degree, then the girth can be at most something like {2\log_{d-1} n}.
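As a quick illustration, here are a few lines of code (mine, just evaluating the bound (1)) that compute the Moore bound and the resulting limit on the girth:

```python
import math

def moore_bound(d, g):
    """Right-hand side of (1): the minimum number of vertices
    of a d-regular graph with odd girth g."""
    return 1 + d * sum((d - 1) ** i for i in range((g - 1) // 2))

def max_possible_girth(n, d):
    """Largest odd g compatible with (1); roughly 2*log_{d-1}(n)."""
    g = 3
    while moore_bound(d, g + 2) <= n:
        g += 2
    return g

print(moore_bound(3, 9))             # 46: girth 9 forces >= 46 vertices
print(max_possible_girth(10**6, 3))  # 37
print(2 * math.log(10**6, 2))        # ~39.9, the 2*log_{d-1}(n) estimate
```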

There are lots of questions about existence and constructions of high-girth graphs, and about the tightness of the above bound, but another really good question, raised by Bollobás, is what happens in irregular undirected graphs. Does a lower bound like (1), which is called the Moore bound, hold in irregular graphs, if we replace the regular degree {d} by the average degree {\bar d = 2|E|/|V|}? It took about 30 years for this question to be positively resolved, by Alon, Hoory and Linial.

A simplified proof was then found by Babu and Radhakrishnan, with an information-theoretic approach. I will try to explain how one might have conceived and developed the latter proof. I want to thank Lucas Pesenti for explaining the proof to me.


Introducing Bocconi’s new M.Sc. in Artificial Intelligence

This September, Bocconi will start a new M.Sc. in Artificial Intelligence. It will be a two-year computer science degree meant for students with Bachelor’s degrees in computer science, engineering, math, statistics, physics, and related quantitative fields.

In the first year, courses on algorithms, mathematical methods, optimization, information theory, and software engineering will build a foundation in math and CS; then courses on deep learning, reinforcement learning, natural language processing, and computer vision and image processing will go in depth on machine learning and some of its applications. In the second year there are various options and elective courses, with the possibility to study, for example, cryptography and blockchains, or bio-medical applications. As is common for the second year of Bocconi’s M.Sc. degrees, there will be options for exchange programs to spend a semester abroad. Students also take a seminar on ethics in AI, a project-oriented AI lab, and a course in a foreign language (not English and not the student’s native language). The language of instruction is English.

Tomorrow at 5pm CET there will be an online information session: those interested can sign up here.

More information about the degree is available at www.unibocconi.eu/ai-msc.

Applications open today and are due by May 25th.

Workshop on Fairness in AI

Next Monday, June 27, I am organizing a workshop on issues around fairness, bias and discrimination in AI and Machine Learning.

Here is a link to the program. Remote participation is possible (link on the website), and in-person participation is free, but we ask people to register so we can print badges and order the appropriate number of coffee breaks.

This workshop is being organized in partnership with EDGE, an Italian NGO that works on LGBT rights, and it is the first event of their initiative “A+I: Algoritmi + Inclusivi”, which will feature an awareness campaign and a series of video interviews that will start after the summer.

In next week’s workshop, Oreste Pollicino from Bocconi will talk about the perspective of the legal community around algorithmic discrimination, Symeon Papadopoulos from ITI Patras will give a survey on issues of fairness in image processing and image understanding, Sanghamitra Dutta from J.P. Morgan AI will talk about how to use the theory of causality to reason about fairness, Debora Nozza and Dirk Hovy from Bocconi will talk about issues of fairness in language models and natural language processing, and Omer Reingold from Stanford and Cynthia Dwork from Harvard will talk about modeling and achieving fairness in prediction models.

The last morning session will be a panel discussion moderated by Damiano Terziotti from EDGE about perspectives from the social sciences and from outside academia. It will feature, among others, Brando Benifei, a member of the EU parliament who has played a leading role in the 2021 draft EU regulations on AI. The other panel members are Alessandro Bonaita, who is a data science lead in Generali (Italy’s largest insurance company), Luisella Giani, who is leading a technology consulting branch of Oracle for Europe, Middle East and Africa, Cinzia Maiolini, who is in the national secretariat of CGIL, an Italian Union, and Massimo Airoldi from the University of Milan.

If you are in or near Milan next week, come to what is shaping up to be a memorable event!

Workshop in Milan Next Week

As previously announced, next week Alon Rosen and I are organizing a workshop at Bocconi, which will actually be the union of two workshops, one on Recent Advances in Cryptography and one on Spectral and Convex Optimization Techniques in Graph Algorithms. Here is the program. In short:

  • where: Bocconi University’s Roentgen Building (via Roentgen 1, Milano), Room AS01
  • when: June 15-18
  • what: talks on cryptography and graph algorithms, including two hours devoted to Max Flow in nearly-linear time
  • how: register for free

The First XL Computer Scientist

Some time ago, I received a message to the effect that I was being considered for membership in the “Academy of the XL”, to which my reaction was: hey, we have all gone out of shape during the pandemic, and body-shaming is never… But then it was explained to me that, in this context, “XL” means “forty”, and that the Academy of the Forty is Italy’s National Academy of Science.

Italy has a wonderfully named National Academy of Arts and Science, well known within the country: the Accademia dei Lincei, whose name means something like academy of the “eagle-eyed” (literally, lynx-eyed), that is, of people who can see far. The Accademia dei XL is much less well known, although it has a distinguished 240-year history, during which people like Guglielmo Marconi and Enrico Fermi were members. More recently, the much beloved Rita Levi-Montalcini, Holocaust survivor, Nobel Laureate, and Senator-for-life, was a member. Current members include Nobel Laureates Carlo Rubbia and Giorgio Parisi. Noted algebraist Corrado De Concini is the current president.

Be that as it may, the academicians did vote to make me a member, their first computer scientist ever. Next week, at the inauguration of their 240th academic year, I will speak to the other members about randomness and pseudorandomness in computation.

This Year, for Lent, Bocconi Gave Up Not Having a CS Department

Yesterday, Bocconi’s Rector signed the decree that created the new Computing Sciences department. This is only the ninth department to be created in our 120-year-old university, and the first new one, I believe, in a couple of decades. It is the first department with an engineering and science mission (the other eight departments are, in random order, Accounting, Marketing, Finance, Economics, Management, Social Sciences, Law, and Decision Sciences).

A few weeks ago, we were joined by Francesca Buffa and Marc Mezard.

Francesca, a computational biologist formerly at Oxford’s medical school, is now the fourth out of four tenured computer science faculty in our new department to have an active ERC grant.

Marc’s work has spanned theoretical physics, information theory and computation, including his collaboration with Giorgio Parisi on the latter’s Nobel Prize-winning work, and, most recently, he was the director of the École Normale Supérieure in Paris. When we asked for letters for his tenure case, one of the reviewers wrote, more or less in so many words, “you would be lucky to have Marc in your university, though it’s very unlikely that he will accept your offer”. At that point Marc had already accepted.

Hasselmann, Manabe and Parisi win 2021 Physics Nobel Prize


Today the Italian academic community, along with lots of other people, was delighted to hear that Giorgio Parisi is one of the three recipients of the 2021 Nobel Prize for Physics.

Parisi has been a giant in the area of understanding “complex” and “disordered” systems. Perhaps his most influential contribution has been his “replica method” for the analysis of the Sherrington-Kirkpatrick model. His ideas have led to several breakthroughs in statistical physics by Parisi and his collaborators, and they have also found applications in computer science: to tight analyses of a number of questions about combinatorial optimization on random graphs, to results on random constraint satisfaction problems (including the famous connection with random k-SAT analyzed by Mezard, Parisi and Zecchina) and random error-correcting codes, and to understanding the solution landscape in optimization problems arising from machine learning. Furthermore, these ideas have also led to the development and analysis of algorithms.

The news was particularly well received at Bocconi, where most of the faculty of the future CS department has done work that involved the replica method. (Not to be left out, even I have recently used replica methods.)

Mezard and Montanari have written a book-length treatment of the interplay between ideas from statistical physics, algorithms, optimization, information theory and coding theory that arises from this tradition. Readers of in theory looking for a shorter exposition aimed at theoretical computer scientists will enjoy these notes posted by Boaz Barak, or this even shorter post by Boaz.

In this post, I will try to give the reader a sense of what the replica method for the Sherrington-Kirkpatrick model looks like when applied to the average-case analysis of optimization problems, stripped of all the physics. Of course, without the physics, nothing makes any sense, and the interested reader should look at Boaz’s posts (and the references that he provides) for an introduction to the context. I did not have time to check too carefully what I wrote, so be aware that several details could be wrong.

What is the typical value of the max cut in a {G_{n,\frac 12}} random graph with {n} vertices?

Working out an upper bound using the union bound and Chernoff bounds, and a lower bound by thinking about a greedy algorithm, we can quickly convince ourselves that the answer is {\frac {n^2}{4} + \Theta(n^{1.5})}. Great, but what is the constant in front of the {n^{1.5}}? This question is answered by the Parisi formula, though this fact was not rigorously established by Parisi himself. (Guerra proved that the formula gives an upper bound, and Talagrand proved that it gives a tight bound.)

Some manipulations can reduce the question above to the following question: suppose that I pick a random {n\times n} symmetric matrix {M}, say with zero diagonal, and such that (up to the symmetry requirement) the entries are mutually independent and each entry is equally likely to be {+1} or {-1}, or perhaps each entry is distributed according to a standard normal distribution (the two versions can be proved to be equivalent), what is the typical value of

\displaystyle  \max _{x \in \{+1,-1\}^n } \ \ \frac 1{n^{1.5}} x^T M x

up to {o_n(1)} additive terms?
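Before doing any calculation, one can sanity-check the {n^{1.5}} scaling numerically by brute force for very small {n}; here is a quick throwaway script of mine (the limiting constant, which is what the Parisi formula computes, is approached quite slowly, so finite-size effects are significant at these sizes):

```python
import numpy as np

rng = np.random.default_rng(0)

def max_quadratic_form(M):
    """Brute-force max of x^T M x over x in {+1,-1}^n; exponential in n."""
    n = M.shape[0]
    best = -np.inf
    for b in range(2 ** (n - 1)):  # fix x[0] = +1, since x and -x tie
        x = np.array([1] + [1 if (b >> i) & 1 else -1 for i in range(n - 1)])
        best = max(best, x @ M @ x)
    return best

n, trials = 14, 20
samples = []
for _ in range(trials):
    U = np.triu(rng.standard_normal((n, n)), 1)
    M = U + U.T  # symmetric, zero diagonal, independent Gaussian entries
    samples.append(max_quadratic_form(M) / n ** 1.5)
print(np.mean(samples))  # converges (slowly, as n grows) to a constant
```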

As a first step, we could replace the maximum with a “soft-max,” and note that, for every choice of {\beta>0}, we have

\displaystyle  \max _{x \in \{+1,-1\}^n } \ \ x^T M x \leq \frac 1 \beta \log \sum_{x \in \{+1,-1\}^n } e^{\beta x^T Mx}

The above upper bound gets tighter and tighter as {\beta} grows: indeed, {e^{\beta \max_x x^TMx} \leq \sum_x e^{\beta x^TMx} \leq 2^n e^{\beta \max_x x^TMx}}, so the gap between the two sides is at most {\frac n\beta \log 2}. So if we were able to estimate

\displaystyle  \mathop{\mathbb E} \log \sum_{x \in \{+1,-1\}^n } e^{\beta x^T Mx}

for every {\beta} (where the expectation is over the randomness of {M}) then we would be in good shape.

We could definitely use convexity and write

\displaystyle  \mathop{\mathbb E} \max _{x \in \{+1,-1\}^n } \ \ x^T M x \leq \frac 1 \beta \mathop{\mathbb E} \log \sum_{x \in \{+1,-1\}^n } e^{\beta x^T Mx} \leq \frac 1 \beta \log \mathop{\mathbb E} \sum_{x \in \{+1,-1\}^n } e^{\beta x^T Mx}

and then use linearity of expectation and independence of the entries of {M} to get to

\displaystyle  \leq \frac 1 \beta \log \sum_{x \in \{+1,-1\}^n } \prod_{1\leq i < j\leq n} \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j }

Now things simplify quite a bit, because for all {i<j} the expression {M_{i,j} x_i x_j}, in the Rademacher setting, is equally likely to be {+1} or {-1}, so that, for {\beta = o(1)}, we have

\displaystyle  \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j } = \cosh (2\beta) \leq 1 + O(\beta^2) \leq e^{O(\beta^2)}

and

\displaystyle  \sum_{x \in \{+1,-1\}^n } \prod_{1\leq i < j\leq n} \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j } \leq 2^n \cdot e^{O(\beta^2 n^2)}

so that

\displaystyle  \frac 1 \beta \log \sum_{x \in \{+1,-1\}^n } \prod_{1\leq i < j\leq n} \mathop{\mathbb E} e^{2\beta M_{i,j} x_i x_j } \leq \frac {O(n)}{\beta} + O(\beta n^2)

which, choosing {\beta = 1/\sqrt n}, gives an {O(n^{1.5})} upper bound, which is in the right ballpark. Note that these are exactly the same calculations that come out of a Chernoff bound and union bound. Even if we optimize the choice of {\beta}, we unfortunately do not get the right constant in front of {n^{1.5}}.
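The choice of {\beta} comes from optimizing a bound of the form {\frac{c_1 n}{\beta} + c_2 \beta n^2}: by the AM-GM inequality,

\displaystyle  \frac {c_1 n}{\beta} + c_2 \beta n^2 \geq 2 \sqrt{c_1 c_2} \cdot n^{1.5}

with equality when {\beta = \sqrt{c_1/(c_2 n)}}, so the optimal {\beta} is of order {1/\sqrt n}, and the resulting bound is of order {n^{1.5}}, just with the wrong constant.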

So, if we call

\displaystyle  F := \sum_{x \in \{+1,-1\}^n } e^{\beta x^T Mx}

we see that we lose too much if we do the step

\displaystyle  \mathop{\mathbb E} \log F \leq \log \mathop{\mathbb E} F

But what else can we do to get rid of the logarithm and to reduce to an expression in which we take expectations of products of independent quantities (if we are not able to exploit the assumption that {M} has mutually independent entries, we will not be able to make progress)?

The idea is that if {k>0} is a small enough quantity (something much smaller than {1/\log F}), then {F^k} is close to 1 and we have the approximation

\displaystyle  \log F^k \approx F^k-1

and we obviously have

\displaystyle  \log F^k = k \log F

so we can use the approximation

\displaystyle  \log F \approx \frac 1k (F^k - 1)

and

\displaystyle  \mathop{\mathbb E} \log F \approx \frac 1k (\mathop{\mathbb E} F^k - 1)
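Indeed, {F^k = e^{k \log F} = 1 + k\log F + O(k^2 \log^2 F)}, so the approximation is good precisely when {k \ll 1/\log F}. This is trivial to check numerically, for example with the following throwaway script:

```python
import math

F = 1e6  # any fixed F > 0; log(F) = 13.8155...
for k in (1e-1, 1e-3, 1e-5, 1e-7):
    # (F^k - 1)/k converges to log(F) as k -> 0
    print(k, (F ** k - 1) / k, math.log(F))
```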

Let’s forget for a moment that we want {k} to be a very small parameter. If {k} were an integer, we would have

\displaystyle  \mathop{\mathbb E} F^k = \mathop{\mathbb E} \left( \sum_{x \in \{+1,1\}^n } e^{\beta x^T Mx} \right)^k = \sum_{x^{(1)},\ldots x^{(k)} \in \{+1,-1\}^n} \mathop{\mathbb E} e^{\beta \cdot ( x^{(1) T} M x^{(1)} + \cdots + x^{(k)T} M x^{(k)}) }

\displaystyle  = \sum_{x^{(1)},\ldots x^{(k)} \in \{+1,-1\}^n} \ \ \prod_{i< j} \ \mathop{\mathbb E} e^{2\beta M_{i,j} \cdot ( x^{(1)}_i x^{(1)}_j + \cdots + x^{(k)}_i x^{(k)}_j )}

Note that the above expression involves choices of {k}-tuples of feasible solutions of our maximization problem. These are the “replicas” in “replica method.”
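To anticipate where a {k \times k} matrix will come from in the next step (what follows is my paraphrase of a standard calculation, so the details should be taken with a grain of salt): in the Gaussian case, using the moment generating function {\mathop{\mathbb E} e^{\lambda g} = e^{\lambda^2/2}} of a standard Gaussian {g}, each inner expectation above equals

\displaystyle  \mathop{\mathbb E} e^{2\beta M_{i,j} \cdot ( x^{(1)}_i x^{(1)}_j + \cdots + x^{(k)}_i x^{(k)}_j )} = e^{2\beta^2 \left( \sum_{a=1}^k x^{(a)}_i x^{(a)}_j \right)^2} = e^{2\beta^2 \sum_{a,b} x^{(a)}_i x^{(b)}_i x^{(a)}_j x^{(b)}_j }

so, after taking the product over the pairs {i<j}, the summand depends on the replicas {x^{(1)},\ldots,x^{(k)}} only through the {k \times k} matrix of “overlaps” {Q_{a,b} = \frac 1n \sum_i x^{(a)}_i x^{(b)}_i}; this overlap matrix is, essentially, the {k \times k} input {z} that appears below.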

The above expression does not look too bad, and note how we were fully able to use the independence assumption and “simplify” the expression. Unfortunately, it is actually still very bad. At this point it is preferable to assume the {M_{i,j}} to be Gaussian, write the expectation as an integral, and do a change of variables and some tricks, so that we reduce to computing the maximum of a certain function, let’s call it {G(z)}, where the input {z} is a {k \times k} matrix; then we have to guess an input of maximum value for this function. If we are lucky, the maximum is achieved by a {z} in which all entries are identical, the replica symmetric solution. In the Sherrington-Kirkpatrick model we don’t have such luck, and the next guess is that the optimal {z} is a block-diagonal matrix, a replica symmetry-breaking solution. For large {k}, and a large number of blocks, we can approximate the choice of such matrices by writing down a system of differential equations, the Parisi equations, and we are going to assume that these equations do indeed describe an optimal {z}, and hence a solution to the integral, so that they give us a computation of {(\mathop{\mathbb E} F^k - 1)/k}.

After all this, we get an expression for {(\mathop{\mathbb E} F^k - 1)/k} for every sufficiently large integer {k}, as a function of {k} up to lower order errors. What next? Remember how we wanted {k} to be a tiny real number and not a sufficiently large integer? Well, we take the expression, we forget about the error terms, and we set {k=0\ldots}