And what has the theory of computing done for us in the last twenty years?
Differential privacy? Apple just announced it will be used in iOS 10.
Yes, and the application to preventing false discovery and overfitting is now used in production.
Ok, fine, but apart from differential privacy, what has theory done for us in the last twenty years?
Quantum algorithms? There wouldn’t be such a push to realize quantum computers if it wasn’t for Shor’s algorithm.
And quantum error correction! There would be no hope of realizing quantum computers without quantum error correction.
Very well, but apart from differential privacy and quantum computing, what has theory done for us in the …
Streaming algorithms? It all started with a theory paper and now it is a major interdisciplinary effort.
Yes, fair enough, but apart from differential privacy, quantum computing, and streaming algorithms, what has theory done for us…
Linear time decodable LDPC error-correcting codes? The first generation was not practical, but now they are part of major standards
Sure, ok, but apart from differential privacy, quantum computing, streaming algorithms, and error-correcting codes, what has theory…
Homomorphic encryption? The first-generation solutions were inefficient, but it might be only a matter of time before we have usable homomorphic encryption standards.
Linear-time SDD solvers? Algorithms like this and this are implementable and we may be one more idea away from algorithms that can be put in production.
Sublinear time algorithms like sparse FFT?
All right! But apart from differential privacy, quantum computing, streaming algorithms, error-correcting codes, homomorphic encryption, linear-time equation solvers and sub-linear time algorithms, what has the theory of computing ever done for us in the past twenty years?
. . .
[Could be continued. Indeed, please continue in the comments]
It’s like if you were on a plane and you wanted to choose a pilot. You have one person, Hillary, who says, “Here’s my license. Here’s all the thousands of flights that I’ve flown. Here’s planes I’ve flown in really difficult situations. I’ve had some good flights and some bad flights, but I’ve been flying for a very long time, and I know exactly how this plane works.” Then you’ve got Bernie, who says, “Everyone should get a ride right to their house with this plane.” “Well, how are you going to do that?” “I just think we should. It’s only fair that everyone gets to use the plane equally.” And then Trump says, “I’m going to fly so well. You’re not going to believe how good I’m going to fly this plane, and by the way, Hillary never flew a plane in her life.” “She did, and we have pictures.” “No, she never did it.”
Bernie Sanders for President of the United States
Kamala Harris for United States Senator
Nancy Pelosi for United States Representative
Scott Wiener for California State Senator
David Chiu for California State Assemblyman
Victor Hwang for Superior Court Judge
Perhaps the right starting point for this story is 1936, when Erdos and Turan conjectured that, for every $k$, if $A$ is a subset of $\{1,\ldots,N\}$ without $k$-term arithmetic progressions, then $|A| = o_k(N)$, or, equivalently, that if $A$ is a subset of the integers of positive density, then it must have arbitrarily long arithmetic progressions. Their goal in stating this conjecture was that resolving it would be a stepping stone to proving that the prime numbers have arbitrarily long arithmetic progressions. This vision came true several decades later. Szemeredi proved the conjecture in 1975, and Green and Tao proved that the primes contain arbitrarily long arithmetic progressions in 2004, with Szemeredi’s theorem being a key ingredient in their proof.
Rewinding a bit, the first progress on the Erdos-Turan conjecture came from Roth, who proved the $k=3$ case in 1953. Roth’s proof establishes that if $A \subseteq \{1,\ldots,N\}$ does not have length-3 arithmetic progressions, then $|A|$ is at most, roughly, $N/\log\log N$. Erdos also conjectured that the bound should be $o(N/\log N)$, and if this were true it would imply that the primes have infinitely many length-3 arithmetic progressions simply because of their density.
Roth’s proof uses Fourier analysis, and Meshulam, in 1995, noted that the proof becomes much cleaner, and leads to better bounds, if one looks at the analogous problem in $\mathbb{F}_p^n$, where $\mathbb{F}_p$ is a finite field (of characteristic different from 2). In this case, the question is how big $A \subseteq \mathbb{F}_p^n$ can be if it does not have three points on a line. An adaptation of Roth’s techniques gives an upper bound of the order of $p^n/n$, which, for constant $p$, is of the order of $N/\log N$ if $N = p^n$ is the size of the universe of which $A$ is a subset.
Bourgain introduced a technique to work on $\mathbb{Z}/N\mathbb{Z}$ “as if” it were a vector space over a finite field, and proved upper bounds of the order of $N/(\log N)^{1/2}$ and then $N/(\log N)^{2/3}$, up to lower-order factors, on the size of a subset of $\{1,\ldots,N\}$ without length-3 arithmetic progressions. The latest result in this line is by Sanders, who proved a bound of $N/(\log N)^{1-o(1)}$, very close to Erdos’s stronger conjecture.
How far can these results be pushed? A construction of Behrend’s shows that there is a set $A \subseteq \{1,\ldots,N\}$ with no length-3 arithmetic progression and size roughly $N/2^{\sqrt{\log N}}$. The construction is simple (it is a discretization of a sphere in $\sqrt{\log N}$ dimensions) and it has some unexpected other applications. This means that the right bound in Roth’s theorem is of the form $N^{1-o(1)}$ and that the “only” question is what the $N^{-o(1)}$ term is.
In the finite vector space case, there is no analog of Behrend’s construction, and so the size of, say, the largest subset of $\mathbb{F}_3^n$ without three points on a line was completely open, with an upper bound of the order of $3^n/n$ and lower bounds of the order of $c^n$ for some constant $c < 3$. The cap problem was the question of whether the right bound is of the form $3^{(1-o(1)) n}$ or not.
Two weeks ago, Croot, Lev and Pach proved that if $A$ is a subset of $(\mathbb{Z}/4\mathbb{Z})^n$ without length-3 arithmetic progressions, then $|A|$ is at most of the order of $4^{0.926 n}$. This was a strong indication that the right bound in the cap problem should be exponentially smaller than $3^n$.
This was established a couple of days ago by Ellenberg, who proved that an upper bound of the form $(2.756)^n$ holds in $\mathbb{F}_3^n$. The proof is not specific to $\mathbb{F}_3$ and generalizes to all finite fields.
Both proofs use the polynomial method. Roughly speaking, the method is to associate a polynomial to a set of interest (for example, by finding a non-zero low-degree polynomial that is zero for all points in the set), and then to proceed with the use of simple properties of polynomials (such as the fact that the space of polynomials of a certain degree has a bounded dimension, or that the number of zeroes of a univariate non-zero polynomial is at most the degree) applied either to the polynomial that we constructed or to the terms of its factorization.
Let $P_d$ be the vector space of $n$-variate polynomials over $\mathbb{F}_3$ of total degree at most $d$ that are cube-free (that is, such that all variables occur in monomials with degree 0, 1, or 2), and let $m_d$ be its dimension.
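If $A \subseteq \mathbb{F}_3^n$ is a set such that there are no distinct $a,b,c \in A$ such that $a+b+c=0$ (a different property from being on a line, but the draft claims that the same argument works for the property of not having three points on a line as well), then Ellenberg shows that
$$ m_d - 3^n + |A| \leq 3 m_{d/2} $$
and the bound then follows from computing that $m_{4n/3} \geq 3^n - c^n$ and $m_{2n/3} \leq c^n$ for $c \approx 2.756$.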
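The dimension counts in the last step can be explored numerically. The following is a minimal sketch, not taken from the papers discussed: it counts monomials in $n$ variables whose exponents all lie in $\{0,1,2\}$ and whose total degree is at most $d$, which is the dimension of the space of cube-free polynomials of degree at most $d$. The function names `cube_free_dimensions` and `growth_rate` are illustrative.

```python
def cube_free_dimensions(n):
    """Return a list m where m[d] is the number of monomials in n variables
    with every exponent in {0, 1, 2} and total degree at most d."""
    # counts[s] = number of exponent vectors with total degree exactly s
    counts = [1]
    for _ in range(n):
        new_counts = [0] * (len(counts) + 2)
        for s, c in enumerate(counts):
            for exponent in (0, 1, 2):
                new_counts[s + exponent] += c
        counts = new_counts
    # prefix sums turn "exactly s" into "at most d"
    m, running = [], 0
    for c in counts:
        running += c
        m.append(running)
    return m  # m[2 * n] == 3 ** n


def growth_rate(n):
    """n-th root of the number of cube-free monomials of degree at most 2n/3."""
    m = cube_free_dimensions(n)
    return m[2 * n // 3] ** (1.0 / n)


if __name__ == "__main__":
    for n in (30, 90, 270):
        print(n, growth_rate(n))
```

Since the count for degree at most $2n/3$ carries a polynomial factor on top of the exponential term, the printed values only approach the asymptotic base slowly as $n$ grows.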
The finite field Kakeya problem is another example of a problem that had resisted attacks from powerful Fourier-analytic proofs, and was solved by Zeev Dvir with a relatively simple application of the polynomial method. One may hope that the method has not yet exhausted its applicability.
Gil Kalai has posted about further consequences of the results of Croot, Lev, Pach and Ellenberg.
J.Z.: In China, we say that if you sneeze once, it means that someone is thinking of you. If you sneeze twice, it means someone is cursing you.
Me: and what does it mean when I sneeze three times or more?
J.Z.: it means you have a cold.
It would make sense if, to mitigate his negatives, Trump chose a person of color and someone who has a history of speaking out against income inequality.
He or she would have to be someone who is media-savvy and with some experience running a campaign, but definitely not a career politician. And of course he or she should be someone who endorsed Trump early on, like, say, in January.
I can think of only one person: Jimmy McMillan!
In which we prove properties of expander graphs.
1. Quasirandomness of Expander Graphs
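Recall that if $G$ is a $d$-regular graph, and $A$ is its adjacency matrix, then, if we call $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ the eigenvalues of $A$ with repetitions, we are interested in the parameter $\sigma_2 := \max_{i=2,\ldots,n} |\lambda_i|$, and we have
$$ \sigma_2 = \left\| A - \frac dn J \right\| $$
where $J$ is the matrix with a one in each entry, and $\| \cdot \|$ is the matrix norm $\|M\| := \max_{x : \|x\| = 1} \|Mx\|$.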
Our first result today is to show that, when $\sigma_2$ is small, the graph $G$ has the following quasirandomness property: for every two disjoint sets $S,T$, the number of edges between $S$ and $T$ is close to what we would expect in a random graph of average degree $d$, that is, approximately $\frac dn |S| \cdot |T|$.
For two (possibly overlapping) sets of vertices $S,T$, we define $E(S,T)$ to be the number of edges with one endpoint in $S$ and one endpoint in $T$, with edges having both endpoints in $S \cap T$, if any, counted twice.
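Lemma 1 (Expander Mixing Lemma) Let $G=(V,E)$ be a $d$-regular graph, and let $S$ and $T$ be two disjoint subsets of vertices. Then
$$ \left| E(S,T) - \frac dn \cdot |S| \cdot |T| \right| \leq \sigma_2 \cdot \sqrt{|S| \cdot |T|} $$
Proof: Let ${\bf 1}_S$ and ${\bf 1}_T$ denote the indicator vectors of $S$ and $T$. We have
$$ E(S,T) = {\bf 1}_S^T A {\bf 1}_T $$
and
$$ |S| \cdot |T| = {\bf 1}_S^T J {\bf 1}_T $$
so
$$ \left| E(S,T) - \frac dn \cdot |S| \cdot |T| \right| = \left| {\bf 1}_S^T \left( A - \frac dn J \right) {\bf 1}_T \right| \leq \| {\bf 1}_S \| \cdot \left\| A - \frac dn J \right\| \cdot \| {\bf 1}_T \| = \sqrt{|S|} \cdot \sigma_2 \cdot \sqrt{|T|} $$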
Note that, for every disjoint $S,T$, we have $\sqrt{|S| \cdot |T|} \leq n/2$, and so the right-hand side in the expander mixing lemma is at most $\sigma_2 n/2$, which is a small fraction of the total number of edges $dn/2$ if $\sigma_2$ is small compared to $d$.
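As a sanity check, the inequality of the lemma can be verified by brute force on a small regular graph. The sketch below is only an illustration under stated assumptions: it uses the cycle on $n$ vertices (a 2-regular graph, which is not a good expander, although the lemma holds for every regular graph), and it relies on the standard fact that the adjacency eigenvalues of the cycle are $2\cos(2\pi k/n)$. All function names are made up for this example.

```python
import math
from itertools import product


def cycle_sigma2(n):
    """Second largest eigenvalue in absolute value of the n-cycle's adjacency
    matrix, using the closed form 2*cos(2*pi*k/n) for k = 1, ..., n-1."""
    return max(abs(2 * math.cos(2 * math.pi * k / n)) for k in range(1, n))


def edges_between(n, S, T):
    """Number of cycle edges with one endpoint in S and one in T (S, T disjoint)."""
    return sum(1 for u in S for v in ((u + 1) % n, (u - 1) % n) if v in T)


def worst_mixing_slack(n, d=2):
    """Minimum over all disjoint pairs (S, T) of
    sigma2 * sqrt(|S||T|) - |E(S,T) - (d/n)|S||T||.
    The expander mixing lemma says this is never negative."""
    sigma2 = cycle_sigma2(n)
    worst = float("inf")
    # label 1 puts a vertex in S, label 2 puts it in T, label 0 in neither
    for labels in product((0, 1, 2), repeat=n):
        S = {v for v in range(n) if labels[v] == 1}
        T = {v for v in range(n) if labels[v] == 2}
        lhs = abs(edges_between(n, S, T) - d * len(S) * len(T) / n)
        rhs = sigma2 * math.sqrt(len(S) * len(T))
        worst = min(worst, rhs - lhs)
    return worst


if __name__ == "__main__":
    print(worst_mixing_slack(7))
```

The enumeration is exponential in $n$ (there are $3^n$ labelings), so this is only meant for very small graphs.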
2. Random Walks in Expanders
A $t$-step random walk is the probabilistic process in which we start at a vertex, then we pick uniformly at random one of the edges incident on the current vertex and we move to the other endpoint of the edge, and then repeat this process $t$ times.
If $M$ is the normalized adjacency matrix of an undirected regular graph $G$, then $M(u,v)$ is the probability that, in one step, a random walk started at $u$ reaches $v$. This is why the normalized adjacency matrix of a regular graph is also called its transition matrix.
Suppose that we start a random walk at a vertex chosen according to a probability distribution $p$, which we think of as a vector $p \in \mathbb{R}^V$ such that $p(u) \geq 0$ for every $u$ and $\sum_u p(u) = 1$. After taking one step, the probability of being at vertex $v$ is $\sum_u p(u) M(u,v)$, which means that the probability distribution after one step is described by the vector $p^T M$, and because of the symmetry of $M$, this is the same as $Mp$.
Iterating the above reasoning, we see that, after a $t$-step random walk whose initial vertex is chosen according to distribution $p$, the last vertex reached by the walk is distributed according to $M^t p$.
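The parameter $\sigma_2$ of the matrix $M^t$ is equal to $\sigma_2(M)^t$, where $\sigma_2(M) = \sigma_2/d$ denotes the second largest eigenvalue of $M$ in absolute value, and so if $G$ has a parameter $\sigma_2(M)$ bounded away from one, and if $t$ is large enough, we have that the parameter $\sigma_2(M^t)$ of $M^t$ is very small, and so $M^t$ is close to $\frac 1n J$ in matrix norm. If $M^t$ was actually equal to $\frac 1n J$, then $M^t p$ would be equal to the uniform distribution, for every distribution $p$. We would thus expect $M^t p$ to be close to the uniform distribution for large enough $t$.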
Before formalizing the above intuition, we need to fix a good measure of distance for distributions. If we think of distributions as vectors, then a possible notion of distance between two distributions is the Euclidean distance between the corresponding vectors. This definition, however, has various shortcomings and, in particular, can assign small distance to distributions that are intuitively very different. For example, suppose that $p$ and $q$ are distributions that are uniform over a set $S$, and over the complement of $S$, respectively, where $S$ is a set of size $n/2$. Then all the entries of $p-q$ are $\pm 2/n$ and so $\|p-q\| = 2/\sqrt n$, which is vanishingly small even though distributions over disjoint supports should be considered as maximally different distributions.
A very good measure is the total variation distance, defined as
$$ \max_{S \subseteq V} \left| \sum_{v\in S} p(v) - \sum_{v\in S} q(v) \right| $$
that is, as the maximum over all events of the difference between the probability of the event happening with respect to one distribution and the probability of it happening with respect to the other distribution. This measure is usually called statistical distance in computer science. It is easy to check that the total variation distance between $p$ and $q$ is precisely $\frac 12 \cdot \|p-q\|_1$. Distributions with disjoint support have total variation distance 1, which is largest possible.
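The contrast between the two distance measures is easy to see numerically. The following small sketch (function names are illustrative) builds the two distributions from the example above, one uniform on half of the points and one uniform on the other half, and computes both distances, using the characterization of total variation distance as half of the $\ell_1$ distance.

```python
import math


def euclidean_distance(p, q):
    """Euclidean (l2) distance between two distributions given as lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_variation_distance(p, q):
    """Half of the l1 distance, which equals the maximum over events S
    of |p(S) - q(S)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def half_and_half(n):
    """p is uniform on the first n/2 points, q on the last n/2 points (n even)."""
    half = n // 2
    p = [2.0 / n] * half + [0.0] * half
    q = [0.0] * half + [2.0 / n] * half
    return p, q


if __name__ == "__main__":
    for n in (100, 10000, 1000000):
        p, q = half_and_half(n)
        print(n, euclidean_distance(p, q), total_variation_distance(p, q))
```

The Euclidean distance shrinks like $2/\sqrt n$ while the total variation distance stays at 1, which is the behavior described in the text.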
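Lemma 2 (Mixing Time of Random Walks in Expanders) Let $G$ be a regular graph, and $M$ be its normalized adjacency matrix. Then for every distribution $p$ over the vertices and every $t$, we have
$$ \| u - M^t p \|_1 \leq \sqrt n \cdot \sigma_2(M)^t $$
where $u$ is the uniform distribution.
In particular, if $t > \frac{c}{1-\sigma_2(M)} \cdot \ln \frac n \epsilon$, then $\| u - M^t p \|_1 \leq \epsilon$, where $c$ is an absolute constant.
Proof: Let $\bar J = \frac 1n J$ be the normalized adjacency matrix of a clique with self-loops. Then, for every distribution $p$, we have $\bar J p = u$. Recall also that $\sigma_2(M) = \| M - \bar J \|$ and, likewise, $\sigma_2(M)^t = \| M^t - \bar J \|$.
We have
$$ \| u - M^t p \|_1 \leq \sqrt n \cdot \| u - M^t p \| = \sqrt n \cdot \| \bar J p - M^t p \| \leq \sqrt n \cdot \| \bar J - M^t \| \cdot \| p \| \leq \sqrt n \cdot \sigma_2(M)^t $$
where the first step is Cauchy-Schwarz and the last step uses $\|p\| \leq 1$.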
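The bound of the lemma can be watched in action on a small example. The sketch below is an illustration under assumptions, not part of the lecture: it uses the cycle with an odd number of vertices (a poor expander, chosen because its normalized eigenvalues $\cos(2\pi k/n)$ are known in closed form), starts the walk at a single vertex, and compares the $\ell_1$ distance from uniform with $\sqrt n \cdot \sigma^t$ at each step. The function names are made up.

```python
import math


def step(p):
    """One step of the random walk on the n-cycle: returns M p, where M is
    the normalized adjacency matrix of the cycle."""
    n = len(p)
    return [0.5 * (p[(v - 1) % n] + p[(v + 1) % n]) for v in range(n)]


def l1_distance_to_uniform(p):
    """l1 distance between p and the uniform distribution."""
    n = len(p)
    return sum(abs(x - 1.0 / n) for x in p)


def check_mixing_bound(n, steps):
    """Start at vertex 0 and return triples (t, distance, bound), where
    bound = sqrt(n) * sigma**t and sigma is the second largest eigenvalue
    in absolute value of the normalized adjacency matrix of the n-cycle."""
    sigma = max(abs(math.cos(2 * math.pi * k / n)) for k in range(1, n))
    p = [1.0] + [0.0] * (n - 1)
    results = []
    for t in range(steps + 1):
        results.append((t, l1_distance_to_uniform(p), math.sqrt(n) * sigma ** t))
        p = step(p)
    return results


if __name__ == "__main__":
    for t, dist, bound in check_mixing_bound(9, 200)[::25]:
        print(t, dist, bound)
```

For an even cycle the graph is bipartite, $\sigma = 1$, and the bound is vacuous, which matches the fact that the walk does not converge in that case.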
The last result that we discussed today is one more instantiation of the general phenomenon that “if $\sigma_2(M)$ is small then a result that is true for the clique is true, within some approximation, for $G$.”
Suppose that we take a $(t-1)$-step random walk in a regular graph $G$ starting from a uniformly distributed initial vertex. If $G$ is a clique with self-loops, then the sequence of $t$ vertices encountered in the random walk is a sequence of $t$ independent, uniformly distributed, vertices. In particular, if $f: V \rightarrow [0,1]$ is a bounded function, the Chernoff-Hoeffding bounds tell us that the empirical average of $f$ over the $t$ points of the random walk is very close to the true average of $f$, except with very small probability, that is, if we denote by $v_1,\ldots,v_t$ the vertices encountered in the random walk, we have
$$ \Pr \left[ \frac 1t \sum_{i=1}^t f(v_i) \geq \mathbb{E} f + \epsilon \right] \leq e^{-2\epsilon^2 t} $$
where $\mathbb{E} f := \frac 1n \sum_{v \in V} f(v)$. A corresponding Chernoff-Hoeffding bound can be proved for the case in which the random walk is taken over a regular graph such that $\sigma_2(M)$ is small.
Lemma 3 (Chernoff-Hoeffding Bound for Random Walks in Expanders) Let $G=(V,E)$ be a regular graph with normalized adjacency matrix $M$, and $(v_1,\ldots,v_t)$ the distribution of $t$-tuples constructed by sampling $v_1$ uniformly at random, and then performing a $(t-1)$-step random walk starting at $v_1$. Let $f: V \rightarrow [0,1]$ be any bounded function. Then
$$ \Pr \left[ \frac 1t \sum_{i=1}^t f(v_i) \geq \mathbb{E} f + \epsilon + \sigma_2(M) \right] \leq e^{-\Omega(\epsilon^2 t)} $$
We will not prove the above result, but we briefly discuss one of its many applications.
Suppose that we have a polynomial-time probabilistic algorithm that, on inputs of length $n$, uses $r(n)$ random bits and then outputs the correct answer with probability, say, at least $2/3$. One standard way to reduce the error probability is to run the algorithm $t$ times, using independent randomness each time, and then take the answer that comes out a majority of the times. (This is for problems in which we want to compute a function exactly; in combinatorial optimization we would run the algorithm $t$ times and take the best solution, and in an application in which the algorithm performs an approximate function evaluation we would run the algorithm $t$ times and take the median. The reasoning that follows for the case of exact function computation can be applied to the other settings as well.)
On average, the number of iterations of the algorithm that give a correct answer is at least $2t/3$, and the cases in which the majority is erroneous correspond to cases in which the number of iterations giving a correct answer is at most $t/2$. This means that the case in which the modified algorithm makes a mistake corresponds to the case in which the empirical average of $t$ independent 0/1 random variables deviates from its expectation by more than $2/3 - 1/2 = 1/6$, which can happen with probability at most $e^{-t/18}$, which becomes vanishingly small for large $t$.
This approach uses $t \cdot r(n)$ random bits. Suppose, instead, that we consider the following algorithm: pick $t$ random strings for the algorithm by performing a $(t-1)$-step random walk in an expander graph of degree $O(1)$ with $2^{r(n)}$ vertices and such that $\sigma_2(M) \leq 1/12$, and then take the majority answer. A calculation using the Chernoff bound for expander graphs shows that the error probability is $e^{-\Omega(t)}$, and it is achieved using only $r(n) + O(t)$ random bits instead of $t \cdot r(n)$.
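To make the saving in randomness concrete, here is a small arithmetic sketch. It assumes that the start vertex of the walk costs $r$ random bits and that each subsequent step costs $\lceil \log_2 d \rceil$ bits to choose one of the $d$ incident edges; the degree $d$ and all function names are hypothetical, chosen only for illustration.

```python
import math


def hoeffding_majority_error(t):
    """Upper bound e^{-t/18} on the probability that the average of t
    independent 0/1 variables falls more than 1/6 below its expectation."""
    return math.exp(-t / 18.0)


def bits_independent(r, t):
    """Random bits used by t independent runs, each using r bits."""
    return r * t


def bits_expander_walk(r, t, degree):
    """Random bits used by a walk that produces t strings: r bits for the
    start vertex, then ceil(log2(degree)) bits for each of the t - 1 steps."""
    return r + (t - 1) * math.ceil(math.log2(degree))


if __name__ == "__main__":
    r, t, degree = 1000, 180, 16  # hypothetical parameters
    print(hoeffding_majority_error(t))
    print(bits_independent(r, t), bits_expander_walk(r, t, degree))
```

The second count grows as $r + O(t)$ for constant degree, against $r \cdot t$ for independent repetitions.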
In which we present a probabilistic construction of expanders.
We have seen a combinatorial construction of expanders, based on the zig-zag graph product, and an algebraic one. Today we see how to use the probabilistic method to show that random graphs, selected from an appropriate probability distribution, are expanders with high probability.
A non-trivial part of today’s lecture is the choice of distribution. For us, a family of expanders is a family of regular graphs of fixed degree, but if we pick a graph at random according to the Erdős-Rényi distribution, selecting each pair $\{u,v\}$ to be an edge independently with probability $p$, then we do not get a regular graph and, indeed, not even a bounded-degree graph. (Even when $p = O(1/n)$, ensuring constant average degree, the maximum degree is of the order of $\log n / \log\log n$.)
This means that we have to study distributions of graphs in which there are correlations between edges, which are often difficult to reason about.
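We could study the expansion of random $d$-regular graphs, but that is a particularly challenging distribution of graphs to analyze. Instead, the following distributions over $d$-regular graphs are usually considered over a vertex set $V$ of size $n$: in the first, we pick $d$ random perfect matchings over $V$ and let the edge set be their union; in the second, we pick $d/2$ random permutations $\pi_1,\ldots,\pi_{d/2}$ of $V$ and connect each vertex $v$ to $\pi_i(v)$ for every $i$.
The first method is applicable when $n$ is even, and the second method is applicable when $d$ is even. (When $n$ and $d$ are both odd, it is not possible to have an $n$-vertex $d$-regular graph, because the number of edges in such a graph is $dn/2$.)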
We will study the expansion of graphs generated according to the first distribution, and show that there exists an integer and a , such that a random regular graph on vertices has probability at least of having edge expansion at least .
In particular, we will show that for , the probability that a random -regular graph has expansion is at least . Our bounds will be very loose, and much tighter analyses are possible.
We have to show that, with high probability over the choice of the graph, every set of vertices with has at least edges leaving it.
The common approach to prove that a random object satisfies a certain collection of properties is to prove that each property holds with high probability, and then to use a union bound to show that all properties are simultaneously true with high probability. For a -regular graph to have expansion , we want every set of size to have at least outgoing edges; a naive approach would be to show that such a property holds for every fixed set except with probability at most , and then take a union bound over all sets of size .
Unfortunately the naive approach does not work, because the probability that small sets fail to expand is much higher than . For example, the probability that a fixed set of nodes form a clique is at least of the order of . Fortunately, the number of small sets is small, and if the probability of a fixed set of size being non-expanding is, say, at most , then, by taking a union bound over all sets of size , the probability that there is a non-expanding set of size is at most , and then by taking a union bound over all sizes we get that the probability that there is a non-expanding set is at most inverse polynomial in .
Let denote the set of nodes that have at least a neighbor in . If , then there are at most edges leaving from . In order to upper bound the probability that there are edges leaving , we will upper bound the probability that .
It will be convenient to have the following model in mind for how a random perfect matching is chosen. Let be an arbitrary ordering of the vertices such that , then the following algorithm samples a random perfect matching over :
It is easy to see that the above algorithm has $(n-1) \cdot (n-3) \cdots 3 \cdot 1$ possible outputs, each equally likely, each distinct, and that $(n-1) \cdot (n-3) \cdots 3 \cdot 1$ is also the number of perfect matchings over a set of $n$ vertices, so that the algorithm indeed samples a uniformly distributed perfect matching.
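A sampling procedure of this kind can be sketched as follows. This is an assumption about the shape of the algorithm, consistent with the count of outputs above but not copied from the notes: repeatedly take the first unmatched vertex in the given ordering and match it to a uniformly random other unmatched vertex. The second function builds the union of $d$ such matchings. Both function names are illustrative.

```python
import random


def random_perfect_matching(vertices, rng=random):
    """Sample a perfect matching of `vertices` (an even-length sequence),
    processing vertices in the order in which they are given."""
    unmatched = list(vertices)
    if len(unmatched) % 2 != 0:
        raise ValueError("a perfect matching needs an even number of vertices")
    matching = []
    while unmatched:
        v = unmatched.pop(0)  # first unmatched vertex in the ordering
        w = unmatched.pop(rng.randrange(len(unmatched)))  # uniform partner
        matching.append((v, w))
    return matching


def union_of_matchings(n, d, rng=random):
    """Edge list (parallel edges possible) of the union of d independent
    random perfect matchings on the vertex set {0, ..., n-1}."""
    edges = []
    for _ in range(d):
        edges.extend(random_perfect_matching(range(n), rng))
    return edges


if __name__ == "__main__":
    print(random_perfect_matching(list(range(8))))
    print(union_of_matchings(6, 3))
```

With $n$ vertices the loop makes $n-1$, then $n-3$, and so on, equally likely choices, which gives the product $(n-1)(n-3)\cdots 3 \cdot 1$ of possible outputs. The resulting graph is $d$-regular as a multigraph, since two matchings may share an edge.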
Now, fix a set of size and a set of size . The probability that, in a random matching, the vertices of are all matched to vertices in is at most the probability that, during the first executions of the “while” loop, the randomly selected vertex is in .
For , conditioned on the first iterations picking a vertex , the probability that this happens on the -th iteration is the number of unmatched vertices in which is , divided by the total number of unmatched vertices in , which is .
Thus, the probability that, in a random matching, all vertices of are matched to vertices in is at most
when we pick as the union of random matchings, the probability that all the neighbors of are in is at most the above bound raised to the power of :
and taking a union bound over all choices of of size and all choices of of size , we have
Now, for , taking a union bound over all (in a graph without self-loops, every singleton set is expanding), we have
In which we present an algebraic construction of expanders.
1. The Margulis-Gabber-Galil Expanders
We present a construction of expander graphs due to Margulis, which was the first explicit construction of expanders, and its analysis due to Gabber and Galil. The analysis presented here includes later simplifications, and it follows an exposition of James Lee.
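For every $n$, we construct graphs with $n^2$ vertices, and we think of the vertex set as $\mathbb{Z}_n \times \mathbb{Z}_n$, the group of pairs from $\{0,\ldots,n-1\} \times \{0,\ldots,n-1\}$ where the group operation is coordinate-wise addition modulo $n$.
Define the functions $S(a,b) := (a,a+b)$ and $T(a,b) := (a+b,b)$, where all operations are modulo $n$. Then the graph $G_n$ has vertex set $\mathbb{Z}_n \times \mathbb{Z}_n$ and the vertex $(a,b)$ is connected to the vertices
$$ (a+1,b), \ (a-1,b), \ (a,b+1), \ (a,b-1), \ S(a,b), \ S^{-1}(a,b), \ T(a,b), \ T^{-1}(a,b) $$
so that $G_n$ is an 8-regular graph. (The graph has parallel edges and self-loops.)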
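The construction is explicit enough to be written down in a few lines. The sketch below assumes the eight neighbors are the four unit shifts together with $S$, $S^{-1}$, $T$ and $T^{-1}$, where $S(a,b) = (a, a+b)$ and $T(a,b) = (a+b, b)$ modulo $n$; the function names are illustrative. Neighbors are kept as lists so that parallel edges and self-loops are counted with multiplicity.

```python
def mgg_neighbors(a, b, n):
    """The 8 neighbors (with multiplicity) of vertex (a, b); all arithmetic
    is modulo n."""
    return [
        ((a + 1) % n, b),
        ((a - 1) % n, b),
        (a, (b + 1) % n),
        (a, (b - 1) % n),
        (a, (a + b) % n),   # S(a, b)
        (a, (b - a) % n),   # S^{-1}(a, b)
        ((a + b) % n, b),   # T(a, b)
        ((a - b) % n, b),   # T^{-1}(a, b)
    ]


def mgg_graph(n):
    """Adjacency lists of the graph on Z_n x Z_n."""
    return {(a, b): mgg_neighbors(a, b, n) for a in range(n) for b in range(n)}


def is_symmetric(graph):
    """Check that u lists v exactly as many times as v lists u, so that the
    adjacency lists describe an undirected multigraph."""
    return all(graph[u].count(v) == graph[v].count(u)
               for u in graph for v in graph[u])


if __name__ == "__main__":
    g = mgg_graph(5)
    print(len(g), is_symmetric(g), g[(1, 2)])
```

The symmetry check passes because the set of eight maps is closed under taking inverses. At the vertex $(0,0)$, the maps $S$, $S^{-1}$, $T$ and $T^{-1}$ all act as the identity, which is one source of the self-loops mentioned above.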
We will prove that there is a constant $c > 0$ such that $\lambda_2(G_n) \geq c$ for every $n$.
The analysis will be in four steps, and it will refer to certain infinite “graphs.”
We define an infinite family of graphs $R_n$, such that the vertex set of $R_n$ is $[0,n)^2$, that is, every vertex of $R_n$ is a pair $(x,y)$, where $0 \leq x < n$ and $0 \leq y < n$, and we think of $x$ and $y$ as elements of the group $\mathbb{R}/n\mathbb{Z}$ in which we do addition modulo $n$. Every vertex $(x,y)$ of $R_n$ is connected to the vertices
$$ S(x,y), \ S^{-1}(x,y), \ T(x,y), \ T^{-1}(x,y) $$
and $R_n$ is 4-regular. For each of these graphs, we will define a “spectral gap” $\lambda_2(R_n)$; we put “spectral gap” in quotes because, although it is actually the second smallest eigenvalue of a Laplacian operator, we will define it purely formally as the minimum of a certain optimization problem.
We will also define the graph $Z$, whose vertex set is $\mathbb{Z} \times \mathbb{Z} - \{ (0,0) \}$, and such that each vertex $(a,b)$ is connected to
$$ S(a,b), \ S^{-1}(a,b), \ T(a,b), \ T^{-1}(a,b) $$
so that $Z$ is also $4$-regular. We will define a “spectral gap” $\lambda_1(Z)$, again purely formally as the infimum of a certain expression, although it is the infimum of the spectrum of a certain Laplacian operator. We will also define an “edge expansion” $\phi(Z)$ of $Z$.
The proof of the expansion of will proceed by establishing the following four facts:
The first step will be a discretization argument, showing that a test vector of small Rayleigh quotient for $G_n$ can be turned into a test function of small Rayleigh quotient for $R_n$. The second step is the most interesting and unexpected part of the proof; we will not spoil the surprise of how it works. The third step is proved the same way as Cheeger’s inequality. The fourth step is just a careful case analysis.
2. First Step: The Continuous Graph
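Let $\ell_2([0,n)^2)$ be the set of functions $f: [0,n)^2 \rightarrow \mathbb{R}$ such that $\int_{[0,n)^2} f^2(x,y) \, dx \, dy$ is well defined and finite. Then we define the following quantity, that we think of as the spectral gap of $R_n$: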
We could define a Laplacian operator and show that the above quantity is indeed the second smallest eigenvalue, but it will not be necessary for our proof.
We have the following bound.
Theorem 1 .
Proof: Let be the function such that
For a point , define . We extend to a function by defining
This means that we tile the square into unit squares whose corners are integer-coordinate, and that is constant on each unit square, and it equals the value of at the left-bottom corner of the square.
It is immediate to see that
and so, up to a factor of , the denominator of the Rayleigh quotient of is the same as the denominator of the Rayleigh quotient of .
It remains to bound the numerators.
Observe that for every , we have that equals either or , and that equals either or . The numerator of the Rayleigh quotient of is
because for a randomly chosen in the square , there is probability that and probability that .
Now we can use the “triangle inequality”
to bound the above quantity
which simplifies to
which is at most times the numerator of the Rayleigh quotient of .
3. Second Step: The Countable Graph
We now define the graph of vertex set , where each vertex is connected to
Note
For a $d$-regular graph $G=(V,E)$ with a countably infinite set of vertices, define $\ell_2(V)$ to be the set of functions $f: V \rightarrow \mathbb{R}$ such that $\sum_{v \in V} f^2(v)$ is finite, and define the smallest eigenvalue of $G$ as
So that
We want to show the following result.
Theorem 2 For every $n$, $\lambda_2(R_n) \geq \lambda_1(Z)$.
Proof: This will be the most interesting part of the argument. Let $f \in \ell_2([0,n)^2)$ be any function such that $\int f = 0$; we will show that the Fourier transform $\hat f$ of $f$ has a Rayleigh quotient for $Z$ that is at most the Rayleigh quotient of $f$ for $R_n$.
First, we briefly recall the definitions of Fourier transforms. If is such that
then we can write the linear combination
where the basis functions are
and the coefficients are
The condition gives
and the Parseval identity gives
and so we have that the denominators of the Rayleigh quotient of $f$ for $R_n$ and of $\hat f$ for $Z$ agree, up to the normalization factor in the Parseval identity. As usual, the numerator is more complicated.
We can break up the numerator of the Rayleigh quotient of as
where and , and we can use Parseval’s identity to rewrite it as
The Fourier coefficients of the function can be computed as
where we used the change of variable .
Similarly, . This means that the numerator of the Rayleigh quotient of for is equal to the numerator of the Rayleigh quotient of for .
4. Third Step: A Cheeger Inequality for countable graphs
Define the edge expansion of a -regular graph with a countably infinite set of vertices as
Note that the edge expansion can be zero even if the graph is connected.
Theorem 3 (Cheeger inequality for countable graphs) For every graph with a countably infinite set of vertices we have
Proof: This is similar to the proof for finite graphs, with the simplification that we do not need to worry about constructing a set containing at most half of the vertices.
Let be any function. We will show that is at most where
is the Rayleigh quotient of .
For every threshold , define the set as
and note that each set is finite because is finite. We have, for ,
and, for all
Now we compute the integral of the numerator and denominator of the above expression, and we will find the numerator and denominator of the Rayleigh quotient .
and
Which means
Now we proceed with Cauchy Schwarz:
And we have
5. Expansion of
After all these reductions, we finally come to the point where we need to prove that something is an expander.
Theorem 4
Proof: Let be a finite subset of .
Let $A_0$ be the set of elements of $A$ that have one 0 coordinate. Let $A_1, A_2, A_3, A_4$ be the sets of elements of $A$ with nonzero coordinates that belong to the 1st, 2nd, 3rd and 4th quadrant. (Starting from the quadrant of points having both coordinates positive, and numbering the remaining ones clockwise.)
Claim 1 At least $|A - A_0|$ of the edges leaving $A - A_0$ land outside $A$.
Proof: Consider the sets $S(A_1)$ and $T(A_1)$; both $S$ and $T$ are permutations, and so $|S(A_1)| = |T(A_1)| = |A_1|$. Also, $S(A_1)$ and $T(A_1)$ are disjoint, because if we had $(a,a+b) = (a'+b',b')$ then we would have $b = -a'$ while all the coordinates are strictly positive. Finally, $S(A_1)$ and $T(A_1)$ are also contained in the first quadrant, and so at least $|A_1|$ of the edges leaving $A_1$ land outside $A$. We can make a similar argument in each quadrant, considering the sets $S^{-1}(A_2)$ and $T^{-1}(A_2)$ in the second quadrant, the sets $S(A_3)$ and $T(A_3)$ in the third, and $S^{-1}(A_4)$ and $T^{-1}(A_4)$ in the fourth.
Claim 2
Proof: All the edges that have one endpoint in have the other endpoint outside of . Some of those edges, however, may land in . Overall, can account for at most edges, and we have already computed that at least of them land into , so can absorb at most of the outgoing edges of .
Balancing the two inequalities (adding the first plus times the second) gives us the theorem.
In which we analyze the zig-zag graph product.
In the previous lecture, we claimed it is possible to “combine” a $d$-regular graph on $D$ vertices and a $D$-regular graph on $N$ vertices to obtain a $d^2$-regular graph on $ND$ vertices which is a good expander if the two starting graphs are. Let the two starting graphs be denoted by $H$ and $G$ respectively. Then, the resulting graph, called the zig-zag product of the two graphs, is denoted by $G \textcircled{z} H$.
We will use $\lambda(G)$ to denote the eigenvalue with the second-largest absolute value of the normalized adjacency matrix of a $d$-regular graph $G$. If $0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_n \leq 2$ are the eigenvalues of the normalized Laplacian of $G$, then $\lambda(G) = \max \{ |1-\lambda_2| , |1-\lambda_n| \}$.
We claimed that if $\lambda(H) \leq b$ and $\lambda(G) \leq a$, then $\lambda(G \textcircled{z} H) \leq a + 2b + b^2$. In this lecture we shall recall the construction for the zig-zag product and prove this claim.
1. Replacement Product and Zig-Zag Product
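We first describe a simpler product for a “small” $d$-regular graph on $D$ vertices (denoted by $H$) and a “large” $D$-regular graph on $N$ vertices (denoted by $G$). Assume that for each vertex of $G$, there is some ordering on its $D$ neighbors. Then we construct the replacement product (see figure) $G \textcircled{r} H$ as follows: each vertex $u$ of $G$ is replaced by a copy of $H$, whose vertices we denote by $(u,i)$ for $i \in \{1,\ldots,D\}$; inside each copy, $(u,i)$ and $(u,j)$ are adjacent whenever $(i,j)$ is an edge of $H$; and, whenever $v$ is the $i$-th neighbor of $u$ and $u$ is the $j$-th neighbor of $v$ in $G$, there is an edge between $(u,i)$ and $(v,j)$.
Note that the replacement product constructed as above has $ND$ vertices and is $(d+1)$-regular.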
2. Zig-zag product of two graphs
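Given two graphs $G$ and $H$ as above, the zig-zag product $G \textcircled{z} H$ is constructed as follows (see figure): the vertex set is the same as in the replacement product, and there is an edge between $(u,i)$ and $(v,j)$ for every way of going from $(u,i)$ to $(v,j)$ in the replacement product by taking first an edge inside the copy of $H$ at $u$, then the edge that leaves that copy, and then an edge inside the copy of $H$ at $v$.
It is easy to see that the zig-zag product is a $d^2$-regular graph on $ND$ vertices.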
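The two products can be sketched in code under the usual definitions, which are an assumption here: in the replacement product every vertex of $G$ is replaced by a copy of $H$ and the $i$-th vertex of the copy carries the $i$-th edge of the original vertex; in the zig-zag product an edge is a step inside a copy, a step between copies, and another step inside a copy. The sketch assumes $G$ has no parallel edges or self-loops (so that the position of $u$ among the neighbors of $v$ is well defined), uses 0-based indices, and all function names are illustrative.

```python
def rotation(G, u, i):
    """Return (v, j) such that v is the i-th neighbor of u and u is the
    j-th neighbor of v. Assumes G is a simple graph."""
    v = G[u][i]
    return v, G[v].index(u)


def replacement_product(G, H):
    """Adjacency lists of the replacement product: vertex (u, i) has its d
    neighbors inside the copy of H at u, plus one neighbor in another copy."""
    return {(u, i): [(u, k) for k in H[i]] + [rotation(G, u, i)]
            for u in G for i in range(len(G[u]))}


def zigzag_product(G, H):
    """Adjacency lists (with multiplicity) of the zig-zag product."""
    Z = {}
    for u in G:
        for i in range(len(G[u])):
            neighbors = []
            for k in H[i]:                    # step inside the copy at u
                v, l = rotation(G, u, k)      # step between copies
                neighbors.extend((v, j) for j in H[l])  # step inside the copy at v
            Z[(u, i)] = neighbors
    return Z


if __name__ == "__main__":
    # G is the complete graph on 4 vertices (D = 3), H is a triangle (d = 2)
    G = {u: [v for v in range(4) if v != u] for u in range(4)}
    H = {i: [(i + 1) % 3, (i - 1) % 3] for i in range(3)}
    print(len(replacement_product(G, H)), len(zigzag_product(G, H)))
```

In this small example the replacement product is 3-regular and the zig-zag product is 4-regular, both on 12 vertices, matching the counts $d+1$, $d^2$ and $ND$ in the text.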
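Let $M$ be the normalized adjacency matrix of $G \textcircled{z} H$. Using the fact that each edge in $G \textcircled{z} H$ is made up of three steps in $G \textcircled{r} H$, we can write $M$ as $BAB$, where
$$ B[(u,i),(v,j)] = \begin{cases} 0 & \mbox{if } u \neq v \\ M_H[i,j] & \mbox{if } u = v \end{cases} $$
with $M_H$ the normalized adjacency matrix of $H$.
And $A[(u,i),(v,j)] = 1$ if $v$ is the $i$-th neighbor of $u$ and $u$ is the $j$-th neighbor of $v$, and $A[(u,i),(v,j)] = 0$ otherwise.
Note that $A$ is the adjacency matrix for a matching and is hence a permutation matrix.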
3. A Technical Preliminary
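We will use the following fact. Suppose that $M$ is the normalized adjacency matrix of a graph $G$. Thus the largest eigenvalue of $M$ is 1, with eigenvector ${\bf 1}$; we have
$$ \lambda(G) = \max_{x \perp {\bf 1}} \frac{|x^T M x|}{\|x\|^2} = \max_{x \perp {\bf 1}} \frac{\|Mx\|}{\|x\|} \ \ \ \ \ (1) $$
which is a corollary of the following more general result. Recall that a vector space $S \subseteq \mathbb{R}^n$ is an invariant subspace for a matrix $M \in \mathbb{R}^{n \times n}$ if $Mx \in S$ for every $x \in S$.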
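Lemma 1 Let $M$ be a symmetric matrix, and $S$ be a $k$-dimensional invariant subspace for $M$. Thus, (from the proof of the spectral theorem) we have that $S$ has an orthonormal basis of eigenvectors; let $\lambda_1 \leq \cdots \leq \lambda_k$ be the corresponding eigenvalues with multiplicities; we have
$$ \max_{i=1,\ldots,k} |\lambda_i| = \max_{x \in S} \frac{|x^T M x|}{\|x\|^2} = \max_{x \in S} \frac{\|Mx\|}{\|x\|} $$
Proof: If the largest eigenvalue in absolute value is $\lambda_k$, then
$$ \max_i |\lambda_i| = \lambda_k = \max_{x \in S} \frac{x^T M x}{\|x\|^2} \leq \max_{x \in S} \frac{|x^T M x|}{\|x\|^2} \ \ \ \ \ (2) $$
and if it is $\lambda_1$ (because $\lambda_1$ is negative, and $-\lambda_1 > \lambda_k$)
$$ \max_i |\lambda_i| = -\lambda_1 = - \min_{x \in S} \frac{x^T M x}{\|x\|^2} \leq \max_{x \in S} \frac{|x^T M x|}{\|x\|^2} \ \ \ \ \ (3) $$
Finally, if $x_1,\ldots,x_k$ is the basis of orthonormal eigenvectors in $S$ such that $Mx_i = \lambda_i x_i$, then, for every $x \in S$, we can write $x = \sum_i a_i x_i$ and
$$ \|Mx\| = \left\| \sum_i \lambda_i a_i x_i \right\| = \sqrt{\sum_i \lambda_i^2 a_i^2} \leq \max_i |\lambda_i| \cdot \sqrt{\sum_i a_i^2} = \max_i |\lambda_i| \cdot \|x\| \ \ \ \ \ (4) $$
and the Lemma follows by combining (2), (3) and (4) with the Cauchy-Schwarz inequality $|x^T M x| \leq \|x\| \cdot \|Mx\|$.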
4. Analysis of the zig-zag Product
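Theorem 2 Let $G$ be a $D$-regular graph with $N$ nodes, $H$ be a $d$-regular graph with $D$ nodes, and let $a := \lambda(G)$, $b := \lambda(H)$, and let the normalized adjacency matrix of $G \textcircled{z} H$ be $M = BAB$ where $A$ and $B$ are as defined in Section 2.
Then
$$ \lambda(G \textcircled{z} H) \leq a + 2b + b^2 $$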
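Proof: Let $x \in \mathbb{R}^{ND}$ be such that $x \perp {\bf 1}$. We refer to a set of coordinates of $x$ corresponding to a copy of $H$ as a “block” of coordinates.
We write $x = x_{\parallel} + x_{\perp}$, where $x_{\parallel}$ is constant within each block, and $x_{\perp}$ sums to zero within each block. Note both $x_{\parallel}$ and $x_{\perp}$ are orthogonal to ${\bf 1}$, and that they are orthogonal to each other.
We want to prove
$$ |x^T M x| \leq (a + 2b + b^2) \cdot \|x\|^2 $$
We have (using the fact that $M$ is symmetric)
$$ |x^T M x| \leq |x_{\parallel}^T M x_{\parallel}| + 2 |x_{\parallel}^T M x_{\perp}| + |x_{\perp}^T M x_{\perp}| $$
And it remains to bound the three terms.
The first term satisfies
$$ |x_{\parallel}^T M x_{\parallel}| \leq a \cdot \|x_{\parallel}\|^2 $$
Because, after writing $M = BAB$, we see that $B x_{\parallel} = x_{\parallel}$, because $B$ is the same as $I \otimes M_H$, the tensor product of the identity and of the normalized adjacency matrix $M_H$ of $H$. The normalized adjacency matrix of $H$ leaves a vector parallel to all-ones unchanged, and so $B$ leaves every vector that is constant in each block unchanged.
Thus
$$ x_{\parallel}^T M x_{\parallel} = x_{\parallel}^T B A B x_{\parallel} = x_{\parallel}^T A x_{\parallel} $$
Let $y \in \mathbb{R}^N$ be the vector such that $y_v$ is equal to the value that $x_{\parallel}$ has in the block of $v$. Then, writing $M_G$ for the normalized adjacency matrix of $G$,
$$ |x_{\parallel}^T A x_{\parallel}| = D \cdot |y^T M_G y| \leq D \cdot a \cdot \|y\|^2 = a \cdot \|x_{\parallel}\|^2 $$
because $y \perp {\bf 1}$ and $\|x_{\parallel}\|^2 = D \cdot \|y\|^2$.
The last term satisfies
$$ |x_{\perp}^T M x_{\perp}| \leq b^2 \cdot \|x_{\perp}\|^2 $$
Now let us call $x_{\perp}^v$ the restriction of $x_{\perp}$ to coordinates of the form $(v,i)$ for $i \in \{1,\ldots,D\}$. Then each $x_{\perp}^v$ is orthogonal to the all-one vector and $\|M_H x_{\perp}^v\| \leq b \cdot \|x_{\perp}^v\|$, so
$$ \|B x_{\perp}\|^2 = \sum_v \|M_H x_{\perp}^v\|^2 \leq b^2 \sum_v \|x_{\perp}^v\|^2 = b^2 \cdot \|x_{\perp}\|^2 $$
and, since $A$ is a permutation matrix, $|x_{\perp}^T B A B x_{\perp}| \leq \|B x_{\perp}\| \cdot \|A B x_{\perp}\| = \|B x_{\perp}\|^2$.
The middle term satisfies
$$ |x_{\parallel}^T M x_{\perp}| \leq b \cdot \|x_{\parallel}\| \cdot \|x_{\perp}\| $$
Because, from Cauchy-Schwarz, the fact that $B x_{\parallel} = x_{\parallel}$ and the fact that permutation matrices preserve length, we have
$$ |x_{\parallel}^T B A B x_{\perp}| = |x_{\parallel}^T A B x_{\perp}| \leq \|x_{\parallel}\| \cdot \|A B x_{\perp}\| = \|x_{\parallel}\| \cdot \|B x_{\perp}\| $$
and we proved above that $\|B x_{\perp}\| \leq b \cdot \|x_{\perp}\|$
so
$$ |x_{\parallel}^T M x_{\perp}| \leq b \cdot \|x_{\parallel}\| \cdot \|x_{\perp}\| $$
Combining the three bounds, and using $\|x_{\parallel}\| \leq \|x\|$ and $\|x_{\perp}\| \leq \|x\|$, gives $|x^T M x| \leq (a + 2b + b^2) \cdot \|x\|^2$.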