J.Z.: In China, we say that if you sneeze once, it means that someone is thinking of you. If you sneeze twice, it means someone is cursing you.
Me: and what does it mean when I sneeze three times or more?
J.Z.: it means you have a cold.
It would make sense if, to mitigate his negatives, Trump chose a person of color and someone who has a history of speaking out against income inequality.
He or she would have to be someone who is media-savvy and with some experience running a campaign, but definitely not a career politician. And of course he or she should be someone who endorsed Trump early on, like, say, in January.
I can think of only one person: Jimmy McMillan!
In which we prove properties of expander graphs.
1. Quasirandomness of Expander Graphs
Recall that if $G$ is a $d$-regular graph, and $A$ is its adjacency matrix, then, if we call $\lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_n$ the eigenvalues of $A$ with repetitions, we are interested in the parameter $\lambda := \max \{ |\lambda_2| , |\lambda_n| \}$, and we have

$\displaystyle \lambda = \left\| A - \frac dn J \right\|$

where $J$ is the matrix with a one in each entry, and $\| \cdot \|$ is the matrix norm $\| M \| := \max_{x \neq {\bf 0}} \frac{\| Mx \|}{\|x\|}$.
Our first result today is to show that, when $\lambda$ is small, the graph has the following quasirandomness property: for every two disjoint sets $A, B$ of vertices, the number of edges between $A$ and $B$ is close to what we would expect in a random graph of average degree $d$, that is, approximately $\frac dn \cdot |A| \cdot |B|$.
For two (possibly overlapping) sets of vertices , we define to be the number of edges with one endpoint in and one endpoint in , with edges having both endpoints in , if any, counted twice.
Lemma 1 (Expander Mixing Lemma) Let $G$ be a $d$-regular graph, and let $A$ and $B$ be two disjoint subsets of vertices. Then

$\displaystyle \left| E(A,B) - \frac dn |A| \cdot |B| \right| \leq \lambda \sqrt{ |A| \cdot |B| }$
Proof: We have
and
so
Note that, for every disjoint $A, B$, we have $\sqrt{|A| \cdot |B|} \leq \frac n2$, and so the right-hand side in the expander mixing lemma is at most $\lambda \cdot \frac n2$, which is a small fraction of the total number of edges if $\lambda$ is small compared to $d$.
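As a quick numerical sanity check (not part of the lecture), we can verify the expander mixing lemma exhaustively on a small graph. The sketch below uses the Petersen graph; that its adjacency spectrum is $\{3, 1, -2\}$, and hence $\lambda = 2$, is a known fact that we assume rather than prove here.

```python
from itertools import product
from math import sqrt

# Petersen graph: 3-regular, n = 10; its adjacency spectrum is
# {3, 1 (x5), -2 (x4)}, so lambda = max(|mu_2|, |mu_n|) = 2
# (a known fact, assumed here rather than proved).
n, d, lam = 10, 3, 2.0
edges = [(0,1),(1,2),(2,3),(3,4),(4,0),          # outer cycle
         (0,5),(1,6),(2,7),(3,8),(4,9),          # spokes
         (5,7),(7,9),(9,6),(6,8),(8,5)]          # inner pentagram

def crossing_edges(S, T):
    return sum((u in S and v in T) or (u in T and v in S) for u, v in edges)

# Check |E(S,T) - d|S||T|/n| <= lam * sqrt(|S||T|) over every
# pair of disjoint nonempty vertex sets.
worst = 0.0
for assign in product((0, 1, 2), repeat=n):      # 0: neither, 1: S, 2: T
    S = {v for v in range(n) if assign[v] == 1}
    T = {v for v in range(n) if assign[v] == 2}
    if not S or not T:
        continue
    gap = abs(crossing_edges(S, T) - d * len(S) * len(T) / n)
    assert gap <= lam * sqrt(len(S) * len(T)) + 1e-9
    worst = max(worst, gap / (lam * sqrt(len(S) * len(T))))
print("mixing lemma verified; worst ratio:", round(worst, 3))
```

The "worst ratio" printed at the end shows how tight the lemma is on this graph.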
2. Random Walks in Expanders
A -step random walk is the probabilistic process in which we start at a vertex, then we pick uniformly at random one of the edges incident on the current vertex and we move to the other endpoint of the edge, and then repeat this process times.
If is the normalized adjacency matrix of an undirected regular graph , then is the probability that, in one step, a random walk started at reaches . This is why the normalized adjacency matrix of a regular graph is also called its transition matrix.
Suppose that we start a random walk at a vertex chosen according to a probability distribution , which we think of as a vector such that for every and . After taking one step, the probability of being at vertex is , which means that the probability distribution after one step is described by the vector , and because of the symmetry of , this is the same as .
Iterating the above reasoning, we see that, after a -step random walk whose initial vertex is chosen according to distribution , the last vertex reached by the walk is distributed according to .
The parameter of is equal to , and so if has a parameter bounded away from , and if is large enough, we have that the parameter of is very small, and so is close to in matrix norm. If was actually equal to , then would be equal to the uniform distribution, for every distribution . We would thus expect to be close to the uniform distribution for large enough .
Before formalizing the above intuition, we need to fix a good measure of distance for distributions. If we think of distributions as vectors, then a possible notion of distance between two distributions is the Euclidean distance between the corresponding vectors. This definition, however, has various shortcomings and, in particular, can assign small distance to distributions that are intuitively very different. For example, suppose that and are distributions that are uniform over a set , and over the complement of , respectively, where is a set of size . Then all the entries of are and so , which is vanishingly small even though distributions over disjoint supports should be considered as maximally different distributions.
A very good measure is the total variation distance, defined as
that is, as the maximum over all events of the difference between the probability of the event happening with respect to one distribution and the probability of it happening with respect to the other distribution. This measure is usually called statistical distance in computer science. It is easy to check that the total variation distance between and is precisely . Distributions with disjoint support have total variation distance 1, which is the largest possible.
Lemma 2 (Mixing Time of Random Walks in Expanders) Let $G$ be a regular graph, and $M$ be its normalized adjacency matrix. Then for every distribution ${\bf p}$ over the vertices and every $t$, we have

$\displaystyle \| M^t {\bf p} - {\bf u} \|_{TV} \leq \frac 12 \cdot \sqrt n \cdot \lambda^t$

where ${\bf u}$ is the uniform distribution.
In particular, if , then , where is an absolute constant.
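The convergence promised by the lemma can be watched numerically. The sketch below iterates the transition matrix of the Petersen graph (for which the normalized parameter $\lambda$ equals $2/3$, an assumed known value) starting from a distribution concentrated on a single vertex, and checks the bound $\frac 12 \sqrt n \, \lambda^t$ at every step.

```python
import numpy as np

# Mixing of a random walk on the Petersen graph.  Its adjacency
# spectrum is {3, 1, -2} (a known fact, assumed here), so for the
# normalized adjacency matrix lambda = 2/3, and the lemma gives
# || M^t p - u ||_TV <= (1/2) * sqrt(n) * (2/3)^t.
n, d = 10, 3
edges = [(0,1),(1,2),(2,3),(3,4),(4,0),
         (0,5),(1,6),(2,7),(3,8),(4,9),
         (5,7),(7,9),(9,6),(6,8),(8,5)]
A = np.zeros((n, n))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
M = A / d                          # transition matrix of the walk

p = np.zeros(n); p[0] = 1.0        # start deterministically at vertex 0
for t in range(1, 11):
    p = M @ p
    tv = 0.5 * np.abs(p - 1.0 / n).sum()       # total variation distance
    bound = 0.5 * np.sqrt(n) * (2 / 3) ** t    # sqrt(n)/2 * lambda^t
    assert tv <= bound + 1e-12
print("TV distance after 10 steps:", round(tv, 6))
```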
Proof: Let be the normalized adjacency matrix of a clique with self-loops. Then, for every distribution , we have . Recall also that .
We have
The last result that we discussed today is one more instantiation of the general phenomenon that “if is small then a result that is true for the clique is true, within some approximation, for .”
Suppose that we take a -step random walk in a regular graph starting from a uniformly distributed initial vertex. If is a clique with self-loops, then the sequence of vertices encountered in the random walk is a sequence of independent, uniformly distributed, vertices. In particular, if is a bounded function, the Chernoff-Hoeffding bounds tell us that the empirical average of over the points of the random walk is very close to the true average of , except with very small probability, that is, if we denote by the set of vertices encountered in the random walk, we have
where . A corresponding Chernoff-Hoeffding bound can be proved for the case in which the random walk is taken over a regular graph such that is small.
Lemma 3 (Chernoff-Hoeffding Bound for Random Walks in Expanders) Let be a regular graph, and the distribution of -tuples constructed by sampling independently, and then performing a -step random walk starting at . Let be any bounded function. Then
We will not prove the above result, but we briefly discuss one of its many applications.
Suppose that we have a polynomial-time probabilistic algorithm that, on inputs of length , uses random bits and then outputs the correct answer with probability, say, at least . One standard way to reduce the error probability is to run the algorithm times, using independent randomness each time, and then take the answer that comes out a majority of the times. (This is for problems in which we want to compute a function exactly; in combinatorial optimization we would run the algorithm times and take the best solution, and in an application in which the algorithm performs an approximate function evaluation we would run the algorithm times and take the median. The reasoning that follows for the case of exact function computation can be applied to the other settings as well.)
On average, the number of iterations of the algorithm that give a correct answer is , and the cases in which the majority is erroneous correspond to cases in which the number of iterations giving a correct answer is . This means that the case in which the modified algorithm makes a mistake corresponds to the case in which the empirical average of independent 0/1 random variables deviates from its expectation by more than , which can happen with probability at most , which becomes vanishingly small for large .
This approach uses random bits. Suppose, instead, that we consider the following algorithm: pick random strings for the algorithm by performing a -step random walk in an expander graph of degree with vertices and such that , and then take the majority answer. A calculation using the Chernoff bound for expander graphs shows that the error probability is , and it is achieved using only random bits instead of .
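Here is a small simulation (a sketch of the idea, not the full analysis) of this amplification scheme: we mark a quarter of the vertices of a graph as "bad random strings" and estimate how often a majority of the vertices visited by a $k$-step walk is bad. For the underlying graph we use a union of random perfect matchings which, as we will see in the lecture on probabilistic constructions, is an expander with high probability; the parameters below are arbitrary choices of ours.

```python
import random
random.seed(0)

# Model an algorithm whose random string is "bad" with probability 1/4
# by marking 1/4 of the vertices of a graph as bad, and estimate how
# often the majority of the vertices visited by a k-step random walk
# is bad.  The graph is a union of d random perfect matchings, which
# is an expander with high probability.
n, d, k, trials = 1000, 8, 21, 2000

def random_matching(n):
    verts = list(range(n)); random.shuffle(verts)
    return list(zip(verts[::2], verts[1::2]))

adj = {v: [] for v in range(n)}
for _ in range(d):
    for u, v in random_matching(n):
        adj[u].append(v); adj[v].append(u)

bad = set(range(n // 4))            # bad strings: measure 1/4
failures = 0
for _ in range(trials):
    v = random.randrange(n)
    visited = [v]
    for _ in range(k - 1):
        v = random.choice(adj[v])
        visited.append(v)
    if sum(w in bad for w in visited) > k / 2:   # majority of runs bad
        failures += 1
print("empirical failure probability:", failures / trials)
```

The empirical failure probability is far below the $1/4$ error of a single run, even though the walk uses only one fresh random edge-choice per step.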
In which we present a probabilistic construction of expanders.
We have seen a combinatorial construction of expanders, based on the zig-zag graph product, and an algebraic one. Today we see how to use the probabilistic method to show that random graphs, selected from an appropriate probability distribution, are expanders with high probability.
A non-trivial part of today’s lecture is the choice of distribution. For us, a family of expanders is a family of regular graphs of fixed degree, but if we pick a graph at random according to the Erdős-Rényi distribution, selecting each pair to be an edge independently with probability , then we do not get a regular graph and, indeed, not even a bounded-degree graph. (Even when , ensuring constant average degree, the maximum degree is of the order of .)
This means that we have to study distributions of graphs in which there are correlations between edges, which are often difficult to reason about.
We could study the expansion of random -regular graphs, but that is a particularly challenging distribution of graphs to analyze. Instead, the following distributions over -regular graphs are usually considered over a vertex set :
The first method is applicable when is even, and the second method is applicable when is even. (When and are both odd, it is not possible to have an -vertex -regular graph, because the number of edges in such a graph is .)
We will study the expansion of graphs generated according to the first distribution, and show that there exists an integer and a , such that a random regular graph on vertices has probability at least of having edge expansion at least .
In particular, we will show that for , the probability that a random -regular graph has expansion is at least . Our bounds will be very loose, and much tighter analyses are possible.
We have to show that, with high probability over the choice of the graph, every set of vertices with has at least edges leaving it.
The common approach to prove that a random object satisfies a certain collection of properties is to prove that each property holds with high probability, and then to use a union bound to show that all properties are simultaneously true with high probability. For a -regular graph to have expansion , we want every set of size to have at least outgoing edges; a naive approach would be to show that such a property holds for every fixed set except with probability at most , and then take a union bound over all sets of size .
Unfortunately the naive approach does not work, because the probability that small sets fail to expand is much higher than . For example, the probability that a fixed set of nodes form a clique is at least of the order of . Fortunately, the number of small sets is small, and if the probability of a fixed set of size being non-expanding is, say, at most , then, by taking a union bound over all sets of size , the probability that there is a non-expanding set of size is at most , and then by taking a union bound over all sizes we get that the probability that there is a non-expanding set is at most inverse polynomial in .
Let denote the set of nodes that have at least one neighbor in . If , then there are at most edges leaving . In order to upper bound the probability that there are edges leaving , we will upper bound the probability that .
It will be convenient to have the following model in mind for how a random perfect matching is chosen. Let be an arbitrary ordering of the vertices such that , then the following algorithm samples a random perfect matching over :
It is easy to see that the above algorithm has possible outputs, each equally likely, each distinct, and that is also the number of perfect matchings over a set of vertices, so that the algorithm indeed samples a uniformly distributed perfect matching.
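The sampling procedure just described can be written out directly (variable names are our own); as a sanity check we verify that on 6 vertices all $5 \cdot 3 \cdot 1 = 15$ perfect matchings occur with roughly equal frequency.

```python
import random
from collections import Counter

def random_perfect_matching(n):
    """Sample a uniform perfect matching on {0,...,n-1} (n even):
    repeatedly match the lowest-indexed unmatched vertex to a
    uniformly random other unmatched vertex."""
    unmatched = list(range(n))
    matching = []
    while unmatched:
        u = unmatched.pop(0)          # lowest-indexed unmatched vertex
        v = unmatched.pop(random.randrange(len(unmatched)))
        matching.append((u, v))
    return tuple(sorted(matching))

# Sanity check: on 6 vertices there are 5*3*1 = 15 perfect matchings,
# and each should appear with frequency close to 1/15.
random.seed(1)
counts = Counter(random_perfect_matching(6) for _ in range(30000))
assert len(counts) == 15
assert all(abs(c / 30000 - 1 / 15) < 0.01 for c in counts.values())
print("all 15 matchings seen, roughly uniformly")
```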
Now, fix a set of size and a set of size . The probability that, in a random matching, the vertices of are all matched to vertices in is at most the probability that, during the first executions of the “while” loop, the randomly selected vertex is in .
For , conditioned on the first iterations picking a vertex , the probability that this happens on the -th iteration is the number of unmatched vertices in which is , divided by the total number of unmatched vertices in , which is .
Thus, the probability that, in a random matching, all vertices of are matched to vertices in is at most
when we pick as the union of random matchings, the probability that all the neighbors of are in is at most the above bound raised to the power of :
and taking a union bound over all choices of of size and all choices of of size , we have
Now, for , taking a union bound over all (in a graph without self-loops, every singleton set is expanding), we have
In which we present an algebraic construction of expanders.
1. The Margulis-Gabber-Galil Expanders
We present a construction of expander graphs due to Margulis, which was the first explicit construction of expanders, and its analysis due to Gabber and Galil. The analysis presented here includes later simplifications, and it follows an exposition of James Lee.
For every , we construct graphs with vertices, and we think of the vertex set as , the group of pairs from where the group operation is coordinate-wise addition modulo .
Define the functions and , where all operations are modulo . Then the graph has vertex set and the vertex is connected to the vertices
so that is an 8-regular graph. (The graph has parallel edges and self-loops.)
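For concreteness, here is one common way to realize the construction in code. The explicit neighbor list below, with shear maps $(a,b) \mapsto (a \pm b, b)$ and $(a,b) \mapsto (a, b \pm a)$ together with the four grid steps, is our reading of the definitions above, so treat it as an assumption; we then check numerically, for a small $n$, that the graph is 8-regular and that the second eigenvalue is bounded away from the degree.

```python
import numpy as np

# One common presentation of the Margulis-Gabber-Galil graph (assumed
# here): vertex set Z_n x Z_n, with (a,b) adjacent to (a+1,b), (a-1,b),
# (a,b+1), (a,b-1), (a+b,b), (a-b,b), (a,b+a), (a,b-a), all mod n,
# counting parallel edges, so the graph is 8-regular.
def margulis_graph(n):
    N = n * n
    A = np.zeros((N, N))
    idx = lambda a, b: (a % n) * n + (b % n)
    for a in range(n):
        for b in range(n):
            for c, e in [(a+1, b), (a-1, b), (a, b+1), (a, b-1),
                         (a+b, b), (a-b, b), (a, b+a), (a, b-a)]:
                A[idx(a, b), idx(c, e)] += 1
    return A

A = margulis_graph(6)
eig = np.sort(np.linalg.eigvalsh(A))[::-1]
print("degree:", eig[0], "second normalized eigenvalue:", eig[1] / 8)
assert abs(eig[0] - 8) < 1e-8
assert eig[1] / 8 < 0.99          # spectral gap bounded away from zero
```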
We will prove that there is a constant such that for every .
The analysis will be in four steps, and it will refer to certain infinite “graphs.”
We define an infinite family of graphs , such that the vertex set of is , that is, every vertex of is a pair , where and , and we think of and as elements of the group in which we do addition modulo . Every vertex of is connected to the vertices
and is 4-regular. For each of these graphs, we will define a “spectral gap” ; we put “spectral gap” in quotes because, although it is actually the second smallest eigenvalue of a Laplacian operator, we will define it purely formally as the minimum of a certain optimization problem.
We will also define the graph , whose vertex set is , and such that each vertex is connected to
so that is also -regular. We will define a “spectral gap” , again purely formally as the infimum of a certain expression, although it is the infimum of the spectrum of a certain Laplacian operator. We will also define an “edge expansion” of .
The proof of the expansion of will proceed by establishing the following four facts:
The first step will be a discretization argument, showing that a test vector of small Rayleigh quotient for can be turned into a test function of small Rayleigh quotient for . The second step is the most interesting and unexpected part of the proof; we will not spoil the surprise of how it works. The third step is proved the same way as Cheeger’s inequality. The fourth step is just a careful case analysis.
2. First Step: The Continuous Graph
Let be the set of functions such that is well defined and finite. Then we define the following quantity, that we think of as the spectral gap of :
We could define a Laplacian operator and show that the above quantity is indeed the second smallest eigenvalue, but it will not be necessary for our proof.
We have the following bound.
Theorem 1 .
Proof: Let be the function such that
For a point , define . We extend to a function by defining
This means that we tile the square into unit squares whose corners have integer coordinates, and that is constant on each unit square, where it equals the value of at the bottom-left corner of the square.
It is immediate to see that
and so, up to a factor of , the denominator of the Rayleigh quotient of is the same as the denominator of the Rayleigh quotient of .
It remains to bound the numerators.
Observe that for every , we have that equals either or , and that equals either or . The numerator of the Rayleigh quotient of is
because for a randomly chosen in the square , there is probability that and probability that .
Now we can use the “triangle inequality”
to bound the above quantity
which simplifies to
which is at most times the numerator of the Rayleigh quotient of .
3. Second Step: The Countable Graph
We now define the graph of vertex set , where each vertex is connected to
Note
For a -regular graph with a countably infinite set of vertices, define to be the set of functions such that is finite, and define the smallest eigenvalue of as
So that
We want to show the following result.
Theorem 2 For every , .
Proof: This will be the most interesting part of the argument. Let be any function such that ; we will show that the Fourier transform of has a Rayleigh quotient for that is at most the Rayleigh quotient of for .
First, we briefly recall the definitions of Fourier transforms. If is such that
then we can write the linear combination
where the basis functions are
and the coefficients are
The condition gives
and the Parseval identity gives
and so we have that the denominator of the Rayleigh quotient of for and of for are equal. As usual, the numerator is more complicated.
We can break up the numerator of the Rayleigh quotient of as
where and , and we can use Parseval’s identity to rewrite it as
The Fourier coefficients of the function can be computed as
where we used the change of variable .
Similarly, . This means that the numerator of the Rayleigh quotient of for is equal to the numerator of the Rayleigh quotient of for .
4. Third Step: A Cheeger Inequality for Countable Graphs
Define the edge expansion of a -regular graph with a countably infinite set of vertices as
Note that the edge expansion can be zero even if the graph is connected.
Theorem 3 (Cheeger inequality for countable graphs) For every graph with a countably infinite set of vertices we have
Proof: This is similar to the proof for finite graphs, with the simplification that we do not need to worry about constructing a set containing at most half of the vertices.
Let be any function. We will show that is at most where
is the Rayleigh quotient of .
For every threshold , define the set as
and note that each set is finite because is finite. We have, for ,
and, for all
Now we compute the integral of the numerator and denominator of the above expression, and we will find the numerator and denominator of the Rayleigh quotient .
and
Which means
Now we proceed with Cauchy-Schwarz:
And we have
5. Expansion of
After all these reductions, we finally come to the point where we need to prove that something is an expander.
Theorem 4
Proof: Let be a finite subset of .
Let be the set of elements of that have one 0 coordinate. Let be the set of elements of with nonzero coordinate that belong to the 1st, 2nd, 3rd and 4th quadrant. (Starting from the quadrant of points having both coordinates positive, and numbering the remaining ones clockwise.)
Claim 1 .
Proof: Consider the sets and ; both and are permutations, and so . Also, and are disjoint, because if we had then we would have while all the coordinates are strictly positive. Finally, and are also contained in the first quadrant, and so at least of the edges leaving land outside . We can make a similar argument in each quadrant, considering the sets and in the second quadrant, the sets and in the third, and and in the fourth.
Claim 2
Proof: All the edges that have one endpoint in have the other endpoint outside of . Some of those edges, however, may land in . Overall, can account for at most edges, and we have already computed that at least of them land in , so can absorb at most of the outgoing edges of .
Combining the two inequalities (adding the first plus times the second) gives us the theorem.
In which we analyze the zig-zag graph product.
In the previous lecture, we claimed it is possible to “combine” a -regular graph on vertices and a -regular graph on vertices to obtain a -regular graph on vertices which is a good expander if the two starting graphs are. Let the two starting graphs be denoted by and respectively. Then, the resulting graph, called the zig-zag product of the two graphs is denoted by .
We will use to denote the eigenvalue with the second-largest absolute value of the normalized adjacency matrix of a -regular graph . If are the eigenvalues of the normalized Laplacian of , then .
We claimed that if and , then . In this lecture we shall recall the construction for the zig-zag product and prove this claim.
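Before diving into the construction, here is a small numpy sketch of the quantity $\lambda(\cdot)$ defined above, evaluated on two familiar graphs: the complete graph (an excellent expander) and the odd cycle (a poor one).

```python
import numpy as np

# lambda(G): the second-largest absolute eigenvalue of the normalized
# adjacency matrix of a d-regular graph.
def lam(A, d):
    mu = sorted(np.linalg.eigvalsh(A / d), key=abs, reverse=True)
    return abs(mu[1])              # drop one trivial eigenvalue 1

n = 7
K = np.ones((n, n)) - np.eye(n)    # complete graph K_7, d = n - 1
C = np.zeros((n, n))               # cycle C_7, d = 2
for i in range(n):
    C[i, (i + 1) % n] = C[i, (i - 1) % n] = 1

print("lambda(K_7) =", round(lam(K, n - 1), 4))   # 1/6: great expander
print("lambda(C_7) =", round(lam(C, 2), 4))       # cos(pi/7) ~ 0.901
assert abs(lam(K, n - 1) - 1 / (n - 1)) < 1e-9
assert abs(lam(C, 2) - np.cos(np.pi / 7)) < 1e-9
```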
1. Replacement Product and Zig-Zag Product
We first describe a simpler product for a “small” -regular graph on vertices (denoted by ) and a “large” -regular graph on vertices (denoted by ). Assume that for each vertex of , there is some ordering on its neighbors. Then we construct the replacement product (see figure) as follows:
Note that the replacement product constructed as above has vertices and is -regular.
2. Zig-zag product of two graphs
Given two graphs and as above, the zig-zag product is constructed as follows (see figure):
It is easy to see that the zig-zag product is a -regular graph on vertices.
Let be the normalized adjacency matrix of . Using the fact that each edge in is made up of three steps in , we can write as , where
And if is the -th neighbor of and is the -th neighbor of , and otherwise.
Note that is the adjacency matrix for a matching and is hence a permutation matrix.
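The three-step decomposition just described can be implemented directly. In the sketch below (a toy instantiation of our own, not from the lecture) we take $G = K_6$ and $H = K_5$, build the matrix for the walk-in-the-cloud step and the permutation step, and check the bound $\lambda(G) + \lambda(H) + \lambda(H)^2$ on $\lambda$ of the product, which is one standard form of the zig-zag bound.

```python
import numpy as np

# Zig-zag product Z of G = K_6 (5-regular, 6 vertices) and H = K_5
# (4-regular, 5 vertices); we check lambda(Z) <= lG + lH + lH^2,
# one standard form of the zig-zag bound.
nG, D, dH = 6, 5, 4

def nbrs(v):                        # neighbors of v in K_6, sorted
    return [u for u in range(nG) if u != v]

N = nG * D                          # vertices of Z are pairs (v, i)
Ahat = np.zeros((N, N))             # permutation step between clouds
for v in range(nG):
    for i, u in enumerate(nbrs(v)):
        j = nbrs(u).index(v)        # v is the j-th neighbor of u
        Ahat[D * v + i, D * u + j] = 1

MH = (np.ones((D, D)) - np.eye(D)) / dH     # normalized adjacency of K_5
Bhat = np.kron(np.eye(nG), MH)              # H-step inside each cloud
Mzz = Bhat @ Ahat @ Bhat                    # zig - permute - zag

def lam(M):
    mu = sorted(np.linalg.eigvalsh(M), key=abs, reverse=True)
    return abs(mu[1])

lG, lH = 1 / (nG - 1), 1 / (D - 1)          # lambda of K_6 and K_5
print("lambda(Z) =", round(lam(Mzz), 4), "bound =", lG + lH + lH ** 2)
assert lam(Mzz) <= lG + lH + lH ** 2 + 1e-9
```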
3. A Technical Preliminary
We will use the following fact. Suppose that is the normalized adjacency matrix of a graph . Thus the largest eigenvalue of is 1, with eigenvector ; we have
which is a corollary of the following more general result. Recall that a vector space is an invariant subspace for a matrix if for every .
Lemma 1 Let be a symmetric matrix, and be a -dimensional invariant subspace for . Then (from the proof of the spectral theorem) we have that has an orthonormal basis of eigenvectors; let be the corresponding eigenvalues with multiplicities; we have
Proof: If the largest eigenvalue in absolute value is , then
and if it is (because is negative, and )
Finally, if is the basis of orthonormal eigenvectors in such that , then, for every , we can write and
and the Lemma follows by combining (2), (3) and (4).
4. Analysis of the zig-zag Product
Theorem 2 Let be a -regular graph with nodes, be a -regular graph with nodes, and let , , and let the normalized adjacency matrix of be where and are as defined in Section 1.
Then
Proof: Let be such that . We refer to a set of coordinates of corresponding to a copy of as a “block” of coordinates.
We write , where is constant within each block, and sums to zero within each block. Note that both and are orthogonal to , and that they are orthogonal to each other.
We want to prove
We have (using the fact that is symmetric)
And it remains to bound the three terms.
Because, after writing , we see that , because is the same as , the tensor product of the identity and of the normalized adjacency matrix of . The normalized adjacency matrix of leaves a vector parallel to all-ones unchanged, and so leaves every vector that is constant in each block unchanged.
Thus
Let be the vector such that is equal to the value that has in the block of . Then
because and
Now let us call the restriction of to coordinates of the form for . Then each is orthogonal to the all-one vector and , so
Because, from Cauchy-Schwarz, the fact that and the fact that permutation matrices preserve length, we have
and we proved above that
so
In which we give an explicit construction of expander graphs of polylogarithmic degree, state the properties of the zig-zag product of graphs, and provide an explicit construction of a family of constant-degree expanders using the zig-zag product and the polylogarithmic-degree construction.
A family of expanders is a family of graphs , , such that each graph is -regular, and the edge-expansion of each graph is at least , for an absolute constant independent of . Ideally, we would like to have such a construction for each , although it is usually enough for most applications that, for some constant and every , there is an for which the construction applies in the interval , or even the interval . We would also like the degree to be slowly growing in and, ideally, to be bounded above by an explicit constant. Today we will see a simple construction in which and a more complicated one in which .
An explicit construction of a family of expanders is a construction in which is “efficiently computable” given . The weakest sense in which a construction is said to be explicit is when, given , the (adjacency matrix of the) graph can be constructed in time polynomial in . A stronger requirement, which is necessary for several applications, is that given and , the list of neighbors of the -th vertex of can be computed in time polynomial in .
In many explicit constructions of constant-degree expanders, the construction is extremely simple, and besides satisfying the stricter definition of “explicit” above, it is also such that the adjacency list of a vertex is given by a “closed-form formula.” The analysis of such constructions, however, usually requires very sophisticated mathematical tools.
Example 1 Let be a prime, and define the graph in which , and, for , the vertex is connected to , to and to its multiplicative inverse . The vertex is connected to , to , and has a self-loop. Counting self-loops, the graph is 3-regular: it is the union of a cycle over and of a matching over the vertices ; the vertices , , have a self-loop each. There is a constant such that, for each , the graph has edge expansion at least . Unfortunately, no elementary proof of this fact is known. The graph is shown in the picture below.
Constructions based on the zig-zag graph product, which we shall see next, are more complicated to describe, but much simpler to analyze.
We begin by describing a building block in the construction, which is also an independently interesting construction: a family of expanders with polylogarithmic degree, which have both a very simple description and a very simple analysis.
1. Expanders of Logarithmic Degree
Let be a prime and . We’ll construct a -regular multigraph with vertices. The vertex set of the graph will be the -dimensional vector space over .
For each vertex , and every two scalars , we have the edges .
In other words, the graph is a Cayley graph of the additive group of , constructed using the generating multiset
Note that the generating set is symmetric, that is, if then (with the same multiplicity), and so the resulting multigraph is undirected.
Let be the adjacency matrix of and be the normalized Laplacian matrix. We will prove the following bound on the eigenvalues of .
Theorem 1 For every prime and every , if we let be the eigenvalues of with multiplicities, then, for every
For example, setting gives us a family of graphs such that for each graph in the family, and hence , and the number of vertices is , while the degree is , meaning the degree is .
Proof: We will compute the eigenvalues of the adjacency matrix of , and prove that, except for the largest one, which is , all the others are non-negative and at most .
Recall our characterization of the eigenvalues of the adjacency matrix of a Cayley multigraph of an abelian group with generating multiset : we have one eigenvector for each character of the group, and the corresponding eigenvalue is .
What are the characters of the additive group of ? It is the product of copies of the additive group of , or, equivalently, the product of copies of the cyclic group . Following our rules for constructing the characters of the cyclic group and of products of groups, we see that the additive group of has one character for each , and the corresponding character is
where
Thus, for each , we have an eigenvalue
When then the corresponding character is always equal to one, and the corresponding eigenvalue is .
Now consider any , and define the polynomial . Note that it is a non-zero polynomial of degree at most , and so it has at most roots. The eigenvalue corresponding to is
where we use the fact that, for every , the sum equals zero, since it is the sum of the values of the non-trivial character , and we proved that, for every non-trivial character, the sum is zero.
In conclusion, we have
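The whole proof can be checked numerically for small parameters. The sketch below builds the Cayley graph as we read the construction above (vertex set ${\mathbb F}_p^t$, one generator $a \cdot (1, b, b^2, \ldots, b^{t-1})$ for each pair of scalars $a, b$), and verifies that, apart from the trivial eigenvalue $p^2$, every eigenvalue is non-negative and at most $(t-1) \cdot p$.

```python
import numpy as np
from itertools import product

# The logarithmic-degree expander as a Cayley graph of (F_p)^t with
# generating multiset { a*(1, b, b^2, ..., b^(t-1)) : a, b in F_p }
# (the construction as we read it above).  For p = 5, t = 3 we verify
# that every nontrivial eigenvalue is p times the number of roots of
# a polynomial of degree < t, hence in [0, (t-1)*p].
p, t = 5, 3
vecs = list(product(range(p), repeat=t))
index = {v: k for k, v in enumerate(vecs)}
gens = [tuple(a * pow(b, i, p) % p for i in range(t))
        for a in range(p) for b in range(p)]        # p^2 generators

A = np.zeros((p ** t, p ** t))
for x in vecs:
    for g in gens:
        y = tuple((xi + gi) % p for xi, gi in zip(x, g))
        A[index[x], index[y]] += 1

eig = np.sort(np.linalg.eigvalsh(A))[::-1]
assert abs(eig[0] - p * p) < 1e-6      # trivial eigenvalue = degree p^2
assert eig[1] <= (t - 1) * p + 1e-6    # all others at most (t-1)*p
assert eig[-1] >= -1e-6                # and non-negative
print("nontrivial eigenvalues lie in [0, %d]" % ((t - 1) * p))
```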
2. The Zig-Zag Graph Product
Given a -regular graph with adjacency matrix , if are the eigenvalues of with multiplicities we define
In particular, , and if we are able to construct a family of graphs such that is at most a fixed constant bounded away from one times , then we have a family of expanders. (Our construction will be inductive and, as often happens with inductive proofs, it will be easier to maintain this stronger property than the property that is bounded away from one.)
Given graphs and of compatible sizes, with small degree and large edge expansion, the zig-zag product is a method of constructing a larger graph also with small degree and large edge expansion.
If:
Then:
We will see the construction and analysis of the zig-zag product in the next lecture.
For the remainder of today, we’ll see how to use the zig-zag product to construct arbitrarily large graphs of fixed degree with large edge expansion.
Fix a large enough constant . ( will do.) Construct a -regular graph on vertices with . (For example is a degree graph on vertices with .)
For any graph , let represent the graph on the same vertex set whose edges are the paths of length two in . Thus is the graph whose adjacency matrix is the square of the adjacency matrix of . Note that if is -regular then is -regular.
Using the from above, we’ll construct inductively a family of progressively larger graphs, all of which are -regular and have .
Let . For let .
Theorem 2 For each , has degree and .
Proof: We’ll prove this by induction.
Base case: is -regular. Also, .
Inductive step: Assume the statement for , that is, has degree and . Then has degree , so that the product is defined. Moreover, . Applying the construction, we get that has degree and . This completes the proof.
Finally note that has vertices.
In which we show how to find the eigenvalues and eigenvectors of Cayley graphs of Abelian groups, we find tight examples for various results that we proved in earlier lectures, and, along the way, we develop the general theory of harmonic analysis which includes the Fourier transform of periodic functions of a real variable, the discrete Fourier transform of periodic functions of an integer variable, and the Walsh transform of Boolean functions.
Earlier, we proved the Cheeger inequalities
and the fact that Fiedler’s algorithm, when given an eigenvector of , finds a cut such that . We will show that all such results are tight, up to constants, by proving that
In this lecture we will develop some theoretical machinery to find the eigenvalues and eigenvectors of Cayley graphs of finite Abelian groups, a class of graphs that includes the cycle and the hypercube, among several other interesting examples. This theory will also be useful later, as a starting point to talk about constructions of expanders.
For readers familiar with the Fourier analysis of Boolean functions, or the discrete Fourier analysis of functions , or the standard Fourier analysis of periodic real functions, this theory will give a more general, and hopefully interesting, way to look at what they already know.
1. Characters
We will use additive notation for groups, so, if is a group, its unit will be denoted by , its group operation by , and the inverse of element by . Unless noted otherwise, however, the definitions and results apply to non-abelian groups as well.
Definition 1 (Character) Let be a group (we will also use to refer to the set of group elements). A function is a character of if
- is a group homomorphism of into the multiplicative group .
- for every ,
Though this definition might seem to not bear the slightest connection to our goals, the reader should hang on because we will see next time that finding the eigenvectors and eigenvalues of the cycle is immediate once we know the characters of the group , and finding the eigenvectors and eigenvalues of the hypercube is immediate once we know the characters of the group .
Remark 1 (About the Boundedness Condition) If is a finite group, and is any element, then
and so if is a group homomorphism then
and so is a root of unity and, in particular, . This means that, for finite groups, the second condition in the definition of character is redundant. In certain infinite groups, however, the second condition does not follow from the first, for example defined as is a group homomorphism of into but it is not a character.
Just by looking at the definition, it might look like a finite group might have an infinite number of characters; the above remark, however, shows that a character of a finite group must map into -th roots of unity, of which there are only , showing a finite upper bound to the number of characters. Indeed, a much stronger upper bound holds, as we will prove next, after some preliminaries.
Lemma 2 If is finite and is a character that is not identically equal to 1, then
Proof: Let be such that . Note that
where we used the fact that the mapping is a permutation. (We emphasize that even though we are using additive notation, the argument applies to non-abelian groups.) So we have
and since we assumed , it must be .
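As a numerical sanity check of Lemma 2 (not part of the proof), we can verify in Python that, for the cyclic group with the characters constructed explicitly in the proof of Theorem 8 below, the sum of a non-trivial character over the whole group vanishes:

```python
import cmath

# Sanity check of Lemma 2 for the cyclic group Z_n: the character
# chi_r(x) = e^{2*pi*i*r*x/n} sums to n over the group when r = 0
# (the trivial character), and to 0 whenever chi_r is not identically 1.
def character(n, r):
    return lambda x: cmath.exp(2j * cmath.pi * r * x / n)

n = 12
for r in range(n):
    chi = character(n, r)
    total = sum(chi(x) for x in range(n))
    expected = n if r == 0 else 0
    assert abs(total - expected) < 1e-9
```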
If is finite, given two functions , define the inner product
Lemma 3 If are two different characters of a finite group , then
We will prove Lemma 3 shortly, but before doing so we note that, for a finite group , the set of functions is a -dimensional vector space, and that Lemma 3 implies that characters are orthogonal with respect to an inner product, and so they are linearly independent. In particular, we have established the following fact:
Corollary 4 If is a finite group, then it has at most characters.
It remains to prove Lemma 3, which follows from the next two statements, whose proof is immediate from the definitions.
Fact 5 If are characters of a group , then the mapping is also a character.
Fact 6 If is a character of a group , then the mapping is also a character, and, for every , we have .
To complete the proof of Lemma 3, observe that:
Notice that, along the way, we have also proved the following fact:
Fact 7 If is a group, then the set of characters of is also a group, with respect to the group operation of pointwise multiplication. The unit of the group is the character mapping every element to 1, and the inverse of a character is the pointwise conjugate of the character.
The group of characters is called the Pontryagin dual of , and it is denoted by .
We now come to the punchline of this discussion.
Theorem 8 If is a finite abelian group, then it has exactly characters.
Proof: We give a constructive proof. We know that every finite abelian group is isomorphic to a product of cyclic groups
so it will be enough to prove that
For the first claim, consider, for every , the function
Each such function is clearly a character ( maps to 1, is the multiplicative inverse of , and, recalling that for every integer , we also have ), and the values of are different for different values of , so we get distinct characters. This shows that has at least characters, and we already established that it can have at most characters.
For the second claim, note that if is a character of and is a character of , then it is easy to verify that the mapping is a character of . Furthermore, if and are two distinct pairs of characters, then the mappings and are two distinct characters of , because we either have an such that , in which case , or we have a such that , in which case . This shows that has at least characters, and we have already established that it can have at most that many.
This means that the characters of a finite abelian group form an orthogonal basis for the set of all functions , so that any such function can be written as a linear combination
For every character , , and so the characters are actually a scaled-up orthonormal basis, and the coefficients can be computed as
Example 1 (The Boolean Cube) Consider the case , that is the group elements are , and the operation is bitwise xor. Then there is a character for every bit-vector , which is the function
Every boolean function can thus be written as
where
which is the boolean Fourier transform.
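As an illustration (a sketch by brute force over the Boolean cube, with the parity function as a hypothetical example), the Boolean Fourier transform and its inversion can be computed directly from the character formulas:

```python
from itertools import product

# Boolean (Walsh) Fourier transform over {0,1}^k: the characters are
# chi_r(x) = (-1)^{<r,x>}, and f decomposes as
# f(x) = sum_r fhat(r) * chi_r(x), with fhat(r) = 2^{-k} sum_x f(x) chi_r(x).
def walsh_char(r, x):
    return -1 if sum(ri * xi for ri, xi in zip(r, x)) % 2 else 1

def walsh_transform(f, k):
    pts = list(product([0, 1], repeat=k))
    return {r: sum(f[x] * walsh_char(r, x) for x in pts) / 2 ** k for r in pts}

k = 3
pts = list(product([0, 1], repeat=k))
f = {x: float(sum(x) % 2) for x in pts}       # example: the parity function
fhat = walsh_transform(f, k)
# parity = (1 - chi_{(1,...,1)})/2, so the transform has only two nonzero
# coefficients: fhat(0,...,0) = 1/2 and fhat(1,...,1) = -1/2
assert abs(fhat[(0, 0, 0)] - 0.5) < 1e-9 and abs(fhat[(1, 1, 1)] + 0.5) < 1e-9
# inversion: f(x) = sum_r fhat(r) chi_r(x)
for x in pts:
    assert abs(sum(fhat[r] * walsh_char(r, x) for r in pts) - f[x]) < 1e-9
```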
Example 2 (The Cyclic Group) To work out another example, consider the case . Then every function can be written as
where
which is the discrete Fourier transform.
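A minimal sketch of the discrete Fourier transform over the cyclic group, written directly from the character formulas (the test vector below is an arbitrary illustration):

```python
import cmath

# DFT over Z_n with characters chi_r(x) = e^{2*pi*i*r*x/n}:
# fhat(r) = (1/n) sum_x f(x) * conj(chi_r(x)),   f(x) = sum_r fhat(r) chi_r(x)
def dft(f):
    n = len(f)
    return [sum(f[x] * cmath.exp(-2j * cmath.pi * r * x / n) for x in range(n)) / n
            for r in range(n)]

def inverse_dft(fhat):
    n = len(fhat)
    return [sum(fhat[r] * cmath.exp(2j * cmath.pi * r * x / n) for r in range(n))
            for x in range(n)]

f = [1.0, 2.0, 0.0, -1.0, 3.0]
recon = inverse_dft(dft(f))
assert all(abs(a - b) < 1e-9 for a, b in zip(f, recon))
```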
2. A Look Beyond
Why is the term “Fourier transform” used in this context? We will sketch an answer to this question, although what we say from this point on is not needed for our goal of finding the eigenvalues and eigenvectors of the cycle and the hypercube.
The point is that it is possible to set up a definitional framework that unifies both what we did in the previous section with finite Abelian groups, and the Fourier series and Fourier transforms of real and complex functions.
In the discussion of the previous section, we started to restrict ourselves to finite groups when we defined an inner product among functions .
If is an infinite abelian group, we can still define an inner product among functions , but we will need to define a measure over and restrict ourselves in the choice of functions. A measure over (a sigma-algebra of subsets of) is a Haar measure if, for every measurable subset and element we have , where . For example, if is finite, is a Haar measure. If , then is also a Haar measure (it is ok for a measure to be infinite for some sets), and if then the Lebesgue measure is a Haar measure. When a Haar measure exists, it is more or less unique up to multiplicative scaling. All locally compact topological abelian groups have a Haar measure; this is a very large class of abelian groups, which includes all the finite ones, , , and so on.
Once we have a Haar measure over , and we have defined an integral for functions , we say that a function is an element of if
For example, if is finite, then all functions are in , and a function is in if the series converges.
If , we can define their inner product
and use Cauchy-Schwarz to see that .
Now we can repeat the proof of Lemma 3 that for two different characters, and the only step of the proof that we need to verify for infinite groups is an analog of Lemma 2, that is we need to prove that if is a character that is not always equal to 1, then
and the same proof as in Lemma 2 works, with the key step being that, for every group element ,
because of the property of being a Haar measure.
We don’t have an analogous result to Theorem 8 showing that and are isomorphic, however it is possible to show that itself has a Haar measure , that the dual of is isomorphic to , and that if is continuous, then it can be written as the “linear combination”
where
In the finite case, the examples that we developed before correspond to setting and .
Example 3 (Fourier Series) The set of characters of the group with the operation of addition modulo 1 is isomorphic to , because for every integer we can define the function
and it can be shown that there are no other characters. We thus have the Fourier series for continuous functions ,
where
3. Cayley Graphs and Their Spectrum
Let be a finite group. We will use additive notation, although the following definition applies to non-commutative groups as well. A subset is symmetric if .
Definition 9 For a group and a symmetric subset , the Cayley graph is the graph whose vertex set is , and such that is an edge if and only if . Note that the graph is undirected and -regular.
We can also define Cayley weighted graphs: if is a function such that for every , then we can define the weighted graph in which the edge has weight . We will usually work with unweighted graphs, although we will sometimes allow parallel edges (corresponding to positive integer weights).
Example 4 (Cycle) The -vertex cycle can be constructed as the Cayley graph .
Example 5 (Hypercube) The -dimensional hypercube can be constructed as the Cayley graph
where the group is the set with the operation of bit-wise xor, and the set is the set of bit-vectors with exactly one .
If we construct a Cayley graph from a finite abelian group, then the eigenvectors are the characters of the groups, and the eigenvalues have a very simple description.
Lemma 10 Let be a finite abelian group, be a character of , be a symmetric set. Let be the adjacency matrix of the Cayley graph . Consider the vector such that .
Then is an eigenvector of , with eigenvalue
Proof: Consider the -th entry of :
And so
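Lemma 10 is easy to check numerically. The sketch below (assuming the cyclic group and a hypothetical symmetric set of generators) verifies that each character is an eigenvector of the adjacency matrix, with eigenvalue equal to the sum of the character over the generators:

```python
import cmath

# Check of Lemma 10 for G = Z_10 and the symmetric set S = {1, -1, 2, -2}:
# the character chi_r is an eigenvector of the adjacency matrix of
# Cayley(Z_n, S), with eigenvalue lambda_r = sum over s in S of chi_r(s).
n, S = 10, {1, 9, 2, 8}
A = [[1 if (y - x) % n in S else 0 for y in range(n)] for x in range(n)]
for r in range(n):
    chi = [cmath.exp(2j * cmath.pi * r * x / n) for x in range(n)]
    lam = sum(cmath.exp(2j * cmath.pi * r * s / n) for s in S)
    Achi = [sum(A[x][y] * chi[y] for y in range(n)) for x in range(n)]
    assert all(abs(Achi[x] - lam * chi[x]) < 1e-9 for x in range(n))
```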
The eigenvalues of the form , where is a character, enumerate all the eigenvalues of the graph, as can be deduced from the following observations:
It is remarkable that, for a Cayley graph, a system of eigenvectors can be determined based solely on the underlying group, independently of the set .
4. The Cycle
The -cycle is the Cayley graph . Recall that, for every , the group has a character .
This means that for every we have the eigenvalue
where we used the facts that , that , and .
For we have the eigenvalue . For we have the second largest eigenvalue . If is an eigenvalue of the adjacency matrix, then is an eigenvalue of the normalized Laplacian. From the above calculations, we have that the second smallest Laplacian eigenvalue is .
The expansion of the cycle is , and so the cycle is an example in which the second Cheeger inequality is tight.
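The tightness claim can be checked numerically (a sketch, with illustrative values of n): the second Laplacian eigenvalue of the n-cycle is of order 1/n², while cutting the cycle into two arcs of n/2 consecutive vertices cuts 2 edges out of total volume n, giving expansion 2/n, so the eigenvalue and the squared expansion agree up to a constant.

```python
import math

# Tightness of the second Cheeger inequality on the n-cycle:
# lambda_2 = 1 - cos(2*pi/n) = Theta(1/n^2), while the two-arc cut
# has expansion h = 2/n, so lambda_2 / h^2 stays bounded by constants.
for n in [16, 64, 256, 1024]:
    lam2 = 1 - math.cos(2 * math.pi / n)   # second Laplacian eigenvalue
    h = 2 / n                              # two edges cut, volume d*|S| = n
    ratio = lam2 / h ** 2
    assert 1.0 < ratio < 6.0               # bounded above and below
```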
5. The Hypercube
The group with bitwise xor has characters; for every there is a character defined as
Let us denote the set by , where we let denote the bit-vector that has a in the -th position, and zeroes everywhere else. This means that, for every bit-vector , the hypercube has the eigenvalue
where we denote by the weight of , that is, the number of ones in .
Corresponding to , we have the eigenvalue .
For each of the vectors with exactly one , we have the second largest eigenvalue. The second smallest Laplacian eigenvalue is .
Let us compute the expansion of the hypercube. Consider “dimension cuts” of the form . The set contains half of the vertices, and the number of edges that cross the cut is also equal to half the number of vertices (because the edges form a perfect matching), so we have and so .
These calculations show that the first Cheeger inequality is tight for the hypercube.
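Both calculations are easy to verify by brute force in small dimension (a sketch with d = 4): every character of the Boolean cube is an eigenvector whose eigenvalue is d minus twice the weight of its index, and the dimension cut has expansion exactly 1/d, which equals half the second Laplacian eigenvalue.

```python
from itertools import product

# Brute-force check in the 4-dimensional hypercube: eigenvalues d - 2*|r|,
# and the dimension cut {x : x_0 = 0} has expansion 1/d, matching lambda_2/2.
d = 4
verts = list(product([0, 1], repeat=d))

def adjacent(x, y):
    return sum(a != b for a, b in zip(x, y)) == 1

for r in product([0, 1], repeat=d):
    lam = d - 2 * sum(r)
    chi = {x: (-1) ** (sum(ri * xi for ri, xi in zip(r, x)) % 2) for x in verts}
    for x in verts:
        assert sum(chi[y] for y in verts if adjacent(x, y)) == lam * chi[x]

S = [x for x in verts if x[0] == 0]
cut = sum(1 for x in S for y in verts if y[0] == 1 and adjacent(x, y))
h = cut / (d * len(S))
assert abs(h - 1 / d) < 1e-12
```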
Finally, we consider the tightness of the approximation analysis of Fiedler’s algorithm.
We have seen that, in the -dimensional hypercube, the second eigenvalue has multiplicity , and that its eigenvectors are vectors such that . Consider now the vector ; this is still clearly an eigenvector of the second eigenvalue. The entries of the vector are
Suppose now that we apply Fiedler’s algorithm using as our vector. This is equivalent to considering all the cuts in the hypercube in which we pick a threshold and define .
Some calculations with binomial coefficients show that the best such “threshold cut” is the “majority cut” in which we pick , and that the expansion of is
This gives an example of a graph and of a choice of eigenvector for the second eigenvalue that, given as input to Fiedler's algorithm, result in the output of a cut such that . Recall that we proved , which is thus tight, up to constants.
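The binomial-coefficient calculation can be sketched as follows (for odd d, taking the majority cut to be the set of vertices of Hamming weight at most (d-1)/2, an illustrative assumption): the cut's expansion comes out to order 1/sqrt(d), while the second Laplacian eigenvalue is 2/d, which exhibits the quadratic gap in the analysis of Fiedler's algorithm.

```python
import math

# Expansion of the majority cut in the d-dimensional hypercube (d odd):
# S = vertices of Hamming weight <= (d-1)/2, so |S| = 2^{d-1}; the edges
# leaving S go from weight (d-1)/2 to weight (d+1)/2, and there are
# C(d, (d-1)/2) * (d+1)/2 of them.  The expansion is Theta(1/sqrt(d)).
for d in [7, 15, 31, 63]:
    size_S = 2 ** (d - 1)
    crossing = math.comb(d, (d - 1) // 2) * (d + 1) // 2
    h = crossing / (d * size_S)
    assert 0.5 < h * math.sqrt(d) < 1.5    # h = Theta(1/sqrt(d))
```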
In which we complete the analysis of the ARV rounding algorithm
We are finally going to complete the analysis of the Arora-Rao-Vazirani rounding algorithm, which rounds a Semidefinite Programming solution of a relaxation of sparsest cut into an actual cut, with an approximation ratio .
In previous lectures, we reduced the analysis of the algorithm to the following claim.
Lemma 1 Let be a semi-metric over a set such that for all , let be a collection of vectors in , such that is a semimetric, let be a random Gaussian vector in , define , and suppose that, for every , we can define a set of disjoint pairs such that, with probability 1 over ,
and
Then
1. An Inductive Proof that Gives a Weaker Result
In this section we will prove a weaker lower bound on , of the order of . We will then show how to modify the proof to obtain the tight result.
We begin with the following definitions. We define the ball of radius centered at as
We say that a point has the -Large-Projection-Property, or that it is -LPP if
Lemma 2 Under the assumptions of Lemma 1, there is a constant (that depends only on and ) such that for all , at least elements of have the Large Projection Property.
Proof: We will prove the lemma by induction on . We call the set of elements of that are -LPP.
Let be the set of ordered pairs such that and , and hence . Because and have the same distribution, we have that, for every , there is probability that there is a such that (a fact that we will use in the inductive step).
For the base case there is nothing to prove.
For the inductive case, define the function (which will be a random variable dependent on ) such that is the lexicographically smallest such that if such a exists, and otherwise. The definition of is that for every , and the inductive assumption is that .
By a union bound, for every , there is probability at least that there is an such that and . In this case, we will define , otherwise .
Note that the above definition is consistent, because is a set of disjoint pairs, so for every there is at most one that could be used to define . We also note that, if , then
and
Now we can use another averaging argument to say that there have to be at least elements of such that
Let us call the set of such elements. As required, .
By applying concentration of measure, the fact that, for every we have
implies that, for every
and the inductive step is proved, provided
which is true when
which proves the lemma if we choose appropriately.
Applying the previous lemma with , we have that, with probability , there is a pair in such that
and
but we also know that, with probability, for all pairs in ,
and so
implying
2. The Tight Bound
In the result proved in the previous section, we need , which is a constant, to be bigger than the loss incurred in the application of concentration of measure, which is of the order of . A factor of simply comes from the distances between the points that we are considering; an additional factor of comes from the fact that we need to push up the probability from a bound that is exponentially small in .
The reason for such a poor probability bound is the averaging argument: each element of has probability of being the “middle point” of the construction, so that the sum over the elements of of the probability that has adds up to ; such overall probability, however, could be spread out over all of , with each element of getting a very low probability of the order of , which is exponentially small in .
Not all elements of , however, can be a for which ; this is only possible for elements that are within distance from . If the set has cardinality of the same order as , then we only lose a constant factor in the probability, and we do not pay the extra term in the application of concentration of measure. But what do we do if is much bigger than ? In that case we may replace with , and have similar properties.
Lemma 3 Under the assumptions of Lemma 1, if is a set of points such that for every
then, for every distance , every , and every
That is, if all the elements of are -LPP, then all the elements of are -LPP.
Proof: If , then there is such that , and, with probability we have . The claim follows from a union bound.
Lemma 4 Under the assumptions of Lemma 1, there is a constant (that depends only on and ) such that for all , there is a set such that , every element of is -LPP, and
Proof: The base case is proved by setting .
For the inductive step, we define and as in the proof of Lemma 2. We have that if , then
and
Now we can use another averaging argument to say that there have to be at least elements of such that
Let us call the set of such elements.
Define , , and so on, and let be the first time such that . We will define . Note that
which implies that . We have so we satisfy the inductive claim about the size of . Regarding the other properties, we note that , and that every element of is
so we also have that every element of is
provided
which we can satisfy with an appropriate choice of , recalling that .
Then we apply concentration of measure to deduce that every element of is
provided that
which we can again satisfy with an appropriate choice of , because and is smaller than or equal to zero.
Finally,
because, as we established above,
By applying Lemma 4 with , we find that there is probability that there are in such that
which, together, imply
In which we continue the analysis of the ARV rounding algorithm
We are continuing the analysis of the Arora-Rao-Vazirani rounding algorithm, which rounds a Semidefinite Programming solution of a relaxation of sparsest cut into an actual cut, with an approximation ratio .
In previous lectures, we reduced the analysis of the algorithm to the following claim.
Lemma 1 Let be a negative-type semimetric over a set , let be vectors such that , let be a random vector with a Gaussian distribution, and let .
Suppose that, for constants and a parameter , we have that there is a probability that there are at least pairs such that and .
Then there is a constant , that depends only on and , such that
1. Concentration of Measure
In the last lecture, we have already introduced two useful properties of Gaussian distributions: that there is a small probability of being much smaller than the standard deviation in absolute value, and a very small probability of being much larger than the standard deviation in absolute value. Here we introduce a third property of a somewhat different flavor.
For a set and a distance parameter , define
the set of points at distance at most from . Then we have:
Theorem 2 (Gaussian concentration of measure) There is a constant such that, for every and for every set , if
then
for every , where the probabilities are taken according to the Gaussian measure in , that is , where and the are independent Gaussians of mean 0 and variance 1.
The above theorem says that if we have some property that is true with probability for a random Gaussian vector , then there is a probability that is within distance of a vector that satisfies the required property. In high dimension , this is a non-trivial statement because, with very high probability is about , and so the distance between and is small relative to the length of the vector.
We will use the following corollary.
Corollary 3 Let be vectors in and let . Let be a random Gaussian vector in , and let . If, for some and , we have
then
Proof: Let
By assumption, we have , and so, by concentration of measure:
The event in the above probability can be rewritten as
and the above condition gives us
The (use of the) above statement is by far the most innovative part of the analysis of Arora, Rao and Vazirani, so it is worth developing an intuitive feeling for its meaning.
Let’s say that we are interested in the distribution of . We know that the random variables are Gaussians of mean 0 and standard deviation at most , but it is impossible to say anything about, say, the average value or the median value of without knowing something about the correlation of the random variables .
Interestingly, the above Corollary says something about the concentration of without any additional information. The corollary says that, for example, the first percentile of and the 99-th percentile of differ by at most , and that we have a concentration result of the form
which is a highly non-trivial statement for any configuration of for which .
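The concentration phenomenon is easy to observe in simulation. The sketch below (an illustration, not a proof; the configuration of unit vectors is a hypothetical random one) samples the maximum projection of a Gaussian vector onto a family of unit vectors, and checks that its empirical 1st-to-99th percentile spread is bounded by a constant, even though its typical value grows with the number of vectors:

```python
import math
import random

# Simulate Y = max over i of <g, v_i> for m unit vectors v_i in R^d:
# each projection is a standard Gaussian, and regardless of the
# correlations, Y concentrates in a window of constant width.
random.seed(0)
d, m, trials = 50, 200, 400

def unit_vector(dim):
    v = [random.gauss(0, 1) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

vs = [unit_vector(d) for _ in range(m)]
samples = sorted(
    max(sum(gi * vi for gi, vi in zip(g, v)) for v in vs)
    for g in ([random.gauss(0, 1) for _ in range(d)] for _ in range(trials))
)
spread = samples[-trials // 100] - samples[trials // 100]  # ~99th minus ~1st pct
assert spread < 3.0   # an O(1) window, even though E[Y] ~ sqrt(2 ln m)
```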
2. Reworking the Assumption
Lemma 4 Under the assumptions of Lemma 1, there is a fixed set , , and a set of disjoint pairs , dependent on , such that, for every and for every pair we have
and
and for all we have
Proof: Let be the set of disjoint pairs promised by the assumptions of Lemma 1. Construct a weighted graph , where the weight of the edge is . The degree of every vertex is at most 1, and the sum of the degrees is twice the expectation of , and so, by the assumptions of Lemma 1, it is at least .
Now, repeatedly delete from all vertices of degree at most , and all the edges incident on them, until no such vertex remains. At the end we are left with a (possibly empty!) graph in which all remaining vertices have degree greater than ; each deletion reduces the sum of the degrees by at most , and so the residual graph has total degree at least , and hence at least vertices.
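The deletion process in the proof can be sketched as a simple pruning procedure (illustrative code; the vertex names, weights, and threshold below are hypothetical):

```python
# Repeatedly delete vertices whose weighted degree is at most a threshold,
# together with their incident edges; what survives is a (possibly empty)
# subgraph in which every remaining vertex has degree above the threshold.
def prune(adj, threshold):
    # adj: dict mapping each vertex to a dict {neighbor: edge weight}
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            degree = sum(w for u, w in adj[v].items() if u in alive)
            if degree <= threshold:
                alive.discard(v)
                changed = True
    return alive

adj = {
    0: {1: 1.0, 2: 1.0, 3: 0.1},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0},
    3: {0: 0.1},
    4: {},
}
assert prune(adj, 0.5) == {0, 1, 2}   # the light pendant and isolated vertex go
```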
By the above Lemma, the following result implies Lemma 1 and hence the ARV Main Lemma.
Lemma 5 Let be a semi-metric over a set such that for all , let be a collection of vectors in , such that is a semimetric, let be a random Gaussian vector in , define , and suppose that, for every , we can define a set of disjoint pairs such that, with probability 1 over ,
and
Then