# Proof of the Cheeger inequality in manifolds

Having (non-rigorously) defined the Laplacian operator in manifolds in the previous post, we turn to the proof of the Cheeger inequality in manifolds, which we restate below.

Theorem 1 (Cheeger’s inequality) Let ${M}$ be an ${n}$-dimensional smooth, compact, Riemann manifold without boundary with metric ${g}$, let ${L:= - {\rm div} \nabla}$ be the Laplace-Beltrami operator on ${M}$, let ${0=\lambda_1 \leq \lambda_2 \leq \cdots }$ be the eigenvalues of ${L}$, and define the Cheeger constant of ${M}$ to be

$\displaystyle h(M):= \inf_{S\subseteq M : \ 0 < \mu(S) \leq \frac 12 \mu(M)} \ \frac{\mu_{n-1}(\partial(S))}{\mu(S)}$

where the ${\partial (S)}$ is the boundary of ${S}$, ${\mu}$ is the ${n}$-dimensional measure, and ${\mu_{n-1}}$ is ${(n-1)}$-th dimensional measure defined using ${g}$. Then

$\displaystyle h(M) \leq 2 \sqrt{\lambda_2} \ \ \ \ \ (1)$

We begin by recalling the proof of the analogous result in graphs, and then we will repeat the same steps in the context of manifolds.

Theorem 2 (Cheeger’s inequality in graphs) Let ${G=(V,E)}$ be a ${d}$-regular graph, ${A}$ be its adjacency matrix, ${L:= I - \frac 1d A}$ be its normalized Laplacian matrix, ${0 = \lambda_1 \leq \lambda_2 \leq \cdots \leq \lambda_{|V|}}$ be the eigenvalues of ${L}$, and define ${\mu(S):= d \cdot |S|}$ for every subset of vertices ${S\subseteq V}$. Define the conductance of ${G}$ as

$\displaystyle \phi(G) := \min_{S\subseteq V: \ 0 < \mu(S) \leq \frac 12 \mu(V) } \frac{| \partial S|}{\mu(S)}$

where ${\partial S}$ is the number of edges with one endpoint in ${S}$ and one endpoint in ${\bar S}$. Then

$\displaystyle \phi(G) \leq \sqrt{2 \lambda_2} \ \ \ \ \ (2)$

1. Proof the Cheeger inequality in graphs

We will use the variational characterization of the eigenvalues of the Laplacian ${L}$ of a graph ${G}$.

$\displaystyle \lambda_1 = \min_{f \in {\mathbb R}^v} \frac {f^T Lf}{f^T f} \ \ \ \ \ (3)$

and if ${f_1}$ is a minimizer in the above expression then

$\displaystyle \lambda_2 = \min_{f \in {\mathbb R}^V: \ f \perp f_1} \frac {f^T Lf}{f^T f}$

Following the definition of ${L}$ we see that

$\displaystyle f^T L f = \frac 1d \sum_{(u,v)\in E} |f_u - f_v |^2$

and so the minimum in (3) is 0, and it is achieved for ${f = (1,\cdots,1)}$. This means that

$\displaystyle \lambda_2 = \min_{f \in {\mathbb R}^V: \ \sum_v f_v = 0} \frac {\sum_{(u,v) \in E} |f_u-f_v|^2}{d \sum_v f_v^2} \ \ \ \ \ (4)$

The expression in the right-hand-side of (4) is an important one, and it is called the Rayleigh quotient of ${f}$, which we will denote by ${R(f)}$:

$\displaystyle R(f):= \frac {\sum_{(u,v) \in E} |f_u-f_v|^2}{d \sum_v f_v^2}$

It is also useful to consider the variant of the Rayleigh quotient where there are no squares; this does not have a standard name, so let us call it the ${\ell_1}$ Rayleigh quotient and denote it by ${R_1}$:

$\displaystyle R_1(g):= \frac {\sum_{(u,v) \in E} |g_u-g_v|}{d \sum_v |g_v|}$

The proof of the graph Cheeger inequality now continues with the proof of the following three facts.

Lemma 3 (Rounding of ${\ell_1}$ embeddings) For every non-negative vector ${g\in {\mathbb R}^V_{\geq 0}}$ there is a value ${t\geq 0}$ such that

$\displaystyle \phi(\{ v: g(v) > t \}) \leq R_1(g)$

Lemma 4 (Embedding of ${\ell_2^2}$ into ${\ell_1}$) For every non-negative vector ${f\in {\mathbb R}^V_{\geq 0}}$, we have

$\displaystyle R_1(f^2) \leq \sqrt{2 R(f)}$

Lemma 5 (From an eigenvector to a non-negative vector) For every ${f\in {\mathbb R}^V}$ such that ${\sum_v f_v =0}$ there is a non-negative ${f'\in {\mathbb R}^V_{\geq 0}}$ such that ${\mu(\{ v: f'(v) >0 \}) \leq \frac 12 \mu(V)}$ and such that

$\displaystyle R(f') \leq R(f)$

Now let us start from a function ${f}$ that optimizes (4), so that ${\sum_v f_v = 0}$ and ${R(f) = \lambda_2}$, then apply Lemma 5 to find a function ${f'}$ such that the volume of the vertices having positive coordinates in ${f'}$ is at most ${\frac 12 \mu(V)}$ and such that ${R(f') \leq R(f) = \lambda_2}$. Then consider the vector ${g\in {\mathbb R}^V}$ such that ${g_v := f'^2_v}$; by Lemma 4, we have ${R_1(g) \leq \sqrt{2 R(f')} \leq \sqrt{2\lambda_2}}$, and by Lemma 3 there is a threshold ${t}$ such that the set ${S:= \{ v: g(v) > t \}}$ has conductance ${h(S) \leq \sqrt {2\lambda_2}}$. Since ${S}$ is a subset of the vertices having positive coordinates in ${g}$, we have ${\mu(S) \leq \frac 12 \mu(V)}$, and so

$\displaystyle \phi(G) \leq \sqrt{2 \lambda_2 }$

which is the Cheeger inequality for graphs. It remains to prove the three lemmas.

Proof: of Lemma 3. For each threshold ${t}$, define the set

$\displaystyle S_t := \{ v: g_v > t \}$

The idea of the proof is that if we pick ${t}$ at random then the probability that an edge belongs to ${\partial S_t}$ is proportional to ${|g_u - g_v|}$ and the probability that ${v\in S_t}$ is proportional to ${|g_v|}$, so that the expected number of edges in ${\partial S_t}$ is proportional to the numerator of ${R_1(g)}$ and the expected number of vertices in ${S_t}$ is proportional to the denominator of ${R_1(g)}$; if ${R_1(g)}$ is small, it is not possible for ${\partial S_t/\mu(S_t)}$ to always be large for every ${t}$.

To avoid having to normalize the range of ${t}$ to be between ${0}$ and ${1}$, instead of taking averages over a random choice of ${t}$, we will consider the integral over all values of ${t}$. We have

$\displaystyle \int_0^\infty |\partial (S_t) | {\rm d} t = \sum_{u,v} |g_u - g_v |$

because we can write ${|\partial (S_t)| = \sum_{(u,v) \in E} I_{u,v} (t)}$, where ${I_{u,v} (t) = 1}$ if ${(u,v) \in \partial S_t}$ and ${I_{u,v}(t) = 0}$ otherwise, and we see that only the values of ${t}$ between ${g_u}$ and ${g_v}$ make ${I_{u,v}(t) =1}$, so ${\int_{0}^\infty I_{u,v} (t) {\rm d} t = |g_u - g_v|}$.

We also have

$\displaystyle \int_0^\infty d |S_t| {\rm d} t = d \sum_v |g_v|$

and if we denote by ${t^*}$ the threshold such that ${\phi(S_{t^*})}$ is smallest among all the ${\phi(S)}$, then

$\displaystyle \sum_{u,v} |g_u - g_v | = \int_0^\infty |\partial (S_t) | {\rm d} t$

$\displaystyle \geq \int_0^\infty h(S_{t^*}) d|S_t| {\rm d} t$

$\displaystyle = h(S_{t^*}) d \sum_v |g_v|$

so that

$\displaystyle h(S_{t^*}) \leq \frac { \sum_{u,v} |g_u - g_v | }{d \sum_v |g_v| } = R_1(g)$

$\Box$

Proof: of Lemma 3. Let us consider the numerator of ${R_1(f^2)}$; it is:

$\displaystyle \sum_{(u,v)\in E} |f^2_u - f^2_v|$

$\displaystyle = \sum_{(u,v)\in E} |f_u - f_v| \cdot (f_u + f_v)$

$\displaystyle \leq \sqrt{\sum_{(u,v)\in E} |f_u - f_v|^2} \sqrt{ \sum_{(u,v)\in E}(f_u + f_v)^2 }$

(we used Cauchy-Swarz)

$\displaystyle \leq \sqrt{R(f) \cdot d\sum_v f_v^2} \sqrt{ \sum_{(u,v)\in E} 2f_u^2 + 2f_v^2 }$

(we used the definition of ${R_2}$ and Cauchy-Swarz again)

$\displaystyle = \sqrt{R(f) \cdot d\sum_v |f_v|^2} \sqrt{ 2 d\sum_{v} f_v^2 }$

$\displaystyle = \sqrt{2 R(f) } \cdot d \sum_v f_v^2$

And so

$\displaystyle R_1(f^2) = \frac {\sum_{(u,v)\in E} |f^2_u - f^2_v|}{d \sum_v f_v^2} \leq \sqrt{2 R_2(f) }$

$\Box$

Proof: of Lemma 5. Let ${m}$ be the median of ${f}$, and consider ${\bar f}$ defined as ${\bar f_v := f_v - m}$. We have

$\displaystyle R(\bar f) \leq R(f)$

because the numerators of ${R(\bar f)}$ and ${R(f)}$ are the same (the additive term ${-m}$ cancels). The denominators are such that

$\displaystyle \sum_v \bar f_v^2 = || \bar f||^2 = || f||^2 + || - m \cdot {\bf 1} ||^2 \geq || f||^2 = \sum_v f_v^2$

because ${f}$ and the vector ${{\bf 1} = (1,\ldots,1)}$ are orthogonal, and so by Pythagoras’s theorem the length-squared of ${\bar f = f - m {\bf 1}}$ equals the length-squared of ${f}$ plus the length-squared of ${-m {\bf 1}}$.

Let us define ${f^+_v := \min\{ 0, \bar f_v\}}$ and ${f^-_v := \min \{ 0, -\bar f_v\}}$ so that ${\bar f = f^+ - f^-}$. We use the following fact:

Fact 6 Let ${a,b \in {\mathbb R}^V_{\geq 0}}$ be disjointly supported non-negative vectors (“disjointly supported” means that they are non-zero on disjoint subsets of coordinates), then

$\displaystyle \min\{ R(a) , R(b) \} \leq R(a-b)$

Proof: The numerator of ${R(a-b)}$ is

$\displaystyle \sum_{(u,v)\in E} |a_v - b_v + b_u - a_v|^2 \geq \sum_{(u,v)\in E} |a_v - a_u|^2 + |b_v - b_u|^2$

and, using orthogonality and Pythagoras’s theorem, the denominator of ${R(a-b)}$ is

$\displaystyle d || a- b||^2 = d||a ||^2 + d|| b||^2$

The fact now follows from the inequality

$\displaystyle \min \left \{ \frac {n_1}{d_1} , \frac{n_2}{d_2} \right \} \leq \frac{n_1+n_2}{d_1+d_2}$

$\Box$

The lemma now follows by observing that ${f^+}$ and ${f^-}$ are non-negative and disjointly supported, so

$\displaystyle \min\{ R(f^+) , R(f^-) \} \leq R(\bar f) \leq R(f)$

and that both ${f^+}$ and ${f^-}$ have at most ${n/2}$ non-zero coordinate. $\Box$

2. Proof of the Cheeger inequality in manifolds

We will now translate the proof of the graph Cheeger inequality to the setting of manifolds.

As you may remember, we started off by saying that ${L}$ is symmetric and so all its eigenvalues are real and they are given by the variational characterization. Now we are already in trouble because the operator ${L}$ on manifolds cannot be thought of as a matrix, so what does it mean for it to be symmetric? The consequence of symmetry that is exploited in the analysis of the spectrum of symmetric matrices is the fact that if ${A}$ is symmetric, then for every ${x,y}$ we have

$\displaystyle \langle x,Ay \rangle = x^T Ay = (A^Tx)^Ty = (Ax)^Ty = \langle Ax,y \rangle$

and the property ${\langle x, Ay \rangle = \langle Ax , y \rangle}$ makes no references to coordinates, and it is well defined even for linear operators over infinite-dimensional spaces, provided that there is a notion of inner product. If we the define the inner product

$\displaystyle \langle f,g \rangle := \int_M f(x)\cdot g(x) \ {\rm d} \mu(x)$

on functions ${f: M \rightarrow {\mathbb R}}$, and more generally

$\displaystyle \langle f,g \rangle := \int_M \langle f(x), g(x)\rangle_X \ {\rm d} \mu(x)$

for functions ${f: M \rightarrow X}$, where ${X}$ is a vector space with inner product ${\langle \cdot,\cdot,\rangle_X}$, then we can say that an operator ${A}$ is self-adjoint if

$\displaystyle \langle f, Ag \rangle = \langle Af ,g \rangle$

for all (appropriately restricted) functions ${f,g}$. If ${M}$ is compact, this property is true for the Laplacian, and, in particular, ${-{\rm div}}$ and ${\nabla}$ are adjoints of each others, that is,

$\displaystyle \langle \nabla f, g \rangle = \langle f, - {\rm div} g \rangle$

(The discrete analog would be that ${C^T}$ is the transpose of ${C}$.)

Self-adjointness (and appropriate conditions on ${M}$) imply a version of the spectral theorem and of the variational characterization. In particular, all eigenvalues of ${L}$ are real, and if there is a minimum one then it is

$\displaystyle \lambda_1 = \min_{f: M \rightarrow {\mathbb R}} \frac {\langle f, Lf \rangle}{\langle f,f\rangle}$

and if ${f_1}$ is a minimizer of the above, then

$\displaystyle \lambda_2 = \min_{f: M \rightarrow {\mathbb R},\ \langle f, f_1 \rangle = 0} \frac {\langle f, Lf \rangle}{\langle f,f\rangle}$

(The minimization is quantified over all functions that are square-integrable, and the minimum is achieved because if ${M}$ is compact then the space of such functions is also compact and the cost function that we are minimizing is continuous. In this post, whenever we talk about “all functions,” it should be understood that we are restricting to whatever space of functions makes sense in the context.)

From the property that ${-{\rm div}}$ and ${\nabla}$ are adjoint, we have

$\displaystyle \langle f, Lf \rangle = \langle f, -{\rm div} \nabla f \rangle = \langle \nabla f,\nabla f \rangle$

so

$\displaystyle \lambda_1 = \min_{f:M \rightarrow {\mathbb R}} \frac{ \int_M \langle \nabla f(x),\nabla f(x) \rangle \ {\rm d} \mu(x) } {\int M f^2(x) \ {\rm d} \mu (x)}$

where the Rayleigh quotient

$\displaystyle R(f):= \frac{ \int_M \langle \nabla f(x),\nabla f(x) \rangle \ {\rm d} \mu(x) } {\int M f^2(x) \ {\rm d} \mu (x)} = \frac {\int_M ||\nabla f(x)||^2 \ {\rm d} \mu (x)}{\int_M f^2(x) \ {\rm d} \mu (x)}$

is always non-negative, and it is zero for constant ${f\equiv 1}$, so we see that ${\lambda_1=0}$ and

$\displaystyle \lambda_2 = \min_{f:M \rightarrow {\mathbb R} : \int_M f=0 \ {\rm d} \mu} R(f)$

By analogy with the graph case, we define the “${\ell_1}$ Rayleigh quotient”

$\displaystyle R_1(g) := \frac {\int_M ||\nabla g(x)|| \ {\rm d} \mu (x)}{\int_M |g(x)| \ {\rm d} \mu (x)}$

And we can prove the analogs of the lemmas that we proved for graphs.

Lemma 7 (Rounding of ${\ell_1}$ embeddings) For every non-negative function ${g: V \rightarrow {\mathbb R}_{\geq 0}}$ there is a value ${t\geq 0}$ such that

$\displaystyle h(\{ x: g(x) > t \}) \leq R_1(g)$

where the Cheeger constant ${h(S)}$ of a subset ${S\subseteq M}$ of the manifold is

$\displaystyle h(S) := \frac{\mu_{n-1} (\partial S)}{\mu(S)}$

Lemma 8 (Embedding of ${\ell_2^2}$ into ${\ell_1}$) For every non-negative function ${f; m \rightarrow {\mathbb R}_{\geq 0}}$, we have

$\displaystyle R_1(f^2) \leq \sqrt{2 R(f)}$

Lemma 9 (From an eigenfunction to a non-negative function) For every function ${f: M \rightarrow R}$ such that ${\int_M f \ {\rm d} \mu =0}$ there is a non-negative ${f': M \rightarrow {\mathbb R}_{\geq 0}}$ such that ${\mu(\{ x: f'(x) >0 \}) \leq \frac 12 \mu(V)}$ and such that

$\displaystyle R(f') \leq R_2(f)$

Let us see the proof of these lemmas.

Proof: of Lemma 7. For each threshold ${t}$, define the set

$\displaystyle S_t := \{ x: g(x) > t \}$

Let ${t*}$ be a threshold for which ${h(S_{t*})}$ is minimized

We will integrate the numerator and denominator of ${h(S_t)}$ over all ${t}$. The coarea formula for nonnegative functions is

$\displaystyle \int_M || \nabla g || {\rm d} \mu = \int_{0}^\infty \mu_{n-1} ( \partial \{ x: g(x) > t \}) {\rm d} t$

and we also have

$\displaystyle \int_M |g| {\rm d} \mu = \int_{0}^\infty \mu ( \{ x: g(x) > t \}) {\rm d} t$

which combine to

$\displaystyle \int_M || \nabla g || {\rm d} t = \int_0^\infty \mu_{n-1} ( \partial S_t) {\rm d} t$

$\displaystyle = \int_0^\infty h(S_t) \mu(S_t) {\rm d} t$

$\displaystyle \geq h(S_{t^*}) \int_0^\infty \mu(S_t) {\rm d} t$

$\displaystyle = h(S_{t^*}) \int_M g {\rm d} t$

so that

$\displaystyle h(S_{t^*}) \leq \frac{\int_M || \nabla g || {\rm d} t }{ \int_M g {\rm d} t} = R_1(g)$

$\Box$

Proof: of Lemma 8. Let us consider the numerator of ${R_1(f^2)}$; it is:

$\displaystyle \int_M || \nabla f^2|| {\rm d} \mu$

We can apply the chain rule, and see that

$\displaystyle \nabla f^2(x) = 2f(x) \cdot \nabla f(x)$

which implies

$\displaystyle \int_M || \nabla f^2 || {\rm d} \mu$

$\displaystyle = \int_M || 2f(x) \nabla f(x)|| {\rm d} \mu(x)$

$\displaystyle = \int_M 2|f(x)| \cdot ||\nabla f(x)|| {\rm d} \mu (x)$

and, after applying Caucy-Swarz,

$\displaystyle \leq \sqrt{\int_M 4 f^2(x) {\rm d} \mu(x)} \cdot \sqrt{\int_M ||\nabla f(x)||^2 {\rm d} \mu(x)}$

$\displaystyle = 2 \cdot \left( \int_M f^2 {\rm d} \mu \right) \cdot \sqrt{ R(f)}$

And so

$\displaystyle R_1(f^2) \leq 2 \sqrt{R_2(f) }$

$\Box$

Proof: of Lemma 9. Let ${m}$ be a median of ${f}$, and consider ${\bar f}$ defined as ${\bar f(x) := f(x) - m}$. We have

$\displaystyle R(\bar f) \leq R(f)$

because the numerators of ${R(\bar f)}$ and ${R(f)}$ are the same (the derivatives of functions that differ by a constant are identical) and the denominators are such that

$\displaystyle \int_M \bar f^2{\rm d} \mu$

$\displaystyle = \int_M f^2 - 2mf + m^2 {\rm d} \mu$

$\displaystyle = \int_M f^2 + m^2 {\rm d} \mu$

$\displaystyle \geq \int_M f^2 {\rm d} \mu$

where we used the fact the integral of ${f}$ is zero.

Let us define ${f^+(x) := \min\{ 0, \bar f(x)\}}$ and ${f^-_v := \min \{ 0, -\bar f(x)\}}$ so that ${\bar f(x) = f^+(x) - f^-(x)}$. We use the following fact:

Fact 10 Let ${a,b : M \rightarrow {\mathbb R}_{\geq 0}}$ be disjointly supported non-negative functions (“disjointly supported” means that they are non-zero on disjoint subsets of inputs), then

$\displaystyle \min\{ R(a) , R(b) \} \leq R(a-b)$

Proof: We begin with the following observation: if ${a}$ is a non-negative function, and ${a(x)=0}$, then ${\nabla a(x) = \{ \bf 0 \}}$, because ${x}$ has to be a local minimum.

Consider the expression ${||\nabla (a-b)||^2}$ occurring in the numerator of ${R(a-b)}$. We have

$\displaystyle ||\nabla (a-b)||^2$

$\displaystyle = || \nabla a - \nabla b ||^2$

$\displaystyle = || \nabla a||^2 + || \nabla b||^2 - 2 \langle \nabla a,\nabla b \rangle$

But

$\displaystyle \langle \nabla a,\nabla b \rangle = 0$

because for every ${x}$ at least one of ${a(x)}$ or ${b(x)}$ is zero, and so at least one of ${\nabla a(x)}$ or ${\nabla b(x)}$ is zero.

Using this fact, we have that the numerator of ${R(a-b)}$ is equal to the sum of the numerators of ${R(a)}$ and ${R(b)}$:

$\displaystyle \int_M ||\nabla a-b||^2 {\rm d} \mu = \int_M ||\nabla a-b||^2 {\rm d} \mu + \int_M ||\nabla a-b||^2 {\rm d}$

and the denominator of ${R(a-b)}$ is also the sum of the denominators of ${R(a)}$ and ${R(b)}$:

$\displaystyle =\int_M (a-b)^2 {\rm d} \mu$

$\displaystyle =\int_M a^2 {\rm d} \mu + \int_M b^2 {\rm d} \mu - 2 \int_M ab {\rm d} \mu$

$\displaystyle =\int_M a^2 {\rm d} \mu + \int_M b^2 {\rm d} \mu$

because ${a(x)b(x)=0}$ for every ${x}$. The fact now follows from the inequality

$\displaystyle \min \left \{ \frac {n_1}{d_1} , \frac{n_2}{d_2} \right \} \leq \frac{n_1+n_2}{d_1+d_2}$

$\Box$

The lemma now follows by observing that ${f^+}$ and ${f^-}$ are non-negative and disjointly supported, so

$\displaystyle \min\{ R(f^+) , R(f^-) \} \leq R(\bar f) \leq R(f)$

and that both ${f^+}$ and ${f^-}$ have a support of volume at most ${\frac 12 \mu(M)}$. $\Box$

If anybody is still reading, it is worth observing a couple of differences between the discrete proof and the continuous proof.

The ${\ell_1}$ Rayleigh quotient is defined slightly differently in the continuous case. It would correspond to defining it as

$\displaystyle \frac {\sum_{u\in V} \sqrt{\sum_{v: (u,v)\in E} (f_u-f_v)^2 } }{d \sum_v |f_v|}$

in the discrete case.

If ${a,b \in {\mathbb R}^V}$ are disjointly supported and nonnegative, the sum of the numerators of the Rayleigh quotients ${R(a)}$ and ${R(-b)}$ can be strictly smaller than the numerator of ${R(a-b)}$, while we always have equality in the continuous case. In the discrete case, the sum of the numerators of ${R(a)}$ and ${R(b)}$ can be up to twice the numerator of ${R(a+b)}$ (this fact is useful, but it did not come up in this proof), while again we have exact equality in the continuous case.

The chain rule calculation

$\displaystyle \nabla f^2 = 2f \nabla f$

corresponds to the step

$\displaystyle f_v^2 - f_u^2 = (f_v + f_u) \cdot (f_v - f_u)$

In the continuous case, ${f_v}$ and ${f_u}$ are “infinitesimally close”, so we can approximate ${f_v + f_u}$ by ${2f_v}$.

1. Why do you take $f\in\mathbb R^V$? Although the eigenvalues are real numbers, the eigenvectors generally are not real. At least in Cayley graphs over Abelian groups each group character is an eigenvector. This is mostly a problem for Lemma 5.