Proof of Cheeger's Inequality

Themistoklis Haris

Continuing on our journey through expander graphs, we prove the essential Cheeger inequality. We define $e(S,T)$ as the number of edges going from $S$ to $T$, and $\partial S$ as the edge boundary set $E(S, V\setminus S)$. Recall that the isoperimetric number or Cheeger constant $h(G)$ of a graph is the infimum of the ratio $\frac{|\partial S|}{e(S,V)}$ over all subsets $S$ of size at most $|V|/2$.

We will be thinking of $d$-regular graphs, for which we have considered the notion of spectral expansion through the spectral gap of their normalized adjacency matrix $A_G$: $\lambda(G) := \max\{|\lambda_2|, |\lambda_n|\} \leq \lambda$.

The Cheeger inequality gives us a way to relate the two notions of expansion.

Theorem

The Cheeger inequality states that:

$$\frac{1-\lambda_2}{2} \leq h(G) \leq \sqrt{2(1-\lambda_2)}$$

Let's prove it! We'll be working off of the wonderful [notes](https://cseweb.ucsd.edu/classes/sp21/cse291-g/) from Shachar Lovett's class, with small changes and simplifications in some parts. The proof has two parts, one for each side of the inequality.
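
As a quick numerical sanity check of the theorem before we begin, here is a minimal Python sketch. It brute-forces $h(G)$ over all small subsets, so it is only feasible for tiny graphs; the helper names and the 8-cycle example are my own, not part of the proof.

```python
import itertools
import numpy as np

def cheeger_constant(adj, d):
    """Brute-force h(G): minimize |boundary(S)| / (d|S|) over all S with |S| <= n/2."""
    n = adj.shape[0]
    best = np.inf
    for size in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), size):
            mask = np.zeros(n, dtype=bool)
            mask[list(S)] = True
            boundary = adj[mask][:, ~mask].sum()  # |dS|: edges leaving S
            best = min(best, boundary / (d * size))
    return best

# The 8-cycle, a 2-regular graph.
n, d = 8, 2
adj = np.zeros((n, n))
for v in range(n):
    adj[v, (v + 1) % n] = adj[(v + 1) % n, v] = 1

lam2 = np.sort(np.linalg.eigvalsh(adj / d))[-2]  # lambda_2 of A_G
print((1 - lam2) / 2, cheeger_constant(adj, d), np.sqrt(2 * (1 - lam2)))
# Prints roughly 0.146 <= 0.25 <= 0.765, as the theorem demands.
```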

Lower Bound

Let us first recall the Courant-Fischer (CF) theorem, which gives a characterization of the spectrum in terms of Rayleigh coefficients. Let $M$ be a symmetric matrix with eigenvalues $v_1 \leq \dots \leq v_n$ and corresponding eigenvectors $\psi_1, \dots, \psi_n$. Then we have that:

$$v_i = \min\limits_{\substack{\|x\|=1 \\ x^T \psi_j = 0,\, j<i}} x^T M x$$

Now, recall that the Laplacian matrix of $G$ is defined as $L = I - A_G$. Let $0 = \mu_1 \leq \mu_2 \leq \cdots \leq \mu_n$ be the spectrum of $L$. We know that $\mu_2 = 1 - \lambda_2$, so it suffices to show that $\mu_2 \leq 2h(G)$. By CF, we have that:

$$\mu_2 = \min\limits_{x \perp 1^n} \frac{x^T L x}{x^T x}$$

So all we have to do is find some $x \in \mathbb{R}^n$ that satisfies $\frac{x^T L x}{x^T x} \leq 2h(G)$. We know that $h(G)$ is a minimum over all cuts, so let's fix a cut $S \subset V$ and consider its indicator vector:

$$x = 1_S, \qquad (1_S)_v = \begin{cases} 1 & \text{if } v \in S \\ 0 & \text{otherwise} \end{cases}$$

Now we have to do some calculations:

  • $x^T x = |S|$.
  • $x^T L x = \sum\limits_{(u,v)\in E} \frac{1}{d}(x_u - x_v)^2 = \frac{|\partial S|}{d}$.
Cool trick

We would be remiss if we did not stop and appreciate the power of examining Laplacians through their quadratic forms. This is a very neat trick and a powerful tool for interpreting $L$. Specifically, for a weighted graph we can write the quadratic form as:

$$x^T L x = \sum\limits_{(u,v) \in E} w_{uv}(x_u - x_v)^2$$

It just so happens here that $w_{uv} = \frac{1}{d}$ for all edges.
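
In the $d$-regular case this identity is a two-line computation from $L = I - A_G$. Since each vertex appears in exactly $d$ edges, we have $x^T x = \frac{1}{d}\sum_{(u,v)\in E}(x_u^2 + x_v^2)$, and since $x^T A_G x = \frac{2}{d}\sum_{(u,v)\in E} x_u x_v$:

$$x^T L x = x^T x - x^T A_G x = \frac{1}{d}\sum\limits_{(u,v)\in E}\left(x_u^2 + x_v^2 - 2x_u x_v\right) = \frac{1}{d}\sum\limits_{(u,v)\in E}(x_u - x_v)^2$$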

So then $\frac{x^T L x}{x^T x} = \frac{|\partial S|}{d|S|}$. We're on the right track! Recall that $h(G) = h(S) = \frac{|\partial S|}{e(S,V)}$ for some $S$, and that $e(S,V) = d|S|$. We'll still have to change $x$ a little, though, because we have another constraint to satisfy: the $x$ we choose must be orthogonal to $1^n$. To do this, define:

$$(x_S)_v = \begin{cases} 1 - \frac{|S|}{n} & \text{if } v \in S \\ -\frac{|S|}{n} & \text{otherwise} \end{cases}$$

So we just subtract $\frac{|S|}{n}$ from $1_S$. Now, what happens? Let's go through the calculations again:

  • $\langle x_S, 1^n \rangle = |S| - \frac{|S|^2}{n} + (n-|S|)\left(-\frac{|S|}{n}\right) = 0$, so we get our orthogonality.
  • $x_S^T x_S = |S|\left(1 - \frac{|S|}{n}\right)^2 + (n-|S|)\frac{|S|^2}{n^2} = \dots = \frac{|S| \cdot |S^c|}{n}$, where $S^c = V \setminus S$.
  • $x_S^T L x_S = \sum\limits_{(u,v)\in E} \frac{1}{d}\left(x_S(u) - x_S(v)\right)^2 = \frac{|\partial S|}{d}$, since shifting by a constant doesn't change the differences.

Now what do we get if we form the Rayleigh coefficient?

$$\frac{x_S^T L x_S}{x_S^T x_S} = \frac{n|\partial S|}{d|S||S^c|}$$

Now, taking $S$ to be a sparsest cut, we know that $|S| \leq \frac{n}{2}$, so $|S^c| \geq \frac{n}{2}$ and hence $\frac{n}{|S^c|} \leq 2$. Since $\frac{|\partial S|}{d|S|} = h(S)$, we get what we were looking for:

$$\frac{x_S^T L x_S}{x_S^T x_S} \leq 2h(G)$$

and that proves the lower bound!

Upper Bound

This is the harder part of the proof. There are a few moving parts. We want to prove that $h(G) \leq \sqrt{2\mu_2}$. We know that $\mu_2 = \frac{x^T L x}{x^T x}$ for some $x \perp 1^n$, meaning that we need to show

$$h(G) \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

for that $x$. However, we don't know what that $x$ is, and we don't really care to know. We'll prove something (way) stronger: for any $x \perp 1^n$, we can find some cut $S$ for which

$$\max\{h(S), h(S^c)\} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

Of course, $h(G) \leq \max\{h(S), h(S^c)\}$ for any cut $S$, so it seems like we're asking for too much. But this turns out to be just enough! The other concern is how we can create a cut $S$ from some arbitrary vector $x$. In the lower bound proof we did the easier thing: we created a vector from a cut. The opposite requires some more ingenuity.

Fiedler's Algorithm

The task is really about finding sparse cuts, and Fiedler's algorithm produces sparse cuts from vectors. Given a vector $x$, proceed as follows:

  1. Sort the coordinates of $x$ as $x_{v_1} \leq \cdots \leq x_{v_n}$.
  2. For $i \in [n]$:
    1. Let $S_i := \{v_i, \dots, v_n\}$.
    2. Compute $M_i = \max\{h(S_i), h(S_i^c)\}$.
  3. Output the cut $S_i$ with the minimum $M_i$ (the sparsest one).
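
Here's what this looks like in code: a minimal Python sketch of Fiedler's algorithm, assuming `adj` is the 0/1 adjacency matrix of a $d$-regular graph (the function and variable names are my own).

```python
import numpy as np

def fiedler_cut(adj, d, x):
    """Fiedler's sweep cut: sort vertices by x, consider each suffix
    S_i = {v_i, ..., v_n}, and return the cut minimizing
    M_i = max(h(S_i), h(S_i complement))."""
    n = len(x)
    order = np.argsort(x)               # x_{v_1} <= ... <= x_{v_n}
    best_M, best_S = np.inf, None
    for i in range(1, n):               # skip i = 0 so both sides are nonempty
        S = np.zeros(n, dtype=bool)
        S[order[i:]] = True
        boundary = adj[S][:, ~S].sum()  # |dS_i| = |d(V \ S_i)|
        M = max(boundary / (d * S.sum()), boundary / (d * (~S).sum()))
        if M < best_M:
            best_M, best_S = M, S
    return best_S, best_M
```

Note that the sweep only ever examines $n-1$ candidate cuts, so it runs in polynomial time even though it competes against a minimum over exponentially many cuts.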

The crux is the following lemma about the sparsity of the cut that Fiedler's algorithm produces:

Lemma

Let $x \geq 0$ and $x \neq 0$. Fiedler's algorithm produces a cut $S$ with

$$h(S) \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

The cut has the form $S = S^t = \{v : x_v \geq t\}$ for some $t > 0$.

This is all about understanding what Fiedler does. Notice that we aren't tackling the full case $x \perp 1^n$ yet; we are just thinking about $x \geq 0$. But massaging things will slowly get us to the right solution.

To prove the lemma, we'll use the probabilistic method. If you aren't familiar with it, all it says is that to prove an object with some property exists in some universe, you just have to show that the probability of choosing an object with this property is strictly positive under some distribution over the universe. For us, the objects are cuts, so we are looking for a distribution $D$ over all cuts such that:

$$\Pr_{S_i \sim D}\left[h(S_i) \leq \sqrt{2\frac{x^T L x}{x^T x}}\right] > 0$$

Now, dealing with probabilities directly is often quite inconvenient. A classic trick is to use an averaging argument so that we deal with expectations instead: if $Z$ is a random variable with $\mathbb{E}[Z] = 0$, then $\Pr[Z \leq 0] > 0$. So we'll rephrase our goal in terms of that. Let $X = e(S_i, V\setminus S_i)$, $Y = d|S_i|$ and $r = \mathbb{E}[X]/\mathbb{E}[Y]$. By linearity of expectation, we have that $\mathbb{E}[X - rY] = 0$, so $\Pr[X/Y \leq r] > 0$; and $X/Y$ is exactly $h(S_i)$.

Thus, it suffices to show that for some distribution $D$ over cuts we have that:

$$r = \frac{\mathbb{E}[e(S_i, V\setminus S_i)]}{d \cdot \mathbb{E}[|S_i|]} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

WLOG, let's assume that $x_{v_n} = 1$, as the Rayleigh coefficient is immune to scaling. Now, for $t \in [0,1]$, define $S^t := \{v : x_v \geq t\}$. Let's make the important observation that each $S^t$ is some $S_i$ considered by Fiedler. Why? Because $S_i := \{v_i, \dots, v_n\}$ can be defined as all $v \in V$ for which $x_v \geq x_{v_i} = t_i$, due to our ordering. And this sequence $\{t_i\}_{i=1}^n$ is increasing in $i$. Conversely, as $t$ ranges over an interval $(t_i, t_{i+1}]$, the set $S^t$ doesn't change.

Enough prep-work! Time to see what our distribution DD should be.

  • Making our distribution

We draw $S^t$ where $t$ is chosen such that $t^2$ is uniformly distributed in $[0,1]$.

This is possibly the strangest part of the whole proof. Why would we do that? To give some intuition, think about what would happen if we drew $S^t$ with $t$ uniform in $[0,1]$, which seems like the more natural choice. Then:

$$d \cdot \mathbb{E}[|S^t|] = d\sum\limits_{v \in V}\Pr[v \in S^t] = d\sum\limits_{v \in V}\Pr[x_v \geq t] = d\sum\limits_{v \in V} x_v$$

Here's the problem! We want this to be our denominator, so we want the final sum to equal $x^T x = \sum_v x_v^2$, not $\sum_v x_v$. And that is exactly what our chosen distribution gives: if $t^2$ is uniform in $[0,1]$, then $\Pr[x_v \geq t] = \Pr[t^2 \leq x_v^2] = x_v^2$.
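
If that still feels odd, here's a tiny Monte Carlo sketch making the identity $\mathbb{E}[|S^t|] = \sum_v x_v^2$ concrete; the test vector is arbitrary (scaled so its maximum is $1$, as in the proof).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.1, 0.4, 0.7, 1.0])              # some x >= 0 with max = 1
t = np.sqrt(rng.uniform(size=1_000_000))        # t such that t^2 is uniform on [0, 1]
sizes = (x[None, :] >= t[:, None]).sum(axis=1)  # |S^t| for each sampled t
print(sizes.mean(), (x ** 2).sum())             # both are approximately 1.66
```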

The other term needs some more algebra:

$$\mathbb{E}[e(S^t, V\setminus S^t)] = \sum\limits_{\{u,v\} \in E}\Pr[\{u,v\} \text{ is cut}] = \sum\limits_{\{u,v\} \in E}\Pr[x_u \leq t \leq x_v]$$
$$= \sum\limits_{\{u,v\} \in E}|x_u^2 - x_v^2| = \sum\limits_{\{u,v\} \in E}|x_u - x_v| \cdot (x_u + x_v) \leq \sqrt{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}\sqrt{\sum\limits_{\{u,v\} \in E}(x_u + x_v)^2}$$
(where we order each edge so that $x_u \leq x_v$)

by Cauchy-Schwarz. Then the last term can be bounded as:

$$\dots \leq \sqrt{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}\sqrt{\sum\limits_{\{u,v\} \in E}\left(2x_u^2 + 2x_v^2\right)} = \sqrt{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}\sqrt{2d\, x^T x}$$

where we used $(a+b)^2 \leq 2a^2 + 2b^2$ and, for the last equality, the fact that every vertex appears in exactly $d$ edges. Then finally we can see that:

$$\frac{\mathbb{E}[e(S^t, V\setminus S^t)]}{d \cdot \mathbb{E}[|S^t|]} \leq \sqrt{2\frac{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}{d\, x^T x}} = \sqrt{2\frac{x^T L x}{x^T x}}$$

as desired.


That's nice. We've concluded the proof of our lemma. But again, this was for $x \geq 0$, which is impossible if we want $x \perp 1^n$ (unless $x = 0$). We'll need another lemma to get past this hurdle:

Lemma

Let $x \perp 1^n$. There exists a vector $y \geq 0$ with support of size at most $|V|/2$ such that

$$\frac{y^T L y}{y^T y} \leq \frac{x^T L x}{x^T x}$$

Additionally, every cut $S_y^t$ that Fiedler considers on $y$ is also considered on $x$.

First, what does this give us? Consider $x \perp 1^n$ and let $y \geq 0$ be the vector we are guaranteed by the lemma. We run Fiedler's algorithm on $y$ and get a cut $S^t$, for which our previous lemma says that:

$$h(S^t) \leq \sqrt{2\frac{y^T L y}{y^T y}} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

Because $S^t$ is considered by Fiedler's algorithm on $x$ as well, and because $|S^t| \leq \frac{|V|}{2}$ (so that $h(S^t) \geq h(V\setminus S^t)$, as both sets share the same edge boundary), we are comfortably done:

$$h(S^t) = \max\{h(S^t), h(V\setminus S^t)\} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

as desired.


So the final thing to do is prove the lemma above. First, note that if we shift $x$ by a constant we can only lower its Rayleigh coefficient (because $1^n$ is an eigenvector of $L$ with eigenvalue $0$):

$$\frac{(x + c\,1^n)^T L (x + c\,1^n)}{(x + c\,1^n)^T (x + c\,1^n)} \leq \frac{x^T L x}{x^T x}$$
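
To see why: the numerator is unchanged because $L 1^n = 0$, while the denominator can only grow because $x \perp 1^n$:

$$(x + c\,1^n)^T L (x + c\,1^n) = x^T L x, \qquad (x + c\,1^n)^T (x + c\,1^n) = x^T x + c^2 n \geq x^T x$$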

Now, let $m$ be the median of $x$ and $x' = x - 1^n m$. We know that $x'$ has at most $|V|/2$ positive and at most $|V|/2$ negative coordinates. Define $x^+_v = \max\{x'_v, 0\}$ and $x^-_v = \max\{-x'_v, 0\}$, i.e., the absolute values of the positive and negative coordinates of $x'$, respectively. These two vectors have support of size $\leq |V|/2$ and are both nonnegative. Then we have that

$$x' = x^+ - x^-$$

Now imagine running Fiedler on $x'$. For any $t > 0$, the cuts $\{v : x_v^+ \geq t\}$ and $\{v : x_v^- \geq t\}$ are both considered. Why? Because the coordinates of $x'$ are sorted, and both $S^t$ and $V \setminus S^t$ are considered as cuts. Now we will show that either $x^+$ or $x^-$ has a Rayleigh coefficient at most $\frac{(x')^T L x'}{(x')^T x'}$. We can calculate:

$$\frac{(x')^T L x'}{(x')^T x'} = \frac{\frac{1}{d}\sum\limits_{\{u,v\} \in E}(x'_u - x'_v)^2}{(x^+)^T x^+ + (x^-)^T x^-} = \frac{\frac{1}{d}\sum\limits_{\{u,v\} \in E}\left((x_u^+ - x_v^+) - (x_u^- - x_v^-)\right)^2}{(x^+)^T x^+ + (x^-)^T x^-}$$
$$\geq \frac{\frac{1}{d}\sum\limits_{\{u,v\} \in E}(x_u^+ - x_v^+)^2 + \frac{1}{d}\sum\limits_{\{u,v\} \in E}(x_u^- - x_v^-)^2}{(x^+)^T x^+ + (x^-)^T x^-} = \frac{(x^+)^T L x^+ + (x^-)^T L x^-}{(x^+)^T x^+ + (x^-)^T x^-}$$
$$\geq \min\left\{\frac{(x^+)^T L x^+}{(x^+)^T x^+}, \frac{(x^-)^T L x^-}{(x^-)^T x^-}\right\}$$

Here $(x')^T x' = (x^+)^T x^+ + (x^-)^T x^-$ because the supports of $x^+$ and $x^-$ are disjoint; the first inequality holds because $(x_u^+ - x_v^+)(x_u^- - x_v^-) \leq 0$ for every edge (each coordinate has at most one of $x^+, x^-$ nonzero); and the last step uses the mediant inequality $\frac{a+b}{c+d} \geq \min\{\frac{a}{c}, \frac{b}{d}\}$ for $c, d > 0$.

And that settles it!
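
To tie everything together, here's a sketch of the full pipeline the upper-bound proof describes, reusing `fiedler_cut` from the earlier sketch (the hypercube example is my own; sweeping the shifted $x$ directly considers the same cuts as sweeping $x^+$ and $x^-$ separately, as noted above).

```python
import numpy as np

def sparse_cut(adj, d):
    """Spectral partitioning as in the proof: take an eigenvector for
    lambda_2 of A_G (orthogonal to the all-ones vector), shift it by
    its median, and sweep it with fiedler_cut (defined earlier)."""
    eigvals, eigvecs = np.linalg.eigh(adj / d)
    x = eigvecs[:, -2]  # eigenvector for the second-largest eigenvalue
    return fiedler_cut(adj, d, x - np.median(x))

# The 3-dimensional hypercube: 3-regular on 8 vertices, lambda_2 = 1/3.
n, d = 8, 3
adj = np.zeros((n, n))
for v in range(n):
    for bit in range(3):
        adj[v, v ^ (1 << bit)] = 1

S, M = sparse_cut(adj, d)
lam2 = np.sort(np.linalg.eigvalsh(adj / d))[-2]
print(M, np.sqrt(2 * (1 - lam2)))  # the cut's sparsity vs. the Cheeger bound
```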


Recap

To recap this long proof a little bit:

  • The lower bound was all about Courant-Fischer and recognizing the role of the Laplacian in this proof. All we had to do was convert a cut into a vector.
  • The upper bound required making a sparse cut from a vector. This is something one can do using Fiedler's algorithm.
    • However, are we producing a sparse enough cut?
    • We first prove that we do if the vector is positive by using the probabilistic method and an averaging argument.
      • Our distribution of choice is a little funky, but it makes sense if we look closely at the calculations.
    • Then we extend the lemma to all $x \perp 1^n$ by looking at the positive and negative entries of $x$ separately.