Proof of Cheeger's Inequality

Themistoklis Haris

Continuing on our journey through expander graphs, we prove the essential Cheeger inequality. We define $e(S,T)$ as the number of edges going from $S$ to $T$, and $\partial S$ as the edge boundary set $E(S, V\setminus S)$. Recall that the isoperimetric number or Cheeger constant $h(G)$ of a graph is the infimum of the ratio $\frac{|\partial S|}{e(S,V)}$ over all subsets $S$ of size at most $|V|/2$.

We will be thinking of $d$-regular graphs, for which we have considered the notion of spectral expansion through the spectral gap of their normalized adjacency matrix $A_G$: $\lambda(G) := \max\{|\lambda_2|, |\lambda_n|\} \leq \lambda$.

The Cheeger inequality gives us a way to relate the two notions of expansion.

Theorem

The Cheeger inequality states that:

$$\frac{1-\lambda_2}{2} \leq h(G) \leq \sqrt{2(1-\lambda_2)}$$

Let's prove it! We'll be working off of the wonderful [notes](https://cseweb.ucsd.edu/classes/sp21/cse291-g/) from Shachar Lovett's class, with small changes and simplifications in some parts. The proof has two parts, one for each side of the inequality.
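
As a quick numerical sanity check of the theorem before we begin, here is a minimal Python sketch. It brute-forces $h(G)$ over all small subsets, so it is only feasible for tiny graphs; the helper names and the 8-cycle example are my own, not part of the proof.

```python
import itertools
import numpy as np

def cheeger_constant(adj, d):
    """Brute-force h(G): minimize |boundary(S)| / (d|S|) over all S with |S| <= n/2."""
    n = adj.shape[0]
    best = np.inf
    for size in range(1, n // 2 + 1):
        for S in itertools.combinations(range(n), size):
            mask = np.zeros(n, dtype=bool)
            mask[list(S)] = True
            boundary = adj[mask][:, ~mask].sum()  # |dS|: edges leaving S
            best = min(best, boundary / (d * size))
    return best

# The 8-cycle, a 2-regular graph.
n, d = 8, 2
adj = np.zeros((n, n))
for v in range(n):
    adj[v, (v + 1) % n] = adj[(v + 1) % n, v] = 1

lam2 = np.sort(np.linalg.eigvalsh(adj / d))[-2]  # lambda_2 of A_G
print((1 - lam2) / 2, cheeger_constant(adj, d), np.sqrt(2 * (1 - lam2)))
# Prints roughly 0.146 <= 0.25 <= 0.765, as the theorem demands.
```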

Lower Bound

Let us first recall the Courant-Fischer (CF) theorem, which gives a characterization of the spectrum in terms of Rayleigh coefficients. Let $M$ be a symmetric matrix with eigenvalues $v_1 \leq \dots \leq v_n$ and corresponding eigenvectors $\psi_1, \dots, \psi_n$. Then we have that:

$$v_i = \min\limits_{\substack{\|x\|=1 \\ x^T \psi_j = 0,\, j<i}} x^T M x$$

Now, recall that the Laplacian matrix of $G$ is defined as $L = I - A_G$. Let $0 = \mu_1 \leq \mu_2 \leq \cdots \leq \mu_n$ be the spectrum of $L$. We know that $\mu_2 = 1 - \lambda_2$, so it suffices to show that $\mu_2 \leq 2h(G)$. By CF, we have that:

$$\mu_2 = \min\limits_{x \perp 1^n} \frac{x^T L x}{x^T x}$$

So all we have to do is find some $x \in \mathbb{R}^n$ that satisfies $\frac{x^T L x}{x^T x} \leq 2h(G)$. We know that $h(G)$ is a minimum over all cuts, so let's fix a cut $S \subset V$ and consider its indicator vector:

$$x = 1_S, \qquad (1_S)_v = \begin{cases} 1 & \text{if } v \in S \\ 0 & \text{otherwise} \end{cases}$$

Now we have to do some calculations:

  • $x^T x = |S|$.
  • $x^T L x = \sum\limits_{(u,v)\in E} \frac{1}{d}(x_u - x_v)^2 = \frac{|\partial S|}{d}$.
Cool trick

We would be remiss if we did not stop and appreciate the power of examining Laplacians through their quadratic forms. This is a very neat trick and a powerful tool for interpreting $L$. Specifically, for a weighted graph we can write the quadratic form as:

$$x^T L x = \sum\limits_{(u,v) \in E} w_{uv}(x_u - x_v)^2$$

It just so happens here that $w_{uv} = \frac{1}{d}$ for all edges.
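
In the $d$-regular case this identity is a two-line computation from $L = I - A_G$. Since each vertex appears in exactly $d$ edges, we have $x^T x = \frac{1}{d}\sum_{(u,v)\in E}(x_u^2 + x_v^2)$, and since $x^T A_G x = \frac{2}{d}\sum_{(u,v)\in E} x_u x_v$:

$$x^T L x = x^T x - x^T A_G x = \frac{1}{d}\sum\limits_{(u,v)\in E}\left(x_u^2 + x_v^2 - 2x_u x_v\right) = \frac{1}{d}\sum\limits_{(u,v)\in E}(x_u - x_v)^2$$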

So then $\frac{x^T L x}{x^T x} = \frac{|\partial S|}{d|S|}$. We're on the right track! Recall that $h(G) = h(S) = \frac{|\partial S|}{e(S,V)}$ for some $S$, and that $e(S,V) = d|S|$. We'll still have to change $x$ a little, though, because we have another constraint to satisfy: the $x$ we choose must be orthogonal to $1^n$. To do this, define:

$$(x_S)_v = \begin{cases} 1 - \frac{|S|}{n} & \text{if } v \in S \\ -\frac{|S|}{n} & \text{otherwise} \end{cases}$$

So we just subtract $\frac{|S|}{n}$ from $1_S$. Now, what happens? Let's go through the calculations again:

  • $\langle x_S, 1^n \rangle = |S| - \frac{|S|^2}{n} + (n-|S|)\left(-\frac{|S|}{n}\right) = 0$, so we get our orthogonality.
  • $x_S^T x_S = |S|\left(1 - \frac{|S|}{n}\right)^2 + (n-|S|)\frac{|S|^2}{n^2} = \dots = \frac{|S| \cdot |S^c|}{n}$, where $S^c = V \setminus S$.
  • $x_S^T L x_S = \sum\limits_{(u,v)\in E} \frac{1}{d}\left(x_S(u) - x_S(v)\right)^2 = \frac{|\partial S|}{d}$, since shifting by a constant doesn't change the differences.

Now what do we get if we form the Rayleigh coefficient?

$$\frac{x_S^T L x_S}{x_S^T x_S} = \frac{n|\partial S|}{d|S||S^c|}$$

Now, taking $S$ to be a sparsest cut, we know that $|S| \leq \frac{n}{2}$, so $|S^c| \geq \frac{n}{2}$ and hence $\frac{n}{|S^c|} \leq 2$. Since $\frac{|\partial S|}{d|S|} = h(S)$, we get what we were looking for:

$$\frac{x_S^T L x_S}{x_S^T x_S} \leq 2h(G)$$

and that proves the lower bound!

Upper Bound

This is the harder part of the proof. There are a few moving parts. We want to prove that $h(G) \leq \sqrt{2\mu_2}$. We know that $\mu_2 = \frac{x^T L x}{x^T x}$ for some $x \perp 1^n$, meaning that we need to show

$$h(G) \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

for that $x$. However, we don't know what that $x$ is, and we don't really care to know. We'll prove something (way) stronger: for any $x \perp 1^n$, we can find some cut $S$ for which

$$\max\{h(S), h(S^c)\} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

Of course, $h(G) \leq \max\{h(S), h(S^c)\}$ for any cut $S$, so it seems like we're asking for too much. But this turns out to be just enough! The other concern is how we can create a cut $S$ from some arbitrary vector $x$. In the lower bound proof we did the easier thing: we created a vector from a cut. The opposite requires some more ingenuity.

Fiedler's Algorithm

The task is really about finding sparse cuts, and Fiedler's algorithm produces sparse cuts from vectors. Given a vector $x$, proceed as follows:

  1. Sort the coordinates of $x$ as $x_{v_1} \leq \cdots \leq x_{v_n}$.
  2. For $i \in [n]$:
    1. Let $S_i := \{v_i, \dots, v_n\}$.
    2. Compute $M_i = \max\{h(S_i), h(S_i^c)\}$.
  3. Output the cut $S_i$ with the minimum $M_i$ (the sparsest one).
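
Here's what this looks like in code: a minimal Python sketch of Fiedler's algorithm, assuming `adj` is the 0/1 adjacency matrix of a $d$-regular graph (the function and variable names are my own).

```python
import numpy as np

def fiedler_cut(adj, d, x):
    """Fiedler's sweep cut: sort vertices by x, consider each suffix
    S_i = {v_i, ..., v_n}, and return the cut minimizing
    M_i = max(h(S_i), h(S_i complement))."""
    n = len(x)
    order = np.argsort(x)               # x_{v_1} <= ... <= x_{v_n}
    best_M, best_S = np.inf, None
    for i in range(1, n):               # skip i = 0 so both sides are nonempty
        S = np.zeros(n, dtype=bool)
        S[order[i:]] = True
        boundary = adj[S][:, ~S].sum()  # |dS_i| = |d(V \ S_i)|
        M = max(boundary / (d * S.sum()), boundary / (d * (~S).sum()))
        if M < best_M:
            best_M, best_S = M, S
    return best_S, best_M
```

Note that the sweep only ever examines $n-1$ candidate cuts, so it runs in polynomial time even though it competes against a minimum over exponentially many cuts.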

The crux is the following lemma about the sparsity of the cut that Fiedler's algorithm produces:

Lemma

Let $x \geq 0$ and $x \neq 0$. Fiedler's algorithm produces a cut $S$ with

$$h(S) \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

The cut has the form $S = S^t = \{v : x_v \geq t\}$ for some $t > 0$.

This is all about understanding what Fiedler does. Notice that we aren't tackling the full case $x \perp 1^n$ yet; we are just thinking about $x \geq 0$. But massaging things will slowly get us to the right solution.

To prove the lemma, we'll use the probabilistic method. If you aren't familiar with it, all it says is that to prove an object with some property exists in some universe, you just have to show that the probability of choosing an object with this property is strictly positive under some distribution over the universe. For us, the objects are cuts, so we are looking for a distribution $D$ over all cuts such that:

$$\Pr_{S_i \sim D}\left[h(S_i) \leq \sqrt{2\frac{x^T L x}{x^T x}}\right] > 0$$

Now, dealing with probabilities directly is often quite inconvenient. A classic trick is to use an averaging argument so that we deal with expectations instead: if $Z$ is a random variable with $\mathbb{E}[Z] = 0$, then $\Pr[Z \leq 0] > 0$. So we'll rephrase our goal in terms of that. Let $X = e(S_i, V\setminus S_i)$, $Y = d|S_i|$ and $r = \mathbb{E}[X]/\mathbb{E}[Y]$. By linearity of expectation, we have that $\mathbb{E}[X - rY] = 0$, so $\Pr[X/Y \leq r] > 0$; and $X/Y$ is exactly $h(S_i)$.

Thus, it suffices to show that for some distribution $D$ over cuts we have that:

$$r = \frac{\mathbb{E}[e(S_i, V\setminus S_i)]}{d \cdot \mathbb{E}[|S_i|]} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

WLOG, let's assume that $x_{v_n} = 1$, as the Rayleigh coefficient is immune to scaling. Now, for $t \in [0,1]$, define $S^t := \{v : x_v \geq t\}$. Let's make the important observation that each $S^t$ is some $S_i$ considered by Fiedler. Why? Because $S_i := \{v_i, \dots, v_n\}$ can be defined as all $v \in V$ for which $x_v \geq x_{v_i} = t_i$, due to our ordering. And this sequence $\{t_i\}_{i=1}^n$ is increasing in $i$. Conversely, as $t$ ranges over an interval $(t_i, t_{i+1}]$, the set $S^t$ doesn't change.

Enough prep-work! Time to see what our distribution DD should be.

  • Making our distribution

We draw $S^t$ where $t$ is chosen such that $t^2$ is uniformly distributed in $[0,1]$.

This is possibly the strangest part of the whole proof. Why would we do that? To give some intuition, think about what would happen if we drew $S^t$ with $t$ uniform in $[0,1]$, which seems like the more natural choice. Then:

$$d \cdot \mathbb{E}[|S^t|] = d\sum\limits_{v \in V}\Pr[v \in S^t] = d\sum\limits_{v \in V}\Pr[x_v \geq t] = d\sum\limits_{v \in V} x_v$$

Here's the problem! We want this to be our denominator, so we want the final sum to equal $x^T x = \sum_v x_v^2$, not $\sum_v x_v$. And that is exactly what our chosen distribution gives: if $t^2$ is uniform in $[0,1]$, then $\Pr[x_v \geq t] = \Pr[t^2 \leq x_v^2] = x_v^2$.
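
If that still feels odd, here's a tiny Monte Carlo sketch making the identity $\mathbb{E}[|S^t|] = \sum_v x_v^2$ concrete; the test vector is arbitrary (scaled so its maximum is $1$, as in the proof).

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([0.1, 0.4, 0.7, 1.0])              # some x >= 0 with max = 1
t = np.sqrt(rng.uniform(size=1_000_000))        # t such that t^2 is uniform on [0, 1]
sizes = (x[None, :] >= t[:, None]).sum(axis=1)  # |S^t| for each sampled t
print(sizes.mean(), (x ** 2).sum())             # both are approximately 1.66
```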

The other term needs some more algebra:

$$\mathbb{E}[e(S^t, V\setminus S^t)] = \sum\limits_{\{u,v\} \in E}\Pr[\{u,v\} \text{ is cut}] = \sum\limits_{\{u,v\} \in E}\Pr[x_u \leq t \leq x_v]$$
$$= \sum\limits_{\{u,v\} \in E}|x_u^2 - x_v^2| = \sum\limits_{\{u,v\} \in E}|x_u - x_v| \cdot (x_u + x_v) \leq \sqrt{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}\sqrt{\sum\limits_{\{u,v\} \in E}(x_u + x_v)^2}$$
(where we order each edge so that $x_u \leq x_v$)

by Cauchy-Schwarz. Then the last term can be bounded as:

$$\dots \leq \sqrt{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}\sqrt{\sum\limits_{\{u,v\} \in E}\left(2x_u^2 + 2x_v^2\right)} = \sqrt{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}\sqrt{2d\, x^T x}$$

where we used $(a+b)^2 \leq 2a^2 + 2b^2$ and, for the last equality, the fact that every vertex appears in exactly $d$ edges. Then finally we can see that:

$$\frac{\mathbb{E}[e(S^t, V\setminus S^t)]}{d \cdot \mathbb{E}[|S^t|]} \leq \sqrt{2\frac{\sum\limits_{\{u,v\} \in E}(x_u - x_v)^2}{d\, x^T x}} = \sqrt{2\frac{x^T L x}{x^T x}}$$

as desired.


That's nice. We've concluded the proof of our lemma. But again, this was for $x \geq 0$, which is impossible if we want $x \perp 1^n$ (unless $x = 0$). We'll need another lemma to get past this hurdle:

Lemma

Let $x \perp 1^n$. There exists a vector $y \geq 0$ with support of size at most $|V|/2$ such that

$$\frac{y^T L y}{y^T y} \leq \frac{x^T L x}{x^T x}$$

Additionally, every cut $S_y^t$ that Fiedler considers on $y$ is also considered on $x$.

First, what does this give us? Consider $x \perp 1^n$ and let $y \geq 0$ be the vector we are guaranteed by the lemma. We run Fiedler's algorithm on $y$ and get a cut $S^t$, for which our previous lemma says that:

$$h(S^t) \leq \sqrt{2\frac{y^T L y}{y^T y}} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

Because $S^t$ is considered by Fiedler's algorithm on $x$ as well, and because $|S^t| \leq \frac{|V|}{2}$ (so that $h(S^t) \geq h(V\setminus S^t)$, as both sets share the same edge boundary), we are comfortably done:

$$h(S^t) = \max\{h(S^t), h(V\setminus S^t)\} \leq \sqrt{2\frac{x^T L x}{x^T x}}$$

as desired.


So the final thing to do is prove the lemma above. First, note that if we shift $x$ by a constant we can only lower its Rayleigh coefficient (because $1^n$ is an eigenvector of $L$ with eigenvalue $0$):

$$\frac{(x + c\,1^n)^T L (x + c\,1^n)}{(x + c\,1^n)^T (x + c\,1^n)} \leq \frac{x^T L x}{x^T x}$$
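
To see why: the numerator is unchanged because $L 1^n = 0$, while the denominator can only grow because $x \perp 1^n$:

$$(x + c\,1^n)^T L (x + c\,1^n) = x^T L x, \qquad (x + c\,1^n)^T (x + c\,1^n) = x^T x + c^2 n \geq x^T x$$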

Now, let $m$ be the median of $x$ and $x' = x - 1^n m$. We know that $x'$ has at most $|V|/2$ positive and at most $|V|/2$ negative coordinates. Define $x^+_v = \max\{x'_v, 0\}$ and $x^-_v = \max\{-x'_v, 0\}$, i.e., the absolute values of the positive and negative coordinates of $x'$, respectively. These two vectors have support of size $\leq |V|/2$ and are both nonnegative. Then we have that

$$x' = x^+ - x^-$$

Now imagine running Fiedler on $x'$. For any $t > 0$, the cuts $\{v : x_v^+ \geq t\}$ and $\{v : x_v^- \geq t\}$ are both considered. Why? Because the coordinates of $x'$ are sorted, and both $S^t$ and $V \setminus S^t$ are considered as cuts. Now we will show that either $x^+$ or $x^-$ has a Rayleigh coefficient at most $\frac{(x')^T L x'}{(x')^T x'}$. We can calculate:

$$\frac{(x')^T L x'}{(x')^T x'} = \frac{\frac{1}{d}\sum\limits_{\{u,v\} \in E}(x'_u - x'_v)^2}{(x^+)^T x^+ + (x^-)^T x^-} = \frac{\frac{1}{d}\sum\limits_{\{u,v\} \in E}\left((x_u^+ - x_v^+) - (x_u^- - x_v^-)\right)^2}{(x^+)^T x^+ + (x^-)^T x^-}$$
$$\geq \frac{\frac{1}{d}\sum\limits_{\{u,v\} \in E}(x_u^+ - x_v^+)^2 + \frac{1}{d}\sum\limits_{\{u,v\} \in E}(x_u^- - x_v^-)^2}{(x^+)^T x^+ + (x^-)^T x^-} = \frac{(x^+)^T L x^+ + (x^-)^T L x^-}{(x^+)^T x^+ + (x^-)^T x^-}$$
$$\geq \min\left\{\frac{(x^+)^T L x^+}{(x^+)^T x^+}, \frac{(x^-)^T L x^-}{(x^-)^T x^-}\right\}$$

Here $(x')^T x' = (x^+)^T x^+ + (x^-)^T x^-$ because the supports of $x^+$ and $x^-$ are disjoint; the first inequality holds because $(x_u^+ - x_v^+)(x_u^- - x_v^-) \leq 0$ for every edge (each coordinate has at most one of $x^+, x^-$ nonzero); and the last step uses the mediant inequality $\frac{a+b}{c+d} \geq \min\{\frac{a}{c}, \frac{b}{d}\}$ for $c, d > 0$.

And that settles it!
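
To tie everything together, here's a sketch of the full pipeline the upper-bound proof describes, reusing `fiedler_cut` from the earlier sketch (the hypercube example is my own; sweeping the shifted $x$ directly considers the same cuts as sweeping $x^+$ and $x^-$ separately, as noted above).

```python
import numpy as np

def sparse_cut(adj, d):
    """Spectral partitioning as in the proof: take an eigenvector for
    lambda_2 of A_G (orthogonal to the all-ones vector), shift it by
    its median, and sweep it with fiedler_cut (defined earlier)."""
    eigvals, eigvecs = np.linalg.eigh(adj / d)
    x = eigvecs[:, -2]  # eigenvector for the second-largest eigenvalue
    return fiedler_cut(adj, d, x - np.median(x))

# The 3-dimensional hypercube: 3-regular on 8 vertices, lambda_2 = 1/3.
n, d = 8, 3
adj = np.zeros((n, n))
for v in range(n):
    for bit in range(3):
        adj[v, v ^ (1 << bit)] = 1

S, M = sparse_cut(adj, d)
lam2 = np.sort(np.linalg.eigvalsh(adj / d))[-2]
print(M, np.sqrt(2 * (1 - lam2)))  # the cut's sparsity vs. the Cheeger bound
```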


Recap

To recap this long proof a little bit:

  • The lower bound was all about Courant-Fischer and recognizing the role of the Laplacian in this proof. All we had to do was convert a cut into a vector.
  • The upper bound required making a sparse cut from a vector. This is something one can do using Fiedler's algorithm.
    • However, are we producing a sparse enough cut?
    • We first prove that we do if the vector is positive by using the probabilistic method and an averaging argument.
      • Our distribution of choice is a little funky, but it makes sense if we look closely at the calculations.
    • Then we extend the lemma to all $x \perp 1^n$ by looking at the positive and negative entries of $x$ separately.