CS294 Lecture 0: Linear Algebra Review

In which we review linear algebra prerequisites.

The following background from linear algebra will be sufficient for the sake of this course: to know what is an eigenvalue and an eigenvector, to know that real symmetric matrices have real eigenvalues and their real eigenvectors are orthogonal, and to know the variational characterization of eigenvalues.

1. Basic Definitions

If {x=a+ib} is a complex number, then we let {\bar x = a- ib} denote its conjugate. Note that a complex number {x} is real if and only if {x = \bar x}. If {M \in {\mathbb C}^{m \times n}} is a matrix, then {M^*} denotes the conjugate transpose of {M}, that is, {(M^*)_{i,j} = \overline{M_{j,i}}}. If the entries of {M} are real, then {M^* = M^T}, where {M^T} is the transpose of {M}, that is, the matrix such that {(M^T)_{i,j} = M_{j,i}}.

We say that a matrix {M} is Hermitian if {M = M^*}. In particular, real symmetric matrices are Hermitian.

If {{\bf x},{\bf y} \in {\mathbb C}^n} are two vectors, then their inner product is defined as

\displaystyle  \langle {\bf v},{\bf w} \rangle := {\bf v}^* {\bf w} = \sum_i \overline{v_i} \cdot w_i \ \ \ \ \ (1)

Notice that, by definition, we have {\langle {\bf v}, {\bf w} \rangle = (\langle {\bf w}, {\bf v} \rangle)^*} and {\langle {\bf v}, {\bf v} \rangle = || {\bf v} ||^2}. Note also that, for two matrices {A,B}, we have {(A \cdot B)^* = B^* \cdot A^*}, and that for every matrix {M} and every two vectors {{\bf x}}, {{\bf y}}, we have

\displaystyle  \langle M {\bf x} , {\bf y} \rangle = {\bf x}^* M^* {\bf y} = \langle {\bf x} , M^* {\bf y} \rangle

If {M\in {\mathbb C}^{n \times n}} is a square matrix, {\lambda \in {\mathbb C}} is a scalar, {{\bf v} \in {\mathbb C}^n - \{ {\bf {0}} \}} is a non-zero vector and we have

\displaystyle   M {\bf v} = \lambda {\bf v} \ \ \ \ \ (2)

then we say that {\lambda} is an eigenvalue of {M} and that {{\bf v}} is eigenvector of {M} corresponding to the eigenvalue {\lambda}.

2. The Spectral Theorem

We want to prove

Theorem 1 (Spectral Theorem) Let {M\in {\mathbb R}^{n\times n}} be a symmetric matrix with real-valued entries, then there are {n} real numbers (not necessarily distinct) {\lambda_1,\ldots,\lambda_n} and {n} orthonormal real vectors {{\bf x}_1,\ldots,{\bf x}_n}, {{\bf x}_i \in {\mathbb R}^n} such that {{\bf x}_i} is an eigenvector of {\lambda_i}.

Assuming the fundamental theorem of algebra (that every polynomial has a complex root) and basic properties of the determinant, the cleanest proof of the spectral theorem is to proceed by induction on {n}, and to show that {M} must have a real eigenvalue {\lambda_1} with a real eigenvector {{\bf v}_1}, and to show that {M} maps vectors orthogonal to {{\bf v}_1} to vectors orthogonal to {{\bf v}_1}. Then one applies the inductive hypothesis to {M} restricted to the {(n-1)}-dimensional space of vectors orthogonal to {{\bf v}_1} and one recovers the remaining {(n-1)} eigenvalues and eigenvectors.

The cleanest way to formalize the above proof is to give all definitions and results in terms of linear operators {T: X \rightarrow X} where {X} is an arbitrary vector space over the reals. This way, however, we would be giving several definitions that we would never use in the future, so, instead, the inductive proof will use a somewhat inelegant change of basis to pass from {M} to an {(n-1) \times (n-1)} matrix {M'}.

We begin by showing that a real symmetric matrix has real eigenvalues and eigenvectors.

Theorem 2 If {M\in {\mathbb R}^{n\times n}} is symmetric, then there is a real eigenvalue {\lambda \in {\mathbb R}} and a real eigenvector {{\bf v} \in {\mathbb R}^n} such that {M {\bf v} = \lambda {\bf v}}.

We begin by noting that every matrix has a complex eigenvalue.

Lemma 3 For every matrix {M\in {\mathbb C}^{n \times n}}, there is an eigenvalue {\lambda \in {\mathbb C}} and an eigenvector {{\bf v} \in {\mathbb C}^n} such that {M {\bf v} = \lambda {\bf v}}.

Proof: Note that {\lambda} is an eigenvalue for {M} if and only if

\displaystyle  \exists {\bf x} \neq {\bf 0}. \ ( M - \lambda I ) {\bf x} = {\bf 0}

which is true if and only if the rows of {M-\lambda I} are not linearly independent, which is true if and only if

\displaystyle  \det (M - \lambda I ) = 0

Now note that the mapping {t \rightarrow \det (M - t I )} is a univariate polynomial of degree {n} in {t}, and so it must have a root {\lambda} by the fundamental theorem of algebra. \Box

Next we show that if {M} is real and symmetric, then its eigenvalues are real.

Lemma 4 If {M} is Hermitian, then, for every {{\bf x}} and {{\bf y}},

\displaystyle  \langle M {\bf x} , {\bf y} \rangle = \langle {\bf x} , M {\bf y} \rangle


\displaystyle  \langle M {\bf x} , {\bf y} \rangle = \langle {\bf x} , M^* {\bf y} \rangle = \langle {\bf x} , M {\bf y} \rangle


Lemma 5 If {M} is Hermitian, then all the eigenvalues of {M} are real.

Proof: Let {M} be an Hermitian matrix and let {\lambda} be a scalar and {{\bf x}} be a non-zero vector such that {M {\bf x} = \lambda {\bf x}}. We will show that {\lambda = \lambda^*}, which implies that {\lambda} is a real number.

We note that

\displaystyle  \langle M{\bf x}, {\bf x} \rangle = \langle \lambda {\bf x}, {\bf x} \rangle = \lambda^* ||x||^2


\displaystyle  \langle {\bf x}, M {\bf x} \rangle = \langle {\bf x}, \lambda {\bf x} \rangle = \lambda ||x||^2

and by the fact that {\langle M{\bf x}, {\bf x} \rangle = \langle {\bf x}, M{\bf x} \rangle} , we have {\lambda = \lambda^*}. \Box

In order to prove Theorem 2, it remains to argue that, for a real eigenvalue of a real symmetric matrix, we can find a real eigenvector.

Proof: } Let {M \in {\mathbb R}^{n \times n}} be a real symmetric matrix, then {M} has a real eigenvalue {\lambda} and a (possibly complex valued) eigenvector {{\bf z} = {\bf x} + i {\bf y}}, where {{\bf x}} and {{\bf y}} are real vectors. We have

\displaystyle  M {\bf x} + i M {\bf y} = \lambda {\bf x} + i \lambda {\bf y}

from which (recalling that the entries of {M} and the scalar {\lambda} are real) it follows that {M {\bf x} = \lambda {\bf x}} and that {M {\bf y} = \lambda {\bf y}}; since {{\bf x}} and {{\bf y}} cannot both be zero, it follows that {\lambda} has a real eigenvector. \Box

We are now ready to prove the spectral theorem

Proof: We proceed by induction on {n}. The case {n=1} is trivial.

Assume that the statement is true for dimension {n-1}. Let {\lambda_1} be a real eigenvalue of {M} and {{\bf x}_1} be a real eigenvector {\lambda_1}.

Now we claim that for every vector {{\bf y}} that is orthogonal to {{\bf x}_1}, then {M{\bf y}} is also orthogonal to {{\bf x}_1}. Indeed, the inner product of {M{\bf y}} and {{\bf x}_1} is

\displaystyle  \langle {\bf x}_1, M{\bf y} \rangle = \langle M {\bf x}_1 , {\bf y} \rangle = \langle \lambda {\bf x}_1,{\bf y} \rangle = 0

Let {X} be the {n-1}-dimensional subspace of {{\mathbb R}^n} that contains all the vectors orthogonal to {{\bf x}_1}. We want to apply the inductive hypothesis to {M} restricted to {X}; we cannot literally do that, because the theorem is not stated in terms of arbitrary linear operators over vector spaces, so we will need to do that by fixing an appropriate basis for {X}.

let {B \in {\mathbb R}^{n \times (n-1)}} be a matrix that computes a bijective map from {{\mathbb R}^{n-1}} to {X}. (If {{\bf b}_1, \ldots,{\bf b}_{n-1}} is an orthonormal basis for {X}, then {B} is just the matrix whose columns are the vectors {{\bf b}_i}.) Let also {B' \in {\mathbb R}^{(n-1) \times n}} be the matrix such that, for every {{\bf y} \in X}, {BB' {\bf y} = {\bf y}}. (We can set {B' = B^T} where {B} is as described above.) We apply the inductive hypothesis to the matrix

\displaystyle  M' := B'MB \in {\mathbb R}^{(n-1) \times (n-1)}

and we find eigenvalues {\lambda_2,\ldots,\lambda_n} and orthonormal eigenvectors {{\bf y}_2,\ldots,{\bf y}_n} for {M'}.

For every {i=2,\ldots,n}, we have

\displaystyle  B'MB {\bf y}_i = \lambda_i {\bf y}_i

and so

\displaystyle  BB'MB{\bf y}_i = \lambda_i B {\bf y}_i

Since {B{\bf y}_i} is orthogonal to {{\bf x}_1}, it follows that {MB{\bf y}_i} is also orthogonal to {{\bf x}_1}, and so {BB'MB{\bf y}_i = MB{\bf y}_i}, so we have

\displaystyle  MB{\bf y}_i = \lambda_i B{\bf y}_i

and, defining {{\bf x}_i := B{\bf y}_i}, we have

\displaystyle  M {\bf x}_i = \lambda_i {\bf x}_i

Finally, we observe that the vectors {{\bf x}_i} are orthogonal. By construction, {{\bf x}_1} is orthogonal to {{\bf x}_2,\ldots,{\bf x}_n}, and, for every {2 \leq i < j \leq n}, we have that

\displaystyle  \langle {\bf x}_i , {\bf x}_j \rangle = \langle B {\bf y}_i , B {\bf y}_j \rangle = \langle {\bf y}_i , B^T B {\bf y} _j \rangle = \langle {\bf y}_i , {\bf y}_j \rangle = 0

We have not verified that the vectors {{\bf x}_i} have norm 1 (which is true), but we can scale them to have norm 1 if not. \Box

3. Variational Characterization of Eigenvalues

We conclude these notes with the variational characterization of eigenvalues for real symmetric matrices.

Theorem 6 Let {M\in {\mathbb R}^{n \times n}} be a symmetric matrix, and {\lambda_1 \leq \lambda_2 \leq \cdots \lambda_n} be the eigenvalues of {M} in non-increasing order. Then

\displaystyle  \lambda_k = \min_{k-{\rm dim} \ X} \ \ \ \max_{{\bf x} \in X - \{ {\bf 0} \} } \ \ \frac{ {\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}}

The quantity {\frac { {\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}} } is called the Rayleigh quotient of {{\bf x}} with respect to {M}, and we will denote it by {R_M({\bf x})}.

Proof: Let {{\bf v}_1,\ldots,{\bf v}_n} be orthonormal eigenvectors of the eigenvalues {\lambda_1,\ldots,\lambda_n}, as promised by the spectral theorem. Consider the {k}-dimensional space spanned by {{\bf v}_1, \ldots,{\bf v}_k}. For every vector {{\bf x} = \sum_{i=1}^k a_i {\bf v}_i} in such a space, the numerator of the Rayleigh quotient is

\displaystyle  \sum_{i,j} a_i a_j {\bf v}_i^T M {\bf v}_j = \sum_{i,j} a_i a_j \lambda_j {\bf v}_i^T {\bf v}_j = \sum_{i=1}^k \lambda_i a_i^2 \leq \lambda_k \cdot \sum_{i=1}^k a_i^2

The denominator is clearly {\sum_{i=1}^k a_j^2}, and so {R_M({\bf x}) \leq \lambda_k}. This shows that

\displaystyle  \lambda_k \geq \min_{k-{\rm dim} \ X} \ \ \ \max_{{\bf x} \in X - \{ {\bf 0} \}} \ \ \frac{ {\bf x}^T M {\bf x}}{{\bf x}^T {\bf x}}

For the other direction, let {X} be any {k}-dimensional space: we will show that {X} must contain a vector of Rayleigh quotient {\geq \lambda_k}. Let {S} be the span of {{\bf v}_k,\ldots,{\bf v}_n}; since {S} has dimension {n-k+1} and {X} has dimension {k}, they must have some non-zero vector in common. Let {{\bf x}} be one such vector, and let us write {{\bf x} = \sum_{i=k}^n a_i {\bf v}_i}. The numerator of the Rayleigh quotient of {{\bf x}} is

\displaystyle  \sum_{i=k}^n \lambda_i a_i^2 \geq \lambda_k \sum_i a_i^2

and the denominator is {\sum_i a_i^2}, so {R_M({\bf x}) \geq \lambda_k}. \Box

We have the following easy consequence.

Fact 7 If {\lambda_1} is the smallest eigenvalue of a real symmetric matrix {M}, then

\displaystyle  \lambda_1 = \min_{{\bf x} \neq {\bf 0}} R_M({\bf x})

Furthermore, every minimizer is an eigenvector of {\lambda_1}.

Proof: The identity is the {k=1} case of the previous theorem. For the furthermore part, let {\lambda_1 \leq \cdots \lambda_n} be the list of eigenvalues of {M} in non-decreasing order, and {{\bf v}_1,\ldots,{\bf v}_n} be corresponding eigenvectors. If {{\bf x} = \sum_i a_i {\bf v}_i} is any vector, then

\displaystyle  R_M( {\bf x} ) = \frac { \sum_i \lambda_i a_i^2}{\sum_i a_i^2}

If {R_M({\bf x}) = \lambda_1}, then {a_i = 0} for every {i} such that {\lambda_i > \lambda_1}, that is, {{\bf x}} is a linear combination of eigenvectors of {\lambda_1}, and hence it is an eigenvector of {\lambda_1}. \Box

Fact 8 If {\lambda_n} is the largest eigenvalue of a real symmetric matrix {M}, then

\displaystyle  \lambda_n = \max_{{\bf x} \neq {\bf 0}} R_M({\bf x})

Furthermore, every maximizer is an eigenvector of {\lambda_n}.

Proof: Apply Fact 7 to the matrix {-M}. \Box

Fact 9 If {\lambda_1} is the smallest eigenvalue of a real symmetric matrix {M}, and {{\bf x}_1} is an eigenvector of {\lambda_1}, then

\displaystyle  \lambda_2 = \min_{{\bf x} \neq {\bf 0},\ {\bf x} \perp {\bf x}_1} \ \ \ R_M({\bf x})

Furthermore, every minimizer is an eigenvector of {\lambda_2}.

Proof: A more conceptual proof would be to consider the restriction of {M} to the space orthogonal to {{\bf x}_1}, and then apply Fact 7 to such a linear operator. But, since we have not developed the theory for general linear operators, we would need to explicitly reduce to an {(n-1)}-dimensional case via a projection operator as in the proof of the spectral theorem.

Instead, we will give a more hands-on proof. Let {\lambda_1 \leq \lambda_2 \leq \cdots \lambda_n} be the list of eigenvalues of {M}, with multiplicities, and {{\bf v}_1,\ldots,{\bf v}_n} be orthonormal vectors as given by the spectral theorem. We may assume that {{\bf v}_1 = {\bf x}_1}, possibly by changing the orthonormal basis of the eigenspace of {\lambda_1}. For every vector {{\bf x} = \sum_{i=2}^k a_i {\bf v}_i } orthogonal to {{\bf v}_1}, its Rayleigh quotient is

\displaystyle  R_M({\bf x}) = \frac { \sum_{i=2}^n \lambda_i a_i^2}{\sum_i a_i^2} \geq \lambda_2

and the minimum is achieved by vectors {{\bf x}} such that {a_i = 0} for every {\lambda_i > \lambda_2}, that is, for vectors {{\bf x}} which are linear combinations of the eigenvectors of {\lambda_2}, and so every minimizer is an eigenvector of {\lambda_2}. \Box

2 thoughts on “CS294 Lecture 0: Linear Algebra Review

  1. To apply the inductive hypothesis on M’, don’t we need to prove that it is symmetric?

  2. and to answer my own question… M’ is indeed symmetric since BM’B^T = M M^T = (BM’B^T)^T = BM’^TB^T. Since M= M^T BM’B^T = BM’^TB^T and therefore M’ = M’^T

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s