Singular Value Decomposition


Recall from a previous post on eigendecomposition that an $n \times n$ symmetric matrix $A$ can be decomposed into a sum of $n$ matrices of the same shape ($n \times n$):

$$A = \lambda_1 u_1 u_1^\top + \lambda_2 u_2 u_2^\top + \cdots + \lambda_n u_n u_n^\top,$$

where $u_1, u_2, \ldots, u_n$ are eigenvectors of $A$.

Written compactly,

$$A = P D P^\top,$$

where $P$ has the eigenvectors $u_i$ as its column vectors and $D$ is a diagonal matrix with $A$'s eigenvalues on its diagonal.
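We can check this decomposition numerically with base R's eigen(). Below is a minimal sketch on a small symmetric matrix of my own choosing (the matrix is only an illustration):

A <- matrix(c(2, 1, 1, 2), nrow = 2)   # a symmetric matrix
e <- eigen(A)                          # for symmetric A, the eigenvectors come out orthonormal
P <- e$vectors
D <- diag(e$values)
P %*% D %*% t(P)                       # reconstructs A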

More generally, any $m \times n$ matrix $A$ can be decomposed into $r$ matrices of the same shape $m \times n$, where $r$ is the rank of $A$. Why should we want to decompose a matrix? As with a symmetric matrix, we can approximate a matrix by the sum of its first $k$ components. And why would we want to approximate a matrix? I will answer that question in the next post. In this post, let's focus on how to decompose a matrix, symmetric or otherwise.

Let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r$. It can be shown that the number of non-zero singular values of $A$ equals its rank, $r$. Since all $r$ of those singular values are positive, we can label them in descending order as

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n,$$

where

$$\sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_n = 0.$$

We know that each singular value $\sigma_i$ is the square root of $\lambda_i$, an eigenvalue of $A^\top A$, and corresponds to the eigenvector $v_i$ of $A^\top A$ in the same order. Now we can write the singular value decomposition of $A$ as:

$$A = U \Sigma V^\top$$

Next, let’s unpack this equation, starting with the item in the middle, $\Sigma$. $\Sigma$ (pronounced “sigma”) is an $m \times n$ rectangular diagonal matrix with $\sigma_i$ on its diagonal:

$$\Sigma_{m \times n} = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & \sigma_n \end{bmatrix} = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}$$

In practice, to construct $\Sigma$, we can fill an $r \times r$ diagonal matrix with all the non-zero singular values of $A$, $\sigma_1, \sigma_2, \ldots, \sigma_r$, then pad the rest with zeros to make it an $m \times n$ matrix.
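Here is a minimal sketch of that construction in R (make_sigma is a made-up helper name, and the singular values are the ones that will appear in the worked example at the end of this post):

make_sigma <- function(sigmas, m, n) {
  Sigma <- matrix(0, nrow = m, ncol = n)           # start from an m x n matrix of zeros
  r <- length(sigmas)
  Sigma[cbind(seq_len(r), seq_len(r))] <- sigmas   # fill the leading diagonal
  Sigma
}
make_sigma(c(9.493, 3.589), m = 2, n = 3)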

Next to $\Sigma$ is $V^\top$, the transpose of $V$, an $n \times n$ matrix whose column vectors $v_i$ are eigenvectors of the symmetric matrix $A^\top A$:

$$V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$

$V$ is an orthogonal matrix, or orthonormal matrix, because its columns $v_i$ are orthogonal to one another and normalized to unit length. Relatedly, a set of orthogonal, normalized vectors, such as the set of $v_i$, is called an orthonormal set.
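We can check this property numerically: for an orthogonal matrix, $V^\top V$ is the identity. A quick sketch in base R, borrowing the matrix from the example at the end of this post:

A <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
V <- eigen(crossprod(A))$vectors   # eigenvectors of A^T A as columns
round(crossprod(V), 10)            # t(V) %*% V: the 3 x 3 identity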

Finally, $U$ is an $m \times m$ orthogonal matrix. To understand how $U$ is constructed, consider the following statement:

$Av_1, Av_2, \ldots, Av_r$ is an orthogonal basis that spans $\operatorname{Col}(A)$.

Proof. Because $v_i$ and $v_j$ are orthogonal for $i \ne j$,

$$(Av_i)^\top (Av_j) = v_i^\top A^\top A v_j = v_i^\top (\lambda_j v_j) = 0.$$

Therefore, $Av_1, Av_2, \ldots, Av_n$ are orthogonal to each other.

In addition, $\|Av_i\| = \sigma_i$, where $\sigma_i$ are the singular values of $A$; indeed, $\|Av_i\|^2 = v_i^\top A^\top A v_i = \lambda_i v_i^\top v_i = \lambda_i = \sigma_i^2$. Recall that $\|Av_i\| \ne 0$ when $1 \le i \le r$ and $Av_i = 0$ for $i > r$. So $Av_1, \ldots, Av_r$ are orthogonal and all non-zero, and thus form a linearly independent set, all of whose members are in $\operatorname{Col}(A)$.
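Both claims are easy to verify numerically. In the sketch below, the diagonal of $(AV)^\top (AV)$ holds the values $\|Av_i\|^2 = \sigma_i^2$ (here $r = 2$, so the last one is zero), and the off-diagonal entries vanish because the $Av_i$ are orthogonal:

A  <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)   # a rank-2 matrix
V  <- eigen(crossprod(A))$vectors              # eigenvectors of A^T A
AV <- A %*% V
round(crossprod(AV), 6)                        # diagonal: sigma_i^2; off-diagonal: 0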

$Av_1, Av_2, \ldots, Av_r$ also spans $\operatorname{Col}(A)$. To see why, take any vector $y$ in $\operatorname{Col}(A)$, so $y = Ax$ for some $n \times 1$ vector $x$. Given that $v_1, \ldots, v_n$ is a basis for $\mathbb{R}^n$, we can write $x = c_1 v_1 + \cdots + c_n v_n$, so

$$y = Ax = c_1 Av_1 + \cdots + c_r Av_r + \cdots + c_n Av_n = c_1 Av_1 + \cdots + c_r Av_r.$$

In other words, any vector $y$ in $\operatorname{Col}(A)$ can be written in terms of $Av_1, Av_2, \ldots, Av_r$. Therefore, $Av_1, Av_2, \ldots, Av_r$ is an orthogonal basis for $\operatorname{Col}(A)$.

We can normalize the vectors $Av_i$ ($i = 1, \ldots, r$) to obtain an orthonormal basis by dividing each by its length:

$$u_i = \frac{Av_i}{\|Av_i\|} = \frac{Av_i}{\sigma_i}, \quad 1 \le i \le r$$

We now have an orthonormal basis $u_1, \ldots, u_r$, where $r$ is the rank of $A$. In the singular value decomposition equation $A = U \Sigma V^\top$, $\Sigma$ is an $m \times n$ matrix, so $U$ needs to be an $m \times m$ matrix. In case $r < m$, we need to add additional orthonormal vectors $u_{r+1}, \ldots, u_m$ to the set so that together they span $\mathbb{R}^m$. One method for finding these $m - r$ vectors is the Gram-Schmidt process, which we will introduce in another post.
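Putting the pieces together, here is a from-scratch sketch of the whole construction in R, for the case $r = m$ so that no Gram-Schmidt padding is needed (variable names are my own):

A     <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
eig   <- eigen(crossprod(A))            # eigendecomposition of A^T A
sigma <- sqrt(pmax(eig$values, 0))      # singular values; pmax() clips rounding noise
r     <- sum(sigma > 1e-10)             # rank: the number of non-zero singular values
V     <- eig$vectors
U     <- sapply(seq_len(r), function(i) A %*% V[, i] / sigma[i])   # u_i = A v_i / sigma_i
Sigma <- cbind(diag(sigma[seq_len(r)]), matrix(0, r, ncol(A) - r)) # pad to m x n
U %*% Sigma %*% t(V)                    # recovers A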

Now we have successfully constructed all three components: $\Sigma$, $V$, and $U$. And we can decompose $A$ as:

$$A = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 v_1^\top \\ \sigma_2 v_2^\top \\ \vdots \\ \sigma_r v_r^\top \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \cdots + \sigma_r u_r v_r^\top$$

Given that $u_i$ is an $m$-dimensional column vector, $v_i$ is an $n$-dimensional column vector, and $\sigma_i$ is a scalar, each $\sigma_i u_i v_i^\top$ is an $m \times n$ matrix. Therefore, the singular value decomposition equation decomposes matrix $A$ into $r$ matrices of the same shape.
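We can inspect these rank-one pieces directly in R, this time using base R's svd() for convenience:

A <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
s <- svd(A)
components <- lapply(seq_along(s$d), function(i) {
  s$d[i] * tcrossprod(s$u[, i], s$v[, i])   # sigma_i * u_i * t(v_i), a 2 x 3 matrix
})
Reduce(`+`, components)                     # the components sum back to A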

Closely inspecting the equation $A = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \cdots + \sigma_r u_r v_r^\top$, we can gain further insight. Multiply both sides of the equation by an $n \times 1$ vector $x$ and we get:

$$Ax = \sigma_1 u_1 v_1^\top x + \sigma_2 u_2 v_2^\top x + \cdots + \sigma_r u_r v_r^\top x.$$

Earlier, we showed that $Av_1, Av_2, \ldots, Av_r$ is an orthogonal basis that spans $\operatorname{Col}(A)$. Therefore, $Ax$ can be written as

$$Ax = a_1 u_1 + a_2 u_2 + \cdots + a_r u_r.$$

Taken together with the previous equation, we have $a_i = \sigma_i v_i^\top x$.

$v_i^\top x$ gives the scalar projection of $x$ onto $v_i$, which is then multiplied by the $i$-th singular value $\sigma_i$. Thus, each component in the decomposition of $Ax$ is the result of a projection followed by a scaling.

Recall that in the eigendecomposition equation, where $A$ is a symmetric matrix,

$$Ax = \lambda_1 u_1 u_1^\top x + \lambda_2 u_2 u_2^\top x + \cdots + \lambda_n u_n u_n^\top x,$$

$x$ is projected onto each eigenvector $u_i$ of $A$, then scaled by the corresponding eigenvalue $\lambda_i$.

In the singular value decomposition, each component of $Ax$ points along $u_i$, scaled by the product of the singular value $\sigma_i$ and the scalar projection of $x$ onto $v_i$, where $v_i$ are eigenvectors of $A^\top A$ and $u_i$ are unit vectors along the directions of $Av_i$.
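The sketch below checks this interpretation numerically for an arbitrary $x$ of my own choosing: each term scales $u_i$ by $\sigma_i$ times the scalar projection of $x$ onto $v_i$, and the terms sum to $Ax$:

A <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
x <- c(1, -1, 2)
s <- svd(A)
terms <- sapply(seq_along(s$d), function(i) {
  s$d[i] * drop(crossprod(s$v[, i], x)) * s$u[, i]   # sigma_i * (v_i . x) * u_i
})
rowSums(terms)   # equals drop(A %*% x)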

The R package matlib has a function SVD() that can calculate the singular value decomposition of matrix A. Let’s see an example:

mat_a <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
svd_a <- matlib::SVD(mat_a)
svd_a
$d
[1] 9.492982 3.589331

$U
          [,1]       [,2]
[1,] 0.4121068  0.9111355
[2,] 0.9111355 -0.4121068

$V
            [,1]        [,2]
[1,]  0.94148621  0.09686702
[2,]  0.33135146 -0.09059764
[3,] -0.06172462  0.99116540

Note that the last component, $V$, is a $3 \times 2$ matrix, whereas in the original singular value decomposition equation $V$ is expected to be a $3 \times 3$ matrix, given that $A$ is $2 \times 3$. Did the function matlib::SVD() miss something? Consider the following:

$$\begin{bmatrix} 4 & 1 & 3 \\ 8 & 3 & -2 \end{bmatrix} = U \Sigma V^\top = \begin{bmatrix} 0.412 & 0.911 \\ 0.911 & -0.412 \end{bmatrix} \begin{bmatrix} 9.493 & 0 & 0 \\ 0 & 3.589 & 0 \end{bmatrix} \begin{bmatrix} 0.941 & 0.331 & -0.062 \\ 0.097 & -0.091 & 0.991 \\ ? & ? & ? \end{bmatrix}$$

I have used ? as a placeholder for the elements of $V$ that were missing from the previous output. It is easy to see that whatever replaces the ?s would be multiplied by the zeros in the last column of $\Sigma$ and would therefore contribute nothing to the final result. This is why matlib::SVD() parsimoniously printed only the first two columns of $V$.
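Indeed, the truncated components printed above are enough to rebuild mat_a exactly:

svd_a$U %*% diag(svd_a$d) %*% t(svd_a$V)   # recovers mat_a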

In the next and final post of this series, we will walk through an example where singular value decomposition solves a real-life problem.