Singular Value Decomposition


Recall from a previous post on eigendecomposition that an $n \times n$ symmetric matrix $A$ can be decomposed into a sum of $n$ matrices of the same shape ($n \times n$):

$$A = \lambda_1 u_1 u_1^\top + \lambda_2 u_2 u_2^\top + \cdots + \lambda_n u_n u_n^\top,$$

where $u_1, u_2, \ldots, u_n$ are eigenvectors of $A$.

Written compactly,

$$A = P D P^\top,$$

where $P$ has the eigenvectors $u_i$ as its column vectors and $D$ is a diagonal matrix with $A$'s eigenvalues on its diagonal.
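We can check this decomposition numerically with base R's eigen(). Below is a minimal sketch on a small symmetric matrix of my own choosing (the matrix is only an illustration):

A <- matrix(c(2, 1, 1, 2), nrow = 2)   # a symmetric matrix
e <- eigen(A)                          # for symmetric A, the eigenvectors come out orthonormal
P <- e$vectors
D <- diag(e$values)
P %*% D %*% t(P)                       # reconstructs A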

More generally, any $m \times n$ matrix $A$ can be decomposed into $r$ matrices of the same shape $m \times n$, where $r$ is the rank of $A$. Why should we want to decompose a matrix? As with a symmetric matrix, we can approximate a matrix by the sum of its first $k$ components. And why would we want to approximate a matrix? I will answer that question in the next post. In this post, let's focus on how to decompose a matrix, symmetric or otherwise.

Let $A$ be an $m \times n$ matrix with $\operatorname{rank}(A) = r$. It can be shown that the number of non-zero singular values of $A$ equals its rank, $r$. Since all $r$ of those singular values are positive, we can label them in descending order as

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_n,$$

where

$$\sigma_{r+1} = \sigma_{r+2} = \cdots = \sigma_n = 0.$$

We know that each singular value $\sigma_i$ is the square root of $\lambda_i$, an eigenvalue of $A^\top A$, and corresponds to the eigenvector $v_i$ of $A^\top A$ in the same order. Now we can write the singular value decomposition of $A$ as:

$$A = U \Sigma V^\top$$

Next, let’s unpack this equation, starting with the item in the middle, $\Sigma$. $\Sigma$ (pronounced “sigma”) is an $m \times n$ rectangular diagonal matrix with $\sigma_i$ on its diagonal:

$$\Sigma_{m \times n} = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & \sigma_n \end{bmatrix} = \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix}$$

In practice, to construct $\Sigma$, we can fill an $r \times r$ diagonal matrix with all the non-zero singular values of $A$, $\sigma_1, \sigma_2, \ldots, \sigma_r$, then pad the rest with zeros to make it an $m \times n$ matrix.
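Here is a minimal sketch of that construction in R (make_sigma is a made-up helper name, and the singular values are the ones that will appear in the worked example at the end of this post):

make_sigma <- function(sigmas, m, n) {
  Sigma <- matrix(0, nrow = m, ncol = n)           # start from an m x n matrix of zeros
  r <- length(sigmas)
  Sigma[cbind(seq_len(r), seq_len(r))] <- sigmas   # fill the leading diagonal
  Sigma
}
make_sigma(c(9.493, 3.589), m = 2, n = 3)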

Next to $\Sigma$ is $V^\top$, the transpose of $V$, an $n \times n$ matrix whose column vectors $v_i$ are eigenvectors of the symmetric matrix $A^\top A$:

$$V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \end{bmatrix}$$

$V$ is an orthogonal matrix, or orthonormal matrix, because its columns $v_i$ are orthogonal to one another and normalized to unit length. Relatedly, a set of orthogonal, normalized vectors, such as the set of $v_i$, is called an orthonormal set.
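We can check this property numerically: for an orthogonal matrix, $V^\top V$ is the identity. A quick sketch in base R, borrowing the matrix from the example at the end of this post:

A <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
V <- eigen(crossprod(A))$vectors   # eigenvectors of A^T A as columns
round(crossprod(V), 10)            # t(V) %*% V: the 3 x 3 identity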

Finally, $U$ is an $m \times m$ orthogonal matrix. To understand how $U$ is constructed, consider the following statement:

$Av_1, Av_2, \ldots, Av_r$ is an orthogonal basis that spans $\operatorname{Col}(A)$.

Proof. Because $v_i$ and $v_j$ are orthogonal for $i \ne j$,

$$(Av_i)^\top (Av_j) = v_i^\top A^\top A v_j = v_i^\top (\lambda_j v_j) = 0.$$

Therefore, $Av_1, Av_2, \ldots, Av_n$ are orthogonal to each other.

In addition, $\|Av_i\| = \sigma_i$, where $\sigma_i$ are the singular values of $A$; indeed, $\|Av_i\|^2 = v_i^\top A^\top A v_i = \lambda_i v_i^\top v_i = \lambda_i = \sigma_i^2$. Recall that $\|Av_i\| \ne 0$ when $1 \le i \le r$ and $Av_i = 0$ for $i > r$. So $Av_1, \ldots, Av_r$ are orthogonal and all non-zero, and thus form a linearly independent set, all of whose members are in $\operatorname{Col}(A)$.
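Both claims are easy to verify numerically. In the sketch below, the diagonal of $(AV)^\top (AV)$ holds the values $\|Av_i\|^2 = \sigma_i^2$ (here $r = 2$, so the last one is zero), and the off-diagonal entries vanish because the $Av_i$ are orthogonal:

A  <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)   # a rank-2 matrix
V  <- eigen(crossprod(A))$vectors              # eigenvectors of A^T A
AV <- A %*% V
round(crossprod(AV), 6)                        # diagonal: sigma_i^2; off-diagonal: 0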

$Av_1, Av_2, \ldots, Av_r$ also spans $\operatorname{Col}(A)$. To see why, take any vector $y$ in $\operatorname{Col}(A)$, so $y = Ax$ for some $n \times 1$ vector $x$. Given that $v_1, \ldots, v_n$ is a basis for $\mathbb{R}^n$, we can write $x = c_1 v_1 + \cdots + c_n v_n$, so

$$y = Ax = c_1 Av_1 + \cdots + c_r Av_r + \cdots + c_n Av_n = c_1 Av_1 + \cdots + c_r Av_r.$$

In other words, any vector $y$ in $\operatorname{Col}(A)$ can be written in terms of $Av_1, Av_2, \ldots, Av_r$. Therefore, $Av_1, Av_2, \ldots, Av_r$ is an orthogonal basis for $\operatorname{Col}(A)$.

We can normalize the vectors $Av_i$ ($i = 1, \ldots, r$) to obtain an orthonormal basis by dividing each by its length:

$$u_i = \frac{Av_i}{\|Av_i\|} = \frac{Av_i}{\sigma_i}, \quad 1 \le i \le r$$

We now have an orthonormal basis $u_1, \ldots, u_r$, where $r$ is the rank of $A$. In the singular value decomposition equation $A = U \Sigma V^\top$, $\Sigma$ is an $m \times n$ matrix, so $U$ needs to be an $m \times m$ matrix. In case $r < m$, we need to add additional orthonormal vectors $u_{r+1}, \ldots, u_m$ to the set so that together they span $\mathbb{R}^m$. One method for finding these $m - r$ vectors is the Gram-Schmidt process, which we will introduce in another post.
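Putting the pieces together, here is a from-scratch sketch of the whole construction in R, for the case $r = m$ so that no Gram-Schmidt padding is needed (variable names are my own):

A     <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
eig   <- eigen(crossprod(A))            # eigendecomposition of A^T A
sigma <- sqrt(pmax(eig$values, 0))      # singular values; pmax() clips rounding noise
r     <- sum(sigma > 1e-10)             # rank: the number of non-zero singular values
V     <- eig$vectors
U     <- sapply(seq_len(r), function(i) A %*% V[, i] / sigma[i])   # u_i = A v_i / sigma_i
Sigma <- cbind(diag(sigma[seq_len(r)]), matrix(0, r, ncol(A) - r)) # pad to m x n
U %*% Sigma %*% t(V)                    # recovers A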

Now we have successfully constructed all three components: $\Sigma$, $V$, and $U$. And we can decompose $A$ as:

$$A = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 & 0 & \cdots & 0 & \cdots & 0 \\ 0 & \sigma_2 & \cdots & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots & & \vdots \\ 0 & 0 & \cdots & \sigma_r & \cdots & 0 \\ \vdots & \vdots & & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 0 & \cdots & 0 \end{bmatrix} \begin{bmatrix} v_1^\top \\ v_2^\top \\ \vdots \\ v_n^\top \end{bmatrix} = \begin{bmatrix} u_1 & u_2 & \cdots & u_m \end{bmatrix} \begin{bmatrix} \sigma_1 v_1^\top \\ \sigma_2 v_2^\top \\ \vdots \\ \sigma_r v_r^\top \\ 0 \\ \vdots \\ 0 \end{bmatrix} = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \cdots + \sigma_r u_r v_r^\top$$

Given that $u_i$ is an $m$-dimensional column vector, $v_i$ is an $n$-dimensional column vector, and $\sigma_i$ is a scalar, each $\sigma_i u_i v_i^\top$ is an $m \times n$ matrix. Therefore, the singular value decomposition equation decomposes matrix $A$ into $r$ matrices of the same shape.
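We can inspect these rank-one pieces directly in R, this time using base R's svd() for convenience:

A <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
s <- svd(A)
components <- lapply(seq_along(s$d), function(i) {
  s$d[i] * tcrossprod(s$u[, i], s$v[, i])   # sigma_i * u_i * t(v_i), a 2 x 3 matrix
})
Reduce(`+`, components)                     # the components sum back to A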

Closely inspecting the equation $A = \sigma_1 u_1 v_1^\top + \sigma_2 u_2 v_2^\top + \cdots + \sigma_r u_r v_r^\top$, we can gain further insight. Multiply both sides of the equation by an $n \times 1$ vector $x$ and we get:

$$Ax = \sigma_1 u_1 v_1^\top x + \sigma_2 u_2 v_2^\top x + \cdots + \sigma_r u_r v_r^\top x.$$

Earlier, we showed that $Av_1, Av_2, \ldots, Av_r$ is an orthogonal basis that spans $\operatorname{Col}(A)$. Therefore, $Ax$ can be written as

$$Ax = a_1 u_1 + a_2 u_2 + \cdots + a_r u_r.$$

Taken together with the previous equation, we have $a_i = \sigma_i v_i^\top x$.

$v_i^\top x$ gives the scalar projection of $x$ onto $v_i$, which is then multiplied by the $i$-th singular value $\sigma_i$. Thus, each component in the decomposition of $Ax$ is the result of a projection followed by a scaling.

Recall that in the eigendecomposition equation, where $A$ is a symmetric matrix,

$$Ax = \lambda_1 u_1 u_1^\top x + \lambda_2 u_2 u_2^\top x + \cdots + \lambda_n u_n u_n^\top x,$$

$x$ is projected onto each eigenvector $u_i$ of $A$, then scaled by the corresponding eigenvalue $\lambda_i$.

In the singular value decomposition, each component of $Ax$ points along $u_i$, scaled by the product of the singular value $\sigma_i$ and the scalar projection of $x$ onto $v_i$, where $v_i$ are eigenvectors of $A^\top A$ and $u_i$ are unit vectors along the directions of $Av_i$.
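The sketch below checks this interpretation numerically for an arbitrary $x$ of my own choosing: each term scales $u_i$ by $\sigma_i$ times the scalar projection of $x$ onto $v_i$, and the terms sum to $Ax$:

A <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
x <- c(1, -1, 2)
s <- svd(A)
terms <- sapply(seq_along(s$d), function(i) {
  s$d[i] * drop(crossprod(s$v[, i], x)) * s$u[, i]   # sigma_i * (v_i . x) * u_i
})
rowSums(terms)   # equals drop(A %*% x)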

The R package matlib has a function SVD() that can calculate the singular value decomposition of matrix A. Let’s see an example:

mat_a <- matrix(c(4, 8, 1, 3, 3, -2), nrow = 2)
svd_a <- matlib::SVD(mat_a)
svd_a
$d
[1] 9.492982 3.589331

$U
          [,1]       [,2]
[1,] 0.4121068  0.9111355
[2,] 0.9111355 -0.4121068

$V
            [,1]        [,2]
[1,]  0.94148621  0.09686702
[2,]  0.33135146 -0.09059764
[3,] -0.06172462  0.99116540

Note that the last component, $V$, is a $3 \times 2$ matrix, whereas in the original singular value decomposition equation $V$ is expected to be a $3 \times 3$ matrix, given that $A$ is $2 \times 3$. Did the function matlib::SVD() miss something? Consider the following:

$$\begin{bmatrix} 4 & 1 & 3 \\ 8 & 3 & -2 \end{bmatrix} = U \Sigma V^\top = \begin{bmatrix} 0.412 & 0.911 \\ 0.911 & -0.412 \end{bmatrix} \begin{bmatrix} 9.493 & 0 & 0 \\ 0 & 3.589 & 0 \end{bmatrix} \begin{bmatrix} 0.941 & 0.331 & -0.062 \\ 0.097 & -0.091 & 0.991 \\ ? & ? & ? \end{bmatrix}$$

I have used ? as a placeholder for the elements of $V$ that were missing from the previous output. It is easy to see that whatever replaces the ?s would be multiplied by the zeros in the last column of $\Sigma$ and would therefore contribute nothing to the final result. This is why matlib::SVD() parsimoniously printed only the first two columns of $V$.
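Indeed, the truncated components printed above are enough to rebuild mat_a exactly:

svd_a$U %*% diag(svd_a$d) %*% t(svd_a$V)   # recovers mat_a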

In the next and final post of this series, we will walk through an example where singular value decomposition solves a real-life problem.