Singular Values



In a previous post, we saw the effect of multiplying a matrix by its eigenvectors. The vector does not change direction; it merely stretches or shrinks by a factor equal to the corresponding eigenvalue.

I reproduce the before and after plots below for three matrices A, B, and C.

There is one subtle difference between B and C on the one hand and A on the other. Take B for example: the length of $Bu_1$ is the maximum of $\|Bx\|$ over all unit vectors $x$, and the length of $Bu_2$ is the maximum of $\|Bx\|$ over all unit vectors $x$ that are perpendicular to $u_1$. The same pattern holds for C. However, $Au_1$ is certainly NOT the maximum of $\|Ax\|$ over all unit vectors $x$.

As always, there are no coincidences in mathematics, and this is no exception. For a symmetric matrix $M$ with eigenvectors ordered by decreasing eigenvalue, $\|Mu_i\|$ is the maximum of $\|Mx\|$ over all unit vectors $x$ that are perpendicular to the first $i-1$ eigenvectors of $M$. The question remains: among all unit vectors $x$, which one maximizes $\|Ax\|$ when $A$ is not necessarily symmetric?

Let’s digress here for a moment and consider, not $A$, but $A^TA$. Given that the transpose of a product is the product of the transposes in reverse order, we have

$$(A^TA)^T = A^T(A^T)^T = A^TA$$

In other words, $A^TA$ is equal to its own transpose, and is therefore a symmetric matrix. From previous posts, we know that a symmetric $n \times n$ matrix such as $A^TA$ has $n$ real eigenvalues and $n$ linearly independent, mutually orthogonal eigenvectors.

Next, let’s calculate the eigenvalues and eigenvectors of $A^TA$.

Let’s label these eigenvectors $v_1$ and $v_2$, and we can assume that they are normalized.
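In NumPy this computation is a one-liner. The matrix below is a made-up stand-in, since the post's actual matrices appear only in the plots:

```python
import numpy as np

# Hypothetical non-symmetric matrix standing in for the post's A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])

AtA = A.T @ A                    # A^T A is symmetric
lams, V = np.linalg.eigh(AtA)    # eigh: real eigenvalues (ascending), eigenvectors as columns

# Reorder so lambda_1 >= lambda_2, matching the post's convention
lams, V = lams[::-1], V[:, ::-1]
v1, v2 = V[:, 0], V[:, 1]

print(lams)                      # real, non-negative eigenvalues
print(v1 @ v2)                   # ~0: the eigenvectors are orthogonal
```

Note that `eigh` already returns unit-length eigenvectors, so no extra normalization is needed.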

Before we proceed, take a guess at what you would see if we plotted $v_1$, $v_2$, $Av_1$ and $Av_2$.

Recall the question we asked earlier: among all unit vectors $x$, which one maximizes $\|Ax\|$? It seems we have found the answer: it is $v_1$, the eigenvector of $A^TA$ with the largest eigenvalue.

We have shown that this is true for the example matrix A. In general, for an $m \times n$ matrix $A$, it can be shown that $Av_i$ has the greatest length among all $Ax$ with $x$ a unit vector perpendicular to the previous $i-1$ eigenvectors, where $v_1, v_2, \dots, v_n$ are the eigenvectors of $A^TA$ ordered by decreasing eigenvalue.
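We can check the first part of this claim numerically by sampling many unit vectors and comparing $\|Ax\|$ against $\|Av_1\|$; the matrix below is again a hypothetical stand-in:

```python
import numpy as np

# Hypothetical A; v1 is the top eigenvector of A^T A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])
lams, V = np.linalg.eigh(A.T @ A)
v1 = V[:, np.argmax(lams)]

# Sample many unit vectors x in the plane and measure ||Ax||
rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, 10_000)
xs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # rows are unit vectors
lengths = np.linalg.norm(xs @ A.T, axis=1)               # ||A x|| for each sample

print(lengths.max() <= np.linalg.norm(A @ v1) + 1e-9)    # True: no unit x beats v1
```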

For each of these eigenvectors, we can use the definition of length and the rule for the transpose of a product to obtain:

$$\|Av_i\|^2 = (Av_i)^T Av_i = v_i^T A^T A v_i$$

Let’s assume that the eigenvalue corresponding to $v_i$ is $\lambda_i$. Then:

$$v_i^T A^T A v_i = v_i^T \lambda_i v_i = \lambda_i v_i^T v_i$$

And because $v_i$ is normalized,

$$\|v_i\|^2 = v_i^T v_i = 1$$

and

$$\|Av_i\|^2 = \lambda_i v_i^T v_i = \lambda_i$$

This result shows that all the eigenvalues of $A^TA$ are non-negative: each equals the squared length $\|Av_i\|^2$. If we label them in descending order, we have:

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$$

The $i$-th singular value of $A$ is defined as the square root of $\lambda_i$, denoted $\sigma_i$:

$$\sigma_i = \sqrt{\lambda_i} = \|Av_i\|, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$$
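As a quick sanity check with a made-up matrix, the square roots of the eigenvalues of $A^TA$ should match both the lengths $\|Av_i\|$ and the singular values reported by NumPy's own SVD routine:

```python
import numpy as np

# Hypothetical A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])

lams, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lams)[::-1]            # descending, so sigma_1 >= sigma_2
lams, V = lams[order], V[:, order]

sigmas = np.sqrt(lams)                    # sigma_i = sqrt(lambda_i)
lengths = np.linalg.norm(A @ V, axis=0)   # ||A v_i|| for each column v_i

print(np.allclose(sigmas, lengths))                              # True
print(np.allclose(sigmas, np.linalg.svd(A, compute_uv=False)))   # True
```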

Therefore, the singular values of $A$ are the lengths of the vectors $Av_i$. An important theorem that forms the backbone of the SVD method states that the maximum value of $\|Ax\|$, subject to the constraints

$$\|x\| = 1, \quad x \cdot v_1 = 0, \quad x \cdot v_2 = 0, \quad \dots, \quad x \cdot v_{k-1} = 0$$

is $\sigma_k$, and this maximum value is attained at $v_k$, the $k$-th eigenvector of $A^TA$.
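In two dimensions the unit vectors perpendicular to $v_1$ are just $\pm v_2$, so the $k = 2$ case of this theorem is easy to verify numerically (again with a hypothetical matrix):

```python
import numpy as np

# Hypothetical A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])
lams, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lams)[::-1]
sigmas = np.sqrt(lams[order])
v1 = V[:, order[0]]

# The only unit vectors with x . v1 = 0 in the plane are +/- v2,
# so the constrained maximum of ||Ax|| should be sigma_2.
perp = np.array([-v1[1], v1[0]])   # unit vector perpendicular to v1
print(np.isclose(np.linalg.norm(A @ perp), sigmas[1]))   # True
```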

In an earlier post, we mentioned that a symmetric matrix transforms a vector by stretching or shrinking the vector along the eigenvectors of this matrix.

A non-symmetric matrix $A$, by contrast, transforms a vector by stretching or shrinking it along the directions of $Av_i$, where $v_i$ is a normalized ($\|v_i\| = 1$) eigenvector of $A^TA$, ordered by its corresponding eigenvalue. The corresponding singular value $\sigma_i$ is the scalar that determines the amount of stretching: $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ is the corresponding eigenvalue of $A^TA$.

How can we reconcile these two seemingly different rules? Let’s take a symmetric matrix, $B$. Suppose that its $i$-th eigenvector is $u_i$ and the corresponding eigenvalue is $\lambda_i$. If we multiply $B^TB$ by $u_i$, we get:

$$(B^TB)u_i = B^T(Bu_i) = B^T(\lambda_i u_i) = \lambda_i B^T u_i = \lambda_i B u_i = \lambda_i^2 u_i$$

(using $B^T = B$ in the last two steps)

which means that $u_i$ is also an eigenvector of $B^TB$, but its corresponding eigenvalue is $\lambda_i^2$! Now we can see that the previous rule about a symmetric matrix is nothing but a special case of the more general rule:
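We can confirm this numerically with a small symmetric matrix (the values below are made up):

```python
import numpy as np

# Hypothetical symmetric B
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, U = np.linalg.eigh(B)          # eigenpairs of B itself
lam2, _ = np.linalg.eigh(B.T @ B)   # eigenvalues of B^T B

# Each column u_i of U is also an eigenvector of B^T B, with eigenvalue lambda_i^2
print(np.allclose((B.T @ B) @ U, U * lam**2))   # True
print(np.allclose(np.sort(lam**2), lam2))       # eigenvalues of B^T B are lambda_i^2
```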

A matrix $A$ transforms a vector by stretching or shrinking it along the directions of $Av_i$, where $v_i$ is an eigenvector of $A^TA$, ordered by its corresponding singular value. The corresponding singular value $\sigma_i$ is the scalar that determines the amount of stretching or shrinking: $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ is the corresponding eigenvalue of $A^TA$.

When $A$ is symmetric, the direction of $Av_i$ is identical to that of $Au_i$, because $A$ has the same eigenvectors as $A^TA$. Moreover, $Au_i = \lambda_i u_i$. Therefore, the direction of $Av_i$ is the direction of $Au_i$, which is the direction of $u_i$ (up to a sign, when $\lambda_i$ is negative). That is, a symmetric matrix transforms a vector by stretching or shrinking it along the direction of $u_i$, its own eigenvector!

What about the amount of stretching or shrinking? We know that $\sigma_i = \sqrt{\lambda_i^2} = |\lambda_i|$, where $\lambda_i^2$ is the corresponding eigenvalue of $A^TA$ and $\lambda_i$ is the corresponding eigenvalue of $A$. Therefore, a symmetric matrix transforms a vector along its eigenvectors $u_i$, scaled by its corresponding eigenvalues $\lambda_i$. We have come full circle! In the next post, we are finally ready to present the singular value decomposition equation!