Singular Values



In a previous post, we saw the effect of multiplying a matrix by its eigenvectors. The vector does not change direction; it merely stretches or shrinks by a factor equal to the corresponding eigenvalue.

I reproduce the before and after plots below for three matrices A, B, and C.

There is one subtle difference between B and C on the one hand and A on the other. Take B for example: the length of $Bu_1$ is the maximum of $\|Bx\|$ over all unit vectors $x$, and the length of $Bu_2$ is the maximum of $\|Bx\|$ over all unit vectors $x$ that are perpendicular to $u_1$. The same pattern holds for C. However, $Au_1$ is certainly NOT the maximum of $\|Ax\|$ over all unit vectors $x$.

As always, there are no coincidences in mathematics, and this is no exception. For a symmetric matrix $M$ with eigenvectors ordered by decreasing eigenvalue, $\|Mu_i\|$ is the maximum of $\|Mx\|$ over all unit vectors $x$ that are perpendicular to the first $i-1$ eigenvectors of $M$. The question remains: among all unit vectors $x$, which one maximizes $\|Ax\|$ when $A$ is not necessarily symmetric?

Let’s digress here for a moment and consider, not $A$, but $A^TA$. Given that the transpose of a product is the product of the transposes in reverse order, we have

$$(A^TA)^T = A^T(A^T)^T = A^TA$$

In other words, $A^TA$ is equal to its own transpose, and is therefore a symmetric matrix. From previous posts, we know that a symmetric $n \times n$ matrix such as $A^TA$ has $n$ real eigenvalues and $n$ linearly independent, mutually orthogonal eigenvectors.

Next, let’s calculate the eigenvalues and eigenvectors of $A^TA$.

Let’s label these eigenvectors $v_1$ and $v_2$, and we can assume that they are normalized.
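In NumPy this computation is a one-liner. The matrix below is a made-up stand-in, since the post's actual matrices appear only in the plots:

```python
import numpy as np

# Hypothetical non-symmetric matrix standing in for the post's A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])

AtA = A.T @ A                    # A^T A is symmetric
lams, V = np.linalg.eigh(AtA)    # eigh: real eigenvalues (ascending), eigenvectors as columns

# Reorder so lambda_1 >= lambda_2, matching the post's convention
lams, V = lams[::-1], V[:, ::-1]
v1, v2 = V[:, 0], V[:, 1]

print(lams)                      # real, non-negative eigenvalues
print(v1 @ v2)                   # ~0: the eigenvectors are orthogonal
```

Note that `eigh` already returns unit-length eigenvectors, so no extra normalization is needed.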

Before we proceed, take a guess at what you would see if we plotted $v_1$, $v_2$, $Av_1$ and $Av_2$.

Recall the question we asked earlier: among all unit vectors $x$, which one maximizes $\|Ax\|$? It seems we have found the answer: it is $v_1$, the eigenvector of $A^TA$ with the largest eigenvalue.

We have shown that this is true for the example matrix A. In general, for an $m \times n$ matrix $A$, it can be shown that $Av_i$ has the greatest length among all $Ax$ with $x$ a unit vector perpendicular to the previous $i-1$ eigenvectors, where $v_1, v_2, \dots, v_n$ are the eigenvectors of $A^TA$ ordered by decreasing eigenvalue.
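We can check the first part of this claim numerically by sampling many unit vectors and comparing $\|Ax\|$ against $\|Av_1\|$; the matrix below is again a hypothetical stand-in:

```python
import numpy as np

# Hypothetical A; v1 is the top eigenvector of A^T A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])
lams, V = np.linalg.eigh(A.T @ A)
v1 = V[:, np.argmax(lams)]

# Sample many unit vectors x in the plane and measure ||Ax||
rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, 10_000)
xs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)  # rows are unit vectors
lengths = np.linalg.norm(xs @ A.T, axis=1)               # ||A x|| for each sample

print(lengths.max() <= np.linalg.norm(A @ v1) + 1e-9)    # True: no unit x beats v1
```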

For each of these eigenvectors, we can use the definition of length and the rule for the transpose of a product to obtain:

$$\|Av_i\|^2 = (Av_i)^T Av_i = v_i^T A^T A v_i$$

Let’s assume that the eigenvalue corresponding to $v_i$ is $\lambda_i$. Then:

$$v_i^T A^T A v_i = v_i^T \lambda_i v_i = \lambda_i v_i^T v_i$$

And because $v_i$ is normalized,

$$\|v_i\|^2 = v_i^T v_i = 1$$

and

$$\|Av_i\|^2 = \lambda_i v_i^T v_i = \lambda_i$$

This result shows that all the eigenvalues of $A^TA$ are non-negative: each equals the squared length $\|Av_i\|^2$. If we label them in descending order, we have:

$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$$

The $i$-th singular value of $A$ is defined as the square root of $\lambda_i$, denoted $\sigma_i$:

$$\sigma_i = \sqrt{\lambda_i} = \|Av_i\|, \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_n \ge 0$$
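As a quick sanity check with a made-up matrix, the square roots of the eigenvalues of $A^TA$ should match both the lengths $\|Av_i\|$ and the singular values reported by NumPy's own SVD routine:

```python
import numpy as np

# Hypothetical A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])

lams, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lams)[::-1]            # descending, so sigma_1 >= sigma_2
lams, V = lams[order], V[:, order]

sigmas = np.sqrt(lams)                    # sigma_i = sqrt(lambda_i)
lengths = np.linalg.norm(A @ V, axis=0)   # ||A v_i|| for each column v_i

print(np.allclose(sigmas, lengths))                              # True
print(np.allclose(sigmas, np.linalg.svd(A, compute_uv=False)))   # True
```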

Therefore, the singular values of $A$ are the lengths of the vectors $Av_i$. An important theorem that forms the backbone of the SVD method states that the maximum value of $\|Ax\|$, subject to the constraints

$$\|x\| = 1, \quad x \cdot v_1 = 0, \quad x \cdot v_2 = 0, \quad \dots, \quad x \cdot v_{k-1} = 0$$

is $\sigma_k$, and this maximum value is attained at $v_k$, the $k$-th eigenvector of $A^TA$.
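In two dimensions the unit vectors perpendicular to $v_1$ are just $\pm v_2$, so the $k = 2$ case of this theorem is easy to verify numerically (again with a hypothetical matrix):

```python
import numpy as np

# Hypothetical A
A = np.array([[3.0, 2.0],
              [0.0, 2.0]])
lams, V = np.linalg.eigh(A.T @ A)
order = np.argsort(lams)[::-1]
sigmas = np.sqrt(lams[order])
v1 = V[:, order[0]]

# The only unit vectors with x . v1 = 0 in the plane are +/- v2,
# so the constrained maximum of ||Ax|| should be sigma_2.
perp = np.array([-v1[1], v1[0]])   # unit vector perpendicular to v1
print(np.isclose(np.linalg.norm(A @ perp), sigmas[1]))   # True
```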

In an earlier post, we mentioned that a symmetric matrix transforms a vector by stretching or shrinking the vector along the eigenvectors of this matrix.

A non-symmetric matrix $A$, by contrast, transforms a vector by stretching or shrinking it along the directions of $Av_i$, where $v_i$ is a normalized ($\|v_i\| = 1$) eigenvector of $A^TA$, ordered by its corresponding eigenvalue. The corresponding singular value $\sigma_i$ is the scalar that determines the amount of stretching: $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ is the corresponding eigenvalue of $A^TA$.

How can we reconcile these two seemingly different rules? Let’s take a symmetric matrix, $B$. Suppose that its $i$-th eigenvector is $u_i$ and the corresponding eigenvalue is $\lambda_i$. If we multiply $B^TB$ by $u_i$, we get:

$$(B^TB)u_i = B^T(Bu_i) = B^T(\lambda_i u_i) = \lambda_i B^T u_i = \lambda_i B u_i = \lambda_i^2 u_i$$

(using $B^T = B$ in the last two steps)

which means that $u_i$ is also an eigenvector of $B^TB$, but its corresponding eigenvalue is $\lambda_i^2$! Now we can see that the previous rule about a symmetric matrix is nothing but a special case of the more general rule:
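We can confirm this numerically with a small symmetric matrix (the values below are made up):

```python
import numpy as np

# Hypothetical symmetric B
B = np.array([[2.0, 1.0],
              [1.0, 3.0]])

lam, U = np.linalg.eigh(B)          # eigenpairs of B itself
lam2, _ = np.linalg.eigh(B.T @ B)   # eigenvalues of B^T B

# Each column u_i of U is also an eigenvector of B^T B, with eigenvalue lambda_i^2
print(np.allclose((B.T @ B) @ U, U * lam**2))   # True
print(np.allclose(np.sort(lam**2), lam2))       # eigenvalues of B^T B are lambda_i^2
```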

A matrix $A$ transforms a vector by stretching or shrinking it along the directions of $Av_i$, where $v_i$ is an eigenvector of $A^TA$, ordered by its corresponding singular value. The corresponding singular value $\sigma_i$ is the scalar that determines the amount of stretching or shrinking: $\sigma_i = \sqrt{\lambda_i}$, where $\lambda_i$ is the corresponding eigenvalue of $A^TA$.

When $A$ is symmetric, the direction of $Av_i$ is identical to that of $Au_i$, because $A$ has the same eigenvectors as $A^TA$. Moreover, $Au_i = \lambda_i u_i$. Therefore, the direction of $Av_i$ is the direction of $Au_i$, which is the direction of $u_i$ (up to a sign, when $\lambda_i$ is negative). That is, a symmetric matrix transforms a vector by stretching or shrinking it along the direction of $u_i$, its own eigenvector!

What about the amount of stretching or shrinking? We know that $\sigma_i = \sqrt{\lambda_i^2} = |\lambda_i|$, where $\lambda_i^2$ is the corresponding eigenvalue of $A^TA$ and $\lambda_i$ is the corresponding eigenvalue of $A$. Therefore, a symmetric matrix transforms a vector along its eigenvectors $u_i$, scaled by its corresponding eigenvalues $\lambda_i$. We have come full circle! In the next post, we are finally ready to present the singular value decomposition equation!