Linear Algebra: Singular Value Decomposition
In this section we will develop one of the most powerful ideas in linear algebra: the singular value decomposition. The first step on this journey is the polar decomposition.
Given an $n \times n$ matrix $A$, we define $\sqrt{A^\top A}$ to be $V \Lambda^{1/2} V^\top$, where $V \Lambda V^\top$ is an orthogonal diagonalization of $A^\top A$ (the eigenvalues of $A^\top A$ are nonnegative, so $\Lambda^{1/2}$ makes sense). With this definition, $\left(\sqrt{A^\top A}\right)^2 = V\Lambda^{1/2}V^\top V\Lambda^{1/2}V^\top = V\Lambda V^\top = A^\top A$, as befits the notation $\sqrt{A^\top A}$.
The matrix $\sqrt{A^\top A}$ is simpler to understand than $A$ because it is symmetric and positive semidefinite. Yet it contains substantial information about $A$: for every $\mathbf{x} \in \mathbb{R}^n$, we have $\left|\sqrt{A^\top A}\,\mathbf{x}\right|^2 = \mathbf{x}^\top A^\top A\, \mathbf{x} = |A\mathbf{x}|^2.$
In other words, for all $\mathbf{x} \in \mathbb{R}^n$, the images of $\mathbf{x}$ under $A$ and under $\sqrt{A^\top A}$ have equal norm. If two points are the same distance from the origin, then one may be mapped to the other by an orthogonal transformation.
Therefore, for each $\mathbf{x}$, there is an orthogonal transformation from the range of $\sqrt{A^\top A}$ to the range of $A$ which sends $\sqrt{A^\top A}\,\mathbf{x}$ to $A\mathbf{x}$.
It can be shown that the orthogonal transformation mapping $\sqrt{A^\top A}\,\mathbf{x}$ to $A\mathbf{x}$ is the same transformation for every $\mathbf{x}$. Furthermore, even if the range of $\sqrt{A^\top A}$ is not all of $\mathbb{R}^n$, we can extend this orthogonal transformation to an orthogonal transformation on all of $\mathbb{R}^n$ by arbitrarily mapping vectors in a basis of the orthogonal complement of the range of $\sqrt{A^\top A}$ to vectors in a basis of the orthogonal complement of the range of $A$.
Theorem (Polar Decomposition)
For any $n \times n$ matrix $A$, there exists an orthogonal matrix $R$ such that $A = R\sqrt{A^\top A}$.
This representation is useful because it represents an arbitrary square matrix as a product of matrices whose properties are easier to understand (the orthogonal matrix $R$ because it is distance- and angle-preserving, and the positive semidefinite matrix $\sqrt{A^\top A}$ because it is orthogonally diagonalizable, by the spectral theorem).
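The pieces of this factorization can be checked numerically. The sketch below (an illustration with an arbitrary invertible matrix, not part of the course's code) computes $\sqrt{A^\top A}$ from an eigendecomposition of $A^\top A$ and confirms that $R = A\left(\sqrt{A^\top A}\right)^{-1}$ is orthogonal:

```python
import numpy as np

# Arbitrary invertible example matrix (chosen for illustration)
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

lam, V = np.linalg.eigh(A.T @ A)        # A'A = V diag(lam) V'
P = V @ np.diag(np.sqrt(lam)) @ V.T     # P = sqrt(A'A)
R = A @ np.linalg.inv(P)                # so that A = R P

print(np.allclose(P @ P, A.T @ A))      # → True
print(np.allclose(R.T @ R, np.eye(2)))  # → True: R is orthogonal
print(np.allclose(R @ P, A))            # → True
```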
Let's explore a fast method of computing a polar decomposition $A = R\sqrt{A^\top A}$. This method actually works by calculating $R$ and then recovering $\sqrt{A^\top A}$ as $R^{-1}A$ (since this is computationally faster than calculating the matrix square root). We call $R$ the orthogonal part of $A$ and $\sqrt{A^\top A}$ the symmetric part of $A$.
We set $R_0 = A$ and define the iteration $R_{k+1} = \dfrac{R_k + (R_k^\top)^{-1}}{2}$ for $k \geq 0$.
Let's see why this iteration converges to the orthogonal part $R$ of $A$.
Defining $P_k$ to be the symmetric part of $R_k$, and using the equation $R_k = RP_k$, show that $R_{k+1} = R\,\dfrac{P_k + P_k^{-1}}{2}$. Use the prior step to explain why the $R_k$'s all have the same orthogonal part and have symmetric parts converging to the identity matrix.
Hint: consider the eigendecompositions of the symmetric parts. You may assume that the sequence defined by $x_{k+1} = \frac{1}{2}\left(x_k + \frac{1}{x_k}\right)$ converges to 1 regardless of the (positive) starting value $x_0$.
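The scalar fact quoted in the hint is easy to check numerically; this sketch just iterates $x \mapsto \frac{1}{2}(x + 1/x)$ from a few positive starting values:

```python
# The map x -> (x + 1/x)/2 is Newton's method for x^2 = 1; it converges
# to 1 from any positive starting value.
def iterate(x, n=50):
    for _ in range(n):
        x = (x + 1/x) / 2
    return x

print([iterate(x0) for x0 in [0.01, 0.5, 3.0, 100.0]])  # → [1.0, 1.0, 1.0, 1.0]
```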
Write some code to apply this algorithm to the matrix $A = \begin{bmatrix} 1 & 3 & 4 \\ 7 & -2 & 5 \\ -3 & 4 & 11 \end{bmatrix}$
```python
import numpy as np

A = np.array([[1, 3, 4], [7, -2, 5], [-3, 4, 11]])
```
```julia
A = [1 3 4; 7 -2 5; -3 4 11]
```
and confirm that the resulting matrices $R$ and $P$ satisfy $R^\top R = I$ and $P^2 = A^\top A$.
Solution. Since both transposition and inversion reverse the order of matrix products, we get $(R_k^\top)^{-1} = \left((RP_k)^\top\right)^{-1} = (P_k^\top R^\top)^{-1} = (R^\top)^{-1}(P_k^\top)^{-1}.$
Since $R$ is orthogonal, $(R^\top)^{-1} = R$, as $R^\top = R^{-1}$. Since $P_k$ is symmetric and invertible, $(P_k^\top)^{-1} = P_k^{-1}$. So this is equal to $RP_k^{-1}$, and therefore $R_{k+1} = \frac{R_k + (R_k^\top)^{-1}}{2} = \frac{RP_k + RP_k^{-1}}{2} = R\,\frac{P_k + P_k^{-1}}{2}.$
Since $\frac{P_k + P_k^{-1}}{2}$ is again symmetric and positive definite, we see that $R_k$ and $R_{k+1}$ have the same orthogonal part, and repeating the calculation shows that all the $R_k$'s have the same orthogonal part $R$. As for the symmetric parts, we see that $P_{k+1} = \frac{P_k + P_k^{-1}}{2}$.
Let's see why this averaging process converges to the identity matrix. Symmetric matrices are orthogonally diagonalizable, by the spectral theorem: we may write $P_0 = V\Lambda_0 V^\top$ with $V$ orthogonal and $\Lambda_0$ diagonal. Then $P_1 = \frac{1}{2}\left(V\Lambda_0 V^\top + V\Lambda_0^{-1}V^\top\right) = V\,\frac{\Lambda_0 + \Lambda_0^{-1}}{2}\,V^\top$, and likewise every iterate has the form $P_k = V\Lambda_k V^\top$, where each diagonal entry of $\Lambda_{k+1}$ is obtained from the corresponding entry of $\Lambda_k$ by applying $x \mapsto \frac{1}{2}\left(x + \frac{1}{x}\right)$.
Thus the $P_k$'s converge to the matrix $VDV^\top$, where $D$ is the diagonal matrix whose $i$th diagonal entry is the limit obtained when you start with the $i$th eigenvalue of $P_0$ and repeatedly apply the function $x \mapsto \frac{1}{2}\left(x + \frac{1}{x}\right)$. By the fact about this iteration given in the problem statement, we conclude that $D$ is the identity matrix. Therefore, the limit of $R_k = RP_k$ as $k \to \infty$ is equal to $R$.
```python
import numpy as np

def polar(A, n):
    R = A
    for i in range(n):
        R = (R + np.linalg.inv(R.T)) / 2
    return R, np.linalg.solve(R, A)

I = np.eye(3)
A = np.array([[1, 3, 4], [7, -2, 5], [-3, 4, 11]])
R, P = polar(A, 100)
R.T @ R - I, P @ P - A.T @ A
```
```julia
using LinearAlgebra

function polar(A, n)
    R = A
    for i = 1:n
        R = (R + inv(R')) / 2
    end
    R, R \ A
end

A = [1 3 4; 7 -2 5; -3 4 11]
R, P = polar(A, 100)
R'*R - I, P^2 - A'*A
```
Both of the matrices returned on the last line have entries which are within roundoff error of zero.
Show that the product of two matrices with orthonormal columns has orthonormal columns.
Solution. If $U^\top U = I$ and $V^\top V = I$, then $(UV)^\top(UV) = V^\top U^\top U V = V^\top V = I$, so the columns of $UV$ are orthonormal.
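A quick numeric spot-check of this fact (a sketch using QR factorizations to generate matrices with orthonormal columns; the sizes are arbitrary):

```python
import numpy as np

# Q1 (4x3) and Q2 (3x2) have orthonormal columns by construction
Q1, _ = np.linalg.qr(np.random.standard_normal((4, 3)))
Q2, _ = np.linalg.qr(np.random.standard_normal((3, 2)))

P = Q1 @ Q2                             # 4x2 product
print(np.allclose(P.T @ P, np.eye(2)))  # → True: columns are orthonormal
```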
The singular value decomposition
The polar decomposition tells us that any square matrix is almost the same as some symmetric matrix, and the spectral theorem tells us that a symmetric matrix is almost the same as a simple scaling along the coordinate axes. (In both cases, the phrase "almost the same" disguises a composition with an orthogonal transformation.) We should be able to combine these ideas to conclude that any square matrix is basically the same as a simple scaling along the coordinate axes!
Let's be more precise. Suppose that $A$ is a square matrix. The polar decomposition tells us that $A = R\sqrt{A^\top A}$ for some orthogonal matrix $R$. The spectral theorem tells us that $\sqrt{A^\top A} = V\Sigma V^\top$ for some orthogonal matrix $V$ and a diagonal matrix $\Sigma$ with nonnegative diagonal entries. Combining these equations, we get $A = RV\Sigma V^\top$.
Since a product of orthogonal matrices is orthogonal, we may define $U = RV$ and write $A = U\Sigma V^\top$, where $U$ and $V$ are orthogonal matrices.
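We can follow this derivation literally in code. This sketch (with an arbitrary invertible example matrix) builds $U$, $\Sigma$, and $V$ from the eigendecomposition of $A^\top A$ and the polar factor $R$, then compares against the library SVD:

```python
import numpy as np

A = np.array([[1.0, 3.0],
              [2.0, -1.0]])             # arbitrary invertible example

lam, V = np.linalg.eigh(A.T @ A)        # eigendecomposition of A'A
lam, V = lam[::-1], V[:, ::-1]          # sort eigenvalues descending
Sigma = np.diag(np.sqrt(lam))
P = V @ Sigma @ V.T                     # sqrt(A'A) = V Σ V'
R = A @ np.linalg.inv(P)                # orthogonal part of A
U = R @ V                               # product of orthogonal matrices

print(np.allclose(U @ Sigma @ V.T, A))  # → True: A = U Σ V'
print(np.allclose(np.diag(Sigma), np.linalg.svd(A, compute_uv=False)))  # → True
```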
We can visualize the decomposition geometrically by making a figure like the one shown below, which illustrates the successive effects of each map in the composition $U\Sigma V^\top$: first $V^\top$, then $\Sigma$, then $U$. If we draw grid lines on the second plot (just before $\Sigma$ is applied) and propagate those grid lines to the other plots by applying the indicated maps, then we endow the domain and range of $A$ with orthogonal sets of gridlines, with $A$ mapping one set to the other.
We can extend the singular value decomposition to rectangular matrices (that is, matrices which are not necessarily square) by adding rows or columns of zeros to a rectangular matrix to get a square matrix, applying the SVD to that square matrix, and then trimming the resulting matrix $\Sigma$ as well as either $U$ or $V$ (depending on which dimension of $A$ is smaller). We get a decomposition of the form $A = U\Sigma V^\top$, where $U$ is an $m \times m$ orthogonal matrix, $V$ is an $n \times n$ orthogonal matrix, and $\Sigma$ is an $m \times n$ rectangular diagonal matrix.
Theorem (Singular value decomposition)
Suppose that $A$ is an $m \times n$ matrix. Then there exist orthogonal matrices $U$ and $V$ and a rectangular diagonal matrix $\Sigma$ such that $A = U\Sigma V^\top.$
We call $U\Sigma V^\top$ a singular value decomposition (or SVD) of $A$. The diagonal entries of $\Sigma$, which are the square roots of the eigenvalues of $A^\top A$, are called the singular values of $A$. The columns of $U$ are called left singular vectors, and the columns of $V$ are called right singular vectors.
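These relationships can be confirmed numerically. The sketch below (with an arbitrary example matrix) checks that $A\mathbf{v}_k = \sigma_k \mathbf{u}_k$ for each singular vector pair, and that the singular values are the square roots of the eigenvalues of $A^\top A$:

```python
import numpy as np

A = np.array([[1.0, 0.0, 2.0],
              [0.0, 3.0, 1.0]])         # arbitrary 2x3 example
U, s, Vt = np.linalg.svd(A)             # NumPy returns V transposed

# A maps each right singular vector to σ_k times the left singular vector
for k in range(len(s)):
    print(np.allclose(A @ Vt[k, :], s[k] * U[:, k]))  # → True

# singular values are square roots of the eigenvalues of A'A
eigs = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
print(np.allclose(s, np.sqrt(eigs[:len(s)])))         # → True
```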
Looking at the bottom half of the figure above, we see the geometric meaning of the singular values: they are the factors by which $A$ stretches space along the orthogonal directions determined by the singular vectors.
As an example of how the singular value decomposition can be used to understand the structure of a linear transformation, we introduce the Moore-Penrose pseudoinverse $A^+$ of an $m \times n$ matrix $A$. We define $A^+$ to be $V\Sigma^+ U^\top$, where $\Sigma^+$ is the matrix obtained by inverting each nonzero element of $\Sigma$ and then transposing it. The pseudoinverse is a swiss-army knife for solving the linear system $A\mathbf{x} = \mathbf{b}$:
- If $A$ is square and invertible, then $A^+ = A^{-1}$.
- If $A\mathbf{x} = \mathbf{b}$ has no solution, then $A^+\mathbf{b}$ is the value of $\mathbf{x}$ which minimizes $|A\mathbf{x} - \mathbf{b}|^2$ (in other words, the closest thing to a solution you can get).
- If $A\mathbf{x} = \mathbf{b}$ has multiple solutions, then $A^+\mathbf{b}$ is the solution with minimal norm.
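All three behaviors can be observed with NumPy's np.linalg.pinv, which computes $A^+$ via the SVD. The systems below are arbitrary illustrations:

```python
import numpy as np

# Overdetermined system: no exact solution, so A⁺b is the least-squares x
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.0])
x = np.linalg.pinv(A) @ b
print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))  # → True

# Underdetermined system x₁ + x₂ = 2: A⁺b picks the minimal-norm solution
A2 = np.array([[1.0, 1.0, 0.0]])
b2 = np.array([2.0])
x2 = np.linalg.pinv(A2) @ b2
print(np.allclose(x2, [1.0, 1.0, 0.0]))  # → True
```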
Show that has SVD . Find its Moore-Penrose pseudoinverse.
Solution. Let $U$, $\Sigma$, and $V$ denote the three matrices given in the problem statement.
We need to check that $U$ and $V$ are orthogonal, and that $A = U\Sigma V^\top$. We can verify that $U$ and $V$ are orthogonal by showing that their columns are orthogonal unit vectors. Equivalently, we may compute the products $U^\top U$ and $V^\top V$ and observe that they are identity matrices. Similarly, $A = U\Sigma V^\top$ can be verified by hand or on a computer.
The formula for the Moore-Penrose pseudoinverse is $A^+ = V\Sigma^+ U^\top$.
The matrix $\Sigma^+$ is obtained by inverting the nonzero elements on the diagonal of $\Sigma$ and then transposing the resulting matrix.
With a little calculation, we arrive at
SVD and linear dependence
Linear dependence is numerically fragile: if the columns of a matrix (with more rows than columns) are linearly dependent, then perturbing the entries slightly by adding tiny independent random numbers is almost certain to result in a matrix with linearly independent columns. However, intuition suggests that subverting the principles of linear algebra in this way should not make much practical difference: a matrix which is a tiny perturbation of one with linearly dependent columns ought to behave much like the original.
This intuition is accurate, and it highlights the utility of having a generalization of the idea of linear independence which can quantify how close a list of vectors is to having linear dependence relationships, rather than remaining within the confines of the binary labels "linearly dependent" or "linearly independent". The singular value decomposition provides such a tool.
Define a matrix with 100 rows and 5 columns, and do it in such a way that two of the five columns are nearly equal to some linear combination of the other three. Calculate the singular values of the matrix, and make a conjecture about how the number of approximate linear dependencies could have been detected from the list of singular values.
```python
import numpy as np
```
Solution. We see that two of the singular values are much smaller than the other three.
```python
import numpy as np
import matplotlib.pyplot as plt

A = np.random.standard_normal((100, 5))
A[:, 3] = A[:, 2] + A[:, 1] + 1e-2 * np.random.standard_normal(100)
A[:, 4] = A[:, 1] - A[:, 0] + 1e-2 * np.random.standard_normal(100)
# compute_uv=False returns only the singular values, which is what we plot
plt.bar(range(5), np.linalg.svd(A, compute_uv=False))
```
```julia
using Plots, LinearAlgebra

A = randn(100, 5)
A[:, 4] = A[:, 3] + A[:, 2] + 1e-2 * randn(100)
A[:, 5] = A[:, 2] - A[:, 1] + 1e-2 * randn(100)
bar(1:5, svdvals(A), label = "singular values")
```
We conjecture that the number of very small singular values indicates how many columns would need to be removed to obtain a matrix which does not have approximate linear dependence relationships among its columns.
In fact, the idea developed in this exercise is used by the NumPy function np.linalg.matrix_rank to calculate the rank of a matrix. Because of the roundoff errors associated with representing real numbers in memory on a computer, most matrices with float entries technically have full rank. Therefore, np.linalg.matrix_rank computes the singular value decomposition and returns the number of singular values which exceed a small cutoff value.
In fact, the idea developed in this exercise is used by the Julia function rank to calculate the rank of a matrix. Because of the roundoff errors associated with representing real numbers in memory on a computer, most matrices with float entries technically have full rank. Therefore, rank computes the singular value decomposition and returns the number of singular values which exceed a small cutoff value.
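A sketch of this behavior (its observable effect, not the library's internals): the third column below is an exact combination of the first two, so one singular value sits at roundoff scale, below the default tolerance of np.linalg.matrix_rank:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 3))
A[:, 2] = A[:, 0] + A[:, 1]   # exact dependency (up to float roundoff)

print(np.linalg.svd(A, compute_uv=False))  # last singular value is at roundoff scale
print(np.linalg.matrix_rank(A))            # → 2
```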
SVD and image compression
We close this section with a computational exercise illustrating another widely applicable feature of the singular value decomposition.
Show that if $\mathbf{u}_1, \ldots, \mathbf{u}_n$ are the columns of $U$, $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are the columns of $V$, and $\sigma_1, \ldots, \sigma_n$ are the diagonal entries of $\Sigma$, then $A = \sigma_1\mathbf{u}_1\mathbf{v}_1^\top + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n^\top$.
The equation $A = \sigma_1\mathbf{u}_1\mathbf{v}_1^\top + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n^\top$ is useful for compression, because terms with sufficiently small singular value factors can be dropped and the remaining vectors and singular values can be stored using less space. Suppose that is a matrix—how many entries does have, and how many entries do , , , , , have in total?
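For concreteness, here is the count for dimensions chosen purely for illustration (they are not the ones in the problem statement): a $1000 \times 1000$ matrix versus a rank-10 truncation of its SVD sum:

```python
# Hypothetical sizes for illustration only
m = n = 1000   # matrix dimensions (assumed)
k = 10         # number of retained terms (assumed)

full_storage = m * n                    # entries of the full matrix
truncated_storage = k * (m + n + 1)     # k u-vectors, k v-vectors, k σ's
print(full_storage, truncated_storage)  # → 1000000 20010
```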
The code below creates a matrix $A$ with pixel values for the image shown. How many nonzero singular values does $A$ have? Explain how you can tell just from looking at the picture.
```python
import numpy as np
import matplotlib.pyplot as plt

m = 80
n = 100
a = m // 8
b = m // 4

def pixel(i, j):
    if (a <= i <= b or m-b <= i <= m-a) and a <= j <= n-a:
        return 0
    elif (a <= j <= b or n-b <= j <= n-a) and a <= i <= m-a:
        return 0
    return 1

# i runs over the m rows and j over the n columns, matching the Julia
# version (the original comprehension had the two loops swapped)
A = np.array([[pixel(i, j) for j in range(1, n+1)] for i in range(1, m+1)])
U, Σ, V = np.linalg.svd(A)  # note: np.linalg.svd returns V transposed
plt.matshow(A)
```
```julia
using LinearAlgebra, Plots

m = 80
n = 100
a = m ÷ 8
b = m ÷ 4

function pixel(i, j)
    if (a ≤ i ≤ b || m-b ≤ i ≤ m-a) && a ≤ j ≤ n-a
        0
    elseif (a ≤ j ≤ b || n-b ≤ j ≤ n-a) && a ≤ i ≤ m-a
        0
    else
        1
    end
end

A = [pixel(i, j) for i = 1:m, j = 1:n]
U, Σ, V = svd(A)
heatmap(A)
```
- Now add some noise to the image:
```python
B = A + 0.05 * np.random.standard_normal((m, n))
```
B = A + 0.05 * randn(m, n)
Display this new matrix $B$, and also find the matrix obtained by keeping only the first three terms of $\sigma_1\mathbf{u}_1\mathbf{v}_1^\top + \sigma_2\mathbf{u}_2\mathbf{v}_2^\top + \cdots$ for this matrix $B$. Which looks more like the original image $A$: (i) $B$, or (ii) the three-term approximation of $B$?
Hint: you can achieve this computationally either by setting some singular values to 0 or by indexing the matrices $U$, $\Sigma$, and $V$ appropriately. Also, in Julia you will need the function diagm to generate a diagonal matrix from the vector of singular values returned by svd.
```python
import numpy as np
import matplotlib.pyplot as plt
```
- Let $B = \sigma_1\mathbf{u}_1\mathbf{v}_1^\top + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n^\top$. We need to show that $B = U\Sigma V^\top$. We will do this by first showing that $B\mathbf{x} = U\Sigma V^\top\mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^n$.

Now, fix $\mathbf{x} \in \mathbb{R}^n$. By definition of the matrix-vector product, $(U\Sigma)(V^\top\mathbf{x})$ is a linear combination of the columns of $U\Sigma$ with weights given by the entries of $V^\top\mathbf{x}$. Since $\Sigma$ is diagonal, it is not hard to see that the $k$th column of $U\Sigma$ is $\sigma_k\mathbf{u}_k$. Using the definition of the matrix-vector product again, we find that the $k$th weight is the dot product of the $k$th row of $V^\top$ and $\mathbf{x}$. But the $k$th row of $V^\top$ is $\mathbf{v}_k^\top$ by definition, and thus the $k$th weight is $\mathbf{v}_k \cdot \mathbf{x}$. Therefore, $U\Sigma V^\top\mathbf{x} = \sigma_1\mathbf{u}_1(\mathbf{v}_1 \cdot \mathbf{x}) + \cdots + \sigma_n\mathbf{u}_n(\mathbf{v}_n \cdot \mathbf{x}) = \left(\sigma_1\mathbf{u}_1\mathbf{v}_1^\top + \cdots + \sigma_n\mathbf{u}_n\mathbf{v}_n^\top\right)\mathbf{x} = B\mathbf{x},$

where $\mathbf{v}_k^\top$ is being treated as a $1 \times n$ matrix for all $k$, so that $\mathbf{u}_k(\mathbf{v}_k \cdot \mathbf{x}) = \mathbf{u}_k\mathbf{v}_k^\top\mathbf{x}$.

By linearity of matrix multiplication, $M\mathbf{x} = N\mathbf{x}$ for all $\mathbf{x} \in \mathbb{R}^n$ implies $MX = NX$ for any matrix $X$ with $n$ rows (apply the equality to each column of $X$). In particular, if $X$ is the identity matrix in $\mathbb{R}^{n \times n}$, we have $B = BI = U\Sigma V^\top I = U\Sigma V^\top = A$.
has entries and combined have entries.
It can be seen from the picture that $A$ has three kinds of columns: one whose components are all dark, another whose components are light in the middle, and a third whose components are dark on the outside and in the middle, with strips of light in between. These three columns are clearly linearly independent, and every column of $A$ is equal to one of them, so $A$ has rank 3. Therefore, $A$ has 3 nonzero singular values.
We can select only the first three terms by suitably indexing the vectors, as follows:
```python
U, Σ, V = np.linalg.svd(B)
# np.linalg.svd returns V already transposed, and @ (not *) performs
# matrix multiplication on NumPy arrays
plt.matshow(U[:, :3] @ np.diag(Σ[:3]) @ V[:3, :])
```
```julia
U, Σ, V = svd(B)
heatmap(U[:, 1:3] * diagm(Σ[1:3]) * V'[1:3, :])
```