Singular Value Decomposition



This post is a reading note of the first chapter of the book Data-Driven Science and Engineering.

Notation

matrix_notation.png

Basic Information

Definition. The SVD is a unique matrix decomposition that exists for every complex-valued matrix \(X \in \mathbb{C}^{n \times m} \):

$$ X = U \Sigma V^{*} $$

where \(U \in \mathbb{C}^{n \times n} \) and \(V \in \mathbb{C}^{m \times m} \) are unitary matrices with orthonormal columns and \( \Sigma \in \mathbb{R}^{n \times m}\) is a matrix with real, nonnegative entries on the diagonal and zeros off the diagonal.
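The definition can be checked numerically. Below is a minimal sketch using NumPy's `np.linalg.svd` on a small random complex matrix (the sizes are illustrative, not from the book): the factors reconstruct \(X\), and \(U\) and \(V\) are unitary.

```python
import numpy as np

# A small random complex matrix (illustrative sizes)
n, m = 5, 3
rng = np.random.default_rng(0)
X = rng.standard_normal((n, m)) + 1j * rng.standard_normal((n, m))

# Full SVD: U is n x n, Vh = V* is m x m, s holds the singular values
U, s, Vh = np.linalg.svd(X, full_matrices=True)

# Rebuild the n x m Sigma with the singular values on its diagonal
Sigma = np.zeros((n, m))
Sigma[:m, :m] = np.diag(s)

# X = U Sigma V* up to floating-point error
assert np.allclose(X, U @ Sigma @ Vh)
# U and V are unitary: U* U = I and V V* = I
assert np.allclose(U.conj().T @ U, np.eye(n))
assert np.allclose(Vh @ Vh.conj().T, np.eye(m))
```

Note that NumPy returns \(V^{*}\) (here `Vh`) directly rather than \(V\).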

The form given in the definition is the full SVD; there are two other common representations:

  1. Economy SVD
  2. Truncated SVD
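In NumPy, the difference between the full and economy SVD is a single flag. This sketch (sizes are my own, for illustration) shows that the economy SVD keeps only the first \(m\) columns of \(U\) for a tall matrix, yet still reconstructs \(X\) exactly:

```python
import numpy as np

n, m = 5, 3  # tall-skinny example, n >= m
X = np.random.default_rng(1).standard_normal((n, m))

# Full SVD: U is n x n
U_full, s, Vh = np.linalg.svd(X, full_matrices=True)

# Economy SVD: U-hat keeps only the first m columns of U
U_hat, s_hat, Vh_hat = np.linalg.svd(X, full_matrices=False)

assert U_full.shape == (n, n) and U_hat.shape == (n, m)
# Same singular values either way
assert np.allclose(s, s_hat)
# The economy SVD still reconstructs X exactly
assert np.allclose(X, U_hat @ np.diag(s_hat) @ Vh_hat)
```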

Their relationships are illustrated in the figure below.

SVD_diagram.png

We can show that

$$ \begin{eqnarray} XX^{*}U & = & U \begin{pmatrix} \hat{\Sigma}^{2} & 0 \\ 0 & 0 \end{pmatrix} \\ X^{*}XV & = & V \hat{\Sigma}^{2} \end{eqnarray} $$

Note that \( X X^{*} \) is the correlation matrix of the rows of \(X\) (each row represents a sequence of measurements of a specific attribute). It follows that \( U \) captures the correlation structure in the rows of \(X\).
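The two eigenvalue relations above can be verified directly. The sketch below (a random tall matrix, sizes of my choosing) checks that the columns of \(U\) are eigenvectors of \(XX^{*}\) and the columns of \(V\) are eigenvectors of \(X^{*}X\), with eigenvalues \(\hat{\Sigma}^{2}\):

```python
import numpy as np

n, m = 6, 4  # tall matrix, n >= m, as in the diagram
X = np.random.default_rng(2).standard_normal((n, m))
U, s, Vh = np.linalg.svd(X, full_matrices=True)
V = Vh.conj().T

# hat-Sigma^2 padded with zeros to n x n for the X X* relation
S2_pad = np.zeros((n, n))
S2_pad[:m, :m] = np.diag(s**2)

# X X* U = U [hat-Sigma^2 0; 0 0]
assert np.allclose(X @ X.conj().T @ U, U @ S2_pad)
# X* X V = V hat-Sigma^2
assert np.allclose(X.conj().T @ X @ V, V @ np.diag(s**2))
```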

correlation_matrix.png

SVD Application

There are two major use cases of the SVD mentioned in this chapter:

  1. Approximation
  2. Feature extraction/projection

Approximation

The truncated SVD can be used to approximate a given matrix \(X\) by a low-rank matrix. The original matrix \(X\) has \( n \times m \) entries. In the truncated SVD, we have

$$ X \approx \tilde{U} \tilde{\Sigma} \tilde{V}^* $$

where \( \tilde{U} \) has \( n \times r \) entries, \( \tilde{\Sigma} \) has \( r \times r \), and \( \tilde{V} \) has \( m \times r \), which is far fewer than \( n \times m \) when \( r \) is small.

According to the Eckart–Young–Mirsky theorem, this is also the optimal rank-\(r\) approximation to \(X\) in a least-squares sense.
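A small sketch of the truncation (on a random matrix of my own choosing): keep only the leading \(r\) singular triplets, and check that the Frobenius-norm error equals the energy in the discarded singular values, as the Eckart–Young theorem predicts for the optimal rank-\(r\) approximation.

```python
import numpy as np

n, m, r = 8, 6, 2
X = np.random.default_rng(3).standard_normal((n, m))
U, s, Vh = np.linalg.svd(X, full_matrices=False)

# Truncated SVD: keep only the leading r singular triplets
X_r = U[:, :r] @ np.diag(s[:r]) @ Vh[:r, :]
assert np.linalg.matrix_rank(X_r) == r

# The Frobenius error of the optimal rank-r approximation is
# the square root of the sum of the discarded singular values squared
err = np.linalg.norm(X - X_r, 'fro')
assert np.allclose(err, np.sqrt(np.sum(s[r:]**2)))
```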

Feature Extraction/Projection

As we mentioned earlier, the matrix \(U\) captures the correlation in the rows of \(X\). Moreover, each column of \(X\) represents a measurement of different attributes, and the column vectors of \(U\) have the same size as the column vectors of \(X\). Therefore, the column space of \(U\) can be used to characterize the column space of \(X\), and a subspace of \(col(U)\) captures a subset of the characteristics of \(X\).

Suppose we have a new measurement \(x\) and we want to extract its main features. To do that, we calculate the projection of \(x\) onto the subspace of \(col(U)\). This can be done using the following formula:

$$ x_{projected} = \tilde{U} \tilde{U}^* x $$
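The projection formula can be sketched as follows, assuming \(\tilde{U}\) holds the first \(r\) columns of \(U\) (the matrix and dimensions here are illustrative):

```python
import numpy as np

n, m, r = 6, 4, 2
rng = np.random.default_rng(4)
X = rng.standard_normal((n, m))
U, s, Vh = np.linalg.svd(X, full_matrices=False)
U_tilde = U[:, :r]  # first r columns span the dominant subspace of col(X)

# Project a new measurement x onto span(U_tilde)
x = rng.standard_normal(n)
x_proj = U_tilde @ U_tilde.conj().T @ x

# U_tilde U_tilde* is a projector, so projecting twice changes nothing
assert np.allclose(U_tilde @ U_tilde.conj().T @ x_proj, x_proj)
```

Since the columns of \(\tilde{U}\) are orthonormal, \(\tilde{U}\tilde{U}^{*}\) is an orthogonal projector onto their span, which is why applying it twice is the same as applying it once.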

