SVD

Understanding the Geometry of \(A\in \mathbb{R}^{m\times n}\)

\[ A\in \mathbb{R}^{m\times n} \\ \mathbf{x}\in \mathbb{R}^n \mapsto A\mathbf{x} \in \mathbb{R}^m \]

What we are formalizing is that the image of the unit sphere under \(A\) is an ellipse with a major axis and a minor axis: the map stretches along these axes and rotates the ellipse out of the coordinate plane.

Critical points of the ratio:

\[ R(\mathbf{v}) = \frac{||A\mathbf{v}||_2}{||\mathbf{v}||_2} \]

Since \(R(\alpha \mathbf{v}) = R(\mathbf{v})\), we can assume \(||\mathbf{v}||_2 = 1\).

\[ \max_{\mathbf{v}}||A\mathbf{v}||_2 \quad \text{subject to } ||\mathbf{v}||_2 = 1 \rightarrow A^TA\mathbf{v} = \lambda \mathbf{v}, \lambda \geq 0 \]

(The Lagrange multiplier condition for maximizing \(||A\mathbf{v}||_2^2\) subject to \(||\mathbf{v}||_2^2 = 1\) is exactly the eigenvalue equation \(A^TA\mathbf{v} = \lambda\mathbf{v}\), with \(\lambda = ||A\mathbf{v}||_2^2 \geq 0\).)

Since \(A^TA\) is symmetric, by the spectral theorem it has \(n\) real eigenvalues and \(n\) orthonormal eigenvectors \(\mathbf{v}_1, \dots, \mathbf{v}_n\) that span \(\mathbb{R}^n\).
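As a sanity check, here is a minimal numpy sketch (the matrix and its size are arbitrary choices) confirming that the maximizer of \(R(\mathbf{v})\) is the top eigenvector of \(A^TA\):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))

# eigh is numpy's symmetric eigensolver; eigenvalues come back in ascending order
lam, V = np.linalg.eigh(A.T @ A)
v_top = V[:, -1]                  # unit eigenvector for the largest eigenvalue

print(np.linalg.norm(A @ v_top))  # R(v_top), since ||v_top||_2 = 1
print(np.sqrt(lam[-1]))           # equals sqrt(lambda_max)
```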

Singular Value Decomposition

Define \(u_i^* = A\mathbf{v}_i\).

\[ \begin{aligned} &A^TA\mathbf{v}_i = \lambda_i \mathbf{v}_i \\ &\Rightarrow A^Tu_i^* = \lambda_i \mathbf{v}_i \\ &\Rightarrow A A^Tu_i^* = \lambda_i A\mathbf{v}_i = \lambda_i u_i^* \\ \end{aligned} \]

So, \(u_i^*\) is an eigenvector of \(AA^T\) when it is nonzero.

\[ ||u_i^*||_2 = ||A\mathbf{v}_i||_2 = \sqrt{\mathbf{v}_i^T A^TA \mathbf{v}_i} = \sqrt{\mathbf{v}_i^T \lambda_i \mathbf{v}_i} = \sqrt{\lambda_i} \]

Two cases:

  1. \(\lambda_i \neq 0 \Rightarrow u_i^* = A\mathbf{v}_i\) is an eigenvector of \(AA^T\) with \(||u_i^*||_2 = \sqrt{\lambda_i}\)
  2. \(\lambda_i = 0, u_i^* = 0\)

Take \(k:= \text{\# nonzero eigenvalues of } A^TA\).

\[ \exists \mathbf{v}_1, \dots, \mathbf{v}_k \in \mathbb{R}^n, u_1, \dots, u_k \in \mathbb{R}^m \text{ with: }\\ \begin{aligned} A^TA\mathbf{v}_i &= \lambda_i \mathbf{v}_i\\ AA^Tu_i &= \lambda_i u_i \quad (u_i := u_i^* / ||u_i^*||_2)\\ A\mathbf{v}_i &= \sqrt{\lambda_i} u_i\\ A^Tu_i &= \sqrt{\lambda_i} \mathbf{v}_i\\ \end{aligned}\\ \]

Observation

\[ u_i^TA\mathbf{v}_j = (A^Tu_i)^T \mathbf{v}_j = \sqrt{\lambda_i} \mathbf{v}_i^T \mathbf{v}_j = \sqrt{\lambda_i} \delta_{ij}\\ \]

\[ \overline{U} = \begin{bmatrix} | & | & & | \\ u_1 & u_2 & \dots & u_k \\ | & | & & | \\ \end{bmatrix} \in \mathbb{R}^{m\times k} \\ \overline{V} = \begin{bmatrix} | & | & & | \\ \mathbf{v}_1 & \mathbf{v}_2 & \dots & \mathbf{v}_k \\ | & | & & | \\ \end{bmatrix} \in \mathbb{R}^{n\times k} \\ \overline{U}^TA\overline{V} = \overline{\Sigma},\\ \overline{\Sigma} = \text{diag}(\sqrt{\lambda_1}, \dots, \sqrt{\lambda_k}) \in \mathbb{R}^{k\times k} \]

Complete \(\overline{U} \rightarrow U \in \mathbb{R}^{m\times m}\) and \(\overline{V} \rightarrow V \in \mathbb{R}^{n\times n}\) by appending orthonormal bases for the nullspaces of \(A^T\) and \(A\), respectively, as extra columns.

Define \(\Sigma \in \mathbb{R}^{m\times n}\) s.t.

\[ \Sigma_{ij} = \begin{cases} \sqrt{\lambda_i}=\sigma_i & i = j \leq k \\ 0 & \text{otherwise} \end{cases}\\ A = U\Sigma V^T \\ U^TU = I_{m\times m}, V^TV = I_{n\times n} \] \(\Sigma\) is a diagonal matrix with nonnegative entries.
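A short numpy sketch of this factorization (the sizes are arbitrary); note that np.linalg.svd returns \(V^T\) directly:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 6
A = rng.standard_normal((m, n))

U, s, Vt = np.linalg.svd(A, full_matrices=True)  # U: m x m, Vt: n x n
Sigma = np.zeros((m, n))                         # rectangular diagonal Sigma
Sigma[:min(m, n), :min(m, n)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))            # A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(m)))           # U^T U = I
print(np.allclose(Vt @ Vt.T, np.eye(n)))         # V^T V = I
```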

Geometry of SVD

\[ A = U\Sigma V^T \\ \]

  1. Rotate (\(V^T\))
  2. Scale (\(\Sigma\))
  3. Rotate (\(U\))
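A 2×2 sketch of this rotate-scale-rotate reading (the matrix is an arbitrary example): points on the unit circle pass through each stage and land on the ellipse whose axes are \(\sigma_i \mathbf{u}_i\).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 1.5]])
U, s, Vt = np.linalg.svd(A)

theta = np.linspace(0.0, 2.0 * np.pi, 8)
circle = np.vstack([np.cos(theta), np.sin(theta)])  # unit-circle samples, 2 x 8

step1 = Vt @ circle         # rotate/reflect: still the unit circle
step2 = np.diag(s) @ step1  # scale: axis-aligned ellipse
step3 = U @ step2           # rotate: final ellipse

print(np.allclose(step3, A @ circle))  # identical to applying A directly
```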

SVD Vocabulary

  • Left singular vectors: Columns of \(U\); span \(\text{col } A\)
  • Right singular vectors: Columns of \(V\); span \(\text{row } A\)
  • Singular values: Diagonal \(\sigma_i\) of \(\Sigma\); sort \(\sigma_1 \geq \sigma_2 \geq \dots \geq 0\)

Second Approach to SVD: Algebraic

\[ B:=\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \]

Proposition: Take \(\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2) \in \mathbb{R}^{m+n}\) to be an eigenvector of \(B\) with eigenvalue \(\lambda\), where \(\mathbf{x}_1 \in \mathbb{R}^n, \mathbf{x}_2 \in \mathbb{R}^m\). Then, \(\mathbf{x}':=(\mathbf{x}_1, -\mathbf{x}_2)\) is an eigenvector of \(B\) with eigenvalue \(-\lambda\). Indeed, \(B\mathbf{x}' = (A^T(-\mathbf{x}_2), A\mathbf{x}_1) = (-\lambda\mathbf{x}_1, \lambda\mathbf{x}_2) = -\lambda\mathbf{x}'\).
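A quick numerical check of the proposition (sizes are arbitrary): the eigenvalues of \(B\) are \(\pm\sigma_i\), padded with \(|m-n|\) zeros.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 3
A = rng.standard_normal((m, n))

B = np.block([[np.zeros((n, n)), A.T],
              [A, np.zeros((m, m))]])

eigs = np.sort(np.linalg.eigvalsh(B))          # B is symmetric
svals = np.linalg.svd(A, compute_uv=False)
expected = np.sort(np.concatenate([svals, -svals, np.zeros(m - n)]))
print(np.allclose(eigs, expected))             # True
```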

On the board: \(A = U\Sigma V^T\) for diagonal \(\Sigma\) ("economy" SVD)

Many SVDs

  • Full SVD: \(U \in \mathbb{R}^{m\times m}, \Sigma \in \mathbb{R}^{m\times n}, V \in \mathbb{R}^{n\times n}\)
  • Thin SVD: \(U \in \mathbb{R}^{m\times k}, \Sigma \in \mathbb{R}^{k\times k}, V \in \mathbb{R}^{n\times k}\), where \(k = \min\{m, n\}\)
  • Compact SVD: \(U \in \mathbb{R}^{m\times k}, \Sigma \in \mathbb{R}^{k\times k}, V \in \mathbb{R}^{n\times k}\), where \(k = \text{rank } (A)\)
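Shape check with numpy's full_matrices flag (the compact SVD would further truncate to \(k = \text{rank }A\) columns):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 4))                       # m = 6, n = 4

U_full, s, Vt_full = np.linalg.svd(A, full_matrices=True)
U_thin, s, Vt_thin = np.linalg.svd(A, full_matrices=False)

print(U_full.shape, Vt_full.shape)  # (6, 6) (4, 4): full SVD
print(U_thin.shape, Vt_thin.shape)  # (6, 4) (4, 4): thin SVD, k = min(m, n)
```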

Computing SVD: Two Strategies

  1. Build from eigenvectors of \(A^TA\) and \(AA^T\)
  2. Build from eigenvectors of \(\begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix}\)

\(\exists\) more specialized methods!

Solving Linear Systems with \(A=U\Sigma V^T\)

\[ A\mathbf{x} = \mathbf{b}\\ \Rightarrow U\Sigma V^T \mathbf{x} = \mathbf{b}\\ \Rightarrow \mathbf{x} = V\Sigma^{-1} U^T \mathbf{b} \]

(This inversion requires \(A\) to be square and invertible so that \(\Sigma^{-1}\) exists; the general case is handled next.)
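A minimal sketch of this solve for a square system (the matrix is a random example, invertible with probability 1): since \(U\) and \(V\) are orthogonal and \(\Sigma\) is diagonal, inverting \(A\) reduces to elementwise division by the \(\sigma_i\).

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))
b = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A)
x = Vt.T @ ((U.T @ b) / s)    # x = V Sigma^{-1} U^T b

print(np.allclose(A @ x, b))  # True
```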

Uniting Short/Tall Matrices

\[ \min_{\mathbf{x}} ||A\mathbf{x} - \mathbf{b}||_2\rightarrow A^TA\mathbf{x} = A^T\mathbf{b}\\ \]

This can be reformulated as the constrained problem:

\[ \textrm{minimize } ||\mathbf{x}||_2^2 \quad \text{subject to } A^T A \mathbf{x} = A^T \mathbf{b} \]

The reason is that the original problem has three cases:

  1. \(A\) invertible (completely determined): a unique solution
  2. \(A\) overdetermined: a least-squares solution
  3. \(A\) underdetermined: infinitely many solutions

In every case, the solution satisfies the constraint \(A^TA\mathbf{x} = A^T\mathbf{b}\).

\[ A=U\Sigma V^T \Rightarrow A^TA \mathbf{x} = V\Sigma^T U^T U\Sigma V^T \mathbf{x} = V\Sigma^T \Sigma V^T \mathbf{x},\\ A^T \mathbf{b} = V\Sigma^T U^T \mathbf{b}\\ \therefore A^TA\mathbf{x} = A^T\mathbf{b} \Rightarrow \Sigma^T \Sigma \mathbf{y} = \Sigma^T \mathbf{d}, \quad \mathbf{y}:= V^T \mathbf{x}, \mathbf{d}:= U^T \mathbf{b}\\ \]

This transforms the problem into:

\[ \min ||\mathbf{y}||_2^2 \quad \text{s.t. } \Sigma^T \Sigma \mathbf{y} = \Sigma^T \mathbf{d}\\ \rightarrow \sigma_i^2 y_i = \sigma_i d_i \quad \text{for } i = 1, \dots, k \Rightarrow y_i = \begin{cases} \frac{d_i}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases}\\ \Rightarrow \mathbf{y} = \Sigma^{+} \mathbf{d}, \quad \Sigma^{+} = \text{diag}\left(\frac{1}{\sigma_i} \text{ if } \sigma_i \neq 0,\ 0 \text{ otherwise}\right)\\ \Rightarrow \mathbf{x} =V \mathbf{y} = V\Sigma^{+} U^T \mathbf{b} \]
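A sketch of \(\mathbf{x} = V\Sigma^{+}U^T\mathbf{b}\) for an underdetermined system (an arbitrary random example); it matches numpy's minimum-norm least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 5))   # underdetermined: infinitely many solutions
b = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
s_plus = np.zeros_like(s)
s_plus[s > 1e-12] = 1.0 / s[s > 1e-12]   # invert only the nonzero sigma_i

x = Vt.T @ (s_plus * (U.T @ b))          # x = V Sigma^+ U^T b
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x, x_ref))             # True: same minimum-norm solution
```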

Pseudoinverse

\[ A^+ = V\Sigma^{+} U^T \]

Properties

  • \(A\) square and invertible \(\Rightarrow A^+ = A^{-1}\)
  • \(A\) overdetermined \(\Rightarrow A^+ \mathbf{b}\) gives least-squares solution to \(A\mathbf{x} \approx \mathbf{b}\)
  • \(A\) underdetermined \(\Rightarrow A^+ \mathbf{b}\) gives least-squares solution to \(A\mathbf{x} \approx \mathbf{b}\) with least (Euclidean) norm
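These three cases can be checked directly against numpy's built-in pinv (all matrices below are arbitrary random examples):

```python
import numpy as np

rng = np.random.default_rng(6)

A_sq = rng.standard_normal((4, 4))                     # square, invertible a.s.
print(np.allclose(np.linalg.pinv(A_sq), np.linalg.inv(A_sq)))

A_over = rng.standard_normal((6, 3))                   # overdetermined
b = rng.standard_normal(6)
x_ls, *_ = np.linalg.lstsq(A_over, b, rcond=None)
print(np.allclose(np.linalg.pinv(A_over) @ b, x_ls))   # least-squares solution

A_under = rng.standard_normal((3, 6))                  # underdetermined
c = rng.standard_normal(3)
x_mn = np.linalg.pinv(A_under) @ c
print(np.allclose(A_under @ x_mn, c))                  # feasible (and minimum norm)
```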

Alternative Form

\[ A=U\Sigma V^T \Rightarrow A = \sum_{i=1}^l \sigma_i \mathbf{u}_i \mathbf{v}_i^T\\ l:= \min\{m, n\}\\ A\mathbf{x} = \sum_{i=1}^l \sigma_i \mathbf{u}_i (\mathbf{v}_i^T \mathbf{x}) \approx \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T \mathbf{x} \]

Outer Product

\[ \mathbf{u} \otimes \mathbf{v} := \mathbf{u} \mathbf{v}^T \]

Computing \(A\mathbf{x}\):

\[ A\mathbf{x} = \sum_{i=1}^{l} \sigma_i (\mathbf{v}_i \cdot \mathbf{x}) \mathbf{u}_i \]
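A sketch of this evaluation via the outer-product expansion (a random example); truncating the sum to the terms with the largest \(\sigma_i\) gives the trick mentioned below.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 4))
x = rng.standard_normal(4)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
# sum of sigma_i (v_i . x) u_i; Vt[i] is the row v_i^T
Ax = sum(s[i] * (Vt[i] @ x) * U[:, i] for i in range(len(s)))

print(np.allclose(Ax, A @ x))   # True
```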

Trick: Ignore small \(\sigma_i\).

\[ A^+ = \sum_{\sigma_i \neq 0} \frac{\mathbf{v}_i \mathbf{u}_i^T}{\sigma_i} \]

Trick: Ignore large \(\sigma_i\) (their terms carry the smallest coefficients \(1/\sigma_i\)).

Eckart-Young Theorem

Suppose \(\tilde{A}\) is obtained from \(A=U\Sigma V^T\) by truncating all but the \(k\) largest singular values \(\sigma_i\) of \(A\) to zero. Then, \(\tilde{A}\) minimizes both \(||A-\tilde{A}||_{\text{Fro}}\) and \(||A-\tilde{A}||_2\) subject to the constraint that the column space of \(\tilde{A}\) has at most dimension \(k\).

Proof:

\[ A=U\Sigma V^T, \sigma_1 \geq \sigma_2 \geq \dots \geq \sigma_l \]

Define \(A_k = \sum_{i=1}^k \sigma_i \mathbf{u}_i \mathbf{v}_i^T, k \leq l\).

\[ \begin{aligned} ||A-A_k||_2 &= \left\|\sum_{i=k+1}^l \sigma_i \mathbf{u}_i \mathbf{v}_i^T\right\|_2 =: ||M||_2 = \sigma_{k+1} \end{aligned} \]

The last equality holds because the singular values of \(M\) are \(\sigma_{k+1} \geq \dots \geq \sigma_l\), and the largest is attained at \(\mathbf{x} = \mathbf{v}_{k+1}\):

\[ \mathbf{x} = \mathbf{v}_{k+1} \Rightarrow M\mathbf{x} = \sum_{i=k+1}^l \sigma_i \mathbf{u}_i (\mathbf{v}_i^T \mathbf{v}_{k+1}) = \sigma_{k+1} \mathbf{u}_{k+1} \Rightarrow ||M||_2 = \sigma_{k+1} \]

Now suppose \(B = XY^T\) (so \(B\) has rank at most \(k\)), where \(X \in \mathbb{R}^{m\times k}, Y \in \mathbb{R}^{n\times k}\). Define \(V_{k+1} = \begin{bmatrix} \mathbf{v}_1 & \dots & \mathbf{v}_{k+1} \end{bmatrix} \in \mathbb{R}^{n\times (k+1)}\). Since \(Y^T V_{k+1} \in \mathbb{R}^{k\times (k+1)}\) has more columns than rows, it has a nontrivial nullspace \(\Rightarrow \exists \mathbf{z} \neq \mathbf{0}\) s.t. \(Y^T V_{k+1} \mathbf{z} = \mathbf{0}\). Scale \(\mathbf{z}\) s.t. \(||\mathbf{z}||_2 = 1\).

\[ \begin{aligned} ||A-B||_2^2 &= \max_{||\mathbf{q}||_2 = 1} ||(A-B)\mathbf{q}||_2^2 \\ &\geq ||(A-B)V_{k+1} \mathbf{z}||_2^2 = ||(U\Sigma V^T - XY^T)V_{k+1} \mathbf{z}||_2^2 \\ &= ||U\Sigma V^T V_{k+1} \mathbf{z} - XY^T V_{k+1} \mathbf{z}||_2^2 \\ &= ||U\Sigma V^T V_{k+1} \mathbf{z}||_2^2\\ &= \sum_{i=1}^{k+1} z_i^2 \sigma_i^2 \geq \sigma_{k+1}^2 = ||A-A_k||_2^2 \end{aligned} \]

The first inequality holds because \(||V_{k+1}\mathbf{z}||_2 = ||\mathbf{z}||_2 = 1\) (the columns of \(V_{k+1}\) are orthonormal), so \(V_{k+1}\mathbf{z}\) is an admissible \(\mathbf{q}\); the final inequality uses \(\sigma_i \geq \sigma_{k+1}\) for \(i \leq k+1\) and \(\sum_i z_i^2 = 1\).
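A numerical illustration of the theorem (an arbitrary example): truncating to rank \(k\) achieves spectral error exactly \(\sigma_{k+1}\) and Frobenius error \(\sqrt{\sum_{i>k}\sigma_i^2}\).

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((8, 6))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]   # keep the k largest singular values

print(np.linalg.norm(A - A_k, 2), s[k])                             # both sigma_{k+1}
print(np.linalg.norm(A - A_k, 'fro'), np.sqrt(np.sum(s[k:] ** 2)))  # Frobenius error
```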

Matrix Norm Expressions

\[ ||A||_{\text{Fro}}^2 = \sum_i \sigma_i^2\\ ||A||_2 = \max_i{\sigma_i}\\ \text{cond}(A) = \frac{\sigma_{\max}}{\sigma_{\min}} \]
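These identities are easy to verify numerically (a random example); np.linalg.cond uses the 2-norm by default:

```python
import numpy as np

rng = np.random.default_rng(9)
A = rng.standard_normal((5, 4))
s = np.linalg.svd(A, compute_uv=False)

print(np.isclose(np.linalg.norm(A, 'fro') ** 2, np.sum(s ** 2)))  # Frobenius norm
print(np.isclose(np.linalg.norm(A, 2), s.max()))                  # spectral norm
print(np.isclose(np.linalg.cond(A), s.max() / s.min()))           # condition number
```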
