Singular value decomposition

Dr. Matthieu R Bloch

Wednesday, November 3, 2021

Logistics

  • General announcements

    • Let me know if you have concerns about your grades (disagreements, etc.)
    • Be careful with honor code
  • Office hours on Friday November 05, 2021

    • 8am-9:30am on BlueJeans
    • Focus on Midterm 2 preparation
  • Midterm 2:

    • Moved to Monday November 8, 2021 (gives you the weekend to prepare)
    • Coverage: everything since Midterm 1 (don’t forget the fundamentals, though), with emphasis on regression

What’s on the agenda for today?

[Figure: “Toddlers can do it!”]
  • Last time:

    • Symmetric matrices and spectral theorem
    • Objective: further understand least-square problems
  • Today: singular value decomposition

  • Reading: lecture notes 12/13

Spectral theorem

  • Every complex matrix \(\matA\) has at least one complex eigenvector, and every real symmetric matrix has real eigenvalues and at least one real eigenvector.

  • Every matrix \(\matA\in\bbC^{n\times n}\) is unitarily similar to an upper triangular matrix, i.e., \[ \matA = \matV\boldsymbol{\Delta}\matV^\dagger \] with \(\boldsymbol{\Delta}\) upper triangular and \(\matV^\dagger=\matV^{-1}\).
  • Every Hermitian matrix is unitarily similar to a real-valued diagonal matrix.
  • Note that if \(\matA = \matV\matD\matV^\dagger\) then \[ \matA = \sum_{i=1}^n\lambda_i \vecv_i\vecv_i^\dagger \] (a numerical check appears below)

  • How about real-valued matrices \(\matA\in\bbR^{n\times n}\)?
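
A minimal numerical check of the outer-product expansion above, for a real symmetric matrix (a sketch, assuming NumPy; `numpy.linalg.eigh` is the standard routine for the symmetric/Hermitian case):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random real symmetric matrix A = (B + B^T)/2
B = rng.standard_normal((5, 5))
A = (B + B.T) / 2

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors
eigvals, V = np.linalg.eigh(A)

# Reconstruct A as a sum of rank-one terms lambda_i v_i v_i^T
A_rec = sum(lam * np.outer(v, v) for lam, v in zip(eigvals, V.T))

print(np.allclose(A, A_rec))            # True: A = sum_i lambda_i v_i v_i^T
print(np.allclose(V.T @ V, np.eye(5)))  # True: V has orthonormal columns
```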

Symmetric positive definite matrices

  • A symmetric matrix \(\matA\) is positive definite if it has positive eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n},\ \lambda_i>0\).

    A symmetric matrix \(\matA\) is positive semidefinite if it has nonnegative eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n},\ \lambda_i\geq 0\).

  • Convention: \(\lambda_1\geq \lambda_2\geq \cdots \geq \lambda_n\)

  • Variational form of extreme eigenvalues for symmetric positive definite matrices \(\matA\) \[ \begin{align} \lambda_1 &= \max_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \max_{\vecx\in\bbR^n}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2}\\ \lambda_n &= \min_{\vecx\in\bbR^n:\norm[2]{\vecx}=1}\vecx^\intercal \matA\vecx = \min_{\vecx\in\bbR^n}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2} \end{align} \]
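
The variational form can be checked empirically by sampling the Rayleigh quotient (a minimal sketch, assuming NumPy; the random search only approximately attains the extremes):

```python
import numpy as np

rng = np.random.default_rng(1)

# Random symmetric positive definite matrix
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)

eigvals = np.linalg.eigvalsh(A)          # ascending order
lam_n, lam_1 = eigvals[0], eigvals[-1]   # convention: lambda_1 >= ... >= lambda_n

# Rayleigh quotient x^T A x / ||x||_2^2 over many random directions
X = rng.standard_normal((4, 100_000))
rq = np.einsum('ij,ij->j', X, A @ X) / np.einsum('ij,ij->j', X, X)

# Every quotient lies in [lambda_n, lambda_1]; the extremes are nearly attained
print(lam_n <= rq.min() and rq.max() <= lam_1)   # True
print(lam_1 - rq.max(), rq.min() - lam_n)        # small nonnegative gaps
```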

  • For any analytic function \(f\), we have \[ f(\matA) = \sum_{i=1}^n f(\lambda_i)\vecv_i\vecv_i^\intercal \]
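
As an instance of the functional calculus above, take \(f(t)=\sqrt{t}\) on a positive definite matrix (a sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
A = B.T @ B + np.eye(4)                # symmetric positive definite

eigvals, V = np.linalg.eigh(A)

# f(A) = sum_i f(lambda_i) v_i v_i^T, computed here as V f(D) V^T
sqrtA = (V * np.sqrt(eigvals)) @ V.T   # scales column i of V by sqrt(lambda_i)

print(np.allclose(sqrtA @ sqrtA, A))   # True: sqrt(A)^2 = A
```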

Systems of symmetric positive definite equations

  • Consider the system \(\vecy=\matA\vecx\) with \(\matA\) symmetric positive definite

  • Let \(\set{\vecv_i}\) be the orthonormal eigenvectors of \(\matA\); then \[ \vecx = \sum_{i=1}^n\frac{1}{\lambda_i}\dotp{\vecy}{\vecv_i}\vecv_i \]

  • Assume some observation error \(\vecy=\matA\vecx+\vece\), with \(\vece\) unknown, and suppose we reconstruct \(\vecx\) as \(\widetilde{\vecx}=\matA^{-1}\vecy\)

  • Since \(\vecx-\widetilde{\vecx}=-\matA^{-1}\vece\), \[ \frac{1}{\lambda_1^2}\norm[2]{\vece}^2\leq \norm[2]{\vecx-\widetilde{\vecx}}^2\leq \frac{1}{\lambda_n^2}\norm[2]{\vece}^2. \]
  • If \(\vece\sim\calN(\boldsymbol{0},\sigma^2\matI)\), then \[ \E{\norm[2]{\vecx-\widetilde{\vecx}}^2}=\sigma^2\sum_{i=1}^n\frac{1}{\lambda_i^2} \]
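
A small numerical illustration of the eigen-expansion solution and the error amplification bound (a sketch, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
B = rng.standard_normal((n, n))
A = B.T @ B + np.eye(n)                # symmetric positive definite

eigvals, V = np.linalg.eigh(A)         # ascending: eigvals[0] = lambda_n
x = rng.standard_normal(n)

# Noisy observation and reconstruction x~ = A^{-1} y via the eigen-expansion
e = 0.01 * rng.standard_normal(n)
y = A @ x + e
x_tilde = sum((y @ v) / lam * v for lam, v in zip(eigvals, V.T))

err2 = np.sum((x - x_tilde) ** 2)      # ||x - x~||_2^2
e2 = np.sum(e ** 2)                    # ||e||_2^2
lam_n, lam_1 = eigvals[0], eigvals[-1]

# (1/lambda_1^2)||e||^2 <= ||x - x~||^2 <= (1/lambda_n^2)||e||^2
print(e2 / lam_1**2 <= err2 <= e2 / lam_n**2)   # True
```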

Singular value decomposition

  • What happens for non-square matrices?

  • Let \(\matA\in\bbR^{m\times n}\) with \(\text{rank}(\matA)=r\). Then \(\matA=\matU\boldsymbol{\Sigma}\matV^T\) where

    • \(\matU\in\bbR^{m\times r}\) such that \(\matU^\intercal\matU=\bfI_r\) (orthonormal columns)
    • \(\matV\in\bbR^{n\times r}\) such that \(\matV^\intercal\matV=\bfI_r\) (orthonormal columns)
    • \(\boldsymbol{\Sigma}\in\bbR^{r\times r}\) is diagonal with positive entries \[ \boldsymbol{\Sigma}\eqdef\mat{cccc}{\sigma_1&0&0&\cdots\\0&\sigma_2&0&\cdots\\\vdots&&\ddots&\\0&\cdots&\cdots&\sigma_r} \] and \(\sigma_1\geq\sigma_2\geq\cdots\geq\sigma_r>0\). The \(\sigma_i\) are called the singular values
  • We say that \(\matA\) is full rank if \(r=\min(m,n)\)

  • We can write \(\matA=\sum_{i=1}^r\sigma_i\vecu_i\vecv_i^\intercal\) (a numerical check appears below)
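
A sketch of the compact SVD and the rank-one expansion (assuming NumPy; truncating `np.linalg.svd` to the \(r\) nonzero singular values yields the compact form):

```python
import numpy as np

rng = np.random.default_rng(4)

# A rank-2 matrix in R^{6x4}
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))
r = np.linalg.matrix_rank(A)                     # r = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, Vt = U[:, :r], s[:r], Vt[:r, :]            # keep nonzero singular values

print(np.allclose(A, U @ np.diag(s) @ Vt))       # compact SVD reconstructs A
print(np.allclose(A, sum(si * np.outer(u, v)     # A = sum_i sigma_i u_i v_i^T
                         for si, u, v in zip(s, U.T, Vt))))
```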

  • Important properties of the SVD

    • The columns \(\set{\vecv_i}_{i=1}^r\) of \(\matV\) are eigenvectors of the psd matrix \(\matA^\intercal\matA\), and the singular values \(\set{\sigma_i}_{i=1}^r\) are the square roots of the nonzero eigenvalues of \(\matA^\intercal\matA\).

    • The columns \(\set{\vecu_i}_{i=1}^r\) of \(\matU\) are eigenvectors of the psd matrix \(\matA\matA^\intercal\), and the singular values \(\set{\sigma_i}_{i=1}^r\) are the square roots of the nonzero eigenvalues of \(\matA\matA^\intercal\).

    • The columns of \(\matV\) form an orthobasis for \(\text{row}(\matA)\)

    • The columns of \(\matU\) form an orthobasis for \(\text{col}(\matA)\)

    • Equivalent form of the SVD: \(\matA=\widetilde{\matU}\widetilde{\boldsymbol{\Sigma}}\widetilde{\matV}^T\) where

      • \(\widetilde{\matU}\in\bbR^{m\times m}\) is orthogonal
      • \(\widetilde{\matV}\in\bbR^{n\times n}\) is orthogonal
      • \(\widetilde{\boldsymbol{\Sigma}}\in\bbR^{m\times n}\) is \[ \widetilde{\boldsymbol{\Sigma}}\eqdef\mat{cc}{\boldsymbol{\Sigma}&\boldsymbol{0}\\\boldsymbol{0}&\boldsymbol{0}} \]
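
These properties are straightforward to verify numerically (a sketch, assuming NumPy; `full_matrices=True` produces the square \(\widetilde{\matU}\), \(\widetilde{\matV}\) of the equivalent form):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 4))                  # generically full rank (r = 4)

# Full SVD: U~ is 6x6, V~ is 4x4, Sigma~ is 6x4 with a zero block
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros((6, 4))
S[:4, :4] = np.diag(s)
print(np.allclose(A, U @ S @ Vt))                # True

# The sigma_i^2 are the nonzero eigenvalues of A^T A (and of A A^T)
eig_AtA = np.linalg.eigvalsh(A.T @ A)[::-1]      # descending
print(np.allclose(s**2, eig_AtA))                # True
```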

SVD and least-squares

  • When we cannot solve \(\vecy=\matA\vecx\), we solve instead \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy \]

    • This allows us to pick the minimum-norm solution among the potentially infinitely many solutions of the normal equations.
  • Recall: when \(\matA\in\bbR^{m\times n}\) is of rank \(m\) (full row rank), then \(\vecx=\matA^\intercal(\matA\matA^\intercal)^{-1}\vecy\)

  • The solution of \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecx}^2\text{ such that } \matA^\intercal\matA\vecx = \matA^\intercal\vecy \] is \[ \hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy \] where \(\matA=\matU\boldsymbol{\Sigma}\matV^\intercal\) is the SVD of \(\matA\).
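
A sketch comparing the SVD formula with `np.linalg.lstsq`, which also returns the minimum-norm least-squares solution (assumed small random instance):

```python
import numpy as np

rng = np.random.default_rng(6)

# Wide system: more unknowns than equations, so infinitely many solutions
A = rng.standard_normal((3, 5))
y = rng.standard_normal(3)

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # compact SVD (rank 3 here)
x_hat = Vt.T @ ((U.T @ y) / s)                     # x^ = V Sigma^{-1} U^T y

x_lstsq = np.linalg.lstsq(A, y, rcond=None)[0]
print(np.allclose(x_hat, x_lstsq))                 # True: same min-norm solution
```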

Pseudo-inverse

  • \(\matA^+ = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\) is called the pseudo-inverse, Lanczos inverse, or Moore-Penrose inverse of \(\matA=\matU\boldsymbol{\Sigma}\matV^\intercal\).

  • If \(\matA\) is square and invertible, then \(\matA^+=\matA^{-1}\)

  • If \(m\geq n\) (tall and skinny matrix) of rank \(n\), then \(\matA^+ = (\matA^\intercal\matA)^{-1}\matA^\intercal\)

  • If \(m\leq n\) (short and fat matrix) of rank \(m\), then \(\matA^+ = \matA^\intercal(\matA\matA^\intercal)^{-1}\)

  • Note that \(\matA^+\) is as “close” to an inverse of \(\matA\) as possible
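
A closing sketch checking the closed-form expressions for \(\matA^+\) against `np.linalg.pinv` (assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(7)

# Tall, full column rank: A^+ = (A^T A)^{-1} A^T
A = rng.standard_normal((6, 3))
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_pinv = Vt.T @ np.diag(1 / s) @ U.T               # V Sigma^{-1} U^T
print(np.allclose(A_pinv, np.linalg.pinv(A)))
print(np.allclose(A_pinv, np.linalg.inv(A.T @ A) @ A.T))

# Short and fat, full row rank: A^+ = A^T (A A^T)^{-1}
B = rng.standard_normal((3, 6))
print(np.allclose(np.linalg.pinv(B), B.T @ np.linalg.inv(B @ B.T)))
```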