# Stability and Numerical Aspects of Least Squares

Wednesday, November 17, 2021

## Logistics

• General announcements

• Assignment 6 to be posted… (grading traffic jam)

• 6 lectures left!

• Assignment 5 and Midterm 2:

• Grading starting, we’ll keep you posted

## What’s on the agenda for today?

• Last time:

• Error analysis of least squares
• Today

• Stability of least squares
• Numerical considerations

## Stability of least squares

• What if we observe $\vecy = \matA\vecx_0+\vece$ and we apply the pseudoinverse? $\hat{\vecx} = \matA^+\vecy$

• We can separate the error analysis into two components $\hat{\vecx}-\vecx_0 = \underbrace{\matA^+\matA\vecx_0-\vecx_0}_{\text{null space error}} + \underbrace{\matA^+\vece}_{\text{noise error}}$

• We will express the error in terms of the SVD $\matA=\matU\boldsymbol{\Sigma}\matV^\intercal$, with

• $\set{\vecv_i}_{i=1}^r$ an orthobasis of $\text{row}(\matA)$, augmented by $\set{\vecv_i}_{i=r+1}^{n}$, an orthobasis of $\ker\matA$, to form an orthobasis of $\bbR^n$
• $\set{\vecu_i}_{i=1}^r$ an orthobasis of $\text{col}(\matA)$, augmented by $\set{\vecu_i}_{i=r+1}^{m}$, an orthobasis of $\ker{\matA^\intercal}$, to form an orthobasis of $\bbR^m$
• The null space error is given by $\norm[2]{\matA^+\matA\vecx_0-\vecx_0}^2=\sum_{i=r+1}^n\abs{\dotp{\vecv_i}{\vecx_0}}^2$

• The noise error is given by $\norm[2]{\matA^+\vece}^2=\sum_{i=1}^r \frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2$
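The two error terms above are orthogonal (the null space error lives in $\ker\matA$, the noise error in $\text{row}(\matA)$), so their squared norms add up to the total squared error. A minimal NumPy sketch of this decomposition, on an illustrative rank-deficient system:

```python
import numpy as np

rng = np.random.default_rng(0)

# A rank-deficient 6x4 system: rank r = 2, so ker(A) is nontrivial.
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))
x0 = rng.standard_normal(n)
e = 0.1 * rng.standard_normal(m)
y = A @ x0 + e

# Pseudoinverse reconstruction
xhat = np.linalg.pinv(A) @ y

# SVD: the first r columns of V span row(A), the last n - r span ker(A)
U, s, Vt = np.linalg.svd(A)
V = Vt.T

# Null space error: energy of x0 along ker(A)
null_err = sum(abs(V[:, i] @ x0) ** 2 for i in range(r, n))

# Noise error: noise components on col(A), amplified by 1/sigma_i
noise_err = sum((U[:, i] @ e) ** 2 / s[i] ** 2 for i in range(r))

total = np.linalg.norm(xhat - x0) ** 2
assert np.isclose(total, null_err + noise_err)
```

The assertion at the end checks exactly the orthogonal split $\norm[2]{\hat{\vecx}-\vecx_0}^2 = \norm[2]{\matA^+\matA\vecx_0-\vecx_0}^2 + \norm[2]{\matA^+\vece}^2$.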

## Stable reconstruction by truncation

• How do we mitigate the effect of small singular values in reconstruction? $\hat{\vecx} = \matA^+\vecy = \sum_{i=1}^r\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i$

• Truncate the SVD to $r'<r$ $\matA_t\eqdef \sum_{i=1}^{r'}\sigma_i\vecu_i\vecv_i^\intercal\qquad\matA_t^+ = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\vecv_i\vecu_i^\intercal$

• Reconstruct $\hat{\vecx}_t = \matA_t^+\vecy = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i$

• Error analysis: $\norm[2]{\hat{\vecx}_t-\vecx_0}^2 = \sum_{i=r+1}^n\abs{\dotp{\vecx_0}{\vecv_i}}^2+\sum_{i=r'+1}^r\abs{\dotp{\vecx_0}{\vecv_i}}^2+\sum_{i=1}^{r'}\frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2$
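Truncation trades a small additional bias (the middle sum, the discarded part of $\vecx_0$) for removing the worst noise amplification terms $1/\sigma_i^2$. A sketch on a synthetic ill-conditioned matrix (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

# Build an 8x5 matrix with one tiny singular value (sigma_5 = 1e-6).
m, n = 8, 5
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.array([10.0, 5.0, 1.0, 0.5, 1e-6])
A = (U[:, :n] * s) @ V.T

x0 = rng.standard_normal(n)
e = 1e-3 * rng.standard_normal(m)
y = A @ x0 + e

def truncated_pinv_solve(A, y, r_t):
    """Reconstruct x using only the r_t largest singular triplets."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    coeffs = (U[:, :r_t].T @ y) / s[:r_t]
    return Vt[:r_t].T @ coeffs

x_full = truncated_pinv_solve(A, y, 5)   # keeps the 1e-6 singular value
x_trunc = truncated_pinv_solve(A, y, 4)  # drops it

# Dropping sigma_5 avoids amplifying the noise by 1/1e-6.
print(np.linalg.norm(x_full - x0), np.linalg.norm(x_trunc - x0))
```

Here the full reconstruction amplifies the noise component along $\vecu_5$ by a factor of $10^6$, while truncation only incurs the bias $\abs{\dotp{\vecx_0}{\vecv_5}}$.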

## Stable reconstruction by regularization

• Regularization means changing the problem to solve $\min_{\vecx\in\bbR^n}\norm[2]{\vecy-\matA\vecx}^2+\lambda\norm[2]{\vecx}^2\qquad\ \lambda>0$

• The solution is $\hat{\vecx} = (\matA^\intercal\matA+\lambda\matI)^{-1}\matA^\intercal\vecy = \matV(\boldsymbol{\Sigma}^2+\lambda\matI)^{-1}\boldsymbol{\Sigma}\matU^\intercal\vecy$
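The two forms of the solution are equal; in the SVD form each coefficient is shrunk by $\sigma_i/(\sigma_i^2+\lambda)$ instead of amplified by $1/\sigma_i$. A quick numerical check of this identity (sizes and $\lambda$ chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n, lam = 7, 4, 0.5
A = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Direct solve of the regularized normal equations
x_direct = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ y)

# SVD form: coefficients are shrunk by sigma_i / (sigma_i^2 + lambda)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_svd = Vt.T @ (s / (s**2 + lam) * (U.T @ y))

assert np.allclose(x_direct, x_svd)
```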

## Numerical methods

• We have seen several solutions to systems of linear equations $\matA\vecx=\vecy$ so far

• $\matA$ full column rank: $\hat{\bfx} = (\matA^\intercal\matA)^{-1}\matA^\intercal\bfy$
• $\matA$ full row rank: $\hat{\bfx} = \matA^\intercal(\matA\matA^\intercal)^{-1}\bfy$
• Ridge regression: $\hat{\bfx} = (\matA^\intercal\matA+\delta\matI)^{-1}\matA^\intercal\bfy$
• Kernel regression: $\hat{\bfx} = (\matK+\delta\matI)^{-1}\bfy$
• Ridge regression in Hilbert space: $\hat{\bfx} = (\matA^\intercal\matA+\delta\matG)^{-1}\matA^\intercal\bfy$
• Extension: constrained least-squares $\min_{\vecx\in\bbR^n}\norm[2]{\vecy-\matA\vecx}^2\text{ s.t. } \vecx=\matB\vecalpha\text{ for some }\vecalpha$

• The solution is $\hat{\bfx} = \matB(\matB^\intercal\matA^\intercal\matA\matB)^{-1}\matB^\intercal\matA^\intercal\bfy$
• All these problems involve a symmetric positive definite system of equations.

• Many methods exist to solve such systems, most based on matrix factorization
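As a sanity check on the first two formulas in the list, both are just the pseudoinverse in the full-rank cases; a short NumPy sketch (random matrices are full rank with probability one):

```python
import numpy as np

rng = np.random.default_rng(3)

# Tall, full column rank: (A^T A)^{-1} A^T is the pseudoinverse
A_tall = rng.standard_normal((6, 3))
assert np.allclose(np.linalg.inv(A_tall.T @ A_tall) @ A_tall.T,
                   np.linalg.pinv(A_tall))

# Wide, full row rank: A^T (A A^T)^{-1} is the pseudoinverse
A_wide = rng.standard_normal((3, 6))
assert np.allclose(A_wide.T @ np.linalg.inv(A_wide @ A_wide.T),
                   np.linalg.pinv(A_wide))
```

In practice one would not form these inverses explicitly; the factorization methods below solve the same systems more stably.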

## Easy systems

• Diagonal system
• $\bfA\in\bbR^{n\times n}$ invertible and diagonal
• $O(n)$ complexity
• Orthogonal system
• $\bfA\in\bbR^{n\times n}$ invertible and orthogonal
• $O(n^2)$ complexity
• Lower triangular system
• $\bfA\in\bbR^{n\times n}$ invertible and lower triangular
• $O(n^2)$ complexity
• General strategy: factorize $\matA$ to recover some of the structures above
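The $O(n^2)$ triangular case is solved by forward substitution: each unknown is obtained from the previous ones with one inner product. A minimal sketch (function name is illustrative):

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L x = b for lower triangular invertible L in O(n^2)."""
    n = L.shape[0]
    x = np.zeros(n)
    for i in range(n):
        # Row i reads: L[i, :i] @ x[:i] + L[i, i] * x[i] = b[i]
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.array([[2.0, 0.0, 0.0],
              [1.0, 3.0, 0.0],
              [4.0, 5.0, 6.0]])
b = np.array([2.0, 7.0, 32.0])
print(forward_substitution(L, b))  # → [1. 2. 3.]
```

Upper triangular systems are handled the same way, looping from the last row up (back substitution).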

## Factorizations

• LU factorization
• Cholesky factorization
• QR decomposition
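As one example of the strategy, QR reduces a full-column-rank least squares problem to a triangular system: from $\matA=\matQ\matR$, the normal equations become $\matR\vecx=\matQ^\intercal\vecy$, avoiding the squared condition number of forming $\matA^\intercal\matA$. A sketch (random data, sizes illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 8, 3
A = rng.standard_normal((m, n))  # full column rank with probability 1
y = rng.standard_normal(m)

# A = QR with orthonormal-column Q and upper triangular R,
# so the least squares solution solves R x = Q^T y.
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ y)  # a triangular solve in practice

# Compare against the library least squares solver
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
assert np.allclose(x_qr, x_ref)
```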