Stability and Numerical Aspects of Least Squares

Dr. Matthieu R Bloch

Monday, November 22, 2021

Logistics

  • General announcements

    • Assignment 6 to be posted… (grading traffic jam)

    • 4 lectures left! (No lecture on Wednesday November 24, 2021)

  • Midterm 2

    • 99% graded, grades released after Thanksiving weekend
    • Midterm solution during office hours on Tuesday November 23, 2021
  • Assignment 5

    • Grading finalized
  • Assignment 6 and 7

    • Posted this week, due date on Monday December 6 and Monday December 13

What’s on the agenda for today?

Image
Toddlers can do it!
  • Last time:

    • Stability of least squares
  • Today:

    • More on stability and numerical considerations
  • Reading: lecture notes 14/15/16

Stability of least squares

  • What if we observe \(\vecy = \matA\vecx_0+\vece\) and we apply the pseudo inverse? \(\hat{\vecx} = \matA^+\vecy\)

  • We can separate the error analysis into two components \[ \hat{\vecx}-\vecx_0 = \underbrace{\matA^+\matA\vecx_0-\vecx_0}_{\text{null space error}} + \underbrace{\matA^+\vece}_{\text{noise error}} \]

  • We will express the error in terms of the SVD \(\matA=\matU\boldsymbol{\Sigma}\matV^\intercal\) With

    • \(\set{\vecv_i}_{i=1}^r\) orthobasis of \(\text{row}(\matA)\), augmented by \(\set{\vecv_i}_{i=1}^{r+1}\in\ker{\matA}\) to form an orthobasis of \(\bbR^n\)
    • \(\set{\vecu_i}_{i=1}^r\) orthobasis of \(\text{col}(\matA)\), augmented by \(\set{\vecu}_{i=1}^{r+1}\in\ker{\matA^\intercal}\) to form an orthobasis of \(\bbR^m\)
  • The null space error is given by \[ \norm[2]{\matA^+\matA\vecx_0-\vecx_0}^2=\sum_{i=r+1}^n\abs{\dotp{\vecv_i}{x_0}}^2 \]

  • The noise error is given by \[ \norm[2]{\matA^+\vece}^2=\sum_{i=1}^r \frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2 \]

Stable reconstruction by truncation

  • How do we mitigate the effect of small singular values in reconstruction? \[ \hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy = \sum_{i=1}^r\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i \]

  • Truncate the SVD to \(r'<r\) \[ \matA_t\eqdef \sum_{i=1}^{r'}\sigma_i\vecu_i\vecv_i^\intercal\qquad\matA_t^+ = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\vecu_i\vecv_i^\intercal \]

  • Reconstruct \(\hat{\vecx_t} = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i=\matA_t\)

  • Error analysis: \[ \norm[2]{\hat{\vecx}_t-\vecx_0}^2 = \sum_{i=r+1}^n\abs{\dotp{\vecx_0}{\vecv_i}}^2+\sum_{i=r'+1}^r\abs{\dotp{\vecx_0}{\vecv_i}}^2+\sum_{i=1}^r'\frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2 \]

Stable reconstruction by regularization

  • Regularization means changing the problem to solve \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecy-\matA\vecx}^2+\lambda\norm[2]{\vecx}^2\qquad\ \lambda>0 \]

  • The solution is \[ \hat{\vecx} = (\matA^\intercal\matA+\lambda\matI)^{-1}\matA^\intercal\vecy = \matV(\boldsymbol{\Sigma}^2+\lambda\matI)^{-1}\boldsymbol{\Sigma}\matU^\intercal\vecy \]

Numerical methods

  • We have seen several solutions to systems of linear equations \(\matA\vecx=\vecy\) so far

    • \(\matA\) full column rank: \(\hat{\bfx} = (\matA^\intercal\matA)^{-1}\matA^\intercal\bfy\)
    • \(\matA\) full row rank: \(\hat{\bfx} = \matA^\intercal(\matA\matA^\intercal)^{-1}\bfy\)
    • Ridge regression: \(\hat{\bfx} = (\matA^\intercal\matA+\delta\matI)^{-1}\matA^\intercal\bfy\)
    • Kernel regression: \(\hat{\bfx} = (\matK+\delta\matI)^{-1}\bfy\)
    • Ridge regression in Hilbert space: \(\hat{\bfx} = (\matA^\intercal\matA+\delta\matG)^{-1}\matA^\intercal\bfy\)
  • Extension: constrained least-squares \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecy-\matA\vecx}^2\text{ s.t. } \vecx=\matB\vecalpha\text{ for some }\vecalpha \]

    • The solution is \(\hat{\bfx} = \matB(\matB^\intercal\matA^\intercal\matA\matB)^{-1}\matB^\intercal\matA^\intercal\bfy\)
  • All these problems involve a symmetric positive definite system of equations.

    • Many methods to achieve this based on matrix factorization

Easy systems

  • Diagonal system
    • \(\bfA\in\bbR^{n\times n}\) invertible and diagonal
    • \(O(n)\) complexity
  • Orthogonal system
    • \(\bfA\in\bbR^{n\times n}\) invertible and orthogonal
    • \(O(n^2)\) complexity
  • Lower triangular system
    • \(\bfA\in\bbR^{n\times n}\) invertible and lower diagonal
    • \(O(n^2)\) complexity
  • General strategy: factorize \(\matA\) to recover some of the structures above

Factorizations

  • LU factorization

  • Cholesky factorization

  • QR decomposition

  • SVD and eigenvalue decompositions

Computing eigenvalue decompositions for symetric matrices

  • Many techniques: we shall only discuss one based on power iterations