Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, November 11, 2024

Last time

  • Last class: Wednesday November 06, 2024
    • We talked about using the SVD to solve \(\vecy=\matA\vecx\)
    • We analyzed the reconstruction error
  • Today: We will talk about mitigating the reconstruction error
  • To be effectively prepared for today's class, you should have:
    1. Gone over slides and read associated lecture notes here and there
    2. Submitted Homework 6 (due Thursday November 07, 2024)
  • Logistics:
    • Jack Hill office hours: Wednesday 11:30am-12:30pm in TSRB and hybrid
    • Anuvab Sen office hours: Thursday 12pm-1pm in TSRB and hybrid
    • Dr. Bloch office hours: Friday November 08, 2024 6pm-7pm online
  • Homework 7: due Monday November 18, 2024

What's next for this semester

  • Lecture 21 - Monday November 4, 2024: SVD and least squares
  • Lecture 22 - Wednesday November 6, 2024: Gradient descent
    • Homework 6 due on Thursday November 7, 2024
  • Lecture 23 - Monday November 11, 2024: Estimation
  • Lecture 24 - Wednesday November 13, 2024: Estimation
  • Lecture 25 - Monday November 18, 2024: Classification and Regression
    • Homework 7 due on Friday November 15, 2024
  • Lecture 26 - Wednesday November 20, 2024: Classification and Regression
  • Lecture 27 - Monday November 25, 2024: Principal Component Analysis
    • Homework 8 due
  • Lecture 28 - Monday December 2, 2024: Principal Component Analysis

Stability of least squares

  • What if we observe \(\vecy = \matA\vecx_0+\vece\) and apply the pseudo-inverse \(\hat{\vecx} = \matA^+\vecy\)?
  • Substituting \(\vecy\), we can separate the error analysis into two components \[ \hat{\vecx}-\vecx_0 = \matA^+(\matA\vecx_0+\vece)-\vecx_0 = \underbrace{\matA^+\matA\vecx_0-\vecx_0}_{\text{null space error}} + \underbrace{\matA^+\vece}_{\text{noise error}} \]

  • We will express the error in terms of the SVD \(\matA=\matU\boldsymbol{\Sigma}\matV^\intercal\), with:

    • \(\set{\vecv_i}_{i=1}^r\) orthobasis of \(\text{row}(\matA)\), augmented by \(\set{\vecv_i}_{i=r+1}^{n}\subset\ker{\matA}\) to form an orthobasis of \(\bbR^n\)
    • \(\set{\vecu_i}_{i=1}^r\) orthobasis of \(\text{col}(\matA)\), augmented by \(\set{\vecu_i}_{i=r+1}^{m}\subset\ker{\matA^\intercal}\) to form an orthobasis of \(\bbR^m\)
  • The null space error is given by \[ \norm[2]{\matA^+\matA\vecx_0-\vecx_0}^2=\sum_{i=r+1}^n\abs{\dotp{\vecv_i}{\vecx_0}}^2 \]

  • The noise error is given by \[ \norm[2]{\matA^+\vece}^2=\sum_{i=1}^r \frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2 \]
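
  A minimal numpy sketch (all names and values below are my own, not from the lecture) that checks this decomposition numerically: the null space error lives in \(\text{span}\set{\vecv_i}_{i=r+1}^n\) while the noise error lives in \(\text{span}\set{\vecv_i}_{i=1}^r\), so the two squared norms add up to the total squared error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Build A = U Sigma V^T with a deliberately tiny sigma_r.
m, n, r = 8, 6, 4
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sigma = np.array([10.0, 1.0, 0.1, 1e-3])
A = U[:, :r] @ np.diag(sigma) @ V[:, :r].T

x0 = rng.standard_normal(n)
e = 1e-4 * rng.standard_normal(m)
y = A @ x0 + e

x_hat = np.linalg.pinv(A) @ y  # least-squares solution A^+ y

# Null space error: energy of x0 along v_{r+1}, ..., v_n.
null_err = sum(np.dot(V[:, i], x0) ** 2 for i in range(r, n))
# Noise error: (1/sigma_i^2) |<e, u_i>|^2, dominated by the tiny sigma_r.
noise_err = sum((np.dot(e, U[:, i]) / sigma[i]) ** 2 for i in range(r))

print(np.linalg.norm(x_hat - x0) ** 2)  # matches null_err + noise_err
print(null_err + noise_err)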

Stable reconstruction by truncation

  • How do we mitigate the effect of small singular values in reconstruction? \[ \hat{\vecx} = \matV\boldsymbol{\Sigma}^{-1}\matU^\intercal\vecy = \sum_{i=1}^r\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i \]

  • Truncate the SVD to \(r'<r\) \[ \matA_t\eqdef \sum_{i=1}^{r'}\sigma_i\vecu_i\vecv_i^\intercal\qquad\matA_t^+ = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\vecu_i\vecv_i^\intercal \]

  • Reconstruct \(\hat{\vecx}_t = \matA_t^+\vecy = \sum_{i=1}^{r'}\frac{1}{\sigma_i}\dotp{\vecy}{\vecu_i}\vecv_i\)

  • Error analysis: \[ \norm[2]{\hat{\vecx}_t-\vecx_0}^2 = \underbrace{\sum_{i=r+1}^n\abs{\dotp{\vecx_0}{\vecv_i}}^2}_{\text{null space error}}+\underbrace{\sum_{i=r'+1}^r\abs{\dotp{\vecx_0}{\vecv_i}}^2}_{\text{truncation error}}+\underbrace{\sum_{i=1}^{r'}\frac{1}{\sigma_i^2}\abs{\dotp{\vece}{\vecu_i}}^2}_{\text{noise error}} \]
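
  The truncated pseudo-inverse is a few lines of numpy; a sketch (the function name is mine) that can be run against the \(\matA,\vecy\) from the previous snippet. Dropping the tiny \(\sigma_4=10^{-3}\) trades a small truncation bias for a much smaller noise term.

```python
import numpy as np

def truncated_pinv_solve(A, y, r_trunc):
    """Apply the truncated pseudo-inverse A_t^+ built from the
    r_trunc largest singular values of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted descending
    coeffs = (U[:, :r_trunc].T @ y) / s[:r_trunc]     # <y, u_i> / sigma_i
    return Vt[:r_trunc].T @ coeffs                    # recombine along the v_i

# With A, y from the previous sketch, r_trunc = 3 drops sigma_4 = 1e-3:
# x_hat_t = truncated_pinv_solve(A, y, r_trunc=3)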

Stable reconstruction by regularization

  • Regularization means changing the problem to solve \[ \min_{\vecx\in\bbR^n}\norm[2]{\vecy-\matA\vecx}^2+\lambda\norm[2]{\vecx}^2\qquad\ \lambda>0 \]

  • The solution is \[ \hat{\vecx} = (\matA^\intercal\matA+\lambda\matI)^{-1}\matA^\intercal\vecy = \matV(\boldsymbol{\Sigma}^2+\lambda\matI)^{-1}\boldsymbol{\Sigma}\matU^\intercal\vecy \]
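
  Since \((\boldsymbol{\Sigma}^2+\lambda\matI)^{-1}\boldsymbol{\Sigma}\) is diagonal with entries \(\frac{\sigma_i}{\sigma_i^2+\lambda}\), regularization replaces \(\frac{1}{\sigma_i}\) by a factor that stays bounded as \(\sigma_i\to 0\), damping small singular values instead of truncating them outright. A sketch in the same numpy style (function name and \(\lambda\) value are mine):

```python
import numpy as np

def ridge_solve(A, y, lam):
    """Solve min_x ||y - A x||^2 + lam ||x||^2 via the SVD form
    x_hat = V (Sigma^2 + lam I)^{-1} Sigma U^T y."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    shrink = s / (s ** 2 + lam)  # ~1/sigma_i when sigma_i is large, ~sigma_i/lam when small
    return Vt.T @ (shrink * (U.T @ y))

# Example: lam = 1e-4 damps the sigma_4 = 1e-3 direction without a hard cutoff.
# x_hat = ridge_solve(A, y, lam=1e-4)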

Probability and statistics

  • Probabilities play a huge role in machine learning
    • You have to be comfortable with standard concepts
    • I won't have time to review everything
  • Resources
    • Review of Probability by Dr. Romberg
    • "Probabilities" notes on Canvas with supporting videos here, here and there
    • Use these to assess your comfort with and operational knowledge of probabilities
    • Homework 7
    • Office hours

Gaussian estimation

Next time

  • Next class: Wednesday November 13, 2024
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes here and there
    2. Work on Homework 7
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)