Regression

Dr. Matthieu R Bloch

Wednesday October 06, 2021

Logistics

  • Assignment 4 assigned Tuesday, October 5, 2021

    • Includes a (small) programming component

    • Due October 14, 2021 (soft deadline, hard deadline on October 16)

What’s on the agenda for today?

  • Last time: Least-squares regression

  • Today

    • Solving linear least-squares regression

    • Extension to infinite dimensions

  • Reading: Romberg, lecture notes 8

Solving the least-squares problem

  • Any solution \(\bftheta^*\) to the problem \(\min_{\bftheta\in\bbR^d} \norm[2]{\bfy-\bfX\bftheta}^2\) must satisfy \[ \bfX^\intercal\bfX\bftheta^* = \bfX^\intercal\bfy \] (set the gradient \(2\bfX^\intercal(\bfX\bftheta-\bfy)\) to zero). This system is called the normal equations
  • Facts: for any matrix \(\bfA\in\bbR^{m\times n}\)

    • \(\ker{\bfA^\intercal\bfA}=\ker{\bfA}\)

    • \(\text{col}(\bfA^\intercal\bfA)=\text{row}(\bfA)\)

    • \(\text{row}(\bfA)\) and \(\ker{\bfA}\) are orthogonal complements

  • We can say a lot more about the normal equations

    1. There is always a solution
    2. If \(\textsf{rank}(\bfX)=d\), there is a unique solution: \(\bftheta^*=(\bfX^\intercal\bfX)^{-1}\bfX^\intercal\bfy\)
    3. If \(\textsf{rank}(\bfX)<d\), there are infinitely many solutions, differing by elements of \(\ker{\bfX}\)
    4. If \(\textsf{rank}(\bfX)=n\), there exists a solution \(\bftheta^*\) for which \(\bfy=\bfX\bftheta^*\)
  • In machine learning, there are often infinitely many solutions (e.g., whenever \(d>n\)); see the numerical sketch below
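
As a quick numerical check of these facts (a minimal NumPy sketch, not part of the notes; the array names are illustrative), the snippet below solves the normal equations directly in the full-column-rank case and confirms the result against np.linalg.lstsq, which solves the same least-squares problem via the SVD.

    import numpy as np

    rng = np.random.default_rng(0)

    # Overdetermined case: n = 20 samples, d = 3 features, so rank(X) = d
    # almost surely and the normal equations have a unique solution.
    n, d = 20, 3
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Solve X^T X theta = X^T y directly.
    theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

    # np.linalg.lstsq minimizes ||y - X theta||_2^2 (via the SVD).
    theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

    print(np.allclose(theta_normal, theta_lstsq))  # True: unique solution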

Minimum 2-norm solutions

  • One reasonable way to choose a solution among infinitely many is the minimum-energy principle \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bftheta}^2\text{ such that } \bfX^\intercal\bfX\bftheta = \bfX^\intercal\bfy \]

    • We will see the solution is always unique using the SVD
  • For now, assume that \(\textsf{rank}(\bfX)=n\), so that \(\bfX\bftheta=\bfy\) has a solution and the problem becomes \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bftheta}^2\text{ such that } \bfX\bftheta = \bfy \]

  • The solution is \(\bftheta^*=\bfX^\intercal(\bfX\bfX^\intercal)^{-1}\bfy\); see the sketch below
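
A minimal NumPy sketch (not from the notes) of this closed form: for a random underdetermined system with \(\textsf{rank}(\bfX)=n\), the formula above interpolates the data and matches the pseudoinverse solution, which is known to be the minimum 2-norm solution.

    import numpy as np

    rng = np.random.default_rng(1)

    # Underdetermined case: n = 3 equations, d = 10 unknowns, rank(X) = n
    # almost surely, so X theta = y has infinitely many solutions.
    n, d = 3, 10
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # Closed-form minimum 2-norm solution: theta* = X^T (X X^T)^{-1} y.
    theta_min = X.T @ np.linalg.solve(X @ X.T, y)

    # The Moore-Penrose pseudoinverse picks the same minimum-norm solution.
    theta_pinv = np.linalg.pinv(X) @ y

    print(np.allclose(X @ theta_min, y))       # True: fits the data exactly
    print(np.allclose(theta_min, theta_pinv))  # True: same minimum-norm solution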

Regularization

  • Recall the problem \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bftheta}^2\text{ such that } \bfX^\intercal\bfX\bftheta = \bfX^\intercal\bfy \]
    • There are infinitely many solutions if \(\ker{\bfX}\) is non-trivial
    • The set of solutions is unbounded!
    • Even if \(\ker{\bfX}=\set{0}\), the system can be poorly conditioned
  • Regularization with \(\lambda>0\) consists in solving \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bfy-\bfX\bftheta}^2 + \lambda\norm[2]{\bftheta}^2 \]
    • This problem always has a unique solution
  • The solution is \(\bftheta^*=(\bfX^\intercal\bfX+\lambda\bfI)^{-1}\bfX^\intercal\bfy = \bfX^\intercal(\bfX\bfX^\intercal+\lambda\bfI)^{-1}\bfy\)
  • Note that \(\bftheta^*\) lies in the row space of \(\bfX\): \[ \bftheta^* = \bfX^\intercal\bfalpha\textsf{ with } \bfalpha =(\bfX\bfX^\intercal+\lambda\bfI)^{-1}\bfy \] (see the sketch below)
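
A minimal NumPy sketch (not from the notes) verifying the two equivalent expressions: the \(d\times d\) system and the \(n\times n\) system yield the same ridge solution, which is why the second form is attractive when \(d\gg n\).

    import numpy as np

    rng = np.random.default_rng(2)

    n, d, lam = 5, 8, 0.1
    X = rng.standard_normal((n, d))
    y = rng.standard_normal(n)

    # d x d system: theta* = (X^T X + lambda I_d)^{-1} X^T y.
    theta_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

    # n x n system: theta* = X^T alpha with alpha = (X X^T + lambda I_n)^{-1} y,
    # which shows theta* lies in the row space of X.
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    theta_dual = X.T @ alpha

    print(np.allclose(theta_primal, theta_dual))  # True: the two forms agree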