# Regression

Wednesday October 06, 2021

## Logistics

• Assignment 4 assigned Tuesday, October 5, 2021

• Includes a (small) programming component

• Due October 14, 2021 (soft deadline, hard deadline on October 16)

## What’s on the agenda for today?

• Last time: Least-squares regression

• Today

• Solving linear least-squares regression

• Extension to infinite dimensions

• Reading: Romberg, lecture notes 8

## Solving the least-squares problem

• Any solution $\bftheta^*$ to the problem $\min_{\bftheta\in\bbR^d} \norm{\bfy-\matX\bftheta}^2$ must satisfy $\matX^\intercal\matX\bftheta^* = \matX^\intercal\bfy$. This system is called the normal equations
• Facts: for any matrix $\bfA\in\bbR^{m\times n}$

• $\ker{\bfA^\intercal\bfA}=\ker{\bfA}$

• $\text{col}(\bfA^\intercal\bfA)=\text{row}(\bfA)$

• $\text{row}(\bfA)$ and $\ker{\bfA}$ are orthogonal complements

• We can say a lot more about the normal equations

1. There is always a solution
2. If $\textsf{rank}(\bfX)=d$, there is a unique solution: $(\bfX^\intercal\bfX)^{-1}\bfX^\intercal \bfy$
3. If $\textsf{rank}(\bfX)<d$, there are infinitely many solutions, since adding any element of $\ker{\bfX}$ to a solution gives another solution
4. If $\textsf{rank}(\bfX)=n$, there exists a solution $\bftheta^*$ for which $\bfy=\bfX\bftheta^*$
• In machine learning, there are often infinitely many solutions
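
As a minimal numerical sketch of case 2 above, the snippet below solves the normal equations with NumPy on random data and compares the result with `np.linalg.lstsq`. The function name `solve_normal_equations`, the problem sizes, and the random data are assumptions made for illustration only; they are not part of the lecture.

```python
import numpy as np


def solve_normal_equations(X, y):
    """Solve X^T X theta = X^T y, assuming rank(X) = d (full column rank)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical data: n = 20 samples, d = 3 features, so rank(X) = d almost surely.
rng = np.random.default_rng(0)
n, d = 20, 3
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

theta_ne = solve_normal_equations(X, y)

# Reference solution from NumPy's least-squares solver.
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# At a solution, the residual is orthogonal to the columns of X.
residual = y - X @ theta_ne

print("matches lstsq:", np.allclose(theta_ne, theta_lstsq))
print("X^T residual is zero:", np.allclose(X.T @ residual, 0.0))
```

In practice, a solver such as `np.linalg.lstsq` is usually preferred over forming $\bfX^\intercal\bfX$ explicitly, because forming the product squares the condition number.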

## Minimum norm 2 solutions

• One reasonable way to choose a solution among infinitely many is the minimum energy principle $\min_{\bftheta\in\bbR^d}\norm{\bftheta}^2\text{ such that } \bfX^\intercal\bfX\bftheta = \bfX^\intercal\bfy$

• We will see, using the SVD, that the solution is always unique
• For now, assume that $\textsf{rank}(\bfX)=n$, so that $\bfX\bftheta = \bfy$ has a solution and the problem becomes $\min_{\bftheta\in\bbR^d}\norm{\bftheta}^2\text{ such that } \bfX\bftheta = \bfy$

• The solution is $\bftheta^*=\bfX^\intercal(\bfX\bfX^\intercal)^{-1}\bfy$
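
The closed form above can be checked numerically. The sketch below assumes a wide matrix with $n<d$ and full row rank (so that $\bfX\bfX^\intercal$ is invertible), and compares the formula with the pseudoinverse solution from `np.linalg.pinv`. The function name `min_norm_solution` and the random data are hypothetical, chosen only for this illustration.

```python
import numpy as np


def min_norm_solution(X, y):
    """Return X^T (X X^T)^{-1} y, assuming rank(X) = n (full row rank)."""
    return X.T @ np.linalg.solve(X @ X.T, y)


# Hypothetical underdetermined problem: n = 4 equations, d = 10 unknowns.
rng = np.random.default_rng(1)
n, d = 4, 10
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

theta_mn = min_norm_solution(X, y)
theta_pinv = np.linalg.pinv(X) @ y

# Build another exact solution by adding a vector from ker(X):
# v_ker is the component of a random v orthogonal to the row space of X.
v = rng.standard_normal(d)
v_ker = v - np.linalg.pinv(X) @ (X @ v)
theta_other = theta_mn + v_ker

print("interpolates y:", np.allclose(X @ theta_mn, y))
print("matches pinv:", np.allclose(theta_mn, theta_pinv))
print("other solution also interpolates:", np.allclose(X @ theta_other, y))
print("norms:", np.linalg.norm(theta_mn), np.linalg.norm(theta_other))
```

Because `theta_mn` lies in the row space of `X` and `v_ker` lies in its kernel, the two are orthogonal, so any other solution has a strictly larger norm.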

## Regularization

• Recall the problem $\min_{\bftheta\in\bbR^d}\norm{\bftheta}^2\text{ such that } \bfX^\intercal\bfX\bftheta = \bfX^\intercal\bfy$
• There are infinitely many solutions to the normal equations if $\ker{\bfX}$ is nontrivial
• The set of solutions is unbounded!
• Even if $\ker{\bfX}=\set{0}$, the system can be poorly conditioned
• Regularization with $\lambda>0$ consists in solving $\min_{\bftheta\in\bbR^d}\norm{\bfy-\bfX\bftheta}^2 + \lambda\norm{\bftheta}^2$
• This problem always has a unique solution
• The solution is $\bftheta^*=(\bfX^\intercal\bfX+\lambda\bfI)^{-1}\bfX^\intercal\bfy = \bfX^\intercal(\bfX\bfX^\intercal+\lambda\bfI)^{-1}\bfy$
• Note that $\bftheta^*$ lies in the row space of $\matX$: $\bftheta^* = \matX^\intercal\bfalpha\textsf{ with } \bfalpha =(\bfX\bfX^\intercal+\lambda\bfI)^{-1}\bfy$
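
The two expressions for the regularized solution can be compared numerically. The sketch below, under the assumption of random data and an arbitrary $\lambda=0.1$, computes both the $d\times d$ form and the $n\times n$ form and checks that they agree. The function names `ridge_primal` and `ridge_dual` are hypothetical labels for the two formulas, not names used in the lecture.

```python
import numpy as np


def ridge_primal(X, y, lam):
    """Return (X^T X + lam I_d)^{-1} X^T y, which solves a d x d system."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def ridge_dual(X, y, lam):
    """Return (theta, alpha) with theta = X^T alpha and
    alpha = (X X^T + lam I_n)^{-1} y, which solves an n x n system."""
    n = X.shape[0]
    alpha = np.linalg.solve(X @ X.T + lam * np.eye(n), y)
    return X.T @ alpha, alpha


# Hypothetical data with n < d, so ker(X) is nontrivial.
rng = np.random.default_rng(2)
n, d = 5, 8
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
lam = 0.1  # arbitrary regularization strength, must be > 0

theta_p = ridge_primal(X, y, lam)
theta_d, alpha = ridge_dual(X, y, lam)

print("primal and dual forms agree:", np.allclose(theta_p, theta_d))
```

Which form is cheaper depends on the shape of the data: the first solves a $d\times d$ system, the second an $n\times n$ system, so the second is attractive when $d$ is much larger than $n$.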