Prof. Matthieu Bloch
Wednesday, September 18, 2024 (v1.0)
Any solution \(\bftheta^*\) to the problem \(\min_{\bftheta\in\bbR^d} \norm[2]{\bfy-\matX\bftheta}^2\) must satisfy \[ \matX^\intercal\matX\bftheta^* = \matX^\intercal\vecy \] This system is called the normal equations
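As a sanity check, a minimal numpy sketch (illustrative, not part of the notes) verifying that the least-squares solution satisfies the normal equations:

```python
# The least-squares solution returned by lstsq satisfies X^T X theta = X^T y.
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 5                                   # n observations, d parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

theta, *_ = np.linalg.lstsq(X, y, rcond=None)  # minimizes ||y - X theta||^2
print(np.allclose(X.T @ X @ theta, X.T @ y))   # True
```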
Facts: for any matrix \(\bfA\in\bbR^{m\times n}\)
\(\ker{\bfA^\intercal\bfA}=\ker{\bfA}\)
\(\text{col}(\bfA^\intercal\bfA)=\text{row}(\bfA)\)
\(\text{row}(\bfA)\) and \(\ker{\bfA}\) are orthogonal complements
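A quick numerical illustration of these facts on a rank-deficient matrix (numpy sketch, illustrative values only):

```python
# rank(A^T A) = rank(A), and a vector in ker(A) also lies in ker(A^T A).
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4)) @ rng.standard_normal((4, 8))       # rank 4

print(np.linalg.matrix_rank(A.T @ A) == np.linalg.matrix_rank(A))   # True

_, _, Vt = np.linalg.svd(A)
v = Vt[-1]                                     # lies in ker(A) since rank(A) < 8
print(np.allclose(A @ v, 0), np.allclose(A.T @ A @ v, 0))           # True True
```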
We can say a lot more about the normal equations
In machine learning, where often \(d > n\), the normal equations have infinitely many solutions
Reasonable approach: among the infinitely many solutions, choose one according to the minimum energy principle \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bftheta}^2\text{ such that } \bfX^\intercal\bfX\bftheta = \bfX^\intercal\bfy \]
For now, assume that \(\textsf{rank}(\bfX)=n\); then \(\bfX\bftheta=\bfy\) has a solution and the normal equations are equivalent to it, so that the problem becomes \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bftheta}^2\text{ such that } \bfX\bftheta = \bfy \]
If \(\textsf{rank}(\bfX)=n\) the solution is \(\bftheta^*=\matX^\intercal(\matX\matX^\intercal)^{-1}\bfy\)
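An illustrative numpy sketch of this underdetermined case (assumed setup, not from the notes): the closed form matches the pseudo-inverse solution and interpolates the data exactly.

```python
# Minimum-norm solution when rank(X) = n < d: theta* = X^T (X X^T)^{-1} y.
import numpy as np

rng = np.random.default_rng(2)
n, d = 5, 20
X = rng.standard_normal((n, d))                # full row rank with probability one
y = rng.standard_normal(n)

theta_star = X.T @ np.linalg.solve(X @ X.T, y)
print(np.allclose(theta_star, np.linalg.pinv(X) @ y))   # True: minimum-norm solution
print(np.allclose(X @ theta_star, y))                   # True: X theta* = y
```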
For the regularized problem \(\min_{\bftheta\in\bbR^d}\norm[2]{\bfy-\bfX\bftheta}^2 + \lambda\norm[2]{\bftheta}^2\) with \(\lambda>0\), the solution is \(\bftheta^*=(\bfX^\intercal\bfX+\lambda\bfI)^{-1}\bfX^\intercal\bfy = \bfX^\intercal(\bfX\bfX^\intercal+\lambda\bfI)^{-1}\bfy\)
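A quick numerical check (illustrative numpy sketch, arbitrary sizes and \(\lambda\)) that the two expressions agree; the second form only requires solving an \(n\times n\) system, which is cheaper when \(d\gg n\).

```python
# Both forms of the regularized solution coincide (push-through identity).
import numpy as np

rng = np.random.default_rng(3)
n, d, lam = 8, 30, 0.1
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

theta_primal = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # d x d system
theta_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)     # n x n system
print(np.allclose(theta_primal, theta_dual))                         # True
```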
We can adapt the regularization approach to the situation of a finite-dimensional Hilbert space \(\calF\) \[ \min_{f\in\calF}\sum_{i=1}^n(y_i-f(\bfx_i))^2 + \lambda\norm[\calF]{f}^2 \]
Using a basis \(\set{\psi_i}_{i=1}^d\) for the space and constructing \(\boldsymbol{\Psi}\) as earlier, we obtain \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bfy-\boldsymbol{\Psi}\bftheta}^2 + \lambda \bftheta^\intercal\matG\bftheta \] with \(\matG\) the Gram matrix of the basis.
If \(\boldsymbol{\Psi}^\intercal \boldsymbol{\Psi}+\lambda\matG\) is invertible, we find the solution as \[ \bftheta^* = (\boldsymbol{\Psi}^\intercal \boldsymbol{\Psi}+\lambda\matG)^{-1}\boldsymbol{\Psi}^\intercal \bfy \] and we can reconstruct the function as \(f(\bfx) = \sum_{i=1}^d\theta_i^*\psi_{i}(\bfx)\).
If \(\matG\) is well conditioned, the resulting function is not too sensitive to the choice of the basis
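A small illustrative sketch of this construction (assumptions: monomial basis \(\psi_j(x)=x^{j-1}\) on \([0,1]\), for which the Gram matrix is the Hilbert matrix \(\matG_{ij}=1/(i+j-1)\); data and \(\lambda\) are arbitrary):

```python
# Regularized fit in a monomial basis psi_j(x) = x^{j-1} on [0, 1]; the Gram
# matrix is G_{ij} = \int_0^1 x^{i+j-2} dx = 1/(i+j-1).
import numpy as np

rng = np.random.default_rng(4)
d, n, lam = 6, 40, 1e-3
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)    # noisy samples

Psi = np.vander(x, d, increasing=True)                       # Psi[i, j] = x_i^j
G = 1.0 / (np.arange(d)[:, None] + np.arange(d)[None, :] + 1)

theta = np.linalg.solve(Psi.T @ Psi + lam * G, Psi.T @ y)

def f_hat(t):
    # Reconstructed function f(t) = sum_j theta_j psi_j(t)
    return np.vander(np.atleast_1d(t), d, increasing=True) @ theta

print(f_hat(0.25))
```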
In \(\bbR^d\), the problem \(\min_{\bftheta\in\bbR^d}\norm[2]{\bfy-\bfX\bftheta}^2 + \lambda\norm[2]{\bftheta}^2\) has a unique solution for any \(\lambda>0\), since \(\bfX^\intercal\bfX+\lambda\bfI\) is invertible
Let \(\calF\) be a Hilbert space and let \(f\in\calF\) be the function we are trying to estimate
We will estimate \(f\in\calF\) using noisy observations \(y_i\) of \(\dotp{f}{x_i}\), where \(\set{x_i}_{i=1}^n\) are elements of \(\calF\)
This is the analog of the finite-dimensional model \(\bfy = \bfA\bfx+\bfn\)
The problem \[ \min_{f\in\calF}\sum_{i=1}^n\abs{y_i-{\dotp{f}{x_i}}_{\calF}}^2+\lambda\norm[\calF]{f}^2 \] has solution \[ f = \sum_{i=1}^n\alpha_i x_i\textsf{ with } \bfalpha = (\matK+\lambda\matI)^{-1}\vecy\qquad \matK=\mat{c}{\dotp{x_i}{x_j}_{\calF}}_{1\leq i,j\leq n} \]
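A minimal sketch of this solution, assuming the \(x_i\) are kernel sections \(k(\cdot,x_i)\) of a Gaussian RKHS so that \(\dotp{x_i}{x_j}=k(x_i,x_j)\) (i.e. kernel ridge regression); the kernel and parameter values are illustrative choices, not from the notes.

```python
# Kernel ridge regression: alpha = (K + lam I)^{-1} y, f(t) = sum_i alpha_i k(t, x_i).
import numpy as np

rng = np.random.default_rng(5)
n, lam, gamma = 30, 0.1, 10.0
x = rng.uniform(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

def k(a, b):
    # Gaussian kernel k(a, b) = exp(-gamma (a - b)^2), evaluated pairwise
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = k(x, x)                                     # K_{ij} = <x_i, x_j>
alpha = np.linalg.solve(K + lam * np.eye(n), y)

def f_hat(t):
    # f(t) = sum_i alpha_i k(t, x_i)
    return k(np.atleast_1d(t), x) @ alpha

print(f_hat(0.5))
```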