Dr. Matthieu R Bloch

Monday October 04, 2021

**Assignment 4** assigned tonight; includes a programming component

Due **October 13, 2021** (soft deadline; hard deadline on October 15)

**Last time**: Non-orthobases
- Dual basis

**Today**:
- Wrap up non-orthobases in infinite dimension
- Least-squares regression

**Reading:** Romberg, lecture notes 7/8

- \(\set{v_i}_{i=1}^\infty\) is a **Riesz basis** for a Hilbert space \(\calH\) if \(\text{cl}(\text{span}(\set{v_i}_{i=1}^\infty))=\calH\) and there exist \(A,B>0\) such that \[ A\sum_{i=1}^\infty\alpha_i^2\leq \norm[\calH]{\sum_{i=1}^\infty\alpha_iv_i}^2\leq B\sum_{i=1}^\infty\alpha_i^2 \] **uniformly** for all sequences \(\set{\alpha_i}_{i\geq 1}\) with \(\sum_{i\geq 1}\alpha_i^2<\infty\). In infinite dimension, the existence of \(A,B>0\) is **not** automatic.

**Examples**
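For finitely many vectors, the tightest constants are easy to compute: \(\norm{\sum_i\alpha_iv_i}^2=\boldsymbol{\alpha}^\intercal\bfG\boldsymbol{\alpha}\) with \(\bfG\) the Gram matrix, so \(A\) and \(B\) are the smallest and largest eigenvalues of \(\bfG\). The sketch below assumes Python with NumPy; the function names and the family \(v_i=e_i+e_{i+1}\) are hypothetical illustrations, not from the notes. Each truncation \(\set{v_i}_{i=1}^N\) is linearly independent, yet its lower constant \(A_N\) shrinks toward 0 as \(N\) grows; since any uniform \(A\) would have to satisfy \(A\leq A_N\) for every \(N\), no such \(A>0\) exists for the infinite family.

```python
import numpy as np

def riesz_bounds(V):
    """Tightest constants (A, B) for the columns v_1, ..., v_n of V.

    ||sum_i alpha_i v_i||^2 = alpha^T G alpha with G = V^T V, so by the
    Rayleigh quotient A and B are the extreme eigenvalues of G.
    """
    G = V.T @ V
    eigvals = np.linalg.eigvalsh(G)  # G is symmetric; eigenvalues in ascending order
    return eigvals[0], eigvals[-1]

def truncated_family(N):
    """Hypothetical family v_i = e_i + e_{i+1}, i = 1..N, stored as columns in R^{N+1}."""
    V = np.zeros((N + 1, N))
    for i in range(N):
        V[i, i] = 1.0
        V[i + 1, i] = 1.0
    return V

for N in [5, 50, 500]:
    A, B = riesz_bounds(truncated_family(N))
    print(f"N={N:4d}  A={A:.2e}  B={B:.4f}")
```

Here the Gram matrix is tridiagonal with 2 on the diagonal and 1 off the diagonal, whose eigenvalues are \(2+2\cos(k\pi/(N+1))\), \(k=1,\dots,N\); the smallest one behaves like \((\pi/(N+1))^2\).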

Computing the expansion in a Riesz basis is not as simple in infinite dimension: the Gram matrix is “infinite”.

The Gramian is a **linear operator** \[ \calG:\ell_2(\bbZ)\to\ell_2(\bbZ): \bfx\mapsto \bfy\text{ with }[\calG(\bfx)]_n\eqdef y_n=\sum_{\ell=-\infty}^{\infty}\dotp{v_\ell}{v_n}x_\ell \] **Fact:** there exists another linear operator \(\calH:\ell_2(\bbZ)\to\ell_2(\bbZ)\) (an operator here, not the Hilbert space above) such that \[ \calH(\calG(\bfx)) = \bfx\quad\forall\bfx\in\ell_2(\bbZ) \] We can replicate what we did in finite dimension!
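In finite dimension, the same computation reads as follows: if \(\bfx=\sum_\ell\alpha_\ell v_\ell\), then \(\dotp{\bfx}{v_n}=\sum_\ell\dotp{v_\ell}{v_n}\alpha_\ell\), i.e., \(\bfb=\calG(\boldsymbol{\alpha})\) with \(b_n=\dotp{\bfx}{v_n}\), so \(\boldsymbol{\alpha}=\calH(\bfb)\). The minimal NumPy sketch below (a finite-dimensional illustration only; the function name and basis are hypothetical) lets `np.linalg.solve` play the role of \(\calH\).

```python
import numpy as np

def expansion_coefficients(V, x):
    """Coefficients alpha with x = sum_l alpha_l v_l, for the basis given by the columns of V."""
    G = V.T @ V            # Gram matrix: G[n, l] = <v_l, v_n> (symmetric)
    b = V.T @ x            # b[n] = <x, v_n>
    return np.linalg.solve(G, b)   # alpha = G^{-1} b, the role played by H

# A non-orthogonal basis of R^3, stored as columns
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
x = np.array([2.0, -1.0, 3.0])
alpha = expansion_coefficients(V, x)
print(alpha, np.allclose(V @ alpha, x))   # [ 6. -4.  3.] True
```

In infinite dimension, one cannot form \(\bfG\) explicitly; the point of the Fact is that the inverse operator still exists, so the recipe carries over.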

A fundamental problem in supervised machine learning can be cast as follows: \[ \textsf{Given a dataset }\calD\eqdef\{(\vecx_i,y_i)\}_{i=1}^n\textsf{, how do we find $f$ such that $f(\vecx_i)\approx y_i$ for all }i\in\set{1,\cdots,n}? \]

- Often \(\vecx_i\in\bbR^d\), but sometimes \(\vecx_i\) is a weirder object (think of a tRNA string)
- If \(y_i\in\calY\subseteq\bbR\) with \(\card{\calY}<\infty\), the problem is called *classification*
- If \(y_i\in\calY=\bbR\), the problem is called *regression*

We need to introduce several ingredients to make the question well-defined:

- We need a class \(\calF\) to which \(f\) should belong
- We need a loss function \(\ell:\bbR\times\bbR\to\bbR^+\) to measure the quality of our approximation

We can then formulate the question as \[ \min_{f\in\calF}\sum_{i=1}^n\ell(f(\bfx_i),y_i) \]

We will focus quite a bit on the *square loss* \(\ell(u,v)\eqdef (u-v)^2\); the resulting problem is called *least-squares regression*.
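To make the objective \(\sum_{i=1}^n\ell(f(\bfx_i),y_i)\) concrete, here is a small NumPy sketch that evaluates it for one candidate \(f\) with the square loss. The function names, the dataset, and the candidate \(f\) are all hypothetical; minimizing over \(f\in\calF\) is the actual learning problem and is not done here.

```python
import numpy as np

def square_loss(u, v):
    """l(u, v) = (u - v)^2."""
    return (u - v) ** 2

def empirical_risk(f, X, y, loss=square_loss):
    """sum_i loss(f(x_i), y_i) over the dataset D = {(x_i, y_i)}."""
    return sum(loss(f(x_i), y_i) for x_i, y_i in zip(X, y))

# Hypothetical dataset with scalar features
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([1.0, 2.9, 5.2])

def f(x):
    """A hypothetical candidate f(x) = 1 + 2 x."""
    return 1.0 + 2.0 * x[0]

print(empirical_risk(f, X, y))   # about 0.05 = 0^2 + 0.1^2 + 0.2^2
```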

A classical choice of \(\calF\) is the set of continuous linear functions.

\(f:\bbR^d\to\bbR\) is *linear* iff \[ \forall \bfx,\bfy\in\bbR^d,\lambda,\mu\in\bbR\quad f(\lambda\bfx + \mu\bfy) = \lambda f(\bfx)+\mu f(\bfy) \] We will see that every continuous linear function on \(\bbR^d\) is actually an inner product with a fixed vector, i.e., \[ \exists \bftheta_f\in\bbR^d\textsf{ s.t. } f(\bfx)=\bftheta_f^\intercal\bfx \quad\forall \bfx\in\bbR^d \]
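One way to see this on \(\bbR^d\): write \(\bfx=\sum_j x_je_j\) in the canonical basis, so linearity gives \(f(\bfx)=\sum_j x_jf(e_j)\), i.e., \(\bftheta_f=[f(e_1),\cdots,f(e_d)]^\intercal\). The NumPy sketch below checks this numerically on a hypothetical linear \(f\); the helper name `representer` is also hypothetical.

```python
import numpy as np

def representer(f, d):
    """theta_f = (f(e_1), ..., f(e_d)) for a linear f on R^d."""
    return np.array([f(e) for e in np.eye(d)])   # rows of eye(d) are e_1, ..., e_d

def f(x):
    """A hypothetical linear function on R^3."""
    return 2.0 * x[0] - x[1] + 0.5 * x[2]

theta_f = representer(f, 3)
rng = np.random.default_rng(0)
x = rng.standard_normal(3)
print(theta_f, np.isclose(f(x), theta_f @ x))   # [ 2.  -1.   0.5] True
```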

**Canonical form I** - Stack the \(\bfx_i\) as row vectors into a matrix \(\matX\in\bbR^{n\times d}\) and stack the \(y_i\) as the elements of a column vector \(\bfy\in\bbR^n\): \[ \min_{\bftheta\in\bbR^d} \norm[2]{\bfy-\matX\bftheta}^2\textsf{ with } \matX\eqdef\mat{c}{-\vecx_1^\intercal-\\\vdots\\-\vecx_n^\intercal-} \]
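Numerically, Canonical form I is a standard least-squares problem. The sketch below assumes NumPy and synthetic data from a hypothetical \(\bftheta\); it solves the problem with `np.linalg.lstsq` and, as a cross-check, with the normal equations \(\matX^\intercal\matX\bftheta=\matX^\intercal\bfy\), which are valid here because \(\matX\) has full column rank.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.standard_normal((n, d))            # rows are x_i^T
theta_true = np.array([1.0, -2.0, 0.5])    # hypothetical ground truth
y = X @ theta_true + 0.01 * rng.standard_normal(n)

# Minimizer of ||y - X theta||_2^2
theta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same solution via the normal equations (X has full column rank)
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_hat, np.allclose(theta_hat, theta_ne))
```

`lstsq` is usually preferred over forming \(\matX^\intercal\matX\) explicitly, since the latter squares the condition number.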

**Canonical form II** - Allow for *affine* functions (not just linear): add a 1 to every \(\vecx_i\) \[ \min_{\bftheta\in\bbR^{d+1}} \norm[2]{\bfy-\matX\bftheta}^2\textsf{ with } \matX\eqdef\mat{cc}{1&-\vecx_1^\intercal-\\\vdots&\vdots\\1&-\vecx_n^\intercal-} \]
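Adding the column of ones is a one-line change to the data matrix; the first entry of \(\bftheta\) then acts as the offset. A minimal NumPy sketch, with a hypothetical helper `add_intercept` and noise-free synthetic data from a hypothetical affine function:

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so that theta[0] is the offset b in f(x) = b + w^T x."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(40, 2))
y = 3.0 + X @ np.array([1.5, -0.5])        # hypothetical affine ground truth, no noise

theta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
print(theta)   # approximately [ 3.   1.5 -0.5]
```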

Let \(\calF\) be a \(d\)-dimensional subspace of a vector space of functions, with basis \(\set{\psi_i}_{i=1}^d\)

- We model \(f(\bfx) = \sum_{i=1}^d\theta_i\psi_i(\bfx)\)

The problem becomes \[ \min_{\bftheta\in\bbR^d}\norm[2]{\bfy-\boldsymbol{\Psi}\bftheta}^2\textsf{ with }\boldsymbol{\Psi}\eqdef \mat{c}{-\psi(\bfx_1)^\intercal-\\\vdots\\-\psi(\bfx_n)^\intercal-}=\mat{cccc}{\psi_1(\bfx_1)&\psi_2(\bfx_1)&\cdots&\psi_d(\bfx_1)\\ \vdots&\vdots&\ddots&\vdots\\ \psi_1(\bfx_n)&\psi_2(\bfx_n)&\cdots&\psi_d(\bfx_n) } \] where \(\psi(\bfx)\eqdef[\psi_1(\bfx),\cdots,\psi_d(\bfx)]^\intercal\).

Since the \(\psi_i\) can be nonlinear, we are recovering a nonlinear function of a continuous variable, yet the model is still linear in \(\bftheta\):

- This is the exact same computational framework as linear regression.
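As an illustration, the NumPy sketch below fits a function that is nonlinear in \(x\) using the same least-squares solver as before; only the design matrix changes. The basis \(\set{1,x,x^2,\cos x}\), the helper `design_matrix`, and the noise-free data are hypothetical choices made for the example.

```python
import numpy as np

def design_matrix(psis, X):
    """Psi[i, j] = psi_j(x_i) for the basis functions psi_1, ..., psi_d."""
    return np.array([[psi(x) for psi in psis] for x in X])

# Hypothetical basis: psi_1(x) = 1, psi_2(x) = x, psi_3(x) = x^2, psi_4(x) = cos(x)
psis = [lambda x: 1.0, lambda x: x, lambda x: x ** 2, lambda x: np.cos(x)]

rng = np.random.default_rng(2)
X = rng.uniform(-3, 3, size=30)
y = 1.0 - 2.0 * X + 0.5 * X ** 2 + 0.3 * np.cos(X)   # nonlinear in x, noise-free

Psi = design_matrix(psis, X)
theta, *_ = np.linalg.lstsq(Psi, y, rcond=None)       # same solver as linear regression
print(theta)   # approximately [ 1.  -2.   0.5  0.3]
```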