# Regression

Monday October 04, 2021

## Logistics

• Assignment 4 assigned tonight

• Includes a programming component

• Due October 13, 2021 (soft deadline, hard deadline on October 15)

## What’s on the agenda for today?

• Last time: Non-Orthobases

• Dual basis
• Today

• Wrap up non-orthobases in infinite dimension
• Least-square regression
• Reading: Romberg, lecture notes 7/8

## Non-orthogonal bases in infinite dimension

• $\set{v_i}_{i=1}^\infty$ is a Riesz basis for Hilbert space $\calH$ if $\text{cl}(\text{span}(\set{v_i}_{i=1}^\infty))=\calH$ and there exist $A,B>0$ such that $A\sum_{i=1}^\infty\alpha_i^2\leq \norm[\calH]{\sum_{i=1}^\infty\alpha_iv_i}^2\leq B\sum_{i=1}^\infty\alpha_i^2$ uniformly for all sequences $\set{\alpha_i}_{i\geq 1}$ with $\sum_{i\geq 1}\alpha_i^2<\infty$.
• In infinite dimension, the existence of $A,B>0$ is not automatic.

• Examples
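
As a hedged illustration of why the bounds are not automatic, the sketch below works in finite dimension, where $\norm[2]{\sum_i\alpha_iv_i}^2=\bfalpha^\intercal\bfG\bfalpha$ with $\bfG$ the Gram matrix, so the tightest constants are the extreme eigenvalues of $\bfG$. The family $v_i = e_i/i$ (canonical vectors scaled by $1/i$) is an example chosen here, not one from the lecture: truncated to $n$ vectors, its lower constant is $1/n^2$, which shrinks to zero as $n$ grows. The function names are hypothetical.

```python
import numpy as np

def riesz_bounds(V):
    """Return (A, B), the smallest and largest eigenvalues of the Gram
    matrix of the columns of V (finite-dimensional case only)."""
    G = V.T @ V                          # G[n, l] = <v_l, v_n>
    eigvals = np.linalg.eigvalsh(G)      # ascending order for symmetric G
    return eigvals[0], eigvals[-1]

def scaled_canonical_basis(n):
    """Columns are e_i / i for i = 1, ..., n (illustrative choice)."""
    return np.diag(1.0 / np.arange(1, n + 1))

# A non-orthogonal basis of R^2: both constants are strictly positive.
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
A, B = riesz_bounds(V)

# Check the two-sided bound on one coefficient vector.
alpha = np.array([0.3, -1.2])
energy = np.linalg.norm(V @ alpha) ** 2
coeff_energy = alpha @ alpha

# The lower constant of the scaled family decays like 1 / n^2.
A_10, _ = riesz_bounds(scaled_canonical_basis(10))
A_100, _ = riesz_bounds(scaled_canonical_basis(100))
```

In every finite truncation the lower constant is positive, but no single $A>0$ works for all $n$, which is the obstruction that can appear in infinite dimension.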

## Non-orthogonal bases in infinite dimension: dual basis

• Computing expansion on Riesz basis not as simple in infinite dimension: Gram matrix is “infinite”

• The Gramian is a linear operator $\calG:\ell_2(\bbZ)\to\ell_2(\bbZ): \bfx\mapsto \bfy\text{ with }[\calG(\bfx)]_n\eqdef y_n=\sum_{\ell=-\infty}^{\infty}\dotp{v_\ell}{v_n}x_\ell$

• Fact: there exists another linear operator $\calH:\ell_2(\bbZ)\to\ell_2(\bbZ)$ such that $\calH(\calG(\bfx)) = \bfx$ (here $\calH$ is an operator on sequences, not the Hilbert space above). We can replicate what we did in finite dimension!
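
For reference, the finite-dimensional computation being replicated can be sketched as follows: the expansion coefficients of $x$ are obtained by applying the inverse of the Gram matrix to the vector of inner products $\dotp{x}{v_n}$. This is a minimal sketch with a hypothetical function name and a toy basis of $\bbR^2$; it does not address the infinite-dimensional operator itself.

```python
import numpy as np

def expansion_coefficients(V, x):
    """Coefficients alpha such that x = sum_i alpha_i v_i, where the
    basis vectors v_i are the columns of V (assumed linearly independent)."""
    G = V.T @ V                  # Gram matrix, G[n, l] = <v_l, v_n>
    b = V.T @ x                  # inner products, b[n] = <x, v_n>
    return np.linalg.solve(G, b) # applies the inverse of G without forming it

# Toy non-orthogonal basis of R^2: v_1 = (1, 0), v_2 = (1, 1).
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
x = np.array([2.0, 3.0])

alpha = expansion_coefficients(V, x)
reconstruction = V @ alpha       # should reproduce x
```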

## Regression

• A fundamental problem in supervised machine learning can be cast as follows: given a dataset $\calD\eqdef\{(\vecx_i,y_i)\}_{i=1}^n$, how do we find $f$ such that $f(\vecx_i)\approx y_i$ for all $i\in\set{1,\cdots,n}$?

• Often $\vecx_i\in\bbR^d$, but sometimes $\vecx_i$ is a weirder object (think tRNA string)
• If $y_i\in\calY\subseteq\bbR$ with $\card{\calY}<\infty$, the problem is called classification
• If $y_i\in\calY=\bbR$, the problem is called regression
• We need to introduce several ingredients to make the question well defined

1. We need a class $\calF$ to which $f$ should belong
2. We need a loss function $\ell:\bbR\times\bbR\to\bbR^+$ to measure the quality of our approximation
• We can then formulate the question as $\min_{f\in\calF}\sum_{i=1}^n\ell(f(\bfx_i),y_i)$

• We will focus quite a bit on the square loss $\ell(u,v)\eqdef (u-v)^2$; the resulting problem is called least-square regression
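
The objective $\sum_{i=1}^n\ell(f(\bfx_i),y_i)$ can be evaluated directly for any candidate $f$. The sketch below uses hypothetical names (`square_loss`, `empirical_risk`) and a made-up three-point dataset, and only compares two hand-picked candidates; it does not perform the minimization over $\calF$.

```python
import numpy as np

def square_loss(u, v):
    """Square loss l(u, v) = (u - v)^2."""
    return (u - v) ** 2

def empirical_risk(f, xs, ys, loss=square_loss):
    """Sum of the losses l(f(x_i), y_i) over the dataset."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys))

# Made-up dataset generated by y = 2 x + 1 (one feature, three points).
xs = np.array([[0.0], [1.0], [2.0]])
ys = np.array([1.0, 3.0, 5.0])

f_good = lambda x: 2.0 * x[0] + 1.0   # matches the data exactly
f_zero = lambda x: 0.0                # always predicts zero

risk_good = empirical_risk(f_good, xs, ys)
risk_zero = empirical_risk(f_zero, xs, ys)
```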

## Least square linear regression

• A classical choice of $\calF$ is the set of continuous linear functions.

• $f:\bbR^d\to\bbR$ is linear iff $\forall \bfx,\bfy\in\bbR^d,\lambda,\mu\in\bbR\quad f(\lambda\bfx + \mu\bfy) = \lambda f(\bfx)+\mu f(\bfy)$

• We will see that every continuous linear function on $\bbR^d$ is actually an inner product, i.e., $\exists \bftheta_f\in\bbR^d\textsf{ s.t. } f(\bfx)=\bftheta_f^\intercal\bfx \quad\forall \bfx\in\bbR^d$

• Canonical form I

• Stack $\vecx_i$ as row vectors into a matrix $\matX\in\bbR^{n\times d}$ and stack $y_i$ as elements of a column vector $\bfy\in\bbR^n$: $\min_{\bftheta\in\bbR^d} \norm[2]{\bfy-\matX\bftheta}^2\textsf{ with } \matX\eqdef\mat{c}{-\vecx_1^\intercal-\\\vdots\\-\vecx_n^\intercal-}$
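
Canonical form I maps directly onto a standard least-squares solver. The sketch below assumes NumPy and uses synthetic, noiseless data generated from a made-up parameter vector `theta_true`, so that the recovered parameter can be checked; `np.linalg.lstsq` is one possible solver, not necessarily the one used in the course.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3

# Rows of X are the feature vectors x_i^T; y stacks the targets y_i.
X = rng.normal(size=(n, d))
theta_true = np.array([1.0, -2.0, 0.5])   # made-up ground truth
y = X @ theta_true                        # noiseless targets

# Solve min_theta ||y - X theta||_2^2.
theta_hat, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
objective = np.linalg.norm(y - X @ theta_hat) ** 2
```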

## Least square affine regression

• Canonical form II

• Allow for affine functions (not just linear)

• Prepend a 1 to every $\vecx_i$: $\min_{\bftheta\in\bbR^{d+1}} \norm[2]{\bfy-\matX\bftheta}^2\textsf{ with } \matX\eqdef\mat{cc}{1&-\vecx_1^\intercal-\\\vdots&\vdots\\1&-\vecx_n^\intercal-}\in\bbR^{n\times(d+1)}$
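
In code, canonical form II only changes how the matrix is built: a column of ones is placed in front of the features, and the first entry of $\bftheta$ then plays the role of the intercept. A minimal sketch, assuming NumPy and synthetic noiseless data with a made-up intercept and weights:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 2

features = rng.normal(size=(n, d))
intercept_true = 4.0                        # made-up offset
weights_true = np.array([1.0, -2.0])        # made-up linear part
y = intercept_true + features @ weights_true

# Augmented matrix with rows [1, x_i^T], of shape (n, d + 1).
X_aug = np.hstack([np.ones((n, 1)), features])

# Same least-squares problem as before, now over theta in R^(d+1).
theta_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
intercept_hat, weights_hat = theta_hat[0], theta_hat[1:]
```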

## Nonlinear regression using a basis

• Let $\calF$ be a $d$-dimensional subspace of a vector space with basis $\set{\psi_i}_{i=1}^d$

• We model $f(\bfx) = \sum_{i=1}^d\theta_i\psi_i(\bfx)$
• The problem becomes $\min_{\bftheta\in\bbR^d}\norm[2]{\bfy-\boldsymbol{\Psi}\bftheta}^2\textsf{ with }\boldsymbol{\Psi}\eqdef \mat{c}{-\psi(\bfx_1)^\intercal-\\\vdots\\-\psi(\bfx_n)^\intercal-}=\mat{cccc}{\psi_1(\bfx_1)&\psi_2(\bfx_1)&\cdots&\psi_d(\bfx_1)\\ \vdots&\vdots&\ddots&\vdots\\ \psi_1(\bfx_n)&\psi_2(\bfx_n)&\cdots&\psi_d(\bfx_n) }$

• We are recovering a nonlinear function of a continuous variable

• This is the exact same computational framework as linear regression.
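
To make this concrete, the sketch below picks the monomial basis $\psi_i(x)=x^{i-1}$ on scalar inputs. This basis is an illustrative assumption, not a choice made in the lecture. The matrix $\boldsymbol{\Psi}$ is built with `np.vander`, and the same least-squares solver as in the linear case is reused on synthetic noiseless data from a made-up quadratic.

```python
import numpy as np

d = 3                                    # number of basis functions
x = np.linspace(-1.0, 1.0, 20)           # scalar inputs x_1, ..., x_n
y = 1.0 - 2.0 * x + 3.0 * x ** 2         # made-up quadratic target

# Psi[i, j] = psi_{j+1}(x_i) = x_i ** j, i.e. columns 1, x, x^2.
Psi = np.vander(x, d, increasing=True)

# Identical computation to linear regression, with Psi in place of X.
theta_hat, *_ = np.linalg.lstsq(Psi, y, rcond=None)

def f_hat(t):
    """Fitted nonlinear function f(t) = sum_i theta_i psi_i(t)."""
    return np.vander(np.atleast_1d(t), d, increasing=True) @ theta_hat

prediction = f_hat(0.5)[0]               # evaluate at a point off the grid
```

The fitted $f$ is nonlinear in $x$ but linear in $\bftheta$, which is why the solver does not change.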