Regression

Logistics

Assignment 4 assigned tonight
- Includes a programming component
- Due October 13, 2021 (soft deadline, hard deadline on October 15)

Last time: Non-Orthobases
- Dual basis
Today
- Wrap up non-orthobases in infinite dimension
- Least-squares regression
Reading: Romberg, lecture notes 7/8

$\{v_{i}\}_{i = 1}^{\infty}$ is a Riesz basis for a Hilbert space $H$ if $cl (span (\{v_{i}\}_{i = 1}^{\infty})) = H$ and there exist $A, B > 0$ such that $A \sum_{i = 1}^{\infty} α_{i}^{2} \leq {‖ \sum_{i = 1}^{\infty} α_{i} v_{i} ‖}_{H}^{2} \leq B \sum_{i = 1}^{\infty} α_{i}^{2}$ for all sequences $\{α_{i}\}_{i \geq 1}$ with $\sum_{i \geq 1} α_{i}^{2} < \infty$, where $A$ and $B$ do not depend on the sequence.
In infinite dimension, the existence of $A, B > 0$ is not automatic.
Examples
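As a finite-dimensional illustration of the definition above (a sketch, not an example taken from the lecture), the best constants $A$ and $B$ are the smallest and largest eigenvalues of the Gram matrix. The matrix `V`, the helper `riesz_bounds`, and the scaled family $v_{i} = e_{i} / i$ are illustrative choices. The truncations of the scaled family show how the lower bound can shrink to zero as the dimension grows, which is why the existence of $A > 0$ is not automatic in infinite dimension.

```python
import numpy as np

def riesz_bounds(V):
    """Best constants (A, B) for the columns of V: extreme Gram eigenvalues."""
    G = V.T @ V                      # Gram matrix, G[n, l] = <v_l, v_n>
    eigvals = np.linalg.eigvalsh(G)  # eigenvalues in ascending order
    return eigvals[0], eigvals[-1]

# Hypothetical non-orthogonal basis of R^3 (columns are v_1, v_2, v_3)
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
A, B = riesz_bounds(V)

# Check A * sum(alpha^2) <= ||sum alpha_i v_i||^2 <= B * sum(alpha^2)
rng = np.random.default_rng(0)
alpha = rng.standard_normal(3)
energy = np.sum(alpha ** 2)
norm_sq = np.linalg.norm(V @ alpha) ** 2

def scaled_basis_lower_bound(n):
    """Lower constant A for the first n vectors of the family v_i = e_i / i."""
    V_n = np.diag(1.0 / np.arange(1, n + 1))
    return riesz_bounds(V_n)[0]

# The lower bound equals 1 / n^2 and decays toward zero as n grows
lower_bounds = [scaled_basis_lower_bound(n) for n in (10, 100, 1000)]
```

In finite dimension any basis has $A > 0$; the decaying values in `lower_bounds` suggest why the infinite family $\{e_{i} / i\}$ admits no uniform lower constant.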

Computing the expansion on a Riesz basis is not as simple in infinite dimension: the Gram matrix is “infinite”
The Gramian is a linear operator $G : ℓ_{2} (Z) \to ℓ_{2} (Z)$, $x \mapsto y$, with $[G (x)]_{n} ≜ y_{n} = \sum_{ℓ = - \infty}^{\infty} ⟨ v_{ℓ}, v_{n} ⟩ x_{ℓ}$
Fact: there exists another linear operator $H : ℓ_{2} (Z) \to ℓ_{2} (Z)$ such that $H (G (x)) = x$. We can replicate what we did in finite dimension!
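The finite-dimensional procedure being replicated can be sketched as follows, with the inverse of the Gram matrix playing the role of the operator that undoes $G$. The basis `V` and the vector `x` are made-up values for illustration, and real-valued vectors are assumed.

```python
import numpy as np

# Hypothetical non-orthogonal basis of R^3 (columns are v_1, v_2, v_3)
V = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
G = V.T @ V                        # Gram matrix, G[n, l] = <v_l, v_n>

x = np.array([2.0, -1.0, 3.0])     # vector to expand
b = V.T @ x                        # measured inner products b[n] = <x, v_n>

# If x = sum_l alpha_l v_l, then b = G alpha, so alpha is obtained by undoing G
alpha = np.linalg.solve(G, b)

# Resynthesize x from its expansion coefficients
x_rec = V @ alpha
```

Solving the linear system with `np.linalg.solve` avoids forming the inverse explicitly; in infinite dimension no such finite system is available, which is where the operator viewpoint is needed.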

A fundamental problem in supervised machine learning can be cast as follows: given a dataset $D ≜ \{(x_{i}, y_{i})\}_{i = 1}^{n}$, how do we find $f$ such that $f (x_{i}) \approx y_{i}$ for all $i \in \{1, \dots, n\}$?
- Often $x_{i} \in R^{d}$ , but sometimes $x_{i}$ is a weirder object (think tRNA string)
- if $y_{i} \in Y \subseteq R$ with $| Y | < \infty$ , the problem is called classification
- if $y_{i} \in Y = R$ , the problem is called regression
We need to introduce several ingredients to make the question well defined
1. We need a class $F$ to which $f$ should belong
2. We need a loss function $ℓ : R \times R \to R^{+}$ to measure the quality of our approximation
We can then formulate the question as $\min_{f \in F} \sum_{i = 1}^{n} ℓ (f (x_{i}), y_{i})$
We will focus quite a bit on the square loss $ℓ (u, v) ≜ (u - v)^{2}$, for which the problem is called least-squares regression
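To make the objective concrete, here is a minimal sketch that evaluates $\sum_{i} ℓ (f (x_{i}), y_{i})$ with the square loss over a tiny, hand-picked class of candidates. The dataset, the three candidate functions, and the names `square_loss` and `empirical_risk` are all hypothetical.

```python
import numpy as np

def square_loss(u, v):
    """Square loss l(u, v) = (u - v)^2."""
    return (u - v) ** 2

def empirical_risk(f, xs, ys, loss=square_loss):
    """Sum of the losses l(f(x_i), y_i) over the dataset."""
    return sum(loss(f(x), y) for x, y in zip(xs, ys))

# Toy dataset generated by y = 2x + 1 (illustrative values only)
xs = np.array([0.0, 1.0, 2.0, 3.0])
ys = np.array([1.0, 3.0, 5.0, 7.0])

# A small finite class F of candidate functions
candidates = {
    "f(x) = x": lambda x: x,
    "f(x) = 2x + 1": lambda x: 2 * x + 1,
    "f(x) = 3x": lambda x: 3 * x,
}

# Evaluate the objective for each candidate and keep the minimizer
risks = {name: empirical_risk(f, xs, ys) for name, f in candidates.items()}
best = min(risks, key=risks.get)
```

Enumerating candidates only works for a finite class; for richer classes the minimization has to be carried out over parameters instead.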

A classical choice of $F$ is the set of continuous linear functions.
- $f : R^{d} \to R$ is linear iff $f (λ x + μ y) = λ f (x) + μ f (y)$ for all $x, y \in R^{d}$ and all $λ, μ \in R$
- We will see that every continuous linear function on $R^{d}$ is actually an inner product, i.e., there exists $θ_{f} \in R^{d}$ such that $f (x) = θ_{f}^{⊺} x$ for all $x \in R^{d}$
Canonical form I
- Stack $x_{i}$ as row vectors into a matrix $X \in R^{n \times d}$ and stack $y_{i}$ as elements of a column vector $y \in R^{n}$: $\min_{θ \in R^{d}} {‖ y - X θ ‖}_{2}^{2}$ with $X ≜ \begin{bmatrix} - x_{1}^{⊺} - \\ \vdots \\ - x_{n}^{⊺} - \end{bmatrix}$
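Canonical form I maps directly onto a standard numerical routine. The sketch below builds synthetic data (the sizes, `theta_true`, and the noise level are arbitrary assumptions) and minimizes ${‖ y - X θ ‖}_{2}^{2}$ with NumPy's `np.linalg.lstsq`.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 3

# Rows of X are the data points x_i^T; y stacks the targets y_i
X = rng.standard_normal((n, d))
theta_true = np.array([1.0, -2.0, 0.5])             # assumed ground truth
y = X @ theta_true + 0.01 * rng.standard_normal(n)  # small additive noise

# Minimize ||y - X theta||_2^2 over theta in R^d
theta_hat, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
```

With more data points than unknowns and a full-rank `X`, the minimizer is unique and `theta_hat` should land close to `theta_true`.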

Canonical form II
- Allow for affine functions (not just linear)
- Add a 1 to every $x_{i}$, so that now $X \in R^{n \times (d + 1)}$: $\min_{θ \in R^{d + 1}} {‖ y - X θ ‖}_{2}^{2}$ with $X ≜ \begin{bmatrix} 1 & - x_{1}^{⊺} - \\ \vdots & \vdots \\ 1 & - x_{n}^{⊺} - \end{bmatrix}$
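The augmentation in canonical form II amounts to prepending a column of ones, so that the first entry of $θ$ acts as the intercept. A minimal sketch, with a hypothetical helper `fit_affine` and toy data generated by $y = 2x + 1$:

```python
import numpy as np

def fit_affine(X, y):
    """Least-squares fit of f(x) = b + w^T x; returns (b, w)."""
    n = X.shape[0]
    X_aug = np.column_stack([np.ones(n), X])  # prepend a 1 to every x_i
    theta, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return theta[0], theta[1:]

# Toy one-dimensional data from y = 2x + 1 (illustrative values only)
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

intercept, weights = fit_affine(X, y)
```

A purely linear fit $f (x) = θ^{⊺} x$ is forced through the origin and could not reproduce this data exactly; the extra column removes that restriction without changing the solver.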

Let $F$ be a $d$-dimensional subspace of a vector space of functions, with basis $\{ψ_{i}\}_{i = 1}^{d}$
- We model $f (x) = \sum_{i = 1}^{d} θ_{i} ψ_{i} (x)$
The problem becomes $\min_{θ \in R^{d}} {‖ y - Ψ θ ‖}_{2}^{2}$ with $Ψ ≜ \begin{bmatrix} - ψ (x_{1})^{⊺} - \\ \vdots \\ - ψ (x_{n})^{⊺} - \end{bmatrix} = \begin{bmatrix} ψ_{1} (x_{1}) & ψ_{2} (x_{1}) & \dots & ψ_{d} (x_{1}) \\ \vdots & \vdots & \ddots & \vdots \\ ψ_{1} (x_{n}) & ψ_{2} (x_{n}) & \dots & ψ_{d} (x_{n}) \end{bmatrix}$, where $ψ (x) ≜ [ψ_{1} (x), \dots, ψ_{d} (x)]^{⊺}$
We are recovering a nonlinear function of a continuous variable, yet the model remains linear in the coefficients $θ$
- This is the exact same computational framework as linear regression.
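As an illustration of this point, the sketch below takes the monomials $ψ_{i} (x) = x^{i - 1}$ as the basis (one possible choice, not one prescribed by the lecture) and fits a cubic with the same least-squares call used for linear regression. The target function and sample points are made up.

```python
import numpy as np

d = 4                                    # number of basis functions
x = np.linspace(-1.0, 1.0, 20)           # sample points x_1, ..., x_n
y = 1.0 - 2.0 * x + 0.5 * x ** 3         # noise-free nonlinear target

# Psi[i, j] = psi_{j+1}(x_i) = x_i ** j  (monomial basis, assumed here)
Psi = np.vander(x, d, increasing=True)

# Same solver as in linear regression, with Psi in place of X
theta, *_ = np.linalg.lstsq(Psi, y, rcond=None)

def f_hat(t):
    """Evaluate the fitted model f(t) = sum_i theta_i psi_i(t)."""
    return np.vander(np.atleast_1d(t), d, increasing=True) @ theta
```

Only the construction of `Psi` depends on the chosen basis; swapping in another family of functions changes that one line and leaves the solver untouched.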