Due December 7, 2021 for the bonus; deadline December 10, 2021
3 lectures left
Let me know what’s missing
Midterm 2 / Assignment 5
Grades posted this week
What’s on the agenda for today?
Last time:
Numerical considerations
Today:
(Quick) discussion of additional numerical considerations
Reading: lecture notes 14/15/16
Easy systems
Diagonal system
\(\bfA\in\bbR^{n\times n}\) invertible and diagonal
\(O(n)\) complexity
Orthogonal system
\(\bfA\in\bbR^{n\times n}\) invertible and orthogonal
\(O(n^2)\) complexity
Lower triangular system
\(\bfA\in\bbR^{n\times n}\) invertible and lower triangular
\(O(n^2)\) complexity
General strategy: factorize \(\bfA\) to recover some of the structures above
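A minimal NumPy sketch (my own illustration, not lecture code; matrix names and sizes are arbitrary assumptions) of solving \(\bfA\bfx=\bfb\) for the three easy structures:

```python
import numpy as np

n = 5
b = np.random.randn(n)

# Diagonal A: solve by elementwise division -- O(n)
d = np.random.rand(n) + 1.0                  # nonzero diagonal entries, so A is invertible
x_diag = b / d

# Orthogonal A: A^{-1} = A^T, so x = A^T b -- O(n^2)
Q, _ = np.linalg.qr(np.random.randn(n, n))   # a random orthogonal matrix
x_orth = Q.T @ b

# Lower triangular A: forward substitution -- O(n^2)
L = np.tril(np.random.randn(n, n)) + n * np.eye(n)
x_tri = np.zeros(n)
for i in range(n):
    x_tri[i] = (b[i] - L[i, :i] @ x_tri[:i]) / L[i, i]

assert np.allclose(L @ x_tri, b)
```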
Factorizations
LU factorization
Cholesky factorization
QR decomposition
SVD and eigenvalue decompositions
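A short sketch of the listed factorizations (assuming NumPy/SciPy; my own illustration, not lecture code), including how a factorization reduces a general solve to the easy triangular solves above:

```python
import numpy as np
from scipy.linalg import lu, cholesky, qr, svd, solve_triangular

A = np.random.randn(5, 5)
S = A @ A.T + 5 * np.eye(5)            # symmetric positive definite example
b = np.random.randn(5)

P, L, U = lu(A)                        # LU:        A = P L U
R = cholesky(S)                        # Cholesky:  S = R^T R  (R upper triangular)
Q, R2 = qr(A)                          # QR:        A = Q R2
U_s, s, Vt = svd(A)                    # SVD:       A = U_s diag(s) Vt

# Solve S x = b via the Cholesky factor: two O(n^2) triangular solves
y = solve_triangular(R.T, b, lower=True)
x = solve_triangular(R, y, lower=False)
assert np.allclose(S @ x, b)
```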
Computing eigenvalue decompositions for symmetric matrices
Many techniques: we shall only discuss one based on power iterations
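A minimal sketch of the power iteration idea for a symmetric matrix (my own illustration of the general technique; the lecture's exact variant may differ):

```python
import numpy as np

def power_iteration(A, num_iters=500):
    """Approximate the dominant eigenpair of a symmetric matrix A."""
    v = np.random.randn(A.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = A @ v                      # repeatedly apply A ...
        v /= np.linalg.norm(v)         # ... and renormalize
    lam = v @ A @ v                    # Rayleigh quotient estimate of the eigenvalue
    return lam, v

B = np.random.randn(6, 6)
A = (B + B.T) / 2                      # symmetrize
lam, v = power_iteration(A)
```

One standard way to continue: once the dominant pair \((\lambda, \bfv)\) is found, repeat the iteration on the deflated matrix \(\bfA - \lambda\bfv\bfv^\top\) to recover the next eigenpair.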
Components of supervised machine learning
An unknown function \(f:\calX\to\calY:\bfx\mapsto y=f(\bfx)\) to learn
The formula to distinguish cats from dogs
A dataset \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
\(\bfx_i\in\calX\eqdef\bbR^d\): picture of cat/dog
\(y_i\in\calY\eqdef\bbR\): the corresponding label cat/dog
A set of hypotheses \(\calH\) as to what the function could be
Example: deep neural nets with AlexNet architecture
An algorithm \(\texttt{ALG}\) to find the best \(h\in\calH\) that explains \(f\)
Terminology:
\(\calY=\bbR\): regression problem
\(\card{\calY}<\infty\): classification problem
\(\card{\calY}=2\): binary classification problem
The goal is to generalize, i.e., be able to classify inputs we have not seen.
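As a toy illustration of the four components (my own example, not from the lecture): a dataset \(\calD\), a simple hypothesis set \(\calH\) of one-dimensional thresholds, and an \(\texttt{ALG}\) that picks the hypothesis making the fewest mistakes on \(\calD\).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=200)                         # inputs x_i (d = 1 for simplicity)
y = (X > 0.3).astype(int)                        # labels produced by the unknown f

# H = { x -> 1[x > t] : t on a grid };  ALG = pick the t with the fewest training errors
thresholds = np.linspace(-2.0, 2.0, 81)
train_errors = [np.mean((X > t).astype(int) != y) for t in thresholds]
best_t = thresholds[int(np.argmin(train_errors))]
h = lambda x: (np.asarray(x) > best_t).astype(int)   # the learned hypothesis
```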
A learning puzzle
Learning seems impossible without additional assumptions!
Possible vs probable
Flip a biased coin; it lands on heads with unknown probability \(p\in[0,1]\)
\(\P{\text{head}}=p\) and \(\P{\text{tail}}=1-p\)
Say we flip the coin \(N\) times; can we estimate \(p\)?
\[ \hat{p} = \frac{\text{# heads}}{N} \]
Can we relate \(\hat{p}\) to \(p\)?
The law of large numbers tells us that \(\hat{p}\) converges in probability to \(p\) as \(N\) gets large \[
\forall\epsilon>0\quad\P{\abs{\hat{p}-p}>\epsilon}\mathop{\longrightarrow}_{N\to\infty} 0.
\]
It is possible that \(\hat{p}\) is completely off, but it is not probable
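A quick simulation sketch of this (my own, assuming NumPy): the deviation \(\abs{\hat{p}-p}\) is rarely large once \(N\) is large.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.3                                   # the unknown bias (known only to the simulator)
for N in (10, 100, 10_000):
    flips = rng.random(N) < p             # True = head with probability p
    p_hat = flips.mean()                  # empirical frequency: (# heads) / N
    print(N, round(abs(p_hat - p), 4))    # deviation typically shrinks as N grows
```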
Components of supervised machine learning
An unknown function \(f:\calX\to\calY:\bfx\mapsto y=f(\bfx)\) to learn
A dataset \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
\(\{\bfx_i\}_{i=1}^N\) i.i.d. from an unknown distribution \(P_{\bfx}\) on \(\calX\)
\(\{y_i\}_{i=1}^N\) are the corresponding labels \(y_i\in\calY\eqdef\bbR\)
A set of hypotheses \(\calH\) as to what the function could be
An algorithm \(\texttt{ALG}\) to find the best \(h\in\calH\) that explains \(f\)
Another learning puzzle
Components of supervised machine learning
::: nonincremental
An unknown conditional distribution \(P_{y|\bfx}\) to learn