Dr. Matthieu R Bloch

Monday, November 29, 2021

**General announcements**
- Assignment 6 posted (*last assignment*)
- Due December 7, 2021 for bonus, deadline December 10, 2021
- 3 lectures left
- Let me know what’s missing

**Midterm 2 / Assignment 5**
- Grades posted this week

**Last time:**
- Numerical considerations

**Today:**
- (Fast) discussion of additional numerical considerations

**Reading:** lecture notes 14/15/16

**Diagonal system**
- \(\bfA\in\bbR^{n\times n}\) invertible and diagonal
- \(O(n)\) complexity

**Orthogonal system**
- \(\bfA\in\bbR^{n\times n}\) invertible and orthogonal
- \(O(n^2)\) complexity

**Lower triangular system**
- \(\bfA\in\bbR^{n\times n}\) invertible and lower triangular
- \(O(n^2)\) complexity
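
The three structured cases above can be sketched in a few lines. This is a minimal illustration, not the lecture's code: it assumes the system is \(\bfA\bfx=\bfb\), stores matrices as plain Python lists of rows, and uses hypothetical function names. The diagonal solve divides entrywise, the orthogonal solve uses \(\bfx=\bfA^\intercal\bfb\), and the lower triangular solve uses forward substitution.

```python
def solve_diagonal(diag, b):
    """Solve A x = b for diagonal A (diag holds the diagonal entries): O(n)."""
    return [bi / di for di, bi in zip(diag, b)]


def solve_orthogonal(Q, b):
    """Solve Q x = b for orthogonal Q using x = Q^T b: O(n^2)."""
    n = len(b)
    return [sum(Q[i][j] * b[i] for i in range(n)) for j in range(n)]


def solve_lower_triangular(L, b):
    """Solve L x = b by forward substitution: O(n^2)."""
    x = []
    for i, row in enumerate(L):
        # Subtract the contribution of the entries already solved for.
        partial = sum(row[j] * x[j] for j in range(i))
        x.append((b[i] - partial) / row[i])
    return x


# Small examples (values chosen for illustration only).
x_diag = solve_diagonal([2.0, 4.0], [2.0, 8.0])
Q = [[0.0, -1.0], [1.0, 0.0]]  # rotation by 90 degrees, hence orthogonal
x_orth = solve_orthogonal(Q, [1.0, 2.0])
x_tri = solve_lower_triangular([[2.0, 0.0], [1.0, 3.0]], [4.0, 11.0])
print(x_diag, x_orth, x_tri)
```

None of the three routines forms an inverse explicitly, which is why they stay below the cost of a general dense solve.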

**General strategy:** factorize \(\bfA\) to recover some of the structures above

**LU factorization**

**Cholesky factorization**

**QR decomposition**

**SVD and eigenvalue decompositions**

- Many techniques: we shall only discuss one based on *power iterations*
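
As a rough illustration of the idea, here is a minimal power iteration sketch. It makes several assumptions not stated above: the matrix is symmetric with a unique eigenvalue of largest magnitude, the starting vector is not orthogonal to the corresponding eigenvector, and a fixed iteration count stands in for a proper stopping rule. The function name and the example matrix are hypothetical.

```python
import math


def power_iteration(A, num_iters=200):
    """Estimate the dominant eigenvalue/eigenvector of a symmetric matrix A."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n  # assumed not orthogonal to the dominant eigenvector
    for _ in range(num_iters):
        # Multiply by A, then renormalize to unit Euclidean norm.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        v = [wi / norm for wi in w]
    # The Rayleigh quotient v^T A v gives the eigenvalue estimate (v has unit norm).
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(v[i] * Av[i] for i in range(n))
    return eigenvalue, v


A = [[4.0, 1.0], [1.0, 3.0]]  # eigenvalues are (7 ± sqrt(5)) / 2
lam, vec = power_iteration(A)
print(lam, vec)
```

The convergence rate depends on the ratio between the two largest eigenvalue magnitudes, so matrices with a small gap need more iterations.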

- An *unknown function* \(f:\calX\to\calY:\bfx\mapsto y=f(\bfx)\) to learn
  - The formula to distinguish cats from dogs
- A dataset \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
  - \(\bfx_i\in\calX\eqdef\bbR^d\): picture of cat/dog
  - \(y_i\in\calY\eqdef\bbR\): the corresponding label cat/dog
- A *set of hypotheses* \(\calH\) as to what the function could be
  - Example: deep neural nets with AlexNet architecture
- An *algorithm* \(\texttt{ALG}\) to find the best \(h\in\calH\) that explains \(f\)

**Terminology:**
- \(\calY=\bbR\): *regression* problem
- \(\card{\calY}<\infty\): *classification* problem
- \(\card{\calY}=2\): *binary classification* problem

- The goal is to *generalize*, i.e., be able to classify inputs we have *not* seen.
- Learning seems *impossible* without additional assumptions!

- Flip a biased coin that lands on heads with *unknown* probability \(p\in[0,1]\)
  - \(\P{\text{head}}=p\) and \(\P{\text{tail}}=1-p\)
- Say we flip the coin \(N\) times. Can we estimate \(p\)?

\[ \hat{p} = \frac{\text{number of heads}}{N} \]

- Can we relate \(\hat{p}\) to \(p\)?
- The law of large numbers tells us that \(\hat{p}\) converges in probability to \(p\) as \(N\) gets large:

\[ \forall\epsilon>0\quad\P{\abs{\hat{p}-p}>\epsilon}\mathop{\longrightarrow}_{N\to\infty} 0. \]

- It is *possible* that \(\hat{p}\) is completely off, but it is not *probable*.
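
A quick simulation makes the coin-flip estimate concrete. This is a sketch under stated assumptions: the true probability `true_p = 0.3`, the sample sizes, and the fixed seed are arbitrary choices for illustration, and the helper name is hypothetical.

```python
import random


def estimate_p(p, num_flips, rng):
    """Flip a coin with P(head) = p num_flips times and return the fraction of heads."""
    heads = sum(1 for _ in range(num_flips) if rng.random() < p)
    return heads / num_flips


rng = random.Random(0)  # fixed seed so the run is reproducible
true_p = 0.3            # unknown in practice; fixed here to measure the error
estimates = {n: estimate_p(true_p, n, rng) for n in (10, 1000, 100000)}
for n, p_hat in estimates.items():
    print(f"N={n:6d}  p_hat={p_hat:.4f}  |p_hat - p|={abs(p_hat - true_p):.4f}")
```

With few flips the estimate can land far from the true value; with many flips a large deviation remains possible but becomes very unlikely.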

- An *unknown function* \(f:\calX\to\calY:\bfx\mapsto y=f(\bfx)\) to learn
- A *dataset* \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
  - \(\{\bfx_i\}_{i=1}^N\) i.i.d. *from an unknown distribution* \(P_{\bfx}\) on \(\calX\)
  - \(\{y_i\}_{i=1}^N\) are the corresponding labels \(y_i\in\calY\eqdef\bbR\)
- A *set of hypotheses* \(\calH\) as to what the function could be
- An *algorithm* \(\texttt{ALG}\) to find the best \(h\in\calH\) that explains \(f\)

::: nonincremental

- An *unknown conditional distribution* \(P_{y|\bfx}\) to learn
  - \(P_{y|\bfx}\) models \(f:\calX\to\calY\) *with noise*
- A *dataset* \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
  - \(\{\bfx_i\}_{i=1}^N\) i.i.d. from distribution \(P_{\bfx}\) on \(\calX\)
  - \(\{y_i\}_{i=1}^N\) are the corresponding labels \(y_i\sim P_{y|\bfx=\bfx_i}\)
- A *set of hypotheses* \(\calH\) as to what the function could be
- An *algorithm* \(\texttt{ALG}\) to find the best \(h\in\calH\) that explains \(f\)

:::
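
To make the noisy model concrete, the sketch below draws a small dataset. Everything specific in it is an assumption for illustration: inputs are scalar (\(d=1\)), \(P_{\bfx}\) is uniform on \([-1,1]\), the underlying function is the hypothetical \(f(x)=2x+1\), and \(P_{y|\bfx}\) adds zero-mean Gaussian noise to \(f(x)\).

```python
import random


def f(x):
    """Hypothetical underlying function; unknown to the learner in practice."""
    return 2.0 * x + 1.0


def sample_dataset(num_samples, noise_std, rng):
    """Draw (x_i, y_i) pairs with x_i ~ P_x and y_i ~ P_{y|x = x_i}."""
    dataset = []
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)            # x_i drawn i.i.d. from P_x
        y = f(x) + rng.gauss(0.0, noise_std)  # label = function value plus noise
        dataset.append((x, y))
    return dataset


rng = random.Random(1)
D = sample_dataset(5, 0.1, rng)
for x, y in D:
    print(f"x={x:+.3f}  y={y:+.3f}  f(x)={f(x):+.3f}")
```

Setting `noise_std` to zero recovers the noiseless setting, where every label equals \(f(\bfx_i)\) exactly.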

- The roles of \(P_{y|\bfx}\) and \(P_{\bfx}\) are *different*
  - \(P_{y|\bfx}\) is what we want to learn; it captures the underlying function and the noise added to it
  - \(P_{\bfx}\) models the *sampling* of the dataset and need *not* be learned

- Assume that you are designing a fingerprint authentication system
  - You trained your system with a fancy machine learning system
  - The probability of wrongly authenticating is *1%*
  - The probability of correctly authenticating is *60%*
  - Is this a good system?
- It depends!
  - If you are GTRI, this might be ok (security matters more)
  - If you are Apple, this is not acceptable (convenience matters more)
- There is an application-dependent *cost* that can affect the design
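
One way to see how the cost changes the verdict is to compare expected costs. The sketch below is only illustrative: the cost values, the equal prior on impostor and legitimate attempts, and the second candidate system are all hypothetical assumptions. Only the 1% and 60% figures come from the example above, and the 40% false rejection rate follows from the 60% correct authentication rate.

```python
def expected_cost(p_false_accept, p_false_reject, cost_fa, cost_fr, p_impostor=0.5):
    """Average cost per attempt, assuming a fraction p_impostor of attempts are impostors."""
    return (p_impostor * cost_fa * p_false_accept
            + (1.0 - p_impostor) * cost_fr * p_false_reject)


# System from the example: 1% false accepts, 60% correct accepts (so 40% false rejects).
system_a = {"p_false_accept": 0.01, "p_false_reject": 0.40}
# Hypothetical alternative that trades security for convenience.
system_b = {"p_false_accept": 0.05, "p_false_reject": 0.05}

# Hypothetical cost profiles: which kind of error hurts more.
security_first = {"cost_fa": 100.0, "cost_fr": 1.0}
convenience_first = {"cost_fa": 1.0, "cost_fr": 10.0}

costs = {}
for profile_name, profile in (("security", security_first),
                              ("convenience", convenience_first)):
    for system_name, system in (("A", system_a), ("B", system_b)):
        costs[(profile_name, system_name)] = expected_cost(**system, **profile)
        print(profile_name, system_name, costs[(profile_name, system_name)])
```

Under these assumed numbers, the security-focused profile favors system A while the convenience-focused profile favors system B, so the same error rates can be acceptable or unacceptable depending on the cost.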

- A *dataset* \(\calD\eqdef\{(\bfx_1,y_1),\cdots,(\bfx_N,y_N)\}\)
  - \(\{\bfx_i\}_{i=1}^N\) i.i.d. from an unknown distribution \(P_{\bfx}\) on \(\calX\)
- An *unknown conditional distribution* \(P_{y|\bfx}\)
  - \(P_{y|\bfx}\) models \(f:\calX\to\calY\) *with noise*
  - \(\{y_i\}_{i=1}^N\) are the corresponding labels \(y_i\sim P_{y|\bfx=\bfx_i}\)
- A *set of hypotheses* \(\calH\) as to what the function could be
- A *loss function* \(\ell:\calY\times\calY\rightarrow\bbR^+\) capturing the “cost” of prediction
- An *algorithm* \(\texttt{ALG}\) to find the best \(h\in\calH\) that explains \(f\)
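
Two common choices of loss function can be sketched as follows, assuming scalar labels. The squared loss is typical for regression and the 0-1 loss for classification. Averaging the loss over the dataset is one natural way to score a hypothesis; it is shown here as an illustration rather than as the definition used in the lecture. The toy dataset and the hypotheses `h_good` and `h_bad` are hypothetical.

```python
def squared_loss(y_pred, y_true):
    """Regression loss: squared difference between prediction and label."""
    return (y_pred - y_true) ** 2


def zero_one_loss(y_pred, y_true):
    """Classification loss: 1 for a wrong label, 0 for a correct one."""
    return 0.0 if y_pred == y_true else 1.0


def average_loss(h, dataset, loss):
    """Average loss of hypothesis h over the (x, y) pairs in dataset."""
    return sum(loss(h(x), y) for x, y in dataset) / len(dataset)


# Toy regression data generated by y = 2x + 1 (no noise, for simplicity).
regression_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
h_good = lambda x: 2.0 * x + 1.0
h_bad = lambda x: x
print(average_loss(h_good, regression_data, squared_loss))
print(average_loss(h_bad, regression_data, squared_loss))

# Toy binary classification data with labels in {-1, +1}.
classification_data = [(-2.0, -1), (-0.5, -1), (0.5, 1), (3.0, 1)]
h_sign = lambda x: 1 if x > 0 else -1
h_always_one = lambda x: 1
print(average_loss(h_sign, classification_data, zero_one_loss))
print(average_loss(h_always_one, classification_data, zero_one_loss))
```

Different losses encode different notions of cost, so the hypothesis judged best by one loss need not be best under another.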