Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Wednesday, October 16, 2024 (v1.0)

Last time

  • Last class: Monday October 7, 2024
    • We wrapped up our discussion of RKHS
    • Homework 4 will reinforce the concepts
    • We started going back to matrix concepts
  • Today: We will talk about the spectral theorem
  • To be effectively prepared for today's class, you should have:
    1. Relaxed a bit over the fall break
    2. Gone over slides
  • Logistics: use office hours to prepare for Homework 4
    • Jack Hill office hours: Wednesday 11:30am-12:30pm in TSRB and hybrid
    • Anuvab Sen office hours: Thursday 12pm-1pm in TSRB and hybrid
    • Dr. Bloch: Sunday October 20, 2024 11am-12pm.
  • Homework 4: due Sunday October 20.
    • Don't start accumulating delays; Homework 5 is coming
    • Moreover, the pace of subsequent homeworks will be faster (~1 per week)

Midterm

  • Grading has not started yet
    • You will get your grades back before the withdrawal deadline
    • Withdrawal deadline: Saturday October 26, 2024 at 4:00 pm
  • Candid feedback I've received
    • Too long given the allotted time (5 mins/question)
    • Too different from homework
  • Good opportunity to reflect
    • Is your use of additional resources (ChatGPT, StackOverflow, etc.) helping?
    • Are you making good use of the resources available to you?
      • In-person lecture time
      • Hybrid office hours
      • Q&A on Piazza

Change of basis

  • Consider the canonical basis \(\set{e_i}_{i=1}^n\) for \(\bbR^n\); every vector can be viewed as coefficients \(\set{\alpha_i}_{i=1}^n\), \[ \bfx = \sum_{i=1}^n \alpha_i e_i = \mat{cccc}{\alpha_1&\alpha_2&\cdots&\alpha_n}^\intercal \]

  • How do we find the representation of \(\bfx\) in another basis \(\set{v_i}_{i=1}^n\)? Write \(e_i=\sum_{j=1}^n\beta_{ij}v_j\)

  • Regroup the coefficients \[ \bfx = \cdots + \left(\sum_{i=1}^n\beta_{ij}\alpha_i\right) v_j + \cdots \]

  • In matrix form \[ \bfx_{\text{new}} = \mat{cccc}{\beta_{11}&\beta_{21}&\cdots&\beta_{n1}\\ \beta_{12}&\beta_{22}&\cdots&\beta_{n2}\\\vdots&\vdots&\vdots&\vdots\\\beta_{1n}&\beta_{2n}&\cdots&\beta_{nn}}\bfx \]
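
A minimal numerical sketch of this change of basis (the basis \(V\) and vector below are illustrative, not from the slides): with the new basis vectors stored as the columns of \(V\), the matrix of \(\beta\) coefficients above is exactly \(V^{-1}\).

```python
import numpy as np

# Columns of V are the new basis vectors v_1, ..., v_n written in the canonical basis.
V = np.array([[1.0, 1.0],
              [0.0, 1.0]])
x = np.array([3.0, 2.0])           # coefficients alpha_i in the canonical basis

# e_i = sum_j beta_{ij} v_j  <=>  V @ B^T = I  <=>  the matrix in the slide is V^{-1}
x_new = np.linalg.solve(V, x)      # coordinates of x in the basis {v_i}

# Sanity check: reassembling x from the new coordinates recovers the original vector
assert np.allclose(V @ x_new, x)
print(x_new)                       # [1. 2.], i.e., x = 1*v_1 + 2*v_2
```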

Similarity

  • A change of basis matrix \(\matP\) is full rank (basis vectors are linearly independent)
  • Any full rank matrix \(\matP\) can be viewed as a change of basis
  • \(\matP^{-1}\) takes you back to the original basis
  • Warning: the columns of \(\bfP\) describe the old coordinates as a function of the new ones

If \(\matA,\bfB\in\bbR^{n\times n}\) then \(\bfB\) is similar to \(\bfA\) if there exists an invertible matrix \(\bfP\in\bbR^{n\times n}\) such that \(\bfB=\bfP^{-1}\bfA\bfP\)

  • Intuition: similar matrices are the same up to a change of basis

\(\matA\in\bbR^{n\times n}\) is diagonalizable if it is similar to a diagonal matrix, i.e., there exists an invertible matrix \(\bfP\in\bbR^{n\times n}\) such that \(\bfD=\bfP^{-1}\bfA\bfP\) with \(\matD\) diagonal

  • Not all matrices are diagonalizable!
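
As a quick illustration (example matrix chosen for this note, not from the slides): the \(2\times 2\) Jordan block below has the single eigenvalue 0 with algebraic multiplicity 2, but its eigenspace is only one-dimensional, so it cannot be similar to a diagonal matrix.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
eigenvalues = np.linalg.eigvals(A)
print(eigenvalues)                                 # both eigenvalues are 0
# The eigenspace of 0 is the null space of A; its dimension is n - rank(A)
nullity = A.shape[0] - np.linalg.matrix_rank(A)
print(nullity)                                     # 1 < 2, so A is not diagonalizable
```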

Spectral theorem

Every complex matrix \(\matA\) has at least one complex eigenvector, and every real symmetric matrix has real eigenvalues and at least one real eigenvector.

Every matrix \(\matA\in\bbC^{n\times n}\) is unitarily similar to an upper triangular matrix, i.e., \[ \bfA = \bfV\boldsymbol{\Delta}\bfV^\dagger \] with \(\boldsymbol{\Delta}\) upper triangular and \(\bfV^\dagger=\bfV^{-1}\).

Every Hermitian matrix is unitarily similar to a real-valued diagonal matrix.

  • Note that if \(\matA = \matV\matD\matV^\dagger\) then \(\matA = \sum_{i=1}^n\lambda_i \vecv_i\vecv_i^\dagger\)
  • How about real-valued matrices \(\matA\in\bbR^{n\times n}\)?
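
For real symmetric matrices the decomposition can be taken entirely real and orthogonal; a minimal numpy sketch (random test matrix, illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = B + B.T                                   # real symmetric test matrix

eigenvalues, V = np.linalg.eigh(A)            # eigh: routine for symmetric/Hermitian matrices
assert np.allclose(V @ V.T, np.eye(4))        # V is orthogonal
assert np.allclose(A, V @ np.diag(eigenvalues) @ V.T)

# Rank-one expansion A = sum_i lambda_i v_i v_i^T from the note above
A_sum = sum(l * np.outer(v, v) for l, v in zip(eigenvalues, V.T))
assert np.allclose(A, A_sum)
```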

Symmetric positive definite matrices

A symmetric matrix \(\matA\) is positive definite if it has positive eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n}\quad\lambda_i>0\).

A symmetric matrix \(\matA\) is positive semidefinite if it has nonnegative eigenvalues, i.e., \(\forall i\in\set{1,\cdots,n}\quad\lambda_i\geq 0\).

  • Convention: \(\lambda_1\geq \lambda_2\geq \cdots \geq \lambda_n\)
  • Variational form of extreme eigenvalues for symmetric positive definite matrices \(\bfA\) \[ \lambda_1 = \max_{\vecx\in\bbR^n:\norm[2]{\bfx}=1}\vecx^\intercal \matA\vecx = \max_{\vecx\in\bbR^n}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2}\qquad \lambda_n = \min_{\vecx\in\bbR^n:\norm[2]{\bfx}=1}\vecx^\intercal \matA\vecx = \min_{\vecx\in\bbR^n}\frac{\vecx^\intercal \matA\vecx}{\norm[2]{\vecx}^2} \]
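
A small numerical sanity check of the variational form (the matrix below is an illustrative positive definite example, not from the slides): the Rayleigh quotient of any vector falls between \(\lambda_n\) and \(\lambda_1\).

```python
import numpy as np

B = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 0.5]])
A = B @ B.T + np.eye(3)                        # symmetric positive definite by construction

eigenvalues = np.linalg.eigvalsh(A)            # ascending order
lam_n, lam_1 = eigenvalues[0], eigenvalues[-1] # smallest and largest eigenvalues

# Sample many vectors; x^T A x / ||x||^2 always lies between lambda_n and lambda_1
rng = np.random.default_rng(1)
X = rng.standard_normal((3, 10_000))
quotients = np.einsum('ij,ij->j', X, A @ X) / np.einsum('ij,ij->j', X, X)
assert np.all(quotients >= lam_n - 1e-10)
assert np.all(quotients <= lam_1 + 1e-10)
print(lam_n, quotients.min(), quotients.max(), lam_1)
```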

For any analytic function \(f\), we have \[ f(\matA) = \sum_{i=1}^n f(\lambda_i)\vecv_i\vecv_i^\intercal \]
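
A hedged sketch of how this is used in practice, assuming \(\matA\) symmetric and taking \(f=\exp\) as the example function (values illustrative):

```python
import numpy as np
from scipy.linalg import expm

A = np.array([[1.0, 0.4],
              [0.4, 2.0]])                      # symmetric test matrix

eigenvalues, V = np.linalg.eigh(A)
# f(A) = sum_i f(lambda_i) v_i v_i^T, implemented as V @ diag(f(lambda)) @ V^T
f_A = (V * np.exp(eigenvalues)) @ V.T
assert np.allclose(f_A, expm(A))                # matches scipy's matrix exponential
```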

System of symmetric positive definite equations

  • Consider the system \(\vecy=\matA\vecx\) with \(\matA\) symmetric positive definite

Let \(\set{\vecv_i}\) be the orthonormal eigenvectors of \(\matA\) with eigenvalues \(\set{\lambda_i}\). Then \[ \vecx = \sum_{i=1}^n\frac{1}{\lambda_i}\dotp{\vecy}{\vecv_i}\vecv_i \]
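
A minimal sketch of this eigen-expansion solve (illustrative positive definite matrix and right-hand side, not from the slides):

```python
import numpy as np

B = np.array([[2.0, 0.3, 0.1],
              [0.3, 1.5, 0.2],
              [0.1, 0.2, 1.0]])
A = B @ B.T + np.eye(3)                 # symmetric positive definite

x_true = np.array([1.0, -2.0, 0.5])
y = A @ x_true

eigenvalues, V = np.linalg.eigh(A)
# x = sum_i (1/lambda_i) <y, v_i> v_i
x = sum(np.dot(y, v) / l * v for l, v in zip(eigenvalues, V.T))
assert np.allclose(x, x_true)
```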

  • Assume that there exists some observation error \(\vecy=\matA\vecx+\vece\)
    • \(\vece\) is unknown
    • we try to reconstruct \(\vecx\) as \(\widetilde{\vecx}\) by applying \(\matA^{-1}\)

\[ \frac{1}{\lambda_1^2}\norm[2]{\vece}^2\leq \norm[2]{\vecx-\widetilde{\vecx}}^2\leq \frac{1}{\lambda_n^2}\norm[2]{\vece}^2. \]
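
A quick numerical illustration of this sandwich bound (the matrix and noise level below are illustrative): the reconstruction error \(\norm[2]{\vecx-\widetilde{\vecx}}=\norm[2]{\matA^{-1}\vece}\) is squeezed between \(\norm[2]{\vece}/\lambda_1\) and \(\norm[2]{\vece}/\lambda_n\).

```python
import numpy as np

B = np.array([[3.0, 0.5],
              [0.5, 0.2]])
A = B @ B.T + 0.01 * np.eye(2)          # symmetric positive definite, mildly ill-conditioned

eigenvalues = np.linalg.eigvalsh(A)
lam_n, lam_1 = eigenvalues[0], eigenvalues[-1]

rng = np.random.default_rng(2)
x = rng.standard_normal(2)
e = 1e-3 * rng.standard_normal(2)       # unknown observation error
y = A @ x + e

x_tilde = np.linalg.solve(A, y)         # reconstruction by "applying A^{-1}"
err = np.linalg.norm(x - x_tilde)
assert np.linalg.norm(e) / lam_1 <= err + 1e-15
assert err <= np.linalg.norm(e) / lam_n + 1e-15
print(np.linalg.norm(e) / lam_1, err, np.linalg.norm(e) / lam_n)
```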

  • What about the average case?

Next time

  • Next class: Monday October 21, 2024
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read the associated lecture notes
    2. Submit Homework 4 on Gradescope
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)