Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, December 2, 2024

Last time

  • Last class: Monday November 25, 2024
    • We talked about Maximum likelihood estimation
  • Today: We will talk about principal component analysis
    • Great way to end the course where we started
  • To be effectively prepared for today's class, you should have:
    1. Gone over the slides and read the associated lecture notes
    2. Submitted Homework 8

Final exam is coming

  • Review all notes and exams: exam is comprehensive
  • Frequently asked questions
    • How many problems will be on the final?
    • How many questions will be on the final?
    • What topics will the final cover?
    • Do you have sample exams?
    • Can we do additional work for extra credit?
    • Do you give bonuses for participation or for CIOS completion?
  • We will be very available for help and review sessions
    • Tuesday December 03, 2024 12pm: Anuvab (hybrid)
    • Wednesday December 04, 2024 9am: Dr. Bloch (online)
    • Wednesday December 04, 2024 11:30am: Jack (online)

Back to first lecture: Kernel PCA

[Figure: tRNA example from Kin et al., Genome Informatics (2002)]
  • tRNA (transfer RNA): plays a key role in creating the amino acid sequences of proteins
    • G G G G A A T T A G C T C A A G C G G T A G A G C G …
  • Challenge: compare, classify, analyze, visualize sequences
    • Datasets of tRNA sequences \(\set{\bfx_i}_{i=1}^n\)
  • Lots happening behind the scenes
    • What does it mean to represent the data in 2D?
    • How do we measure distances between tRNA sequences? (one simple possibility is sketched after this list)
    • We can explain a lot now!
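
One simple way to make "distance between tRNA sequences" concrete is a k-mer (spectrum) feature map followed by an inner product. The Python sketch below illustrates only that generic idea; it is not the method used in the referenced paper, and all names in it are my own.

```python
from collections import Counter
from itertools import product

def kmer_features(seq, k=3, alphabet="ACGT"):
    """Counts of all length-k substrings (k-mers): the 'spectrum' feature map."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(alphabet, repeat=k)]

def kmer_similarity(s1, s2, k=3):
    """Inner product of k-mer feature vectors: one simple similarity between sequences."""
    return sum(a * b for a, b in zip(kmer_features(s1, k), kmer_features(s2, k)))

# Fragment from the slide, compared with itself and with its reverse
frag = "GGGGAATTAGCTCAAGCGGTAGAGCG"
print(kmer_similarity(frag, frag), kmer_similarity(frag, frag[::-1]))
```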

Principal Component Analysis

  • Feature extraction method: unsupervised, linear, based on a sum-of-squared-errors criterion
  • The idea is to approximate the data as \[ \bfx_i \approx \bfmu + \bfA\bftheta_i\textsf{ with }\bfmu\in\bbR^d,\bfA\in\bbR^{d\times k},\bftheta_i\in\bbR^{k} \] where \(\bfA\) has orthonormal columns

Principal Component Analysis consists in solving the problem \[\argmin_{\bfmu,\bfA,\bftheta_i}\sum_{i=1}^N\norm[2]{\bfx_i-\bfmu-\bfA\bftheta_i}^2\]

  • The hard part is finding \(\bfA\)
  • Given \(\bfA\), it is relatively easy to find \(\bftheta_i\) and \(\bfmu\) (a sketch of the objective in code follows below)
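
For reference, here is a minimal NumPy sketch of the objective being minimized, assuming one sample per row of `X`; the array shapes and names are my own conventions, not from the lecture.

```python
import numpy as np

def pca_objective(X, mu, A, Theta):
    """Sum-of-squared-errors PCA objective for the model x_i ≈ mu + A @ theta_i.

    X     : (N, d) data matrix, one sample per row
    mu    : (d,)   offset vector
    A     : (d, k) matrix with orthonormal columns
    Theta : (N, k) coefficients, row i is theta_i
    """
    residuals = X - mu - Theta @ A.T   # row i is x_i - mu - A @ theta_i
    return float(np.sum(residuals ** 2))
```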

Solving PCA

Assume that \(\bfmu\) and \(\bfA\) are fixed. Then, \[\bftheta_i=\bfA^{\intercal}(\bfx_i-\bfmu)\]

Assume \(\bfA\) is fixed and \(\bftheta_i = \bfA^\intercal(\bfx_i-\bfmu)\). Then, \[\bfmu=\frac{1}{N}\sum_{i=1}^N\bfx_i\]
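
A minimal NumPy sketch of these two closed-form expressions, under the same conventions as the sketch above (rows of `X` are the \(\bfx_i\); the names are mine):

```python
import numpy as np

def pca_coefficients(X, A):
    """Closed-form mu and Theta for a fixed A with orthonormal columns."""
    mu = X.mean(axis=0)        # mu = (1/N) sum_i x_i
    Theta = (X - mu) @ A       # row i is theta_i = A.T @ (x_i - mu)
    return mu, Theta
```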

Solving PCA

One possible choice of \(\bfA\) is \[\bfA=[\bfu_1,\cdots,\bfu_k]\] where the \(\bfu_i\)'s are the eigenvectors corresponding to the \(k\) largest eigenvalues of \(\bfS\eqdef\sum_{i=1}^N(\bfx_i-\bfmu)(\bfx_i-\bfmu)^\intercal\)

  • Proof steps
    • Step 1: introduce \(\bfS\eqdef\sum_{i=1}^N(\bfx_i-\bfmu)(\bfx_i-\bfmu)^\intercal=\bfX\bfX^\intercal\)
    • Step 2: rewrite the objective as a linear program over weights on the eigenvalues of \(\bfS\)
    • Step 3: solve the linear program (the optimum places all the weight on the \(k\) largest eigenvalues)
  • Connection to SVD \(\bfX = \bfU\Sigma \bfV^\dagger\) where the columns of \(\bfX\) are \(\bfx_i-\bfmu\): \[ \bfX = \underbrace{\mat{c}{\vert\\ \bfA\\ \vert}}_{\bfU}\underbrace{\mat{c}{-\,\Theta\,-}}_{\Sigma\bfV^\dagger} \]
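
To connect the eigenvector characterization and the SVD numerically, here is a small self-contained NumPy check on toy data (the data and variable names are mine, not from the lecture): it builds \(\bfA\) once from the top-\(k\) eigenvectors of \(\bfS\) and once from the first \(k\) left singular vectors of the centered data matrix, and verifies that the two span the same subspace.

```python
import numpy as np

# Toy data: N samples in R^d, k principal components (my choice of sizes)
rng = np.random.default_rng(0)
N, d, k = 200, 5, 2
X = rng.normal(size=(N, d)) @ rng.normal(size=(d, d))   # correlated samples, one per row

mu = X.mean(axis=0)
Xc = (X - mu).T                          # d x N matrix whose columns are x_i - mu

# Route 1: top-k eigenvectors of S = sum_i (x_i - mu)(x_i - mu)^T = Xc @ Xc.T
S = Xc @ Xc.T
eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
A_eig = eigvecs[:, ::-1][:, :k]          # eigenvectors of the k largest eigenvalues

# Route 2: first k left singular vectors of Xc (the SVD X = U Sigma V^†)
U, sigma, Vt = np.linalg.svd(Xc, full_matrices=False)
A_svd = U[:, :k]

# Both choices span the same subspace: the orthogonal projectors coincide
print(np.allclose(A_eig @ A_eig.T, A_svd @ A_svd.T))
```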

Kernel PCA