Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, November 25, 2024

Last time

  • Last class: Wednesday November 20, 2024
    • We talked about Gaussian estimation and Gaussian Processes
  • Today: We will talk about maximum likelihood estimation
  • To be effectively prepared for today's class, you should have:
    1. Gone over slides and read associated lecture notes here
    2. Worked on Homework 8
  • Homework 8: due Wednesday November 27, 2024

What's next for this semester

  • Lecture 21 - Monday November 4, 2024: SVD and least squares
  • Lecture 22 - Wednesday November 6, 2024: Gradient descent
    • Homework 6 due on Thursday November 7, 2024
  • Lecture 23 - Monday November 11, 2024: Estimation
  • Lecture 24 - Wednesday November 13, 2024: Estimation
  • Lecture 25 - Monday November 18, 2024: more estimation
  • Lecture 26 - Wednesday November 20, 2024: even more estimation
  • Lecture 27 - Monday November 25, 2024: Estimation again
  • Lecture 28 - Monday December 2, 2024: Principal Component Analysis

Final exam is coming

  • Start reviewing notes and exams: exam is comprehensive
  • We will be very available for help and review sessions
    • Monday, November 25, 2024: Jack
    • Tuesday November 26, 2024: Anuvab
    • Wednesday November 27, 2024: Dr. Bloch
    • More ahead of the final next week
  • Updates on your standing
    • Expect email in your inbox with grades up to and including Homework 7
    • I will try to include projections

Maximum Likelihood Estimation

  • We now consider a different but related problem: parameter estimation
    • We assume that \(X\sim p_{X}(x;\theta)\) for some unknown parameter \(\theta\)
    • The goal is to estimate \(\theta\) from samples of \(X\)

For a probabilistic model governing the distribution of the samples \(\set{x_i}_{i=1}^n\), the likelihood function is \[ L(\theta;x_1,\cdots,x_n)\eqdef p_{X_1\cdots X_n}(x_1,\cdots,x_n;\theta). \] It is often convenient to work with the log-likelihood \(\ell(\theta;x_1,\cdots,x_n)=\log L(\theta;x_1,\cdots,x_n)\).

  • The maximum likelihood estimate of \(\theta\) is

\[ \hat{\theta}_{\textnormal{MLE}} \eqdef \argmax_{\theta}L(\theta;x_1,\cdots,x_n) \]
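
  • Example (a standard special case, assuming the samples are i.i.d. \(\mathcal{N}(\theta,\sigma^2)\) with \(\sigma^2\) known): the log-likelihood is

\[ \ell(\theta;x_1,\cdots,x_n) = -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\theta)^2, \]

and setting \(\frac{\partial \ell}{\partial \theta}=0\) gives \(\hat{\theta}_{\textnormal{MLE}}=\frac{1}{n}\sum_{i=1}^n x_i\), the sample mean.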

Properties of Estimators

  • There are three important properties of estimators that we will discuss:
    1. Bias
    2. Consistency
    3. Efficiency

An estimator \(\hat{\theta}\) of \(\theta_0\in\calT\) has bias \(\E{\hat{\theta}}-\theta_0\). The estimator is unbiased if the bias is zero for all \(\theta_0\in\calT\).
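
  • Example (assuming i.i.d. samples with mean \(\theta_0\) and variance \(\sigma^2\)): the sample mean \(\hat{\theta}_n=\frac{1}{n}\sum_{i=1}^n x_i\) satisfies \(\E{\hat{\theta}_n}=\theta_0\) and is therefore unbiased; in contrast, the variance estimator \(\frac{1}{n}\sum_{i=1}^n(x_i-\hat{\theta}_n)^2\) has expectation \(\frac{n-1}{n}\sigma^2\), so it is biased, although the bias vanishes as \(n\to\infty\).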

An estimator \(\hat{\theta}_n\) of \(\theta_0\in\calT\) using \(n\) observations \(x_1,\cdots,x_n\) is consistent if for every \(\epsilon>0\) and \(\delta\in(0,1)\) \[ \lim_{n\to\infty}\P{\abs{\hat{\theta}_n-\theta_0}>\epsilon}\leq \delta. \]
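
  • Example (assuming i.i.d. samples with mean \(\theta_0\) and finite variance \(\sigma^2\)): for the sample mean \(\hat{\theta}_n=\frac{1}{n}\sum_{i=1}^n x_i\), Chebyshev's inequality gives

\[ \P{\abs{\hat{\theta}_n-\theta_0}>\epsilon}\leq \frac{\sigma^2}{n\epsilon^2}, \]

which vanishes as \(n\to\infty\), so the sample mean is a consistent estimator of \(\theta_0\).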

Next time

  • Next class: Monday December 2, 2024 (last class)
    • We will talk about Kernel PCA!
    • This will close the loop on everything we've done (RKHS, SVD, etc.)
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes here and there and there
    2. Work on Homework 7
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)