Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, November 18, 2024

Last time

  • Last class: Wednesday November 13, 2024
    • We talked about minimum mean square estimation
    • We reviewed important concepts in probability:
      • conditional expectation
      • tower property of expectation (law of total expectation)
      • Jensen's inequality
  • Today: We will talk about Gaussian estimation
  • To be effectively prepared for today's class, you should have:
    1. Gone over slides and read associated lecture notes here
    2. Worked on and submitted Homework 7 (due today!)
  • Logistics:
    • Jack Hill office hours: Wednesday 11:30am-12:30pm in TSRB and hybrid
    • Anuvab Sen office hours: Thursday 12pm-1pm in TSRB and hybrid
    • Dr. Bloch office hours: Friday November 15, 2024 4:30pm-5:15pm online
  • Homework 8: due Wednesday November 27, 2024

What's next for this semester

  • Lecture 21 - Monday November 4, 2024: SVD and least squares
  • Lecture 22 - Wednesday November 6, 2024: Gradient descent
    • Homework 6 due on Thursday November 7, 2024
  • Lecture 23 - Monday November 11, 2024: Estimation
  • Lecture 24 - Wednesday November 13, 2024: Estimation
  • Lecture 25 - Monday November 18, 2024: more estimation
  • Lecture 26 - Wednesday November 20, 2024: Classification and Regression
  • Lecture 27 - Monday November 25, 2024: Principal Component Analysis
    • Homework 8 due
  • Lecture 28 - Monday December 2, 2024: Principal Component Analysis

Final exam is coming

  • Start reviewing notes and exams
  • We will be very available for help and review sessions…
    • …but not last minute.
    • Try to plan your studies accordingly
    • Use Piazza

Gaussian estimation

  • Consider a Gaussian random vector \(\bfX\sim\calN(\mathbf{0},\matR)\), i.e., \[ p(\vecx) = \frac{1}{(2\pi)^{n/2}\sqrt{\det{\matR}}}\exp\left(-\frac{1}{2}\vecx^T\matR^{-1}\vecx\right) \]

  • Assume that we write \[ \bfX = \left[\begin{array}{c}\bfX_o\\\bfX_h\end{array}\right]\qquad\matR = \left[\begin{array}{cc}\matR_o&\matR_{oh}\\ \matR_{oh}^T&\matR_{h}\end{array}\right] \]

    • We observe \(\bfX_o=\vecx_o\)
    • What is the conditional density of \(\bfX_h|\bfX_o=\vecx_o\)?

The conditional density of \(\bfX_h|\bfX_o=\vecx_o\) is a Normal distribution with mean and covariance matrix \[ \bfmu = \matR_{oh}^T\matR_o^{-1}\vecx_o \] \[ \mathbf{\Sigma} = \matR_h - \matR_{oh}^T\matR_o^{-1}\matR_{oh} \]
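As a minimal numerical sketch (assuming a hand-picked 3-dimensional example with a 2-dimensional observed block; the covariance values and variable names below are purely illustrative), the closed-form conditional mean and covariance can be compared against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-picked positive definite covariance for X = [X_o (2 dims), X_h (1 dim)]
R = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.5]])
R_o, R_oh, R_h = R[:2, :2], R[:2, 2:], R[2:, 2:]

x_o = np.array([1.0, -0.5])          # observed value of X_o

# Closed-form conditional moments (zero-mean case)
mu_cond = R_oh.T @ np.linalg.solve(R_o, x_o)
Sigma_cond = R_h - R_oh.T @ np.linalg.solve(R_o, R_oh)

# Monte Carlo check: sample X, keep draws whose X_o falls close to x_o
X = rng.multivariate_normal(np.zeros(3), R, size=1_000_000)
mask = np.linalg.norm(X[:, :2] - x_o, axis=1) < 0.1
print(mu_cond, X[mask, 2].mean())    # conditional mean vs empirical mean
print(Sigma_cond, X[mask, 2].var())  # conditional variance vs empirical variance
```

The rejection step only keeps draws whose observed block lands near \(\vecx_o\), so the retained \(X_h\) values approximate the conditional distribution; the printed pairs should roughly agree.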

Gaussian random processes

A Gaussian random process is a collection \(\{X(\vect):\vect\in\calT\}\), where \(\calT\subset\bbR^d\), characterized by

  1. a mean function \(\mu:\calT\to\bbR:\vect\mapsto \mu(\vect)\)
  2. a covariance function \(r:\calT\times\calT\to\bbR:(\vect,\mathbf{\tau})\mapsto r(\vect,\mathbf{\tau})\) where \(r\) is a PSD kernel

such that for any finite set of indices \(\set{\vect_i}_{i=1}^n\subset\calT\), \[ \set{X(\vect_i)}_{i=1}^n\sim\calN(\underline{\mu},\matR)\quad\underline{\mu}=\left[\begin{array}{c}\mu(\vect_1)\\\vdots\\\mu(\vect_n)\end{array}\right]\quad \matR=\left[r(\vect_i,\vect_j)\right]_{1\leq i,j\leq n} \]
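A short sketch of this definition in action (assuming a zero mean function and a squared-exponential covariance function, which is one common choice of PSD kernel; the length scale and grid are arbitrary), drawing sample paths of the process on a 1-D grid:

```python
import numpy as np

def rbf_kernel(t, tau, length_scale=0.5):
    """Squared-exponential PSD kernel r(t, tau) -- an illustrative choice."""
    return np.exp(-0.5 * ((t[:, None] - tau[None, :]) / length_scale) ** 2)

t = np.linspace(0.0, 5.0, 200)                  # finite set of indices t_1, ..., t_n
mu = np.zeros_like(t)                           # zero mean function
R = rbf_kernel(t, t) + 1e-10 * np.eye(t.size)   # covariance matrix [r(t_i, t_j)], jittered for numerical stability

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mu, R, size=3)  # three sample paths of the process
print(samples.shape)                              # (3, 200)
```

Any finite collection of grid points yields a multivariate Gaussian with the mean vector and covariance matrix built from \(\mu\) and \(r\), exactly as in the definition above.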

Maximum Likelihood Estimation

  • We consider a different but related problem: parameter estimation
    • We assume that \(X\sim p_{X}(x;\theta)\) for some unknown parameter \(\theta\)
    • The goal is to estimate \(\theta\) from samples of \(X\)

For a probabilistic model governing the distribution of samples \(\set{x_i}_{i=1}^n\), the likelihood function is \[ L(\theta;x_1,\cdots,x_n)\eqdef p_{X_1\cdots X_n}(x_1,\cdots,x_n;\theta). \] It is often convenient to work with the log-likelihood \(\ell(\theta;x_1,\cdots,x_n)=\log L(\theta;x_1,\cdots,x_n)\).

The maximum likelihood estimate of \(\theta\) is \[ \hat{\theta}_{\textnormal{MLE}} \eqdef \argmax_{\theta}L(\theta;x_1,\cdots,x_n) \]
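As a concrete instance (a sketch assuming i.i.d. Gaussian samples with unknown mean and variance; this particular model, the data, and the use of scipy's optimizer are all illustrative choices), the log-likelihood can be maximized numerically and compared against the closed-form MLE:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=500)   # i.i.d. samples, true theta = (mu, sigma^2) = (3, 4)

def neg_log_likelihood(theta, x):
    """Negative log-likelihood of N(mu, sigma^2) for i.i.d. samples x."""
    mu, log_sigma = theta                      # parameterize sigma by its log to keep it positive
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat, sigma2_hat = res.x[0], np.exp(2 * res.x[1])

# For this model the MLE has a closed form: sample mean and (biased) sample variance
print(mu_hat, x.mean())
print(sigma2_hat, x.var())
```

Maximizing \(L\) and maximizing \(\ell=\log L\) give the same \(\hat{\theta}_{\textnormal{MLE}}\) since \(\log\) is strictly increasing, which is why the code minimizes the negative log-likelihood.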

Next time

  • Next class: Wednesday November 20, 2024
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes here
    2. Work on Homework 8
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)