Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, November 18, 2024

Last time

  • Last class: Wednesday November 13, 2024
    • We talked about minimum mean square estimation
    • We reviewed important concepts in probability:
      • conditional expectation
      • tower property of expectation (law of total expectation)
      • Jensen's inequality
  • Today: We will talk about Gaussian estimation
  • To be effectively prepared for today's class, you should have:
    1. Gone over slides and read associated lecture notes here
    2. Worked on and submitted Homework 7 (due today!)
  • Logistics:
    • Jack Hill office hours: Wednesday 11:30am-12:30pm in TSRB and hybrid
    • Anuvab Sen office hours: Thursday 12pm-1pm in TSRB and hybrid
    • Dr. Bloch office hours: Friday November 15, 2024 4:30pm-5:15pm online
  • Homework 8: due Wednesday November 27, 2024

What's next for this semester

  • Lecture 21 - Monday November 4, 2024: SVD and least squares
  • Lecture 22 - Wednesday November 6, 2024: Gradient descent
    • Homework 6 due on Thursday November 7, 2024
  • Lecture 23 - Monday November 11, 2024: Estimation
  • Lecture 24 - Wednesday November 13, 2024: Estimation
  • Lecture 25 - Monday November 18, 2024: more estimation
  • Lecture 26 - Wednesday November 20, 2024: Classification and Regression
  • Lecture 27 - Monday November 25, 2024: Principal Component Analysis
    • Homework 8 due
  • Lecture 28 - Monday December 2, 2024: Principal Component Analysis

Final exam is coming

  • Start reviewing notes and exams
  • We will be very available for help and review sessions…
    • …but not last minute.
    • Try to plan your studies accordingly
    • Use Piazza

Gaussian estimation

  • Consider a Gaussian random vector \(\bfX\sim\calN(\mathbf{0},\matR)\), i.e., \[ p(\vecx) = \frac{1}{(2\pi)^{n/2}\sqrt{\det{\matR}}}\exp\left(-\frac{1}{2}\vecx^T\matR^{-1}\vecx\right) \]

  • Assume that we write \[ \bfX = \left[\begin{array}{c}\bfX_o\\\bfX_h\end{array}\right]\qquad\matR = \left[\begin{array}{cc}\matR_o&\matR_{oh}\\ \matR_{oh}^T&\matR_{h}\end{array}\right] \]

    • We observe \(\bfX_o=\vecx_o\)
    • What is the conditional density of \(\bfX_h|\bfX_o=\vecx_o\)?

The conditional density of \(\bfX_h|\bfX_o=\vecx_o\) is a Normal distribution with mean and covariance matrix \[ \bfmu = \matR_{oh}^T\matR_o^{-1}\vecx_o \] \[ \mathbf{\Sigma} = \matR_h - \matR_{oh}^T\matR_o^{-1}\matR_{oh} \]
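As a minimal numerical sketch (assuming a hand-picked 3-dimensional example with a 2-dimensional observed block; the covariance values and variable names below are purely illustrative), the closed-form conditional mean and covariance can be compared against a Monte Carlo estimate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hand-picked positive definite covariance for X = [X_o (2 dims), X_h (1 dim)]
R = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.4],
              [0.3, 0.4, 1.5]])
R_o, R_oh, R_h = R[:2, :2], R[:2, 2:], R[2:, 2:]

x_o = np.array([1.0, -0.5])          # observed value of X_o

# Closed-form conditional moments (zero-mean case)
mu_cond = R_oh.T @ np.linalg.solve(R_o, x_o)
Sigma_cond = R_h - R_oh.T @ np.linalg.solve(R_o, R_oh)

# Monte Carlo check: sample X, keep draws whose X_o falls close to x_o
X = rng.multivariate_normal(np.zeros(3), R, size=1_000_000)
mask = np.linalg.norm(X[:, :2] - x_o, axis=1) < 0.1
print(mu_cond, X[mask, 2].mean())    # conditional mean vs empirical mean
print(Sigma_cond, X[mask, 2].var())  # conditional variance vs empirical variance
```

The rejection step only keeps draws whose observed block lands near \(\vecx_o\), so the retained \(X_h\) values approximate the conditional distribution; the printed pairs should roughly agree.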

Gaussian random processes

A Gaussian random process is a collection \(\{X(\vect):\vect\in\calT\}\), where \(\calT\subset\bbR^d\), characterized by

  1. a mean function \(\mu:\calT\to\bbR:\vect\mapsto \mu(\vect)\)
  2. a covariance function \(r:\calT\times\calT\to\bbR:(\vect,\mathbf{\tau})\mapsto r(\vect,\mathbf{\tau})\) where \(r\) is a PSD kernel

such that for any finite set of indices \(\set{\vect_i}_{i=1}^n\subset\calT\), \[ \set{X(\vect_i)}_{i=1}^n\sim\calN(\underline{\mu},\matR)\quad\underline{\mu}=\left[\begin{array}{c}\mu(\vect_1)\\\vdots\\\mu(\vect_n)\end{array}\right]\quad \matR=\left[r(\vect_i,\vect_j)\right]_{1\leq i,j\leq n} \]
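A short sketch of this definition in action (assuming a zero mean function and a squared-exponential covariance function, which is one common choice of PSD kernel; the length scale and grid are arbitrary), drawing sample paths of the process on a 1-D grid:

```python
import numpy as np

def rbf_kernel(t, tau, length_scale=0.5):
    """Squared-exponential PSD kernel r(t, tau) -- an illustrative choice."""
    return np.exp(-0.5 * ((t[:, None] - tau[None, :]) / length_scale) ** 2)

t = np.linspace(0.0, 5.0, 200)                  # finite set of indices t_1, ..., t_n
mu = np.zeros_like(t)                           # zero mean function
R = rbf_kernel(t, t) + 1e-10 * np.eye(t.size)   # covariance matrix [r(t_i, t_j)], jittered for numerical stability

rng = np.random.default_rng(1)
samples = rng.multivariate_normal(mu, R, size=3)  # three sample paths of the process
print(samples.shape)                              # (3, 200)
```

Any finite collection of grid points yields a multivariate Gaussian with the mean vector and covariance matrix built from \(\mu\) and \(r\), exactly as in the definition above.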

Maximum Likelihood Estimation

  • We consider a different but related problem: parameter estimation
    • We assume that \(X\sim p_{X}(x;\theta)\) for some unknown parameter \(\theta\)
    • The goal is to estimate \(\theta\) from samples of \(X\)

For a probabilistic model governing the distribution of samples \(\set{x_i}_{i=1}^n\), the likelihood function is \[ L(\theta;x_1,\cdots,x_n)\eqdef p_{X_1\cdots X_n}(x_1,\cdots,x_n;\theta). \] It is often convenient to work with the log-likelihood \(\ell(\theta;x_1,\cdots,x_n)=\log L(\theta;x_1,\cdots,x_n)\).

The maximum likelihood estimate of \(\theta\) is \[ \hat{\theta}_{\textnormal{MLE}} \eqdef \argmax_{\theta}L(\theta;x_1,\cdots,x_n) \]
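As a concrete instance (a sketch assuming i.i.d. Gaussian samples with unknown mean and variance; this particular model, the data, and the use of scipy's optimizer are all illustrative choices), the log-likelihood can be maximized numerically and compared against the closed-form MLE:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
x = rng.normal(loc=3.0, scale=2.0, size=500)   # i.i.d. samples, true theta = (mu, sigma^2) = (3, 4)

def neg_log_likelihood(theta, x):
    """Negative log-likelihood of N(mu, sigma^2) for i.i.d. samples x."""
    mu, log_sigma = theta                      # parameterize sigma by its log to keep it positive
    sigma2 = np.exp(2 * log_sigma)
    return 0.5 * np.sum(np.log(2 * np.pi * sigma2) + (x - mu) ** 2 / sigma2)

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]), args=(x,))
mu_hat, sigma2_hat = res.x[0], np.exp(2 * res.x[1])

# For this model the MLE has a closed form: sample mean and (biased) sample variance
print(mu_hat, x.mean())
print(sigma2_hat, x.var())
```

Maximizing \(L\) and maximizing \(\ell=\log L\) give the same \(\hat{\theta}_{\textnormal{MLE}}\) since \(\log\) is strictly increasing, which is why the code minimizes the negative log-likelihood.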

Next time

  • Next class: Wednesday November 20, 2024
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes here
    2. Work on Homework 8
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)