Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday, November 25, 2024

Last time

  • Last class: Wednesday November 20, 2024
    • We talked about Gaussian estimation and Gaussian Processes
  • Today: We will talk about maximum likelihood estimation
  • To be effectively prepared for today's class, you should have:
    1. Gone over slides and read associated lecture notes here
    2. Worked on Homework 8
  • Homework 8: due Wednesday November 27, 2024

What's next for this semester

  • Lecture 21 - Monday November 4, 2024: SVD and least squares
  • Lecture 22 - Wednesday November 6, 2024: Gradient descent
    • Homework 6 due on Thursday November 7, 2024
  • Lecture 23 - Monday November 11, 2024: Estimation
  • Lecture 24 - Wednesday November 13, 2024: Estimation
  • Lecture 25 - Monday November 18, 2024: more estimation
  • Lecture 26 - Wednesday November 20, 2024: even more estimation
  • Lecture 27 - Monday November 25, 2024: Estimation again
  • Lecture 28 - Monday December 2, 2024: Principal Component Analysis

Final exam is coming

  • Start reviewing notes and exams: exam is comprehensive
  • We will be very available for help and review sessions
    • Monday, November 25, 2024: Jack
    • Tuesday November 26, 2024: Anuvab
    • Wednesday November 27, 2024: Dr. Bloch
    • More ahead of the final next week
  • Updates on your standing
    • Expect email in your inbox with grades up to and including Homework 7
    • I will try to include projections

Maximum Likelihood Estimation

  • We now consider a different but related problem: parameter estimation
    • We assume that \(X\sim p_{X}(x;\theta)\) for some unknown parameter \(\theta\)
    • The goal is to estimate \(\theta\) from samples of \(X\)

For a probabilistic model governing the distribution of the samples \(\set{x_i}_{i=1}^n\), the likelihood function is \[ L(\theta;x_1,\cdots,x_n)\eqdef p_{X_1\cdots X_n}(x_1,\cdots,x_n;\theta). \] It is often convenient to work with the log-likelihood \(\ell(\theta;x_1,\cdots,x_n)=\log L(\theta;x_1,\cdots,x_n)\).

  • The maximum likelihood estimate of \(\theta\) is

\[ \hat{\theta}_{\textnormal{MLE}} \eqdef \argmax_{\theta}L(\theta;x_1,\cdots,x_n) \]
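
  • Example (a standard special case, assuming the samples are i.i.d. \(\mathcal{N}(\theta,\sigma^2)\) with \(\sigma^2\) known): the log-likelihood is

\[ \ell(\theta;x_1,\cdots,x_n) = -\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\theta)^2, \]

and setting \(\frac{\partial \ell}{\partial \theta}=0\) gives \(\hat{\theta}_{\textnormal{MLE}}=\frac{1}{n}\sum_{i=1}^n x_i\), the sample mean.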

Properties of Estimators

  • There are three important properties of estimators that we will discuss:
    1. Bias
    2. Consistency
    3. Efficiency

An estimator \(\hat{\theta}\) of \(\theta_0\in\calT\) has bias \(\E{\hat{\theta}}-\theta_0\). The estimator is unbiased if the bias is zero for all \(\theta_0\in\calT\).
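
  • Example (assuming i.i.d. samples with mean \(\theta_0\) and variance \(\sigma^2\)): the sample mean \(\hat{\theta}_n=\frac{1}{n}\sum_{i=1}^n x_i\) satisfies \(\E{\hat{\theta}_n}=\theta_0\) and is therefore unbiased; in contrast, the variance estimator \(\frac{1}{n}\sum_{i=1}^n(x_i-\hat{\theta}_n)^2\) has expectation \(\frac{n-1}{n}\sigma^2\), so it is biased, although the bias vanishes as \(n\to\infty\).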

An estimator \(\hat{\theta}_n\) of \(\theta_0\in\calT\) using \(n\) observations \(x_1,\cdots,x_n\) is consistent if for every \(\epsilon>0\) and \(\delta\in(0,1)\) \[ \lim_{n\to\infty}\P{\abs{\hat{\theta}_n-\theta_0}>\epsilon}\leq \delta. \]
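
  • Example (assuming i.i.d. samples with mean \(\theta_0\) and finite variance \(\sigma^2\)): for the sample mean \(\hat{\theta}_n=\frac{1}{n}\sum_{i=1}^n x_i\), Chebyshev's inequality gives

\[ \P{\abs{\hat{\theta}_n-\theta_0}>\epsilon}\leq \frac{\sigma^2}{n\epsilon^2}, \]

which vanishes as \(n\to\infty\), so the sample mean is a consistent estimator of \(\theta_0\).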

Next time

  • Next class: Monday December 2, 2024 (last class)
    • We will talk about Kernel PCA!
    • This will close the loop on everything we've done (RKHS, SVD, etc.)
  • To be effectively prepared for next class, you should:
    1. Go over today's slides and read associated lecture notes here and there and there
    2. Work on Homework 7
  • Optional
    • Export slides for next lecture as PDF (be on the lookout for an announcement when they're ready)