Mathematical Foundations of Machine Learning

Dr. Matthieu R. Bloch

Monday August 23, 2021

Motivating example: Kernel PCA

Kin et al., Genome Informatics (2002)
  • Challenge: compare, classify, analyze, and visualize sequences

    • Example: Kernel Principal Component Analysis to identify clusters
    • Datasets of tRNA sequences \(\set{\bfx_i}_{i=1}^n\)
  • Lots happening behind the scenes

    1. Sequences \(\bfx\in\calS\) embedded in Hilbert space \(\calH\) (dimension \(N\gg 1\)) \[ \Phi:\calS\to\calH:\bfx\mapsto\Phi(\bfx) \]

    2. Approximate the embedded sequences in a low-dimensional subspace of dimension \(d\)

      \[\argmin_{\bfmu,\bfA,\bftheta_i}\sum_{i=1}^n\norm[2]{\Phi(\bfx_i)-\bfmu-\bfA\bftheta_i}^2 \text{ with }\bfA\in\bbR^{N\times d}\]

    • How do we choose \(\Phi\)? What is the computational complexity? How do we find \(\bfA\), \(\bfmu\), \(\bftheta_i\)?
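
The questions above can be explored numerically. The sketch below is a minimal illustration, not the method of Kin et al. It assumes a hypothetical feature map \(\Phi\) that counts \(k\)-mers (`spectrum_features`), toy RNA strings invented for the example, and the standard fact that the least-squares problem above is solved by PCA of the embedded points, so the coordinates \(\bftheta_i\) can be computed from the Gram matrix \(K_{ij}=\langle\Phi(\bfx_i),\Phi(\bfx_j)\rangle\) alone. All function and variable names are illustrative.

```python
import itertools
import numpy as np

def spectrum_features(seq, k=2, alphabet="ACGU"):
    """Hypothetical feature map Phi: count occurrences of every k-mer in seq."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    phi = np.zeros(len(kmers))
    for start in range(len(seq) - k + 1):
        phi[index[seq[start:start + k]]] += 1.0
    return phi

def kernel_pca(K, d):
    """Return the n x d matrix whose i-th row is theta_i, using only the Gram matrix K."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix (plays the role of subtracting mu)
    Kc = H @ K @ H                           # Gram matrix of the centered embeddings
    eigvals, eigvecs = np.linalg.eigh(Kc)    # eigenvalues returned in ascending order
    top = np.argsort(eigvals)[::-1][:d]      # indices of the d largest eigenvalues
    lam = np.clip(eigvals[top], 0.0, None)   # guard against tiny negative round-off
    return eigvecs[:, top] * np.sqrt(lam)    # column j is sqrt(lambda_j) * v_j

# Toy sequences made up for illustration (not real data).
sequences = [
    "GCGGAUUUAGCUCAGU",
    "GCCGAUAUAGCUCAGU",
    "UCCGUGAUAGUUUAAU",
    "UCCGUGAUAGUUCAAU",
    "GGGCGAAUAGUGUCAG",
]
Phi = np.stack([spectrum_features(s) for s in sequences])  # n x N with N = 4**k
K = Phi @ Phi.T                                            # K[i, j] = <Phi(x_i), Phi(x_j)>
theta = kernel_pca(K, d=2)                                 # n x d low-dimensional coordinates
```

Here the embedding is explicit, so the same coordinates could be obtained (up to sign) from an SVD of the centered rows of `Phi`. The kernel route only needs the \(n\times n\) matrix `K`, costs on the order of \(n^3\) operations, and never forms \(\bfA\in\bbR^{N\times d}\), which is one reason it is attractive when \(N\gg 1\).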

Mathematical Foundations of Machine Learning

  • Representations

    • How do we represent signals and operators for data analysis?
    • Linear models (and why they matter so much)
  • Models

    • How do we model datasets?
  • Estimation

    • How do we estimate parameters?
  • Computing

    • How do we run algorithms for machine learning?
    • Optimization (gradient descent)

What to expect in ECE 7750

https://xkcd.com/1838
  • ECE 7750 is about the mathematical foundations of machine learning

    • We will talk a lot about probability and linear algebra
    • We will prove a lot of things formally (theorems, lemmas)
    • We will not develop cool apps based on Deep Neural Nets
    • Exams and homework will have theoretical components
  • All that being said…

    • We will also use simulations to understand concepts
    • We will talk about machine learning
    • Homework will have an experimental component (Python required)
    • ECE 7750 is a fun course and you will learn a lot of useful concepts
  • If you’re unsure about taking the class, the self-assessment is here to help!

  • ECE 7750 will give you a solid background to self-study or take other ML courses at GT

Logistics (the easy part)

  • Class time and venue: Monday and Wednesday 3:30pm-4:45pm
    • In-person live course
    • Asynchronously recorded lectures (DL and on-campus) + synchronous BlueJeans


  • Instructor: Prof. Matthieu Bloch
  • Email: matthieu.bloch@ece.gatech.edu
  • Office: TSRB 441 (appointments only)
  • Office hours: to be announced (mixed online/on-campus)
  • Teaching assistants: being finalized


  • Websites
    • Canvas: for assignment posting and submission
    • Piazza: for Q&A (register using the link on Canvas)
    • Gradescope: for assignment submission

Logistics (the hard part)

  • We are officially back in-person (no social distancing, etc.)

  • Official Institute policy at Tech Moving Forward

    • We encourage everyone in the Georgia Tech community to follow the Centers for Disease Control and Prevention’s (CDC) recommendations, vaccinate, and wear a mask in campus buildings.
    • Stamps Health Services is offering free Covid-19 vaccines in August and September.
    • Asymptomatic testing on campus is easy, convenient and free

Electronic communication policy

https://xkcd.com/1873
  • General guidelines
    • Email the Dean of Students if your personal situation requires special academic consideration
    • Use Piazza for technical questions
      • You can be anonymous to your peers, but not to the instructors
      • You can use \(\LaTeX\) (\(\min_\beta\Vert y-X\beta\Vert_2^2\))
    • Be courteous in your electronic interactions
      • Avoid judgmental language, e.g., “The answer is obvious.”
      • Try to be constructive
      • Avoid typos and use correct syntax
  • If you really have to email me
    • Include [ECE 7750] in the subject of the email
    • I am usually reasonably fast
    • If you email the TAs, cc me

Writing


Grading

  • Self-assessment and assignments (50%)
    • Self-assessment is here to help you decide whether ECE 7750 is right for you
      • Review of concepts from calculus, linear algebra, probability theory, and programming.
      • Open-book/internet test
    • Assignments
      • Due approximately every week (~10-16 assignments overall).
      • Both mathematical and programming problems.
      • You are encouraged to typeset in \(\LaTeX\), but do not waste time on it.
      • Allocate time to submit on Gradescope
  • Midterm exams (2 × 15%)
    • Take-home exams, 24 to 48 hours to complete
  • Final exam (20%)
    • Take-home exam, 48 hours to complete

Assignments policy

https://xkcd.com/1658
  • Two stage deadline policy

    • Soft official deadline with 2% bonus (conditions apply, read the fine print)
    • Hard deadline 48 hours after the soft deadline; no homework accepted after the hard deadline
  • Abide by the Georgia Tech honor code

    • Reference all your sources
    • Do not plagiarize other sources (Python code, homework solutions, etc.)
    • Do not upload course material on other websites
    • When in doubt regarding what constitutes plagiarism, ask!
  • Assignments are individual, but light collaboration is permitted and encouraged

    • Piazza is here for that purpose
    • Small study groups are ok

Final thoughts

At home…
  • I believe in accountability, integrity, and fairness

    • I will hold you to the same standards
    • I trust you by default - trust is easily lost, not easily regained
  • Don’t be shy and don’t hesitate to talk to me!

    • I have little bandwidth for whining and complaining but I’m usually friendly
  • I value your feedback - I use it to make improvements

  • Aspects students least liked about the course

    • “The lack of examples during lecture. Going from theory to homework problems was very difficult for me.”
    • “It was frustrating to be able to download the raw slides (in order to watch the lecture and take notes), but then not know when the annotated slides would be uploaded.”
    • “Each assignment consisted of numerous questions of high difficulty.”
    • “The lack of advanced materials on learning.”
    • “The class uses proofs too soon. Proofs are the highest level of understanding of a concept, and aren’t a good introduction into a concept.”
    • “I also wish there’d be at least audio from our side (if not video)”