Mathematical Foundations of Machine Learning

Prof. Matthieu Bloch

Monday August 19, 2024 (v1.1)

Don't Panic

  • If you are not officially enrolled in the class:
    • Ask for a permit as appropriate here
    • There will be lots of movement in the waitlist; historically, everyone who wanted a seat in the course got one
    • Let me know by the evening of Thursday, August 22, 2024, if you are still unable to register
  • If you are officially enrolled in the class
    • As a courtesy to others, decide whether or not to stay by the evening of Wednesday, August 21, 2024

Motivating example: Kernel PCA

(Figure: Kin et al., Genome Informatics, 2002)
  • tRNA (transfer RNA): plays a key role in assembling the amino acid sequences of proteins (source)
    • G G G G A A T T A G C T C A A G C G G T A G A G C G …
  • Challenge: compare, classify, analyze, visualize sequences
    • Example: Kernel Principal Component Analysis
    • Dataset of \(n\) tRNA sequences \(\set{\bfx_i}_{i=1}^n\)
  • A lot is happening behind the scenes
    1. Sequences \(\bfx\in\calS\) are embedded in a Hilbert space \(\calH\) (dimension \(N\gg 1\)) via a feature map \[ \Phi:\calS\to\calH:\bfx\mapsto\Phi(\bfx) \]

    2. Approximate the embedded sequences by points in a low-dimensional (\(d\)-dimensional) affine subspace

      \[\argmin_{\bfmu,\bfA,\bftheta_i}\sum_{i=1}^n\norm[2]{\Phi(\bfx_i)-\bfmu-\bfA\bftheta_i}^2 \text{ with }\bfA\in\bbR^{N\times d}\]

  • How do we choose \(\Phi\)? What is the computational complexity? How do we find \(\bfA\), \(\bfmu\), \(\bftheta_i\)?
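For a fixed feature map \(\Phi\), the minimization in step 2 has a classical closed-form solution if we additionally require \(\bfA\) to have orthonormal columns (this removes the scaling ambiguity between \(\bfA\) and \(\bftheta_i\)): \(\bfmu\) is the sample mean of the \(\Phi(\bfx_i)\), the columns of \(\bfA\) are the top \(d\) principal directions of the centered data, and \(\bftheta_i=\bfA^\top(\Phi(\bfx_i)-\bfmu)\). The Python/NumPy sketch below illustrates this, together with the kernel trick behind kernel PCA: the same coordinates can be recovered from the \(n\times n\) Gram matrix of inner products alone. It is a sketch under stated assumptions, not the method of Kin et al.: the k-mer count feature map `kmer_features` is a hypothetical stand-in for \(\Phi\), the function names are illustrative, and the data are random toy sequences plus the truncated fragment shown above.

```python
import itertools

import numpy as np


def kmer_features(seq, k=3, alphabet="ACGT"):
    """Hypothetical feature map Phi: count of each length-k substring (k-mer) in seq."""
    kmers = ["".join(p) for p in itertools.product(alphabet, repeat=k)]
    index = {kmer: j for j, kmer in enumerate(kmers)}
    phi = np.zeros(len(kmers))  # N = len(alphabet) ** k coordinates
    for start in range(len(seq) - k + 1):
        j = index.get(seq[start:start + k])
        if j is not None:  # skip k-mers with symbols outside the alphabet
            phi[j] += 1.0
    return phi


def pca_fit(Phi, d):
    """Minimize sum_i ||Phi[i] - mu - A theta_i||^2 with A having orthonormal columns."""
    mu = Phi.mean(axis=0)
    centered = Phi - mu
    # The top-d right singular vectors of the centered data span the optimal subspace.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    A = Vt[:d].T  # shape (N, d)
    Theta = centered @ A  # row i is theta_i = A^T (Phi(x_i) - mu)
    return mu, A, Theta


def kernel_pca_scores(K, d):
    """Same coordinates (up to sign) using only the Gram matrix K[i, j] = <Phi(x_i), Phi(x_j)>."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n  # centering matrix: J @ Phi subtracts the mean row
    Kc = J @ K @ J  # Gram matrix of the centered features
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    lam = eigvals[::-1][:d]
    U = eigvecs[:, ::-1][:, :d]
    return U * np.sqrt(np.maximum(lam, 0.0))  # matches U_d S_d from the SVD, up to sign


# Toy data: random sequences (hypothetical) plus the truncated fragment from the slide.
rng = np.random.default_rng(0)
seqs = ["".join(rng.choice(list("ACGT"), size=30)) for _ in range(20)]
seqs.append("GGGGAATTAGCTCAAGCGGTAGAGCG")

Phi = np.array([kmer_features(s) for s in seqs])  # n x N = 21 x 64
mu, A, Theta = pca_fit(Phi, d=2)
Theta_k = kernel_pca_scores(Phi @ Phi.T, d=2)  # kernel route: only inner products are used
print(np.allclose(Theta @ Theta.T, Theta_k @ Theta_k.T))
```

The comparison uses \(\Theta\Theta^\top\) because each principal coordinate is only defined up to sign. In practice the Gram matrix would be computed directly from a kernel function \(k(\bfx,\bfx')=\langle\Phi(\bfx),\Phi(\bfx')\rangle\), so the eigenproblem is \(n\times n\) regardless of \(N\); here it is formed from explicit features only so that the two routes can be checked against each other.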

Mathematical Foundations of Machine Learning

  • Representations

    • How do we represent signals and operators for data analysis?
    • Linear models (and why they matter so much)
  • Models

    • How do we model datasets?
  • Estimation

    • How do we estimate parameters?
  • Computing

    • How do we run algorithms for machine learning?
    • Optimization (gradient descent)

What to expect in ECE 7750

(Figure: https://xkcd.com/1838)
  • ECE 7750 is about the mathematical foundations of machine learning

    • We will talk a lot about probability and linear algebra
    • We will prove a lot of things formally (theorems, lemmas)
    • We will not develop cool apps based on deep neural networks
    • Exams and homework will have theoretical components
  • All that being said…

    • We will also use simulations to understand concepts
    • We will talk about machine learning
    • Homework will have an experimental component (Python required)
    • ECE 7750 is a fun course and you will learn a lot of useful concepts
  • If you're unsure about taking the class, the self-assessment is here to help!

  • ECE 7750 will give you a solid background to self-study or to take other ML courses at GT

Logistics

  • Class time and venue: Monday and Wednesday 3:30pm-4:45pm
    • In-person live course
    • Lectures streamed live and recorded for asynchronous viewing (DL and on-campus students)
  • Instructor: Prof. Matthieu Bloch
  • Email: matthieu.bloch@ece.gatech.edu
  • Office: TSRB 437 (appointments only)
  • Office hours: to be announced (mixed online/on-campus)
  • Teaching assistants: Jack Hill

Knack tutoring

  • Students looking for additional assistance outside of the classroom are advised to consider working with a peer tutor through Knack.
  • Georgia Tech has partnered with Knack to provide students access to verified peer tutors who have previously aced this course.
  • This is a pilot program; I have never used Knack myself
    • Please give me your feedback!

Electronic communication policy

(Figure: https://xkcd.com/1873)
  • General guidelines
    • Email the Dean of Students if your personal situation requires special academic consideration
    • Use Piazza for technical questions
      • You can be anonymous to your peers, not to the instructors
      • You can use \(\LaTeX\) (\(\min_\beta\Vert y-X\beta\Vert_2^2\))
    • Be courteous in your electronic interactions
      • Avoid judgmental language, e.g., "The answer is obvious", "This is trivial."
      • Try to be constructive
      • Avoid typos and use correct syntax
  • If you really have to email me
    • Include `[ECE 7750]` in the subject of the email
    • I usually reply reasonably fast
    • If you email the TAs, cc me

Pop Quiz: Question 1

(Figure: AI generated with Copilot)
  • You are not feeling well and you cannot turn in your homework on time. Who do you contact?
    • (a) The Dean
    • (b) The Dean of Students
    • (c) The School Chair
    • (d) The Instructor

Pop Quiz: Question 2

(Figure: AI generated with Copilot)
  • You are facing personal challenges that may affect your semester. Who do you contact?
    • (a) The Dean
    • (b) The Dean of Students
    • (c) The School Chair
    • (d) The Instructor

Pop Quiz: Question 3

(Figure: AI generated with Copilot)
  • You are not happy with your midterm grade. Who do you contact?
    • (a) The Dean
    • (b) The Dean of Students
    • (c) The School Chair
    • (d) The Instructor

Writing

  • Be extra careful with written communication
    • Proper greeting and proper closing
    • Keep it short (about 3 lines)
    • Make your request clear

Grading

  • Self-assessment and assignments (50%)
    • Self-assessment is here to help you decide whether ECE 7750 is right for you
      • Review of concepts from calculus, linear algebra, probability theory, and programming.
      • Open-book/internet test
    • Self-assessment: 2% (out of this 50%) for submission
    • Assignments
      • Due approximately every 10 days (about 8 assignments, subject to updates).
      • Both mathematical and programming problems.
      • You are encouraged to typeset in \(\LaTeX\), but do not waste too much time on it.
      • Allow enough time to submit on Gradescope
      • The maximum number of homework points that you can earn is \((N-1)\times 100\), where \(N\) is the number of assignments
  • Midterm exam (25%)
    • Wednesday October 9, 2024 3:30pm-4:45pm in class
  • Final exam (25%)
    • Friday December 6, 2024 2:40pm-5:30pm

Assignments policy

(Figure: https://xkcd.com/1658)
  • Two stage deadline policy
    • Soft official deadline with 2% bonus (conditions apply, read the fine print)
    • Hard deadline 48 hours after the soft deadline; no late homework accepted after the hard deadline
  • Abide by the Georgia Tech honor code
    • Reference all your sources
    • Do not plagiarize other sources (python code, homework solutions, etc.)
    • Do not upload course material on other websites
    • Use of Generative AI without acknowledgment is plagiarism
    • When in doubt regarding what constitutes plagiarism, ask!
  • Assignments are individual but light collaboration permitted and encouraged
    • Piazza is here for that purpose
    • Small study groups are ok

Final thoughts

  • I believe in accountability, integrity, and fairness
    • I will hold you to the same standards
    • I trust you by default; trust is easily lost and not easily regained
  • Don't be shy and don't hesitate to talk to me!
    • I don't negotiate grades, but I’m here to help you learn
  • I value your feedback: I use it to make improvements
  • To be effective in ECE 7750 you should:
    1. Come to class and leave all distractions behind
    2. Be disciplined and complete reading and writing assignments on schedule
    3. Come to office hours if you have questions
    4. Enjoy the learning process, including the necessary struggles with the assignments