# Mathematical Foundations of Machine Learning

Monday, August 23, 2021

## Motivating example: Kernel PCA

• Challenge: compare, classify, analyze, visualize sequences

• Example: Kernel Principal Component Analysis to identify clusters
• Dataset of tRNA sequences $\set{\bfx_i}_{i=1}^n$
• Lots happening behind the scenes

1. Sequences $\bfx\in\calS$ are embedded in a Hilbert space $\calH$ of dimension $N\gg 1$ via a feature map $\Phi:\calS\to\calH:\bfx\mapsto\Phi(\bfx)$

2. Approximate the embedded sequences in a low-dimensional affine subspace of dimension $d$

$\argmin_{\bfmu,\bfA,\bftheta_i}\sum_{i=1}^n\norm[2]{\Phi(\bfx_i)-\bfmu-\bfA\bftheta_i}^2 \text{ with }\bfA\in\bbR^{N\times d},\ \bftheta_i\in\bbR^d$

• How do we choose $\Phi$? What is the computational complexity? How do we find $\bfA$, $\bfmu$, $\bftheta_i$?
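To make these questions concrete, here is a minimal sketch in Python/NumPy. It assumes a k-mer count ("spectrum") feature map for $\Phi$, which is one possible choice and not necessarily the one used in the lecture. The sequences are made-up toy strings, not tRNA data, and all function names are hypothetical. The sketch computes one minimizer of the least-squares problem in two ways. The first works directly in $\bbR^N$: $\bfmu$ is the sample mean, the columns of $\bfA$ are the top $d$ principal directions, and $\bftheta_i=\bfA^\intercal(\Phi(\bfx_i)-\bfmu)$. The second uses only the centered $n\times n$ Gram matrix (the kernel trick).

```python
import itertools

import numpy as np


def kmer_features(seq, k=3, alphabet="ACGU"):
    """Hypothetical feature map Phi: count of each length-k substring, so N = len(alphabet)**k."""
    index = {"".join(p): j for j, p in enumerate(itertools.product(alphabet, repeat=k))}
    phi = np.zeros(len(index))
    for i in range(len(seq) - k + 1):
        phi[index[seq[i:i + k]]] += 1.0
    return phi


def pca_explicit(Phi, d):
    """One minimizer of sum_i ||Phi(x_i) - mu - A theta_i||^2, computed in R^N."""
    mu = Phi.mean(axis=0)              # optimal offset: the sample mean
    Z = Phi - mu                       # centered features, shape (n, N)
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    A = Vt[:d].T                       # shape (N, d), orthonormal columns
    theta = Z @ A                      # theta_i = A^T (Phi(x_i) - mu), shape (n, d)
    return mu, A, theta


def kernel_pca(K, d):
    """Same coordinates theta_i, using only the n x n Gram matrix K_ij = <Phi(x_i), Phi(x_j)>."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    Kc = J @ K @ J                         # Gram matrix of the centered features
    eigvals, eigvecs = np.linalg.eigh(Kc)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:d]    # indices of the d largest eigenvalues
    # For each top eigenpair (lam, u), the coordinates along that direction are sqrt(lam) * u.
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Made-up toy RNA strings (not real tRNA data).
seqs = ["GCGGAUUUAGCUCAG", "GCGGAUUUAGCUCAGU", "UCCGUGAUAGUUUAA",
        "UCCGUGAUAGUCUAA", "GGGCUAUUAGCUCAG"]
Phi = np.array([kmer_features(s) for s in seqs])  # shape (5, 64) for k = 3
d = 2
mu, A, theta_explicit = pca_explicit(Phi, d)
# Here K is built from the explicit features only to compare both routes;
# a kernel function would normally evaluate these inner products directly.
theta_kernel = kernel_pca(Phi @ Phi.T, d)
```

The two routes agree up to the sign of each column of coordinates, so the inner products $\bftheta_i^\intercal\bftheta_j$ coincide. The kernel route only handles the $n\times n$ Gram matrix, whereas the explicit route manipulates $N$-dimensional vectors. This is one answer to the complexity question when $N\gg n$.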

## Mathematical Foundations of Machine Learning

• Representations

• How do we represent signals and operators for data analysis?
• Linear models (and why they matter so much)
• Models

• How do we model datasets?
• Estimation

• How do we estimate parameters?
• Computing

• How do we run algorithms for machine learning?

## What to expect in ECE 7750

• ECE 7750 is about the mathematical foundations of machine learning

• We will talk a lot about probability and linear algebra
• We will prove a lot of things formally (theorems, lemmas)
• We will not develop cool apps based on Deep Neural Nets
• Exams and homework will have theoretical components
• All that being said…

• We will also use simulations to understand concepts
• We will talk about machine learning
• Homework will have an experimental component (Python required)
• ECE 7750 is a fun course and you will learn a lot of useful concepts
• If you’re unsure about taking the class, the self-assessment is here to help!

• ECE 7750 will give you a solid background to self-study or take other ML courses at GT

## Logistics (the easy part)

• Class time and venue: Monday and Wednesday 3:30pm-4:45pm
• In-person live course
• Lectures recorded for asynchronous viewing (DL and on-campus), plus synchronous BlueJeans sessions

• Instructor: Prof. Matthieu Bloch
• Email: matthieu.bloch@ece.gatech.edu
• Office: TSRB 441 (appointments only)
• Office hours: to be announced (mixed online/on-campus)
• Teaching assistants: being finalized

• Websites
• Canvas: for assignment posting and submission
• Piazza: for Q&A (register via the link on Canvas)

## Logistics (the hard part)

• We are officially back in-person (no social distancing, etc.)

• Official Institute policy at Tech Moving Forward

• We encourage everyone in the Georgia Tech community to follow the Centers for Disease Control and Prevention’s (CDC) recommendations, vaccinate, and wear a mask in campus buildings.
• Stamps Health Services is offering free Covid-19 vaccines in August and September.
• Asymptomatic testing on campus is easy, convenient and free

## Electronic communication policy

• General guidelines
• Email the Dean of Students if your personal situation requires special academic consideration
• Use Piazza for technical questions
• You can be anonymous to your peers, not to the instructors
• You can use $\LaTeX$ ($\min_\beta\Vert y-X\beta\Vert_2^2$)
• Be courteous in your electronic interactions
• Avoid judgmental language, e.g., “The answer is obvious.”
• Try to be constructive
• Avoid typos and use correct syntax
• If you really have to email me
• Include [ECE 7750] in the subject of the email
• I usually reply reasonably fast
• If you email the TAs, cc me

## Grading

• Self-assessment and assignments (50%)
• Self-assessment is here to help you decide whether ECE 7750 is right for you
• Review of concepts from calculus, linear algebra, probability theory, and programming.
• Open-book, open-internet test
• Assignments
• Due approximately every week (~10-16 assignments overall).
• Both mathematical and programming problems.
• You are encouraged to typeset in $\LaTeX$, but do not waste time.
• Allocate time to submit on Gradescope
• Midterm exams (2 × 15%)
• Take-home exams, 24 to 48 hours to complete
• Final exam (20%)
• Take-home exam, 48 hours to complete

## Assignments policy

• Soft official deadline with 2% bonus (conditions apply, read the fine print)
• Abide by the Georgia Tech honor code

• Do not plagiarize other sources (Python code, homework solutions, etc.)
• Do not upload course material on other websites
• When in doubt regarding what constitutes plagiarism, ask!
• Assignments are individual but light collaboration permitted and encouraged

• Piazza is here for that purpose
• Small study groups are ok

## Final thoughts

• I believe in accountability, integrity, and fairness

• I will hold you to the same standards
• I trust you by default; trust is easily lost and not easily regained
• Don’t be shy and don’t hesitate to talk to me!

• I have little bandwidth for whining and complaining but I’m usually friendly
• I value your feedback; I use it to make improvements

• Aspects students least liked about the course

• “The lack of examples during lecture. Going from theory to homework problems was very difficult for me.”
• “It was frustrating to be able to download the raw slides (in order to watch the lecture and take notes), but then not know when the annotated slides would be uploaded.”
• “Each assignment consisted of numerous questions of high difficulty.”
• “The lack of advanced materials on learning.”
• “The class uses proofs too soon. Proofs are the highest level of understanding of a concept, and aren’t a good introduction into a concept.”
• “I also wish there’d be at least audio from our side (if not video)”