Matthieu Bloch
April 7, 2020
For a fixed hypothesis class \(\calH\) and dataset \(\calD\), the goal is to find a function \(h\in\calH\) that minimizes the true risk \(R(h)\)
Hope to approximate the Bayes classifier (classification) or underlying function (regression)
Regularization plays a similar role to restricting \(\calH\), by biasing the answer away from overly complex classifiers/functions
Complexity must be carefully limited to avoid overfitting
VC generalization bound \[R(h)\leq \widehat{R}_N(h)+\epsilon(\calH,N)\quad\textsf{w.h.p.}\]
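As a sketch of where such a bound comes from, the finite-class case (Hoeffding's inequality plus a union bound over \(|\calH|=M\) hypotheses) gives \(\epsilon=\sqrt{\ln(2M/\delta)/(2N)}\) with probability at least \(1-\delta\); a minimal numerical illustration (the function name is ours):

```python
import math

def generalization_gap(n_hypotheses, n_samples, delta=0.05):
    """Hoeffding + union bound for a finite class of size M:
    with prob. >= 1 - delta, R(h) <= R_hat(h) + sqrt(ln(2M/delta)/(2N))
    uniformly over all h in the class."""
    return math.sqrt(math.log(2 * n_hypotheses / delta) / (2 * n_samples))

# The gap shrinks as O(1/sqrt(N)) and grows only logarithmically with M.
print(generalization_gap(1000, 100))     # few samples: loose bound
print(generalization_gap(1000, 10000))   # many samples: tighter bound
```

The VC bound replaces \(M\) with the growth function of \(\calH\), which is what makes the bound meaningful for infinite classes.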
\[\E[\calD]{R(\hat{h}_{\calD})}=\sigma^2_{\varepsilon} + \E[X]{\Var{\hat{h}_{\calD}(X)}} + \E[X]{\textsf{Bias}(\hat{h}_\calD(X))^2}\] with \[\Var{\hat{h}_\calD(X)}\eqdef \E[\calD]{\left(\hat{h}_\calD(X)-\E[\calD]{\hat{h}_\calD(X)}\right)^2}\] \[\textsf{Bias}(\hat{h}_\calD(X))\eqdef \E[\calD]{\hat{h}_\calD(X)}-h(X)\]
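The key step behind the decomposition, for squared loss with \(Y=h(X)+\varepsilon\), \(\E{\varepsilon}=0\), \(\Var{\varepsilon}=\sigma^2_\varepsilon\), is expanding around \(\E[\calD]{\hat{h}_\calD(X)}\) (a sketch, at a fixed \(X\)):
\[\E[\calD]{\left(h(X)-\hat{h}_\calD(X)\right)^2} = \left(h(X)-\E[\calD]{\hat{h}_\calD(X)}\right)^2 + \E[\calD]{\left(\hat{h}_\calD(X)-\E[\calD]{\hat{h}_\calD(X)}\right)^2}\]
since the cross term \(2\left(h(X)-\E[\calD]{\hat{h}_\calD(X)}\right)\E[\calD]{\E[\calD]{\hat{h}_\calD(X)}-\hat{h}_\calD(X)}\) vanishes; the noise term \(\sigma^2_\varepsilon\) separates out the same way because \(\varepsilon\) is independent of \(\calD\).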
Intuition: role of bias and variance
Demo: learning a sinusoid
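A minimal Monte Carlo version of such a demo (our own sketch of the classic setup: fit many 2-point datasets drawn from \(f(x)=\sin(\pi x)\) with a constant and with a line, then average over datasets):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)      # target function (noiseless for simplicity)

n_datasets, n_points = 5000, 2       # many tiny training sets
x_grid = np.linspace(-1, 1, 200)     # evaluation grid for the E_X[.] averages

def fit_constant(x, y):
    """h(x) = b: best constant is the mean of the targets."""
    return np.full_like(x_grid, y.mean())

def fit_line(x, y):
    """h(x) = a x + b: least-squares line (exact through 2 points)."""
    a, b = np.polyfit(x, y, 1)
    return a * x_grid + b

results = {}
for name, fit in [("constant", fit_constant), ("line", fit_line)]:
    preds = np.empty((n_datasets, x_grid.size))
    for i in range(n_datasets):
        x = rng.uniform(-1, 1, n_points)
        preds[i] = fit(x, f(x))
    mean_pred = preds.mean(axis=0)                 # E_D[h_D(x)] on the grid
    bias2 = np.mean((mean_pred - f(x_grid))**2)    # E_X[Bias^2]
    var = np.mean(preds.var(axis=0))               # E_X[Var]
    results[name] = (bias2, var)
    print(f"{name:8s} bias^2 = {bias2:.2f}  var = {var:.2f}")
```

The constant model has high bias but low variance; the line fits each 2-point set exactly, so its bias is lower but its variance across datasets is much larger, illustrating the trade-off.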
Bias-variance decomposition is more useful as a conceptual tool than as a practical technique, since the true function and the expectation over datasets are unavailable in practice