The Mathematics of ECE

Probabilities - Markov, Chernoff, Hoeffding

Friday, August 27, 2021

Logistics

  • The Mathematics of ECE
    • Review sessions not tied to any class coverting probabilities, linear algebra, and real analysis
    • Objective: small curriculum but opportunity to ask anything!
    • No grade, no pressure, no stakes
  • Instructor
    • Brighton Ancelin, Ph.D. student in Machine Learning
    • Piazza for course: sign-up
  • Today’s class: Probabilities
    • Brief review of axiomatic construction
    • Markov’s inequality, Chebyshev, Weak law of large numbers
Image
https://xkcd.com/2370/

Probabilities: Axiomatic construction

  • Probabilities are not that old. The axiomatic theory was carried out by Kolmogorov in 1932

  • Let \(\Omega\neq\emptyset\) be a sample space. The class of subsets of \(\Omega\) that constitutes events satisfies the following axioms:
    1. \(\Omega\) is an event;
    2. For \(\{\calA_i\}_{i\geq 1}^\infty\) some events in \(\Omega\), \(\cup_{i=1}^\infty \calA_i\) is an event;
    3. For every event \(\calA\) in \(\Omega\), \(\calA^c\) is an event.
  • Let \(\Omega\neq\emptyset\) be a sample space and \(\calF\) a class of events satisfying the axioms for events. A probability rule is a function \(\mathbb{P}:\calF\to\bbR^+\) such that:
    1. \(\P{\Omega}=1\);
    2. For every \(\calA\subset\calF\) we have \(\P{\calA}\geq 0\);
    3. For any disjoint events in \(\calF\) \(\{\calA_i\}_{i=1}^\infty\) we have \(\P{\cup_{i=1}^\infty\calA_i}=\sum_{i=1}^\infty \P{\calA_i}\).
  • We will discuss in Review session #3 (Monday September 13, 2021) why we need to deal with \(\calF\) instead of \(\Omega\).

  • It really matters to do things properly but we can get away without it for now

key tool: The union bound

  • Let \((\Omega, \calF, \mathbb{P})\) be a probability space. For any events \(\set{\calA_i}_{i\geq 1}\) we have \(\P{\cup_{i\geq 1}\calA_i}\leq \sum_{i\geq 1}\P{\calA_i}\)

Probabilities: Conditional probability and independence

  • Let \((\Omega, \calF, \mathbb{P})\) be a probability space. If \(\P{\calB}>0\), the conditional probability of event \(\calA\) given event \(\calB\) is given by \(\P{\calA|\calB} = \P{\calA\cap\calB}/\P{\calB}\).

  • Let \((\Omega, \calF, \mathbb{P})\) be a probability space and \(\calA,\calB\) with non-zero probability. Then, \[\P{\calA|\calB} = \P{\calB|\calA}\frac{\P{\calA}}{\P{\calB}}.\]
  • Let \((\Omega, \calF, \mathbb{P})\) be a probability space. Event \(\calA\) is independent of event \(\calB\) if \(\P{\calA\cap\calB} = \P{\calA}\P{\calB}\).

    For \(n>2\), the events \(\set{\calA_i}_{i=1}^n\) are independent if \(\forall\calS\subset\intseq{1}{n}\) such that \(\card{\calS}\geq 2\), \[\P{\cap_{i\in\calS}\calA_i}=\prod_{i\in\calS}\P{\calA_i}\]

Random variables: Definition

  • Let \((\Omega, \calF, \mathbb{P})\) be a probability space. A random variable \(X\) is a function \(X:\Omega\to\bbR\).
    1. \(X\) might be undefined or infinite on a subset of zero probability.
    2. \(\set{\omega\in\Omega:X(\omega)\leq x}\) must be an event for all \(x\in\bbR\)
    3. For a finite set of random variables \(\set{X_i}_{i=1}^n\), the set \(\set{\omega:X_1(\omega)\leq x_1,\cdots,X_n(\omega)\leq x_n}\) must be an event for all \(\set{x_i}_{i=1}^n\)
  • Let \((\Omega, \calF, \mathbb{P})\) be a probability space and \(X\) a random variable. The CDF of \(X\) is the function \[F_X:\bbR\to[0,1]:x\mapsto \P{\omega\in\Omega:X(\omega)\leq x}\eqdef \P{X\leq x}\]

  • If \(\card{\calX}<\infty\) or countable, the random variable is discrete. We can write \(\calX=\set{x_i}_{i=1}^\card{\calX}\) and \(P_X(x_i)\eqdef\P{X=x_i}\) is called the probability mass function (PMF) of \(X\).

  • If the CDF of \(X\) has a finite derivative at \(x\), the derivative is called the probability density function (PDF), denoted by \(p_X\). If \(F_X\) has a derivative for every \(x\in\bbR\), \(X\) is continuous

  • We often don’t need to specify \((\Omega, \calF, \mathbb{P})\). All we need is a CDF (or PMF or PDF)

Random variable: Moments

  • Let \(X\) be a random variable with PMF \(P_X\). Then \(\E{X}\eqdef\sum_{x\in\calX}x P_X(x)\).

    Let \(X\) be a random variable with PDF \(p_X\). Then \(\E{X}\eqdef\int_{x\in\calX}x p_X(x)dx\).

  • Expectation of a function \(f\) of a discrete \(X\) is \(\E{f(X)} = \sum_{x\in\calX} f(x)P_X(x)\) (and idem for PDFs).

  • Let \(X\) be a random variable. The \(m\)th moment of \(X\) is \(\E{X^m}\).

    The variance is the second centered moment \(\Var{X}\eqdef \E{(X-\E{X})^2}=\E{X^2}-\E{X}^2\).

  • Let \(X\) be a random variable and \(\calE\subset\bbR\). Then \[\E{\indic{X\in\calE}}=\P{X\in\calE}\]

  • 11th commandment: thou shall denote random variables by capital letters

  • 12th commandment: but sometimes not

Concentration inequalities: Markov

  • Let \(X\) be a non-negative real-valued random variable. Then for all \(t>0\) \[\P{X\geq t}\leq \frac{\E{X}}{t}.\]

  • Let \(X\) be a real-valued random variable. Then for all \(t>0\) \[\P{\abs{X-\E{X}}\geq t}\leq \frac{\Var{X}}{t^2}.\]

  • Let \(\{X_i\}_{i=1}^N\) be i.i.d. real-valued random variables with finite mean \(\mu\) and finite variance \(\sigma^2\). Then \[\P{\abs{\frac{1}{N}\sum_{i=1}^N X_i-\mu}\geq\epsilon}\leq\frac{\sigma^2}{N\epsilon^2}\qquad\lim_{N\to\infty}\P{\abs{\frac{1}{N}\sum_{i=1}^N X_i-\mu}\geq \epsilon}=0.\]

Concentration inequalities: Hoeffding

  • We can obtain much better bounds than with Chebyshev

  • Let \(\{X_i\}_{i=1}^N\) be i.i.d. real-valued zero-mean random variables such that \(X_i\in[a_i;b_i]\) with \(a_i<b_i\). Then for all \(\epsilon>0\) \[\P{\abs{\frac{1}{N}\sum_{i=1}^N X_i}\geq\epsilon}\leq 2\exp\left(-\frac{2N^2\epsilon^2}{\sum_{i=1}^N(b_i-a_i)^2}\right).\]