Probabilities - Markov, Chernoff, Hoeffding
Friday, August 27, 2021
Probability theory is not that old: its axiomatic foundations were laid out by Kolmogorov in 1933.
We will discuss in Review session #3 (Monday September 13, 2021) why we need to deal with \(\calF\) instead of \(\Omega\).
It really matters to do things properly, but we can get away without it for now.
Let \((\Omega, \calF, \mathbb{P})\) be a probability space. For any events \(\set{\calA_i}_{i\geq 1}\) we have \(\P{\cup_{i\geq 1}\calA_i}\leq \sum_{i\geq 1}\P{\calA_i}\)
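As a quick sanity check of this union bound, a short simulation sketch comparing \(\P{\cup_i\calA_i}\) to \(\sum_i\P{\calA_i}\) (the three events on a fair die are my own illustrative choices, not from the lecture):

```python
# Minimal sketch: Monte Carlo check of the union bound on a fair die.
import random

random.seed(0)
N = 100_000
events = [
    lambda x: x <= 2,       # A_1 = {1, 2},     P = 2/6
    lambda x: x % 2 == 0,   # A_2 = {2, 4, 6},  P = 3/6
    lambda x: x >= 5,       # A_3 = {5, 6},     P = 2/6
]

union_hits = 0
hits = [0] * len(events)
for _ in range(N):
    x = random.randint(1, 6)               # one fair die roll
    flags = [A(x) for A in events]
    union_hits += any(flags)
    for i, f in enumerate(flags):
        hits[i] += f

p_union = union_hits / N                    # ~ 5/6 (the union is {1,2,4,5,6})
sum_p = sum(h / N for h in hits)            # ~ 7/6
print(f"P(union) ~ {p_union:.3f} <= sum P(A_i) ~ {sum_p:.3f}")
```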
Let \((\Omega, \calF, \mathbb{P})\) be a probability space. If \(\P{\calB}>0\), the conditional probability of event \(\calA\) given event \(\calB\) is given by \(\P{\calA|\calB} = \P{\calA\cap\calB}/\P{\calB}\).
Let \((\Omega, \calF, \mathbb{P})\) be a probability space. Event \(\calA\) is independent of event \(\calB\) if \(\P{\calA\cap\calB} = \P{\calA}\P{\calB}\).
For \(n>2\), the events \(\set{\calA_i}_{i=1}^n\) are independent if \(\forall\calS\subseteq\intseq{1}{n}\) such that \(\card{\calS}\geq 2\), \[\P{\cap_{i\in\calS}\calA_i}=\prod_{i\in\calS}\P{\calA_i}\]
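Checking the pairs alone is not enough. A small sketch (the classic two-coin example, my illustration rather than the lecture's) with events that are pairwise independent but fail the product condition for \(\calS=\set{1,2,3}\):

```python
# Sketch: two fair coin flips; A = "first is heads", B = "second is heads",
# C = "the flips agree". Each pair satisfies P(X and Y) = P(X)P(Y), but the
# triple does not, so the three events are not (mutually) independent.
from itertools import product

omega = list(product([0, 1], repeat=2))          # uniform probability space
def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] == 1
B = lambda w: w[1] == 1
C = lambda w: w[0] == w[1]

for X, Y in [(A, B), (A, C), (B, C)]:
    print(prob(lambda w: X(w) and Y(w)), prob(X) * prob(Y))   # 0.25 == 0.25

print(prob(lambda w: A(w) and B(w) and C(w)),                 # 0.25
      prob(A) * prob(B) * prob(C))                            # 0.125
```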
Let \((\Omega, \calF, \mathbb{P})\) be a probability space and \(X\) a random variable. The CDF of \(X\) is the function \[F_X:\bbR\to[0,1]:x\mapsto \P{\omega\in\Omega:X(\omega)\leq x}\eqdef \P{X\leq x}\]
If \(\calX\) is finite or countable, the random variable is discrete. We can write \(\calX=\set{x_i}_{i=1}^{\card{\calX}}\), and \(P_X(x_i)\eqdef\P{X=x_i}\) is called the probability mass function (PMF) of \(X\).
If the CDF of \(X\) has a finite derivative at \(x\), that derivative \(p_X(x)\eqdef F_X'(x)\) defines the probability density function (PDF) of \(X\). If \(F_X\) has a derivative at every \(x\in\bbR\), \(X\) is continuous.
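As an illustrative sketch (a fair die, my own example), the CDF of a discrete random variable can be computed directly from its PMF:

```python
# Sketch: the CDF of a fair six-sided die, built from its PMF P_X(x_i) = 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

def cdf(x):
    # F_X(x) = P(X <= x): sum the PMF over support points below or at x
    return sum(p for xi, p in pmf.items() if xi <= x)

for x in [0.5, 1, 3.7, 6, 10]:
    print(f"F_X({x}) = {cdf(x):.3f}")
# F_X is a step function: 0 below 1, jumps by 1/6 at each support point,
# and equals 1 for every x >= 6.
```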
We often don’t need to specify \((\Omega, \calF, \mathbb{P})\). All we need is a CDF (or PMF or PDF)
Let \(X\) be a random variable with PMF \(P_X\). Then \(\E{X}\eqdef\sum_{x\in\calX}x P_X(x)\).
Let \(X\) be a random variable with PDF \(p_X\). Then \(\E{X}\eqdef\int_{x\in\calX}x p_X(x)dx\).
The expectation of a function \(f\) of a discrete \(X\) is \(\E{f(X)} = \sum_{x\in\calX} f(x)P_X(x)\) (and similarly with \(p_X\) in the continuous case).
Let \(X\) be a random variable. The \(m\)th moment of \(X\) is \(\E{X^m}\).
The variance is the second centered moment \(\Var{X}\eqdef \E{(X-\E{X})^2}=\E{X^2}-\E{X}^2\).
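A small sketch (again a fair die, my illustrative choice) checking the identity \(\Var{X}=\E{X^2}-\E{X}^2\) numerically:

```python
# Sketch: checking Var(X) = E[X^2] - E[X]^2 on a fair six-sided die.
pmf = {x: 1 / 6 for x in range(1, 7)}

def expect(f):
    # E[f(X)] = sum over the support of f(x) * P_X(x)
    return sum(f(x) * p for x, p in pmf.items())

mean = expect(lambda x: x)                 # E[X]   = 3.5
second = expect(lambda x: x ** 2)          # E[X^2] = 91/6
var = expect(lambda x: (x - mean) ** 2)    # centered second moment
print(var, second - mean ** 2)             # both equal 35/12 ~ 2.917
```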
Let \(X\) be a random variable and \(\calE\subset\bbR\). Then \[\E{\indic{X\in\calE}}=\P{X\in\calE}\]
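For instance, in the discrete case this follows directly from the expectation of a function of \(X\): \[\E{\indic{X\in\calE}}=\sum_{x\in\calX}\indic{x\in\calE}P_X(x)=\sum_{x\in\calX\cap\calE}P_X(x)=\P{X\in\calE}.\]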
11th commandment: thou shalt denote random variables by capital letters
12th commandment: but sometimes not
Let \(X\) be a non-negative real-valued random variable. Then for all \(t>0\) \[\P{X\geq t}\leq \frac{\E{X}}{t}.\]
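A simulation sketch of this bound for an Exponential(1) random variable (the distribution is an illustrative choice, not from the lecture):

```python
# Sketch: Markov's inequality for X ~ Exponential(1) (so E[X] = 1).
import random

random.seed(1)
N = 200_000
samples = [random.expovariate(1.0) for _ in range(N)]
mean = sum(samples) / N

for t in [1.0, 2.0, 4.0]:
    tail = sum(x >= t for x in samples) / N       # empirical P(X >= t)
    print(f"t={t}: P(X >= t) ~ {tail:.4f} <= E[X]/t ~ {mean / t:.4f}")
# The exact tail e^{-t} sits well below the Markov bound 1/t.
```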
Let \(X\) be a real-valued random variable. Then for all \(t>0\) \[\P{\abs{X-\E{X}}\geq t}\leq \frac{\Var{X}}{t^2}.\]
Let \(\{X_i\}_{i=1}^N\) be i.i.d. real-valued random variables with finite mean \(\mu\) and finite variance \(\sigma^2\). Then \[\P{\abs{\frac{1}{N}\sum_{i=1}^N X_i-\mu}\geq\epsilon}\leq\frac{\sigma^2}{N\epsilon^2}\qquad\lim_{N\to\infty}\P{\abs{\frac{1}{N}\sum_{i=1}^N X_i-\mu}\geq \epsilon}=0.\]
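A sketch illustrating this weak law of large numbers on Uniform(0,1) samples (my illustrative choice of distribution):

```python
# Sketch: sample means of N Uniform(0,1) variables (mu = 1/2, sigma^2 = 1/12).
# Empirical deviation probability versus the Chebyshev bound sigma^2/(N eps^2).
import random

random.seed(2)
mu, sigma2 = 0.5, 1 / 12
eps, trials = 0.05, 2_000

for N in [10, 100, 1000]:
    bad = sum(
        abs(sum(random.random() for _ in range(N)) / N - mu) >= eps
        for _ in range(trials)
    )
    bound = min(sigma2 / (N * eps ** 2), 1)
    print(f"N={N}: empirical {bad / trials:.3f} <= bound {bound:.3f}")
# The deviation probability vanishes as N grows, as the weak law predicts.
```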
We can obtain much better bounds than with Chebyshev
Let \(\{X_i\}_{i=1}^N\) be independent real-valued zero-mean random variables such that \(X_i\in[a_i;b_i]\) with \(a_i<b_i\). Then for all \(\epsilon>0\) \[\P{\abs{\frac{1}{N}\sum_{i=1}^N X_i}\geq\epsilon}\leq 2\exp\left(-\frac{2N^2\epsilon^2}{\sum_{i=1}^N(b_i-a_i)^2}\right).\]
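To see how much better this is than Chebyshev, a sketch comparing the two bounds on the same deviation event (Uniform(\(-1,1\)) variables are my illustrative choice):

```python
# Sketch: Chebyshev vs Hoeffding for the sample mean of N independent
# Uniform(-1, 1) variables (zero mean, a_i = -1, b_i = 1, sigma^2 = 1/3).
import math

sigma2, eps = 1 / 3, 0.1

for N in [10, 100, 1000, 10_000]:
    chebyshev = min(sigma2 / (N * eps ** 2), 1)
    # sum_i (b_i - a_i)^2 = 4N, so the exponent is -2 N^2 eps^2 / (4N) = -N eps^2 / 2
    hoeffding = min(2 * math.exp(-N * eps ** 2 / 2), 1)
    print(f"N={N}: Chebyshev {chebyshev:.2e}  Hoeffding {hoeffding:.2e}")
# Hoeffding decays exponentially in N, Chebyshev only like 1/N.
```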