ECE 6605 - Information Theory

Prof. Matthieu Bloch

Thursday August 24, 2023

Announcements

  • Make sure you're familiar with the course logistics
    • Canvas: read the syllabus
    • Piazza: make sure you register, this is not automatic
    • Gradescope: make sure you log in to try it
  • Office hours
    • 11am in TSRB 423 - might have to move at times for seminars
    • Starting next week
  • Registration
    • End Friday 4pm
    • 42 students thus far (probably means no TA for the class)

Last time…

  • Key takeaway
    • Entropy is a mathematical quantity with operational relevance
    • Operational = solution to an engineering question
    • We will show that many information metrics have such operational relevance!
  • Today
    • More on notation
    • Useful mathematical tools
    • Entropy

Notation

  • Sets, vectors, matrices, functions
    • Real line \(\mathbb{R}\)
    • Non-negative integers \(\mathbb{N}\)
    • Open and closed intervals on the real line: for \(x<y\in\mathbb{R}\), \(]x;y]\eqdef\set{z\in\mathbb{R}:x<z\leq y}\)
    • Set of integers: for \(n<m\in\mathbb{N}\), \([n;m]\eqdef \set{k\in\mathbb{N}:n\leq k \leq m}\)
    • Vector \(\mathbf{x}\in\mathbb{R}^n\): \(\mathbf{x}\eqdef \set{x_1,x_2,\dots,x_n}\)
    • \(x^{i:j}\eqdef\set{x_k:i\leq k\leq j}\) so that \(x^{1:n}\eqdef \mathbf{x}\)
    • Matrices: \(\bfA\in\bbR^{n\times m}\) (context will prevent confusion with random vectors)
    • Real-valued function \(f:\mathbb{R}\to\mathbb{R}:x\mapsto f(x)\)
  • Random variables
    • Random variable \(X\) taking values in alphabet \(\calX\) with realization \(x\).
    • Random vector \(\mathbf{X}=\set{X_i}_{i=1}^n\)
    • Assume \(\card{\calX}<\infty\) unless stated otherwise
    • Probability simplex on \(\calX\): \(\Delta(\calX)\eqdef\set{\mathbf{p}\in[0;1]^{\card{\calX}}: \sum_{x\in\calX}p_x=1}\)
    • Probability mass function of \(X\in\calX\) is \(P_X\) (\(\P{X=x}\eqdef P_X(x)=p_x\))
    • Cumulative distribution function: \(F_X(x)\eqdef \P{X\leq x}\)
    • Support of \(X\) is \(\text{supp}(X)\eqdef \set{x\in\calX:P_X(x)>0}\) (illustrated in the sketch after this list)
    • What about continuous random variables? We'll denote the probability density function by \(p_X\)
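
A minimal Python sketch of this notation (the three-symbol alphabet and the PMF values are made up): it represents a PMF as a dictionary, checks membership in the simplex \(\Delta(\calX)\), and computes the support.

```python
# A made-up PMF on the finite alphabet {a, b, c}, stored as a dictionary.
P_X = {"a": 0.5, "b": 0.5, "c": 0.0}

# Membership in the probability simplex Delta(X): entries in [0, 1] summing to 1.
in_simplex = all(0.0 <= p <= 1.0 for p in P_X.values()) and abs(sum(P_X.values()) - 1.0) < 1e-12

# Support of X: symbols carrying strictly positive probability mass.
support = {x for x, p in P_X.items() if p > 0}

print(in_simplex)  # True
print(support)     # {'a', 'b'}
```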

Independence, expectation

Let \(X\in\calX\) and \(Y\in\calY\)

  1. The joint distribution of \(X\) and \(Y\) is \(P_{XY}\), the PMF of the pair \((X,Y)\)
  2. The distribution of \(X\) is the marginal \(P_X(x)=\sum_{y}P_{XY}(x,y)\)
  3. The conditional distribution of \(Y\) given \(X\) is \(P_{Y|X}(y|x)\eqdef P_{XY}(x,y)/P_X(x)\) if \(P_X(x)>0\)

\[ P_{Y|X}(y|x) = P_{X|Y}(x|y)\frac{P_{Y}(y)}{P_X(x)} \]

Two random variables \(X\in\calX\) and \(Y\in\calY\) are independent if and only if \(P_{XY}=P_XP_Y\)

For \(f:\mathbb{R}\to\mathbb{R}\), \(f(X)\) is the random variable such that \(P_{f(X)}(y)\eqdef \sum_{x:f(x)=y}P_X(x)\) \[ \E{f(X)}\eqdef \sum_{x}f(x)P_X(x) \]
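
A minimal Python sketch of these definitions on a made-up joint PMF: it computes the marginals, a conditional distribution via \(P_{Y|X}=P_{XY}/P_X\), checks independence, and evaluates \(\E{f(X)}\) for the arbitrary choice \(f(x)=x^2\).

```python
# Made-up joint PMF P_XY on {0,1} x {0,1}, keyed by (x, y).
P_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

# Marginals: P_X(x) = sum_y P_XY(x, y) and P_Y(y) = sum_x P_XY(x, y).
P_X = {x: sum(p for (xx, _), p in P_XY.items() if xx == x) for x in (0, 1)}
P_Y = {y: sum(p for (_, yy), p in P_XY.items() if yy == y) for y in (0, 1)}

# Conditional distribution P_{Y|X}(y|x) = P_XY(x, y) / P_X(x) when P_X(x) > 0.
P_Y_given_X = {(y, x): P_XY[(x, y)] / P_X[x] for x in (0, 1) for y in (0, 1) if P_X[x] > 0}

# Independence check: does P_XY(x, y) = P_X(x) P_Y(y) for every pair?
independent = all(abs(P_XY[(x, y)] - P_X[x] * P_Y[y]) < 1e-12 for x in (0, 1) for y in (0, 1))

# Expectation of f(X) with f(x) = x**2.
E_f_X = sum((x ** 2) * p for x, p in P_X.items())

print(P_X)                  # {0: 0.5, 1: 0.5}
print(P_Y_given_X[(1, 0)])  # P_{Y|X}(1|0) = 0.1 / 0.5 = 0.2
print(independent)          # False: P_XY(0,0) = 0.4 != P_X(0) * P_Y(0) = 0.3
print(E_f_X)                # 0.5
```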

Moments, Absolute continuity

Let \(X\) be a random variable. The \(m\)-th moment of \(X\) is \(\E{X^m}\)

The variance is the second central moment \(\text{Var}{X}\eqdef \E{(X-\E{X})^2}=\E{X^2}-\E{X}^2\)

Let \(X\) be a random variable and \(\calE\subset\mathbb{R}\). Then \[\E{\indic{X\in\calE}}=\P{X\in\calE}\]

For random variables \(X,Y\) and a function \(f:\mathbb{R}^2\to\mathbb{R}\), the law of total expectation states \[ \E[XY]{f(X,Y)} = \E[Y]{\E[X|Y]{f(X,Y)|Y}} \]

Let \(P,Q\in\Delta(\calX)\). We say that \(P\) is absolutely continuous with respect to \(Q\), denoted by \(P\ll Q\), if \(\text{supp}(P)\subseteq\text{supp}(Q)\).
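
A quick numerical sanity check of the indicator identity and of the law of total expectation on a made-up joint PMF, followed by a support-based test of absolute continuity between two made-up distributions.

```python
# Made-up joint PMF on {0,1} x {0,1}.
P_XY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
P_X = {x: P_XY[(x, 0)] + P_XY[(x, 1)] for x in (0, 1)}
P_Y = {y: P_XY[(0, y)] + P_XY[(1, y)] for y in (0, 1)}

# E[1{X in E}] = P(X in E) for the event E = {1}.
lhs = sum(p for x, p in P_X.items() if x in {1})  # expectation of the indicator
rhs = P_X[1]                                      # probability of the event
assert abs(lhs - rhs) < 1e-12

# Law of total expectation with f(x, y) = x * y: E[f(X,Y)] = E_Y[E[f(X,Y)|Y]].
f = lambda x, y: x * y
direct = sum(f(x, y) * p for (x, y), p in P_XY.items())
inner = {y: sum(f(x, y) * P_XY[(x, y)] / P_Y[y] for x in (0, 1)) for y in (0, 1)}
iterated = sum(inner[y] * P_Y[y] for y in (0, 1))
assert abs(direct - iterated) < 1e-12

# Absolute continuity P << Q: supp(P) contained in supp(Q).
P = {"a": 0.5, "b": 0.5, "c": 0.0}
Q = {"a": 0.3, "b": 0.3, "c": 0.4}
P_ll_Q = {x for x, p in P.items() if p > 0} <= {x for x, q in Q.items() if q > 0}
print(P_ll_Q)  # True: every symbol P charges is also in the support of Q
```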

Markov chain and conditional independence

Let \(X\), \(Y\), \(Z\) be discrete random variables with joint PMF \(P_{XYZ}\). Then \(X\), \(Y\), \(Z\) form a Markov chain in that order, denoted \(X - Y - Z\), if \(X\) and \(Z\) are conditionally independent given \(Y\), i.e., \[ \forall (x,y,z)\in\calX\times\calY\times\calZ \textsf{ we have }P_{XYZ}(x,y,z)= P_{Z|Y}(z|y) P_{Y|X}(y|x) P_X(x) \]
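
A quick numerical illustration, with made-up conditional distributions: build a joint PMF from the factorization above and verify that \(X\) and \(Z\) are indeed conditionally independent given \(Y\).

```python
import itertools

# Made-up building blocks of a Markov chain X - Y - Z on binary alphabets.
P_X = {0: 0.3, 1: 0.7}
P_Y_given_X = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.2, (1, 1): 0.8}  # keyed by (y, x)
P_Z_given_Y = {(0, 0): 0.6, (1, 0): 0.4, (0, 1): 0.5, (1, 1): 0.5}  # keyed by (z, y)

# Joint PMF from the chain factorization P_XYZ = P_X * P_{Y|X} * P_{Z|Y}.
P_XYZ = {(x, y, z): P_X[x] * P_Y_given_X[(y, x)] * P_Z_given_Y[(z, y)]
         for x, y, z in itertools.product((0, 1), repeat=3)}

# Conditional independence check: P_{XZ|Y}(x, z|y) = P_{X|Y}(x|y) * P_{Z|Y}(z|y).
P_Y = {y: sum(p for (_, yy, _), p in P_XYZ.items() if yy == y) for y in (0, 1)}
for x, y, z in itertools.product((0, 1), repeat=3):
    p_xz_y = P_XYZ[(x, y, z)] / P_Y[y]
    p_x_y = sum(P_XYZ[(x, y, zz)] for zz in (0, 1)) / P_Y[y]
    p_z_y = sum(P_XYZ[(xx, y, z)] for xx in (0, 1)) / P_Y[y]
    assert abs(p_xz_y - p_x_y * p_z_y) < 1e-12
print("conditional independence verified")
```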

Tail and concentration inequalities

  • For \(X\in\calX\), a tail inequality is a bound of the form \(\P{X\geq t}\leq f(t)\) for some \(t\in\bbR\) and some function \(f:\bbR\to\bbR^+\).

Let \(X\) be a non-negative real-valued random variable. Then, for all \(t>0\), Markov's inequality holds

\[\begin{align} \P{X\geq t}\leq \frac{\E{X}}{t}.\label{eq:MarkovInequality} \end{align}\]

  • For \(X \in \calX \subset \mathbb{R}\), consider \(\phi:\calX\rightarrow\mathbb{R}^+\) non-decreasing on \(\calX\) such that \(\E{\abs{\phi(X)}}<\infty\).

\[\begin{equation} \mathbb{P}[X \geq t] = \mathbb{E}[\indic{X \geq t}] = \mathbb{E}[{\indic{X \geq t}} \indic{\phi(X) \geq \phi(t)}] \leq \mathbb{P}[\phi(X) \geq \phi(t)] \leq \frac{\mathbb{E}[\phi(X)]}{\phi(t)}, \end{equation}\]

where the last step applies Markov's inequality to the non-negative random variable \(\phi(X)\) and requires \(\phi(t)>0\).

Let \(X \in \mathbb{R}\) have finite variance. Then, for all \(t>0\), Chebyshev's inequality holds:

\[\begin{align} \mathbb{P}[|X - \E{X}| \geq t] \leq \frac{\text{Var}{X}}{t^2}. \end{align}\]
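
A Monte Carlo sanity check of Markov's and Chebyshev's inequalities; the exponential distribution and the threshold \(t\) below are arbitrary choices made for illustration.

```python
import random

random.seed(0)
n = 200_000
# Samples from an Exponential(1) distribution, so E[X] = 1 and Var(X) = 1.
samples = [random.expovariate(1.0) for _ in range(n)]

t = 3.0
# Markov: P[X >= t] <= E[X] / t for non-negative X.
p_tail = sum(x >= t for x in samples) / n
print(p_tail, "<=", 1.0 / t)       # roughly 0.05 vs the bound 0.33...

# Chebyshev: P[|X - E[X]| >= t] <= Var(X) / t**2.
p_dev = sum(abs(x - 1.0) >= t for x in samples) / n
print(p_dev, "<=", 1.0 / t ** 2)   # roughly 0.02 vs the bound 0.11...
```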

Tail and concentration inequalities

Consider independent random variables \(X_i\) with \(\mathbb{E}[X_i] = 0\) and \(X_i \in [a_i,b_i]\). Let \(Y = \sum_{i=1}^n X_i\). Then, for all \(t>0\), Hoeffding's inequality holds:

\[\begin{align} \mathbb{P}\left[ Y \geq t \right]\leq \exp \left( - \frac{2 t^2}{\sum_{i=1}^n (b_i-a_i)^2} \right) \end{align}\]

  • This is a bit harder to prove, but it will be very useful
  • There are many variations of such bounds, you can find them in the literature, e.g.,
    • Stéphane Boucheron et al., Concentration Inequalities: A Nonasymptotic Theory of Independence, (2016)
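
A Monte Carlo comparison of the empirical tail of a sum of bounded zero-mean variables with the bound above; the choice of \(\pm 1\) variables, \(n\), and \(t\) is arbitrary and only meant for illustration.

```python
import math
import random

random.seed(1)
n, trials, t = 50, 100_000, 10.0

# X_i uniform on {-1, +1}: zero mean and bounded in [a_i, b_i] = [-1, 1].
def sum_of_signs():
    return sum(random.choice((-1, 1)) for _ in range(n))

empirical = sum(sum_of_signs() >= t for _ in range(trials)) / trials
bound = math.exp(-2 * t ** 2 / (n * (1 - (-1)) ** 2))  # exp(-2 t^2 / sum_i (b_i - a_i)^2)
print(empirical, "<=", bound)  # roughly 0.10 vs 0.37: the bound holds but is not tight here
```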

Convexity and Jensen’s inequality

A function \(f:[a,b] \to \mathbb{R}\) is convex if for all \(x_1,x_2\in[a,b]\) and all \(\lambda \in [0,1]\), \(f(\lambda x_1 + (1-\lambda)x_2 ) \leq \lambda f(x_1) + (1 - \lambda) f(x_2)\). A function \(f\) is strictly convex if the inequality above is strict whenever \(x_1\neq x_2\) and \(\lambda\in(0,1)\). A function \(f\) is (strictly) concave if \(-f\) is (strictly) convex.

Let \(X\) be a discrete random variable taking values in some interval \([a,b]\) and with PMF \(P_X\). Let \(f:[a,b]\rightarrow \mathbb{R}\) be a real-valued function that is convex on \([a,b]\). Then,

\[\begin{align*} f(\E{X})\leq \E{f(X)}. \end{align*}\]

For any strictly convex function, equality holds if and only if \(X\) is a constant. The result also holds more generally for continuous random variables.
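
For instance, taking \(f(x)=x^2\), which is strictly convex, Jensen's inequality recovers the non-negativity of the variance:

\[\begin{align*} \E{X}^2 = f(\E{X}) \leq \E{f(X)} = \E{X^2}, \qquad\text{i.e.,}\qquad \text{Var}{X}=\E{X^2}-\E{X}^2\geq 0, \end{align*}\]

with equality if and only if \(X\) is a constant.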

Let \(\{a_i\}_{i=1}^n\in\bbR_{+}^n\) and \(\{b_i\}_{i=1}^n\in\bbR_{+}^n\). Then, the log-sum inequality holds:

\[\begin{align*} \sum_{i=1}^n a_i\ln \frac{a_i}{b_i} \geq \left(\sum_{i=1}^n a_i\right)\ln\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}. \end{align*}\]
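
A quick numerical check of this log-sum inequality on made-up non-negative vectors:

```python
import math

# Arbitrary non-negative vectors a and b (values made up; b_i > 0 so the logs are finite).
a = [0.2, 0.5, 0.3]
b = [0.4, 0.4, 0.2]

lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * math.log(sum(a) / sum(b))
print(lhs, ">=", rhs)  # about 0.095 >= 0.0 here, since sum(a) == sum(b) == 1
```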

Entropy

  • Axiomatic construction of entropy \(H\)
    1. If \(m>n\in\mathbb{N}\), \(X\sim\calU([1;m])\), and \(Y\sim\calU([1;n])\), then \(H(X)>H(Y)\)
    2. If \(m,n\in\mathbb{N}\), \(X\sim\calU([1;m])\), \(Y\sim\calU([1;n])\), and \(X\) is independent of \(Y\), then \(H(X,Y)=H(X)+H(Y)\)
    3. Let \(\calA,\calB\subset\calX\) with \(\calA\cap\calB=\emptyset\) and \(\calA\cup\calB=\calX\); then \[ H(P_X) = H\big(\P{X\in\calA},\P{X\in\calB}\big) + \P{X\in\calA}H(P_{X|X\in\calA}) + \P{X\in\calB}H(P_{X|X\in\calB}) \]
    4. \(H(X)\) is a continuous function of \(p\) if \(X\sim B(p)\)

The only functions that satisfy these axioms are of the form \(H(X) = -C\sum_xP_X(x)\log P_X(x)\) for some constant \(C>0\).

Let \(X\in\calX\) be a discrete random variable with \(\card{\calX}<\infty\). The Shannon entropy of \(X\) is defined as

\[\begin{align} \mathbb{H}(X) \eqdef \E[X]{-\ln P_X(X)} = - \sum_{x \in {\calX}} P_X(x) \ln P_X(x), \end{align}\]

with the convention that \(0\ln 0 = 0\).
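
A minimal Python sketch of this definition, handling the convention \(0\ln 0 = 0\) explicitly; the example PMFs are made up.

```python
import math

def entropy(pmf):
    """Shannon entropy in nats, with the convention 0 * ln(0) = 0."""
    return -sum(p * math.log(p) for p in pmf.values() if p > 0)

print(entropy({"a": 0.5, "b": 0.5}))       # ln(2) ~ 0.693: fair binary choice
print(entropy({"a": 1.0, "b": 0.0}))       # 0.0: a constant random variable
print(entropy({x: 0.25 for x in "abcd"}))  # ln(4) ~ 1.386: uniform over 4 symbols
```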

Properties of entropy

Let \(X\in\calX\) be a discrete random variable. Then \(\mathbb{H}(X) \geq 0\) with equality iff \(X\) is a constant, i.e., there exists \(x^*\in\calX\) such that \(\P{X=x^*}=1\).

Let \(X \in {\calX}\) be a discrete random variable. Then \(\mathbb{H}(X)\leq \ln \card{\calX}\) with equality if and only if \(X\) is uniform over \({\calX}\), i.e., \(\forall x \in {\calX}, P_X(x) = \frac{1}{\card{\calX}}\).
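
For example, if \(X\sim B(p)\), then \(\mathbb{H}(X) = -p\ln p - (1-p)\ln(1-p)\): it equals \(0\) for \(p\in\{0,1\}\) (a constant \(X\)) and attains its maximum \(\ln 2 = \ln\card{\calX}\) at \(p=1/2\) (uniform \(X\)), consistent with the two bounds above.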

Joint and conditional entropy

The joint entropy of two discrete random variables \(X\in\calX\) and \(Y\in\calY\) with joint PMF \(P_{XY}\) is

\[\begin{align*} \mathbb{H}(X,Y) \eqdef \E[XY]{-\ln P_{XY}(X,Y)} = - \sum_{x \in {\calX}} \sum_{y \in \calY}P_{XY}(x, y) \ln P_{XY}(x, y). \end{align*}\]

Furthermore, the conditional entropy of \(Y\) given \(X\) is

\[\begin{align*} \mathbb{H}(Y|X) \eqdef \E[XY]{-\ln P_{Y|X}(Y|X)} = - \sum_{x \in {\calX}} \sum_{y \in \calY} P_{XY}(x,y) \ln P_{Y|X}(y|x). \end{align*}\]

Let \(X, Y\) be discrete random variables with joint PMF \(P_{XY}\). Then \(\mathbb{H}(Y|X) \geq 0\) with equality if and only if \(Y\) is a function of \(X\).
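
A small Python check of the last claim on two made-up joint PMFs: the conditional entropy vanishes when \(Y\) is a deterministic function of \(X\) and is strictly positive otherwise.

```python
import math

def conditional_entropy(P_XY):
    """H(Y|X) in nats, computed from a joint PMF given as {(x, y): probability}."""
    P_X = {}
    for (x, _), p in P_XY.items():
        P_X[x] = P_X.get(x, 0.0) + p
    # H(Y|X) = sum_{x,y} P_XY(x,y) ln( P_X(x) / P_XY(x,y) )
    return sum(p * math.log(P_X[x] / p) for (x, _y), p in P_XY.items() if p > 0)

# Y = f(X) (here Y = X): H(Y|X) should be 0.
deterministic = {(0, 0): 0.5, (1, 1): 0.5}
# Noisy relation between X and Y: H(Y|X) should be strictly positive.
noisy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

print(conditional_entropy(deterministic))  # 0.0
print(conditional_entropy(noisy))          # about 0.50 nats
```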