ECE 6605 - Information Theory

Prof. Matthieu Bloch

Tuesday September 05, 2023

Announcements

  • Assignment 1
    • Check out the office hours from Friday September 01, 2023 on the Media Library
      • We talked about Problem 1
      • We solved Problem 3 to illustrate reasoning with entropy (problem will not be graded)
    • Deadline extended by 48 hours since you couldn't quite solve everything this weekend
    • Come to office hours on Friday September 08, 2023 11 am in TSRB 423
  • Last time
    • Jensen's inequality
    • Shannon entropy derived from \(f\)-divergence
    • Joint and conditional entropy
  • Today
    • Chain rule of entropy, mutual information
    • Fano's inequality
    • Renyi entropy
    • Entropy rate

Entropy

  • Axiomatic construction of entropy H for \(P_X\in\Delta(\calX)\)
    1. \(H(X)\) only depends on the set of values \(\{p_X(x)\}_{x\in\calX}\) but not on their order.
    2. for \(X\sim B(p)\) with \(p\in[0;1]\), \(H(X)\) is a continuous function of \(p\).
    3. if \(X\sim B(1/2)\), then \(H(X)=1\).
    4. If \(t\in[0;1]\) and \(\set{p_i}_{i=1}^n\in\Delta(\calX)\), then \(H(tp_1,(1-t)p_1,p_2,\cdots,p_n)=p_1H(t,1-t)+ H(p_1,p_2,\cdots,p_n)\)

The only function that satisfies the axioms is \(H(X) = -\sum_xP_X(x)\log_2 P_X(x)\).

Let \(X\in\calX\) be a discrete random variable with \(\card{\calX}<\infty\). The Shannon entropy of \(X\) is defined as

\[\begin{align} \mathbb{H}(X) \eqdef \E[X]{-\log_2 P_X(X)} = - \sum_{x \in {\calX}} P_X(x) \log_2 P_X(x), \end{align}\]

with the convention that \(0\log_2 0 = 0\).
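
As a quick sanity check of this definition, here is a minimal Python sketch; the PMF values and the helper name `shannon_entropy` are illustrative choices, not from the lecture.

```python
# Minimal sketch: Shannon entropy in bits, with the convention 0*log2(0) = 0.
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0                      # drop zero-probability symbols (0 log 0 = 0)
    return -np.sum(p[nz] * np.log2(p[nz]))

p = [0.5, 0.25, 0.125, 0.125]       # example PMF on an alphabet of size 4
print(shannon_entropy(p))           # 1.75 bits
```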

Properties of entropy

The entropy \(H:\Delta(\calX)\to\bbR:P\mapsto H(P)\) is a concave function.

Let \(X\in\calX\) be a discrete random variable. Then \(\mathbb{H}(X) \geq 0\) with equality iff \(X\) is a constant, i.e., there exists \(x^*\in\calX\) such that \(\P{X=x^*}=1\).

Let \(X \in {\calX}\) be a discrete random variable. Then \(\mathbb{H}(X)\leq \log_2 \card{\calX}\) with equality if and only if \(X\) is uniform over \({\calX}\), i.e., \(\forall x \in {\calX}, P_X(x) = \frac{1}{\card{\calX}}\).
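
The two bounds above are easy to see numerically; the snippet below (same illustrative `shannon_entropy` helper as before) evaluates a constant, a uniform, and a generic PMF on an alphabet of size 4.

```python
# Sketch: 0 <= H(X) <= log2|X|, with equality at a constant / uniform PMF.
import numpy as np

def shannon_entropy(p):
    p = np.asarray(p, dtype=float)
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

print(shannon_entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: X is a constant
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 = log2(4): X is uniform
print(shannon_entropy([0.7, 0.1, 0.1, 0.1]))      # strictly between 0 and 2
```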

Joint and conditional entropy

The joint entropy of two discrete random variables \(X\in\calX\) and \(Y\in\calY\) with joint PMF \(P_{XY}\) is

\[\begin{align*} \mathbb{H}(X,Y) \eqdef \mathbb{E}_{XY}(-\log_2 P_{XY}(X,Y)) = - \sum_{x \in {\calX}} \sum_{y \in \mathcal{Y}}P_{XY}(x, y) \log_2 P_{XY}(x, y). \end{align*}\]

Furthermore, the conditional entropy of \(Y\) given \(X\) is

\[\begin{align*} \mathbb{H}(Y|X) \eqdef \mathbb{E}_{XY}(-\log_2 P_{Y|X}(Y|X)) = - \sum_{x \in {\calX}} \sum_{y \in \mathcal{Y}} P_{XY}(x,y) \log_2 P_{Y|X}(y|x). \end{align*}\]

Let \(X, Y\) be discrete random variables with joint PMF \(p_{XY}\). Then \(\mathbb{H}(Y|X) \geq 0\) with equality if and only if \(Y\) is a function of \(X\).

Let \(X\) and \(Y\) be discrete random variables with joint PMF \(p_{XY}\). Then \(\mathbb{H}(X|Y) \leq \mathbb{H}(X)\) with equality if and only if \(X\) is independent of \(Y\).
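
To make these definitions concrete, the sketch below evaluates \(\mathbb{H}(X,Y)\) and \(\mathbb{H}(Y|X)\) directly from the formulas above for an arbitrarily chosen joint PMF (illustrative values, not from the lecture).

```python
# Sketch: joint and conditional entropy from a small joint PMF.
import numpy as np

P_XY = np.array([[0.4, 0.1],          # rows indexed by x, columns by y
                 [0.1, 0.4]])
P_X = P_XY.sum(axis=1)                # marginal of X
P_Y = P_XY.sum(axis=0)                # marginal of Y

def H(p):                             # Shannon entropy of a (flattened) PMF
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

H_XY = H(P_XY)                                      # H(X,Y)
P_Y_given_X = P_XY / P_X[:, None]                   # P_{Y|X}(y|x)
H_Y_given_X = -np.sum(P_XY * np.log2(P_Y_given_X))  # H(Y|X) from its definition
H_X_given_Y = H_XY - H(P_Y)                         # H(X|Y), via the chain rule of the next section

print(H_XY, H_Y_given_X)
print(H_X_given_Y <= H(P_X))          # conditioning reduces entropy: True
```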

Chain rule of entropy

Let \(X\), \(Y\), and \(Z\) be discrete random variables with joint PMF \(p_{XYZ}\). Then \[ \mathbb{H}(XY|Z) = \mathbb{H}(X|Z) + \mathbb{H}(Y|XZ) = \mathbb{H}(Y|Z) + \mathbb{H}(X|YZ). \] More generally, if \(\bfX\eqdef\set{X_i}_{i=1}^n\) and \(Z\) are jointly distributed random variables, we have \[ \mathbb{H}(\bfX|Z) = \sum_{i=1}^n\mathbb{H}(X_i|X^{1:i-1}Z) \] with the convention \(X^{1:0}\eqdef \emptyset\).
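
The chain rule can be verified numerically; the sketch below draws an arbitrary joint PMF over three binary variables and evaluates each conditional entropy directly from its definition (purely illustrative).

```python
# Sketch: numerical check of H(XY|Z) = H(X|Z) + H(Y|XZ) on a random joint PMF.
import numpy as np

rng = np.random.default_rng(0)
P = rng.random((2, 2, 2))                  # joint PMF P_XYZ, axes ordered (x, y, z)
P /= P.sum()

P_Z  = P.sum(axis=(0, 1))                  # P_Z(z)
P_XZ = P.sum(axis=1)                       # P_XZ(x,z)

H_XY_given_Z = -np.sum(P * np.log2(P / P_Z[None, None, :]))     # uses P_{XY|Z}
H_X_given_Z  = -np.sum(P_XZ * np.log2(P_XZ / P_Z[None, :]))     # uses P_{X|Z}
H_Y_given_XZ = -np.sum(P * np.log2(P / P_XZ[:, None, :]))       # uses P_{Y|XZ}

print(np.isclose(H_XY_given_Z, H_X_given_Z + H_Y_given_XZ))     # True
```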

Mutual information

Let \(X, Y\) be two random variables with joint PMF \(P_{XY}\). The mutual information between \(X\) and \(Y\) is \[ \bbI(X;Y)\eqdef \bbH(X)-\bbH(X|Y)=\bbD(P_{XY}\Vert P_XP_Y). \]
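
The two expressions in this definition agree; the following sketch checks this on an arbitrary joint PMF (illustrative values only, with the `H` helper from the earlier sketches).

```python
# Sketch: I(X;Y) computed as H(X)+H(Y)-H(X,Y) and as D(P_XY || P_X P_Y).
import numpy as np

P_XY = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
P_X = P_XY.sum(axis=1)
P_Y = P_XY.sum(axis=0)

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    nz = p > 0
    return -np.sum(p[nz] * np.log2(p[nz]))

I_1 = H(P_X) + H(P_Y) - H(P_XY)                            # H(X) - H(X|Y)
I_2 = np.sum(P_XY * np.log2(P_XY / np.outer(P_X, P_Y)))    # D(P_XY || P_X P_Y)

print(I_1, np.isclose(I_1, I_2))   # same value: True
```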

If \(\Delta(\calX\to\calY)\) denotes the set of kernels from \(\calX\) to \(\calY\), then the function \[ I:\Delta(\calX)\times \Delta(\calX\to\calY)\to\bbR:(P,W)\mapsto I(P;W)\eqdef \mathbb{D}(W\cdot P\Vert (W\circ P)P) \] is a concave function of \(P\) for fixed \(W\) and a convex function of \(W\) for fixed \(P\).

Conditional mutual information

Let \(X, Y, Z\) be discrete random variables with joint distribution \(p_{XYZ}\). The conditional mutual information between \(X\) and \(Y\) given \(Z\) is \(\mathbb{I}(X ; Y|Z) \eqdef \mathbb{H}(X|Z) - \mathbb{H}(X|YZ)\).

  • Using Bayes' rule, \(\mathbb{I}\left(X;Y|Z\right) = \E[Z]{\mathbb{D}(P_{XY|Z}\Vert P_{X|Z}P_{Y|Z})}\).

\(\mathbb{I}(X;Y|Z)\geq 0\), with equality if and only if \(X\) and \(Y\) are conditionally independent given \(Z\).

Let \(\bfX\eqdef\set{X_i}_{i=1}^n\), \(Y\), and \(Z\) be jointly distributed random variables. Then,

\[\begin{align*} \mathbb{I}(\bfX;Y|Z) = \sum_{i=1}^n\mathbb{I}(X_i;Y|Z\bfX^{1:i-1})\textsf{ with the convention }\bfX^{1:0}=\emptyset. \end{align*}\]

Let \(X\), \(Y\), and \(Z\) be discrete random variables such that \(X \rightarrow Y \rightarrow Z\). Then \(\mathbb{I}(X;Y) \geq \mathbb{I}(X;Z)\) or, equivalently, \(\mathbb{H}(X|Z) \geq \mathbb{H}(X|Y)\).
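
The data processing inequality can be illustrated by composing two arbitrary channels, as in the sketch below (the input PMF and channel matrices are made-up examples).

```python
# Sketch: data processing inequality I(X;Y) >= I(X;Z) for a Markov chain X -> Y -> Z.
import numpy as np

def mutual_info(P):                          # I from a joint PMF with positive entries
    Pa, Pb = P.sum(axis=1), P.sum(axis=0)
    return np.sum(P * np.log2(P / np.outer(Pa, Pb)))

P_X = np.array([0.3, 0.7])
W1 = np.array([[0.9, 0.1],                   # P_{Y|X}, rows indexed by x
               [0.2, 0.8]])
W2 = np.array([[0.7, 0.3],                   # P_{Z|Y}, rows indexed by y
               [0.1, 0.9]])

P_XY = P_X[:, None] * W1                     # joint PMF of (X, Y)
P_XZ = P_XY @ W2                             # joint PMF of (X, Z): sum_y P(x,y) P(z|y)

print(mutual_info(P_XY) >= mutual_info(P_XZ))   # True
```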

Fano's inequality

Let \(X\) be a discrete random variable with alphabet \({\calX}\). Let \(\widehat{X}\in{\calX}\) be an estimate of \(X\), with joint distribution \(p_{X\widehat{X}}\). We define the probability of estimation error \(P_e \eqdef \mathbb{P}[X\neq\widehat{X}]\). Then, \(\mathbb{H}(X|\widehat{X}) \leq \mathbb{H}_b(P_e) + P_e \log(\card{\calX}-1)\).

  • Fano's inequality is deceptively simple! It relates:
    • an operational quantity, the probability of error;
    • an information theoretic metric, the (conditional) entropy.
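
As a sanity check, the sketch below evaluates both sides of Fano's inequality for a randomly generated joint PMF of \((X,\widehat{X})\) (illustrative only, using base-2 logarithms).

```python
# Sketch: Fano's inequality H(X|Xhat) <= h_b(Pe) + Pe * log2(|X| - 1).
import numpy as np

rng = np.random.default_rng(1)
P = rng.random((4, 4))                    # joint PMF of (X, Xhat): rows x, columns xhat
P /= P.sum()

P_Xhat = P.sum(axis=0)
H_X_given_Xhat = -np.sum(P * np.log2(P / P_Xhat[None, :]))

Pe  = 1.0 - np.trace(P)                   # P[X != Xhat]
h_b = -Pe * np.log2(Pe) - (1 - Pe) * np.log2(1 - Pe)   # binary entropy of Pe

print(H_X_given_Xhat <= h_b + Pe * np.log2(4 - 1))     # True
```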

Renyi entropy

  • Weaker axiomatic construction of entropy H for \(P_X\in\Delta(\calX)\)
    1. \(H(X)\) only depends on the set of values \(\{p_X(x)\}_{x\in\calX}\) but not on their order.
    2. for \(X\sim B(p)\) with \(p\in[0;1]\), \(H(X)\) is a continuous function of \(p\).
    3. if \(X\sim B(1/2)\), then \(H(X)=1\).

Let \(X\in\calX\) be a discrete random variable. The Renyi entropy of order \(\alpha\in\mathbb{R}^+\cup\{\infty\}\) of \(X\) is

\[\begin{align*} H_{\alpha}(X)\eqdef \frac{1}{1-\alpha}\log \sum_{x\in\calX} p_X(x)^\alpha, \end{align*}\]

with the convention that \(H_1(X)=\lim_{\alpha\rightarrow 1} H_\alpha(X)\) and \(H_\infty(X)=\lim_{\alpha\rightarrow \infty} H_\alpha(X)\).
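
A small sketch of this definition, including the two limiting conventions (the PMF and the helper name `renyi_entropy` are illustrative).

```python
# Sketch: Renyi entropy of order alpha in bits, with the alpha -> 1 and alpha -> inf conventions.
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):               # convention: H_1 is the Shannon entropy
        return -np.sum(p * np.log2(p))
    if np.isinf(alpha):                       # convention: H_inf = -log2 max_x p(x)
        return -np.log2(p.max())
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
for a in [0.0, 0.5, 1.0, 2.0, np.inf]:
    print(a, renyi_entropy(p, a))
```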

Properties of Renyi entropy

Let \(X\in\calX\) be a discrete random variable. Then,

  • \(H_0(X)=\log \card{\calX}\) (assuming \(P_X\) has full support) is called the max-entropy;
  • \(H_1(X)\eqdef\lim_{\alpha\rightarrow 1} H_\alpha(X) = \mathbb{H}(X)\), the Shannon entropy;
  • \(H_2(X)=-\log\sum_{x}p_X(x)^2\) is called the collision entropy;
  • \(H_\infty(X)\eqdef\lim_{\alpha\rightarrow \infty} H_\alpha(X) = -\log \max_{x}p_X(x)\) is called the min-entropy.

Let \(X\in\calX\) and \(Y\in\calY\) be discrete random variables. Then,

  • For any \(\alpha,\beta\in\mathbb{R}^+\cup\{\infty\}\), if \(\alpha\leq\beta\) then \(H_\alpha(X)\geq H_{\beta}(X)\).
  • For any \(\alpha\in\mathbb{R}^+\cup\{\infty\}\) we have \(0\leq H_\alpha(X)\leq \log \card{\calX}\). The equality \(H_\alpha(X)=0\) holds if and only if \(X\) is constant. For \(\alpha\neq 0\), the equality \(H_\alpha(X)=\log\card{\calX}\) holds if and only if \(X\) is uniformly distributed.
  • For any \(\alpha\in\mathbb{R}^+\cup\{\infty\}\) we have \(H_\alpha(XY)\geq H_{\alpha}(X)\).
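
A numerical spot-check of these three properties on random PMFs, reusing the illustrative `renyi_entropy` helper from the sketch above (not a proof, just a sanity check).

```python
# Sketch: monotonicity in alpha, the bounds 0 <= H_alpha <= log2|X|, and H_alpha(XY) >= H_alpha(X).
import numpy as np

def renyi_entropy(p, alpha):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    if np.isclose(alpha, 1.0):
        return -np.sum(p * np.log2(p))
    if np.isinf(alpha):
        return -np.log2(p.max())
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

rng = np.random.default_rng(2)
P_XY = rng.random((3, 4)); P_XY /= P_XY.sum()        # joint PMF of (X, Y)
P_X = P_XY.sum(axis=1)

alphas = [0.0, 0.5, 1.0, 2.0, np.inf]
H_X = [renyi_entropy(P_X, a) for a in alphas]

print(all(a >= b for a, b in zip(H_X, H_X[1:])))      # non-increasing in alpha
print(all(0 <= h <= np.log2(len(P_X)) for h in H_X))  # 0 <= H_alpha(X) <= log2|X|
print(all(renyi_entropy(P_XY, a) >= renyi_entropy(P_X, a) for a in alphas))  # H_alpha(XY) >= H_alpha(X)
```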

Conditional Renyi entropy

  • This is where it gets complicated: there are several options (both are evaluated numerically in the sketch after this list)
    • \(H_\alpha^A(X|Y)\eqdef {\frac{\alpha}{1-\alpha}}\log{\left(\sum_yp_Y(y)\left(\sum_xp_{X|Y}(x|y)^\alpha\right)^{\frac{1}{\alpha}}\right)}\);
    • \(H_\alpha^H(X|Y)\eqdef \frac{1}{1-\alpha}\log \left(\sum_yp_{Y}(y)\sum_xp_{X|Y}(x|y)^\alpha\right)\).
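
The sketch below evaluates both candidate definitions on a small joint PMF (arbitrary values; base-2 logarithms assumed, and the names `H_A`/`H_H` simply mirror the superscripts above).

```python
# Sketch: the two conditional Renyi entropies H^A and H^H on a toy joint PMF.
import numpy as np

P_XY = np.array([[0.4, 0.1],          # rows indexed by x, columns by y
                 [0.1, 0.4]])
P_Y = P_XY.sum(axis=0)
P_X_given_Y = P_XY / P_Y[None, :]     # P_{X|Y}(x|y)

def H_A(alpha):                       # definition from the first bullet
    inner = np.sum(P_X_given_Y ** alpha, axis=0) ** (1.0 / alpha)
    return (alpha / (1.0 - alpha)) * np.log2(np.sum(P_Y * inner))

def H_H(alpha):                       # definition from the second bullet
    return np.log2(np.sum(P_Y * np.sum(P_X_given_Y ** alpha, axis=0))) / (1.0 - alpha)

for a in [0.5, 2.0, 5.0]:
    print(a, H_A(a), H_H(a))          # the two definitions generally differ
```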

Let \(X,Y,Z\) be three discrete random variables. Then,

  • \(\forall \alpha,\beta\in\mathbb{R}^+\cup\{\infty\}\) \(\alpha\leq\beta\Rightarrow H_\alpha^A(X|Y)\geq H^A_{\beta}(X|Y)\).
  • \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\) \(0\leq H^A_\alpha(X|Y)\leq \log \card{\calX}\) with equality on the left-hand side if and only if \(X\) is a function of \(Y\); if \(\alpha\neq 0\), equality holds on the right hand side if and only if \(X\) is uniformly distributed and independent of \(Y\).
  • \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\) \(H_{\alpha}^A(XY|Z)\geq H_{\alpha}^A(X|Z)\).

Conditional Renyi entropy (continued)

Let \(X,Y,Z\) be three discrete random variables. Then, \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\) \(H^A_{\alpha}(X|YZ)\leq H^A_{\alpha}(X|Y)\). If \(\alpha\notin\{0,\infty\}\), equality holds if and only if \(X-Y-Z\) forms a Markov chain.

Let \(X,Y,Z\) be three discrete random variables such that \(X-Y-Z\) forms a Markov chain. Then, \(\forall\alpha\in\mathbb{R}^+\cup\{\infty\}\) \(H^A_\alpha(X|Y)\leq H^A_\alpha(X|Z)\).

Let \(X,Y,Z\) be three discrete random variables. Then, \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\)

\[\begin{align*} H^A_{\alpha}(X|YZ)\geq H^A_\alpha(XY|Z)-H^A_0(Y|Z) \geq H^A_\alpha(X|Z)-H^A_0(Y|Z)\geq H^A_\alpha(X|Z)-\log\card{\calY}. \end{align*}\]
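
A numerical spot-check of this chain of inequalities, in the slightly weaker form with \(\log\card{\calY}\) in place of \(H^A_0(Y|Z)\), on a random joint PMF; the helper and PMF below are illustrative only.

```python
# Sketch: H^A_alpha(X|YZ) >= H^A_alpha(XY|Z) - log2|Y| >= H^A_alpha(X|Z) - log2|Y|.
import numpy as np

rng = np.random.default_rng(3)
P = rng.random((2, 3, 2)); P /= P.sum()     # joint PMF P_XYZ, axes ordered (x, y, z)

def H_A(P_target_cond, P_cond, alpha):
    """H^A_alpha(target | cond): target variable along axis 0, conditioning along the rest."""
    inner = np.sum(P_target_cond ** alpha, axis=0) ** (1.0 / alpha)
    return (alpha / (1.0 - alpha)) * np.log2(np.sum(P_cond * inner))

alpha = 2.0
card_Y = P.shape[1]

P_YZ = P.sum(axis=0)                                   # condition on the pair (Y, Z)
H_X_YZ = H_A(P / P_YZ[None, :, :], P_YZ, alpha)

P_Z  = P.sum(axis=(0, 1))                              # condition on Z only
H_XY_Z = H_A((P / P_Z[None, None, :]).reshape(-1, P.shape[2]), P_Z, alpha)
P_XZ = P.sum(axis=1)
H_X_Z = H_A(P_XZ / P_Z[None, :], P_Z, alpha)

print(H_X_YZ >= H_XY_Z - np.log2(card_Y) >= H_X_Z - np.log2(card_Y))   # True
```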