ECE 6605 - Information Theory

Prof. Matthieu Bloch

Thursday, September 7, 2023

Announcements

  • Assignment 1
    • Deadline was extended by 48 hours
      • Soft deadline to claim 2% bonus: Saturday September 09, 2023
      • Hard deadline: Monday September 11, 2023
    • Submit your assignment on Gradescope
  • Assignment 2 coming
  • Last time
    • Joint and conditional entropy
    • Chain rule of entropy
    • Mutual information
  • Today
    • Data processing inequality
    • Fano's inequality
    • Renyi entropy
    • Entropy rate

Mutual information

Let \(X, Y\) be two random variables with joint PMF \(P_{XY}\). The mutual information between \(X\) and \(Y\) is \[ \bbI(X;Y)\eqdef \bbH(X)-\bbH(X|Y)=\bbD(P_{XY}\Vert P_XP_Y) = \bbH(Y)-\bbH(Y|X). \]

If \(\Delta(\calX\to\calY)\) denotes the set of kernels from \(\calX\) to \(\calY\), then the function \[ I:\Delta(\calX)\times \Delta(\calX\to\calY)\to\mathbb{R}^+:(P,W)\mapsto I(P;W)\eqdef \mathbb{D}(W\cdot P\Vert (W\circ P)P) \] is a concave function of \(P\) and a convex function of \(W\).
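
As a quick numerical sanity check, here is a minimal sketch (assuming NumPy; the joint PMF below is purely illustrative) that evaluates \(\bbH(X)-\bbH(X|Y)\) and \(\bbD(P_{XY}\Vert P_XP_Y)\) in bits on a small joint PMF and confirms that the two expressions agree.

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a PMF given as a NumPy array."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Small illustrative joint PMF P_XY (rows indexed by x, columns by y).
P_XY = np.array([[0.20, 0.10, 0.15],
                 [0.05, 0.30, 0.20]])
P_X = P_XY.sum(axis=1)
P_Y = P_XY.sum(axis=0)

# I(X;Y) = H(X) - H(X|Y), with H(X|Y) = H(X,Y) - H(Y)
I_1 = entropy(P_X) - (entropy(P_XY) - entropy(P_Y))

# I(X;Y) = D(P_XY || P_X P_Y)
mask = P_XY > 0
I_2 = np.sum(P_XY[mask] * np.log2(P_XY[mask] / np.outer(P_X, P_Y)[mask]))

print(I_1, I_2)  # the two expressions agree up to floating-point error
```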

Conditional mutual information

Let \(X, Y, Z\) be discrete random variables with joint distribution \(p_{XYZ}\). The conditional mutual information between \(X\) and \(Y\) given \(Z\) is \(\mathbb{I}(X ; Y|Z) \eqdef \mathbb{H}(X|Z) - \mathbb{H}(X|YZ)\).

  • Using Bayes' rule, \(\mathbb{I}\left(X;Y|Z\right) = \E[Z]{\mathbb{D}(P_{XY|Z}\Vert P_{X|Z}P_{Y|Z})}\eqdef \mathbb{D}(P_{XY|Z}\Vert P_{X|Z}P_{Y|Z}|P_Z)\).

\(\mathbb{I}(X;Y|Z)\geq 0\), with equality if and only if \(X\) and \(Y\) are conditionally independent given \(Z\).

Let \(\bfX\eqdef\set{X_i}_{i=1}^n\), \(Y\), and \(Z\) be jointly distributed random variables. Then,

\[\begin{align*} \mathbb{I}(\bfX;Y|Z) = \sum_{i=1}^n\mathbb{I}(X_i;Y|Z\bfX^{1:i-1})\textsf{ with the convention }\bfX^{1:0}=\emptyset. \end{align*}\]
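
The chain rule above can be verified numerically; the sketch below (NumPy assumed; the helper `cond_mi` and the alphabet sizes are illustrative choices) draws a random joint PMF of \((X_1,X_2,Y,Z)\) and compares \(\mathbb{I}(X_1X_2;Y|Z)\) with \(\mathbb{I}(X_1;Y|Z)+\mathbb{I}(X_2;Y|ZX_1)\).

```python
import numpy as np

rng = np.random.default_rng(0)

def H(p):
    """Shannon entropy (bits) of a joint PMF given as an array of any shape."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def cond_mi(P, A, B, C):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), with A, B, C axis groups of P."""
    axes = set(range(P.ndim))
    marg = lambda keep: P.sum(axis=tuple(axes - set(keep)))
    return H(marg(A + C)) + H(marg(B + C)) - H(marg(A + B + C)) - H(marg(C))

# Random joint PMF of (X1, X2, Y, Z) with alphabet sizes 2, 3, 2, 2.
P = rng.random((2, 3, 2, 2))
P /= P.sum()

lhs = cond_mi(P, A=[0, 1], B=[2], C=[3])       # I(X1 X2 ; Y | Z)
rhs = (cond_mi(P, A=[0], B=[2], C=[3])         # I(X1 ; Y | Z)
       + cond_mi(P, A=[1], B=[2], C=[3, 0]))   # I(X2 ; Y | Z X1)
print(lhs, rhs)  # equal up to floating-point error
```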

Let \(X\), \(Y\), and \(Z\) be discrete random variables such that \(X - Y - Z\) forms a Markov chain. Then \(\mathbb{I}(X;Y) \geq \mathbb{I}(X;Z)\) or, equivalently, \(\mathbb{H}(X|Z) \geq \mathbb{H}(X|Y)\) (data processing inequality).
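
Here is a minimal numerical sketch of this data processing inequality (NumPy assumed; the alphabet sizes and random seed are arbitrary): build a Markov chain \(X-Y-Z\) from a random input distribution and two random channels, then compare \(\mathbb{I}(X;Y)\) with \(\mathbb{I}(X;Z)\).

```python
import numpy as np

rng = np.random.default_rng(1)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mi(P_AB):
    """I(A;B) in bits from a joint PMF given as a 2-D array."""
    return H(P_AB.sum(axis=1)) + H(P_AB.sum(axis=0)) - H(P_AB)

# Markov chain X - Y - Z: input distribution P_X, channel P_{Y|X}, channel P_{Z|Y}.
P_X = rng.dirichlet(np.ones(3))
W_YX = rng.dirichlet(np.ones(4), size=3)   # row x is P_{Y|X=x}
W_ZY = rng.dirichlet(np.ones(3), size=4)   # row y is P_{Z|Y=y}

P_XY = P_X[:, None] * W_YX                 # joint PMF of (X, Y)
P_XZ = P_XY @ W_ZY                         # joint PMF of (X, Z), summing over Y

print(mi(P_XY) >= mi(P_XZ))                # data processing inequality: True
```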

Fano's inequality

Let \(X\) be a discrete random variable with alphabet \({\calX}\), and let \(\widehat{X}\in{\calX}\) be an estimate of \(X\), with joint distribution \(p_{X\widehat{X}}\). We define the probability of estimation error \(P_e \eqdef \mathbb{P}[X\neq\widehat{X}]\). Then, \(\mathbb{H}(X|\widehat{X}) \leq \mathbb{H}_b(P_e) + P_e \log(\card{\calX}-1)\).

  • Fano's inequality is deceptively simple! It relates:
    • an operational quantity, the probability of error;
    • an information theoretic metric, the (conditional) entropy.
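
As a quick check of Fano's inequality, the sketch below (NumPy assumed; the joint PMF of \((X,\widehat{X})\) is random and purely illustrative) computes \(\mathbb{H}(X|\widehat{X})\), \(P_e\), and the bound \(\mathbb{H}_b(P_e)+P_e\log(\card{\calX}-1)\).

```python
import numpy as np

rng = np.random.default_rng(2)

def H(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def hb(p):
    """Binary entropy function in bits."""
    return H(np.array([p, 1 - p]))

# Random joint PMF of (X, Xhat) on a common alphabet of size K (rows: X, columns: Xhat).
K = 4
P = rng.random((K, K))
P /= P.sum()

H_X_given_Xhat = H(P) - H(P.sum(axis=0))   # H(X|Xhat) = H(X,Xhat) - H(Xhat)
P_e = 1 - np.trace(P)                      # P[X != Xhat]

print(H_X_given_Xhat <= hb(P_e) + P_e * np.log2(K - 1))   # Fano's inequality: True
```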

Renyi entropy

  • Weaker axiomatic construction of entropy H for \(P_X\in\Delta(\calX)\)
    1. \(H(X)\) only depends on the set of values \(\{p_X(x)\}_{x\in\calX}\) but not on their order.
    2. for \(X\sim B(p)\) with \(p\in[0;1]\), \(H(X)\) is a continuous function of \(p\).
    3. if \(X\sim B(1/2)\), then \(H(X)=1\).

Let \(X\in\calX\) be a discrete random variable. The Renyi entropy of order \(\alpha\in\mathbb{R}\cup\{\infty\}\) of \(X\) is

\[\begin{align*} H_{\alpha}(X)\eqdef \frac{1}{1-\alpha}\log \sum_{x\in\calX} p_X(x)^\alpha, \end{align*}\]

with the convention that \(H_1(X)=\lim_{\alpha\rightarrow 1} H_\alpha(X)\) and \(H_\infty(X)=\lim_{\alpha\rightarrow \infty} H_\alpha(X)\).

Properties of Renyi entropy

Let \(X\in\calX\) be a discrete random variable. Then,

  • \(H_0(X)=\log \card{\calX}\) is called the max-entropy (with the convention that \(\calX\) is the support of \(p_X\));
  • \(H_1(X)\eqdef\lim_{\alpha\rightarrow 1} H_\alpha(X) = \mathbb{H}(X)\), the Shannon entropy;
  • \(H_2(X)=-\log\sum_{x}p_X(x)^2\) is called the collision entropy;
  • \(H_\infty(X)\eqdef\lim_{\alpha\rightarrow \infty} H_\alpha(X) = -\log \max_{x}p_X(x)\) is called the min-entropy.

Let \(X\in\calX\) and \(Y\in\calY\) be discrete random variables. Then,

  • For any \(\alpha,\beta\in\mathbb{R}^+\cup\{\infty\}\), if \(\alpha\leq\beta\) then \(H_\alpha(X)\geq H_{\beta}(X)\).
  • For any \(\alpha\in\mathbb{R}^+\cup\{\infty\}\) we have \(0\leq H_\alpha(X)\leq \log \card{\calX}\). The equality \(H_\alpha(X)=0\) holds if and only if \(X\) is constant. For \(\alpha\neq 0\), the equality \(H_\alpha(X)=\log\card{\calX}\) holds if and only if \(X\) is uniformly distributed.
  • For any \(\alpha\in\mathbb{R}^+\cup\{\infty\}\) we have \(H_\alpha(XY)\geq H_{\alpha}(X)\).
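
The sketch below (NumPy assumed; the function name and PMF are illustrative) evaluates \(H_\alpha\) in bits for several orders, using the limit conventions for \(\alpha\in\{1,\infty\}\) and the support-size convention for \(\alpha=0\), and checks the monotonicity and boundedness properties stated above.

```python
import numpy as np

def renyi(p, alpha):
    """Renyi entropy of order alpha (in bits) of a PMF p, with the limit conventions."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if alpha == 0:
        return np.log2(len(p))           # max-entropy: log of the support size
    if alpha == 1:
        return -np.sum(p * np.log2(p))   # Shannon entropy (limit alpha -> 1)
    if alpha == np.inf:
        return -np.log2(p.max())         # min-entropy (limit alpha -> infinity)
    return np.log2(np.sum(p ** alpha)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]
orders = [0, 0.5, 1, 2, np.inf]
values = [renyi(p, a) for a in orders]
print(values)

# H_alpha is non-increasing in alpha and satisfies 0 <= H_alpha <= log |X|.
assert all(v1 >= v2 - 1e-12 for v1, v2 in zip(values, values[1:]))
assert all(0 <= v <= np.log2(len(p)) + 1e-12 for v in values)
```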

Conditional Renyi entropy

  • This is where it gets complicated; there are several options (a numerical sketch follows this list), including:
    • \(H_\alpha^A(X|Y)\eqdef {\frac{\alpha}{1-\alpha}}\log{\left(\sum_yp_Y(y)\left(\sum_xp_{X|Y}(x|y)^\alpha\right)^{\frac{1}{\alpha}}\right)}\);
    • \(H_\alpha^H(X|Y)\eqdef \frac{1}{1-\alpha}\log \left(\sum_yp_{Y}(y)\sum_xp_{X|Y}(x|y)^\alpha\right)\).
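
Here is a minimal sketch of the two definitions for finite \(\alpha\notin\{0,1\}\) (NumPy assumed; the function names and joint PMF are illustrative).

```python
import numpy as np

def cond_renyi_A(P_XY, alpha):
    """Arimoto version H_alpha^A(X|Y) in bits, for finite alpha not in {0, 1}."""
    P_Y = P_XY.sum(axis=0)
    P_XgY = P_XY / P_Y                     # column y is P_{X|Y=y}
    inner = (P_XgY ** alpha).sum(axis=0) ** (1 / alpha)
    return (alpha / (1 - alpha)) * np.log2(np.sum(P_Y * inner))

def cond_renyi_H(P_XY, alpha):
    """Second version H_alpha^H(X|Y) in bits, averaging sum_x p(x|y)^alpha under p_Y."""
    P_Y = P_XY.sum(axis=0)
    P_XgY = P_XY / P_Y
    return np.log2(np.sum(P_Y * (P_XgY ** alpha).sum(axis=0))) / (1 - alpha)

# Illustrative joint PMF of (X, Y): rows indexed by x, columns by y.
P_XY = np.array([[0.30, 0.10],
                 [0.05, 0.25],
                 [0.10, 0.20]])

for a in (0.5, 2.0, 10.0):
    print(a, cond_renyi_A(P_XY, a), cond_renyi_H(P_XY, a))
```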

Let \(X,Y,Z\) be three discrete random variables. Then,

  • \(\forall \alpha,\beta\in\mathbb{R}^+\cup\{\infty\}\) \(\alpha\leq\beta\Rightarrow H_\alpha^A(X|Y)\geq H^A_{\beta}(X|Y)\).
  • \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\) \(0\leq H^A_\alpha(X|Y)\leq \log \card{\calX}\) with equality on the left-hand side if and only if \(X\) is a function of \(Y\); if \(\alpha\neq 0\), equality holds on the right-hand side if and only if \(X\) is uniformly distributed and independent of \(Y\).
  • \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\) \(H_{\alpha}^A(XY|Z)\geq H_{\alpha}^A(X|Z)\).

Conditional Renyi entropy (continued)

Let \(X,Y,Z\) be three discrete random variables. Then, \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\) \(H^A_{\alpha}(X|YZ)\leq H^A_{\alpha}(X|Y)\). If \(\alpha\notin\{0,\infty\}\), equality holds if and only if \(X-Y-Z\) forms a Markov chain.

Let \(X,Y,Z\) be three discrete random variables such that \(X-Y-Z\) forms a Markov chain. Then, \(\forall\alpha\in\mathbb{R}^+\cup\{\infty\}\), \(H^A_\alpha(X|Y)\leq H^A_\alpha(X|Z)\).

Let \(X,Y,Z\) be three discrete random variables. Then, \(\forall \alpha\in\mathbb{R}^+\cup\{\infty\}\)

\[\begin{align*} H^A_{\alpha}(X|YZ)\geq H^A_\alpha(XY|Z)-H^A_0(Y|Z) \geq H^A_\alpha(X|Z)-H^A_0(Y|Z)\geq H^A_\alpha(X|Z)-\log\card{\calY}. \end{align*}\]
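
The sketch below (NumPy assumed; alphabet sizes and seed arbitrary) numerically checks, for one order \(\alpha\), that extra conditioning cannot increase \(H^A_\alpha\) and that the weaker \(\log\card{\calY}\) form of the chain of inequalities above holds on a random full-support joint PMF.

```python
import numpy as np

rng = np.random.default_rng(3)

def cond_renyi_A(P_AB, alpha):
    """Arimoto conditional Renyi entropy H_alpha^A(A|B) in bits (rows: a, columns: b)."""
    P_B = P_AB.sum(axis=0)
    P_AgB = P_AB / P_B
    inner = (P_AgB ** alpha).sum(axis=0) ** (1 / alpha)
    return (alpha / (1 - alpha)) * np.log2(np.sum(P_B * inner))

# Random full-support joint PMF of (X, Y, Z) with |X| = 2, |Y| = 3, |Z| = 2.
P = rng.random((2, 3, 2))
P /= P.sum()

alpha = 2.0
H_X_given_YZ = cond_renyi_A(P.reshape(2, 6), alpha)   # H_alpha^A(X|YZ)
H_X_given_Y = cond_renyi_A(P.sum(axis=2), alpha)      # H_alpha^A(X|Y)
H_XY_given_Z = cond_renyi_A(P.reshape(6, 2), alpha)   # H_alpha^A(XY|Z)

print(H_X_given_YZ <= H_X_given_Y)                    # extra conditioning cannot increase entropy
print(H_X_given_YZ >= H_XY_given_Z - np.log2(3))      # chain-type lower bound with log |Y|
```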

Smooth Renyi entropy

  • For a probability distribution \(P\in\Delta(\calX)\) and \(\epsilon\geq 0\), the ball of radius \(\epsilon\) around \(P\) is \[ \calB^\epsilon(P) \eqdef \set{Q:\bbV(P,Q)\leq \epsilon} \]

Let \(X\in\calX\) be a discrete random variable and let \(\epsilon\geq 0\). The \(\epsilon\)-smooth Renyi entropy of order \(\alpha\in\mathbb{R}\cup\{\infty\}\) of \(X\) is

\[\begin{align*} H^\epsilon_{\alpha}(X)\eqdef \frac{1}{1-\alpha}\inf_{Q\in\calB^\epsilon(P_X)}\log \sum_{x\in\calX} Q(x)^\alpha, \end{align*}\]

with the convention that \(H^\epsilon_1(X)=\lim_{\alpha\rightarrow 1} H^\epsilon_\alpha(X)\) and \(H^\epsilon_\infty(X)=\lim_{\alpha\rightarrow \infty} H^\epsilon_\alpha(X)\).

  • This may look very complicated

  • We will get back to this quantity later; it is actually quite natural in some cases (a small numerical sketch follows)
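
To build a little intuition, here is a rough numerical sketch that approximates \(H^\epsilon_2(X)\) on a small alphabet by minimizing \(\sum_x Q(x)^2\) over the ball, assuming \(\bbV\) denotes the total variation distance and using SciPy's SLSQP solver; the PMF and \(\epsilon\) below are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

# H^eps_2(X) = -log2( min over Q in the ball of sum_x Q(x)^2 ), taking the ball to be
# {Q : (1/2) * sum_x |P(x) - Q(x)| <= eps} (total variation distance, an assumption here).
P = np.array([0.6, 0.2, 0.1, 0.1])
eps = 0.05

constraints = [
    {"type": "eq", "fun": lambda Q: Q.sum() - 1.0},                      # Q is a PMF
    {"type": "ineq", "fun": lambda Q: eps - 0.5 * np.abs(Q - P).sum()},  # V(P, Q) <= eps
]
res = minimize(lambda Q: np.sum(Q ** 2), x0=P, bounds=[(0, 1)] * len(P),
               constraints=constraints, method="SLSQP")

H2 = -np.log2(np.sum(P ** 2))      # collision entropy H_2(X)
H2_smooth = -np.log2(res.fun)      # approximate smooth collision entropy H^eps_2(X)
print(H2, H2_smooth)               # smoothing can only increase the collision entropy
```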