Quantum Bayesian Networks

March 9, 2012

CMI, A Universal Translator Built by the SIT-ian Race

Filed under: Uncategorized — rrtucci @ 6:13 am

I am currently trying very hard to write a long, crazy paper on quantum SIT (Shannon Information Theory). So I’m thinking a lot about SIT these days. Here is a short “song” about the subject.

Sometimes two parties cannot understand each other without the help of an intermediary.

When the two parties involved are people, we human beings have invented clever devices and whole professions whose goal is to intervene and mediate a transaction between the two parties. For example,

  • devices – the Rosetta Stone, translation dictionaries, Google Translate, DARPA’s TRANSTAC, the Universal Translator in “Star Trek” (and many other sci-fi stories), C-3PO in “Star Wars”, dog2human translation devices in “Up”…
  • people – simultaneous translators and interpreters of languages, sign language interpreters, marriage brokers and matchmakers, conflict mediators, divorce lawyers, judges…

When the two parties involved are objects instead of people, we often come up with devices to mediate transactions (e.g., electronic transistors, fluid valves, bridges, …).

[Image: Star Trek's Universal Translator (looks like a cordless microphone)]

Since SIT (Shannon Information Theory) is an attempt to model, in a very general way, the communication of information between several parties, it’s not surprising that SIT has its own universal translator. Indeed, in SIT one defines a “universal translator function” called the CMI (Conditional Mutual Information), pronounced by me as “see-me”. CMI, denoted here by H(\underline{a}:\underline{b}|\underline{c}), is a real-valued function with three slots filled by the random variables \underline{a}, \underline{b}, \underline{c}. (In what follows, we will indicate random variables by underlining.) In H(\underline{a}:\underline{b}|\underline{c}), \underline{a} and \underline{b} are the two parties that are having trouble understanding each other, and \underline{c} is the mediator.

In classical SIT, one defines (see the numerical sketch just after this list)

  • the entropy (i.e., the spread) of \underline{a} by
    H(\underline{a}) = \sum_a P(a) \log \frac{1}{P(a)},

  • the conditional spread (of \underline{a} given \underline{b}) by
    H(\underline{a} |\underline{b}) = \sum_{a,b} P(a,b) \log \frac{1}{P(a|b)},

  • the mutual information (MI) (i.e., the correlation) between \underline{a} and \underline{b} by
    H(\underline{a}:\underline{b}) = \sum_{a,b} P(a,b) \log \frac{P(a,b)}{P(a)P(b)},

  • the CMI by
    H(\underline{a}:\underline{b}|\underline{c}) = \sum_{a,b,c} P(a,b,c) \log \frac{P(a,b|c)}{P(a|c)P(b|c)}.
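
To make these definitions concrete, here is a minimal numerical sketch in Python. A few hedges: the joint distribution P(a,b,c) below is invented purely for illustration, logs are base 2 (so everything is in bits), and the quantities are computed via the standard entropy identities (e.g., H(\underline{a}:\underline{b}|\underline{c}) = H(\underline{a},\underline{c}) + H(\underline{b},\underline{c}) - H(\underline{a},\underline{b},\underline{c}) - H(\underline{c})) rather than term by term from the sums above.

```python
import numpy as np

# A made-up joint distribution P(a, b, c) over three binary random
# variables, stored as a 2x2x2 array whose entries sum to 1.
P = np.array([[[0.10, 0.05],
               [0.15, 0.10]],
              [[0.05, 0.20],
               [0.10, 0.25]]])

def H(p):
    """Entropy, in bits, of a distribution given as an array of probabilities."""
    p = np.asarray(p).ravel()
    p = p[p > 0]  # convention: 0 * log(1/0) = 0
    return float(np.sum(p * np.log2(1.0 / p)))

# Marginals are sums of the joint over the unwanted axes.
Pa, Pb, Pc = P.sum(axis=(1, 2)), P.sum(axis=(0, 2)), P.sum(axis=(0, 1))
Pab, Pac, Pbc = P.sum(axis=2), P.sum(axis=1), P.sum(axis=0)

print(f"H(a)     = {H(Pa):.4f} bits")                           # spread
print(f"H(a|b)   = {H(Pab) - H(Pb):.4f} bits")                  # conditional spread
print(f"H(a:b)   = {H(Pa) + H(Pb) - H(Pab):.4f} bits")          # mutual information
print(f"H(a:b|c) = {H(Pac) + H(Pbc) - H(P) - H(Pc):.4f} bits")  # CMI
```

Swapping in a different joint distribution lets you see how the mediator \underline{c} changes the apparent correlation between \underline{a} and \underline{b}.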

But note that H(\underline{a}), H(\underline{a}|\underline{b}) and H(\underline{a}:\underline{b}) can all be expressed in terms of the CMI. Indeed,

H(\underline{a}) = H(\underline{a}:\underline{a}|\emptyset)

H(\underline{a}|\underline{b}) = H(\underline{a}:\underline{a}|\underline{b})

H(\underline{a}:\underline{b}) = H(\underline{a}:\underline{b}|\emptyset)

where \emptyset denotes an empty conditioning slot (i.e., conditioning on nothing).

So the spread, conditional spread and MI are all just special cases of CMI.
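
A quick way to convince yourself of these three special cases is to check them numerically. In the sketch below (same NumPy conventions as the snippet above; an illustrative check, not a proof), the empty conditioner is modeled as a trivial random variable with a single value, and H(\underline{a}:\underline{a}|\cdot) is obtained by duplicating \underline{a} onto both slots of the joint distribution.

```python
import numpy as np

# A made-up joint distribution P(a, b) over two binary variables.
Pab = np.array([[0.30, 0.10],
                [0.15, 0.45]])
Pa, Pb = Pab.sum(axis=1), Pab.sum(axis=0)

def H(p):
    p = np.asarray(p).ravel()
    p = p[p > 0]
    return float(np.sum(p * np.log2(1.0 / p)))

def cmi(Pxyz):
    """H(x:y|z) straight from the definition, given the joint P(x, y, z)."""
    Pz, Pxz, Pyz = Pxyz.sum(axis=(0, 1)), Pxyz.sum(axis=1), Pxyz.sum(axis=0)
    s = 0.0
    for i, j, k in np.ndindex(Pxyz.shape):
        p = Pxyz[i, j, k]
        if p > 0:
            s += p * np.log2(p * Pz[k] / (Pxz[i, k] * Pyz[j, k]))
    return s

# Duplicate a onto both slots: P(a, a') = P(a) when a == a', else 0.
Paa_empty = np.einsum('i,ij->ij', Pa, np.eye(2))[:, :, None]  # trivial conditioner
Paa_b     = np.einsum('ik,ij->ijk', Pab, np.eye(2))           # condition on b
Pab_empty = Pab[:, :, None]                                   # trivial conditioner

print(f"H(a)   = {H(Pa):.4f}   H(a:a|empty) = {cmi(Paa_empty):.4f}")
print(f"H(a|b) = {H(Pab) - H(Pb):.4f}   H(a:a|b)     = {cmi(Paa_b):.4f}")
print(f"H(a:b) = {H(Pa) + H(Pb) - H(Pab):.4f}   H(a:b|empty) = {cmi(Pab_empty):.4f}")
```

Each pair of numbers should agree.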

CMI has many nice properties, by which I mean that it satisfies some cool equalities (a.k.a. identities) and inequalities.

One identity that I find really neat is

H(\underline{a}|\underline{c}) = H(\underline{a}:\underline{x}|\underline{c}) + H(\underline{a}|\underline{x},\underline{c}).

I usually remember this one by the mnemonic :\underline{x} + |\underline{x} = 1. Basically, this identity is saying to me that the spread of \underline{a} can be decomposed into a correlation (of \underline{a} and \underline{x}) plus a conditional spread (of \underline{a} given \underline{x}).
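
For completeness, here is a short derivation of this identity straight from the definitions above (a standard computation; the key step is P(a,x|c) = P(a|x,c)P(x|c)):

H(\underline{a}:\underline{x}|\underline{c}) + H(\underline{a}|\underline{x},\underline{c})
= \sum_{a,x,c} P(a,x,c) \log \frac{P(a,x|c)}{P(a|c)P(x|c)} + \sum_{a,x,c} P(a,x,c) \log \frac{1}{P(a|x,c)}
= \sum_{a,x,c} P(a,x,c) \log \frac{P(a,x|c)}{P(a|c)P(x|c)P(a|x,c)}
= \sum_{a,x,c} P(a,x,c) \log \frac{1}{P(a|c)}
= H(\underline{a}|\underline{c}).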

Another cool identity satisfied by CMI is the chain rule, which says basically that the “:” operator behaves sort of like a differential operator.
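
Stated concretely (this is the standard chain rule for MI, found in, e.g., Cover & Thomas), for two random variables \underline{a}_1, \underline{a}_2:

H(\underline{a}_1,\underline{a}_2:\underline{b}|\underline{c}) = H(\underline{a}_1:\underline{b}|\underline{c}) + H(\underline{a}_2:\underline{b}|\underline{a}_1,\underline{c}).

Hence the Leibniz-rule feel: just as d(fg) = (df)g + f(dg), the “:” eats one variable at a time, and the variable already processed slides behind the conditioning bar.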

CMI also satisfies a slew of cool inequalities; for instance, CMI >= 0, the data processing inequalities, Fano’s inequality, and subadditivity (a.k.a. the independence bound).
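
Non-negativity, for instance, is easy to spot-check numerically. The sketch below (an illustrative Monte Carlo check under the same conventions as the earlier snippets, not a proof) draws random joint distributions P(a,b,c) from a Dirichlet distribution and confirms that the CMI never dips below zero.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, so the check is reproducible

def cmi(Pxyz):
    """H(x:y|z) straight from the definition, given the joint P(x, y, z)."""
    Pz, Pxz, Pyz = Pxyz.sum(axis=(0, 1)), Pxyz.sum(axis=1), Pxyz.sum(axis=0)
    s = 0.0
    for i, j, k in np.ndindex(Pxyz.shape):
        p = Pxyz[i, j, k]
        if p > 0:
            s += p * np.log2(p * Pz[k] / (Pxz[i, k] * Pyz[j, k]))
    return s

# Draw random joints P(a, b, c) over three binary variables and record
# the smallest CMI encountered; it should never be negative.
worst = min(cmi(rng.dirichlet(np.ones(8)).reshape(2, 2, 2))
            for _ in range(10_000))
print(f"smallest CMI over 10000 random joints: {worst:.3e} bits")
```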

This blog post is just a quick introduction to CMI. For a more complete discussion of CMI, you might consult the SIT textbooks mentioned below.

Almost every statement in both Cover&Thomas’ book and Papa the Camel’s book, two great books on classical SIT, involves CMI (or one of the aforementioned special cases of CMI). Thus, CMI is pretty universal!

So far we have discussed classical CMI. One can also define a quantum version of CMI. The quantum version of CMI again shows up almost everywhere in Mark Wilde’s very complete book on quantum SIT. (By the way, Mark’s book is excellent and free.)

Quantum CMI can even be used to define quantum entanglement: for instance, the so-called squashed entanglement of two parties \underline{a} and \underline{b} is half the infimum of the quantum CMI H(\underline{a}:\underline{b}|\underline{e}), taken over all extensions \underline{e} of the state of \underline{a}, \underline{b}.

There are, however, correlations among 4 or more random variables that cannot be expressed in terms of CMI (or can they be?). So the fact that most classical and quantum SIT books talk almost exclusively about CMI does not mean that CMI is the most general SIT function possible, with all others expressible in terms of it. It just means that most of our current SIT-ing knowledge involves correlations of three or fewer random variables, because that is the lowest-hanging fruit in the tree of SIT. So there are limits to the universality of CMI. Our universal translator is not perfect, but it’s the best one we have on board the starship Enterprise.

Are you a CMI user, a.k.a. a CMI-an? (pronounced the same way as the word “simian”)
