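In probability theory, the mutual information between two random variables X and Y is given by
$$I(X,Y) = \sum_{x,y} P(x,y) \log \frac{P(x,y)}{P(x)\,P(y)}$$
where the sum runs over all values x of X and y of Y, P(X,Y) is the joint probability distribution of X and Y, and P(X) and P(Y) are their marginal probability distributions.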
If X and Y are independent, then I(X,Y) = 0, since P(X,Y) = P(X) P(Y) in that case.
Mutual information is symmetric: I(X,Y) = I(Y,X).
Mutual information is nonnegative: I(X,Y) ≥ 0.
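The three properties above can be checked numerically on small examples. The following is a minimal sketch, assuming discrete variables, the standard definition I(X,Y) = Σ P(x,y) log [P(x,y) / (P(x) P(y))], and base-2 logarithms (bits); the function name `mutual_information` and the example tables are illustrative choices, not part of the original text.

```python
import math

def mutual_information(joint):
    """Mutual information in bits; joint[i][j] = P(X = i, Y = j).

    Hypothetical helper written for this illustration.
    """
    px = [sum(row) for row in joint]            # marginal P(X)
    py = [sum(col) for col in zip(*joint)]      # marginal P(Y)
    total = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:                         # terms with P(x,y) = 0 contribute 0
                total += pxy * math.log2(pxy / (px[i] * py[j]))
    return total

# Independent case: the joint table is the product of the marginals.
independent = [[0.3 * 0.6, 0.3 * 0.4],
               [0.7 * 0.6, 0.7 * 0.4]]

# Dependent case: an arbitrary joint table that does not factorize.
dependent = [[0.4, 0.1],
             [0.2, 0.3]]

# Transposing the table swaps the roles of X and Y.
swapped = [list(col) for col in zip(*dependent)]

print(mutual_information(independent))   # zero up to floating-point rounding
print(mutual_information(dependent))     # strictly positive
print(mutual_information(swapped))       # same value as the line above
```

The transposed table gives the same value because the summand is unchanged when x and y are exchanged.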
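The mutual information can be equivalently expressed as
$$I(X,Y) = H(X) - H(X|Y) = H(Y) - H(Y|X)$$
where H(X) and H(X|Y) are the unconditional and conditional entropy of X, and likewise H(Y) and H(Y|X) are the unconditional and conditional entropy of Y, with
$$H(X) = -\sum_{x} P(x) \log P(x)$$
and
$$H(X|Y) = -\sum_{x,y} P(x,y) \log P(x|y)$$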
Since conditioning cannot increase entropy, H(X) ≥ H(X|Y), which gives the nonnegativity property stated above.
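The entropy form can be verified on a concrete joint distribution. This sketch assumes discrete variables, base-2 logarithms, and the standard definitions H(X) = -Σ P(x) log P(x) and H(X|Y) = -Σ P(x,y) log P(x|y); the helper names `entropy` and `conditional_entropy` and the example table are hypothetical.

```python
import math

def entropy(dist):
    """Shannon entropy in bits of a probability vector (illustrative helper)."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

def conditional_entropy(joint):
    """H(row variable | column variable) in bits for joint[i][j] (illustrative helper)."""
    p_col = [sum(col) for col in zip(*joint)]   # marginal of the conditioning variable
    h = 0.0
    for row in joint:
        for j, p in enumerate(row):
            if p > 0:
                h -= p * math.log2(p / p_col[j])  # p / p_col[j] is the conditional probability
    return h

joint = [[0.4, 0.1],
         [0.2, 0.3]]                             # joint[i][j] = P(X = i, Y = j)
joint_t = [list(col) for col in zip(*joint)]     # joint_t[j][i] = P(Y = j, X = i)

px = [sum(row) for row in joint]
py = [sum(row) for row in joint_t]

mi_from_x = entropy(px) - conditional_entropy(joint)     # H(X) - H(X|Y)
mi_from_y = entropy(py) - conditional_entropy(joint_t)   # H(Y) - H(Y|X)

print(mi_from_x, mi_from_y)   # the two values agree up to rounding
```

Both differences are nonnegative here, consistent with H(X) ≥ H(X|Y) and H(Y) ≥ H(Y|X).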
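Mutual information can also be expressed in terms of the Kullback-Leibler divergence. Note that
$$I(X,Y) = \sum_{y} P(y) \sum_{x} P(x|y) \log \frac{P(x|y)}{P(x)} = \sum_{y} P(y)\, D_{KL}\big(P(X|Y=y) \,\|\, P(X)\big)$$
Thus mutual information can be understood as a weighted Kullback-Leibler divergence: the divergence of P(X|Y=y) from P(X), averaged over y with weights P(y). The more the distributions P(X) and P(X|Y) differ, the greater the information gain.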
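The weighted-divergence reading can be illustrated by averaging, over the values y of Y, the Kullback-Leibler divergence of P(X|Y=y) from P(X) with weights P(y). This is a sketch under stated assumptions: discrete variables, base-2 logarithms, and a made-up joint table; `kl_divergence` is a hypothetical helper name.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits; assumes q[i] > 0 wherever p[i] > 0 (illustrative helper)."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

joint = [[0.4, 0.1],
         [0.2, 0.3]]                        # joint[i][j] = P(X = i, Y = j)

px = [sum(row) for row in joint]            # marginal P(X)
py = [sum(col) for col in zip(*joint)]      # marginal P(Y)

mi = 0.0
for j, p_y in enumerate(py):
    # P(X | Y = j): column j of the joint table, normalized by P(Y = j).
    px_given_y = [row[j] / p_y for row in joint]
    mi += p_y * kl_divergence(px_given_y, px)

print(mi)   # weighted average of the per-y divergences
```

Each term in the loop measures how far observing Y = j moves the distribution of X away from its marginal; when X and Y are independent every term is zero.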