A Primer on Information Theory 






Expanding the logarithm gives 



H_x(y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) + \sum_{i,j} p(i,j) \log_2 p(i).



Noting that 



\sum_j p(i,j) = p(i),



we get 



H_x(y) = -\sum_{i,j} p(i,j) \log_2 p(i,j) + \sum_i p(i) \log_2 p(i).



We have seen that the first term on the right side is H(x, y) and the second is -H(x). So:



H_x(y) = H(x, y) - H(x)   and   H(x, y) = H(x) + H_x(y).

 A parallel development shows that 



H(x, y) = H(y) + H_y(x).



This relation is quite obvious if put into words: the joint uncertainty concerning two variables is equal to the uncertainty concerning either one of the variables plus the conditional uncertainty concerning the second variable when the first one is given.
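The chain rule just derived, H(x, y) = H(x) + H_x(y), can be checked numerically on any small joint distribution. The sketch below (Python with NumPy; the 2x2 table p_xy is an invented illustration, not data from the text) computes H(x), H_x(y) and H(x, y) directly from their definitions and shows that the last equals the sum of the first two.

    import numpy as np

    # Hypothetical joint distribution p(i, j); any non-negative table summing to 1 would do.
    p_xy = np.array([[0.30, 0.10],
                     [0.15, 0.45]])

    def H(p):
        """Shannon uncertainty in bits, ignoring zero-probability cells."""
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    p_x  = p_xy.sum(axis=1)               # marginal: sum_j p(i, j) = p(i)
    H_x  = H(p_x)                         # H(x)
    H_xy = H(p_xy)                        # H(x, y)

    # Conditional uncertainty H_x(y) = -sum_{i,j} p(i, j) log2 [p(i, j) / p(i)]
    Hx_y = -np.sum(p_xy * np.log2(p_xy / p_x[:, None]))

    print(H_xy, H_x + Hx_y)               # both print the same value: H(x, y) = H(x) + H_x(y)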



Fig. 4. The relation between information functions shown graphically 



The difference in uncertainty concerning y, depending on whether or not x is known,



H(y) - H_x(y)



is the gain in certainty about y derived from observing x. Substituting for H_x(y), we get:



H(y) - H_x(y) = H(y) + H(x) - H(x, y).



The expression on the right side is the defining equation for T(x; y):



H(y) + H(x) - H(x, y) = T(x; y).

It follows from this derivation that T is a symmetrical function:



T(x; y) = T(y; x) = H(x) - H_y(x) = H(y) - H_x(y),

and it becomes clear why T is a measure of the mutual reduction of uncertainty.

The relations between the six information functions, H(x), H(y), H(x, y), H_x(y), H_y(x) and T(x; y), can be demonstrated graphically as in Fig. 4.
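As a companion check on the symmetry of T, the sketch below (same assumed Python/NumPy setting and the same invented joint table as before) computes T(x; y) by its defining equation H(x) + H(y) - H(x, y) and by the two conditional forms; all three routes give the same number.

    import numpy as np

    def H(p):
        """Shannon uncertainty in bits of a probability table."""
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # Same hypothetical joint distribution used in the earlier sketch.
    p_xy = np.array([[0.30, 0.10],
                     [0.15, 0.45]])
    p_x = p_xy.sum(axis=1)   # p(i)
    p_y = p_xy.sum(axis=0)   # p(j)

    Hx_y = -np.sum(p_xy * np.log2(p_xy / p_x[:, None]))   # H_x(y)
    Hy_x = -np.sum(p_xy * np.log2(p_xy / p_y[None, :]))   # H_y(x)

    T_def   = H(p_x) + H(p_y) - H(p_xy)   # defining equation for T(x; y)
    T_via_x = H(p_x) - Hy_x               # H(x) - H_y(x)
    T_via_y = H(p_y) - Hx_y               # H(y) - H_x(y)

    print(T_def, T_via_x, T_via_y)        # identical, illustrating T(x; y) = T(y; x)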



