


The function here needed has already been derived as the condition of representability. If two situations can be made to represent each other, then information on one can abolish uncertainty concerning the other. Thus, mutual representability implies equal information content, and representation in the standard binary system yields a general measure of information content. This measure is the 'amount of selective information' as defined by Shannon and Wiener (4, 5). It is expressed as follows:



Let x be a classification with categories i and associated probabilities p(i); then the information content of x is designated H(x) and given by*:

H(x) = -\sum_i p(i) \log_2 p(i)



The units of this function are the binary digits needed for representation 

 of a given event, and are called bits. It must be remembered that the 'bit' is 

 a technical unit of amount of information and not a small piece of information. 

 A single chunk of information may contain many bits or a fraction of a bit. 
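
As a concrete illustration of the formula (a minimal sketch; the function name and the probabilities are invented for the example, not taken from the text), the computation of H(x) in bits can be written in a few lines of Python:

import math

def selective_information(probabilities):
    # H(x) = -sum over i of p(i) * log2 p(i), measured in bits.
    # Categories with probability zero contribute nothing to the sum.
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# A uniform choice among four categories requires exactly 2 bits:
print(selective_information([0.25, 0.25, 0.25, 0.25]))   # 2.0
# A very uneven classification requires less, here about 0.62 bit,
# illustrating that an event may carry only a fraction of a bit:
print(selective_information([0.90, 0.05, 0.03, 0.02]))   # ~0.62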



Some Properties of the Shannon-Wiener Information Function



The Shannon-Wiener information function has been derived (admittedly, in a loose fashion) from a consideration of standard representation of information. We will now consider a number of its properties and see that they correspond closely to the behavior which one would intuitively expect from a good measure of information.



(1) Independence — Let i be one of the possible categories of an event x, p(i) the associated probability, and F(i) the contribution of the ith category to the uncertainty. It is desirable that F(i) be a function of, and only of, p(i). The function

F(i) = -p(i) \log_2 p(i)

fulfills this requirement.



(2) Continuity — A small change of p(i) should result in a small change in F(i); in other words, F(i) should be a continuous function of p(i). The function p(i) log2 p(i) is continuous.
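
It may be worth making explicit the one point requiring care, the endpoint p(i) = 0: the contribution of an impossible category is conventionally taken to be zero, which agrees with the limit

\lim_{p \to 0^{+}} ( -p \log_2 p ) = 0,

so that F(i) remains continuous over the whole range 0 ≤ p(i) ≤ 1.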



(3) Additivity — It is desirable that the total information derived from two independent sources should be the sum of the individual information; in other



* The information function looks (except for a scale factor) like Boltzmann's entropy function; this is not a mere coincidence. The physical entropy is the amount of uncertainty

 associated with a state of a system, provided all states which are physically distinguishable are 

 considered as different, that is, if the categorization is taken with the finest grain possible. 

 In most situations dealt with in information theory, large numbers of states which are physically 

 distinguishable are lumped into equivalent classes. The category "one light on the steeple" is 

 a good example; an enormous number of physically distinct states are compatible with this 

 definition, but they are all lumped into one class. The distinctions upon which categorizations 

 are based are usually a very small percentage of the distinctions one could make. Thus, 

 physical entropy is an upper bound of the information functions which can be associated with a 

 given situation, but it is a very high upper bound, usually very far from the actual value. For 

 this reason, I prefer not to use the word 'entropy' as synonymous with 'information'. 



A very thorough discussion of the relation between information and entropy has been given 

 by Brillouin (9). 



