


Even with a more pronounced imbalance of frequencies, the minimum number
of binary digits per word is soon approximated. For p(A) = .89 and p(B) = .11,
the limiting value is .50. With a single-event code, one needs one digit per event;
for two-event sequences, .66 digits; for three-event sequences, .55; and for
four-event sequences, .52.
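These figures can be checked with the standard Huffman construction. The following sketch is mine, not the book's (the function name huffman_avg_length is an assumption for illustration); it codes n-event sequences of A and B and reports the average number of binary digits per event:

```python
import heapq
import itertools

def huffman_avg_length(probs):
    # Expected codeword length of a binary Huffman code over the given
    # probabilities: each merge of two subtrees pushes every codeword
    # beneath them one digit deeper, so it adds (p1 + p2) to the total.
    heap = [(p, i) for i, p in enumerate(probs)]  # the index i breaks ties
    heapq.heapify(heap)
    next_id = len(heap)
    expected = 0.0
    while len(heap) > 1:
        p1, _ = heapq.heappop(heap)
        p2, _ = heapq.heappop(heap)
        expected += p1 + p2
        heapq.heappush(heap, (p1 + p2, next_id))
        next_id += 1
    return expected

p = {"A": 0.89, "B": 0.11}
for n in (1, 2, 3, 4):
    # Probability of each n-event sequence, e.g. p(AB) = .89 * .11.
    block_probs = []
    for seq in itertools.product("AB", repeat=n):
        prob = 1.0
        for symbol in seq:
            prob *= p[symbol]
        block_probs.append(prob)
    avg = huffman_avg_length(block_probs) / n
    print(f"{n}-event sequences: {avg:.2f} digits per event")
```

Run as written, this reproduces the figures quoted above: 1.00, 0.66, 0.55, and 0.52 digits per event.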



We began our discussion of binary representation with the case of
2, 4, 8, 16, ... equiprobable categories. We then generalized to cases with
any number of categories, and proceeded from the representation of single events
to clusters of events. Next, we introduced unequal probabilities, of value
1/2, 1/4, 1/8, ... . Finally, we dropped all restrictions. We can now state,
with full generality:



Representation Theorem: If a real situation is categorized into r categories, with associated probabilities p(i) (where i = 1, 2, ..., r), then it is possible to represent each event with an average of no more than

$$-\sum_{i=1}^{r} p(i) \log_2 p(i)$$

binary symbols.
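As a quick check of this bound, the sum can be evaluated directly; the sketch below is my own (the name entropy_bits is an assumption, not the book's terminology):

```python
import math

def entropy_bits(probs):
    # The bound of the Representation Theorem: -sum of p(i) * log2 p(i).
    # A category with probability 0 contributes nothing to the sum.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.89, 0.11]))       # ~0.50, the limiting value quoted earlier
print(entropy_bits([0.25] * 4))         # 2.0: four equiprobable categories
print(entropy_bits([0.5, 0.25, 0.25]))  # 1.5: probabilities of the 1/2, 1/4, ... kind
```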



In general, the closer we want to approximate
the minimum bulk of representation, the larger the groups of sequences which
must be encoded. This entails the following penalties:



1. There will be a delay in waiting for a whole group of events to occur
or to be registered, and

2. The encoding and decoding procedures, and the code book itself, will
become more elaborate the larger the groups coded (see the sketch after this list).
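Both penalties grow quickly with group size. A back-of-envelope sketch (mine; the figures follow from counting sequences, not from the text): an n-event block code over r categories must register n events before emitting anything, and its code book needs one entry per possible sequence, r^n in all:

```python
r = 2  # two categories, as in the A/B example above
for n in (1, 2, 4, 8, 16):
    # Penalty 1: n events must occur before the block can be encoded.
    # Penalty 2: the code book holds r**n entries, one per n-event sequence.
    print(f"blocks of {n:2d}: wait for {n:2d} events, code book of {r**n:6d} entries")
```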



It is obvious that the code which is most economical in terms of bulk of
representation is not necessarily optimum in over-all performance. There
will be cases where it is worthwhile to sacrifice economy in word length
for ease in decoding; the reader who works through exercise 4 will surely
appreciate this possibility. Whether or not minimum bulk of coding is
favorable in a given case cannot be derived from informational analysis.

 What information theory does is to establish a limiting value of the number 

 of symbols, of a given kind, which are needed to represent the information in 

 a given factual situation; in some cases, like those here discussed, information 

 theory will also show how such coding economy can be achieved; but it can 

 never prescribe that this is what should be done. 



It would be quite legitimate to inquire, at this point, why we have gone to
so much trouble to find out how to achieve binary representation with minimum
bulk. Is not the result of doubtful value, given that a tolerable
approximation to minimum bulk can usually be achieved with the simplest
means, and that a close approximation often entails prohibitive costs in encoding
and decoding? The answer is this: by establishing the minimum length of
code words in standard binary representation, we have implicitly established
a general condition of representability:



If an event can be represented by (on the average) n binary digits, then it 

 can symbolically represent, or be represented by, any other event that can 

 also be coded into n binary digits. 
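A toy illustration of this condition (entirely my own; the two event sets are arbitrary examples): six die faces and six weather states both fit into three binary digits, so the same digit string can stand for an event of either kind:

```python
# Map 3-digit binary strings to two unrelated sets of six events each.
die_faces = {format(i, "03b"): face for i, face in enumerate("123456")}
weather = {format(i, "03b"): state for i, state in enumerate(
    ["clear", "cloudy", "rain", "snow", "fog", "hail"])}

code = "100"  # three binary digits
print(die_faces[code], "<->", weather[code])  # one string, two representable events
```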



This can be immediately generalized to groups of events: Let $S_x$ and $S_y$ be
the number of real and symbolic events in a group, and $n_x$ and $n_y$ the average


