44 Information Storage and Neural Control 



Zipf argued that, while the speaker's effort (cost) would be least if only one word were used, this situation does not persist because the listener's decoding efforts would be too great. Information theory allows us to put this notion on a sounder base. You have seen that a message which is always sent can convey no information, and that the larger the vocabulary, or set of alternatives, from which messages are selected, the greater is the information which they convey. On the other hand, the process of deciding which message is next to be sent is also more difficult when the set of messages is larger. Mandelbrot has proposed that the balance of these two factors may be conceived as the basis for word statistics, and in this we see the similarities with Zipf. However, Mandelbrot has employed a specific definition of information, and has rigorously defined the problem. Let us examine the main features of his derivation for a problem which is formally identical with the one stated above: Given a fixed average cost per word, what will be the frequency distribution of the words to give maximum information per word?



Let C_r be the cost of the r-th most frequent word, which occurs with probability p_r. Average cost per word, C, is then

\[ C = \sum_r p_r C_r. \]



Also,

\[ \sum_r p_r = 1 \]

must hold. The problem is then to maximize



\[ H = -\sum_r p_r \log p_r \]

subject to the above conditions.

 When this is done, it is found that 



\[ p_r = A\, M^{-B C_r} \]



where A, B, and M are constants which have interpretations in terms of the coding process. One further step is needed in order to complete the derivation, and this involves relating C_r and rank, r.
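The constrained maximization described above can be checked numerically. The Python sketch below is illustrative and not from the text: the ten word costs (1 through 10) and the target average cost of 3 are arbitrary choices. The Lagrange conditions make the maximizing distribution exponential in cost, so the sketch finds the multiplier by bisection and verifies that log p_r comes out linear in C_r — exactly the form p_r = A·M^(−B·C_r).

```python
import math

def maxent_probs(costs, c_bar, tol=1e-12):
    """Maximize H = -sum(p * log p) subject to sum(p) = 1 and
    sum(p * C) = c_bar.  The Lagrange conditions give
    p_r = A * exp(-B * C_r); B is found by bisection, since the
    average cost of that family decreases monotonically in B."""
    def avg_cost(b):
        w = [math.exp(-b * c) for c in costs]
        z = sum(w)
        return sum(wi * ci for wi, ci in zip(w, costs)) / z

    lo, hi = -50.0, 50.0           # bracket for the multiplier B
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if avg_cost(mid) > c_bar:  # average cost still too high: raise B
            lo = mid
        else:
            hi = mid
    b = (lo + hi) / 2.0
    w = [math.exp(-b * c) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]

# Ten words with costs 1..10, average cost fixed at 3 (both arbitrary).
probs = maxent_probs(list(range(1, 11)), 3.0)
```

Because log p_r = −B·C_r − log Z, the successive differences of log p_r are constant whenever the costs are equally spaced, which is an easy check that the exponential form has been recovered.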

Mandelbrot has managed to show that if words are coded "optimally," the resulting word statistics will be correct. Suppose that words are coded from some elementary units. These units
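The coding step can be sketched as follows; this is an illustration, not the text's own construction, and the alphabet size M = 26 is an arbitrary choice. If words are built from M elementary units and the r-th most frequent word is assigned the r-th cheapest string, then its cost (string length) grows like log_M r; reading the maximized form as p_r = A·M^(−B·C_r), this substitution yields p_r ≈ A·r^(−B), a power law in rank.

```python
import math

M = 26  # number of elementary units (illustrative choice)

def optimal_cost(r, m=M):
    """Length of the r-th cheapest string over an m-letter alphabet,
    enumerating strings in order of increasing length: there are
    m**1 + m**2 + ... + m**k strings of length at most k."""
    length, shorter = 1, 0
    while shorter + m ** length < r:
        shorter += m ** length
        length += 1
    return length

# With this coding, M**(-C_r) tracks 1/r to within a bounded factor,
# so p_r = A * M**(-B * C_r) behaves like A * r**(-B).
```

The staircase cost C_r = optimal_cost(r) satisfies M^(−C_r) ≤ roughly 1/r ≤ M^(1−C_r), which is the sense in which the exponential-in-cost distribution turns into a power law in rank.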



