a TV set produces on an idle channel. Written text would look like WPEIPTNKUH WFIOZ, a random sequence of letters. The statistics of the message, in particular the correlations between the various samples, greatly reduce the number of sequences of given length which are at all likely. As a result the information rate is less, and fewer bits per second are required to describe the average message.



A sequence of M binary digits can describe any of $2^M$ possible messages. Conversely, any of N messages can be described by $\log_2 N$ binary digits. The information rate, H, of a message source is therefore given by



$$H = \lim_{n \to \infty} \frac{\log_2 N}{n} \quad \text{bits/symbol}$$



where N = number of message sequences of length n. If the successive symbols of the message are independent but not equiprobable, then a long sequence will contain $x_1$ symbols of type 1, $x_2$ of type 2, etc. The number of possible combinations of these symbols will be



$$N = \frac{n!}{\prod_j x_j!}$$



so that $\log N = \log n! - \sum_j \log x_j!$
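As a concreteness check (an illustration, not from the paper), the short Python sketch below evaluates $N = n!/\prod_j x_j!$ exactly for a hypothetical census of a 100-symbol sequence over three symbol types and reports the resulting $\log_2 N$ per symbol.

```python
from math import factorial, log2

def sequence_count(counts):
    """Exact number of distinct sequences containing counts[j]
    symbols of type j: N = n! / (x_1! x_2! ...)."""
    N = factorial(sum(counts))
    for x in counts:
        N //= factorial(x)   # exact integer division
    return N

# Hypothetical census: n = 100 symbols, 50 of type 1, 30 of type 2, 20 of type 3
counts = [50, 30, 20]
n, N = sum(counts), sequence_count(counts)
print(f"N = {N:.3e} possible sequences")
print(f"(log2 N)/n = {log2(N)/n:.4f} bits/symbol")
```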



For large enough n, all the $x_j$ will be large also and we may write, by Stirling's approximation,



$$\log N \to \log\sqrt{2\pi n} + n \log n - n - \sum_j \left[ \log\sqrt{2\pi x_j} + x_j \log x_j - x_j \right]$$
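As a sanity check on the approximation (an added illustration, using natural logarithms), the sketch below compares $\ln n!$ with the Stirling form for a few values of n; the relative error shrinks rapidly, which is what justifies the limiting argument that follows.

```python
import math

def stirling_ln_factorial(n):
    """Stirling's approximation: ln n! ~ ln sqrt(2 pi n) + n ln n - n."""
    return math.log(math.sqrt(2 * math.pi * n)) + n * math.log(n) - n

for n in (10, 100, 1000):
    exact = math.lgamma(n + 1)            # lgamma(n + 1) = ln n!
    approx = stirling_ln_factorial(n)
    print(f"n = {n:5d}   ln n! = {exact:11.3f}   Stirling = {approx:11.3f}   "
          f"rel. error = {(exact - approx) / exact:.1e}")
```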



But since $\sum_j x_j = n$, and since for large n, $x_j \to p(j)\,n$ where $p(j)$ is the probability of the $j$th symbol, we have



$$\log N \to \log\sqrt{2\pi n} + n \log n - n - \sum_j \log\sqrt{2\pi x_j} - n \sum_j p(j) \log p(j) - n \log n + n$$

$$H_1 = \lim_{n \to \infty} \frac{\log N}{n} = -\sum_j p(j) \log p(j) \qquad (5)$$



which is the expression Shannon derives more rigorously. $H_1$ is a maximum when all the $p(j)$ are equal to $1/C$. Then $H_1 = \log_2 C = H_0$. The more unequal the $p(j)$, i.e., the more peaked the probability distribution, the smaller $H_1$ becomes.
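To make this concrete, here is a minimal Python sketch (the distributions are illustrative choices, not from the paper) that evaluates equation (5) for a uniform and for a peaked distribution over the same $C = 4$ symbols.

```python
from math import log2

def entropy(p):
    """H1 = -sum_j p(j) log2 p(j), in bits/symbol (equation 5)."""
    return -sum(pj * log2(pj) for pj in p if pj > 0)

C = 4
uniform = [1 / C] * C                  # all p(j) equal to 1/C
peaked  = [0.85, 0.05, 0.05, 0.05]     # same alphabet, peaked distribution

print(f"H0 = log2 C        = {log2(C):.3f} bits/symbol")
print(f"H1, uniform p(j)   = {entropy(uniform):.3f} bits/symbol")
print(f"H1, peaked p(j)    = {entropy(peaked):.3f} bits/symbol")
```

The uniform case reproduces $H_0 = \log_2 C = 2$ bits/symbol exactly; the peaked case falls well below it, as the text asserts.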



