404 BELL SYSTEM TECHNICAL JOURNAL 



matching of probabilities to corresponding lengths of sequences. With a 

 good code the logarithm of the reciprocal probability of a long message 

 must be proportional to the duration of the corresponding signal, in fact 



$$\left| \frac{\log p^{-1}}{T} - C \right|$$



must be small for all but a small fraction of the long messages. 



If a source can produce only one particular message its entropy is zero, 

 and no channel is required. For example, a computing machine set up to 

calculate the successive digits of π produces a definite sequence with no

chance element. No channel is required to "transmit" this to another

 point. One could construct a second machine to compute the same sequence 

 at the point. However, this may be impractical. In such a case we can 

 choose to ignore some or all of the statistical knowledge we have of the 

 source. We might consider the digits of tt to be a random sequence in that 

 we construct a system capable of sending any sequence of digits. In a 

 similar way we may choose to use some of our statistical knowledge of Eng- 

 lish in constructing a code, but not all of it. In such a case we consider the 

 source with the maximum entropy subject to the statistical conditions we 

 wish to retain. The entropy of this source determines the channel capacity 

which is necessary and sufficient. In the π example the only information

retained is that all the digits are chosen from the set 0, 1, ..., 9. In the

 case of English one might wish to use the statistical saving possible due to 

 letter frequencies, but nothing else. The maximum entropy source is then 

 the first approximation to English and its entropy determines the required 

 channel capacity. 
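The π example can be made concrete: if the only statistical knowledge retained is that each digit is drawn from ten equally likely possibilities, the maximum-entropy source is uniform over the digits and the required capacity is log 10 bits per digit. A short sketch (not part of the original paper):

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p*log2(p), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Retaining only the fact that each digit comes from {0, 1, ..., 9},
# the maximum-entropy source is uniform over the ten digits, so the
# necessary and sufficient channel capacity is log2(10) bits per digit.
uniform_digits = [1 / 10] * 10
capacity = entropy_bits(uniform_digits)
print(capacity)  # about 3.32 bits per digit
```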



11. Examples 



As a simple example of some of these results consider a source which 

produces a sequence of letters chosen from among A, B, C, D with prob-

abilities 1/2, 1/4, 1/8, 1/8, successive symbols being chosen independently. We

 have 



$$H = -\left(\tfrac{1}{2}\log\tfrac{1}{2} + \tfrac{1}{4}\log\tfrac{1}{4} + \tfrac{2}{8}\log\tfrac{1}{8}\right) = \tfrac{7}{4}\ \text{bits per symbol.}$$
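The arithmetic can be checked numerically; here is a minimal sketch in Python using the probabilities of the example:

```python
import math

# Probabilities of the symbols A, B, C, D from the example:
# 1/2, 1/4, 1/8, 1/8, chosen independently.
probs = [1/2, 1/4, 1/8, 1/8]

# H = -(1/2 log 1/2 + 1/4 log 1/4 + 2/8 log 1/8) = 7/4 bits per symbol.
H = -sum(p * math.log2(p) for p in probs)
print(H)  # 1.75
```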



Thus we can approximate a coding system to encode messages from this

source into binary digits with an average of 7/4 binary digits per symbol.

In this case we can actually achieve the limiting value by the following code

(obtained by the method of the second proof of Theorem 9):



