60 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1951 



The ideal N-gram predictor can be considered, as has been pointed out, to be a transducer which operates on the language, translating it into a sequence of numbers running from 1 to 27. As such it has the following two properties:



1. The output symbol is a function of the present input (the predicted next letter when we think of it as a predicting device) and the preceding (N-1) letters.



2. It is instantaneously reversible. The original input can be recovered by a suitable operation on the reduced text without loss of time. In fact, the inverse operation also operates on only the (N-1) preceding symbols of the reduced text together with the present output.
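The two properties can be illustrated with a minimal Python sketch. It rests on assumptions that are not Shannon's. The conditional letter probabilities are taken to be a lookup table keyed by the preceding (N-1) letters. The function names, the toy table, and the 27-symbol alphabet (26 letters plus space) are all hypothetical. The encoder replaces each letter by its rank under the prediction, with 1 meaning "the predicted letter". Ties are broken alphabetically so that encoder and decoder always agree. The decoder inverts the encoding one symbol at a time, so each letter is recovered as soon as its number arrives. As its context, this decoder keeps the letters it has already recovered, which is one way of realizing the inverse operation.

```python
from typing import Dict, List

# Hypothetical table format: (N-1)-letter context -> {letter: probability}.
Table = Dict[str, Dict[str, float]]


def ranking(table: Table, context: str, alphabet: str) -> List[str]:
    """Letters ordered from most to least probable after `context`.

    Unseen contexts or letters get probability 0; ties are broken by
    alphabet order so the encoder and decoder use the same ranking.
    """
    probs = table.get(context, {})
    return sorted(alphabet, key=lambda c: (-probs.get(c, 0.0), alphabet.index(c)))


def encode(text: str, table: Table, n: int, alphabet: str) -> List[int]:
    """Replace each letter by its rank (1..len(alphabet)) given the preceding
    n-1 letters (fewer at the start of the text). The output depends only on
    the present input and those n-1 letters (property 1)."""
    out = []
    for k, ch in enumerate(text):
        context = text[max(0, k - (n - 1)):k]
        out.append(ranking(table, context, alphabet).index(ch) + 1)
    return out


def decode(numbers: List[int], table: Table, n: int, alphabet: str) -> str:
    """Invert `encode` symbol by symbol: each letter is recovered immediately
    from its number and the n-1 letters already recovered (property 2)."""
    text = ""
    for num in numbers:
        context = text[max(0, len(text) - (n - 1)):]
        text += ranking(table, context, alphabet)[num - 1]
    return text


ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "  # 26 letters + space = 27 symbols
# Hypothetical digram (N = 2) table, for illustration only.
TABLE = {
    "Q": {"U": 0.99, "A": 0.01},
    "T": {"H": 0.5, "O": 0.2, "E": 0.1},
    "H": {"E": 0.6, "A": 0.2},
}

nums = encode("THE", TABLE, 2, ALPHABET)
print(nums)                              # → [20, 1, 1]
print(decode(nums, TABLE, 2, ALPHABET))  # → THE
```

The first letter has no context in this toy table, so it is ranked alphabetically ("T" is 20th). The following letters are the predicted ones and map to 1.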



The above proof that the frequencies of output symbols with an N-gram predictor and an (N-1)-gram predictor satisfy the inequalities

$$\sum_{i=1}^{S} q_i^{N} \ge \sum_{i=1}^{S} q_i^{N-1}, \qquad S = 1, 2, \ldots, 27 \tag{14}$$

can be applied to any transducer having the two properties listed above.

In fact, we can again imagine an array with the various (N-1)-grams listed vertically and the present input letter horizontally. Since the present output is a function of only these quantities, there will be a definite output symbol which may be entered at the corresponding intersection of row and column. Furthermore, instantaneous reversibility requires that no two entries in the same row be the same. Otherwise, there would be ambiguity between the two or more possible present input letters when reversing the translation. The total probability of the S most probable symbols in the output, say $\sum_{i=1}^{S} r_i$, will be the sum of the probabilities for S entries in each row, summed over the rows. Consequently, it is certainly not greater than the sum of the S largest entries in each row. Thus we will have



$$\sum_{i=1}^{S} q_i^{N} \ge \sum_{i=1}^{S} r_i, \qquad S = 1, 2, \ldots, 27 \tag{15}$$
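The step from the row argument to (15) can be written out as a short derivation. The symbols here are introduced for illustration and are not the paper's notation:

- p(c, j) is the probability of the (N-1)-gram c followed by the present letter j.
- $p_{(1)}(c) \ge p_{(2)}(c) \ge \cdots$ are the entries of row c in decreasing order.
- $J_S(c)$ is the set of S entries in row c that the transducer maps to its S most probable output symbols.

The ideal predictor's output symbol i marks the i-th most probable letter in every row. Its $q_i^N$ is therefore the i-th largest entry of each row, summed over the rows:

$$
q_i^{N} = \sum_{c} p_{(i)}(c), \qquad
\sum_{i=1}^{S} r_i = \sum_{c} \sum_{j \in J_S(c)} p(c, j) \le \sum_{c} \sum_{i=1}^{S} p_{(i)}(c) = \sum_{i=1}^{S} q_i^{N}.
$$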



In other words, ideal prediction as defined above enjoys a preferred position among all translating operations that may be applied to a language and which satisfy the two properties above. Roughly speaking, ideal prediction collapses the probabilities of the various symbols into a small group more than any other instantaneously reversible translating operation involving the same number of letters.
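The inequalities can also be checked numerically. The following Python sketch is an illustration only. It assumes a toy 3-letter alphabet, N = 2 (rows indexed by the single preceding letter), and a randomly generated joint distribution. The function names are hypothetical.

- `ideal_q` sums each row's i-th largest entry.
- `transducer_r` computes the sorted output probabilities of a transducer given as one permutation per row. This gives distinct entries in each row, as reversibility requires.
- `dominates` tests the partial-sum inequalities.

The (N-1)-gram predictor ranks letters the same way in every row, so in this sketch it is just one more such transducer. Inequality (14) is therefore checked as a special case of (15).

```python
import itertools
import random


def ideal_q(joint):
    """q_i for the ideal predictor: sum over rows of each row's i-th largest entry."""
    k = len(joint[0])
    sorted_rows = [sorted(row, reverse=True) for row in joint]
    return [sum(row[i] for row in sorted_rows) for i in range(k)]


def transducer_r(joint, perms):
    """Output-symbol probabilities, largest first, for a transducer that maps
    letter j in row c to output symbol perms[c][j]. One permutation per row
    means no two entries in a row coincide, so the transducer is reversible."""
    k = len(joint[0])
    r = [0.0] * k
    for row, perm in zip(joint, perms):
        for j, p in enumerate(row):
            r[perm[j]] += p
    return sorted(r, reverse=True)


def dominates(q, r, tol=1e-12):
    """True if every partial sum of the first S terms of q is >= that of r."""
    return all(a >= b - tol
               for a, b in zip(itertools.accumulate(q), itertools.accumulate(r)))


# Toy joint distribution p(c, j) over (preceding letter, present letter).
rng = random.Random(0)
k = 3
weights = [[rng.random() for _ in range(k)] for _ in range(k)]
total = sum(map(sum, weights))
joint = [[w / total for w in row] for row in weights]
q = ideal_q(joint)

# An arbitrary reversible transducer: an independent random permutation per row.
perms = [rng.sample(range(k), k) for _ in range(k)]
print(dominates(q, transducer_r(joint, perms)))

# The (N-1)-gram predictor (here N-1 = 1) ignores the row and ranks letters
# by marginal probability, using the same permutation in every row.
marginal = [sum(row[j] for row in joint) for j in range(k)]
order = sorted(range(k), key=lambda j: -marginal[j])
rank = [0] * k
for pos, j in enumerate(order):
    rank[j] = pos
print(dominates(q, transducer_r(joint, [rank] * k)))
```

Both checks should print `True` for any joint distribution, since they are instances of (15). The small tolerance only absorbs floating-point rounding in the partial sums.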



Sets of numbers satisfying the inequalities (15) have been studied by Muirhead in connection with the theory of algebraic inequalities.* If (15) holds when the $q_i$ and $r_i$ are arranged in decreasing order of magnitude, and



* Hardy, Littlewood and Polya, "Inequalities," Cambridge University Press, 1934.



