advantage. Since the frequency tables are determined from long samples of English, these two columns are subject to less sampling error than the others.



It will be seen that the prediction gradually improves, apart from some statistical fluctuation, with increasing knowledge of the past, as indicated by the larger numbers of correct first guesses and the smaller numbers of high rank guesses.



One experiment was carried out with "reverse" prediction, in which the subject guessed the letter preceding those already known. Although the task is subjectively much more difficult, the scores were only slightly poorer. Thus, with two 101 letter samples from the same source, the subject obtained the following results:



No. of guess     1
Forward         70
Reverse         66
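
In terms of first guesses alone, the subject was right about 69 per cent of the time forward (70 of 101) and about 65 per cent in reverse (66 of 101).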



Incidentally, the $N$-gram entropy $F_N$ for a reversed language is equal to that for the forward language, as may be seen from the second form in equation (1). Both terms have the same value in the forward and reversed cases.
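
To spell the reversal argument out (a brief sketch; the block-entropy symbols $H_N$ and $H_{N-1}$ are introduced here for convenience and are not in the original), the second form of equation (1) expresses $F_N$ as a difference of block entropies:

$$F_N = -\sum_{i,j} p(b_i, j)\log_2 p(b_i, j) + \sum_i p(b_i)\log_2 p(b_i) = H_N - H_{N-1},$$

where $b_i$ ranges over blocks of $N-1$ letters, $j$ over single letters, and $H_N$ denotes the entropy of the distribution of $N$-letter blocks. Reversal carries each block onto its mirror image with the same probability, so both $H_N$ and $H_{N-1}$, and hence their difference $F_N$, are unchanged.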



4. Ideal N-Gram Prediction



The data of Table I can be used to obtain upper and lower bounds to the $N$-gram entropies $F_N$. In order to do this, it is necessary first to develop some general results concerning the best possible prediction of a language when the preceding $N$ letters are known. There will be for the language a set of conditional probabilities $p_{i_1, i_2, \ldots, i_{N-1}}(j)$. This is the probability, when the $(N-1)$ gram $i_1, i_2, \ldots, i_{N-1}$ occurs, that the next letter will be $j$. The best guess for the next letter, when this $(N-1)$ gram is known to have occurred, will be that letter having the highest conditional probability. The second guess should be that with the second highest probability, etc. A machine or person guessing in the best way would guess letters in the order of decreasing conditional probability. Thus the process of reducing a text with such an ideal predictor consists of a mapping of the letters into the numbers from 1 to 27 in such a way that the most probable next letter [conditional on the known preceding $(N-1)$ gram] is mapped into 1, etc. The frequency of 1's in the reduced text will then be given by

$$q_1 = \sum p(i_1, i_2, \ldots, i_{N-1}, j) \qquad (10)$$

where the sum is taken over all $(N-1)$ grams $i_1, i_2, \ldots, i_{N-1}$, the $j$ being the one which maximizes $p$ for that particular $(N-1)$ gram. Similarly, the frequency of 2's, $q_2$, is given by the same formula with $j$ chosen to be that letter having the second highest value of $p$, etc.
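
A small computational sketch may make the reduction process and equation (10) concrete. The fragment below (Python) is illustrative only: the three-letter alphabet, the toy probabilities, the choice $N = 2$, and all identifiers are hypothetical rather than taken from the paper.

```python
from collections import defaultdict

# Hypothetical joint distribution p(i1, ..., i_{N-1}, j) for a toy
# three-letter alphabet with N = 2, i.e. p(previous letter, next letter).
# These numbers are invented for illustration and sum to 1.
p = {
    ('a', 'b'): 0.30, ('a', 'a'): 0.10, ('a', 'c'): 0.05,
    ('b', 'c'): 0.20, ('b', 'a'): 0.05,
    ('c', 'a'): 0.25, ('c', 'b'): 0.05,
}

# For each (N-1)-gram, rank the possible next letters by decreasing
# conditional probability; ranks start at 1, as in the paper.
rank = {}                      # (prefix, letter) -> rank of letter after prefix
by_prefix = defaultdict(list)
for (prefix, letter), prob in p.items():
    by_prefix[prefix].append((prob, letter))
for prefix, choices in by_prefix.items():
    for r, (prob, letter) in enumerate(sorted(choices, reverse=True), start=1):
        rank[(prefix, letter)] = r

# Reduce a text: replace each letter by the rank the ideal predictor
# would have assigned it, given the preceding (N-1)-gram.
# (Bigrams absent from p would raise KeyError; fine for this toy.)
def reduce_text(text):
    return [rank[(text[k - 1], text[k])] for k in range(1, len(text))]

# Equation (10): q_r is the total probability mass of rank-r letters,
# i.e. the frequency of r's in the reduced text of a long sample.
q = defaultdict(float)
for (prefix, letter), prob in p.items():
    q[rank[(prefix, letter)]] += prob

print(reduce_text('abca'))                      # -> [1, 1, 1]
print({r: round(q[r], 2) for r in sorted(q)})   # -> {1: 0.75, 2: 0.2, 3: 0.05}
```

With these toy numbers the ideal predictor yields $q_1 = 0.75$, $q_2 = 0.20$, $q_3 = 0.05$: the reduced text is heavily concentrated on the low ranks, and it is this concentration that the upper and lower bounds on $F_N$ exploit.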

On the basis of $N$-grams, a different set of probabilities for the symbols



