PREDICTION AND ENTROPY OF PRINTED ENGLISH 59 



in the reduced text, $q_1^{N+1}, q_2^{N+1}, \ldots, q_{27}^{N+1}$, would normally result. Since this prediction is on the basis of a greater knowledge of the past, one would expect the probabilities of low numbers to be greater, and in fact one can prove the following inequalities:



$$\sum_{i=1}^{S} q_i^{N+1} \ge \sum_{i=1}^{S} q_i^{N}, \qquad S = 1, 2, \ldots \tag{11}$$



This means that the probability of being right in the first $S$ guesses when the preceding $N$ letters are known is greater than or equal to that when only $(N-1)$ are known, for all $S$. To prove this, imagine the probabilities $p(i_1, i_2, \ldots, i_N, j)$ arranged in a table with $j$ running horizontally and all the $N$-grams vertically. The table will therefore have 27 columns and $27^N$ rows. The term on the left of (11) is the sum of the $S$ largest entries in each row, summed over all the rows. The right-hand member of (11) is also a sum of entries from this table in which $S$ entries are taken from each row but not necessarily the $S$ largest. This follows from the fact that the right-hand member would be calculated from a similar table with $(N-1)$-grams rather than $N$-grams listed vertically. Each row in the $(N-1)$-gram table is the sum of 27 rows of the $N$-gram table, since:



$$p(i_2, i_3, \ldots, i_N, j) = \sum_{i_1=1}^{27} p(i_1, i_2, \ldots, i_N, j). \tag{12}$$
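The step from (12) to (11) can be written out for a single row. This is a sketch of the argument in the text; the symbols $A$ and $B$ are introduced here for illustration only. Fix an $(N-1)$-gram $(i_2, \ldots, i_N)$ and let $A$ be the set of the $S$ columns holding the largest entries in its row of the $(N-1)$-gram table. Then, by (12),

$$\sum_{j \in A} p(i_2, \ldots, i_N, j) = \sum_{i_1=1}^{27} \sum_{j \in A} p(i_1, i_2, \ldots, i_N, j) \le \sum_{i_1=1}^{27} \max_{|B| = S} \sum_{j \in B} p(i_1, i_2, \ldots, i_N, j).$$

Summing the left member over all $(N-1)$-grams gives the right-hand member of (11), and summing the right member over all $(N-1)$-grams gives the left-hand member of (11).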



The sum of the $S$ largest entries in a row of the $(N-1)$-gram table will equal the sum of the $27S$ selected entries from the corresponding 27 rows of the $N$-gram table only if the latter fall into $S$ columns. For the equality in (11) to hold for a particular $S$, this must be true of every row of the $(N-1)$-gram table. In this case, the first letter of the $N$-gram does not affect the set of the $S$ most probable choices for the next letter, although the ordering within the set may be affected. However, if the equality in (11) holds for all $S$, it follows that the ordering as well will be unaffected by the first letter of the $N$-gram. The reduced text obtained from an ideal $(N-1)$-gram predictor is then identical with that obtained from an ideal $N$-gram predictor.
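Inequality (11) can be checked numerically on a small example. The following is a minimal sketch, not part of the original argument: it assumes a hypothetical 4-letter alphabet and a randomly generated joint table, and the function names (`random_joint`, `marginalize_first`, `guess_frequencies`, `partial_sums`) are illustrative. The shorter-context table is obtained by summing out the first letter as in (12).

```python
import itertools
import random


def random_joint(alphabet_size, n, seed=0):
    """Hypothetical joint table p(i_1, ..., i_n, j) with random entries summing to 1."""
    rng = random.Random(seed)
    keys = list(itertools.product(range(alphabet_size), repeat=n + 1))
    weights = [rng.random() for _ in keys]
    total = sum(weights)
    return {k: w / total for k, w in zip(keys, weights)}


def marginalize_first(table):
    """Sum out the first letter of every key, as in equation (12)."""
    out = {}
    for key, p in table.items():
        out[key[1:]] = out.get(key[1:], 0.0) + p
    return out


def guess_frequencies(table, alphabet_size):
    """Frequencies of being right on guess 1, 2, ... for an ideal predictor.

    Each row (context) is sorted in decreasing order, and the entry of
    rank r is added to the frequency of a correct r-th guess.
    """
    rows = {}
    for key, p in table.items():
        rows.setdefault(key[:-1], []).append(p)
    q = [0.0] * alphabet_size
    for entries in rows.values():
        for rank, p in enumerate(sorted(entries, reverse=True)):
            q[rank] += p
    return q


def partial_sums(q):
    """Running sums q_1, q_1 + q_2, ... over the guess frequencies."""
    return list(itertools.accumulate(q))


K, N = 4, 3                       # assumed toy sizes, not the 27 symbols of the text
longer = random_joint(K, N)       # context of N letters
shorter = marginalize_first(longer)  # context of N - 1 letters
sums_longer = partial_sums(guess_frequencies(longer, K))
sums_shorter = partial_sums(guess_frequencies(shorter, K))
for s, (a, b) in enumerate(zip(sums_longer, sums_shorter), start=1):
    print(f"S={s}: longer context {a:.4f} >= shorter context {b:.4f}: {a >= b - 1e-12}")
```

A random table almost never has its largest entries aligned in the same columns across the rows that are merged, so the inequality is typically strict for every $S$ below the alphabet size, and both partial sums reach 1 at the last $S$.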



Since the partial sums 



$$Q_S^N = \sum_{i=1}^{S} q_i^N, \qquad S = 1, 2, \ldots \tag{13}$$



are monotonic increasing functions of $N$, $\le 1$ for all $N$, they must all approach limits as $N \to \infty$. Their first differences must therefore approach limits as $N \to \infty$, i.e., the $q_i^N$ approach limits, $q_i^\infty$. These may be interpreted as the relative frequency of correct first, second, $\ldots$, guesses with knowledge of the entire (infinite) past history of the text.
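The passage from the partial sums to the individual frequencies can be made explicit. In this sketch, $Q_S^N$ denotes the partial sum of the first $S$ frequencies $q_1^N, \ldots, q_S^N$ in (13), and the convention $Q_0^N = 0$ is an assumption added here for convenience:

$$q_S^N = Q_S^N - Q_{S-1}^N, \qquad q_S^\infty = \lim_{N \to \infty} Q_S^N - \lim_{N \to \infty} Q_{S-1}^N.$$

Both limits on the right exist because each $Q_S^N$ is monotonic in $N$ and bounded above by 1, so the difference has a limit as well.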






