PREDICTION AND ENTROPY OF PRINTED ENGLISH






Of a total of 129 letters, 89 or 69% were guessed correctly. The errors, as 

 would be expected, occur most frequently at the beginning of words and 

 syllables where the line of thought has more possibility of branching out. It 

 might be thought that the second line in (8), which we will call the reduced 

 text, contains much less information than the first. Actually, both lines con- 

 tain the same information in the sense that it is possible, at least in prin- 

 ciple, to recover the first line from the second. To accomplish this we need 

 an identical twin of the individual who produced the sequence. The twin 

 (who must be mathematically, not just biologically identical) will respond in 

 the same way when faced with the same problem. Suppose, now, we have 

 only the reduced text of (8). We ask the twin to guess the passage. At each 

 point we will know whether his guess is correct, since he is guessing the same 

 as the first twin and the presence of a dash in the reduced text corresponds 

 to a correct guess. The letters he guesses wrong are also available, so that at 

 each stage he can be supplied with precisely the same information the first 

 twin had available. 
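
The recovery argument above can be sketched in code. This is only an illustration under stated assumptions: the "twin" is modelled as a deterministic function from the text seen so far to a single guessed character, the text itself is assumed to contain no dash, and the names reduce_text, recover_text and toy_predict are hypothetical, not taken from the paper.

```python
from typing import Callable

# Assumption: a predictor is any deterministic function that maps the
# text seen so far to one guessed character. Determinism plays the
# role of the "mathematically identical twin".
Predictor = Callable[[str], str]


def reduce_text(text: str, predict: Predictor) -> str:
    """Replace each correctly guessed character with a dash."""
    if "-" in text:
        # A literal dash would be indistinguishable from a correct guess.
        raise ValueError("text must not contain '-'")
    out = []
    for i, ch in enumerate(text):
        out.append("-" if predict(text[:i]) == ch else ch)
    return "".join(out)


def recover_text(reduced: str, predict: Predictor) -> str:
    """Rebuild the original by replaying the same predictor."""
    recovered = ""
    for ch in reduced:
        # A dash means the guess was right, so the guess is the letter.
        # Otherwise the wrongly guessed letter is given in the reduced text.
        recovered += predict(recovered) if ch == "-" else ch
    return recovered


def toy_predict(prefix: str) -> str:
    """Hypothetical toy predictor: guesses from the last character only."""
    if not prefix:
        return "T"
    return {"T": "H", "H": "E", "E": " ", " ": "T"}.get(prefix[-1], "E")


reduced = reduce_text("THE TREE", toy_predict)
print(reduced)                             # -----R-E
print(recover_text(reduced, toy_predict))  # THE TREE
```

At every step the decoder holds exactly the prefix the encoder held, so both calls to the predictor return the same guess; that is the only property of the twin the argument relies on.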



Fig. 2 — Communication system using reduced text. 



The need for an identical twin in this conceptual experiment can be 

 eliminated as follows. In general, good prediction does not require knowledge

 of more than N preceding letters of text, with N fairly small. There are

 only a finite number of possible sequences of N letters. We could ask the

 subject to guess the next letter for each of these possible N-grams. The

 complete list of these predictions could then be used both for obtaining the

 reduced text from the original and for the inverse reconstruction process. 
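
A prediction list of this kind can be sketched as a lookup table. As an assumption made purely for illustration, the guesses here are taken from letter frequencies in a sample string rather than from a human subject. The names build_prediction_table and predict_next, and the fallback guess of a space, are likewise hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict


def build_prediction_table(sample: str, n: int) -> Dict[str, str]:
    """Map each n-letter context seen in `sample` to its most frequent successor.

    Assumes n >= 1. Contexts that never occur in the sample are absent.
    """
    counts = defaultdict(Counter)
    for i in range(n, len(sample)):
        counts[sample[i - n:i]][sample[i]] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}


def predict_next(table: Dict[str, str], prefix: str, n: int,
                 fallback: str = " ") -> str:
    """Look up the guess for the last n letters of `prefix`.

    Falls back to a fixed guess when the prefix is shorter than n
    or its final n letters are not in the table.
    """
    if len(prefix) < n:
        return fallback
    return table.get(prefix[-n:], fallback)


table = build_prediction_table("THE THEME THEN", 2)
print(predict_next(table, "IN TH", 2))  # E
```

Because the table is fixed in advance and consulted the same way at both ends, it can stand in for the pair of identical twins: whoever holds a copy reproduces the same guess for the same N preceding letters.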



To put this another way, the reduced text can be considered to be an 

 encoded form of the original, the result of passing the original text through 

 a reversible transducer. In fact, a communication system could be con- 

 structed in which only the reduced text is transmitted from one point to 

 the other. This could be set up as shown in Fig. 2, with two identical pre- 

 diction devices. 



An extension of the above experiment yields further information con- 

 cerning the predictability of English. As before, the subject knows the text 

 up to the current point and is asked to guess the next letter. If he is wrong, 

 he is told so and asked to guess again. This is continued until he finds the 

 correct letter. A typical result with this experiment is shown below. The 



