BELL SYSTEM TECHNICAL JOURNAL



3. The Series of Approximations to English 



To give a visual idea of how this series of processes approaches a language, 

 typical sequences in the approximations to English have been constructed 

 and are given below. In all cases we have assumed a 27-symbol "alphabet," 

 the 26 letters and a space. 



1. Zero-order approximation (symbols independent and equi-probable). 



XFOML RXKHRJFFJUJ ZLPWCFWKCYJ 

 FFJEYVKCQSGXYD QPAAMKBZAACIBZLHJQD 
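A sequence of this kind can be sketched in a few lines of Python. This is a minimal illustration only: the function name `zero_order_sample` and the optional `seed` parameter are hypothetical and not part of the paper.

```python
import random
import string

# The 27-symbol alphabet assumed in the text: the 26 letters and a space.
ALPHABET = string.ascii_uppercase + " "

def zero_order_sample(length, seed=None):
    """Draw `length` symbols independently and with equal probability."""
    rng = random.Random(seed)  # seed is an illustrative convenience for repeatability
    return "".join(rng.choice(ALPHABET) for _ in range(length))

print(zero_order_sample(60, seed=0))
```

Because every symbol, including the space, has probability 1/27, the "words" in such output average far longer than English words.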



2. First-order approximation (symbols independent but with frequencies 

 of English text). 



OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI 

 ALHENHTTPA OOBTTVA NAH BRL 
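One way to sketch this step is to estimate the symbol frequencies from a body of text and then draw symbols independently with those weights. The names `normalize` and `first_order_sample` are hypothetical, and the short `corpus` string below is only a stand-in assumption; it is far too small to give realistic English frequencies.

```python
import random
from collections import Counter

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ "

def normalize(text):
    """Map text onto the 27-symbol alphabet: capital letters and single spaces."""
    kept = "".join(c if c in ALPHABET else " " for c in text.upper())
    return " ".join(kept.split())  # collapse runs of spaces

def first_order_sample(corpus, length, seed=None):
    """Draw symbols independently, weighted by their frequency in `corpus`."""
    counts = Counter(normalize(corpus))
    symbols = sorted(counts)
    weights = [counts[s] for s in symbols]
    rng = random.Random(seed)
    return "".join(rng.choices(symbols, weights=weights, k=length))

# Stand-in corpus (an assumption of this sketch, not data from the paper).
corpus = "the quick brown fox jumps over the lazy dog and the dog sleeps in the sun"
print(first_order_sample(corpus, 60, seed=0))
```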



3. Second-order approximation (digram structure as in English). 



ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY 

 ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO 

 TIZIN ANDY TOBE SEACE CTISBE 



4. Third-order approximation (trigram structure as in English). 



IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID 

 PONDENOME OF DEMONSTURES OF THE REPTAGIN IS 

 REGOACTIONA OF CRE 
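The digram and trigram approximations in items 3 and 4 make each symbol depend on the one or two symbols before it. A minimal sketch follows, assuming the statistics are estimated from a sample text rather than taken from published tables. The names `build_table` and `generate`, the circular treatment of the text, and the tiny `corpus` are all assumptions of this illustration.

```python
import random
from collections import Counter, defaultdict

def build_table(text, order):
    """Count which symbol follows each context of `order - 1` symbols.

    The text is treated as circular (an assumption of this sketch) so that
    every observed context has at least one successor. The text must be at
    least `order - 1` symbols long.
    """
    k = order - 1
    wrapped = text + text[:k]
    table = defaultdict(Counter)
    for i in range(len(text)):
        table[wrapped[i:i + k]][wrapped[i + k]] += 1
    return table

def generate(table, order, length, seed=None):
    """Emit `length` symbols, each drawn given the preceding `order - 1` symbols."""
    rng = random.Random(seed)
    k = order - 1
    context = rng.choice(sorted(table))  # start from some observed context
    out = list(context)
    while len(out) < length:
        followers = table[context]
        symbols = sorted(followers)
        nxt = rng.choices(symbols, weights=[followers[s] for s in symbols])[0]
        out.append(nxt)
        context = (context + nxt)[-k:] if k else ""
    return "".join(out[:length])

# Stand-in corpus (an assumption); real use needs a much larger text.
corpus = "THE HEAD OF THE LINE IS THE PART OF THE MESSAGE THAT COMES FIRST "
print(generate(build_table(corpus, 2), 2, 60, seed=0))  # digram structure
print(generate(build_table(corpus, 3), 3, 60, seed=0))  # trigram structure
```

With `order=1` the context is empty and the same code reduces to independent draws with the corpus frequencies.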



5. First-Order Word Approximation. Rather than continue with tetragram, 

 ..., n-gram structure it is easier and better to jump at this 

 point to word units. Here words are chosen independently but with 

 their appropriate frequencies. 



REPRESENTING AND SPEEDILY IS AN GOOD APT OR 

 COME CAN DIFFERENT NATURAL HERE HE THE A IN 

 CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES 

 THE LINE MESSAGE HAD BE THESE. 



6. Second-Order Word Approximation. The word transition probabilities 

 are correct but no further structure is included. 



THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH 



WRITER THAT THE CHARACTER OF THIS POINT IS 



THEREFORE ANOTHER METHOD FOR THE LETTERS 



THAT THE TIME OF WHO EVER TOLD THE PROBLEM 



FOR AN UNEXPECTED 
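The two word-level approximations in items 5 and 6 can be sketched in the same spirit, with words in place of letters. The function names, the wrap-around from the last word to the first, and the small `corpus` are assumptions of this illustration, not the procedure used to produce the samples above.

```python
import random
from collections import Counter, defaultdict

def first_order_words(words, length, seed=None):
    """Choose words independently, weighted by their frequency in `words`."""
    counts = Counter(words)
    vocab = sorted(counts)
    rng = random.Random(seed)
    return rng.choices(vocab, weights=[counts[w] for w in vocab], k=length)

def second_order_words(words, length, seed=None):
    """Choose each word from the observed transitions out of the previous word."""
    transitions = defaultdict(Counter)
    # Pair each word with its successor; the last word wraps to the first
    # (an assumption of this sketch) so no word is left without a successor.
    for prev, nxt in zip(words, words[1:] + words[:1]):
        transitions[prev][nxt] += 1
    rng = random.Random(seed)
    current = rng.choice(words)  # starting word, drawn by frequency
    out = [current]
    while len(out) < length:
        followers = transitions[current]
        options = sorted(followers)
        current = rng.choices(options, weights=[followers[w] for w in options])[0]
        out.append(current)
    return out[:length]

# Stand-in corpus (an assumption); a real estimate needs far more text.
corpus = "THE HEAD OF THE LINE IS THE PART OF THE MESSAGE THAT COMES FIRST".split()
print(" ".join(first_order_words(corpus, 12, seed=0)))
print(" ".join(second_order_words(corpus, 12, seed=0)))
```

In the second function every adjacent pair of output words is a pair that occurs in the source text, which is what "word transition probabilities are correct" amounts to for this sketch.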



The resemblance to ordinary English text increases quite noticeably at 



each of the above steps. Note that these samples have reasonably good 



structure out to about twice the range that is taken into account in their 



construction. Thus in (3) the statistical process insures reasonable text 



for two-letter sequences, but four-letter sequences from the sample can 



usually be fitted into good sentences. In (6) sequences of four or more 



