MATHEMATICAL THEORY OF COMMUNICATION 387 



abilities $p_{ij}(k)$ would be required. Continuing in this way one
 obtains successively more complicated stochastic processes. In the
 general $n$-gram case a set of $n$-gram probabilities $p(i_1, i_2, \ldots, i_n)$
 or of transition probabilities $p_{i_1, i_2, \ldots, i_{n-1}}(i_n)$ is required to
 specify the statistical structure.

 (D) Stochastic processes can also be defined which produce a text
 consisting of a sequence of "words." Suppose there are five letters
 A, B, C, D, E and 16 "words" in the language with associated
 probabilities:



Suppose successive "words" are chosen independently and are 

 separated by a space. A typical message might be: 

 DAB EE A BEBE DEED DEB ADEE ADEE EE DEB BEBE 

 BEBE BEBE ADEE BED DEED DEED CEED ADEE A DEED 

 DEED BEBE CABED BEBE BED DAB DEED ADEB 

 If all the words are of finite length this process is equivalent to one 

 of the preceding type, but the description may be simpler in terms 

 of the word structure and probabilities. We may also generalize 

 here and introduce transition probabilities between words, etc. 
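
 The word-based source in (D) can be sketched in a few lines of Python. This is a minimal illustration only: the vocabulary and probabilities in WORD_PROBS are hypothetical placeholders, not the 16-word table of the text, and the name sample_message is introduced here for the example. Each word is drawn independently of the others and the words are joined by single spaces.

```python
import random

# Hypothetical vocabulary and probabilities, chosen only for illustration.
# They are NOT the 16-word table referred to in the text.
WORD_PROBS = {"A": 0.4, "BED": 0.3, "DEED": 0.2, "CAB": 0.1}


def sample_message(word_probs, n_words, rng):
    """Draw n_words independently from word_probs and join them with spaces."""
    words = list(word_probs)
    weights = [word_probs[w] for w in words]
    return " ".join(rng.choices(words, weights=weights, k=n_words))


# A fixed seed makes the run repeatable; any seed would do.
message = sample_message(WORD_PROBS, 10, random.Random(0))
print(message)
```

 Because every word has finite length, the same output could be produced by a letter-level process with a suitable set of states; the word table is simply the more compact description.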

 These artificial languages are useful in constructing simple problems and 

 examples to illustrate various possibilities. We can also approximate to a 

 natural language by means of a series of simple artificial languages. The 

 zero-order approximation is obtained by choosing all letters with the same 

 probability and independently. The first-order approximation is obtained 

 by choosing successive letters independently but each letter having the 

 same probability that it does in the natural language.^ Thus, in the
 first-order approximation to English, E is chosen with probability .12 (its
 frequency in normal English) and W with probability .02, but there is no
 influence between adjacent letters and no tendency to form the preferred
 digrams such as TH, ED, etc. In the second-order approximation, digram

 structure is introduced. After a letter is chosen, the next one is chosen in 

 accordance with the frequencies with which the various letters follow the 

 first one. This requires a table of digram frequencies $p_i(j)$. In the
 third-order approximation, trigram structure is introduced. Each letter is chosen

 with probabilities which depend on the preceding two letters. 
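
 The series of approximations described above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure used to produce any samples in the text: the frequencies are estimated from a short toy string (SAMPLE) rather than from published frequency tables, and the names build_table and approximate are introduced here for the example. Order 0 chooses symbols uniformly, order 1 uses letter frequencies, order 2 conditions on the preceding letter, and order 3 on the preceding two letters.

```python
import random
from collections import Counter, defaultdict

# Toy stand-in for a natural-language sample (an assumption for this sketch).
# A realistic experiment would use frequency tables from a large corpus.
SAMPLE = "THE HEAD AND THE HAND"


def build_table(text, order):
    """Count how often each symbol follows each context of (order - 1) symbols."""
    k = order - 1
    table = defaultdict(Counter)
    for i in range(k, len(text)):
        table[text[i - k:i]][text[i]] += 1
    return table


def approximate(text, order, length, rng):
    """Generate `length` symbols from an order-`order` approximation to `text`."""
    if order == 0:
        # Zero order: all symbols equally likely, chosen independently.
        alphabet = sorted(set(text))
        return "".join(rng.choice(alphabet) for _ in range(length))
    k = order - 1
    table = build_table(text, order)
    unigrams = build_table(text, 1)[""]
    out = list(text[:k])  # seed with the first k symbols of the sample
    while len(out) < length:
        context = "".join(out[len(out) - k:]) if k else ""
        # Fall back to plain letter frequencies if the context was never
        # observed with a following symbol (a choice made for this sketch).
        followers = table.get(context) or unigrams
        symbols = list(followers)
        weights = list(followers.values())
        out.append(rng.choices(symbols, weights=weights)[0])
    return "".join(out[:length])


rng = random.Random(0)
for order in range(4):
    print(order, approximate(SAMPLE, order, 40, rng))
```

 Normalizing each row of the table returned by build_table(text, 2) gives estimates of the transition probabilities $p_i(j)$; the sampler skips the normalization because random.choices accepts unnormalized weights.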



^ Letter, digram and trigram frequencies are given in "Secret and Urgent" by Fletcher 

 Pratt, Blue Ribbon Books 1939. Word frequencies are tabulated in "Relative Frequency 

 of English Speech Sounds," G. Dewey, Harvard University Press, 1923. 



