BELL SYSTEM TECHNICAL JOURNAL 



(B) Using the same five letters let the probabilities be .4, .1, .2, .2, .1 

 respectively, with successive choices independent. A typical 

 message from this source is then: 

 AAACDCBDCEAADADACEDA 



 EADCABEDADDCECAAAAAD
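As an illustration only, a source of type (B) can be simulated in a few lines of Python. This is a minimal sketch: the function name `sample_independent`, the seed and the message length are hypothetical choices made here and are not part of the original text. Only the five letters and their probabilities are taken from case (B).

```python
import random

# Letters and probabilities from case (B); successive choices are independent.
LETTERS = "ABCDE"
PROBS = [0.4, 0.1, 0.2, 0.2, 0.1]


def sample_independent(n, seed=None):
    """Return n letters, each drawn independently with the probabilities above.

    `seed` is a hypothetical convenience parameter for reproducible runs.
    """
    rng = random.Random(seed)
    return "".join(rng.choices(LETTERS, weights=PROBS, k=n))


message = sample_independent(40, seed=0)
print(message)
```

Over a long message the relative frequency of each letter should approach its assigned probability, which is the only statistical structure this source has.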



(C) A more complicated structure is obtained if successive symbols are
 not chosen independently but their probabilities depend on preceding
 letters. In the simplest case of this type a choice depends only
 on the preceding letter and not on ones before that. The statistical
 structure can then be described by a set of transition probabilities
 p_i(j), the probability that letter i is followed by letter j. The
 indices i and j range over all the possible symbols. A second equivalent
 way of specifying the structure is to give the "digram" probabilities
 p(i,j), i.e., the relative frequency of the digram i j. The
 letter frequencies p(i) (the probability of letter i), the transition
 probabilities p_i(j) and the digram probabilities p(i,j) are related by
 the following formulas.



\[
p(i) = \sum_j p(i,j) = \sum_j p(j,i) = \sum_j p(j)\,p_j(i)
\]

\[
p(i,j) = p(i)\,p_i(j)
\]

\[
\sum_j p_i(j) = \sum_i p(i) = \sum_{i,j} p(i,j) = 1.
\]



As a specific example suppose there are three letters A, B, C with the prob- 

 ability tables: 



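Transition probabilities p_i(j) (row i = preceding letter, column j = following letter):

          j
       A     B     C
   A   0    4/5   1/5
 i B  1/2   1/2    0
   C  1/2   2/5   1/10

Letter frequencies p(i):

   A   9/27
   B  16/27
   C   2/27

Digram probabilities p(i,j):

          j
       A      B      C
   A   0     4/15   1/15
 i B  8/27   8/27    0
   C  1/27   4/135  1/135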
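The relations between p(i), p_i(j) and p(i,j) can be checked mechanically for a three-letter example. The sketch below uses exact fractions. The numerical values are typed in by hand and are an assumption of this sketch: transition rows A: (0, 4/5, 1/5), B: (1/2, 1/2, 0), C: (1/2, 2/5, 1/10), and letter frequencies 9/27, 16/27, 2/27. They should be checked against the tables of the example. The names `P_TRANS`, `P_LETTER`, `P_DIGRAM` and `check_relations` are hypothetical.

```python
from fractions import Fraction as F

LETTERS = "ABC"

# ASSUMED transition probabilities p_i(j): outer key i is the preceding
# letter, inner key j is the letter that follows it.
P_TRANS = {
    "A": {"A": F(0), "B": F(4, 5), "C": F(1, 5)},
    "B": {"A": F(1, 2), "B": F(1, 2), "C": F(0)},
    "C": {"A": F(1, 2), "B": F(2, 5), "C": F(1, 10)},
}

# ASSUMED letter frequencies p(i).
P_LETTER = {"A": F(9, 27), "B": F(16, 27), "C": F(2, 27)}

# Digram probabilities derived from p(i,j) = p(i) p_i(j).
P_DIGRAM = {(i, j): P_LETTER[i] * P_TRANS[i][j] for i in LETTERS for j in LETTERS}


def check_relations():
    """Raise AssertionError if any of the stated relations fails."""
    for i in LETTERS:
        # p(i) = sum_j p(i,j) = sum_j p(j,i) = sum_j p(j) p_j(i)
        assert sum(P_DIGRAM[i, j] for j in LETTERS) == P_LETTER[i]
        assert sum(P_DIGRAM[j, i] for j in LETTERS) == P_LETTER[i]
        assert sum(P_LETTER[j] * P_TRANS[j][i] for j in LETTERS) == P_LETTER[i]
        # sum_j p_i(j) = 1 for every preceding letter i
        assert sum(P_TRANS[i].values()) == 1
    # sum_i p(i) = sum_{i,j} p(i,j) = 1
    assert sum(P_LETTER.values()) == 1
    assert sum(P_DIGRAM.values()) == 1
    return True


print(check_relations())
print(P_DIGRAM["A", "B"])
```

Exact fractions are used instead of floating point so that the equalities hold without a tolerance.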



A typical message from this source is the following: 

 ABBABABABABABABBBABBBBBAB 

 ABABABABBBACACABBABBBBABB 

 ABACBBBABA 
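A message of this kind can be produced by drawing each letter from the row of transition probabilities selected by the preceding letter. The following is a minimal sketch under stated assumptions: the transition rows and letter frequencies are entered by hand (A: 0, 4/5, 1/5; B: 1/2, 1/2, 0; C: 1/2, 2/5, 1/10; p(i) = 9/27, 16/27, 2/27), the first letter is drawn from the letter frequencies, and the name `sample_markov` and its `seed` parameter are hypothetical.

```python
import random

LETTERS = "ABC"

# ASSUMED transition probabilities p_i(j), one row per preceding letter,
# listed in the order A, B, C of the following letter.
P_TRANS = {
    "A": [0.0, 0.8, 0.2],
    "B": [0.5, 0.5, 0.0],
    "C": [0.5, 0.4, 0.1],
}

# ASSUMED letter frequencies p(i), used only to draw the first letter.
P_LETTER = [9 / 27, 16 / 27, 2 / 27]


def sample_markov(n, seed=None):
    """Return n letters in which each choice depends only on the preceding letter."""
    if n <= 0:
        return ""
    rng = random.Random(seed)
    out = [rng.choices(LETTERS, weights=P_LETTER)[0]]
    while len(out) < n:
        previous = out[-1]
        out.append(rng.choices(LETTERS, weights=P_TRANS[previous])[0])
    return "".join(out)


message = sample_markov(60, seed=0)
print(message)
```

With these assumed values the digrams AA and BC have probability zero, so they never occur in the output, and C is rare because every transition into C has small probability.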



The next increase in complexity would involve trigram frequencies 

 but no more. The choice of a letter would depend on the preceding 

 two letters but not on the message before that point. A set of trigram
 frequencies p(i, j, k) or equivalently a set of transition prob-



