52 THE BELL SYSTEM TECHNICAL JOURNAL, JANUARY 1951 



The trigram entropy is given by 



F_3 = -\sum_{i,j,k} p(i,j,k) \log_2 p_{ij}(k)

    = -\sum_{i,j,k} p(i,j,k) \log_2 p(i,j,k) + \sum_{i,j} p(i,j) \log_2 p(i,j)    (5)

    = 11.0 - 7.7 = 3.3
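The two-sum identity in Eq. (5) can be checked numerically. The sketch below is a minimal illustration with a made-up two-letter trigram table (not Shannon's English data): it forms the digram marginals and computes F_3 as the difference of the trigram and digram entropies.

```python
import math

def trigram_entropy(p3):
    """F3 = -sum p(i,j,k) log2 p_ij(k), computed via the identity in Eq. (5):
    F3 = -sum p(i,j,k) log2 p(i,j,k) + sum p(i,j) log2 p(i,j)."""
    # Digram marginals p(i,j) = sum over k of p(i,j,k).
    p2 = {}
    for (i, j, k), p in p3.items():
        p2[(i, j)] = p2.get((i, j), 0.0) + p
    h3 = -sum(p * math.log2(p) for p in p3.values() if p > 0)
    h2 = -sum(p * math.log2(p) for p in p2.values() if p > 0)
    return h3 - h2

# Toy two-letter alphabet, for illustration only: each digram is
# followed by either letter with equal probability.
p3 = {('a', 'b', 'a'): 0.25, ('a', 'b', 'b'): 0.25,
      ('b', 'a', 'a'): 0.25, ('b', 'a', 'b'): 0.25}
print(trigram_entropy(p3))  # 1.0 bit per letter given the preceding digram
```

Here h3 = 2.0 bits and h2 = 1.0 bit, so F_3 = 1.0 bit, as expected when the next letter is uniform over two choices given the digram.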



In this calculation the trigram table³ used did not take into account trigrams bridging two words, such as WOW and OWO in TWO WORDS. To compensate partially for this omission, corrected trigram probabilities p(i, j, k) were obtained from the probabilities p'(i, j, k) of the table by the following rough formula:



p(i, j, k) = \frac{2.5}{4.5}\, p'(i, j, k) + \frac{1}{4.5}\, r(i)\, p(j, k) + \frac{1}{4.5}\, p(i, j)\, s(k)



where r(i) is the probability of letter i as the terminal letter of a word and s(k) is the probability of k as an initial letter. Thus the trigrams within words (an average of 2.5 per word) are counted according to the table; the bridging trigrams (one of each type per word) are counted approximately by assuming independence of the terminal letter of one word and the initial digram in the next, or vice versa. Because of the approximations involved here, and also because the sampling error in identifying probability with sample frequency is more serious, the value of F₃ is less reliable than the previous numbers.
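As an illustration only, the rough correction above can be written out as follows. The weights reflect the 2.5 within-word trigrams and the two bridging trigrams per word (4.5 in all); every probability table here is a hypothetical stand-in, not Dewey's data.

```python
def corrected_trigram(p_table, p2, r, s, i, j, k):
    """Mix the tabulated within-word trigram probability with the two
    word-bridging cases, each approximated by assuming independence
    across the word boundary (terminal letter vs. the adjacent digram)."""
    w = 2.5 / 4.5   # weight of within-word trigrams (2.5 per word)
    b = 1.0 / 4.5   # weight of each bridging type (1 per word)
    return (w * p_table.get((i, j, k), 0.0)
            + b * r.get(i, 0.0) * p2.get((j, k), 0.0)    # word ends in i; j,k begin the next
            + b * p2.get((i, j), 0.0) * s.get(k, 0.0))   # i,j end a word; k begins the next

# Hypothetical numbers, for illustration only.
p = corrected_trigram({('t', 'h', 'e'): 0.9},
                      {('h', 'e'): 0.2, ('t', 'h'): 0.3},
                      {'t': 0.1}, {'e': 0.4},
                      't', 'h', 'e')
```

Since 2.5/4.5 + 1/4.5 + 1/4.5 = 1, the correction is a proper convex mixture of the three cases.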



Since tables of N-gram frequencies were not available for N > 3, F₄, F₅, etc. could not be calculated in the same way. However, word frequencies have been tabulated³ and can be used to obtain a further approximation. Figure 1 is a plot on log-log paper of the probabilities of words against frequency rank. The most frequent English word "the" has a probability .071 and this is plotted against 1. The next most frequent word "of" has a probability of .034 and is plotted against 2, etc. Using logarithmic scales both for probability and rank, the curve is approximately a straight line with slope −1; thus, if p_n is the probability of the nth most frequent word, we have, roughly



p_n = \frac{.1}{n}    (6)
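Eq. (6) can be checked against the two data points quoted above; the sketch below uses nothing beyond the figures already given, fitting a log-log slope through just "the" and "of".

```python
import math

def zipf_prob(n, k=0.1):
    """Approximation p_n = k/n for the n-th most frequent word, Eq. (6)."""
    return k / n

# Quoted points: "the" has probability .071 at rank 1, "of" has .034 at rank 2.
slope = (math.log(0.034) - math.log(0.071)) / (math.log(2) - math.log(1))
print(round(slope, 2))           # about -1.06, close to the -1 read off the plot
print(zipf_prob(1), zipf_prob(2))  # 0.1 and 0.05, in rough agreement with .071 and .034
```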



Zipf⁴ has pointed out that this type of formula, p_n = k/n, gives a rather good approximation to the word probabilities in many different languages. The



³ G. Dewey, "Relative Frequency of English Speech Sounds," Harvard University Press, 1923.

⁴ G. K. Zipf, "Human Behavior and the Principle of Least Effort," Addison-Wesley Press, 1949.



