


each category will have exactly z_i binary digits. This is illustrated in the following example:



Category   p(i)    Code word   z_i
A          1/2     0           1
B          1/8     100         3
C          1/8     101         3
D          1/8     110         3
E          1/16    1110        4
F          1/32    11110       5
G          1/32    11111       5
                   Total: 32/32 = 1



The first step separates 'A' (p = 1/2) from all other categories (aggregate probability = 1/2); the second separates 'B or C' (aggregate p = 1/4) from 'D or E or F or G' (aggregate p = 1/8 + 1/16 + 1/32 + 1/32 = 1/4); the third separates 'B' from 'C' (p = 1/8 each) and 'D' (p = 1/8) from 'E or F or G' (aggregate p = 1/8), etc.
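This halving procedure is mechanical enough to program. The following Python sketch (not part of the original text; all names are illustrative) assigns code words by repeatedly splitting the category list into two groups of equal aggregate probability, prefixing '0' to one group and '1' to the other. The list must be ordered so that an exact half split is always available, which holds here with the categories in order of decreasing probability:

```python
from fractions import Fraction

def split_code(categories, prefix=""):
    # Assign code words by repeatedly splitting the category list
    # into two groups of equal aggregate probability.
    if len(categories) == 1:
        name, _ = categories[0]
        return {name: prefix or "0"}
    half = sum(p for _, p in categories) / 2
    running = Fraction(0)
    for k, (_, p) in enumerate(categories, start=1):
        running += p
        if running == half:   # an exact half always exists here because
            break             # every p(i) is an integral power of 1/2
    return {**split_code(categories[:k], prefix + "0"),
            **split_code(categories[k:], prefix + "1")}

categories = [("A", Fraction(1, 2)), ("B", Fraction(1, 8)),
              ("C", Fraction(1, 8)), ("D", Fraction(1, 8)),
              ("E", Fraction(1, 16)), ("F", Fraction(1, 32)),
              ("G", Fraction(1, 32))]
print(split_code(categories))
# {'A': '0', 'B': '100', 'C': '101', 'D': '110',
#  'E': '1110', 'F': '11110', 'G': '11111'}
```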



The average number of digits per code word is the sum of the z_i's, weighted by the probabilities p(i); in our example:



Σ_i p(i) · z_i = 1/2 + 3/8 + 3/8 + 3/8 + 4/16 + 5/32 + 5/32 = 70/32 = 2.19
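A quick check of this arithmetic (a Python sketch, using the probabilities and word lengths from the table above):

```python
from fractions import Fraction

p = [Fraction(1, 2), Fraction(1, 8), Fraction(1, 8), Fraction(1, 8),
     Fraction(1, 16), Fraction(1, 32), Fraction(1, 32)]
z = [1, 3, 3, 3, 4, 5, 5]              # digits per code word

average = sum(pi * zi for pi, zi in zip(p, z))
print(average, float(average))          # 35/16 2.1875, i.e. 70/32
```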



From p(i) = (1/2)^z_i

we get: log2 p(i) = z_i · log2 (1/2)

and, because: log2 (1/2) = -1

we have: z_i = -log2 p(i).
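As an illustration (again a sketch, not from the original), the relation z_i = -log2 p(i) recovers the word lengths of the example directly from the probabilities:

```python
import math

p = {"A": 1/2, "B": 1/8, "C": 1/8, "D": 1/8,
     "E": 1/16, "F": 1/32, "G": 1/32}
for name, prob in p.items():
    print(name, int(-math.log2(prob)))   # prints 1, 3, 3, 3, 4, 5, 5
```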



We get (for p(i)'s which are integral powers of 1/2!) the following result:

Average number of binary symbols per event = -Σ_i p(i) log2 p(i).
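For the seven-category example this formula can be evaluated directly (a Python sketch; the values are those of the table above):

```python
import math

p = [1/2, 1/8, 1/8, 1/8, 1/16, 1/32, 1/32]
H = -sum(pi * math.log2(pi) for pi in p)
print(H)   # 2.1875 binary symbols per event, the 70/32 found above
```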



We will check this result for the case of equiprobable categories. For r categories, the probability of every one will be 1/r; so:

-Σ_i p(i) log2 p(i) = -r · (1/r) · log2 (1/r) = log2 r



This is the expression previously obtained for equiprobable categories. 
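A numerical spot check for a few values of r (the values of r are illustrative, not from the text):

```python
import math

for r in (2, 8, 26, 32):
    H = -sum((1 / r) * math.log2(1 / r) for _ in range(r))
    print(r, H, math.log2(r))   # H equals log2 r (up to float rounding)
```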



Any Probabilities — What if probabilities are not limited to the values 1/2, 1/4, 1/8, etc.? In this case, it will — in general — not be possible to make divisions into exactly equiprobable groups. We would suspect that in this case the coding will be less than optimally efficient; accordingly, the average length of a code word will be somewhat higher than -Σ_i p(i) log2 p(i). The approximation is usually not bad. This is illustrated in the following example which shows the construction of a binary code for the letters of the English alphabet, taking into account their relative frequencies. As expected, it turns out that



