12 Henry Quastler 



These are the three Fano codes with five words; all other codes can be 

 reduced to one of these three by rearranging the names of the events. All three 

 codes are confusion-proof. In decoding, one retraces the steps of encoding; the 

 code book shows, unequivocally, whether any symbol in a given sequence is a 

 terminal one, or whether it does not yet identify a single category. For instance, 

 suppose code (c) has been used, and the message received was: 



000000101. 



The first zero indicates 'B or C or D or E'; the second, 'C or D or E'; the 

 third, 'D or E'; the fourth designates 'E', unequivocally, and is a terminal 

 symbol. We mark off the four zeros and proceed. The first symbol of the second 

 code group is a zero, as is the second; this indicates 'C or D or E'; the next

 symbol is a one, which is a terminal symbol and designates 'C'. The remaining

 code group, '01', means 'B', and the whole message is decoded unequivocally: 

 'E C B'. 
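The decoding procedure just described can be sketched in a few lines of Python. This is only an illustration: the code table itself is not reproduced in this passage, so the dictionary `CODE_C` below is reconstructed from the walk-through (a one ends every word except 'E', which is four zeros), and the names `CODE_C` and `decode` are hypothetical.

```python
# Code (c) as reconstructed from the walk-through above (an assumption:
# the code table itself is not shown in this passage).
CODE_C = {"1": "A", "01": "B", "001": "C", "0001": "D", "0000": "E"}


def decode(message, code_book):
    """Split a digit string into code groups and return the decoded categories."""
    decoded = []
    group = ""
    for symbol in message:
        group += symbol
        if group in code_book:           # terminal symbol reached
            decoded.append(code_book[group])
            group = ""                   # mark off the group and proceed
    if group:
        raise ValueError(f"incomplete code group at end of message: {group!r}")
    return decoded


print(decode("000000101", CODE_C))       # ['E', 'C', 'B']
```

Because no code word is the beginning of another, the first match found while reading from left to right is always the right one, which is why the code is confusion-proof.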



Code (b) has the minimum bulk, or lowest average number of digits per 

 word (2.4 digits, against 2.6 digits for code (a) and 2.8 for code (c)). The rule to 

 obtain the minimum bulk code with any number of categories is as follows: 

 all divisions and subdivisions must be between groups of categories of as nearly 

 as possible equal sizes. To find the word length in this code, determine the

 largest integer k compatible with the condition that 



$2^k \le r < 2^{k+1}$



$(k \le n < k + 1)$.



Then, using equipartition as nearly as possible, each word will be of length k 

 or k + 1, and the average number of binary symbols per category encoded will

 be somewhat larger than log2 r. In the example just given:



r = 5, log2 r = 2.32



k = 2, k + 1 = 3, average length of word = 2.4.
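The equipartition rule can be tried out with a short Python sketch. It assumes that the smaller group of each division receives the digit 1 and the larger the digit 0; that choice is arbitrary, so the words produced need not coincide digit for digit with code (b), only in their lengths. The function names are hypothetical.

```python
def equipartition_code(categories, prefix=""):
    """Assign binary words by repeatedly dividing the categories into
    two groups of as nearly as possible equal sizes."""
    if len(categories) == 1:
        return {categories[0]: prefix}
    half = len(categories) // 2          # smaller group first (arbitrary choice)
    code = {}
    code.update(equipartition_code(categories[:half], prefix + "1"))
    code.update(equipartition_code(categories[half:], prefix + "0"))
    return code


def average_length(code):
    """Average number of digits per word, all categories counted equally."""
    return sum(len(word) for word in code.values()) / len(code)


code = equipartition_code(list("ABCDE"))
for category, word in code.items():
    print(category, word)
print("average length of word =", average_length(code))
```

For five categories this gives three words of length 2 and two of length 3, that is, 12 digits for 5 words, or 2.4 digits per word.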



The worst discrepancy between log2 r and average word length occurs for r = 3.

 We have: 



Category A    1
Category B    01
Category C    00



log2 3 = 1.58



k = 1, k + 1 = 2, average length of word = 1.67 symbols



excess digits per word = 1.67 - 1.58 = 0.09, or 5.7 per cent of 1.58
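The excess over log2 r can be tabulated for other values of r with the Python sketch below. It rests on one assumption not stated in the text: if every word has length k or k + 1 and no digit combination is wasted, then 2(r - 2^k) words must have length k + 1 and the rest length k. The function names are hypothetical.

```python
import math


def min_bulk_average(r):
    """Average word length when all r words have length k or k + 1."""
    k = r.bit_length() - 1               # largest k with 2**k <= r
    long_words = 2 * (r - 2**k)          # words of length k + 1 (assumed count)
    short_words = r - long_words         # words of length k
    return (short_words * k + long_words * (k + 1)) / r


def relative_excess(r):
    """Excess of the average word length over log2 r, as a fraction of log2 r."""
    return (min_bulk_average(r) - math.log2(r)) / math.log2(r)


for r in range(2, 17):
    avg = min_bulk_average(r)
    excess = avg - math.log2(r)
    print(f"r = {r:2d}  average = {avg:.3f}  excess = {excess:.3f}"
          f"  ({100 * relative_excess(r):.1f} per cent)")
```

Taken as a percentage of log2 r, the excess in this table is largest for r = 3. With unrounded values it comes to about 0.08 digit, or about 5 per cent, slightly less than the figures obtained above from the rounded values 1.67 and 1.58.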



Groups of Events. The excess of average word length over log2 r is due to

 some partition (especially an early one) dividing the set of categories into 



