694 BELL SYSTEM TECHNICAL JOURNAL 



1. We assumed for the random cipher that the possible decipherments 

 of a cryptogram are a random selection from the possible messages. While 

 not strictly true in ordinary systems, this becomes more nearly the case as 

 the complexity of the enciphering operations and of the language structure 

 increases. With a transposition cipher it is clear that letter frequencies 

 are preserved under decipherment operations. This means that the possible 

 decipherments are chosen from a more limited group, not the entire message 

 space, and the formula should be changed. In place of R_0 one uses R_1, the

 entropy rate for a language with independent letters but with the regular 

 letter frequencies. In some other cases a definite tendency toward returning 

 the decipherments to high probability messages can be seen. If there is no 

 clear tendency of this sort, and the system is fairly complicated, then it is 

 reasonable to use the random cipher analysis. 



2. In many cases the complete key is not used in enciphering short messages.

 For example, in a simple substitution, only fairly long messages

 will contain all letters of the alphabet and thus involve the complete key. 

 Obviously the random assumption does not hold for small N in such a case,

 since all the keys which differ only in the letters not yet appearing in the 

 cryptogram lead back to the same message and are not randomly distributed.

 This error is easily corrected to a good approximation by the use of

 a "key appearance characteristic." One uses, at a particular N, the effective 

 amount of key that may be expected with that length of cryptogram. 

 For most ciphers, this is easily estimated. 



3. There are certain "end effects" due to the definite starting of the 

 message which produce a discrepancy from the random characteristics. 

 If we take a random starting point in English text, the first letter (when we 

 do not observe the preceding letters) has a possibility of being any letter 

 with the ordinary letter probabilities. The next letter is more completely 

 specified since we then have digram frequencies. This decrease in choice 

 value continues for some time. The effect of this on the curve is that the 

 straight line part is displaced, and approached by a curve depending on 

 how much the statistical structure of the language is spread out over adjacent

 letters. As a first approximation the curve can be corrected by shifting

 the line over to the half redundancy point — i.e., the number of letters where 

 the language redundancy is half its final value. 
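
The first two corrections can be illustrated numerically. The following is a
minimal sketch in Python, not a procedure taken from the text. All function
names are hypothetical. The first function estimates the per-letter entropy of
a source with independent letters from the letter frequencies of a sample
text; because a transposition only permutes letters, it returns the same value
for a text and for any rearrangement of it. The remaining functions give one
rough way to estimate a key appearance characteristic for simple substitution,
under two assumptions of ours: letters are drawn independently with the
probabilities supplied by the caller, and the key in use is evaluated at the
expected number of distinct letters rather than averaged exactly.

```python
import math
from collections import Counter


def first_order_entropy(text):
    """Bits per letter of a source emitting letters independently with
    the letter frequencies observed in `text` (non-letters are ignored)."""
    letters = [c for c in text.upper() if c.isalpha()]
    total = len(letters)
    counts = Counter(letters)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def expected_distinct_letters(probs, n):
    """Expected number of distinct letters among n independent draws
    from the distribution `probs` (assumption: independent letters)."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def substitution_key_bits(k, alphabet_size=26):
    """log2 of alphabet_size! / (alphabet_size - k)!, the number of ways
    to assign cipher letters to k distinct message letters. lgamma lets
    k be fractional, which is used by key_appearance below."""
    log_count = math.lgamma(alphabet_size + 1) - math.lgamma(alphabet_size - k + 1)
    return log_count / math.log(2)


def key_appearance(probs, n):
    """Rough key appearance characteristic in bits for a cryptogram of
    n letters: key bits evaluated at the expected distinct-letter count.
    This plug-in estimate is an approximation, not an exact average."""
    return substitution_key_bits(expected_distinct_letters(probs, n), len(probs))
```

With equal letter probabilities, for instance, key_appearance rises with n
from zero toward log2 of 26!, the size of the complete key. Supplying measured
letter probabilities in place of equal ones changes only the probs argument.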



If account is taken of these three effects, reasonable estimates of the 

 equivocation characteristic and unicity point can be made. The calculation

 can be done graphically as indicated in Fig. 8. One draws the key

 appearance characteristic and the total redundancy curve D_N (which is

 usually sufficiently well represented by the line ND_∞). The difference between

 these out to the neighborhood of their intersection is H_E(M). With

 a simple substitution cipher applied to English, this calculation gave the 



