92 Martynas Ycas 



in the template must therefore exceed the number of residues in the

 corresponding protein by a factor of at least three. 



Absence of intersymbol correlation shows that the 'overlapping' codes 

 discussed by Gamow, Rich and Ycas (6) do not correspond to reality. 



The third requirement is somewhat more hypothetical. From the evidence 

 presented above, it would appear that selection is not the sole factor determining 

 the frequency of occurrence of the various amino acids. This is strongly 

 suggested by the different frequencies of amino acids with aliphatic side chains, 

 and particularly by the characteristic preponderance of leucine over isoleucine. 

 It is therefore reasonable to believe that the coding principle itself imposes 

 certain differences in frequency on the various amino acids. 



If only one configuration of nucleotides corresponds to each amino acid, 

 the coding per se cannot make some amino acids frequent and others rare. 

 This can be done, however, if some amino acids have more than one

 configuration of nucleotides to which they correspond. For this reason I am inclined to 

 believe that the type of coding proposed by Crick, Griffith and Orgel (135) 

 does not correspond to reality. 



Gamow and Ycas (7) have proposed a code that formally meets these three 

 requirements. An amino acid is presumed to be determined by three nucleotides, 

 taken without regard to order. In addition, the number of nucleotides in the 

 RNA is assumed to be three times the number of amino acid residues in the 

 corresponding protein. This has the following consequences: 



1. There are twenty such triplets, the same as the number of amino acids. 



2. Neighboring triplets share no nucleotides between them. Any sequence 

 of amino acids is thus permitted. 



3. The frequencies of various amino acids, calculated on the assumption 

 that the sequence in RNA is random, are unequal. This is because the expected 

 frequency of any triplet is given by the product of the frequencies of the

 component nucleotides and the number of configurations for the given composition. 

 Thus there are six triplets (all presumed to determine the same amino acid) of 

 the type ABC, three of AAB and one of AAA. 
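
The three consequences listed above can be checked with a short enumeration. The following is a minimal sketch, not the authors' own calculation: the names `orderings` and `triplet_frequencies` are hypothetical, and the base composition in `example_freq` is invented purely for illustration. It lists the unordered triplets of four nucleotides and computes each expected frequency as the number of orderings times the product of the component nucleotide frequencies.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod

BASES = ("Ad", "Gu", "Cy", "Ur")


def orderings(triplet):
    """Number of distinct ordered arrangements of an unordered triplet."""
    n = factorial(len(triplet))
    for count in Counter(triplet).values():
        n //= factorial(count)
    return n


def triplet_frequencies(base_freq):
    """Expected frequency of each unordered triplet in a random sequence."""
    return {
        t: orderings(t) * prod(base_freq[b] for b in t)
        for t in combinations_with_replacement(BASES, 3)
    }


# Hypothetical base composition, chosen only for illustration.
example_freq = {"Ad": 0.25, "Gu": 0.30, "Cy": 0.25, "Ur": 0.20}
freqs = triplet_frequencies(example_freq)

print(len(freqs))  # 20 unordered triplets
# How many triplets have 1, 3 or 6 orderings (types AAA, AAB, ABC):
print(sorted(Counter(orderings(t) for t in freqs).items()))
# [(1, 4), (3, 12), (6, 4)]
```

The 4 + 12 + 4 = 20 unordered triplets account for 4 + 36 + 24 = 64 ordered ones, so the expected frequencies sum to one for any base composition, and they are unequal even when the four bases are equally frequent.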



The pattern of frequency distribution of the various triplets, calculated in 

 this manner, corresponds very closely to the amino acid distribution, as shown, 

 for example, in Fig. 3 for the case of E. coli. 



I believe that this type of coding, even if not itself the one which actually 

 occurs, is similar to the one that corresponds to reality. The most striking defect 

 is that it provides no explanation for, and in fact contradicts, the requirement that 

 in RNA the number of 6-keto groups should equal the number of 6-amino 

 groups. H. A. Simon (136) has proposed a modification to take care of this 

 difficulty. If RNA is a paired structure, somewhat similar to DNA, and 6-keto 

 bases pair with 6-amino ones, then the following four pairs of nucleotides exist 

 (again disregarding order): 



Ad-Gu; Ad-Ur; Cy-Gu; Cy-Ur. 
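
The four pairs can be generated mechanically from the pairing rule. This is a small sketch under the assumption, read off the list above, that Ad and Cy are the 6-amino bases and Gu and Ur the 6-keto bases; the variable names are hypothetical. It also counts the unordered triplets of pairs that result when pairs, rather than single nucleotides, are taken as units.

```python
from itertools import combinations_with_replacement, product

# Assumed grouping, inferred from the four pairs listed in the text.
SIX_AMINO = ("Ad", "Cy")
SIX_KETO = ("Gu", "Ur")

# Every 6-amino base paired with every 6-keto base, order disregarded.
pairs = [f"{amino}-{keto}" for amino, keto in product(SIX_AMINO, SIX_KETO)]
print(pairs)  # ['Ad-Gu', 'Ad-Ur', 'Cy-Gu', 'Cy-Ur']

# Three pairs taken without regard to order give six nucleotides per unit.
pair_triplets = list(combinations_with_replacement(pairs, 3))
print(len(pair_triplets))  # 20
```

Since every pair contains exactly one 6-amino and one 6-keto base, any structure built from such pairs has equal numbers of the two groups, and four units taken three at a time again yield twenty combinations.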



If one takes these pairs, rather than the individual nucleotides, as units, 

 one can maintain an hypothesis of determination by sextuplets, analogous to 

 determination by triplets. The frequency distribution of sextuplets, calculated 

 for a random RNA sequence, is very similar to that obtained for the triplet 



