


distribution. This suggests that a whole series of codes of this type may exist, 

 all having similar general properties. 



At present the major difficulty is not to produce a coding principle that 

 explains the known facts, but rather to make a choice between the many that 

 are possible. 



The correctness of a coding principle can, in general, be ascertained from a 

 consistency of correspondence of the RNA and protein texts. Unfortunately, 

 such a direct approach is not at present possible. Except perhaps in the case of 

 RNA viruses, it is not possible to isolate a pure RNA corresponding to a pure 

 protein, and were this possible, the sequence of nucleotides could not be determined

 by any method currently available. 



If the composition only of a series of RNA's and the corresponding proteins 

 is known, it is theoretically possible to check some coding schemes as follows: 

 If the coding scheme is correct, the various configurations of nucleotides can 

 be assigned to the amino acids in such a manner as to give, when summed over 

 the protein, the experimentally determined RNA composition, and this consistently

 for all RNA-protein pairs. No assumption need be made that the 

 RNA sequence is random. Actual application of this method requires a large 

 number of RNA-protein pairs of accurately determined composition, obviously 

 differing as much as possible from each other, and the facilities of an electronic 

 computer. 
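
The composition check described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure of any actual study: the codebook `CODEBOOK`, the amino acid counts, and the function names `predicted_rna_composition` and `max_discrepancy` are all hypothetical, and a non-overlapping code with exactly one triplet per amino acid is assumed purely for simplicity.

```python
from collections import Counter

# Hypothetical assignment of one nucleotide triplet to each amino acid.
# These assignments are invented for illustration and are NOT a real code.
CODEBOOK = {
    "Gly": "AAC",
    "Ala": "GUC",
    "Ser": "UUG",
}


def predicted_rna_composition(protein_counts, codebook):
    """Sum the nucleotides implied by every residue; return base fractions."""
    bases = Counter()
    for amino_acid, n_residues in protein_counts.items():
        for base in codebook[amino_acid]:
            bases[base] += n_residues
    total = sum(bases.values())
    return {base: bases[base] / total for base in "ACGU"}


def max_discrepancy(predicted, observed):
    """Largest absolute difference between predicted and measured fractions."""
    return max(abs(predicted[b] - observed[b]) for b in "ACGU")


# Hypothetical amino acid composition (residues per molecule).
protein = {"Gly": 2, "Ala": 1, "Ser": 1}
predicted = predicted_rna_composition(protein, CODEBOOK)
```

A candidate scheme would pass only if `max_discrepancy` stayed small for every RNA-protein pair at once. Searching over all possible assignments for one that does so is the part that calls for a computer.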



The electronic computer is much the easier of the two to provide. At 

 present the data are hopelessly inadequate, although analyses of the proteins 

 and RNA's of viruses may eventually make such an approach possible. However, 

 in attempting a correlation of viral RNA and protein (Table VII), it should be 

 remembered that some viral RNA's do not show the equality 

 Ad + Cy = Gu + Ur characteristic of non-viral RNA (89). This suggests that normal 

 RNA may be multi-stranded, while viral RNA may not be. It is therefore not impossible 

 that viral RNA may contain all the information, but not all the material 

 of a protein-determining structure, and hence differ in composition from it. 

 An additional difficulty is that it is not certain that all viral RNA is concerned 

 in the determination of the protein which eventually appears in the virus 

 particle. 



In lieu of anything better, I have attempted to make consistent assignments 

 of triplets to amino acids on the assumption that the sequence in RNA is 

 random. The random frequencies of triplets were calculated for liver (Fig. 5), 

 Tobacco Mosaic and Turnip Yellow virus. I then tried to assign each triplet 

 to an amino acid in such a manner that each member of the pair would have 

 approximately the same frequency in the three cases. No satisfactorily consistent 

 assignments could be obtained by this method. Assuming that the RNA's and 

 proteins actually correspond, failure indicates one or more of the following: 



1. The coding principle used is false. 



2. The RNA is not a random sequence. 



3. The proteins of viruses are so small that relatively large deviations from 

 expected frequencies may be found. The molecular weight of TMV protein 

 is about 17000 (48, 137), that of Southern Bean mosaic about 26000 (129). 

 Several of the amino acids occur as only a few residues per molecule, so that a 



