The Protein Text



Several attempts, none completely convincing, have been made to determine the coding system employed (4, 5, 6, 7). Cryptography must be based on a study of texts, and I shall therefore attempt an examination of protein molecules from this point of view. The following aspects of protein structure will be examined:



1. The number of kinds of amino acids which occur in proteins.
2. The effect of mutations on amino acid sequence.
3. Whether intersymbol correlations exist between adjacent residues.
4. The frequency of occurrence of the various amino acid residues.
5. Whether any restrictions exist on the length of peptide chains.
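Aspects 3 and 4 treat a peptide chain as a text whose statistics can be measured. The following is a minimal sketch, not a method taken from this paper. It assumes that a chain is written as a string of one-letter residue symbols (a later convention) and uses a hypothetical toy chain rather than real data. It tallies single-residue frequencies and, for every adjacent pair, compares the observed count with the count expected if neighbouring residues were independent. The function names residue_frequencies and pair_excess are illustrative only.

```python
from collections import Counter

def residue_frequencies(seq: str) -> dict[str, float]:
    """Relative frequency of each residue symbol in the chain (aspect 4)."""
    counts = Counter(seq)
    total = len(seq)
    return {res: n / total for res, n in counts.items()}

def pair_excess(seq: str) -> dict[tuple[str, str], float]:
    """Observed minus expected count for each adjacent pair (a, b) (aspect 3).

    The expected count assumes neighbouring residues are independent:
    E[n_ab] = (number of adjacent pairs) * p(a) * p(b).
    A large, consistent excess or deficit would hint at intersymbol correlation.
    """
    freqs = residue_frequencies(seq)
    pairs = Counter(zip(seq, seq[1:]))
    n_pairs = len(seq) - 1
    return {
        (a, b): pairs.get((a, b), 0) - n_pairs * freqs[a] * freqs[b]
        for a in freqs
        for b in freqs
    }

# Hypothetical toy chain, not a real protein sequence.
toy = "GAGAGAGS"
freqs = residue_frequencies(toy)
excess = pair_excess(toy)
print(freqs)               # {'G': 0.5, 'A': 0.375, 'S': 0.125}
print(excess[("G", "A")])  # 1.6875 (3 observed vs 1.3125 expected)
```

On a real protein one would pool many chains before drawing conclusions, since a single short chain gives very noisy pair counts.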



After considering the empirical evidence, I shall indicate its bearing on the problem of encoding protein sequence information into the RNA molecule.



I. THE NUMBER OF AMINO ACIDS OCCURRING IN PROTEINS 



In previous studies (6, 7) it has been assumed that proteins are composed of exactly twenty different kinds of residues. Since in fact more than twenty kinds of residues occur in proteins, the assumption requires some justification.



All organisms, from viruses to mammals, use the same building blocks for their proteins. With minor qualifications this is also true of the nucleic acids, but not of the third major class of biologically occurring high polymers, the polysaccharides. The amino acids which invariably occur in all organisms and in virtually all proteins are the following: ala, arg, asp, aspn, cys, glu, glun, gly, his, ileu, leu, lys, met, phe, pro, ser, thr, try, tyr, val. The number in this list is exactly twenty.
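Because the count of twenty bears directly on the coding problem, it can be checked mechanically. The sketch below is illustrative only and assumes a fixed-length, non-overlapping code written in the four RNA bases; that assumption is made here for the arithmetic and is not a coding scheme proposed in this section. It counts the residues listed above and finds the shortest word length k for which 4^k is at least that count. The names PROTEIN_RESIDUES and min_word_length are hypothetical.

```python
# Residue abbreviations exactly as listed in the text (older notation:
# aspn = asparagine, glun = glutamine, ileu = isoleucine, try = tryptophan).
PROTEIN_RESIDUES = (
    "ala arg asp aspn cys glu glun gly his ileu "
    "leu lys met phe pro ser thr try tyr val"
).split()

RNA_BASES = 4  # the four kinds of nucleotide in RNA (A, C, G, U)

def min_word_length(n_symbols: int, alphabet_size: int) -> int:
    """Smallest k such that alphabet_size**k >= n_symbols.

    Assumes a fixed-length code: every residue is spelled by the
    same number k of bases.
    """
    k = 1
    while alphabet_size ** k < n_symbols:
        k += 1
    return k

n = len(set(PROTEIN_RESIDUES))
print(n)                              # 20
print(min_word_length(n, RNA_BASES))  # 3, since 4**2 = 16 < 20 <= 64 = 4**3
```

Under this assumption, pairs of bases (16 words) are too few, while triplets (64 words) leave 44 words to spare.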



It will be noted that I omit cystine from this list. Because of its structure, cystine corresponds to two residues. The structure of insulin (8) shows that one cystinyl residue can occupy non-adjacent positions in a peptide chain or even participate in two different chains. Cystine is best regarded as an oxidation product of cysteine, formed after incorporation of the cysteinyl residue into the peptide chain. This view is supported by the recent discovery of an enzyme which reversibly catalyzes the reaction



$$2\,\mathrm{cysteinyl} \rightleftharpoons \mathrm{cystinyl}$$



when these residues are protein bound (9). Another example of such a reaction may be the cyclic oxidation and reduction of protein SH groups during the various stages of cell division (10).
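The argument that cystine is not a twenty-first building block, but a link formed afterwards between two cysteinyl residues, can be mirrored in a simple data structure. The sketch below is an illustration under stated assumptions, with hypothetical names (Site, Disulfide, validate, residue_count) and invented toy chains that are not real sequences. Each chain stores only residues from the list of twenty. A disulfide is recorded separately as a pair of (chain, position) references, so it may join non-adjacent positions in one chain or positions in two different chains.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Site:
    """A residue position: which chain, and the index within that chain."""
    chain: str
    index: int

@dataclass(frozen=True)
class Disulfide:
    """A cystinyl cross-link joining two cysteinyl sites."""
    a: Site
    b: Site

def validate(chains: dict[str, list[str]], links: list[Disulfide]) -> None:
    """Each link must join two cysteinyl residues, each used at most once."""
    used: set[Site] = set()
    for link in links:
        for site in (link.a, link.b):
            if chains[site.chain][site.index] != "cys":
                raise ValueError(f"{site} is not a cysteinyl residue")
            if site in used:
                raise ValueError(f"{site} is already in a disulfide")
            used.add(site)

def residue_count(chains: dict[str, list[str]]) -> int:
    """Count residues; a cystine is counted as its two cysteinyl halves,
    which already appear in the chains, so the links are not needed here."""
    return sum(len(residues) for residues in chains.values())

# Hypothetical toy chains, not real protein sequences.
chains = {
    "A": ["gly", "cys", "ala", "cys", "ser", "cys"],
    "B": ["phe", "cys", "val"],
}
links = [
    Disulfide(Site("A", 1), Site("A", 3)),  # non-adjacent positions, same chain
    Disulfide(Site("A", 5), Site("B", 1)),  # joins two different chains
]
validate(chains, links)
print(residue_count(chains))  # 9
```

Because the links live outside the chains, reducing a cystine back to two cysteinyl residues (the reverse reaction) only removes a link and leaves the residue sequence unchanged.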



In addition to the above twenty, other alpha amino acids occur in nature. Some of these, such as homocysteine, citrulline and ornithine, are well-known biochemical intermediates but do not occur in proteins. Since cells synthesize these compounds, the number of amino acids which occur in proteins must be limited by an inability to incorporate, rather than to make, amino acids. Hydroxyglutamic acid and norleucine, previously believed to be protein constituents, have been shown not to exist as natural products (11). Alpha-aminoadipic acid has been isolated from an impure protein hydrolyzate, but it has not been demonstrated that it is a protein constituent in the same way as the other amino acids (12). Diaminopimelic acid, commonly occurring in bacteria, appears to be associated with the polysaccharide material of the cell wall (13, 14).



Nevertheless, there are amino acids, other than the twenty enumerated, 



