





It may therefore be concluded that there is no evidence for any intersymbol 

 correlation between nearest neighbors. Inspection of sequences likewise reveals

 no obvious correlations of residues more than one removed from each

 other, but to decide this question definitely will require more knowledge of 

 longer sequences than is now available. 



Gamow, Rich and Ycas (6) have previously studied this question of 

 intersymbol correlation. They examined a grid diagram, similar to Fig. 2 

 but embodying fewer data, to see whether the frequencies of entries follow 

 the Poisson distribution. This method is invalid, since it does not take into

 account the fact that different amino acids occur with very different frequencies. 

 I am glad to avail myself of this opportunity to correct these authors. 
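
The objection above can be illustrated with a short sketch. If residues are placed independently, the expected count in each cell of a nearest-neighbor grid is proportional to the product of the frequencies of the two residues involved, so the cells do not share a common mean, as a single Poisson distribution would require. The function names and the toy sequences below are hypothetical and are not taken from the data discussed in the text.

```python
from collections import Counter


def neighbor_table(sequences):
    """Return observed and expected nearest-neighbor pair counts.

    Expected counts assume independence: for the pair (a, b) the
    expectation is (times a comes first) * (times b comes second) / total.
    """
    observed = Counter()
    first = Counter()
    second = Counter()
    for seq in sequences:
        # zip(seq, seq[1:]) yields every adjacent pair in the sequence.
        for a, b in zip(seq, seq[1:]):
            observed[(a, b)] += 1
            first[a] += 1
            second[b] += 1
    total = sum(observed.values())
    expected = {
        (a, b): first[a] * second[b] / total
        for a in first
        for b in second
    }
    return observed, expected, total


def chi_square(observed, expected):
    """Pearson chi-square statistic summed over all cells of the grid."""
    return sum(
        (observed.get(cell, 0) - e) ** 2 / e
        for cell, e in expected.items()
        if e > 0
    )


# Hypothetical toy sequences in one-letter notation, for illustration only.
toy = ["LLALML", "ALLGLA", "GLLALL"]
obs, exp, n = neighbor_table(toy)
print(n, round(chi_square(obs, exp), 3))
```

In this sketch the expected count for a pair of abundant residues is much larger than that for a pair of rare ones, which is why the cell entries must be compared with their own expectations rather than with one overall mean.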



IV. FREQUENCY OF OCCURRENCE OF DIFFERENT AMINO ACIDS 



Amino acids occur with different frequencies in proteins. Some, like leucine,

 are consistently abundant; others, like methionine, are consistently rare. The

 frequency of occurrence of the various amino acids in the bulk protein of a 

 whole organism, Escherichia coli, is shown in Fig. 3. 



[Fig. 3 plot; axis label: RANK]



Fig. 3. Composition of bulk protein of Escherichia coli (87), amino acids arranged

 in order of abundance. The values for glu, glun and asp, aspn are arbitrarily

 taken as half of glx and asx, respectively. The value of cysteine is taken from

 Roberts and Cowie (88). 'Triplets' refers to the frequencies of triplets of 

 nucleotides, calculated according to the hypothesis of Gamow and Ycas (7) from 

 the composition of E. coli RNA (89). 
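
As a rough illustration of how triplet frequencies might be obtained from nucleotide composition, the sketch below assumes that a triplet is defined by its nucleotide content regardless of order, which gives twenty distinct triplets from four bases. That reading of the hypothesis, the function name, and the base frequencies used are all assumptions made for the example; they are not values from reference (89).

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial, prod


def triplet_frequencies(base_freq):
    """Frequencies of unordered nucleotide triplets.

    Assumes bases are drawn independently with the given frequencies and
    that the order of bases within a triplet is ignored (an assumption).
    """
    result = {}
    for triplet in combinations_with_replacement(sorted(base_freq), 3):
        counts = Counter(triplet)
        # Number of distinct orderings of this multiset of three bases.
        orderings = factorial(3) // prod(factorial(c) for c in counts.values())
        result["".join(triplet)] = orderings * prod(base_freq[b] for b in triplet)
    return result


# Hypothetical base composition, for illustration only.
bases = {"A": 0.25, "C": 0.25, "G": 0.30, "U": 0.20}
ranked = sorted(triplet_frequencies(bases).items(), key=lambda kv: -kv[1])
for rank, (name, freq) in enumerate(ranked, start=1):
    print(rank, name, round(freq, 4))
```

Sorting the twenty values in descending order gives a rank-frequency list of the same form as the amino acid ranking in the figure.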



Data on the composition of twenty-three proteins are summarized in Table V. 

 This table shows that the composition of individual proteins is not too different 

 from that of bulk protein. The most abundant amino acid usually has a 

 frequency of about 0.10 to 0.12, the least 0.005 to 0.01. 
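
A composition of the kind summarized here can be computed from a residue list as in the minimal sketch below. The function name and the sample residues are hypothetical; the sample is far too short to reproduce the frequencies quoted above.

```python
from collections import Counter


def ranked_composition(residues):
    """Return (rank, residue, frequency) tuples, most abundant first."""
    counts = Counter(residues)
    total = sum(counts.values())
    return [
        (rank, name, n / total)
        for rank, (name, n) in enumerate(counts.most_common(), start=1)
    ]


# Hypothetical residue list, for illustration only.
sample = ["leu", "ala", "leu", "gly", "met", "leu", "ala", "ser"]
for rank, name, freq in ranked_composition(sample):
    print(rank, name, round(freq, 3))
```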



Table V suggests the possibility that the differences in composition of 

 various proteins may be merely the result of chance fluctuations from a mean, 

 and not importantly related to biological function. This notion may not be 

 as far-fetched as might appear at first sight. The most important function of 

 proteins is catalysis, and the enzymatically active site probably involves only 

 a few amino acids. In addition, proteins of a given organism appear to have 



