


Martynas Ycas 



is a strong tendency for the terminal residues of such proteins to be identical. 

 This is certainly not due to the chains being identical in all cases, since the 

 hemoglobins, for example, do differ in the penultimate positions (Table I). 

 Rather it appears to indicate that multi-chain proteins arise by reduplication 

 of genetic material, so that the several chains start out by being identical, 

 but gradually diverge in the course of evolution in the same way as homologous

 proteins of different species. This hypothesis, as applied to the hemoglobins

 and insulin, has been previously discussed (6). Determinations of the

 residue sequence along different chains of one protein may therefore throw 

 additional light on the replacement process. 



Table I shows that the process by which replacements become established 

 is very slow. Elucidation of the sequence of homologous proteins may therefore 

 make it possible to determine phylogenetic relations between large groups 

 such as phyla, which cannot now be certainly determined from morphological 

 and embryological evidence. 



III. CORRELATIONS BETWEEN ADJACENT RESIDUES 



Are there any forbidden combinations of adjacent residues? An examination 

 of the sequence of residues in proteins (Table IV) could provide an answer 

 to this question. 



[Fig. 1 appears here: a 20 x 20 grid whose rows are labeled ALA, ARG, ASP, ASPN, CYS, GLU, GLUN, GLY, HIS, ILEU, LEU, LYS, MET, PHE, PRO, SER, THR, TRY, TYR, VAL. The column labels and cell entries are not legible.]



Fig. 1. Dipeptide sequences now known to occur in proteins, compiled from

 Table IV. The N-terminal amino acids are plotted in the rows, the C-terminal

 in the columns.
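
The tabulation behind Fig. 1 can be illustrated with a minimal sketch, offered only as an illustration and not as the procedure actually used to compile the figure. Each sequence is split into overlapping pairs of adjacent residues, and each pair marks one cell of a 20 x 20 grid (row = N-terminal residue, column = C-terminal residue). The function names and the two example sequences are hypothetical; they are not taken from Table IV. The residue abbreviations follow the row labels of the figure.

```python
from itertools import product

# The twenty residue abbreviations used as row labels in Fig. 1.
AMINO_ACIDS = [
    "ala", "arg", "asp", "aspn", "cys", "glu", "glun", "gly", "his", "ileu",
    "leu", "lys", "met", "phe", "pro", "ser", "thr", "try", "tyr", "val",
]


def dipeptides(sequence):
    """Split a hyphen-separated sequence into adjacent (N-terminal, C-terminal) pairs."""
    residues = sequence.lower().split("-")
    return list(zip(residues, residues[1:]))


def fill_grid(sequences):
    """Collect every distinct adjacent pair seen in any of the sequences."""
    observed = set()
    for seq in sequences:
        observed.update(dipeptides(seq))
    return observed


def missing_pairs(observed):
    """Return the pairs out of all 400 that have not been observed."""
    return set(product(AMINO_ACIDS, repeat=2)) - observed


# Hypothetical input for illustration only (NOT data from Table IV).
example = ["ala-arg-gly", "gly-ala-arg"]
observed = fill_grid(example)
print(len(observed), "of", len(AMINO_ACIDS) ** 2, "possible pairs observed")
```

A set is used because the figure records only whether a pair occurs, not how often; counting frequencies would need a mapping from pairs to counts.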



There are of course 400 possible pairs of the twenty amino acids. The 

 known protein sequences in Table IV have been broken down in the following 

 way. A sequence, say, of ala-arg-gly is broken down into the dipeptides ala-arg, 

 arg-gly, and the appropriate cells in Fig. 1 are then filled, the N-terminal 

 residues being represented by the rows, the C-terminal by the columns. Using 

 all the data available in Table IV, Fig. 1 shows that somewhat more than half 

 of all possible dipeptide combinations are known to occur. The question 



