36 THE BIOSYNTHESIS OF PROTEINS 



2. Coding Problems 



Although the colinearity principle is nothing more than pure hypothesis, it looks so logical that several studies have already been devoted to the kind of correlation which might exist between the arrangement of the nucleotides in DNA and the amino acid sequence in a polypeptide.



The problem is to discover and to decipher the language in which genetic information is written. These studies are based on abstract argument and often resemble mathematical recreations. But some of these are very ingenious, and they may be fruitful if their conclusions are amenable to experimental test or if they suggest clear experiments.



Since there are essentially only 4 different nucleotides for controlling the arrangement of 20 amino acids, it is obvious that individual amino acids cannot correspond to individual nucleotides; they might correspond to groups of nucleotides. There are 16 possible ordered pairs of nucleotides (4 × 4), which is not enough either. The number of different sequences of three nucleotides is 64 (4 × 4 × 4), which is much more than is needed.
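The counting argument above can be checked with a few lines of Python. This is only an illustrative sketch; the names `n_groups` and `smallest_sufficient_group_size` are introduced here for the example and do not come from the studies cited.

```python
# Illustrative sketch of the counting argument: how many ordered groups of
# k nucleotides exist, and what is the smallest k that can cover 20 amino acids?
N_BASES = 4         # the four (essentially) different nucleotides
N_AMINO_ACIDS = 20  # the amino acids to be specified


def n_groups(k: int) -> int:
    """Number of distinct ordered sequences of k nucleotides."""
    return N_BASES ** k


def smallest_sufficient_group_size() -> int:
    """Smallest k for which 4**k is at least the number of amino acids."""
    k = 1
    while n_groups(k) < N_AMINO_ACIDS:
        k += 1
    return k


for k in (1, 2, 3):
    # Prints the group size, the number of groups, and whether that suffices.
    print(k, n_groups(k), n_groups(k) >= N_AMINO_ACIDS)
```

Singles (4) and pairs (16) fall short of 20, while triplets (64) leave a surplus of 44 combinations, which is why the coding schemes discussed in this section all work with triplets.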



Gamow (1954) proposed a coding system in which triplets of nucleotides in a double helix of DNA would correspond to individual amino acids. These triplets are overlapping, i.e. each nucleotide is part of 3 triplets and therefore concerns 3 amino acids.
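The statement that each nucleotide belongs to 3 triplets can be illustrated on a single strand read with a step of one nucleotide. This is a deliberately simplified sketch: it ignores the double-helix geometry of Gamow's actual proposal, and the function names and the example sequence are hypothetical.

```python
from typing import List


def overlapping_triplets(seq: str) -> List[str]:
    """All triplets obtained by sliding a window of 3 along seq, one step at a time."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


def triplets_containing(seq: str, pos: int) -> List[str]:
    """Overlapping triplets that include the nucleotide at index pos."""
    first_start = max(0, pos - 2)
    last_start = min(pos, len(seq) - 3)
    return [seq[s:s + 3] for s in range(first_start, last_start + 1)]


seq = "ATGCCA"  # arbitrary example sequence, not taken from any real gene
print(overlapping_triplets(seq))
print(triplets_containing(seq, 2))  # an interior nucleotide: three triplets
print(triplets_containing(seq, 0))  # a terminal nucleotide: one triplet only
```

Only the nucleotides at the very ends of the chain take part in fewer than 3 triplets; every interior nucleotide would influence three neighbouring amino acids.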



Crick et al. (1957) considered another coding principle also involving partly overlapping triplets. Such systems would make it possible to have about the same number (n+2) of nucleotides in the DNA chain as there are amino acids in the corresponding polypeptide. But the overlapping triplets would impose restrictions on the possible sequences of amino acids.
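Both the n+2 figure and the origin of the sequence restrictions can be made concrete with a short sketch. It assumes the simplest case, in which successive triplets are shifted by one nucleotide; the `step` parameter and the function names are assumptions made for this illustration only.

```python
from typing import List

BASES = "ACGT"


def nucleotides_needed(n_amino_acids: int, step: int = 1) -> int:
    """Chain length needed to spell n amino acids (n >= 1).

    The first triplet uses 3 nucleotides; each further triplet adds `step`
    new ones. step=1 is full overlap, step=3 is no overlap.
    """
    return 3 + step * (n_amino_acids - 1)


def possible_successors(triplet: str) -> List[str]:
    """Triplets that may follow `triplet` when the reading window moves by one.

    The next triplet must begin with the last two nucleotides of the current
    one, so only its final nucleotide is free to vary.
    """
    return [triplet[1:] + base for base in BASES]


print(nucleotides_needed(10))          # full overlap: n + 2
print(nucleotides_needed(10, step=3))  # no overlap: 3n
print(possible_successors("ATG"))      # only 4 of the 64 triplets can follow
```

Because a given triplet can be followed by only 4 of the 64 triplets, an amino acid coded in this way could not be followed by every one of the 20 amino acids, which is the restriction referred to above.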



Systems have also been proposed in which only a few amino acids, e.g. the aromatic amino acids, are controlled by the gene (Schwartz, 1955), whereas the sequences in between are not.



The most ingenious system presented so far is the 'code without commas' (Crick et al., 1957; Crick, 1957). It is assumed that a sequence of three nucleotides in the DNA chain corresponds to a single amino acid and that the trios of nucleotides are contiguous in the chain, but non-overlapping. Since in DNA thousands of nucleotides follow each other in a regular linear sequence, each occupies an equivalent position in the chain. A difficulty therefore arises for reading the information. One does not know how to cut the series of nucleotides into trios. A way out of this difficulty is to further assume that certain trios 'make sense', i.e. correspond to an amino acid, whereas others do not, just as certain groups of letters make a word and others are meaningless. The information will be readable 'without commas' if each triplet which makes sense can only produce with its neighbours overlapping triplets which are meaningless. A fascinating feature of this system is that the maximum number of trios which fulfil these conditions is exactly 20, i.e. exactly the number of the amino acid species to



