


the biology of microscopic life received a powerful stimulus from the 

 rise of medical bacteriology. 



It was left to the 20th century to analyze the cell into its chief 

 chemical components: proteins, nucleic acids, fats, and carbohy- 

 drates. Of these chemical species, proteins were soon discovered to 

 be giant molecules made up of thousands of atoms. If proteins are 

 gently broken down, they fall apart into amino acids; there are only 

 20 different amino acids in all life. Thus, a protein can be described 

 as a word string in an alphabet of 20 letters (the amino acids). It was 

 found as well that the nucleic acids (DNA and RNA) were strings 

 over an alphabet of four letters (nucleotides) (fig. V-1). In the case of 

 DNA, the letters (or molecules) are adenylic acid (A), guanylic acid 

 (G), cytidylic acid (C), and thymidylic acid (T). In the case of RNA 

 the letters are A, G, and C, as above, but with uridylic acid (U) in place of thymidylic 

 acid (T). The number of combinatorial possibilities is more than 

 astronomical. Assuming we have formed a string of amino acids 

 100 letters long, how many different ones could be present? There 

 are 20 different possibilities for the first member of the string and 

 20 for the second and so on. Therefore, there are $20^{100}$ different 

 proteins, a number large past imagining. 
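
The counting argument above can be checked directly with exact integer arithmetic. The following is a minimal sketch; the function name `count_strings` is a hypothetical helper introduced here for illustration and does not come from the text.

```python
def count_strings(alphabet_size: int, length: int) -> int:
    """Number of distinct strings of a given length over an alphabet.

    Each position can be filled independently, so the count is
    alphabet_size multiplied by itself `length` times.
    """
    return alphabet_size ** length


# A chain of 100 amino acids drawn from an alphabet of 20 letters.
protein_count = count_strings(20, 100)

# Python integers are arbitrary precision, so the value is exact.
print(len(str(protein_count)))  # number of decimal digits: 131

# For comparison, an RNA or DNA string of 100 nucleotides (4 letters).
print(len(str(count_strings(4, 100))))  # 61 decimal digits
```

A 131-digit number is a little more than 10^130, which gives some sense of what "large past imagining" means here.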



The DNA in E. coli is found in a single molecule which is about 

 1 mm long. There are about 3 million nucleotide base pairs in one 

 molecule of DNA. The different protein strings realized are encoded 

 in the sequence of bases in the double-stranded molecule of DNA. 



A protein molecule called RNA polymerase transcribes the 

 sequences within the DNA molecule into RNA molecules, called 

 messenger RNA molecules. An RNA molecule then enters a protein- 

 synthesizing machine, which is best compared to a molecular tape 

 recorder, in which the RNA molecule is read and the output is a 

 sequence of amino acids. 
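
The two steps just described, transcription and the tape-recorder style reading of the messenger RNA, can be sketched in a few lines. This is an illustration under stated assumptions, not a model of the real machinery: the input is taken to be the DNA strand that has the same sequence as the message (so transcription reduces to writing U for T), the message is read three bases at a time, and `CODON_TABLE` is only a small hand-picked subset of the standard genetic code. The names `transcribe`, `translate`, and `CODON_TABLE` are hypothetical.

```python
from typing import List

# Small illustrative subset of the standard genetic code (assumption:
# a real table has 64 entries; only a handful are listed here).
CODON_TABLE = {
    "AUG": "Met", "UUU": "Phe", "UUC": "Phe", "GGU": "Gly", "GGC": "Gly",
    "AAA": "Lys", "UGG": "Trp", "UAA": "STOP", "UAG": "STOP", "UGA": "STOP",
}


def transcribe(coding_strand: str) -> str:
    """Return the messenger RNA for a DNA coding strand (T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> List[str]:
    """Read the message three bases at a time and emit amino acids.

    Reading stops at a stop triplet; triplets missing from the partial
    table are reported as "???".
    """
    peptide = []
    for i in range(0, len(mrna) - 2, 3):
        residue = CODON_TABLE.get(mrna[i:i + 3], "???")
        if residue == "STOP":
            break
        peptide.append(residue)
    return peptide


mrna = transcribe("ATGTTTGGCAAATGGTAA")
print(mrna)             # AUGUUUGGCAAAUGGUAA
print(translate(mrna))  # ['Met', 'Phe', 'Gly', 'Lys', 'Trp']
```

The lookup table plays the role of the tape head: the same message always yields the same sequence of amino acids.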



Since there are four bases (A, G, C, U) in the RNA alphabet 

 that can be used to code for amino acids, it can be seen that three 

 bases ($4^3 = 64$) are required to accomplish this code: neither one 

 base ($4^1 = 4$) nor two bases ($4^2 = 16$) would provide a unique code 

 for the 20 amino acids. The representation of the sequence of amino 

 acids of proteins by a sequence of nucleotides in RNA is called the 

 genetic code (table V-1). In table V-1 the first base in the triplet is 

 listed in the first column, the second base is listed along the row, and 

 the third base is listed in the last column. The 64 triplets are thus 



