474 Information Theory and Biology /25 : 5 



5. The Coding of Genetic Information 



Information theory is also used in discussions of genetics and reproduction. Complex vertebrate organisms grow from a single cell during the reproductive process. Within that cell, in a microscopic or submicroscopic volume, is coded the information necessary for building a complete organism. The amount of information stored is extremely large, yet it takes up very little space.



Various attempts have been made to estimate the amount of genetic information necessary. Although these do not agree exactly, they indicate the general orders of magnitude expected. The following is an outline of such an estimate. In every type of nucleus, there are chromosomes characteristic of the particular animal or plant. Along these there are sites functionally connected with different properties of the organism and with the various enzymes within the cells. These sites are called genes. Each gene has several different possible forms called alleles. Estimates indicate there may be as many as 16 viable alleles per gene. Thus, the average information per viable gene is about



$H_g = \log_2 16 = 4$ bits



Inclusion of nonviable alleles would raise $H_g$.
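The per-gene figure can be checked with a minimal sketch. It assumes, as the expression $\log_2 16$ implies, that all alleles of a gene are equally likely; the function name `bits_per_gene` is illustrative and does not come from the text.

```python
import math


def bits_per_gene(n_alleles: int) -> float:
    """Bits gained by identifying one allele out of n_alleles.

    Assumption: every allele is equally likely, so the information
    is simply log2 of the number of alternatives.
    """
    if n_alleles < 1:
        raise ValueError("a gene needs at least one allele")
    return math.log2(n_alleles)


# 16 viable alleles per gene, the figure quoted in the text.
print(bits_per_gene(16))

# Counting additional (nonviable) alleles can only raise the value.
print(bits_per_gene(32) > bits_per_gene(16))
```

If the alleles were not equally likely, the average information per gene would be lower than this value, so 4 bits is best read as an upper figure for 16 alleles.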



The number of genes in vertebrates has been estimated as low as $3 \times 10^4$ and as high as $10^6$. Thus, the total information $I$ necessary to be transmitted from generation to generation probably lies in the range



$10^5 < I < 10^7$ bits



These are certainly only estimates. However, it would be very surprising if the estimate of the lower limit were a factor of 10 too high or that of the upper limit a factor of 10 too low.
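The arithmetic behind the quoted range can be sketched as follows. The sketch assumes that the total is simply the gene count multiplied by 4 bits per gene; the constant and function names are illustrative.

```python
BITS_PER_GENE = 4          # log2(16), the per-gene figure from the text
GENES_LOW = 3 * 10**4      # lowest estimate of vertebrate gene count
GENES_HIGH = 10**6         # highest estimate of vertebrate gene count


def total_bits(n_genes: int, bits_per_gene: int = BITS_PER_GENE) -> int:
    """Total genetic information, assuming every gene carries the same amount."""
    return n_genes * bits_per_gene


low = total_bits(GENES_LOW)
high = total_bits(GENES_HIGH)

print(low, high)

# Both products fall inside the rounded range 10^5 < I < 10^7 bits.
print(10**5 < low and high < 10**7)
```

The products, about $1.2 \times 10^5$ and $4 \times 10^6$ bits, are rounded outward to whole powers of ten to give the stated range.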



One test of the hypothesis that DNA carries the information of cellular reproduction is to ask whether this amount of information can be coded in the DNA of one cell. As was shown in Chapter 15, DNA consists of a double helical chain with "rungs" between the two chains. These rungs are made of the pairs adenine-thymine (AT) and guanine-cytosine (GC). If there is a method of sensing the chain direction, then there are four possible rungs: AT, TA, GC, and CG. Discovering one of these rather than the others reduces the uncertainty, that is, increases the information, by



$I = \log_2 4 = 2$ bits



Within the nucleus there are more than $10^9$ such rungs. At 2 bits per rung, the DNA can therefore hold more than $2 \times 10^9$ bits. Because this capacity is larger than the maximum estimate of information needed for reproduction, DNA may be the storehouse of such information.
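The capacity comparison can be sketched in the same way. The sketch assumes that the four rung types are equally likely and that every rung is available for coding; it uses $10^9$ rungs as a lower bound and $10^7$ bits as the upper estimate of the information required. The variable names are illustrative.

```python
import math

RUNG_TYPES = 4             # AT, TA, GC, CG when chain direction can be sensed
RUNGS_IN_NUCLEUS = 10**9   # lower bound quoted in the text
MAX_BITS_NEEDED = 10**7    # upper estimate of information for reproduction

# Information per rung, assuming the four rung types are equally likely.
bits_per_rung = math.log2(RUNG_TYPES)

# Lower bound on the storage capacity of the DNA in one nucleus.
capacity_bits = RUNGS_IN_NUCLEUS * bits_per_rung

# How many times the capacity exceeds the largest estimated requirement.
margin = capacity_bits / MAX_BITS_NEEDED

print(bits_per_rung, capacity_bits, margin)
print(capacity_bits > MAX_BITS_NEEDED)
```

Under these assumptions the capacity exceeds the largest estimated requirement by a factor of about 200, so the test does not rule out DNA as the carrier; it does not by itself show that all of this capacity is used.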



