Some Introductory Ideas Concerning the Application of Information Theory in Biology 51 



Watson and Crick (1) that genetical information is carried by the exact

 order of four kinds of nucleotide pairs provides a molecular vehicle for the 

 genetic control of protein specificity. Gamow (2) was the first to see that 

 this control implied the existence of a four-letter to twenty-letter code. 

 Thus by following the logical consequences of purely biological, or perhaps 

 biochemical, problems one is led directly to a problem purely mathematical

 in character. 
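
The mathematical character of the coding problem can be seen from a simple counting bound. The sketch below is not taken from the text above. It assumes, purely for illustration, that each amino acid is specified by a fixed-length block of n consecutive nucleotide pairs (the symbol n is introduced here); other schemes, such as overlapping codes, would need a separate argument.

```latex
% Counting bound under the fixed-length assumption stated above:
% n positions, four possible symbols each, must distinguish 20 amino acids.
4^{n} \ge 20
\quad\Longrightarrow\quad
n \ge \frac{\log 20}{\log 4} \approx 2.16 ,
\qquad \text{so } n \ge 3 \text{ for integer } n .
```

Two positions give only 16 combinations, which is too few, while three give 64, which is more than enough. Under this assumption the bound says nothing about which of the many possible assignments is actually used.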



This notion of the role of order, which is basic to information theory, is 

 worth pursuing in biology since it provides a way of measuring what we are 

 speaking about and expressing it in numbers. Furthermore, from the results 

 of applying the theory to specific problems, we may obtain an experimental 

 check on the validity of these ideas as first principles. In this article we shall 

 apply these considerations to the storage and transfer of biochemical specificity. 

 We shall explore, in particular, the role of noise in the genetical message. In 

 my article in Part V the theory is applied to the practical problem of calculating 

 and understanding survivorship curves. 



The present status of the means of storage and transfer of specificity is 

 given by Gamow, by Ycas and by Augenstine in their respective articles in 

 this volume. The question of the exact way in which information is destroyed 

 by read-off error, radiation damage, aging, thermal fluctuations, biochemical 

 side reactions, and so forth, is of equal importance. This problem is also 

 discussed in this volume but no final and detailed account can be given at 

 this writing. Nevertheless, since there is virtue in attempt, we shall attempt 

 the development of a mathematical formalism which is information theoretic

 in character. 



Most animals and plants exist at one time, at least, in the form of a single 

 cell; we can consider that cell to contain a substantial part of the directions 

 for the development of the organism. Since information is conserved unless

 lost due to noise, it shall be assumed that the mature organism is characterized 

 by substantially the same information content as the fertilized egg or seed. 

 In order to fix the idea we shall develop the formalism on the basis of Watson 

 and Crick's suggestion concerning the role of DNA. It should be remembered 

 that the central ideas of this paper are independent of much of the detail 

 embodied in Watson and Crick's papers and are dependent only on the possibility

 of genetical endowment being conveyed by a series of structures composing

 an information-bearing molecule.



Suppose we imagine the symbols A, B, C, D (Gamow's predilection is to 

 the less prosaic spades, clubs, hearts, diamonds!) arranged in one-to-one 

 correspondence with the nucleotide pairs of the DNA found in a particular 

 given cell. The cell will have been selected from a number of similar but not 

 identical cells in a colony under study. This colony may be thought of as 

 being indefinitely large, so that in principle we may consider the ensemble 

 of all possible organisms identifiable as being members of the colony. Since 

 the number of nucleotides in DNA is finite, the number of elements in this 

 ensemble is also finite. Because of this one-to-one correspondence it will be 

 seen that the set of symbol sequences, which is the mathematical model of the 

 ensemble of organisms, will contain the informational or specificity properties 

 of the ensemble of organisms. 
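
The symbol-sequence model just described can be made concrete with a short sketch. It is an illustration under stated assumptions and not the author's formalism: the assignment of the letters A, B, C, D to particular nucleotide pairs is arbitrary, and the function names, together with the use of Shannon entropy over an observed sample of sequences, are introduced here for illustration only.

```python
import math
from collections import Counter

# Hypothetical one-to-one assignment of symbols to the four kinds of
# nucleotide pairs. The text only requires that some such assignment exist;
# this particular choice is arbitrary.
PAIR_TO_SYMBOL = {"A-T": "A", "T-A": "B", "G-C": "C", "C-G": "D"}


def encode(pairs):
    """Translate a list of nucleotide pairs into a symbol sequence."""
    return "".join(PAIR_TO_SYMBOL[p] for p in pairs)


def ensemble_size(n_pairs):
    """Number of distinct symbol sequences of length n_pairs (finite)."""
    return len(PAIR_TO_SYMBOL) ** n_pairs


def max_information_bits(n_pairs):
    """Upper bound on information content: log2(4) = 2 bits per pair."""
    return n_pairs * math.log2(len(PAIR_TO_SYMBOL))


def ensemble_entropy_bits(sequences):
    """Shannon entropy, in bits, of the empirical distribution of sequences."""
    counts = Counter(sequences)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    print(encode(["A-T", "G-C", "C-G", "T-A"]))
    print(ensemble_size(3), max_information_bits(3))
    print(ensemble_entropy_bits(["AB", "AB", "CD", "CD"]))
```

Because the sequence length is finite, `ensemble_size` is always finite, which mirrors the remark that the ensemble of organisms is finite. The entropy of a sample can never exceed `max_information_bits` for sequences of that length.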



