52 Hubert P. Yockey 



The importance or value of a theory lies, among other things, in its capability 

 of treating a wide variety of phenomena from a single point of view. It is 

 well to think, at the start, of the field of validity this theory may have and, 

 if it should fail, the significance of its failure. If it should be discovered that 

 Watson and Crick's suggestion has very little bearing or applicability then 

 this development, while negative, is still a valuable result. One would then 

 perforce search for another explanation for the great detail and specificity 

 characteristic of any biological phenomenon. At present it is the most detailed 

 proposal based specifically on molecular chemistry. The theory here developed 

 is essentially statistical and may be expected to express its results in the form 

 of expectation values, probability distributions, and their functions. The 

 statistical character of the theory is directly in the line of thinking of both 

 modern biology and modern physics. It should be kept clearly in mind that 

 information theory deals with organizational problems and so some aspects 

 of organisms will be outside its scope. In this sense it may be that the role 

 information theory will play in biology will parallel that played by thermodynamics 

 in physics and chemistry. 



II. NOISE IN THE GENETICAL INFORMATION 



The Instability of a Perfect System 



Let us consider an ensemble of organisms and discuss the communication 

 of information from the DNA to protein. There is evidence discussed by 

 Gamow and by Ycas in this volume that the code which translates information 

 from the four-symbol DNA code via RNA to the twenty-symbol protein 

 code is based on triads of nucleotide pairs. Indeed it can be seen that it must 

 be at least the triads since a twenty-symbol alphabet carries 4.32 bits per symbol 

 whereas the pairs in a four-symbol alphabet carry exactly four bits per symbol, 

 assuming no intersymbol constraints. The triads carry six bits per symbol 

 and so this represents some inherent redundance. It would be desirable to 

 express this formalism in terms of the DNA triads of nucleotide pairs; however, 

 this requires a knowledge of the DNA to protein code. These data are missing. 

 Our objective is to develop the mathematical formalism in as simple a way 

 as possible so it appears more appropriate to consider the communication of 

 specificity from DNA to RNA. Here we are dealing with a coding between 

 two four-symbol alphabets. 
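
The counting argument above can be checked numerically. The following is a minimal sketch, assuming equiprobable symbols with no intersymbol constraints, as the text does; the function names bits_per_group and minimum_group_length are illustrative and do not come from the original.

```python
import math


def bits_per_group(alphabet_size: int, group_length: int = 1) -> float:
    """Maximum information, in bits, carried by a group of `group_length`
    symbols drawn from `alphabet_size` equiprobable, independent symbols."""
    return group_length * math.log2(alphabet_size)


def minimum_group_length(source_size: int, target_size: int) -> int:
    """Smallest number of source symbols per group whose distinct
    combinations can label every symbol of the target alphabet."""
    n = 1
    while source_size ** n < target_size:
        n += 1
    return n


# One symbol of the twenty-symbol protein alphabet.
protein_bits = bits_per_group(20)

# Pairs and triads of the four-symbol nucleotide alphabet.
pair_bits = bits_per_group(4, 2)
triad_bits = bits_per_group(4, 3)

# Capacity of a triad in excess of what one protein symbol requires.
redundancy_bits = triad_bits - protein_bits

if __name__ == "__main__":
    print(f"protein symbol: {protein_bits:.2f} bits")
    print(f"nucleotide pair: {pair_bits:.2f} bits")
    print(f"nucleotide triad: {triad_bits:.2f} bits")
    print(f"excess capacity of a triad: {redundancy_bits:.2f} bits")
    print(f"minimum group length: {minimum_group_length(4, 20)}")
```

Under these assumptions a pair falls short of the 4.32 bits needed per protein symbol, while a triad exceeds it by about 1.68 bits, which is the inherent redundance referred to above.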



Suppose we are considering an ensemble of organisms which is isogenic, and 

 further that this means that each organism is characterized by exactly the 

 same order of nucleotides in the DNA of its nucleus. We shall now show that 

 this situation is unstable and that therefore a real ensemble of organisms will 

 be represented by an ensemble of messages recorded in its DNA. From this 

 it will follow that there is a distribution in the message entropy, characteristic 

 of any ensemble of organisms, even one which is isogenic. 



The message entropy is 



$$H = H_g - H_n \tag{1}$$



where $H_g$ is the message entropy of the genetical information and $H_n$ is the 

 loss of information due to noise. That is, $H_n$ is the loss of information from 



