25 : 4/ Information Theory and Biology 473 



total number of residues N and the number of types of residues m are 

fixed. In the five-amino-acid-residue polypeptide discussed earlier, if there are four glycines and one phenylalanine, then there are only five possible arrangements:



ggggp gggpg ggpgg gpggg pgggg 



The information has been reduced from

$I = \log_2 10 = 3.32$ bits

for three (g)'s and two (p)'s to

$I = \log_2 5 = 2.32$ bits

for four (g)'s and one (p).
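The two counts follow from the multinomial coefficient $N!/(n_1!\,n_2!\cdots n_m!)$, the number of distinct orderings of N residues when n_i of them are of type i. The minimal Python sketch below reproduces both cases; the function names are illustrative assumptions, not notation from the text.

```python
from collections import Counter
from math import factorial, log2


def arrangements(sequence: str) -> int:
    """Count the distinct orderings of the residues in `sequence`.

    This is the multinomial coefficient N! / (n_1! * n_2! * ... * n_m!).
    """
    total = factorial(len(sequence))
    for n in Counter(sequence).values():
        total //= factorial(n)  # exact: every partial quotient is an integer
    return total


def information_bits(sequence: str) -> float:
    """Information (in bits) needed to pick one ordering out of all of them."""
    return log2(arrangements(sequence))


# g = glycine, p = phenylalanine, as in the example above
print(arrangements("gggpp"), round(information_bits("gggpp"), 2))  # 10 3.32
print(arrangements("ggggp"), round(information_bits("ggggp"), 2))  # 5 2.32
```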



For larger values of N and m, the variation is much greater. In general, one can compute an $I_{\max}$ and an $I_{\min}$ for fixed N and m. Because there are usually about 20 types of amino acids within the cell, one can also compute an $I_{\max}^{20}$ for fixed N and 20 types of residues. It is instructive to consider the ratios



$$\frac{I}{I_{\max}}, \qquad \frac{I}{I_{\min}}, \qquad \text{and} \qquad \frac{I_{\max}}{I_{\max}^{20}} \tag{20}$$
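As a rough illustration of how the bounds for fixed N and m might be evaluated, the Python sketch below makes two assumptions that the text does not state explicitly: every one of the m types occurs at least once, so that the maximum comes from the most even composition and the minimum from the most lopsided one (one residue of each of m - 1 types and the remainder of a single type). The function names are hypothetical. Log-gamma is used so that large N does not overflow.

```python
from math import lgamma, log


def log2_multinomial(counts):
    """log2 of N! / (n_1! * ... * n_m!) for the composition `counts`."""
    n = sum(counts)
    natural = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    return natural / log(2)


def i_max(n, m):
    """Assumed upper bound: residues spread as evenly as possible over m types."""
    q, r = divmod(n, m)
    return log2_multinomial([q + 1] * r + [q] * (m - r))


def i_min(n, m):
    """Assumed lower bound: m - 1 types occur once, one type takes the rest."""
    return log2_multinomial([n - m + 1] + [1] * (m - 1))


# The five-residue, two-type case: 10 and 5 arrangements respectively
print(round(i_max(5, 2), 2), round(i_min(5, 2), 2))  # 3.32 2.32
```

Under these assumptions the five-residue example gives the same two values quoted earlier, with three (g)'s and two (p)'s as the maximum and four (g)'s and one (p) as the minimum.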



It has been found for all proteins tested that $I/I_{\max}$ is greater than 0.5.

 For all proteins within living cells, in fact, this ratio is greater than 0.7, 

 and for most it is greater than 0.85. The information per residue is 

 about 



$H_r \approx 3.6$ bits/amino acid



for a typical protein. No values are less than half this or greater than 5 bits. For instance, for a protein such as albumin, with over 500 residues, this amounts to a total information of about 2,000 bits needed to build the molecule. Fibrinogen has 3,400 amino acid residues; a total information of about 10,000 bits is necessary to distinguish it from all other proteins with 3,400 amino acid residues and the same types of amino acids.
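The totals quoted here are rounded figures. As a quick consistency check, the Python sketch below multiplies the residue count by the typical per-residue value and also computes the per-residue value implied by each quoted total. Taking N = 500 for albumin is an assumption (the text says only "over 500"), and the function names are illustrative.

```python
H_R_TYPICAL = 3.6  # bits per amino acid, the typical value quoted in the text

# name -> (number of residues, total information quoted in the text, in bits)
proteins = {"albumin": (500, 2_000), "fibrinogen": (3_400, 10_000)}


def total_bits(n_residues, bits_per_residue=H_R_TYPICAL):
    """Total information if every residue carries the typical value."""
    return n_residues * bits_per_residue


def implied_bits_per_residue(n_residues, quoted_total_bits):
    """Per-residue information implied by a quoted total."""
    return quoted_total_bits / n_residues


for name, (n, quoted) in proteins.items():
    print(name, round(total_bits(n)), round(implied_bits_per_residue(n, quoted), 2))
# albumin 1800 4.0
# fibrinogen 12240 2.94
```

Both implied per-residue values fall between half the typical value (1.8 bits) and 5 bits, the range given in the text.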



It is not just because there are no other possibilities that these values are so high. For albumin, the ratio $I/I_{\min}$ is about 15. Nor does replacing $I_{\max}$ with $I_{\max}^{20}$ alter the situation very much. The ratio of the last two is not very different from one.



Thus, information theory has emphasized a common feature of all 

 natural proteins, namely that for a given number of residues N and a 

 given number of types m of amino acids, the information contained in 

 the protein molecule is close to a maximum. In terms of entropy, this 

 states that the amino acids are ordered in such a fashion as to minimize 

the entropy. Information theory was not necessary to reach this conclusion, but it helped focus attention in this direction.



