472 Information Theory and Biology /25 : 4 



for example, three glycines and two phenylalanines, the only possibilities

are (writing G for glycine and F for phenylalanine)

$$\mathrm{FFGGG},\ \mathrm{FGFGG},\ \mathrm{FGGFG},\ \mathrm{FGGGF},\ \mathrm{GFFGG},\ \mathrm{GFGFG},\ \mathrm{GFGGF},\ \mathrm{GGFFG},\ \mathrm{GGFGF},\ \mathrm{GGGFF}$$

that is, 10 or

$$\frac{5!}{3!\,2!}$$
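Since the counts here are tiny, the enumeration can be checked by brute force; a minimal sketch (G and F standing in for glycine and phenylalanine):

```python
from itertools import permutations

# Generate every ordering of the five residues, then deduplicate:
# the repeated G's (and F's) make many orderings coincide.
chains = sorted({"".join(p) for p in permutations("GGGFF")})
print(len(chains))   # 10
print(chains)
```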

In general, there are $N$ amino acid residues of $m$ types in a protein,

such that there are $n_1$ of the first, $n_2$ of the second, and so on. The

number of types $m$ is less than or equal to 21. The number of ways of

arranging these in a straight chain is

$$P = \frac{N!}{(n_1!)(n_2!)\cdots(n_m!)} \tag{5}$$
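Formula (5) can be sketched directly; the helper name `arrangements` is illustrative, not from the text:

```python
from math import factorial

def arrangements(counts):
    """P of Eq. (5): N! / ((n_1!)(n_2!)...(n_m!)) for counts [n_1, ..., n_m]."""
    N = sum(counts)
    P = factorial(N)
    for n in counts:
        P //= factorial(n)   # each division is exact
    return P

print(arrangements([3, 2]))   # the glycine/phenylalanine example: 10
```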

If all are equally likely, the information necessary to build a particular

protein is

$$I = +\log_2 P = \log_2 N! - \sum_{i=1}^{m} \log_2 (n_i!)$$
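A numerical sketch of this expression (the function name is illustrative):

```python
from math import factorial, log2

def information_bits(counts):
    """I = log2 N! - sum_i log2(n_i!), in bits, for counts [n_1, ..., n_m]."""
    N = sum(counts)
    return log2(factorial(N)) - sum(log2(factorial(n)) for n in counts)

# Three glycines and two phenylalanines: I = log2(10), about 3.32 bits.
print(information_bits([3, 2]))
```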



The average information per amino acid residue is

$$H_R = \frac{I}{N} = \frac{1}{N}\left[\log_2 N! - \sum_{i=1}^{m} \log_2 (n_i!)\right] \tag{6}$$
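Dividing by $N$, Eq. (6) can be sketched as:

```python
from math import factorial, log2

def h_r(counts):
    """Eq. (6): average information per residue, (1/N)[log2 N! - sum log2(n_i!)]."""
    N = sum(counts)
    return (log2(factorial(N)) - sum(log2(factorial(n)) for n in counts)) / N

print(h_r([3, 2]))   # log2(10)/5, about 0.66 bit per residue
```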



Because $N$ is large compared to one, Stirling's formula can be used,

namely that

$$\log_2 N! = (\log_2 e)\log_e N! = 1.45\,N\log_e N$$



Therefore, the average information per residue, or negative entropy per

residue, is

$$H_R = 1.45\log_e N - \frac{1}{N}\sum_{i=1}^{m} \log_2 (n_i!) \tag{7}$$
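A sketch comparing Eq. (7) with the exact Eq. (6), using `lgamma` so that large factorials stay in floating-point range (helper names are illustrative). Note that the crude Stirling form above drops the $-N$ term of $\log_e N! \approx N\log_e N - N$, so Eq. (7) runs roughly $\log_2 e$ bits above the exact value:

```python
from math import e, lgamma, log, log2

LN2 = log(2)

def h_r_exact(counts):
    """Eq. (6) evaluated exactly; lgamma(n + 1) = log_e(n!) avoids overflow."""
    N = sum(counts)
    return (lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)) / (N * LN2)

def h_r_eq7(counts):
    """Eq. (7): (1/N) log2 N! replaced by 1.45 log_e N (the -N term of
    Stirling's formula is dropped, so this runs a little high)."""
    N = sum(counts)
    return log2(e) * log(N) - sum(lgamma(n + 1) for n in counts) / (N * LN2)

counts = [300, 200]   # a hypothetical chain of N = 500 residues
print(h_r_exact(counts), h_r_eq7(counts))
```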



If, in addition, all the $n_i$ are large, this expression becomes

$$H_R = 1.45\log_e N - 1.45\sum_{i=1}^{m}\frac{n_i}{N}\log_e n_i = -1.45\sum_{i=1}^{m}\frac{n_i}{N}\log_e \frac{n_i}{N} \tag{8}$$
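Since $1.45 \approx \log_2 e$, $1.45\log_e x = \log_2 x$, so the right-hand side of Eq. (8) is the Shannon form $-\sum_i p_i \log_2 p_i$ with $p_i = n_i/N$ (the $-N$ and $-n_i$ terms dropped from Stirling's formula cancel here because $\sum_i n_i = N$). A sketch:

```python
from math import log2

def h_r_shannon(counts):
    """Eq. (8): -1.45 sum (n_i/N) log_e (n_i/N); since 1.45*log_e(x) = log2(x),
    this is the Shannon form -sum p_i log2 p_i with p_i = n_i/N."""
    N = sum(counts)
    return -sum((n / N) * log2(n / N) for n in counts)

print(h_r_shannon([300, 200]))   # about 0.97 bit per residue
```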



In a long molecule, the ratio $n_i/N$ is the relative probability of finding

an amino acid of the $i$th variety, so that (except for a numerical constant)

the foregoing formula is identical to the previous form for $H$. Unfortunately,

the values of $n_i$ are so small that Equation 7 must be used.
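That remark can be illustrated numerically: for small $n_i$ the large-$n_i$ limit of Eq. (8) disagrees noticeably with the factorial form, while the two converge as the counts grow at fixed proportions. A sketch, using the exact factorial expression (evaluated via `lgamma`; helper names are illustrative):

```python
from math import lgamma, log, log2

LN2 = log(2)

def h_r_factorial(counts):
    # The factorial form: (1/N)[log2 N! - sum log2(n_i!)], via lgamma(n+1) = ln n!
    N = sum(counts)
    return (lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)) / (N * LN2)

def h_r_limit(counts):
    # Eq. (8)'s large-n_i limit: -sum (n_i/N) log2 (n_i/N)
    N = sum(counts)
    return -sum((n / N) * log2(n / N) for n in counts)

# Small counts (the glycine/phenylalanine example): the forms disagree...
print(h_r_factorial([3, 2]), h_r_limit([3, 2]))
# ...but converge as the counts grow with the same proportions.
print(h_r_factorial([3000, 2000]), h_r_limit([3000, 2000]))
```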

The values for $I$ and for $H_R$ can vary widely even though both the



