PROTEIN STRUCTURE AND INFORMATION CONTENT*



L. G. AUGENSTINE 



Brookhaven National Laboratory, Upton, New York 



I. INTRODUCTION 



In stating that a given system has an information content of a certain number
of bits, care must be taken not only to specify the context within which this
number has been derived but also to give meaning and utility to the measure.
Specifying the context is particularly important since for most systems there
are many levels at which the information content can be derived. For example,
the information content of a cell is very low if one is concerned only with
whether it is living or dead, but it is very large if one is interested in
specifying the parameters of each of its individual elementary particles. In
this article, estimates will be made of the information content of given
proteins by taking into account that they are sequences of amino acids which
can assume only a discrete number of configurations. An attempt will be made
to study some of the factors which affect the information content and the
types of constraints which must operate in the elaboration of proteins. Some
idea of the magnitude and types of the constraints pertinent to proteins can
be obtained from parallel studies on proteins and printed English (for which
the constraints are known). Finally, the information content based upon
structure will be compared with estimates of information content obtained
within the context of protein function.
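
As a rough illustration of how the chosen context fixes the number of bits, the following Python sketch computes the information needed to single out one alternative from a set of equally likely ones. The helper name `bits_for_equiprobable`, the alphabet of 20 amino acids, and the chain length of 100 residues are assumptions made for this example only and are not values taken from this article; real sequences are neither equiprobable nor free of constraints, so the second figure should be read only as an unconstrained upper bound.

```python
import math


def bits_for_equiprobable(n_alternatives: int) -> float:
    """Bits needed to select one of n equally likely alternatives."""
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)


# Coarse context: a cell is either living or dead (two alternatives).
cell_bits = bits_for_equiprobable(2)

# Finer context (hypothetical numbers): a chain of residues, each drawn
# independently and with equal probability from a fixed alphabet.
ALPHABET_SIZE = 20   # assumed number of distinct amino acids
CHAIN_LENGTH = 100   # assumed number of residues in the chain
sequence_bits = CHAIN_LENGTH * bits_for_equiprobable(ALPHABET_SIZE)

print(f"living/dead: {cell_bits:.1f} bit")
print(f"unconstrained sequence: {sequence_bits:.1f} bits")
```

Any constraint on which residues may follow one another lowers the figure for the sequence; the size of that reduction is what the estimates in this article aim to assess.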



Although the fact has not always been fully appreciated, information measures
are usually more effective in selecting among alternative hypotheses than in
suggesting new ones. This trait arises from the fact that information
estimates, which depend only upon the probabilities associated with a class of
experimental outcomes, will often describe the degree to which a number of
variables interact but indicate little or nothing about the behavior of the
individual variables. As a result, no novel synthetic procedures or selection
principles are advanced here to explain the manner in which polypeptide
sequences and/or configurations are determined. Rather, in this paper
information theory considerations have been used to evaluate alternative
explanations of some aspects of protein construction.
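
The remark that an information estimate depends only on outcome probabilities can be sketched with a small calculation. The joint probability tables below are invented for illustration, and `mutual_information` is a hypothetical helper rather than a procedure taken from this paper. Two tables that differ only in which outcomes are paired give the same value, so the measure reports how strongly two variables are coupled without revealing how either variable behaves.

```python
import math


def mutual_information(joint):
    """Mutual information in bits for a joint probability table (list of rows)."""
    row_totals = [sum(row) for row in joint]
    col_totals = [sum(col) for col in zip(*joint)]
    total = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:  # zero-probability cells contribute nothing
                total += p * math.log2(p / (row_totals[i] * col_totals[j]))
    return total


# Illustrative tables (assumed values): same coupling strength,
# but opposite pairing of outcomes.
table_a = [[0.4, 0.1],
           [0.1, 0.4]]
table_b = [[0.1, 0.4],
           [0.4, 0.1]]
# Two variables that do not interact at all.
independent = [[0.25, 0.25],
               [0.25, 0.25]]

mi_a = mutual_information(table_a)
mi_b = mutual_information(table_b)
mi_indep = mutual_information(independent)

print(f"table_a: {mi_a:.4f} bits, table_b: {mi_b:.4f} bits")
print(f"independent: {mi_indep:.4f} bits")
```

Because `table_a` and `table_b` are indistinguishable by this measure, it can help rank competing hypotheses about the degree of interaction but cannot by itself suggest the mechanism behind it.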



II. ESTIMATION OF STRUCTURAL INFORMATION CONTENT AND CONSTRAINTS



At the structural level the total information content (I_t) of a protein will be
treated as the sum of two terms; one (I_s) depends upon the amino acid sequence



* Research carried out at Brookhaven National Laboratory under the auspices of the U.S. 

 Atomic Energy Commission. 






