


course, a particular isotope located in a sensitive spot and decaying at a critical 

 moment can have very large effects. In a case like this, the selection of a set of 

 categories becomes a matter of compromise. 



The probabilities, finally, are never actually known. We have to estimate 

 them, on more or less sound bases. In many situations where generalized 

 information theory is used, the bases for estimating probabilities are rather 

 uncertain. Therefore, it becomes important to assess the dependence of 

 information functions on fluctuations of probabilities. 
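
As a rough sketch of what such an assessment looks like (the four-category distribution below is purely hypothetical), one can perturb an estimated set of probabilities and compare the resulting values of the entropy H = -Σ p_i log2 p_i:

    import math

    def entropy(p):
        # Shannon entropy in bits: H = -sum(p_i * log2(p_i)), zero terms ignored.
        return -sum(x * math.log2(x) for x in p if x > 0)

    # Hypothetical estimated probabilities for four categories.
    p_estimated = [0.40, 0.30, 0.20, 0.10]
    # The same estimate after a small fluctuation (still summing to one).
    p_fluctuated = [0.42, 0.29, 0.19, 0.10]

    print(entropy(p_estimated))   # about 1.85 bits
    print(entropy(p_fluctuated))  # about 1.83 bits -- a small change

The closeness of the two values is what makes approximate estimates usable in practice.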



The contingent nature of information measures has not always been obvious. 

 All early applications of information theory dealt with telecommunication 

 systems. In all of these, all informational characteristics are perfectly well 

 defined. In Morse code, all we have to know is whether a particular information- 

 carrying element is a blackness or a whiteness, and whether it is long or short. 

 In pulse code modulation, the only thing that counts is presence or absence 

 of a pulse within a stated interval of time. In pulse amplitude modulation, 

 all information is vested in the amplitude of the pulses. In all these cases, there 

 is no question about the informational characteristics of the process under 

 consideration. 



The situation is radically different in the larger domain of applied infor- 

 mation theory. For instance, take the case of two people transmitting information 

 to each other by talking. The information-carrying element is a clause; to 

 simplify our analysis, let us consider just words (remembering that the infor- 

 mation content of a clause cannot be greater than that of its constituent words). 

 Now, each person culls his words from a reservoir which is known to be large, 

 but its actual size is not exactly known. The information content of a single 

 word depends on the probability of its use, and these probabilities are not 

 exactly known either. Furthermore, they will hardly be the same for both 

 persons involved in a conversation. Also, each word can have several meanings, 

 one of which may be more or less determined by the context. The relations 

 between words, meanings, and context, again, are not the same for any two 

 people. This is not all. Information is conveyed not only by the choice of 

 words but also by inflection of voice, loudness, timing, and accompanying 

 gestures. In such a situation we obviously have no hope of ever obtaining a 
 precise, unequivocal, and incontestable measure of information content. 
 We are thus confronted with two alternatives: not to use information 
 theory, or to try to devise ways of producing usable approximate 
 estimates. Obviously, our choice is the latter alternative (19). 
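
As one way of making the word example concrete (a sketch only, with invented word frequencies), the information content of a single word can be taken as its self-information, -log2 p, computed from whatever probability of use we manage to estimate:

    import math

    # Invented relative frequencies of use; a real estimate would come from
    # observing one particular speaker, and would differ from speaker to speaker.
    estimated_probabilities = {
        "the": 0.05,         # very common word: little information per use
        "information": 0.0005,
        "isotope": 0.00001,  # rare word: much more information per use
    }

    for word, p in estimated_probabilities.items():
        bits = -math.log2(p)   # self-information, in bits, under this estimate
        print(word, round(bits, 1), "bits")

Since the probabilities are themselves only estimates, the resulting figures inherit that uncertainty.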



Approximation Methods 



It appears that the approximation methods for estimating information 
 functions are based on the following rules: 



1. Averaging increases uncertainty; 



2. Pooling decreases uncertainty; 



3. Disregarding constraints increases uncertainty; 



4. Rare events have small effects on uncertainty measures; 



5. Small variations in probability have small effects on uncertainty measures; 



6. In systems, information functions can be estimated in different ways, 

 and care should be taken to select the most appropriate one; 
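
The first few of these rules can be illustrated numerically with the entropy formula; the toy distribution below is invented for the purpose and is not drawn from the text:

    import math

    def entropy(p):
        # Shannon entropy in bits of a discrete probability distribution.
        return -sum(x * math.log2(x) for x in p if x > 0)

    p = [0.50, 0.25, 0.15, 0.10]

    # Rule 1 (one reading): replacing the unequal probabilities by their common
    # average, i.e. the uniform distribution, raises the uncertainty.
    uniform = [sum(p) / len(p)] * len(p)
    print(entropy(p), "<", entropy(uniform))   # about 1.74 < 2.00 bits

    # Rule 2: pooling the two rarest categories into one lowers the uncertainty.
    pooled = [0.50, 0.25, 0.15 + 0.10]
    print(entropy(pooled), "<", entropy(p))    # about 1.50 < 1.74 bits

    # Rule 4: a rare event contributes only -p*log2(p), which is tiny for small p.
    print(-1e-6 * math.log2(1e-6))             # about 0.00002 bits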



