


7. If it is not possible to measure the actual information functions desired, then one can try to substitute closely related measurable quantities.

 In the following paragraphs, these rules will be amplified and illustrated. 



1. Averaging Increases Uncertainty — This fact was demonstrated in Section III. It suggests a simple bracketing procedure: obtain lower and upper bounds on uncertainty by using probabilities which are certainly more and less unbalanced, respectively, than the actual ones. In particular, if the number of categories is known but their respective probabilities are not, then one can follow Laplace's procedure and set all probabilities equal, which maximizes uncertainty.
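A minimal sketch of this bracketing procedure in Python, with a hypothetical four-category distribution standing in for probabilities known to be more unbalanced than the true ones:

    # Rule 1 as a bracketing procedure, using base-2 logarithms (bits).
    from math import log2

    def entropy(probs):
        # Shannon uncertainty H = -sum(p * log2(p)), skipping zero terms
        return -sum(p * log2(p) for p in probs if p > 0)

    n = 4  # number of categories, assumed known

    # Laplace's procedure: with the probabilities unknown, setting them
    # all equal maximizes H and so yields an upper bound.
    upper = entropy([1 / n] * n)           # log2(4) = 2.00 bits

    # Any distribution certainly more unbalanced than the true one
    # yields a lower bound.
    lower = entropy([0.7, 0.1, 0.1, 0.1])  # about 1.36 bits

    print(f"{lower:.2f} <= H <= {upper:.2f}")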



2. Pooling Decreases Uncertainty — This, too, has been proven in the third section. It is equally of value in bracketing procedures: using only categories actually discriminated puts a lower bound on uncertainty; assuming more categories than could be of interest establishes an upper bound.
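A corresponding sketch for rule 2, again with hypothetical probabilities:

    # Pooling two categories into one can only decrease H.
    from math import log2

    def entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    fine   = [0.4, 0.3, 0.2, 0.1]  # four discriminated categories
    pooled = [0.4, 0.3, 0.3]       # the last two pooled into one

    print(entropy(fine))    # about 1.85 bits
    print(entropy(pooled))  # about 1.57 bits: pooling has decreased H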



3. Disregarding Constraints Increases Uncertainty — Let x and y be different events, where y may differ from x only in time or place of occurrence or in any other respect. If H(x) is the uncertainty of x, and Hy(x) the uncertainty of x if y is known, then:



Hy(x) ≤ H(x).



That is, knowing some other event, y, cannot increase the average uncertainty concerning x; it will leave it unchanged if there is no association between x and y; it will reduce it if constraints exist which are manifested in a statistical association between x and y.
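The inequality can be checked directly on any joint distribution; the binary example below is hypothetical:

    # Hy(x) <= H(x) for a joint distribution p(x, y), x and y associated.
    from math import log2

    joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

    # marginal probabilities of x and of y
    px = {0: 0.0, 1: 0.0}
    py = {0: 0.0, 1: 0.0}
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p

    H_x = -sum(p * log2(p) for p in px.values())
    # Hy(x): the average uncertainty about x once y is known
    Hy_x = -sum(p * log2(p / py[y]) for (x, y), p in joint.items())

    print(H_x, Hy_x)  # 1.0 versus about 0.72; the constraint reduces H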



Rule 3 can be used for a bracketing procedure. Disregarding constraints yields an overestimate of H(x); introducing constraints known to be too strong, an underestimate.



Constraints have to be very marked to cause large changes in H(x). For instance, the large inequalities of letter frequency in English texts reduce H from a possible maximum of 4.7 bits per letter to 4.1 bits; the strong constraints between successive letters and words result in an additional reduction to 1.5-2.0 bits per letter.
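A rough check on the first two figures, assuming commonly published approximate letter frequencies for English (they are not taken from this paper):

    from math import log2

    freq = {  # percent of occurrences, approximate published values
        'e': 12.7, 't': 9.1, 'a': 8.2, 'o': 7.5, 'i': 7.0, 'n': 6.7,
        's': 6.3, 'h': 6.1, 'r': 6.0, 'd': 4.3, 'l': 4.0, 'c': 2.8,
        'u': 2.8, 'm': 2.4, 'w': 2.4, 'f': 2.2, 'g': 2.0, 'y': 2.0,
        'p': 1.9, 'b': 1.5, 'v': 1.0, 'k': 0.8, 'j': 0.15, 'x': 0.15,
        'q': 0.10, 'z': 0.07,
    }
    total = sum(freq.values())
    probs = [f / total for f in freq.values()]

    print(log2(26))                          # 4.70 bits: equiprobable maximum
    print(-sum(p * log2(p) for p in probs))  # about 4.2 bits, near the 4.1 cited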



Formally, rule 3 is a special case of rule 1. 



4. Small Effects of Rare Events — The information functional is a sum of terms of the form (−p log p). This function rises steeply between zero and .10; hence, small probabilities contribute little to the total sum. For instance, ten equiprobable alternatives correspond to an H of 3.32. If one of these alternatives is replaced by ten separate sub-categories, each of probability .01, then the resulting H is 3.65. If instead of ten, one introduces 100 equiprobable sub-categories, each with probability .001, the resulting H is 3.99, or equivalent to sixteen equiprobable categories.
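The arithmetic of this example can be reproduced directly:

    # Reproducing the figures of rule 4 (base-2 logarithms).
    from math import log2

    def entropy(probs):
        return -sum(p * log2(p) for p in probs if p > 0)

    print(entropy([0.1] * 10))                 # 3.32: ten equiprobable alternatives
    print(entropy([0.1] * 9 + [0.01] * 10))    # 3.65: one replaced by ten of p = .01
    print(entropy([0.1] * 9 + [0.001] * 100))  # 3.99: or by 100 of p = .001
    print(log2(16))                            # 4.00: sixteen equiprobable categories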



A good example turned up in a study by A. A. Blank. He calculated the information content of single English words. For particular reasons, the sample was restricted to four-letter words. Thorndike's list contains 1550 such words. H, based on the observed frequency of these words, is 8.13 bits per word. Of these words, 119 occur with the greatest frequencies. Computing H on the basis of these words alone gives a value of 6.34 bits per word. Thus, taking into consideration only about one tenth of all categories already yields about four-fifths of the final information function.
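The same effect can be sketched with a synthetic distribution; the Zipf-like frequencies below merely stand in for Thorndike's actual counts, which are not reproduced here, so the exact ratio will differ:

    # Partial sums of the information functional over the commonest categories.
    from math import log2

    n = 1550
    weights = [1 / r for r in range(1, n + 1)]  # Zipf-like word frequencies
    total = sum(weights)
    probs = [w / total for w in weights]        # already in descending order

    full    = -sum(p * log2(p) for p in probs)        # H over all 1550 words
    partial = -sum(p * log2(p) for p in probs[:119])  # top 119 terms only

    print(full, partial, partial / full)  # share of H carried by the commonest words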



