A Primer on Information Theory 19 



abolished*. The prior uncertainty does not depend on the event that has 

 actually happened, but, rather, on the whole set of events which could have 

 happened on this particular occasion. For instance, if one wishes to compute 

 how much information is acquired, on the average, by a glance at the 
 speedometer, one proceeds to estimate how uncertain a motorist is before he glances. 

 The amount of this uncertainty must depend on the number of needle positions 

 which the motorist thinks he can distinguish. Suppose his speedometer scale 

 reaches from zero to one hundred and he can read the position to the nearest 

 mile per hour; then, he will be able to distinguish 101 positions, and the amount 

 of his uncertainty will be somehow related to this number. However, it wouldn't 

 be realistic to relate his uncertainty only to this number, 101; for suppose 

 his speedometer scale ranges up to 150 instead of 100 miles per hour; yet, 

 when he is driving along the highway at a moderate speed, this extra portion 

 of scale does not contribute in any way to his uncertainty; he will be quite 

 sure that his needle will not be in this interval. In fact, he will expect to find 

 his needle somewhere within a range of about 10 m.p.h., and he will be almost 

 certain to find it within a somewhat larger range of, say, 20 m.p.h. Thus, to 

 describe his uncertainty realistically, we must not only state every possible 

 result of his reading, but will have to qualify each by a statement of expectation 

 or probability. 
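To make the contrast concrete, one can compute the logarithmic measure of uncertainty (Shannon's entropy, in bits; the primer has not introduced this formula yet, so the figures below are an illustration, not part of the text's own derivation). Spreading belief evenly over all 101 positions yields much more uncertainty than confining it to a band of about 20 m.p.h.:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)), with 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# All 101 needle positions (0 to 100 m.p.h.) taken as equally likely:
h_all = entropy([1 / 101] * 101)    # log2(101), about 6.66 bits

# More realistically, the motorist is almost certain the needle lies
# within a band of about 20 m.p.h. (21 one-mile positions):
h_band = entropy([1 / 21] * 21)     # log2(21), about 4.39 bits

print(f"uniform over the whole scale:  {h_all:.2f} bits")
print(f"uniform over a 20-m.p.h. band: {h_band:.2f} bits")
```

The unused upper stretch of the 150-m.p.h. scale carries essentially zero probability, so it contributes nothing to the sum; this is exactly why the count of scale positions alone cannot measure the motorist's uncertainty.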



The Amount of Uncertainty 



As before, we turn to a binary situation to obtain a simple perspective of 

 the problem. Suppose somebody has made a record of 100 tosses of a coin; 

 he has registered only whether the coin fell 'head up' or 'tail up', but neglected 

 all other features such as on what spot the coin came down, which direction 

 the head faced, etc. What is the average amount of information in the record 

 of any one toss? In other words, what is the amount of uncertainty before 

 the record is seen? 



The uncertainty must be a function of 'two', the number of alternatives; 

 it must be modified by their relative frequencies. If it is known that the record 

 is that of a coin so thoroughly biassed that 'head' always turns up, then there 

 will be no uncertainty at all; if the coin is moderately biassed, then the outcome 

 of a toss will be uncertain but not quite as much as with an unbiassed coin. 

 If we don't know the bias of a particular coin, then we do not know exactly 

 how uncertain we should feel about the outcome of a toss. If we know that 

 the record contains 60 'heads' and 40 'tails', then a record of 'head' will show 

 up with a probability of .60, a record of 'tail' with a probability of .40. The 

 uncertainty can be described by a statement of these probabilities: 



Probability of head up 0.60 



Probability of tail up 0.40 



In the same way we can describe any number of binary uncertainties with 

 a 60-40 choice between any class 'A' and its complement 'non-A' — where 

 'A' and 'non-A' may be males and females, hits and misses, friends and foes. 
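The gradations described here (no uncertainty for the always-heads coin, less for a biassed one, most for an unbiassed one) are captured by the standard binary entropy formula of information theory; the sketch below uses that standard formula, which the primer itself develops later, to put numbers on the three cases:

```python
import math

def binary_entropy(p_head):
    """Uncertainty, in bits, of a toss with P(head) = p_head."""
    h = 0.0
    for p in (p_head, 1 - p_head):
        if p > 0:  # 0 * log2(0) is taken as 0
            h -= p * math.log2(p)
    return h

print(f"unbiassed coin (50-50):  {binary_entropy(0.5):.3f} bits")  # 1.000
print(f"biassed coin   (60-40):  {binary_entropy(0.6):.3f} bits")  # 0.971
print(f"two-headed coin (100-0): {binary_entropy(1.0):.3f} bits")  # 0.000
```

A 60-40 record is thus only slightly less uncertain per toss than a fair one, while the thoroughly biassed coin carries no uncertainty at all, just as the text states.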



* At one time there was discussion of whether uncertainty and information should be 

 given opposite signs. Present usage prescribes the same sign for both. 



