A Primer on Information Theory 27 



From the marginal sums, the uncertainties concerning the height of daughters, 

 H(x), and of fathers, H(y), are computed as described in the preceding section.

 The uncertainty concerning both heights in a father-daughter pair is computed 

 in similar fashion from the joint probabilities, p(i,j). This function is properly 

 called the joint uncertainty, or uncertainty of the two-part system; its symbol

 is H(x,y). It is compared to the sum of the two individual uncertainties. If 

 the two heights were completely independent of each other, then the joint 

 uncertainty should be equal to the sum of the individual uncertainties. In our 

 case, it is smaller by 0.22 bits. The deficit is a measure of the internal constraints 

 in the system, which lead to an association between heights of fathers and 

 daughters. The function is designated by the symbol T(x;y). Its defining 

 equation is:

     T(x;y) = H(x) + H(y) - H(x,y)
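As a concrete illustration, the defining equation can be evaluated on a small joint probability table. The sketch below is minimal Python; the 3×3 table is invented for illustration and is not the book's actual father-daughter data.

```python
import numpy as np

# Hypothetical 3x3 joint distribution p(i, j); values are illustrative,
# not the book's actual father-daughter height table.
p = np.array([[0.20, 0.05, 0.00],
              [0.05, 0.30, 0.05],
              [0.00, 0.05, 0.30]])

def entropy(probs):
    """Shannon uncertainty H in bits; zero cells contribute nothing."""
    probs = probs[probs > 0]
    return -np.sum(probs * np.log2(probs))

H_x = entropy(p.sum(axis=1))   # marginal uncertainty, H(x)
H_y = entropy(p.sum(axis=0))   # marginal uncertainty, H(y)
H_xy = entropy(p.ravel())      # joint uncertainty, H(x,y)
T = H_x + H_y - H_xy           # the deficit T(x;y)
```

With these probabilities the deficit T comes out positive, reflecting the association between the two variables.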



This information function is akin to other statistics which measure the

 relatedness of two variables, such as the coefficients of correlation and of 

 contingency. The T-measure is of very general applicability; the values of the 

 variables do not have to be quantitative, nor even ordered; they need only

 be distinguishable. For instance, one can compute a T-measure for a relation 

 between color and shape. 
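Because the T-measure requires only distinguishable categories, it can be computed directly from paired labels. A minimal sketch, with made-up color-shape observations:

```python
from collections import Counter
from math import log2

# Made-up paired observations of two purely categorical variables.
pairs = [("red", "round"), ("red", "round"), ("red", "square"),
         ("blue", "square"), ("blue", "square"), ("blue", "round"),
         ("red", "round"), ("blue", "square")]

def H(counts, n):
    """Uncertainty in bits from a table of frequencies."""
    return -sum(c / n * log2(c / n) for c in counts.values())

n = len(pairs)
H_color = H(Counter(color for color, _ in pairs), n)
H_shape = H(Counter(shape for _, shape in pairs), n)
H_joint = H(Counter(pairs), n)
T = H_color + H_shape - H_joint   # association between color and shape
```

No ordering or numerical scale of the labels enters the computation; only the frequencies of the distinguishable categories matter.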



The two functions, H and T, differ in the way in which they are affected by 

 change of scale. Let us consider what would have happened if the investigator had chosen

 one-inch intervals instead of three-inch intervals. It could be the case that only 

 one one-inch interval out of any group of three is occupied at all. Then, the 

 information that a certain height falls into a given three-inch interval would 

 automatically locate it in some one-inch interval; hence, the uncertainty is 

 not increased by the subdivision of intervals. However, this is an extremely 

 unlikely situation. It is much more likely that the three one-inch intervals are 

 populated with approximately equal frequencies. In this case, additional 

 information of log₂ 3 = 1.58 bits is needed to specify the proper one-inch

 interval. Then, the uncertainty concerning the height of fathers with regard to 

 a one-inch scale will be 2.00 + 1.58 = 3.58 bits, and the uncertainty concerning 

 the height of daughters 1.92 + 1.58 = 3.50 bits. The joint uncertainty will be 

 increased by log₂ 9 = 3.17 bits, because each cell in the table will be

 replaced by nine cells as one goes from three-inch intervals to one-inch intervals. 

 If one uses a still finer grain, going from inches to millimetres, then the individual 

 uncertainties can be increased by another 4.7 bits, the joint uncertainty by 9.3 

 bits. This is quite the expected behavior. The more categories are recognized, 

 the greater the uncertainty of classification. The uncertainty can become infinite 

 for a continuous function. However, it will always remain finite for any set of 

 real observations. 
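The additive effect of refining the scale can be checked directly. Assuming, as in the text, that each occupied interval splits into three equally populated sub-intervals, the uncertainty rises by exactly log₂ 3 bits; the probabilities below are illustrative:

```python
from math import log2

def entropy(probs):
    """Uncertainty in bits, skipping empty intervals."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Illustrative frequencies for three-inch height intervals.
coarse = [0.50, 0.25, 0.25]
# Each interval split into three equally populated one-inch intervals.
fine = [p / 3 for p in coarse for _ in range(3)]

H_coarse = entropy(coarse)
H_fine = entropy(fine)
# H_fine exceeds H_coarse by exactly log2(3) = 1.58 bits.
```

The same arithmetic gives the 4.7-bit increase quoted for the passage from inches to millimetres, since log₂ 25.4 ≈ 4.7.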



T, on the other hand, depends very little on the scale interval used. With 

 very coarse grouping, T tends to be smaller. In the extreme case, where all heights

 are pooled into one single class, all individual and joint uncertainties vanish, 

 and with them their differences. In the other extreme case, where measurements 

 are taken and registered to so many digits that no two results are alike, we must 

 get H(x) = H(y) = H(x,y) = T(x;y) = log₂ 1376. But, between these unreasonable extremes, the measure of constraints is characteristic of the system

 and not of the scale which is used in measuring it. 
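The scale-invariance of T under such uniform refinement can likewise be verified numerically. In the sketch below (with an illustrative joint table), each cell is split into a 3×3 block of equal sub-cells: both marginal uncertainties grow by log₂ 3 and the joint uncertainty by log₂ 9, so T is unchanged.

```python
import numpy as np

def entropy(a):
    """Uncertainty in bits, ignoring empty cells."""
    a = a[a > 0]
    return -np.sum(a * np.log2(a))

def T(p):
    """T(x;y) = H(x) + H(y) - H(x,y) for a joint probability table."""
    return entropy(p.sum(axis=1)) + entropy(p.sum(axis=0)) - entropy(p.ravel())

# Illustrative coarse joint table.
p = np.array([[0.3, 0.1],
              [0.1, 0.5]])
# Uniform refinement: every cell becomes a 3x3 block of equal sub-cells.
fine = np.kron(p, np.ones((3, 3)) / 9)

T_coarse = T(p)
T_fine = T(fine)   # equal to T_coarse: the constraint survives refinement
```

The two extra log₂ 3 terms in the marginals exactly cancel the log₂ 9 added to the joint uncertainty, which is why the deficit characterizes the system rather than the measuring scale.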



