Karl Pearson and David Heron 
181 
means obvious. It is conceivable that a better measure of association would give 
a limit below unity for such a case, while providing a limit nearer and nearer to 
unity as the information given with regard to what really occurs inside the broad 
categories is more and more complete. That is to say, a desirable coefficient of 
association would be one which would always lie numerically between 0 and 1, 
but which would not take the absolute value 1, unless far more detailed infor- 
mation were provided than is given in the statement of such a table as (H). 
From this standpoint we see at once how idle is Mr Yule's criticism of the co- 
efficient of contingency. He suggests that it is invalid because (i) it has an upper 
limit less than unity, when the contingency table has a limited number of cells, 
and (ii) its value rises when you increase the number of cells. It is less than unity 
in the first case, because we are ignorant of what may happen when we analyse 
the contents of the big cells ; it increases in the second case because we have 
additional knowledge. It only becomes unity when one character A is absolutely 
fixed by a second B, i.e. when A is a function of B*. The coefficient of contingency
is a valid measure of association, whether the table be fourfold† or
n x n'-fold. It presents far fewer logical anomalies than Mr Yule's Q or the Boas-
Yulean φ, and it readily admits of our calculating, what for many cases is essential,
the probability that the two attributes are independent. 
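
The behaviour of the upper limit described above may be checked numerically. What follows is a minimal sketch and not part of the memoir: it assumes the usual definition of the coefficient of mean square contingency, C = sqrt(phi2 / (1 + phi2)), where phi2 is chi-square divided by the total frequency n, and the names contingency_coefficient and diagonal_table are hypothetical. For a k x k table in which A is absolutely fixed by B, the coefficient works out to sqrt((k - 1) / k), which is below unity for any finite number of cells and approaches unity as the cells are subdivided.

```python
import math


def contingency_coefficient(table):
    """Coefficient of mean square contingency for a table of counts.

    Assumed definition: C = sqrt(phi2 / (1 + phi2)), phi2 = chi2 / n.
    `table` is a list of rows, each a list of non-negative counts.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in cell (i, j) if the attributes were independent.
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                chi2 += (observed - expected) ** 2 / expected
    phi2 = chi2 / n
    return math.sqrt(phi2 / (1.0 + phi2))


def diagonal_table(k, per_cell=100):
    """Hypothetical k x k table in which A is completely determined by B."""
    return [[per_cell if i == j else 0 for j in range(k)] for i in range(k)]


# Compare the computed coefficient with sqrt((k - 1) / k) for finer groupings.
for k in (2, 3, 8, 50):
    c = contingency_coefficient(diagonal_table(k))
    print(k, round(c, 4), round(math.sqrt((k - 1) / k), 4))
```

The rise of the coefficient with the number of cells in this sketch reflects only the additional information supplied by the finer grouping; a table whose cell frequencies are exactly proportional to the products of its marginal totals gives a coefficient of zero whatever the number of cells.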
But Mr Yule dismisses the coefficient with (a) a quite unreasoned criticism 
that it increases in value as the number of cells increases, and (b) an illustration 
that it is not equal to the coefficient of correlation for one particular table of 
heterogeneous material, i.e. for a surface of zero correlation with a cock's comb 
of absolute correlation erected along its diagonal. Nobody, as far as we are aware, 
ever asserted it would be. The assertions made with regard to the coefficient of 
mean square contingency may be summed up as follows : (i) for any frequency 
distribution the coefficient of contingency is a reasonable measure of the extent 
of the deviation of the attributes from absolute independence, and (ii) for such 
frequency surfaces in homogeneous material as occur in actual practice the coefficient
of contingency, if the proper corrections are made‡, gives a value close to
the coefficient of correlation, whether we divide the table up into 3 x 3-fold or 
8 x 8-fold groupings. The skewness of the distribution — its deviation from Gaussian 
frequency — is not a very disturbing factor, as we shall show in the sequel. When we 
take material which has — if there be an indefinite number of cells — an indefinitely 
great improbability of independence, i.e. material for which $C_2 = 1$, we shall not
* See Pearson, Grammar of Science, 3rd ed. p. 162. 
† The probable error of a coefficient of contingency $C_2$ for a fourfold table is
•67449 ^ 2 il-2C,2+ C2(1 -^V -|GV + 
Jn t (1-Cf?)i I 
where λ and μ have the same values as on p. 170. It does not become zero when the Boas-Yulean φ
is equal to unity, unless λ = μ = 0.
‡ These corrections have been several times referred to (see Grammar of Science, ed. 1911, ftn.
p. 163) and have been in use in the Laboratory, but the further memoir on Contingency which has been 
for some time in hand has been delayed owing to pressure of other work. It will shortly be issued and 
deal more fully with the corrections merely stated in this paper. 
