Karl Pearson 
lose all the advantage of having one variable in multiple categories. It is further 
to be noted that in many cases it is impossible to suppose any adequately graduated 
variable behind the multiple categoric variate. For example, if we wish to find 
the relationship between crime and occupation, we may easily select 10 or 12 
groups of kindred occupations, and discuss the number of convictions in each such 
group of individuals, but it would be unreasonable to suppose any continuous 
variable behind this grouping, while we might reasonably suppose a continuous 
variable behind the tendency to criminality. The present method has the 
advantage of giving a unique solution for double row contingency tables; it makes 
less appeal to hypothesis than the fourfold division method applied to such cases, 
and it is far more rapid in execution. 
Finally, the necessity for some such method has been forced upon my notice by 
the great frequency with which such double row contingency tables have recently 
occurred in the work of those dealing with medical, sociological and criminological 
statistics in my laboratory. 
(2) The theory of the method is very simple. Let y be the categoric variate in multiple classes, and x the alternative variate. We suppose x to be ultimately continuous, but by using y we do not suggest continuity. For such a system the correlation ratio, $\eta$, has a perfectly definite meaning: it is the ratio of the standard deviation of the weighted means of the $y$-arrays of $x$'s to the standard deviation of the whole population; in symbols:
$$\eta^2 = \frac{S\{n_y(\bar{x}_y - \bar{x})^2\}}{N\sigma_x^2},$$
where $S$ denotes summation over all the $y$-categories, $n_y$ is the number of individuals in any $y$-category, $\bar{x}_y$ is the mean $x$ for this category, and $\bar{x}$ and $\sigma_x$ are the mean and standard deviation of the $x$'s of the whole population $N$. This value of $\eta$, it is well known*, must lie between 0 and 1; it becomes equal to $r$, the correlation coefficient, when the regression is linear.
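As a rough illustration of this definition (a sketch, not part of the original memoir), the Python below computes $\eta$ from hypothetical (category, x) pairs using population standard deviations (divisor $N$), and compares it with the product-moment $r$ obtained by treating the category labels as numbers. The function names and both toy tables are illustrative assumptions. In the first table the array means lie on a straight line, so $\eta$ and $|r|$ coincide; in the second the regression is curved, and $\eta = 1$ while $r = 0$.

```python
from collections import defaultdict
from math import sqrt


def correlation_ratio(pairs):
    """Return eta of x on y from (y_category, x) pairs, using divisor N."""
    groups = defaultdict(list)
    for y, x in pairs:
        groups[y].append(x)
    n_total = len(pairs)
    x_mean = sum(x for _, x in pairs) / n_total
    # N * sigma_x^2: total sum of squares about the grand mean
    total_ss = sum((x - x_mean) ** 2 for _, x in pairs)
    # S{n_y (xbar_y - xbar)^2}: weighted squared deviations of the array means
    between_ss = sum(
        len(xs) * (sum(xs) / len(xs) - x_mean) ** 2 for xs in groups.values()
    )
    return sqrt(between_ss / total_ss)


def pearson_r(pairs):
    """Product-moment correlation, treating the (numeric) y labels as values."""
    n = len(pairs)
    ym = sum(y for y, _ in pairs) / n
    xm = sum(x for _, x in pairs) / n
    sxy = sum((y - ym) * (x - xm) for y, x in pairs)
    sxx = sum((x - xm) ** 2 for _, x in pairs)
    syy = sum((y - ym) ** 2 for y, _ in pairs)
    return sxy / sqrt(sxx * syy)


# Array means 2, 4, 6 lie on a line in y: eta equals |r| (both sqrt(8/11)).
data = [(1, 1), (1, 3), (2, 3), (2, 5), (3, 5), (3, 7)]
# Array means 0, 5, 0 are not linear in y: eta is 1 while r is 0.
curved = [(1, 0), (2, 5), (3, 0)]
print(correlation_ratio(data), pearson_r(data))
print(correlation_ratio(curved), pearson_r(curved))
```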
Further, the mean standard deviation of the weighted arrays is known to be $\sigma_x\sqrt{1-\eta^2}$, a value which becomes small as $\eta$ approaches unity, or as $x$ becomes absolutely defined by the $y$-category in which it lies.
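The statement about the arrays can be checked numerically. Reading "mean standard deviation" as the square root of the weighted mean of the array variances, $\sqrt{S(n_y\,{}_y\sigma_x^2)/N}$ (an interpretation assumed here, with ${}_y\sigma_x$ the standard deviation of the $x$'s in category $y$), the sketch below compares it with $\sigma_x\sqrt{1-\eta^2}$ on a made-up table. The helper name `array_sd_check` is hypothetical. The two agree because the total sum of squares splits into a within-array part and a between-array part.

```python
from collections import defaultdict
from math import sqrt


def array_sd_check(pairs):
    """Return (RMS within-array SD, sigma_x * sqrt(1 - eta^2)) for (y, x) pairs."""
    groups = defaultdict(list)
    for y, x in pairs:
        groups[y].append(x)
    n_total = len(pairs)
    x_mean = sum(x for _, x in pairs) / n_total
    sigma_x2 = sum((x - x_mean) ** 2 for _, x in pairs) / n_total
    within = 0.0   # S(n_y * y_sigma_x^2)
    between = 0.0  # S{n_y (xbar_y - xbar)^2}
    for xs in groups.values():
        n_y = len(xs)
        m = sum(xs) / n_y
        var_y = sum((x - m) ** 2 for x in xs) / n_y  # array variance, divisor n_y
        within += n_y * var_y
        between += n_y * (m - x_mean) ** 2
    eta2 = between / (n_total * sigma_x2)
    rms_array_sd = sqrt(within / n_total)
    # Clamp at zero so rounding cannot push 1 - eta^2 slightly negative.
    return rms_array_sd, sqrt(sigma_x2) * sqrt(max(0.0, 1.0 - eta2))


data = [(1, 1), (1, 3), (1, 8), (2, 3), (2, 5), (3, 7), (3, 2)]
print(array_sd_check(data))  # the two values agree up to rounding
```

When every $x$ is fixed by its category, $\eta = 1$ and both quantities vanish, matching the remark in the text.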
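Now let ${}_y\sigma_x$ be the standard deviation of the $x$'s which fall into a special category $y$; then we may take:
$$\eta^2 = \frac{S\{n_y(\bar{x}_y - \bar{x})^2\}}{S\{n_y[{}_y\sigma_x^2 + (\bar{x}_y - \bar{x})^2]\}} \qquad \ldots\ldots(\mathrm{i}),$$
since $S\{n_y[{}_y\sigma_x^2 + (\bar{x}_y - \bar{x})^2]\} = N\sigma_x^2$.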
Now we have just seen that the mean value of ${}_y\sigma_x$ is $\sigma_x\sqrt{1-\eta^2}$; we shall therefore assume that the distribution is sufficiently homoscedastic for us to replace ${}_y\sigma_x$ by its mean value $\sigma_x\sqrt{1-\eta^2}$. Hence:
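$$\frac{\eta^2}{1-\eta^2} = \frac{1}{N}\,S\left\{n_y\left(\frac{\bar{x}_y - \bar{x}}{{}_y\sigma_x}\right)^2\right\}$$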
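Under the homoscedastic assumption, $\eta$ can be recovered from the standardized array means $z_y = (\bar{x}_y - \bar{x})/{}_y\sigma_x$ by writing $Q = S(n_y z_y^2)/N$ and solving $\eta^2/(1-\eta^2) = Q$, which gives $\eta^2 = Q/(1+Q)$. The sketch below assumes every array shares one standard deviation; how the $z_y$ are obtained when $x$ is only an alternative variate is not shown here, and the function name is illustrative. For three equal arrays with $z_y = -2, 0, 2$, $Q = 8/3$ and $\eta^2 = 8/11$.

```python
from math import sqrt


def eta_from_standardized_means(counts, z):
    """Solve eta^2 / (1 - eta^2) = S(n_y * z_y^2) / N for eta.

    counts[i] is n_y for category i, and z[i] is (xbar_y - xbar) / y_sigma_x,
    assuming (homoscedasticity) that every array has the same y_sigma_x.
    """
    n_total = sum(counts)
    # Q = S(n_y z_y^2) / N
    q = sum(n * zi * zi for n, zi in zip(counts, z)) / n_total
    # eta^2 / (1 - eta^2) = Q  =>  eta^2 = Q / (1 + Q)
    return sqrt(q / (1 + q))


print(eta_from_standardized_means([2, 2, 2], [-2.0, 0.0, 2.0]))  # about sqrt(8/11)
```

Because $Q/(1+Q) < 1$ for any finite $Q$, this form keeps $\eta$ within the range 0 to 1 stated above.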
* Roy. Soc. Proc. Vol. 71, p. 303; "On the General Theory of Skew Correlation and Non-linear Regression," Drapers' Research Memoirs, 1905. Dulau & Co.
