ON A NEW METHOD OF DETERMINING CORRELATION, 
WHEN ONE VARIABLE IS GIVEN BY ALTERNATIVE 
AND THE OTHER BY MULTIPLE CATEGORIES. 
By KARL PEARSON, F.R.S. 
(1) In a recent paper* I have dealt with the case when one variable is given 
by alternative categories and the other proceeds by quantitative groupings, for 
example when a population whose ages are recorded is classed into anaemic and 
non-anaemic; or a population whose cephalic indices have been measured is 
classed into conscientious and unconscientious sections. The object of the present 
paper is to carry the idea involved in such double row correlation tables a stage 
further by supposing the variable classified into multiple categories to be purely 
qualitative. Such variables I have elsewhere spoken of as categoric variables, to 
distinguish them from graduated variables. As illustrations we may take the eye 
colour of an individual and the presence or absence of pulmonary tuberculosis. 
Here the eyes may be grouped into seven or eight classes, but we can only record 
the presence or absence of the disease; it is true that the immunity of the 
individual is almost certainly a graded variable, but we are not able at present to 
measure it, and must be contented with an alternative classification. Again, the presence 
of malaria and the skin tint is another illustration from the same field; for while 
skin tint is undoubtedly a graduated variable, no medical inquirer has probably 
the energy or time to do more than group into multiple classes — perhaps seven or 
eight — separated by certain skin tint mosaics. 
Hitherto such double row contingency tables could only be reduced by using the 
fourfold process, originally published by me in the Phil. Trans. Vol. 195 A, pp. 1-4. 
Such a process has two disadvantages; it assumes that a graduated variate which 
follows the Gaussian law is at the bottom of both classifications; and further it 
requires us to make one fourfold grouping, where many are possible. The choice 
left to the operator is not unique, and different selections may modify somewhat 
the result. Even if the mean result of taking several divisions be adopted, we do 
not get rid of an arbitrary element in the process, and the labour may occasionally 
be excessive. 
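The arbitrariness of the fourfold division can be made concrete by counting the divisions open to the operator. What follows is a minimal sketch in Python under stated assumptions: the function name `fourfold_collapses` and the counts in the example are hypothetical and introduced only for illustration, and the sketch merely enumerates the possible fourfold tables; it does not compute any correlation from them.

```python
from itertools import combinations


def fourfold_collapses(table):
    """Yield every way of collapsing a 2 x k table into a 2 x 2 table.

    `table` is a pair of rows (the two alternative classes), each a list
    of k counts, one per category. The categories are treated as purely
    qualitative, so any non-empty proper subset of them may form the
    first column of the fourfold table.
    """
    row_a, row_b = table
    k = len(row_a)
    if len(row_b) != k or k < 2:
        raise ValueError("need two rows of equal length k >= 2")
    total_a, total_b = sum(row_a), sum(row_b)
    # Category 0 is always kept in the first group, so that a split and
    # its mirror image are not counted twice.
    for size in range(k - 1):
        for extra in combinations(range(1, k), size):
            first = (0,) + extra
            a1 = sum(row_a[j] for j in first)
            b1 = sum(row_b[j] for j in first)
            yield first, ((a1, total_a - a1), (b1, total_b - b1))


# Hypothetical counts: two alternative classes by three categories.
example = ([10, 20, 30], [5, 15, 25])
for first_group, fourfold in fourfold_collapses(example):
    print(first_group, fourfold)

# With eight qualitative categories there are 2**7 - 1 = 127 divisions.
print(len(list(fourfold_collapses(([1] * 8, [1] * 8)))))
```

For k purely qualitative categories the count is 2^(k-1) - 1; if the categories have a natural order, as with skin tint, and only contiguous groupings are admitted, the number of divisions falls to k - 1, which is still not unique.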
The fourfold process, absolutely necessary as it often is, should on the whole be 
reserved for those cases in which a fourfold division has arisen from the very 
nature of the data and not be applied to double row tables, where by using it we 
* Biometrika, Vol. vii. pp. 96-105. 
