Miscellanea
X. On a Coefficient of Class Heterogeneity or Divergence. 
By KARL PEARSON, F.R.S. 
(1) In considering the sub-groups of a population — especially in dealing with local races in 
man, animals or plants — a problem of the following character has not infrequently arisen : It 
is found that a sub-class, for example a local sample, differs considerably from the general 
population. This divergence may have any magnitude upwards from the probable limits of 
random sampling. We require some coefficient which will express by a single number the 
relative divergence from the general population of each sub-class or local group. For example, 
we take the frequency of alternative characteristics of the local population and find these 
are represented by certain percentages in the general population; we know also the percentages 
in the sub-group. We can, of course, take the difference of each individual percentage and 
of the general population percentage and find the probable error of this difference, but this gives 
us a series of numbers, and not a single measure of the general heterogeneity of the group. 
These numbers may also belong to correlated characters, and when one number marks a great 
excess in percentage we may expect a great defect in a second percentage for this very reason. 
But this makes the weight to be given to such a complex system of numbers extremely difficult 
to estimate. 
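The character-by-character comparison described above can be sketched as follows. This is only an illustration under stated assumptions, not the coefficient developed in this note: the function name `percentage_divergences` and the figures are hypothetical, the general-population percentages are treated as known exactly, the sub-group is treated as a simple random sample that is small relative to the population, and the probable error is taken as 0.6745 times the standard error of a binomial proportion.

```python
import math

# Probable error = 0.6745 x standard deviation under the normal law.
PROBABLE_ERROR_FACTOR = 0.6745


def percentage_divergences(sub_counts, pop_percentages):
    """For each alternative character return a pair:
    (sub-group percentage minus population percentage,
     probable error of that difference, in percent).

    Assumption: population percentages are known exactly and the
    sub-group is a random sample small relative to the population.
    """
    n = sum(sub_counts)  # size of the sub-group
    results = []
    for count, pop_pct in zip(sub_counts, pop_percentages):
        p = pop_pct / 100.0
        sub_pct = 100.0 * count / n
        # Standard error of a sample percentage under binomial sampling.
        se_pct = 100.0 * math.sqrt(p * (1.0 - p) / n)
        results.append((sub_pct - pop_pct, PROBABLE_ERROR_FACTOR * se_pct))
    return results


# Hypothetical sub-group of 100 individuals over three alternative characters.
sub_counts = [30, 50, 20]
pop_percentages = [25.0, 50.0, 25.0]

for diff, pe in percentage_divergences(sub_counts, pop_percentages):
    print(f"difference = {diff:+.2f}%, probable error = {pe:.2f}%")
```

The output is one pair of numbers per character rather than a single measure, and because the percentages in each group sum to 100 the differences necessarily sum to zero, so an excess in one character forces a defect elsewhere. Both features are the difficulties noted in the paragraph above.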
The necessity for some general coefficient of class heterogeneity was impressed upon me, 
while discussing with Mr J. F. Tocher his reduction of the Anthropometrical Surveys recently 
made of the inmates of Scottish Asylums and of the children in Scottish Schools. It was 
needful to find a single number, which would measure local heterogeneity, or the divergence 
from a random sample of the general population in a series of characters of the local population. 
The number chosen must be such (i) that allowance is made for the size of the sample, (ii) that 
the numbers for different sub-groups or localities are strictly comparable, and (iii) that we have 
some idea as to the size of its probable error. Following up a suggestion of Mr Tocher I have 
reached what I think is a workable coefficient of divergence, which may be useful in dealing with 
local race problems. 
Suppose a contingency table formed in which the columns are marked by the alternative 
characters under consideration and each row is peculiar to a sub-group or district. Thus 
let the characters be α, β, γ, δ ... and the sub-groups a, b, c, d, e .... We have the table : 
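TABLE I. 

          α       β       γ       δ       ε      ...     ω      Totals 
a        n_aα    n_aβ    n_aγ    n_aδ    n_aε    ...    n_aω    n_a 
b        n_bα    n_bβ    n_bγ    n_bδ    n_bε    ...    n_bω    n_b 
c        n_cα    n_cβ    n_cγ    n_cδ    n_cε    ...    n_cω    n_c 
d        n_dα    n_dβ    n_dγ    n_dδ    n_dε    ...    n_dω    n_d 
e        n_eα    n_eβ    n_eγ    n_eδ    n_eε    ...    n_eω    n_e 
...      ...     ...     ...     ...     ...     ...    ...     ... 
z        n_zα    n_zβ    n_zγ    n_zδ    n_zε    ...    n_zω    n_z 
Totals   n_α     n_β     n_γ     n_δ     n_ε     ...    n_ω     N 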
Here the first column gives all the districts or sub-groups which form the total population. 
The distribution of the alternative characters in the total population is given in the last row, 
while the last column gives the total frequency of each sub-group. Any number such as niy 
