Karl Pearson 
(2) The basal idea is so extraordinarily simple that one is inclined to believe 
that it must have been noticed before. I am not able, however, to refer to any 
previous mention of it. 
I start from the hypothesis that the regression is linear. Accordingly, if a 
volume of the frequency be cut off from the frequency surface by a vertical 
plane at a given value of the variate B, the vertical through the centroid of this 
volume cuts the regression line. If p and q be the coordinates of this point 
of section measured from the means of the two variates, p, q lies on the regression 
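line and we have, $\sigma_1$ and $\sigma_2$ being the standard deviations of the two variates and 
r their correlation, 
$$\frac{p}{\sigma_1} = r\,\frac{q}{\sigma_2}.$$
Hence 
$$r = \frac{p/\sigma_1}{q/\sigma_2}. \qquad \text{(i)}$$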
So far there is no assumption of a Gaussian distribution. Now p is the mean 
value of the A-variate for all the pairs with specially marked B-variate; thus, in the 
first illustration I have given above, it is the mean age of all candidates 
who passed the examination. And $\sigma_1$ is the standard deviation of the measured 
character and can therefore be found, e.g. in the same illustration it is the age 
variability of all candidates. Thus the numerator in Equation (i) can always be 
found. The next point is to consider how the denominator can be discovered. 
Now the B-variate is not given quantitatively, but we are given the percentage 
of B beyond the arbitrary division, i.e. in our illustration the number out of the 
candidates who succeed in passing. We cannot therefore find $q/\sigma_2$ by the usual 
processes of determining a mean. If, however, we assume the B-variate to follow 
reasonably closely a Gaussian distribution, the percentage of the B-variate gives, 
by means of the probability-integral tables, the ratio $y/\sigma_2$ for the distance from 
the mean at which the B-variate is divided, and then 
$$\frac{q}{\sigma_2} = \frac{\dfrac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} y^2/\sigma_2^2}}{\dfrac{1}{\sqrt{2\pi}} \displaystyle\int_{y/\sigma_2}^{\infty} e^{-\frac{1}{2} x^2}\, dx}.$$
Here both numerator and denominator are known as soon as $y/\sigma_2$ has been 
found. They are for example the $z$ and $\tfrac{1}{2}(1 - \alpha)$ of Sheppard's Tables (Biometrika, 
Vol. II. p. 182). 
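The table look-up just described can be sketched in code. The following is a minimal illustration only, assuming that Python's standard-library `statistics.NormalDist` stands in for the printed probability-integral tables; the function name `tail_mean_ratio` and its argument are hypothetical and do not come from the paper.

```python
from statistics import NormalDist


def tail_mean_ratio(fraction_beyond: float) -> float:
    """Return q/sigma_2 for a Gaussian B-variate.

    fraction_beyond is the proportion of cases lying beyond the
    arbitrary division (the tail area, Sheppard's (1 - alpha)/2).
    Assumption: the B-variate is normally distributed.
    """
    if not 0.0 < fraction_beyond < 1.0:
        raise ValueError("fraction_beyond must lie strictly between 0 and 1")
    standard = NormalDist()  # mean 0, standard deviation 1
    # Distance of the division from the mean in units of sigma_2, i.e. y/sigma_2.
    h = standard.inv_cdf(1.0 - fraction_beyond)
    # Ordinate of the normal curve at the division (Sheppard's z).
    z = standard.pdf(h)
    # Mean of the tail beyond the division, in units of sigma_2.
    return z / fraction_beyond
```

When exactly half the cases lie beyond the division, the division falls at the mean and the ratio reduces to the mean of a half-normal distribution, $\sqrt{2/\pi}$.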
Thus by the simple hypotheses that the regression is linear, and that a fairly 
close value of the mean of the marked part of the B-variate can be found on the 
assumption that it has a normal distribution, we can readily find numerator and 
denominator of the value of r as given by Equation (i). 
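The whole procedure may be put together as a short sketch. It assumes linear regression and a Gaussian B-variate, as in the text, and uses Python's standard-library `statistics` module in place of the tables. The function name `biserial_r` and the sample ages are hypothetical, invented purely for illustration.

```python
from statistics import NormalDist, fmean, pstdev


def biserial_r(values, marked):
    """Estimate r by Equation (i): r = (p/sigma_1) / (q/sigma_2).

    values: the measured A-character for every case.
    marked: True where the B-character lies beyond the division.
    Assumptions: linear regression and a normally distributed B-variate.
    """
    values = list(values)
    marked = list(marked)
    if len(values) != len(marked):
        raise ValueError("values and marked must have the same length")
    marked_values = [v for v, m in zip(values, marked) if m]
    if not marked_values or len(marked_values) == len(values):
        raise ValueError("both marked and unmarked cases are required")

    sigma_1 = pstdev(values)  # standard deviation of the A-variate, all cases
    if sigma_1 == 0:
        raise ValueError("the A-variate shows no variability")
    # p: mean of A for the marked cases, measured from the general mean.
    p = fmean(marked_values) - fmean(values)

    # Denominator: q/sigma_2 from the proportion of marked cases.
    fraction = len(marked_values) / len(values)
    standard = NormalDist()
    h = standard.inv_cdf(1.0 - fraction)        # y/sigma_2
    q_over_sigma_2 = standard.pdf(h) / fraction  # z / tail area

    return (p / sigma_1) / q_over_sigma_2


if __name__ == "__main__":
    # Hypothetical ages of candidates and whether each passed.
    ages = [17, 18, 18, 19, 20, 21, 22, 23]
    passed = [False, True, False, True, True, False, True, True]
    print(biserial_r(ages, passed))
```

Because the denominator rests on the Gaussian assumption, the value returned is not confined to the range from -1 to 1 when the data depart strongly from that assumption.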
Biometrika vii 13 
