Karl Pearson 45 
As early as 1897 Mr G. U. Yule*, then my assistant, made an attempt in this
direction. He fitted a line or plane by the method of least squares to a swarm of 
points, and this has been extended later to n-variates and is one of the best ways
of reaching the multiple regression equations and the coefficient of multiple 
correlation†. Now while these methods are convenient or utile, we may gravely
doubt whether they are more accurate theoretically than the assumption of a normal 
distribution. Are we not making a fetish of the method of least squares as others 
made a fetish of the normal distribution? For how shall we determine that we
are getting a 'best fit' to our system by the method of least squares?
If we are fitting a curve y = f(x, c₁, c₂, …, cₙ)
to a series of observations we can only assert that least square methods are 
theoretically accurate on the assumption that our observations of y for a given x 
obey the normal law of errors. That is the proof which Gauss gave of his method 
and I personally know no other. Theoretically therefore to have justification for 
using the method of least squares to fit a line or plane to a swarm of points we
must assume the arrays to follow a normal distribution. If they do not, we may 
defend least squares as likely to give a fairly good result but we cannot demonstrate 
its accuracy. Hence in disregarding normal distributions and claiming great 
generality for our correlation by merely using the principle of least squares, we are 
really depriving that principle of the basis of its theoretical accuracy, and the 
apparent generalisation has been gained merely at the expense of theoretical 
validity. Take other distributions of deviations for the arrays and the method of 
least squares is not the one which will naturally arise from making the combined 
probability a maximum. It is by no means clear therefore that Mr Yule's 
generalisation indicates the real line of future advance. 
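The point about making the combined probability a maximum can be made explicit. The following is a sketch in modern notation, not Pearson's own; the scale parameters σ and b belong to the two assumed error laws:

```latex
% With independent errors \epsilon_i = y_i - f(x_i, c_1, \dots, c_n),
% the normal law gives
\[
  p(\epsilon_i) \propto \exp\!\left(-\frac{\epsilon_i^2}{2\sigma^2}\right)
  \quad\Longrightarrow\quad
  \prod_i p(\epsilon_i) \propto \exp\!\left(-\frac{1}{2\sigma^2}\sum_i \epsilon_i^2\right),
\]
% so the combined probability is greatest when \sum_i \epsilon_i^2 is least:
% Gauss's justification of least squares. If instead the errors follow the
% double-exponential (Laplace) law,
\[
  p(\epsilon_i) \propto \exp\!\left(-\frac{|\epsilon_i|}{b}\right)
  \quad\Longrightarrow\quad
  \prod_i p(\epsilon_i) \propto \exp\!\left(-\frac{1}{b}\sum_i |\epsilon_i|\right),
\]
% and the maximum-probability fit minimises \sum_i |\epsilon_i|, that is,
% least absolute deviations, not least squares.
```

Thus the fitting rule that "naturally arises" depends wholly on the assumed law of the deviations, which is Pearson's objection.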
I have endeavoured to indicate in this paper the broad outline of the early 
history of correlation which has now a most extensive literature. It is a long step 
from Francis Galton's 'reversion' in sweet pea seeds to the full theory of multiple
correlation, which we now know to be identical with the spherical trigonometry of 
high-dimensioned space, the total correlation coefficients being the cosines of the 
edges of the polyhedra and the partial correlation coefficients the cosines of the 
polyhedral angles. But to find the correlation of the health of a child with the 
number of people per room while you render neutral its age, the health of its parents,
the wages of its father, and the habits of its mother, is no less vital a problem than 
Galton's correlation of character in parent and offspring. It requires indeed more 
mathematics, but the mathematics are not there for the joy of the analyst but 
because they are essential to the solution. It is the transition from the mill as 
pestle and mortar to the mill with steam driven grain crushing steel rollers. But 
the inventor of milling was the person who bruised grain between two stones, and 
Galton was the man who discovered the highway across this new country with what 
he aptly terms "its easy descents to different goals."
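The geometric reading given above, total correlation coefficients as cosines, can be checked numerically: the product-moment coefficient of two variables is exactly the cosine of the angle between their mean-centred observation vectors. A minimal sketch in modern code; the function names and sample figures are illustrative, not from the text:

```python
import math

def pearson_r(x, y):
    """Product-moment correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cosine_between_centered(x, y):
    """Cosine of the angle between the mean-centred observation vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    u = [a - mx for a in x]
    v = [b - my for b in y]
    dot = sum(a * b for a, b in zip(u, v))
    return dot / math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))

# The two computations coincide term by term, so the coefficient is the cosine.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 1.9, 3.4, 3.0, 4.8]
assert abs(pearson_r(x, y) - cosine_between_centered(x, y)) < 1e-12
```

The identity holds because centring places both vectors in the same hyperplane and the correlation formula is then the ordinary dot-product cosine.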
* Journal of the Royal Statistical Society, Vol. lx, Part iv, p. 3.
† Biometrika, Vol. viii, p. 438. The method adopted in the paper is not that of fitting a generalised
plane by least squares, but of making a generalised correlation coefficient take its maximum value. It 
appeals only to the rules of the differential calculus and not to the method of least squares, or indirectly 
to Gauss' law of errors. 
