Correlation and Application of Statistics to Problems of Heredity 55 



Table V on Galton's p. 143 is noteworthy. In Column 3 we have the
coefficients of correlation tabled under the now familiar symbol r. In Column 4 we
have the values of $\sqrt{1-r^2}$, to enable the Quartile of the arrays to be found. In
Column 5 we have, placed one under the other, the two regression coefficients,
and in Column 6 in the same manner the Quartiles of the arrays (i.e.
$0.67449\,\sigma_x\sqrt{1-r^2}$ and $0.67449\,\sigma_y\sqrt{1-r^2}$)*. Throughout, without referring
directly to the matter, Galton assumes linear regression and homoscedasticity,
i.e. he is thinking in terms of the bivariate normal surface. Next he draws
attention to the relation of his present work to his former work on heredity.
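The Column 6 entries described above follow from a single formula, the quartile (probable error) of an array being 0.67449 σ √(1 − r²). A minimal Python sketch of that arithmetic is given here; the function name and the numerical values of σ and r are hypothetical and are not taken from Galton's Table V.

```python
import math

# Factor converting a standard deviation into a quartile (probable error),
# as quoted in the text.
PROBABLE_ERROR_FACTOR = 0.67449


def array_quartile(sigma: float, r: float) -> float:
    """Quartile of an array: 0.67449 * sigma * sqrt(1 - r^2)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    return PROBABLE_ERROR_FACTOR * sigma * math.sqrt(1.0 - r * r)


# Hypothetical standard deviations and correlation, for illustration only.
sigma_x, sigma_y, r = 2.0, 3.0, 0.8

q_x = array_quartile(sigma_x, r)  # quartile of the x-arrays
q_y = array_quartile(sigma_y, r)  # quartile of the y-arrays

print(f"sqrt(1 - r^2) = {math.sqrt(1.0 - r * r):.4f}")
print(f"quartile of x-arrays = {q_x:.4f}")
print(f"quartile of y-arrays = {q_y:.4f}")
```

With r = 0 the quartile of the array equals that of the whole population, and it shrinks to zero as r approaches ±1, which is the reduction in variability the table is meant to display.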



On the fifth line of p. 144, he has the words: "from […] to […] × […] = 1
to 0.44, which is practically the same." This should read "from […] to
[…] × […] = 1 to 0.47, which is identically the same," as it should be since it
expresses the coefficient of correlation found from the second regression
line. Galton emphasises the importance of the reduction in the variability
of the array, as measured by $\sqrt{1-r^2}$, and points out how this affects the
efficiency of Bertillon's system of identification by anthropometric
measurements. Bertillon had asserted that his measurements were independent
variates. A reference to Plate LII of our second volume will show that
Galton had chosen several of Bertillon's "independent" measurements and
determined their actual correlation.
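The remark that the two values must be "identically the same" rests on the fact that either regression coefficient, rescaled by the ratio of the two variabilities, yields the same coefficient of correlation. The Python sketch below illustrates this under stated assumptions: the regression coefficients 2/3 and 1/3 and the variability ratio of 1 to 1/√2 are illustrative figures chosen so that r rounds to 0.47; they are not read from the garbled fractions in the passage, and the function name is hypothetical.

```python
import math


def r_from_regression(b: float, sd_predictor: float, sd_response: float) -> float:
    """Correlation implied by the regression of response on predictor:
    r = b * sd_predictor / sd_response."""
    return b * sd_predictor / sd_response


# Illustrative (assumed) variabilities and regression coefficients.
sd_x = 1.0
sd_y = 1.0 / math.sqrt(2.0)
b_x_on_y = 2.0 / 3.0  # regression of x on y
b_y_on_x = 1.0 / 3.0  # regression of y on x

r_first = r_from_regression(b_x_on_y, sd_predictor=sd_y, sd_response=sd_x)
r_second = r_from_regression(b_y_on_x, sd_predictor=sd_x, sd_response=sd_y)

# The geometric mean of the two regression coefficients gives the same value.
r_geometric = math.sqrt(b_x_on_y * b_y_on_x)

print(f"{r_first:.4f} {r_second:.4f} {r_geometric:.4f}")
```

All three routes agree to within floating-point rounding, which is why a discrepancy between the two regression lines can only arise from arithmetical slips or rounding in the tabled values.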



Galton next outlines a method by which the influence of n variates on
another might be determined. He suggests that after transmuting the
variates we should sum them, when the probable error of the sum would "be
$\sqrt{n}$, if the variates were perfectly independent, and n if they were rigidly
and perfectly related. The observed value would be almost always
somewhere intermediate between these extremes, and would give the information
that is wanted" (p. 145).



This would not, I believe, be a feasible method of approaching multiple
correlation; it neglects the possibility of negative correlations, and does not
provide for the influence on one variate of all the remainder. It is an
attempt to obtain a sort of average value of the interlinkage of a system of
n variates†. I do not think that at this time Galton had realised the
existence and importance of negative correlation.
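The objection concerning negative correlations can be checked numerically. The Python sketch below computes the standard deviation of a sum of n standardised variates from its correlations, using the relation that the variance equals n plus twice the sum of the pairwise correlations. The choice n = 4, the equal-correlation layout, and the value −0.25 are assumptions made purely for illustration, and the function names are hypothetical.

```python
import math


def sd_of_standardised_sum(corr):
    """S.D. of the sum of n standardised variates with correlation matrix corr:
    variance = n + 2 * (sum of r over all distinct pairs)."""
    n = len(corr)
    pair_sum = sum(corr[s][t] for s in range(n) for t in range(s + 1, n))
    variance = n + 2.0 * pair_sum
    return math.sqrt(max(variance, 0.0))  # guard against tiny negative rounding


def equicorrelated(n, r):
    """n x n correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if s == t else r for t in range(n)] for s in range(n)]


n = 4
independent = sd_of_standardised_sum(equicorrelated(n, 0.0))  # sqrt(n)
rigid = sd_of_standardised_sum(equicorrelated(n, 1.0))        # n
negative = sd_of_standardised_sum(equicorrelated(n, -0.25))   # below sqrt(n)

print(independent, rigid, negative)
```

In this illustration the negatively correlated system gives a value below √n, so it falls outside the interval from √n to n that Galton took to bound the observed value.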



* A large proportion of values in the 5th and 6th columns have rather serious numerical
errors, corrected by Galton on a copy of the paper in my possession. He also states thereon
that he wishes to change the symbol r to p, presumably because he was thinking of it as the
"correlation coefficient," not as the regression coefficient, when units are reduced to respective
variabilities. The regression coefficient without reduction he had termed w in his memoir on stature.



† Let $x_1, x_2, \ldots x_s, \ldots x_n$ be the n variates, and $\sigma_1, \sigma_2, \ldots \sigma_s, \ldots \sigma_n$ their standard deviations,
$\bar{x}_1, \bar{x}_2, \ldots \bar{x}_s, \ldots \bar{x}_n$ their means. Then if $x = S_{s=1}^{n}\,(x_s - \bar{x}_s)/\sigma_s$, we have:

$$
\begin{aligned}
\sigma_x^2 &= n + 2\,S'(r_{ss'}) \\
&= n, \text{ if all the correlations } r_{ss'} \text{ are zero,} \\
&= n + 2\cdot\tfrac{1}{2}\,n(n-1) = n^2, \text{ if all the correlations are plus one.}
\end{aligned}
$$

Hence $\sigma_x = \sqrt{n}$ and $n$ in the two cases respectively, as Galton says. But the actual value of $\sigma_x$



