56 Life and Letters of Francis Galton



Galton sums up his results as follows*. Let $x$ be the deviation of the subject, and $y_1$, $y_2$, $y_3$, etc. the corresponding deviations of the correlative, all deviations being reduced to their proper unit of variability, and also let the mean of the $y$ deviations for the given $x$ be $y_x$, then we find:



(1) That $y_x = rx$ for all values of $x$; (2) that $r$ is the same, whichever of the two variables is taken for the subject; (3) that $r$ is always less than 1; (4) that $r$ measures the closeness of correlation.
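Properties (1) and (2) can be checked numerically. The following Python sketch is an illustration only and not Galton's own procedure: it assumes the mean and standard deviation as the "proper unit of variability" (Galton himself worked with medians and probable errors), it uses the later product-moment formula for r, and the simulated data and every parameter (`rho`, `seed`, the location and scale constants) are hypothetical.

```python
import random
import statistics


def standardise(values):
    """Express each value as a deviation from the mean, in units of the standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]


def slope(subject, relative):
    """Least-squares slope (through the origin) of `relative` on `subject`.

    Both inputs are assumed to be standardised already, so their means are zero.
    """
    return sum(s * t for s, t in zip(subject, relative)) / sum(s * s for s in subject)


def simulate(n=5000, rho=0.6, seed=1888):
    """Draw n pairs from a bivariate normal with correlation rho (made-up units)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        e = rng.gauss(0.0, 1.0)
        xs.append(170.0 + 7.0 * x)
        ys.append(45.0 + 2.5 * (rho * x + (1.0 - rho ** 2) ** 0.5 * e))
    return xs, ys


xs, ys = simulate()
zx, zy = standardise(xs), standardise(ys)

# Product-moment correlation of the standardised deviations.
r = sum(a * b for a, b in zip(zx, zy)) / len(zx)

# Regression slope with x as subject, then with y as subject.
slope_y_on_x = slope(zx, zy)
slope_x_on_y = slope(zy, zx)

print(f"r = {r:.4f}")
print(f"slope of y on x = {slope_y_on_x:.4f}")
print(f"slope of x on y = {slope_x_on_y:.4f}")
```

In standardised units both slopes coincide with r, whichever variable is treated as the subject; in the original units the two regression slopes would differ.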



It will be seen at once that we have here the first fundamental statement as to the correlation coefficient and its properties. Probably Galton did not recognise that $r = 0$ does not signify independence of the two variates, only the independence of means of arrays. In addition to this, complete independence involves the arrays being similar and similarly placed curves. It was not till normal distributions were seen to be non-universal that the distinction between the vanishing of $r$ and the absolute independence of variates was fully recognised. For the same reason the idea of non-linear regression did not cross Galton's mind. He got as far as an acceptance of the normal frequency distribution permitted. Only when we look at what has happened since 1888, do we realise the importance of that short paper on "Co-relations"! Thousands of correlation coefficients are now calculated annually, the memoirs and text-books on psychology abound in them; they form, it may be in a generalised manner, the basis of investigations in medical statistics, in sociology and anthropology. Shortly, Galton's very modest paper of ten pages from which a revolution in our scientific ideas has spread is in its permanent influence, perhaps, the most important of his writings. Formerly the quantitative scientist could only think in terms of causation, now he can



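would not be proportional to the sum of the $r_{ss'}$ even if they were all positive. Perhaps a better measure of the same type would be to use $\sigma_{\chi^2}^2$, where

$$\chi^2 = \mathop{S}_{s=1}^{n} \frac{(x_s - \bar{x}_s)^2}{\sigma_s^2} \quad \text{and} \quad \overline{\chi^2} = n,$$

hence:

$$\begin{aligned}
\sigma_{\chi^2}^2 &= \text{mean}\,\bigl(\chi^2 - \overline{\chi^2}\bigr)^2 \\
&= \text{mean}\,\Bigl\{ S\,\frac{(x_s - \bar{x}_s)^4}{\sigma_s^4} + 2S'\,\frac{(x_s - \bar{x}_s)^2 (x_{s'} - \bar{x}_{s'})^2}{\sigma_s^2 \sigma_{s'}^2} - 2n\,S\,\frac{(x_s - \bar{x}_s)^2}{\sigma_s^2} + n^2 \Bigr\} \\
&= 3n + 2S'\bigl(1 + 2r_{ss'}^2\bigr) - 2n^2 + n^2 \\
&= 2n + 4S'\bigl(r_{ss'}^2\bigr),
\end{aligned}$$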



if the variates follow normal distributions, and thus $\sigma_{\chi^2}^2$ lies between $2n$ and $2n^2$. This at any rate would present no difficulty arising from the existence of negative correlations. We see, however, from this result that possibly the best measure, $u$, of the total correlativity in a system would be simply to take

$$u = \frac{2S'\bigl(r_{ss'}^2\bigr)}{n(n-1)},$$

for in this case $u$ will always lie between 0 and 1, the former value corresponding to no association in the variates of the system, and the latter to perfect correlation of all of them.
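The footnote's result can be checked numerically. The Python sketch below rests on several assumptions: the variance of $\chi^2$ is read as $2n + 4S'(r_{ss'}^2)$, with $S'$ a sum over the $n(n-1)/2$ distinct pairs of variates; $u$ is read as $2S'(r_{ss'}^2)/n(n-1)$, a reading chosen because it gives the stated limits 0 and 1; and the simulation uses a hypothetical equicorrelated normal system built from one common factor, with non-negative `rho`. All function names are illustrative.

```python
import random
import statistics
from itertools import combinations


def sum_squared_correlations(corr):
    """S'(r^2): sum of squared correlations over all distinct pairs s < s'."""
    n = len(corr)
    return sum(corr[s][t] ** 2 for s, t in combinations(range(n), 2))


def chi2_variance(corr):
    """Variance of chi^2 for normal variates, read as 2n + 4 S'(r^2)."""
    return 2 * len(corr) + 4 * sum_squared_correlations(corr)


def total_correlativity(corr):
    """Assumed reading of u: 2 S'(r^2) / (n (n - 1)), the mean squared correlation."""
    n = len(corr)
    return 2 * sum_squared_correlations(corr) / (n * (n - 1))


def equicorrelated(n, rho):
    """Correlation matrix in which every pair of variates has correlation rho."""
    return [[1.0 if s == t else rho for t in range(n)] for s in range(n)]


def simulate_chi2_variance(n, rho, draws=100_000, seed=56):
    """Monte Carlo variance of chi^2 for n standard normals with common correlation rho >= 0."""
    rng = random.Random(seed)
    a, b = rho ** 0.5, (1.0 - rho) ** 0.5
    totals = []
    for _ in range(draws):
        common = rng.gauss(0.0, 1.0)  # shared factor inducing the correlation
        totals.append(sum((a * common + b * rng.gauss(0.0, 1.0)) ** 2 for _ in range(n)))
    return statistics.pvariance(totals)


corr = equicorrelated(4, 0.5)
predicted = chi2_variance(corr)
simulated = simulate_chi2_variance(4, 0.5)

print(f"predicted variance of chi^2: {predicted:.2f}")
print(f"simulated variance of chi^2: {simulated:.2f}")
print(f"u = {total_correlativity(corr):.3f}")
```

With all correlations zero the formula gives $2n$ and $u = 0$; with all correlations unity it gives $2n^2$ and $u = 1$, matching the limits stated in the footnote.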



* Galton has interchanged his x and y variates. The paper shows here as elsewhere signs of 

 haste in preparation. 






