Correlation and Application of Statistics to Problems of Heredity 5 



the vast system of factorial genetics which has arisen from Mendel's peas — 

 and this even in the theory of heredity. We see now what Galton might 

 have done, he might have provided us with data to check Johannsen's later

 bean-weight experiments, he might have thrown light on the "pure line." 

 He might possibly have reached the correlation coefficient instead of the

 regression slope in his first attempt to get a measure of correlation. Whatever

 he might have done, he reached the idea of regression before he reached that 

 of the coefficient of correlation. As long as he was dealing with heredity in 

 the same sex, the approximate equality of variabilities in the two genera- 

 tions preserved him from any great error. 



Galton was driven to his second problem by Bertillon's system for the 

 identification of criminals. Bertillon claimed, as I remember Dr Garson did 

 at a much later date, that the measurements chosen were practically inde- 

 pendent. Galton needed a criterion to show whether such measurements as 

 head length, foot length, stature, etc. were or were not associated. He saw that 

 the problem closely resembled that of heredity, but he was troubled by the 

 fact that the slope of his regression line depended on the units in which its 

 two component variables were measured. It was not till more than 13 years* 

 after his first attack on the subject that Galton realised, namely in 1889 during 

 a walk in Naworth Park, that the two problems were identical, provided 

 each character were measured in its own variability as unit (see our Vol. II, 

 p. 393). With that provision the slope of the regression line becomes what 

 we now term the coefficient of correlation. It is needful to realise this 

 history of Galton's progress: namely that he reached regression and even

 the constancy of the array variabilities 12 to 14 years before he formulated 

 his coefficient of correlation, in order to understand fully the sequence of 

 his memoirs on this topic. 
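
The point about units may be checked numerically. The following is a minimal sketch only: the figures are invented for illustration and are not Galton's data, and the helper names (`slope`, `standardise`, `correlation`) are hypothetical. It shows that the regression slope alters when one character is re-expressed in other units, while the slope found after each character is measured in its own variability as unit agrees with the coefficient of correlation.

```python
from statistics import mean, pstdev


def slope(xs, ys):
    """Least-squares slope of the regression of ys on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def standardise(vs):
    """Express each value as a deviation from the mean in units of the
    (population) standard deviation, i.e. 'its own variability as unit'."""
    m, s = mean(vs), pstdev(vs)
    return [(v - m) / s for v in vs]


def correlation(xs, ys):
    """Product-moment coefficient of correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Invented illustrative measurements (assumption: NOT Galton's data).
stature_in = [64.0, 66.0, 67.0, 68.0, 70.0, 72.0]
foot_in = [9.4, 9.9, 9.8, 10.3, 10.4, 11.0]

# The same foot lengths expressed in centimetres instead of inches.
foot_cm = [2.54 * f for f in foot_in]

b_in = slope(stature_in, foot_in)    # slope with foot length in inches
b_cm = slope(stature_in, foot_cm)    # slope with foot length in centimetres
b_std = slope(standardise(stature_in), standardise(foot_in))
b_std_rev = slope(standardise(foot_in), standardise(stature_in))
r = correlation(stature_in, foot_in)

print("slope, foot in inches:", b_in)
print("slope, foot in cm:    ", b_cm)
print("standardised slope:   ", b_std)
print("correlation r:        ", r)
```

In this sketch the two raw slopes differ by the factor 2.54, whereas the standardised slope is the same whichever character is taken as the independent one, and it equals r.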



One further fact it is necessary to bear in mind in order to measure his 

 achievements. He started like Quetelet from the normal curve as describing 

 the deviations of a population or of any selected population, e.g. that of an 

 array of offspring from a parent of given character. He did not start with a

 general definition of correlation and see whither that would lead him. His 

 justification was that he was dealing with anthropometric characters or 

 measurements on living forms whose deviations from type approximately 

 followed this special law of distribution. Thus he naturally reached a straight 

 regression line, and the constant variability for all arrays of one character 

 for a given value of a second†. It was, perhaps, best for the progress of the

 correlational calculus that this simple special case should be promulgated 

 first; it is so easily grasped by the beginner. But it has had the disadvan- 

 tage that certain branches of science, as psychology for example, have rarely 

 got further, and, without taking the trouble to apply tests, adopt linear 



* In his Natural Inheritance, 1889, p. 79, Galton says his sweet-pea data were collected

 more than 10 years previously. His lecture at the Royal Institution, Feb. 1877, shows that he 

 was then already in possession of sweet-pea data, and the first measurements seem to have been 

 made in 1875. 



† What we now term "homoscedasticity."



