October 22, 1915]                SCIENCE                575



SPECIAL ARTICLES 



ON THE COEFFICIENT OF CORRELATION AS A MEASURE OF RELATIONSHIP



The theory of correlation deals with the relationship between variable quantities in the case where that relationship lies somewhere between functional dependence and complete independence. In the case of normal correlation for two variables a certain quantity r, which is zero for complete independence and ±1 for functional dependence, plays an important role. The formula for r, in terms of n observed pairs of values of two variables x and y, is



(1)    r = \frac{\sum (x_i - x_0)(y_i - y_0)}{\sqrt{\sum (x_i - x_0)^2 \cdot \sum (y_i - y_0)^2}}



where x_0 is the mean of the x-values and y_0 the mean of the y-values.¹ This formula has also been given an interpretation for the case of skew correlation² which makes r an important quantity in many instances of such correlation.
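In modern terms, formula (1) is straightforward to compute from observed pairs. The following sketch (function name and sample values are illustrative, not from the original) confirms the statement above that r equals ±1 under exact linear dependence:

```python
import math

def correlation_coefficient(xs, ys):
    # Formula (1): r = sum((x_i - x0)(y_i - y0))
    #                  / sqrt(sum((x_i - x0)^2) * sum((y_i - y0)^2))
    n = len(xs)
    x0 = sum(xs) / n  # mean of the x-values
    y0 = sum(ys) / n  # mean of the y-values
    num = sum((x - x0) * (y - y0) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x0) ** 2 for x in xs)
                    * sum((y - y0) ** 2 for y in ys))
    return num / den

# Exact (linear) functional dependence gives r = 1;
# reversing the slope's sign gives r = -1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
print(correlation_coefficient(xs, [2 * x + 1 for x in xs]))   # 1.0
print(correlation_coefficient(xs, [-2 * x + 1 for x in xs]))  # -1.0
```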

The quantity r is usually termed the coefficient of correlation and is said to measure the amount of correlation between the variables x and y. This latter statement is too vague as it stands for scientific procedure, so it is desirable to state more precisely what is meant by it. In the case of normal correlation r has been shown to have the following significance:³ if we take the mean of all the y's corresponding to a given value of x, then the deviation of this mean from the mean of all the y's, divided by the standard deviation of the y's, is equal to r times the deviation of the given x-value from the mean of all the x's, divided by the standard deviation of the x's. Thus r may be said to measure the tendency of a given deviation from the mean in one of the variables to be associated with an average deviation from the mean of corresponding magnitude in the other variable.

¹ Cf. Pearson, "Regression, Heredity and Panmixia," Philosophical Transactions of the Royal Society, 187 A (1896); also Bravais, "Analyse mathématique sur les probabilités des erreurs de situation d'un point," Académie des Sciences: Mémoires présentés par divers savants, Ser. 2, Vol. 9 (1846).

² Cf. Yule, "On the Significance of Bravais's Formulae for Regression, etc., in the Case of Skew Correlation," Proceedings of the Royal Society, Vol. 60 (1897).

³ Cf. Pearson, l. c.
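The interpretation just described can be illustrated by a small simulation (the names and the chosen value r = 0.6 are hypothetical, not from the original): with both variables standardized to zero mean and unit standard deviation, the mean of the y's corresponding to a given x should equal r times that x.

```python
import random
import statistics

random.seed(42)
r_true = 0.6       # hypothetical correlation coefficient
n = 200_000

# Generate normally correlated pairs via y = r*x + sqrt(1 - r^2)*e,
# with x and e independent standard normal variables; then x and y
# each have zero mean, unit variance, and correlation exactly r_true.
pairs = []
for _ in range(n):
    x = random.gauss(0.0, 1.0)
    e = random.gauss(0.0, 1.0)
    pairs.append((x, r_true * x + (1.0 - r_true**2) ** 0.5 * e))

# Mean of all the y's whose x falls near a given value x_star:
# since both variables are standardized, the text's statement reduces
# to mean(y | x = x_star) = r * x_star.
x_star = 1.0
mean_y_near = statistics.mean(y for x, y in pairs if abs(x - x_star) < 0.05)
print(mean_y_near)  # close to r_true * x_star = 0.6
```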



It is clear that the value of r throws much light on the relationship between two variable quantities in the case of normal correlation. It is not apparent, however, that it gives us in every instance the information we are most interested in obtaining, and it will be shown in what follows that in certain cases of interest in the applications of the theory of correlation it will not necessarily give it.



The formula (1) is well adapted to the computation of r from observed values of x and y. For our purposes, however, we need a formula which exhibits r as a function of the underlying variable quantities that determine x and y and the relationship between them. We shall now proceed to obtain such a formula on the basis of assumptions similar to those that Pearson used in his derivation of (1).⁴



Let

(2)    x = f_1(e_1, e_2, \ldots, e_m), \qquad y = f_2(e_1, e_2, \ldots, e_m),

where the e's are independent variables that follow a Gaussian distribution, and the f's are analytic functions. If we expand the right-hand members of (2) about the mean values of the e's and neglect higher powers than the first,⁵ we have

(3)    x - x_0 = a_{11}\eta_1 + a_{12}\eta_2 + \cdots + a_{1m}\eta_m,
       y - y_0 = a_{21}\eta_1 + a_{22}\eta_2 + \cdots + a_{2m}\eta_m,

where the η's are deviations of the e's from their mean values and x_0 and y_0 are mean values of x and y, respectively.
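The first-order expansion leading to (3) can be checked numerically. In the sketch below the functions f_1 and f_2, the mean values, and the deviations are all invented purely for illustration; the partial derivatives evaluated at the means play the role of the coefficients a_{11}, a_{12}, and so on.

```python
# Hypothetical analytic functions of two independent variables e1, e2,
# standing in for the f's of equation (2).
def f1(e1, e2):
    return e1 * e2 + e1

def f2(e1, e2):
    return e1 ** 2 + 3.0 * e2

m1, m2 = 2.0, 5.0                # assumed mean values of e1 and e2
x0, y0 = f1(m1, m2), f2(m1, m2)  # mean values of x and y

# Partial derivatives at the means supply the coefficients of (3):
a11, a12 = m2 + 1.0, m1          # df1/de1, df1/de2 at (m1, m2)
a21, a22 = 2.0 * m1, 3.0         # df2/de1, df2/de2 at (m1, m2)

# For small deviations eta_i of the e's from their means, (3) asserts
#   x - x0 ~ a11*eta1 + a12*eta2,   y - y0 ~ a21*eta1 + a22*eta2.
eta1, eta2 = 0.01, -0.02
x = f1(m1 + eta1, m2 + eta2)
y = f2(m1 + eta1, m2 + eta2)
print(x - x0, a11 * eta1 + a12 * eta2)  # nearly equal
print(y - y0, a21 * eta1 + a22 * eta2)  # nearly equal
```

The residual differences are of second order in the η's, which is exactly what the neglect of higher powers assumes.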



Since the e's are independent variables following a Gaussian distribution, we have



⁴ L. c.



⁵ Pearson assumes that the variations of the e's from their mean values are small in comparison with those values, in order to justify the dropping of higher powers. It is more general to assume merely that for the range of values of the e's considered, the f's are sufficiently good approximations to linear functions to warrant the neglect of higher powers.



