


Journal of Agricultural Research, Vol. XX, No. ;



+1 or −1. For many purposes it is enough to look on it as giving an arbitrary scale between +1 for perfect positive correlation, 0 for no correlation, and −1 for perfect negative correlation.



The correlation ratio (η) equals the coefficient of correlation if the relation between the variables is exactly linear. It does not, however, depend on the assumption of such a relation, and it is always larger than r when the relations are not exactly linear. It can only take values between 0 and +1, and it can be looked upon as giving an arbitrary scale between 0 for no correlation and 1 for perfect correlation.
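The contrast between the two measures can be illustrated numerically. The sketch below (in Python, with illustrative data not taken from this paper) uses an exactly quadratic relation: the coefficient of correlation vanishes while the correlation ratio is 1.

```python
# Sketch: Pearson's r vs. the correlation ratio eta for an exactly
# quadratic (hence nonlinear) relation.  Data are illustrative.

x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]   # Y is completely determined by X, but not linearly

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / (sxx * syy) ** 0.5

def correlation_ratio(xs, ys):
    # eta^2 = (variance of the Y class means for each X) / (total Y variance)
    n = len(ys)
    my = sum(ys) / n
    classes = {}
    for a, b in zip(xs, ys):
        classes.setdefault(a, []).append(b)
    between = sum(len(v) * (sum(v) / len(v) - my) ** 2
                  for v in classes.values())
    total = sum((b - my) ** 2 for b in ys)
    return (between / total) ** 0.5

# Here r = 0 (the relation is symmetric about x = 0), while eta = 1
# (Y is fully determined within each X class).
```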



The numerical value of the coefficient of correlation (r) takes on added significance in connection with the idea of regression. It gives the average deviation of either variable from its mean value corresponding to a given deviation of the other variable, provided that the standard deviation is the unit of measurement in both cases. The regression in terms of the actual units can, of course, be obtained by multiplying by the ratio of the standard deviations. Thus, for the deviation of X corresponding to a unit deviation of Y, we have reg_{X·Y} = r_{XY} (σ_X / σ_Y). This formula may



be deduced from the theory of least squares as the best linear expression 

 for X in terms of Y. The formula for what Galton later called the coeffi- 

 cient of correlation was, in fact, first presented in this connection by 

Bravais (1) in 1846. Any such interpretation is of course impossible

 with the correlation ratio. 
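The identity between the least-squares slope and r times the ratio of standard deviations can be checked directly. A minimal sketch in Python (data and names are illustrative assumptions, not from the paper):

```python
# Sketch: the least-squares slope of X on Y equals r_XY * sigma_X / sigma_Y.
# Data below are illustrative.

def mean(v):
    return sum(v) / len(v)

def slope_x_on_y(xs, ys):
    """Least-squares slope b in the fit X = a + b*Y."""
    mx, my = mean(xs), mean(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((y - my) ** 2 for y in ys))

xs = [1.0, 2.0, 4.0, 5.0, 8.0]
ys = [2.0, 1.0, 5.0, 4.0, 9.0]

n = len(xs)
mx, my = mean(xs), mean(ys)
sigma_x = (sum((x - mx) ** 2 for x in xs) / n) ** 0.5
sigma_y = (sum((y - my) ** 2 for y in ys) / n) ** 0.5
r_xy = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        / (n * sigma_x * sigma_y))

# The two expressions agree: slope_x_on_y(xs, ys) == r_xy * sigma_x / sigma_y
```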



The numerical values of both coefficients, however, have significance in another way. Their squares (η², or r² if regression is linear) measure the portion of the variability of one of the variables which is determined by the other and which disappears in data in which the second is constant. Thus if _Y σ²_X is the mean square deviation of X for constant Y, Pearson



has shown that: 



_Y σ²_X = σ²_X (1 − η²_{X·Y})

or _Y σ²_X = σ²_X (1 − r²_{XY}) if regression is linear.
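For the linear case this decomposition can be verified directly: the mean square deviation of X about its regression line on Y equals σ²_X (1 − r²). A sketch with assumed data:

```python
# Sketch: the mean square of X about its linear regression on Y equals
# sigma_X^2 * (1 - r_XY^2).  Data are illustrative.

xs = [2.0, 3.0, 5.0, 6.0, 9.0]
ys = [1.0, 4.0, 4.0, 7.0, 9.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

sxx = sum((x - mx) ** 2 for x in xs) / n        # sigma_X^2
syy = sum((y - my) ** 2 for y in ys) / n        # sigma_Y^2
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
r = sxy / (sxx * syy) ** 0.5

b = sxy / syy                                    # regression slope of X on Y
resid_msq = sum((x - mx - b * (y - my)) ** 2
                for x, y in zip(xs, ys)) / n

# resid_msq equals sxx * (1 - r**2), the variance of X left over when
# the part determined linearly by Y is removed.
```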



It often happens that it is desirable to consider simultaneously the 

 relations in a system of more than two variables. For such cases, involv- 

ing only linear relations between the various pairs of variables, Pearson (6)

 has devised the coefficient of multiple correlation. 



R_{X(abc···n)} = √(1 − Δ/Δ_{XX})

in which

Δ =



1 




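For three variables the determinantal formula can be sketched concretely; the value it yields coincides with the simple correlation between X and its best linear estimate from the other variables. The sketch below is an assumed illustration (variable names and data are not from the paper), using the standard form R² = 1 − Δ/Δ_{XX}, with Δ the determinant of the full correlation matrix and Δ_{XX} the cofactor of the X diagonal element.

```python
# Sketch: Pearson's multiple correlation R_{X(ab)} from the determinant of
# the correlation matrix, cross-checked against corr(X, fitted X).
# Data and names are illustrative.

def corr(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

x = [2.0, 4.0, 5.0, 7.0, 9.0]
a = [1.0, 2.0, 2.0, 4.0, 5.0]
b = [0.0, 1.0, 3.0, 2.0, 4.0]

r_xa, r_xb, r_ab = corr(x, a), corr(x, b), corr(a, b)

# Determinant of the full correlation matrix, and the cofactor of the
# X,X element (X's row and column struck out).
delta = det3([[1.0, r_xa, r_xb],
              [r_xa, 1.0, r_ab],
              [r_xb, r_ab, 1.0]])
delta_xx = 1.0 - r_ab ** 2
R = (1.0 - delta / delta_xx) ** 0.5

# Cross-check: fit X on a and b by least squares (normal equations) and
# correlate X with the fitted values; that correlation equals R.
n = len(x)
mx, ma, mb = sum(x) / n, sum(a) / n, sum(b) / n
Saa = sum((v - ma) ** 2 for v in a)
Sbb = sum((v - mb) ** 2 for v in b)
Sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
Sxa = sum((u - mx) * (v - ma) for u, v in zip(x, a))
Sxb = sum((u - mx) * (v - mb) for u, v in zip(x, b))
d = Saa * Sbb - Sab ** 2
p = (Sxa * Sbb - Sxb * Sab) / d
q = (Sxb * Saa - Sxa * Sab) / d
x_hat = [mx + p * (u - ma) + q * (v - mb) for u, v in zip(a, b)]
```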