Sec. 7.4 COEFFICIENTS OF LINEAR CORRELATION 217 



Therefore, it is clear that the observed variability of the Y's about the regression line will be large or small according to the size of $[\sum(xy)]^2 \div \sum(x^2)$ compared with the size of $\sum(y^2)$. If $[\sum(xy)]^2/\sum(x^2)$ is multiplied by $\sum(y^2)/\sum(y^2)$, which equals 1 and only changes the form of the quantity by which it is multiplied, it follows from (7.41) that the $\sum(Y - \hat{Y})^2$ can be expressed as follows:



$$\sum(Y - \hat{Y})^2 = \sum(y^2)\left[1 - \frac{[\sum(xy)]^2}{\sum(x^2)\cdot\sum(y^2)}\right].$$
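The factored form can be checked numerically. The following is a minimal sketch, assuming a small made-up sample and the usual least-squares slope $b = \sum(xy)/\sum(x^2)$, with lowercase $x$ and $y$ denoting deviations from the sample means; the data and the function names are illustrative and are not taken from the text.

```python
# Hypothetical sample; the values are illustrative only.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 3.9, 6.2, 7.8, 10.1]


def deviation_sums(X, Y):
    """Return sum(x^2), sum(y^2), sum(xy), where x and y are deviations from the means."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    x = [xi - x_bar for xi in X]
    y = [yi - y_bar for yi in Y]
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    return sxx, syy, sxy


def residual_sum_of_squares(X, Y):
    """Compute sum (Y - Y_hat)^2 directly from the fitted least-squares line."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sxx, _, sxy = deviation_sums(X, Y)
    b = sxy / sxx  # least-squares slope
    return sum((yi - (y_bar + b * (xi - x_bar))) ** 2 for xi, yi in zip(X, Y))


sxx, syy, sxy = deviation_sums(X, Y)
direct = residual_sum_of_squares(X, Y)
factored = syy * (1.0 - sxy ** 2 / (sxx * syy))

# The two values should agree up to floating-point rounding.
print(direct, factored)
```

Computing the residual sum of squares both ways keeps the check independent of the algebra: `direct` uses only the fitted line, while `factored` uses only the three sums of squares and products.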



Clearly, the quantity $\dfrac{[\sum(xy)]^2}{\sum(x^2)\cdot\sum(y^2)}$ has the following statistical features:



(a) Its value cannot be less than zero nor more than one because it 

 is essentially zero or positive, and if it exceeded one the sum of 

 squares of deviations from the trend line would be negative, which is 

 absurd. 



(b) If this quantity is near zero there is about as much scatter of the sample points about the trend line as about the horizontal line $Y = \bar{y}$; hence there is little or no linear trend.



(c) If this quantity is near one there is very little scatter about 

 the regression line; hence the sample points lie quite close to that line. 



(d) As the size of this quantity varies, for different samples, from zero to one, the scatter of the sample points about the least-squares regression line varies from a completely trendless, shotgun pattern to a perfect fit to a linear trend line.



(e) This quantity is unitless so that the features noted above are true regardless of the units in which Y and X are measured.



(f) In its present form this quantity cannot distinguish between

 positive and negative slopes of trend lines, but its square root would 

 have the same sign as b and would make this distinction if the square 

 root of the denominator were always taken as positive. 
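Several of the listed features can be illustrated on a small example. The sketch below assumes made-up data and hypothetical function names (`trend_quantity`, `signed_root`); it computes the quantity for an upward and a downward trend, rescales both variables to mimic a change of units, and attaches the sign of the slope $b$ to the square root. It also assumes that neither variable is constant, since otherwise a denominator would be zero.

```python
import math


def trend_quantity(X, Y):
    """Return ([sum(xy)]^2 / (sum(x^2) * sum(y^2)), b), using deviations from the means."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in X)
    syy = sum((yi - y_bar) ** 2 for yi in Y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(X, Y))
    return sxy ** 2 / (sxx * syy), sxy / sxx


def signed_root(X, Y):
    """Square root of the quantity, carrying the sign of the slope b."""
    q, b = trend_quantity(X, Y)
    return math.copysign(math.sqrt(q), b)


# Hypothetical data: the same Y values in rising and falling order.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y_up = [2.0, 4.0, 5.0, 4.0, 5.0]
Y_down = list(reversed(Y_up))

q_up, b_up = trend_quantity(X, Y_up)
q_down, b_down = trend_quantity(X, Y_down)

# Change of units: rescale X and Y by arbitrary positive factors.
X_scaled = [2.54 * v for v in X]
Y_scaled = [1000.0 * v for v in Y_up]
q_scaled, _ = trend_quantity(X_scaled, Y_scaled)

print(q_up, q_down, q_scaled)  # the same value, up to rounding
print(signed_root(X, Y_up), signed_root(X, Y_down))  # opposite signs
```

Here the quantity by itself is identical for the rising and falling samples, which is feature (f); only the signed root separates the two directions.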



It follows then that the square root noted in (f),

$$r = \frac{\sum(xy)}{\sqrt{\sum(x^2)\cdot\sum(y^2)}}, \tag{7.42}$$

is a unitless number within the range $-1 \le r \le +1$ which indicates the direction and strength of the observed linear trend. This number, r, is called the product-moment coefficient of linear correlation between two measurements X and Y. It obviously is subject to sampling variations and therefore has a sampling distribution. It is a sampling estimate of a corresponding population parameter indicated



