THE CORRELATIONS OF AREAS OF MATURED CROP AND THE RAINFALL. 375 



The question is one of some difficulty.



Let $x_m$, $y_m$ be the co-ordinates of any character, as here of rainfall and crop respectively.



Let $N_m$ be the total frequency of the occurrence of the character $x_m$, that is, in a unit interval enclosing $x_m$. Here $x_m$, $y_m$ are the true values of the characters. Suppose, then, that $x_m$ (rainfall) is correctly measured, but that $y_m$ (crop) is incorrectly measured as $k_n y_m$. It is reasonable to assume that in measurements of the cropped area, the error is proportional to the area measured, but that the multiplying factor has a random distribution, so that if $y_m$ is constant the erroneously measured $y_m$'s have the same mean value as $y_m$, but are 'normally' distributed about this mean.
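The error model just described can be sketched numerically. The following is only an illustration under stated assumptions: the multiplying factor is taken to be normal with mean 1 and standard deviation `k`, and the function names, the true area of 50 and the value `k = 0.1` are all hypothetical, not taken from the paper.

```python
import random
import statistics


def measure_crop(y_true, k, rng):
    """Return one erroneous measurement of a true cropped area y_true.

    Assumption: the random multiplying factor is normal with mean 1 and
    standard deviation k, so the measurement has mean y_true and
    standard deviation k * y_true.
    """
    factor = rng.gauss(1.0, k)
    return factor * y_true


def simulate_array(y_true, k, n, seed=0):
    """Measure the same true area n times with a seeded generator."""
    rng = random.Random(seed)
    return [measure_crop(y_true, k, rng) for _ in range(n)]


if __name__ == "__main__":
    sample = simulate_array(y_true=50.0, k=0.1, n=20000)
    # The sample mean should lie near 50 and the sample s.d. near 5.
    print(statistics.fmean(sample), statistics.pstdev(sample))
```

With a large sample the mean of the measured values stays close to the true area, while their spread grows in proportion to it, which is the property the argument relies on.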



Then the errors in measurement will alter the distribution of each array corresponding to a given $x_m$, but $N_m$ will remain unaltered.



To make the process clearer consider first the case of perfect linear correlation.

[Fig.]

Then let P be a point on the regression line, and let $x_m$ be its abscissa. Then the whole array is concentrated at P; thus there are $N_m$ individuals with characters $x_m$, $y_m$, say, at P. Now when the erroneous process of measurement of the kind supposed is applied to these $N_m$ individuals, the $y_m$ character will become distributed in an array about P with a standard deviation about P of $k y_m$, P remaining the centroid of the array, and $k$ is to be a constant for all the arrays. That is to say, the new distribution, that is the distribution from which, in practice, the correlation and regression coefficients are estimated, becomes such that the regression remains linear but is not 'normal,' and the correlation is reduced below unity.
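A small simulation may make the effect concrete. It is a sketch under assumptions that are not in the paper: the line $y = 2 + 3x$, twenty arrays of 500 individuals each, $k = 0.2$, a normal multiplying factor with mean 1, and all function names are hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Product-moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def regression_slope(xs, ys):
    """Least-squares slope of the regression of y on x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def simulate(a=2.0, b=3.0, k=0.2, n_per_array=500, seed=0):
    """Build perfectly correlated data, then apply the proportional error."""
    rng = random.Random(seed)
    xs, ys_true, ys_measured = [], [], []
    for x in range(1, 21):              # twenty arrays, x = 1 .. 20
        y = a + b * x                   # true value on the regression line
        for _ in range(n_per_array):    # N_m individuals concentrated at P
            xs.append(float(x))
            ys_true.append(y)
            ys_measured.append(y * rng.gauss(1.0, k))
    return xs, ys_true, ys_measured


if __name__ == "__main__":
    xs, ys_true, ys_measured = simulate()
    print("r (true values):    ", pearson_r(xs, ys_true))
    print("r (measured values):", pearson_r(xs, ys_measured))
    print("slope (measured):   ", regression_slope(xs, ys_measured))
```

In runs of this kind the fitted slope stays near the true slope, because each array keeps its centroid, while the correlation of the measured values falls below unity.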



Put $\sigma_{y_m}$ for the old standard deviation of the $m$th array, which in this case is zero, and $\sigma'_{y_m}$ for the new standard deviation, which is equal to $k y_m$. Let $\sigma_y$ be the old standard deviation for the whole system about an axis parallel to $Ox$ and $\sigma'_y$ the new standard deviation.



And let N be the total frequency. Then we have at once 



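$$N(\sigma_y^{\prime 2} + \bar{y}^2) = S\{N_m (y_m^2 + \sigma_{y_m}^{\prime 2})\}$$

where $S$ denotes a summation, and $\bar{y}$ is the ordinate of the centroid of the whole system.

$$= S\{N_m y_m^2 (1 + k^2)\}$$

$$= (1 + k^2)\, N (\bar{y}^2 + \sigma_y^2).$$

Thus $$\sigma_y^{\prime 2} = \sigma_y^2 + k^2 (\bar{y}^2 + \sigma_y^2).$$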
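The result can be checked arithmetically on a small made-up system. The sketch below assumes the case of perfect correlation treated here (old array standard deviations zero, new ones equal to $k y_m$); the frequencies, ordinates, value of $k$ and function names are hypothetical.

```python
import math


def new_variance_direct(freqs, y_values, k):
    """New variance from the second moment about Ox.

    Each array contributes N_m * (y_m^2 + (k * y_m)^2); the centroid
    ordinate y_bar is unchanged by the errors.
    """
    n = sum(freqs)
    y_bar = sum(nm * ym for nm, ym in zip(freqs, y_values)) / n
    second_moment = sum(
        nm * (ym ** 2 + (k * ym) ** 2) for nm, ym in zip(freqs, y_values)
    ) / n
    return second_moment - y_bar ** 2


def new_variance_formula(freqs, y_values, k):
    """New variance from sigma_y^2 + k^2 * (y_bar^2 + sigma_y^2)."""
    n = sum(freqs)
    y_bar = sum(nm * ym for nm, ym in zip(freqs, y_values)) / n
    old_var = sum(nm * ym ** 2 for nm, ym in zip(freqs, y_values)) / n - y_bar ** 2
    return old_var + k ** 2 * (y_bar ** 2 + old_var)


if __name__ == "__main__":
    freqs = [3, 5, 2]             # N_m, hypothetical
    y_values = [10.0, 20.0, 40.0]  # y_m, hypothetical
    k = 0.1
    print(new_variance_direct(freqs, y_values, k))
    print(new_variance_formula(freqs, y_values, k))
```

For these figures $\bar{y} = 21$ and $\sigma_y^2 = 109$, so both routes give $109 + 0.01 \times 550 = 114.5$ up to rounding.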



Now let $\Pi$, $\Pi'$ denote the old and new product moments of the distribution about the axes $Ox$, $Oy$. Then, clearly, $\Pi = \Pi'$, and therefore the product moment about the centroid is unchanged. Hence if $r'$ is the new (erroneous) coefficient of correlation



