4 KARL PEARSON 



really a random sample from material in which the two variates are independent. 

 Tables for finding P from χ² have been calculated by Mr Palin Elderton, the well-known actuary*.
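As a modern stand-in for looking P up in such tables, the following is a minimal sketch of the upper-tail probability of χ² for a whole number of degrees of freedom, using the finite series that hold for even and odd cases. The function name `chi2_upper_tail` and the parameter `df` are illustrative assumptions, not notation from this paper; the passage does not say how the number of degrees of freedom is to be chosen for a given contingency table, so that choice is left to the caller.

```python
import math


def chi2_upper_tail(chi2, df):
    """Probability P that a chi-square variate with `df` degrees of freedom
    exceeds `chi2`. `df` must be a positive integer (an assumption of this
    sketch; fractional degrees of freedom are not handled)."""
    if chi2 < 0 or df < 1 or int(df) != df:
        raise ValueError("chi2 must be >= 0 and df a positive integer")
    df = int(df)
    half = chi2 / 2.0
    if df % 2 == 0:
        # Even df = 2m: P = exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!
        term = 1.0
        total = 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df = 2m + 1:
    # P = erfc(sqrt(x/2))
    #     + sqrt(2/pi) * exp(-x/2) * sum_{k=1}^{m} x^(k - 1/2) / (1*3*...*(2k-1))
    total = 0.0
    term = math.sqrt(chi2)
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= chi2 / (2 * k - 1)
        total += term
    return (math.erfc(math.sqrt(half))
            + math.sqrt(2.0 / math.pi) * math.exp(-half) * total)
```

A call such as `chi2_upper_tail(7.815, 3)` returns the chance of a χ² at least that large arising under independence; 1 − P is then the improbability of independence in the sense used here.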



By determining the value of P, we are always in a position to ascertain 

 the improbability of independence, or 1 − P is a proper measure of the grade of

 relationship. Unfortunately we do not think in millions, and to say that P = 718/10*" 

 gives us a very poor mental estimate of the interrelationship of two quantities, 

 compared with the simple statement that their coefficient of correlation is .60. We

 do not think on such an extended scale of figures as the improbability scale provides 

 us with, and we are bound to ask ourselves whether it is not possible to translate it 

 into the simpler ideas of correlation. We might ask : what is the probability P' that 

 in a population of N individuals an observed correlation r has arisen not from real 

 association but from random sampling? We should then reduce our correlation to 

 a probability scale. There is no difficulty about such a process at all; it depends

 solely on the distribution of frequency of r in random sampling. By simply equating 

 the above value of P to P', we could then determine on a correlation scale, that

 is, on a scale readily appreciable, the improbability of a given deviation being due to

 random sampling and not to true association. We should say it is as unreasonable 

 (or as reasonable) to suppose this contingency has arisen by random sampling in a 

 population of N individuals as to suppose that a correlation coefficient of magnitude 

 r could arise solely from random sampling. Thus r would not be used in any way 

 to represent features of linear or other regression lines, but solely as an artifice for 

 transferring to an adequate mental scale improbabilities often sensible only in the 

 30th or 40th decimal place. 
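The translation of P into an equivalent correlation can be sketched as follows. This is only an illustration under strong assumptions that are not fixed by the text: r in samples from unassociated material is taken to be normally distributed about zero with standard deviation 1/√N, and P' is taken as the two-sided chance of a deviation at least as large as r. The names `two_sided_tail` and `equivalent_r` are hypothetical. A different method of determining r would call for a different standard deviation, and hence a different scale.

```python
import math


def two_sided_tail(z):
    """Chance that a standard normal deviate exceeds z in absolute value."""
    return math.erfc(z / math.sqrt(2.0))


def equivalent_r(p, n):
    """Correlation r whose two-sided sampling probability P' equals p,
    ASSUMING r is normal about 0 with standard deviation 1/sqrt(n).
    A result of 1 or more signals that this approximation has broken down."""
    z_max = 37.0  # beyond this the tail is too small for this sketch
    if n < 1 or not (two_sided_tail(z_max) <= p <= 1.0):
        raise ValueError("p out of range for this sketch, or n < 1")
    lo, hi = 0.0, z_max
    # Bisection: two_sided_tail is strictly decreasing in z.
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if two_sided_tail(mid) > p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / math.sqrt(n)
```

Used together with a value of P obtained from χ², `equivalent_r(P, N)` gives the size of correlation that would be equally improbable as a product of random sampling alone. Under these assumptions `equivalent_r(0.5, N)` reproduces the familiar probable error of about .67449/√N for zero real association.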



Now the improbability of r, arising from a random sampling of material having 

 its variates unassociated, depends on the size of the standard deviation of r, and this 

 size depends on the method by which r is determined. It is not the same when 

 found from (i) a product moment table assumed to represent a Gaussian frequency†,

 or (ii) from a fourfold table representing the same frequency divided at its means‡,

 or again (iii) from a fourfold table of Gaussian frequency divided very far from

 its means§, or lastly (iv) from a product moment table for a frequency which is very

 far from Gaussian‖. Hence to obtain a scale of correlations by which to represent

 contingency improbabilities, we must select the nature of the method by which r is 

 supposed to be reached as well as the size of the population. It will not do to 

 say that .67449 (1 − r²)/√N or, for zero real association, .67449/√N, is the probable

 error of r, because this probable error depends on the determination of r by a method 

 which is never applicable to a fourfold table. It seems needful to select our correlation

 scale to be such: (i) that the standard deviation of our correlation will vary



* Biometrika, Vol. I. p. 155.



† Pearson and Filon, Phil. Trans. Vol. 191, A, p. 242, 1898.

 ‡ Sheppard, Phil. Trans. Vol. 192, A, p. 148.

 § Pearson, Phil. Trans. Vol. 195, A, p. 14.



‖ Sheppard, Phil. Trans. Vol. 192, A, p. 128. See also Pearson, Drapers' Company Research Memoirs, II. p. 20.



