508 Distribution of the Correlation Coefficients of Samples
In the second of these two papers the more difficult problem of the frequency 
distribution of the correlation coefficient is attempted. For samples of 2 the 
frequency distribution between the only two possible values $-1$ and $+1$ was
determined by Sheppard's theorem to be in the ratio $\frac{\pi}{2} + \sin^{-1}\rho : \frac{\pi}{2} - \sin^{-1}\rho$,
where $\rho$ is the correlation of the population. Besides this theoretical result,
"Student" appeals only to experimental data. From these he derives an
empirical form for the distribution when $\rho = 0$, and makes several valuable
suggestions. It has been the greatest pleasure and interest to myself to observe
with what accuracy "Student's" insight has led him to the right conclusions.
The form when $\rho = 0$ is absolutely correct, and as a further instance I may quote
the remark* "I have dealt with the cases of samples of 2 at some length, because
it is possible that this limiting value of the distribution, with its mean of
$\frac{2}{\pi}\sin^{-1}\rho$ and its second moment coefficient of $1 - \left(\frac{2}{\pi}\sin^{-1}\rho\right)^2$, may furnish a clue
to the distribution when $n$ is greater than 2." As a matter of fact it is just these
quantities with which we shall be concerned.
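
The quantities quoted for samples of 2 can be checked numerically. The following is a minimal Python sketch, not part of the original paper: the function names are hypothetical, and the simulation assumes a normal population with unit variances and zero means (which does not affect $r$). It evaluates the two chances implied by the ratio above, the mean and the second moment coefficient, and compares the chance of $r = +1$ with a Monte Carlo estimate.

```python
import math
import random


def sample_of_two_distribution(rho):
    """Closed-form distribution of r for samples of 2, where r is -1 or +1."""
    p_plus = 0.5 + math.asin(rho) / math.pi   # chance that r = +1
    p_minus = 0.5 - math.asin(rho) / math.pi  # chance that r = -1
    mean = (2.0 / math.pi) * math.asin(rho)
    # Second moment about the mean; r squared is always 1 for samples of 2.
    second_moment = 1.0 - mean ** 2
    return p_minus, p_plus, mean, second_moment


def simulate_p_plus(rho, trials=20000, seed=0):
    """Monte Carlo estimate of the chance that r = +1 for samples of 2."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        pairs = []
        for _ in range(2):
            # Build a correlated normal pair from two independent deviates.
            u, v = rng.gauss(0, 1), rng.gauss(0, 1)
            pairs.append((u, rho * u + math.sqrt(1 - rho * rho) * v))
        (x1, y1), (x2, y2) = pairs
        # With two pairs, r = +1 exactly when x and y differences agree in sign.
        if (x1 - x2) * (y1 - y2) > 0:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print(sample_of_two_distribution(0.5))
    print(simulate_p_plus(0.5))
```

For $\rho = 0.5$ the closed form gives a chance of $2/3$ for $r = +1$, since $\sin^{-1} 0.5 = \pi/6$; the simulated proportion should lie close to this value.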
To Mr Soper's laborious and intricate paper I cannot hope to do justice.
I have been able to establish the substantial accuracy and value of his approximations.
It is one of the advantages of approaching a problem from opposite
standpoints that Mr Soper's forms are most accurate for those larger values of n,
where the exact formulae become most complicated.
2. The problem of the frequency distribution of the correlation coefficient r, 
derived from a sample of n pairs, taken at random from an infinite population, 
may be solved, when that population can be represented by a normal surface, 
with the aid of certain very general conceptions derived from the geometry of 
n dimensional space. In this paper the general form will first be demonstrated, 
and for a few important cases some of the successive moments will be derived. 
Incidentally it will be of interest to compare the exact form with Mr Soper's 
approximation, and with reference to the experimental data supplied by "Student." 
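
The sampling problem stated above can be illustrated by simulation. The sketch below is an assumption-laden illustration rather than the method of this paper: it draws repeated samples of $n$ pairs from a normal population of correlation $\rho$ (taken, without loss of generality for $r$, to have zero means and unit variances) and collects the resulting values of $r$. All function names are hypothetical.

```python
import math
import random


def sample_correlation(xs, ys):
    """Product-moment correlation coefficient r of paired observations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def draw_r(n, rho, rng):
    """Draw n pairs from a normal population of correlation rho; return r."""
    xs, ys = [], []
    for _ in range(n):
        u, v = rng.gauss(0, 1), rng.gauss(0, 1)
        xs.append(u)
        ys.append(rho * u + math.sqrt(1 - rho * rho) * v)
    return sample_correlation(xs, ys)


def simulate_r(n, rho, trials=5000, seed=0):
    """Empirical sampling distribution of r over repeated samples of size n."""
    rng = random.Random(seed)
    return [draw_r(n, rho, rng) for _ in range(trials)]


if __name__ == "__main__":
    rs = simulate_r(n=10, rho=0.6)
    print("mean of r:", sum(rs) / len(rs))
```

A histogram of the returned values gives an experimental counterpart to the exact frequency curve, in the spirit of the experimental data supplied by "Student."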
If the frequency distribution of the population be specified by the form 
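$$df = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{1}{1-\rho^2}\left\{\frac{(x-m_1)^2}{2\sigma_1^2} - \frac{2\rho\,(x-m_1)(y-m_2)}{2\sigma_1\sigma_2} + \frac{(y-m_2)^2}{2\sigma_2^2}\right\}\right] dx\,dy,$$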
where df is the chance that any observation should fall into the range dxdy, then 
the chance that n pairs should fall within their specified elements is 
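$$\frac{1}{\left(2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}\right)^n} \exp\left[-\frac{1}{1-\rho^2}\sum_{i=1}^{n}\left\{\frac{(x_i-m_1)^2}{2\sigma_1^2} - \frac{2\rho\,(x_i-m_1)(y_i-m_2)}{2\sigma_1\sigma_2} + \frac{(y_i-m_2)^2}{2\sigma_2^2}\right\}\right] dx_1\,dy_1 \ldots dx_n\,dy_n,$$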
and this we interpret as a simple density distribution in 2n dimensions, 
* Biometrika, Vol. vi. p. 304.
