ON THE PROBABLE ERROR OF THE CORRELATION 
COEFFICIENT TO A SECOND APPROXIMATION*. 
By H. E. SOPER, M.A. 
(1) It is very important in determining whether the coefficient of correlation 
as found by any particular method differs significantly from the calculated value to 
know not only its standard deviation but also to have some idea of the nature 
of the frequency distribution. When the numbers dealt with are large, then, 
provided r be not nearly ± 1, we may quite legitimately assume a normal 
distribution and calculate the frequency of r on this basis. But if n be small, 
or if r have a value near either end of the range, then the usual values for the 
S.D. of r are not applicable and, what is more, in the latter case the frequency of r 
is of a markedly skew character and differs widely from a Gaussian curve. In such 
case the value of r found from a single sample will most probably be neither the 
true r of the material nor the mean value of r as deduced from a large number 
of samples of the same size, but the modal value of r in the given frequency 
distribution of r for samples of this size. In this paper the following notation will 
be used : 
ρ = correlation coefficient of the material from which the sample is drawn ; 
r̄ = mean value of correlation coefficient for N samples of size n ; 
ř = modal value of the correlation in the distribution of the values of r as found 
from N samples of size n ; 
r = correlation coefficient of any arbitrary sample of size n. 
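The distinction between these four quantities may be illustrated by a small sampling experiment. The following is a minimal sketch only, not a method taken from this paper: it assumes the material is a bivariate normal population with correlation rho, and it estimates the modal value crudely as the midpoint of the fullest histogram cell. The names sample_r and summarize_r, the number of cells, and the seed are hypothetical choices made for the illustration.

```python
import math
import random
import statistics


def sample_r(rho, n, rng):
    """Product-moment r of one sample of size n.

    Assumption: the pairs are drawn from a bivariate normal
    population with unit variances and correlation rho.
    """
    xs, ys = [], []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        v = rng.gauss(0.0, 1.0)
        xs.append(u)
        # y = rho*u + sqrt(1 - rho^2)*v has correlation rho with u
        ys.append(rho * u + math.sqrt(1.0 - rho * rho) * v)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def summarize_r(rho, n, N, bins=40, seed=0):
    """Return (mean of r, crude modal value of r, list of all r)
    for N simulated samples of size n."""
    rng = random.Random(seed)
    rs = [sample_r(rho, n, rng) for _ in range(N)]
    mean_r = statistics.fmean(rs)
    # Crude mode: midpoint of the fullest of `bins` equal cells on [-1, 1]
    width = 2.0 / bins
    counts = [0] * bins
    for r in rs:
        k = min(max(int((r + 1.0) / width), 0), bins - 1)
        counts[k] += 1
    k_max = max(range(bins), key=counts.__getitem__)
    mode_r = -1.0 + (k_max + 0.5) * width
    return mean_r, mode_r, rs


if __name__ == "__main__":
    mean_r, mode_r, _ = summarize_r(rho=0.8, n=10, N=20000)
    print("rho = 0.8, mean r =", round(mean_r, 3), ", modal r (crude) =", mode_r)
```

For a high positive rho and a small n, such an experiment should typically show the mean of r falling somewhat below rho and the modal value somewhat above it, which is the skewness referred to above. The modal estimate depends on the width of the cells and is good only to about half a cell.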
The first question we have to answer is what is likely to be the distribution 
of the r's. Clearly, when ρ differs from unity, it must be a skew distribution of 
limited range lying between +1 and −1. The general skew curves discussed 
in Phil. Trans., Vol. 186 A, pp. 343-414, have proved themselves so capable of 
describing all sorts of types of frequency that one naturally turns to them in the 
first place in the present problem. There appears very little chance of successfully 
determining — at least for a product-moment table — the distribution of r. We 
must start with the assumption of a reasonable frequency distribution and justify 
* The frequency-distribution of the correlation coefficient in small samples was first discussed by 
"Student" in his paper in Biometrika, Vol. VI. pp. 302-10; he invited further mathematical investigation 
and to a large extent supplied the impulse and direction to the present paper. 
