304 Probable Error of a Correlation Coefficient 
Also, if an indefinitely large number of such differences be taken, it is clear 
that the means of the distributions will have the value zero. Hence, if the 
correlation be determined from a fourfold division through zero we can apply 
Mr Sheppard's* result that if A and B be the numbers in the large and the 
small divisions of the table respectively, $\cos\frac{\pi B}{A+B} = R$, where R is the correlation 
of the original system. 
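
The fourfold relation can be checked numerically. The following is only a minimal sketch, assuming the original system is bivariate normal with unit variances; the function name, the number of pairs and the seed are illustrative choices, not taken from the paper. B is counted as the differences whose two components have opposite signs (the small divisions when R is positive).

```python
import math
import random


def sheppard_estimate(R, n_pairs=100_000, seed=1):
    """Estimate R from a fourfold division through zero of pair differences.

    Assumption: the population is bivariate normal with correlation R.
    """
    rng = random.Random(seed)
    c = math.sqrt(1.0 - R * R)

    def draw():
        # One individual (x, y) with correlation R between x and y.
        x = rng.gauss(0.0, 1.0)
        y = R * x + c * rng.gauss(0.0, 1.0)
        return x, y

    same_sign = 0      # A: differences falling in the large divisions
    opposite_sign = 0  # B: differences falling in the small divisions
    for _ in range(n_pairs):
        x1, y1 = draw()
        x2, y2 = draw()
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            same_sign += 1
        elif product < 0:
            opposite_sign += 1

    # Sheppard's relation: cos(pi * B / (A + B)) = R
    return math.cos(math.pi * opposite_sign / (same_sign + opposite_sign))


print(sheppard_estimate(0.66))
```

With a large number of pairs the printed value should lie close to the R supplied, up to sampling error.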
But if a pair of individuals whose difference falls in either of the small divisions 
be considered to be a random sample of 2, their r will be found to be −1, while 
that of a pair whose difference falls in one of the large divisions is +1. Hence 
the distribution of r for samples of 2 is AN at +1, and BN at −1, where A + B = 1, 
and $B = \frac{\cos^{-1} R}{\pi}$. 
When R = 0, there is of course even division, half the values being +1, and half 
−1; when R = 0.66, $B = \frac{\cos^{-1} 0.66}{\pi} = 0.271$, therefore A = 0.729, and the mean is at 
0.729 − 0.271 = 0.458. The S.D. $= \sqrt{1 - (0.458)^2} = 0.889$. It is noteworthy that the mean 
value is considerably less than R. 
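
The arithmetic for samples of 2 can be reproduced in a few lines. This is a sketch only; the function name is hypothetical, and it simply evaluates B = cos⁻¹R / π, A = 1 − B, the mean A − B, and the standard deviation of a distribution concentrated at +1 and −1.

```python
import math


def sample_of_two_distribution(R):
    """Return (A, B, mean, sd) of r for samples of 2 when the true value is R."""
    B = math.acos(R) / math.pi    # proportion of samples giving r = -1
    A = 1.0 - B                   # proportion of samples giving r = +1
    mean = A - B                  # (+1) * A + (-1) * B
    sd = math.sqrt(1.0 - mean ** 2)  # r squared is always 1, so variance = 1 - mean^2
    return A, B, mean, sd


A, B, mean, sd = sample_of_two_distribution(0.66)
print(f"A = {A:.4f}, B = {B:.4f}, mean = {mean:.4f}, S.D. = {sd:.4f}")
```

For R = 0.66 this agrees with the figures in the text to within a unit in the third decimal place; the small discrepancy in the mean and S.D. arises because the text subtracts the already rounded values of A and B.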
I have dealt with the cases of samples of 2 at some length, because it is possible 
that this limiting value of the distribution, with its mean of $\frac{2}{\pi}\sin^{-1} R$ and its second 
moment coefficient of $1 - \left(\frac{2}{\pi}\sin^{-1} R\right)^2$, may furnish a clue to the distribution when 
n is greater than 2. 
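
The mean and second moment coefficient quoted for samples of 2 follow directly from the two-point distribution at +1 and −1. A short derivation is sketched here; the symbols $\bar r$ and $\mu_2$ are notation introduced for this sketch and are not the paper's, and the only identity used is $\cos^{-1} R = \frac{\pi}{2} - \sin^{-1} R$.

```latex
\begin{align*}
\bar r &= (+1)\,A + (-1)\,B = 1 - 2B
        = 1 - \frac{2}{\pi}\cos^{-1} R
        = \frac{2}{\pi}\sin^{-1} R, \\
\mu_2  &= (+1)^2 A + (-1)^2 B - \bar r^{\,2}
        = 1 - \left(\frac{2}{\pi}\sin^{-1} R\right)^{2}.
\end{align*}
```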
Besides these series, I have another shorter one of 100 values of r from samples 
of 30, when the real value is 0.66. The distributions of the various trials are given 
in the table. 
Several peculiarities will be noticed which are due to the effects of grouping, 
particularly in the samples of 4. Firstly, there is a lump at zero; with such small 
numbers zero is not an uncommon value of the product moment and then, whatever 
the values of the standard deviations, r = 0. 
Next there are five indeterminate cases in each of the distributions for samples 
of 4. These are due to the whole sample falling in the same group for one variable. 
In such a case, both the Standard Deviation and the product moment vanish and 
r is indeterminate. 
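
Both grouping effects can be shown with a small sketch. The function name and the sample values below are hypothetical illustrations, not data from the paper, and it is assumed that each observation is recorded as an integer group label, so that a zero product moment or a zero variance is exact.

```python
import math


def grouped_r(xs, ys):
    """Product-moment correlation of grouped values.

    Returns None when r is indeterminate, i.e. when the whole sample
    falls in the same group for one variable.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    product_moment = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    if var_x == 0 or var_y == 0:
        # Standard deviation and product moment both vanish: r = 0/0.
        return None
    return product_moment / math.sqrt(var_x * var_y)


# Hypothetical sample of 4 whose product moment is exactly zero, so r = 0.
print(grouped_r([1, 2, 2, 3], [2, 1, 3, 2]))

# Hypothetical sample of 4 with every x in the same group: r is indeterminate.
print(grouped_r([2, 2, 2, 2], [1, 2, 3, 4]))
```

With coarse groups and only four observations, both situations arise far more often than they would with ungrouped measurements.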
Lastly, with such small samples one cannot use Sheppard's corrections for the 
Standard Deviations, as r often becomes greater than unity. So I did not use 
the corrections except in the case of the samples of 30, yet on the whole the values 
of the Standard Deviations are no doubt too large. This does not much affect the 
values of r in the neighbourhood of zero, but there is a tendency for larger values 
* Phil. Trans. A. Vol. cxcii. p. 141. 
