MISCELLANEA. 
I. The Correction to be made to the Correlation Ratio for Grouping*. 
By STUDENT. 
Using the ordinary notation, viz. $n_{x_p}$ = the number of $y$'s in the $x_p$ array, $\bar{y}_{x_p}$ = the mean of this array, $N$ the total number in the sample, $\bar{y}$ the general mean of $y$ and $\sigma_y$ its standard deviation,
we have $\eta^2$ defined by the relation

$$\eta^2 = \frac{S\{n_{x_p}(\bar{y}_{x_p} - \bar{y})^2\}}{N\sigma_y^2} \qquad \text{(i)}$$
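As a minimal sketch (in Python, which the note itself does not use), (i) can be computed directly from a list of $x$ arrays, each given as a list of its $y$ values. The function name `correlation_ratio_sq` is purely illustrative, and $\sigma_y^2$ is assumed to be the divide-by-$N$ variance of all the $y$'s, so that $\eta^2$ is the between-array sum of squares divided by the total sum of squares.

```python
from statistics import fmean, pvariance

def correlation_ratio_sq(arrays):
    """eta^2 of (i) for a list of x arrays, each a non-empty list of y values."""
    all_y = [y for arr in arrays for y in arr]
    N = len(all_y)
    ybar = fmean(all_y)                      # general mean of y
    # S{n_x (ybar_x - ybar)^2}: weighted squared deviations of the array means
    between = sum(len(arr) * (fmean(arr) - ybar) ** 2 for arr in arrays)
    # pvariance divides by N, so N * sigma_y^2 is the total sum of squares
    return between / (N * pvariance(all_y))

print(correlation_ratio_sq([[1, 2, 3], [4, 5, 6]]))
```

For the two arrays `[1, 2, 3]` and `[4, 5, 6]` the between-array sum is 13.5 and $N\sigma_y^2$ is 17.5, so the value is $27/35 \approx 0.771$.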
If $\eta^2$ is required to fit a regression curve to the actual observations, as in Professor Pearson's original memoir "On the General Theory of Skew Correlation and Non-linear Regression," no correction is necessary.
But if we require a ratio which shall remain constant under wide variations of grouping 
and of number in the sample and which shall consequently be more comparable from one sample 
to another, there are two corrections to be made. 
The first of these has already been given by Professor Pearson (Biometrika, Vol. VIII, p. 256), and he has expressed it as follows: if $\tilde{\eta}^2$ be the value of $\eta^2$ actually found by the use of (i), and if $\eta^2$ be the value which would be found from an infinitely large sample, then, if $k$ be the number of $x$ arrays,

$$\eta^2 = \frac{\tilde{\eta}^2 - (k-2)/N}{1 - (k-2)/N} \qquad \text{(ii)}$$
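The correction (ii) is a single transformation of the observed value. The sketch below applies it as (ii) is written here, taking the offset $(k-2)/N$ from that formula; the offset is an assumption worth checking against Pearson's original before relying on it, and the function name is illustrative only.

```python
def pearson_corrected_eta_sq(eta_sq_observed: float, k: int, N: int) -> float:
    """Estimate eta^2 for an infinitely large sample from the observed value, per (ii)."""
    # Offset subtracted from the observed eta^2, as written in (ii)
    offset = (k - 2) / N
    if offset >= 1:
        raise ValueError("(ii) requires N > k - 2")
    return (eta_sq_observed - offset) / (1 - offset)

print(pearson_corrected_eta_sq(0.5, 12, 100))
```

With an observed value of 0.5 from $k = 12$ arrays and $N = 100$, the offset is 0.1 and the corrected value is $0.4/0.9 \approx 0.444$; an observed value of 1 is left unchanged.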
But there is a further effect of grouping which has not hitherto been noted and which can be 
evaluated as follows : 
Suppose the $x_p$ array to be divided into elementary $x$ arrays, and let $\bar{y}_\rho$ be the mean of the $x_\rho$ elementary array and $n_\rho$ its frequency.
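Then clearly the proper contribution of the $x_p$ array to $\eta^2$ is

$$\frac{S\{n_\rho(\bar{y}_\rho - \bar{y})^2\}}{N\sigma_y^2},$$

the summation being taken over the elementary arrays of the $x_p$ array.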
This is equal to 
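$$\frac{1}{N\sigma_y^2}\Big[S\{n_\rho(\bar{y}_{x_p} - \bar{y})^2\} + 2S\{n_\rho(\bar{y}_{x_p} - \bar{y})(\bar{y}_\rho - \bar{y}_{x_p})\} + S\{n_\rho(\bar{y}_\rho - \bar{y}_{x_p})^2\}\Big].$$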
Now $\bar{y}_{x_p} - \bar{y}$ is of course constant for this summation, $S(n_\rho) = n_{x_p}$ and $S\{n_\rho(\bar{y}_\rho - \bar{y}_{x_p})\} = 0$; therefore the contribution to $\eta^2$ is
$$\frac{n_{x_p}(\bar{y}_{x_p} - \bar{y})^2}{N\sigma_y^2} + \frac{S\{n_\rho(\bar{y}_\rho - \bar{y}_{x_p})^2\}}{N\sigma_y^2} \qquad \text{(iii)}$$
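The step from the expanded sum to (iii) can be checked numerically. The sketch below (Python; the function names and the small sample are illustrative only) computes the proper contribution of one $x_p$ array from its elementary arrays and compares it with the two terms of (iii): the term that the grouped array actually contributes to (i), plus the within-array term that grouping discards.

```python
from statistics import fmean, pvariance

def fine_contribution(elementary, ybar, N, var_y):
    """Proper contribution of the x_p array: S{n_rho (ybar_rho - ybar)^2} / (N sigma_y^2)."""
    return sum(len(e) * (fmean(e) - ybar) ** 2 for e in elementary) / (N * var_y)

def contribution_iii(elementary, ybar, N, var_y):
    """Right-hand side of (iii): grouped term plus the term lost by grouping."""
    pooled = [y for e in elementary for y in e]       # the whole x_p array
    n_xp, ybar_xp = len(pooled), fmean(pooled)
    grouped = n_xp * (ybar_xp - ybar) ** 2             # what (i) actually counts
    lost = sum(len(e) * (fmean(e) - ybar_xp) ** 2 for e in elementary)
    return (grouped + lost) / (N * var_y)

# Illustrative sample: one x_p array split into two elementary arrays,
# plus one further x array that is not subdivided.
x_p_elementary = [[1.0, 2.0], [4.0]]
other_arrays = [[3.0, 5.0, 6.0]]
all_y = [y for arr in x_p_elementary + other_arrays for y in arr]
N, ybar, var_y = len(all_y), fmean(all_y), pvariance(all_y)

print(fine_contribution(x_p_elementary, ybar, N, var_y))
print(contribution_iii(x_p_elementary, ybar, N, var_y))
```

Both calls should print the same value, $8.25/17.5 \approx 0.4714$, up to floating-point rounding. The cross term never needs computing because it vanishes, and the second term of (iii) is a sum of squares, so it cannot be negative.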
* See above p. 118 of this Journal. 
