"Goodness of Fit" in Statistics and Physics
and (iii) that the distribution of the means of arrays is for the larger arrays approxi- 
mately normal, but that for the smaller arrays the distribution is approximately 
symmetrical and markedly leptokurtic. The small arrays will, however, have 
small relative weight and this is the only and not very satisfactory reason for 
adopting a Gaussian system throughout. Slutsky* assumes the normality without,
I think, adequate investigation, and for that reason I have been unable to consider
his paper as adequate and final in this matter. 
With the assumption stated above we can use for the multiple regression 
surface 
$$z = z_0\, e^{-\frac{1}{2}\chi^2},$$
where
$$\chi^2 = S\left\{ \frac{n_p\,(\bar{m}_p - m_p)^2}{\sigma_p^2} \right\}.$$
Here $\bar{m}_p$ is the observed value of the mean of the $p$th array, and $m_p$, $\sigma_p$ and
$n_p$ are constants which if it were feasible ought to be obtained from the sampled
population. $S$ denotes a summation for all values of $p$ from the first to the last
array. Now if we suppose $m_p$, $\sigma_p$ and $n_p$ known, we can calculate $\chi^2$
for, say, the series of $t$ arrays, and as there will be $t$ independent variables all we
have to do is to determine from this value of $\chi^2$ the value of $P$, the probability,
corresponding to it in the tables for "goodness of fit"† under the value $n' = t + 1$.
This applies to any form of frequency surface giving any type of regression line, 
i.e. locus of $m_p$.
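The calculation just described may be sketched numerically. The following is a minimal illustration only, assuming the criterion is the sum over arrays of $n_p(\bar{m}_p - m_p)^2/\sigma_p^2$ and that the population constants are supposed known. The entry $n' = t + 1$ of the tables corresponds to $t$ degrees of freedom in present-day usage. The function names and the numerical values are hypothetical and are not taken from the paper. The probability $P$ is obtained from the standard recurrence for the incomplete gamma function rather than from printed tables.

```python
import math


def chi_square_of_array_means(n, m_obs, m_pop, sigma):
    """Sum over arrays of n_p * (m_obs_p - m_pop_p)**2 / sigma_p**2.

    All four arguments are equal-length sequences, one entry per array.
    """
    return sum(
        n_p * (mo - mp) ** 2 / s ** 2
        for n_p, mo, mp, s in zip(n, m_obs, m_pop, sigma)
    )


def chi_square_tail(chi2, dof):
    """Probability that chi-square with integer `dof` exceeds `chi2`.

    Uses Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1),
    starting from Q(1, y) = exp(-y) or Q(1/2, y) = erfc(sqrt(y)).
    """
    y = chi2 / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))
    while a < dof / 2.0 - 1e-9:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


if __name__ == "__main__":
    # Hypothetical values for t = 2 arrays (illustration only).
    n = [4, 9]              # numbers of observations in each array
    m_obs = [1.0, 2.0]      # observed array means
    m_pop = [0.0, 2.0]      # supposed population array means
    sigma = [2.0, 3.0]      # supposed array standard deviations

    chi2 = chi_square_of_array_means(n, m_obs, m_pop, sigma)
    t = len(n)              # t independent variables, i.e. n' = t + 1
    print(chi2, chi_square_tail(chi2, t))
```

With the illustrative figures above only the first array contributes, giving $\chi^2 = 1$, and for $t = 2$ the probability reduces to $e^{-1/2}$.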
Thus far the problem looks straightforward, but now arises the difficult question 
as to what values are to be given to $n_p$, $m_p$ and $\sigma_p$, which represent the unknown
sampled population. Usually in problems, as of probable error, where we have
the unknown constants of the sampled population we replace them by the
corresponding constants of the sample. But it appears a somewhat arbitrary course
to do this in the present instance (as suggested by Slutsky‡) for $n_p$ and $\sigma_p$, but
not for $m_p$. I do not think it accordingly legitimate to substitute for $n_p$ and
$\sigma_p$ the sample values and leave $m_p$ to be determined from other considerations.
Clearly since our object is to test the goodness of fit of the regression line we have
to replace $m_p$ by $f$ where
$$m_p = f(y_p)$$
gives the regression curve or mean value of the array of $x$'s corresponding to the
value $y_p$ of the other variate $y$. Of course this regression curve is determined
from the whole series of observations and not from an individual array. But 
* "On the Criterion of Goodness of Fit of the Regression Lines, etc." Journal of the R. Statistical
Society, Vol. lxxvii. p. 79.
† Tables for Statisticians, p. 26.
‡ loc. cit. pp. 78-84.
