MISCELLANEA. 
I. On the Test of Goodness of Fit.
By KARL PEARSON, F.R.S. 
In a paper published in the Philosophical Magazine for July 1900, pp. 157-175, I dealt with
the following problem: A very large population is sampled, say, the population $n_1, n_2, \ldots, n_s, \ldots$
with total $N$, and any individual sample is $m_1, m_2, \ldots, m_s, \ldots$, total $M$. The "probable constitution"
is given by:
$$m'_1 = n_1\frac{M}{N},\quad m'_2 = n_2\frac{M}{N},\quad \ldots,\quad m'_s = n_s\frac{M}{N},\quad \ldots$$
If a large number of samples of size M are taken, what is the distribution of variations from
the "probable constitution" in these samples? 
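The sampling problem just posed can be illustrated with a small simulation. This is a sketch only: the population counts `[500, 300, 200]`, the sample size, the seed and the helper names `probable_constitution` and `draw_sample` are hypothetical, and drawing with replacement is our stand-in for sampling a very large population. Writing the population counts as $n_s$, it computes the probable constitution $m'_s = n_s M/N$ and checks that, averaged over many random samples, the deviations $m_s - m'_s$ centre on zero.

```python
import random

def probable_constitution(population, M):
    """Probable constitution m'_s = n_s * M / N of a sample of size M."""
    N = sum(population)
    return [n_s * M / N for n_s in population]

def draw_sample(population, M, rng):
    """Tally M individuals drawn at random; drawing with replacement
    stands in (assumption) for sampling an indefinitely large population."""
    counts = [0] * len(population)
    for s in rng.choices(range(len(population)), weights=population, k=M):
        counts[s] += 1
    return counts

population = [500, 300, 200]  # hypothetical n_1, n_2, n_3 (N = 1000)
M = 50                        # hypothetical sample size
expected = probable_constitution(population, M)

rng = random.Random(1)
trials = 5000
mean_dev = [0.0] * len(population)
for _ in range(trials):
    sample = draw_sample(population, M, rng)
    # Accumulate the average deviation m_s - m'_s in each category
    for s, (m_s, e_s) in enumerate(zip(sample, expected)):
        mean_dev[s] += (m_s - e_s) / trials

print(expected)  # [25.0, 15.0, 10.0]
print(mean_dev)  # each entry close to 0
```

The individual deviations scatter widely from sample to sample; it is the form of that scatter which the $\chi^2$ measure summarises.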
I showed that if the distribution of categories were such that no category contained a few 
isolated units, then the distribution depended on the calculation of $\chi^2 = S\left\{(m - m')^2/m'\right\}$, which provided
a value for the probability $P$ that samples would not diverge more than any given sample
from the "probable constitution." This process is now familiar to statisticians as the $\chi^2$, $P$ test.
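As a hedged illustration of the $\chi^2$, $P$ calculation, the sketch below computes $\chi^2 = S\{(m - m')^2/m'\}$ for one hypothetical sample and evaluates the upper-tail probability (the chance that a random sample diverges at least as far from the probable constitution) using the standard closed-form series for the $\chi^2$ distribution with an integer number of degrees of freedom. Taking $k - 1$ degrees of freedom for $k$ categories is the modern convention when the $m'$ series is known a priori; that choice, the function names and the data are our assumptions.

```python
import math

def chi_square(observed, expected):
    """Pearson's chi^2: sum over categories of (m_s - m'_s)^2 / m'_s."""
    return sum((m - e) ** 2 / e for m, e in zip(observed, expected))

def chi2_upper_tail(x, df):
    """P(chi^2 >= x) for a positive integer df, by the closed-form series."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    # Start from df = 2 (even case) or df = 1 (odd case) ...
    if df % 2 == 0:
        q, nu = math.exp(-x / 2), 2
    else:
        q, nu = math.erfc(math.sqrt(x / 2)), 1
    # ... then step up two degrees of freedom at a time:
    # Q(x; nu + 2) = Q(x; nu) + (x/2)^(nu/2) * exp(-x/2) / Gamma(nu/2 + 1)
    while nu < df:
        q += math.exp((nu / 2) * math.log(x / 2) - x / 2 - math.lgamma(nu / 2 + 1))
        nu += 2
    return q

observed = [30, 12, 8]          # hypothetical sample counts m_s (M = 50)
expected = [25.0, 15.0, 10.0]   # probable constitution m'_s
x2 = chi_square(observed, expected)
P = chi2_upper_tail(x2, df=len(observed) - 1)
print(f"chi^2 = {x2:.3f}, P = {P:.4f}")
```

For these hypothetical counts $\chi^2 = 1 + 0.6 + 0.4 = 2$ with two degrees of freedom, so $P = e^{-1} \approx 0.368$: roughly one random sample in three would diverge at least this much.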
The sole limiting conditions were that the samples should be random, and each should be of 
the same size M. 
In some cases the "probable constitution" ($m'$ series) can be found at once because the
distribution of the sampled population is known a priori. In other cases the values of the $m'$ series
have to be approximated to, and such approximations are the general rule in all discussions of 
probable error. 
We say for example that the standard deviation of the mean of a sample taken from an
indefinitely large population of size $N$ and standard deviation $\sigma$ is $\sigma/\sqrt{n}$, where $n$ is the size of
the sample.
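The rule $\sigma/\sqrt{n}$ is easy to check by simulation. In this sketch the normal population, $\sigma = 2$, $n = 100$, the number of repetitions and the seed are all assumptions chosen for illustration. The last lines also show the substitution the paper goes on to discuss: when $\sigma$ is unknown, the sample's own standard deviation $\sigma'$ is used in its place, giving $\sigma'/\sqrt{n}$.

```python
import math
import random
import statistics

rng = random.Random(42)
sigma, n, trials = 2.0, 100, 2000  # hypothetical population SD, sample size, repetitions

# Means of many random samples of size n from a normal population with SD sigma
means = [sum(rng.gauss(0.0, sigma) for _ in range(n)) / n for _ in range(trials)]

theory = sigma / math.sqrt(n)          # sigma / sqrt(n) = 0.2
empirical = statistics.pstdev(means)   # observed spread of the sample means

# If sigma were unknown: use sigma' computed from one sample in its place
one_sample = [rng.gauss(0.0, sigma) for _ in range(n)]
sigma_prime = statistics.pstdev(one_sample)
plug_in = sigma_prime / math.sqrt(n)   # sigma' / sqrt(n)

print(theory, empirical, plug_in)
```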
We say that the standard deviation of second moment-coefficients of samples of size $n$ is
$$\sqrt{\frac{\mu_4 - \mu_2^2}{n}},$$
where $\mu_2$ ($= \sigma^2$) and $\mu_4$ are the second and fourth moment-coefficients of the population sampled.
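A minimal numerical check of the second-moment formula, assuming a normal population (for which $\mu_4 = 3\mu_2^2$) with $\sigma = 2$ and samples of size $n = 50$; these values, the seed and the function name `sd_second_moment` are hypothetical. The simulated second moments are taken about each sample's own mean, which agrees with $\sqrt{(\mu_4 - \mu_2^2)/n}$ only to first order in $1/n$, so close but not exact agreement should be expected.

```python
import math
import random
import statistics

def sd_second_moment(mu2, mu4, n):
    """Standard deviation of the second moment-coefficient of samples of size n."""
    return math.sqrt((mu4 - mu2 ** 2) / n)

sigma, n, trials = 2.0, 50, 4000        # hypothetical values
mu2, mu4 = sigma ** 2, 3 * sigma ** 4   # normal population assumed: mu4 = 3 * mu2**2
predicted = sd_second_moment(mu2, mu4, n)  # sqrt((48 - 16) / 50) = 0.8

rng = random.Random(7)
# Second moment of each sample, taken about that sample's own mean (divisor n)
m2_values = [
    statistics.pvariance([rng.gauss(0.0, sigma) for _ in range(n)])
    for _ in range(trials)
]
observed = statistics.pstdev(m2_values)

print(predicted, observed)
```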
In fact every constant of the sample has a probable error determinable in terms of the constants 
of the sampled population. All these distributions of deviations from "probable constitution"
are true for perfectly general but random samples of size n drawn from our indefinitely large 
population. 
But unfortunately in a considerable number of cases that sampled population is unknown to
us; we have no direct means of finding $\mu_2$, $\mu_4$, &c. What accordingly do we do? Why, we replace
the constants of the sampled population by those calculated from the sample itself, as the best
information we have. And the justification of this proceeding is not far to seek: $\mu_2$ as found for
the sample will only differ from the $\mu_2$ of the sampled population by terms of the order $1/\sqrt{n}$;
for example if we are not dealing with small samples, and $\sigma'$ be the standard deviation of the
sample, $\sigma'$ differs from $\sigma$ by terms of the order $\sigma/\sqrt{2n}$, and accordingly the standard deviation of
the mean is written $\sigma'/\sqrt{n}$ when it is really $\sigma/\sqrt{n}$. This method of treating probable errors is
universal in the case of fair-sized samples to-day and scarcely needs justification. In writing the
