Miscellanea 
II. On the Curves which are most suitable for describing the 
frequency of Random Samples of a Population. 
By KARL PEARSON, F.R.S. 
(1) In determining the variability of random samples, or in other words in forming the
probable error of a class frequency, an argument of the following kind is usually adopted: Let
the chance of occurrence of an individual with a character of the given class be $p$, and $q = 1 - p$
be the chance of an individual not of this class occurring; then if a random sample of $n$ individuals
be taken, the distribution of $M$ such random samples will have frequencies given by the
terms of the binomial $M(p + q)^n$.
The first four moment coefficients of this distribution about its mean* are:
$$\mu_2 = c^2\,npq \qquad \text{(i)},$$
$$\mu_3 = c^3\,npq\,(p - q) \qquad \text{(ii)},$$
$$\mu_4 = c^4\,npq\,\bigl(3(n - 2)pq + 1\bigr) \qquad \text{(iii)}.$$
These lead to 
$$\beta_1 = \mu_3^2/\mu_2^3 = \frac{1}{npq} - \frac{4}{n} \qquad \text{(iv)},$$
$$\beta_2 = \mu_4/\mu_2^2 = 3 + \frac{1}{npq} - \frac{6}{n} \qquad \text{(v)}.$$
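Formulae (iv) and (v) can be checked numerically by summing directly over the terms of the binomial. The following is a minimal sketch, not part of the original argument: the function names are hypothetical, the group base $c$ is taken as unity (neither $\beta_1$ nor $\beta_2$ depends on it), and the values $n = 50$, $p = 0.1$ are chosen purely for illustration.

```python
from math import comb


def binomial_betas(n, p):
    """Return (beta1, beta2) by direct summation over the binomial terms.

    Assumes the group base c = 1; beta1 and beta2 do not depend on c.
    """
    q = 1.0 - p
    # Chance of r individuals of the given class in a sample of n.
    probs = [comb(n, r) * p**r * q**(n - r) for r in range(n + 1)]
    mean = sum(r * w for r, w in enumerate(probs))
    # Second, third and fourth moments about the mean.
    mu2, mu3, mu4 = (
        sum((r - mean) ** k * w for r, w in enumerate(probs)) for k in (2, 3, 4)
    )
    return mu3**2 / mu2**3, mu4 / mu2**2


def closed_form_betas(n, p):
    """Return (beta1, beta2) from formulae (iv) and (v)."""
    q = 1.0 - p
    inv_npq = 1.0 / (n * p * q)
    return inv_npq - 4.0 / n, 3.0 + inv_npq - 6.0 / n


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the text.
    print(binomial_betas(50, 0.1))
    print(closed_form_betas(50, 0.1))
```

The two printed pairs should agree to within floating-point rounding, which is all the sketch is meant to show.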
Now if $n$ be indefinitely large, and neither $p$ nor $q$ be indefinitely small, there results
$\beta_1 = 0$ and $\beta_2 = 3$, i.e. no skewness and mesokurtosis†. Accordingly, as is well known, the binomial
passes over into the symmetrical (or Gaussian) normal curve of errors, with a standard deviation
$c\sqrt{npq}$. The great bulk of investigators, at least of the wiser class who know the importance
of basing inferences on probable errors, are thus accustomed to content themselves with
calculating the probable error of a class frequency from the formula
$$\text{P.E.} = .67449\sqrt{npq} \qquad \text{(vi)},$$
$c$, the group base, being taken as unity. The odds against the correspondence between an observed
class frequency and its theoretical value are then calculated from tables of the probability
integral. In other words the distribution of random samples of a class frequency is assumed to
follow the normal curve
$$y = y_0\, e^{-x^2/(2\sigma^2)} \qquad \text{(vii)},$$
where $\sigma = \sqrt{npq}$.
The validity of this process for practical statistics remains unquestioned, provided n is fairly 
large and neither $p$ nor $q$ approximate to zero‡. Historically this is the very problem, for the
solution of which the probability integral and the normal curve were introduced. 
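As an illustration of formula (vi), the probable error of a class frequency may be computed as below. This is a sketch under stated assumptions: the function name is hypothetical, and the example of a Mendelian quarter in a series of 400 individuals is chosen here for illustration rather than taken from the text.

```python
from math import sqrt

# Constant quoted in formula (vi).
PROBABLE_ERROR_FACTOR = 0.67449


def probable_error(n, p, c=1.0):
    """Probable error of a class frequency: .67449 * c * sqrt(n p q).

    n is the sample size, p the chance of the class, and c the group base
    (unity by default, as in the text).
    """
    q = 1.0 - p
    return PROBABLE_ERROR_FACTOR * c * sqrt(n * p * q)


if __name__ == "__main__":
    # Hypothetical example: a Mendelian quarter (p = 1/4) with n = 400.
    n, p = 400, 0.25
    print("expected class frequency:", n * p)
    print("probable error:", round(probable_error(n, p), 2))
```

Here the expected frequency is 100 and the probable error comes out at about 5.84, so an observed frequency a few units from 100 would be unremarkable on the normal theory.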
But if any frequency distribution be examined, we find class frequencies which are themselves
small, for example often small classes towards the extreme values of the character, and it
is not legitimate to put $\beta_1 = 0$ and $\beta_2 - 3 = 0$ and adopt the normal curve in considering the
probable error of such class frequencies; for, although $n$ be fairly large, $p$ will be very small and
$np$, the frequency of the class in the sample, be possibly only a few units. Thus the value of
$1/(npq)$ may easily range from unity downwards. For example, if $n = 1000$ and $np = 2$ or 3 we
cannot possibly consider the skewness represented by $\beta_1 = .3$ to $.5$ or the kurtosis $\beta_2 - 3 = .3$ to $.5$
as passably corresponding to a symmetrical, mesokurtic Gaussian curve of errors.
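The figures quoted in this example follow from formulae (iv) and (v). A minimal sketch of the arithmetic, with a hypothetical function name and the group base taken as unity:

```python
def skewness_and_kurtosis_excess(n, class_frequency):
    """Return (beta1, beta2 - 3) for a class whose expected frequency is n*p.

    Uses beta1 = 1/(npq) - 4/n and beta2 - 3 = 1/(npq) - 6/n.
    """
    p = class_frequency / n
    q = 1.0 - p
    inv_npq = 1.0 / (n * p * q)
    return inv_npq - 4.0 / n, inv_npq - 6.0 / n


if __name__ == "__main__":
    # The case considered in the text: n = 1000 with np = 2 or 3.
    for expected in (2, 3):
        beta1, excess = skewness_and_kurtosis_excess(1000, expected)
        print(f"np = {expected}: beta1 = {beta1:.3f}, beta2 - 3 = {excess:.3f}")
```

For $np = 2$ both quantities are close to .5, and for $np = 3$ both are close to .33, in agreement with the range .3 to .5 stated above.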
* Pearson: Phil. Trans. Vol. 186 A, p. 347.
† Biometrika, Vol. iv. p. 173.
‡ Thus Mendelian halves and quarters with 100 to several hundred individuals in the series may be
quite effectively tested in this manner. 
