MR. R. A. FISHER ON THE MATHEMATICAL
a maximum for variations of $\theta_1$, $\theta_2$, $\theta_3$, &c. In this form the method is applicable to
the fitting of populations involving any number of variates, and equally to discontinuous 
as to continuous distributions. 
In order to make clear the distinction between this method and that of Bayes, we 
will apply it to the same type of problem as that which Bayes discussed, in the hope 
of making clear exactly of what kind is the information which a sample is capable of 
supplying. This question naturally first arose, not with respect to populations distributed
in frequency curves and surfaces, but with respect to a population regarded as
divided into two classes only, in fact in problems of probability. A certain proportion,
p, of an infinite population is supposed to be of a certain kind, e.g., “successes,” the
remainder are then “failures.” A sample of n is taken and found to contain x successes
and y failures. The chance of obtaining such a sample is evidently 
$$\frac{n!}{x!\,y!}\,p^x(1-p)^y.$$
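This binomial chance is easily evaluated for given values. A minimal sketch in modern Python (not, of course, part of the original paper; the sample of n = 10 with x = 3 successes and p = 0.3 is an arbitrary illustration):

```python
from math import comb

def binomial_chance(n, x, p):
    """Chance of exactly x successes (and n - x failures) in a sample
    of n, when the proportion of successes in the population is p:
    n! / (x! y!) * p**x * (1 - p)**y."""
    y = n - x
    return comb(n, x) * p**x * (1 - p)**y

# e.g. a sample of 10 containing 3 successes, from a population with p = 0.3
print(binomial_chance(10, 3, 0.3))
```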
Applying the method of maximum likelihood, we have 
$$\Sigma(\log f) = x \log p + y \log(1-p),$$
whence, differentiating with respect to p, in order to make this quantity a maximum, 
$$\frac{x}{p} = \frac{y}{1-p}, \quad \text{or} \quad p = \frac{x}{n}.$$
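The maximum just derived can be checked numerically. The sketch below (in modern Python, which is of course not part of the original paper) scans a grid of values of p and confirms that the log-likelihood $x \log p + y \log(1-p)$ is greatest at p = x/n; the sample counts x = 3, y = 7 are an arbitrary illustration.

```python
from math import log

def log_likelihood(p, x, y):
    # Sum of log f over the sample: x log p + y log(1 - p)
    return x * log(p) + y * log(1 - p)

x, y = 3, 7          # illustrative counts of successes and failures
n = x + y

# Scan a fine grid of admissible values of p; the maximum of the
# log-likelihood should fall at p = x/n, as the differentiation shows.
grid = [i / 1000 for i in range(1, 1000)]
p_hat = max(grid, key=lambda p: log_likelihood(p, x, y))
print(p_hat)  # the grid point x/n = 0.3
```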
The question then arises as to the accuracy of this determination. This question was 
first discussed by Bayes (10), in a form which we may state thus. After observing 
this sample, when we know x and y, what is the probability that p lies in any range dp ? In
other words, what is the frequency distribution of the values of p in populations which 
are selected by the restriction that a sample of n taken from each of them yields x 
successes? Without further data, as Bayes perceived, this problem is insoluble. To
render it capable of mathematical treatment, Bayes introduced the datum, that among 
the populations upon which the experiment was tried, those in which p lay in the range 
dp were equally frequent for all equal ranges dp. The probability that the value of p 
lay in any range dp was therefore assumed to be simply dp, before the sample was 
taken. After the selection effected by observing the sample, the probability is clearly
proportional to 
$$p^x(1-p)^y\,dp.$$
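Under Bayes' uniform a priori distribution, the normalised form of this posterior element is what is now called the Beta(x + 1, y + 1) density, with normalising constant $B(x+1,\,y+1) = x!\,y!/(n+1)!$. The sketch below (modern Python, purely illustrative; the counts x = 3, y = 7 and the range 0.2 to 0.4 are assumptions for the example) integrates it numerically to give the probability that p lies in a stated range after the sample is observed.

```python
from math import comb

def posterior_probability(x, y, a, b, steps=100_000):
    """Probability, under a uniform a priori distribution of p, that p lies
    between a and b after observing x successes and y failures.  The
    posterior density p**x * (1-p)**y / B(x+1, y+1) is integrated by the
    midpoint rule."""
    n = x + y
    norm = 1 / ((n + 1) * comb(n, x))   # B(x+1, y+1) = x! y! / (n+1)!
    total = 0.0
    for i in range(steps):
        p = a + (b - a) * (i + 0.5) / steps
        total += p**x * (1 - p)**y
    return total * (b - a) / steps / norm

print(posterior_probability(3, 7, 0.2, 0.4))
```

Integrating over the whole range 0 to 1 returns (very nearly) unity, a check that the normalising constant is right.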
After giving this solution, based upon the particular datum stated, Bayes adds a 
scholium the purport of which would seem to be that in the absence of all knowledge 
save that supplied by the sample, it is reasonable to assume this particular a priori 
distribution of p. The result, the datum, and the postulate implied by the scholium, have 
all been somewhat loosely spoken of as Bayes’ Theorem. 
