296 REPORTS ON THE STATE OF SCIENCE, ETC.



information to be contained in the sample of twenty, with mean of 23.56 cm. and

 a standard deviation of 1.44 cm., by the use of the appropriate probability table, we

 can assign a measure of probability of 7 in 1000 for the mean in the population sampled

 lying outside the range 22.56 cm. to 24.56 cm. (Section D, No. 5b). It follows that,

 in whichever way the problem is presented, one is justified in concluding that it is

 most unlikely that the difference between the sample mean (23.56) and the population

 mean (22.56) can be due to the chance fluctuation of sampling. It is therefore what

 is termed a significant difference, the cause of which must be sought elsewhere. 
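
The calculation behind a statement of this kind can be sketched as follows. This is only an illustration under stated assumptions, not the procedure of Section D: it assumes a Student's t test on 19 degrees of freedom, and it assumes the quoted standard deviation was computed with divisor n, so that the standard error is the standard deviation divided by the square root of n - 1. The function names are hypothetical. Under those assumptions the statistic is about 3.03 and the two-sided probability falls in the neighbourhood of the 7 in 1000 quoted above.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|), using Simpson's rule on the density over [0, |t|]."""
    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * (total * h / 3.0)


# Figures from the text.
n, sample_mean, sample_sd = 20, 23.56, 1.44
population_mean = 22.56

# Assumption: sample_sd uses divisor n, hence sqrt(n - 1) in the standard error.
t_stat = (sample_mean - population_mean) / (sample_sd / math.sqrt(n - 1))
p_value = t_two_sided_p(t_stat, df=n - 1)

print(f"t = {t_stat:.3f}, two-sided p = {p_value:.4f}")
```

If the standard deviation had instead been computed with divisor n - 1, the standard error would use the square root of n and the probability would come out slightly smaller.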



(2) Another type of illustration is as follows:



Suppose that in a Mendelian experiment there are theoretical reasons for expecting 

 ratios of 2:1:1 in three frequency groups. In a sample of forty the following

 frequencies are observed: 22, 7, 11. Here it is possible to say that a divergence

 from the 'expected' frequencies of 20, 10, 10 as great or greater than that observed

 will occur in the long run on fifty-five random samples out of one hundred, or in other 

 words that the divergence is not at all exceptional (Section D, No. 8). 
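
The figure of fifty-five in one hundred can be reproduced with a chi-square test of goodness of fit. The sketch below assumes that this is the test intended (the text itself only refers to Section D, No. 8); the function names are hypothetical. With three groups there are two degrees of freedom, and for two degrees of freedom the upper-tail probability of chi-square has the exact closed form exp(-x/2), so no table is needed.

```python
import math


def chi_square_gof(observed, ratio):
    """Return (chi-square statistic, expected frequencies) for a given ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return statistic, expected


def chi_square_tail_2df(statistic):
    """Upper-tail probability of chi-square with 2 degrees of freedom."""
    return math.exp(-statistic / 2)


stat, expected = chi_square_gof([22, 7, 11], [2, 1, 1])
p = chi_square_tail_2df(stat)

print(f"expected = {expected}, chi-square = {stat:.2f}, p = {p:.2f}")
```

Here the statistic is 4/20 + 9/10 + 1/10 = 1.2, and exp(-0.6) is about 0.55, in agreement with the text.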



2. The Disadvantage of Small Samples. 



In both the examples that have been given, the samples contained only a small 

 number of observations. If the distribution of the character or characters in the 

 population is known it is possible to obtain a measure of the probability of drawing 

 a given sample, however small that sample may be. But when the sample only is 

 known, the nature of the population can be inferred with far less precision from small 

 than from large samples. The difficulties in interpretation to which this may lead 

 can again be illustrated by the previous examples. 



(1) Suppose that in addition to the sample of twenty observations with mean 

 of 23.56 cm. and standard deviation of 1.54 cm., there is a second sample of twenty-five

 with mean 23.14 cm. and standard deviation 1.61 cm. Then in neither case is it

 possible to estimate the mean of the population sampled with sufficient precision to

 conclude that the two samples have been drawn from different populations. The

 position may be put into exact terms by stating that on the evidence available a

 difference, one way or the other, between the means as great or greater than the

 observed 0.42 cm. would occur in thirty-seven cases out of one hundred in the

 random drawings of two samples from the same population. 



If, however, the samples had been each ten times as large, viz. 200 and 250, it 

 would have been possible to obtain a more precise estimate of the populations sampled, 

 and to infer that they had almost certainly different means. This can be expressed 

 by saying that for these larger samples a difference in means of 0.42 cm. or more

 would be expected to be found in only five cases out of one thousand (Section D, 

 No. 6). 
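
The effect of sample size on this comparison can be sketched numerically. The text does not state the exact method used, so the following is an assumption: the difference of the two means is divided by a standard error built from the two sample variances taken separately, and the result is referred to the normal curve. The function names are hypothetical. With the figures given in the text (standard deviations 1.54 and 1.61), this approach gives roughly 37 in 100 for the samples of 20 and 25, and roughly 5 in 1000 for the samples of 200 and 250.

```python
import math


def two_sided_normal_p(z):
    """P(|Z| >= |z|) for a standard normal variable."""
    return math.erfc(abs(z) / math.sqrt(2))


def difference_of_means_p(mean1, sd1, n1, mean2, sd2, n2):
    """Return (z, two-sided p) for the difference between two sample means."""
    # Assumption: separate (unpooled) variances and a normal approximation.
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (mean1 - mean2) / standard_error
    return z, two_sided_normal_p(z)


# Small samples, as given in the text.
z_small, p_small = difference_of_means_p(23.56, 1.54, 20, 23.14, 1.61, 25)

# Samples ten times as large, with the same means and standard deviations.
z_large, p_large = difference_of_means_p(23.56, 1.54, 200, 23.14, 1.61, 250)

print(f"n = 20, 25:   z = {z_small:.2f}, p = {p_small:.3f}")
print(f"n = 200, 250: z = {z_large:.2f}, p = {p_large:.4f}")
```

Multiplying both sample sizes by ten divides the standard error by the square root of ten, which is why the same difference of 0.42 cm. moves from unremarkable to highly improbable.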



(2) If two possible hypotheses existed as to the Mendelian ratios, viz. either 2:1:1

 or 9:3:4, the evidence provided by the sample figures 22, 7, 11 would be quite inadequate

 to distinguish between them. It has been seen that the odds are 55 to 45 in

 favour of obtaining so great a divergence from the expected numbers on the first 

 hypothesis, and for the second hypothesis the corresponding odds are 925 to 75 in 

 favour. These figures would not justify the acceptance of one hypothesis rather than 

 the other, for the observations are not improbable on either hypothesis. 



If, however, the sample had been of 400 and the group frequencies 220, 70, and 110, 

 it is found that samples with equally or more divergent frequencies would occur only:



(a) in 25 cases in 10,000 if hypothesis 2:1:1 were true;



(b) in 49 cases in 100 if hypothesis 9:3:4 were true. 



It follows that now the evidence is sufficient to show that the first hypothesis is 

 quite improbable, while the second is still in reasonable accordance with the facts 

 (Section D, No. 8). 
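
The four probabilities discussed for the Mendelian example can be checked together. As a sketch only, assuming a chi-square goodness-of-fit test with two degrees of freedom (for which the upper-tail probability is exactly exp(-x/2)), and with hypothetical function names:

```python
import math


def gof_probability(observed, ratio):
    """Return (chi-square, upper-tail p) for three groups (2 degrees of freedom)."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Exact tail probability of chi-square when there are 2 degrees of freedom.
    return chi_square, math.exp(-chi_square / 2)


samples = {"n = 40": [22, 7, 11], "n = 400": [220, 70, 110]}
hypotheses = {"2:1:1": [2, 1, 1], "9:3:4": [9, 3, 4]}

results = {}
for sample_name, observed in samples.items():
    for hypothesis_name, ratio in hypotheses.items():
        chi_square, p = gof_probability(observed, ratio)
        results[(sample_name, hypothesis_name)] = (chi_square, p)
        print(f"{sample_name:8s} {hypothesis_name}: "
              f"chi-square = {chi_square:6.3f}, p = {p:.4f}")
```

For the sample of 400 this gives a chi-square of 12 and a probability of about 0.0025 under 2:1:1, and about 0.49 under 9:3:4, matching (a) and (b). For the sample of 40 it gives about 0.55 under 2:1:1 and about 0.93 under 9:3:4; the latter is close to, though not identical with, the 925 in 1000 quoted earlier, a difference that may reflect interpolation in a printed table.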



The statistical methods available thus allow one to test the validity of various 

 hypotheses in relation both to the nature and to the extent of the data presented. An 

 increase in the number of observations will usually increase the precision of all tests, 

 and may justify conclusions which would otherwise be doubtful. The size of the sample 

 is not, however, always a mere matter of the number of individuals measured. Each 

 unit may be a district, a season, or a complete and lengthy experiment, and for such 

 cases the more exact methods appropriate for small samples will be particularly 

 necessary. 



