


KARL PEARSON 



conveyed in the table, e.g. that the distribution is Gaussian, or that wasp-waist 

 distributions are impossible. 



Consider for example the following table. Actually it is purely hypothetical 

 and small frequencies would occur where I have put empty cells, but I have done 

 this to emphasise the principle involved, which would have acted equally, but have 

 been obscured had I given a real table:



[Table: Mothers, before birth of child. The remaining headings and the cell frequencies are not recoverable.]



Now, if the division here is between "employed" and "unemployed" mothers

 only, the coefficient of correlation on the assumption of a Gaussian frequency is unity, 

 and the coefficient of association is also unity. Against these we find that the 

 coefficient of contingency is only .707. There cannot be, I think, a doubt that this is

 the better estimate. It leaves .293 over in reserve until we know something of the

 sub-classifications of mothers' employment before and after the birth of the children. 

 In the example we see that the degree of employment changes after the birth and 

 that a number of the factory workers tend to take in work or to go charing. The 

 coefficient of contingency is now C2 = .755. It will be clear from such an illustration

 that we should have got a poorer result had we tried to correct contingency on the 

 basis of tables with diagonal cells only occupied being representable by a coefficient 

 of value unity. Contingency very properly allows for the extent of our ignorance, 

 the coefficient of association does not. The correlation coefficient assumes a

 knowledge of the exact character of the distribution, and even if a Gaussian distribution

 be adopted, still no trained statistician would dream of applying it to a table of the 



form

      a  |
    -----+-----
         |  d



and argue that the correlation was therefore perfect. 
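The arithmetic behind the values unity and .707 can be checked with a short sketch. It assumes the usual definitions: the coefficient of mean square contingency, C = sqrt(φ² / (1 + φ²)) with φ² = χ² / N, and Yule's coefficient of association, Q = (ad − bc) / (ad + bc). The counts 60 and 40 are hypothetical, since the frequencies of the table are not recoverable here, and the function names are illustrative only.

```python
import math


def chi_square(table):
    """Square contingency (chi-square) of an r x c table of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected in cell (i, j) if the two characters were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def contingency_coefficient(table):
    """Coefficient of mean square contingency, C = sqrt(phi2 / (1 + phi2))."""
    n = sum(sum(row) for row in table)
    phi2 = chi_square(table) / n
    return math.sqrt(phi2 / (1.0 + phi2))


def yule_q(table):
    """Yule's coefficient of association for a fourfold table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    return (a * d - b * c) / (a * d + b * c)


# Hypothetical fourfold table with only the diagonal cells occupied.
fourfold = [[60, 0], [0, 40]]

c_value = contingency_coefficient(fourfold)
q_value = yule_q(fourfold)
print(round(c_value, 3), q_value)  # 0.707 1.0
```

Whatever positive values are given to a and d, χ² equals N for such a table, so φ² = 1 and C = sqrt(1/2) = .707, while the coefficient of association is unity.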



According to the view taken here we should anticipate that when a fourfold 

 table really represents approximately Gaussian material, then the value of rj^a-r will 

 give a probability fairly closely approaching that deduced from the contingency; 

 on the other hand when the material has no approach to a Gaussian distribution 

 the value of r found by equality of improbabilities will be higher than that deduced

 by a simple fourfold correlation table. 



To sum up then, the present paper proposes to deal with tables of few cells by 

 using the probability P determined from the square contingency χ². I can see



