284 A Statistical Study in Cancer Death-Rates 
So far then as these figures go we have no evidence to connect meteorological 
factors with the variable incidence of cancer. This, although a negative result, is 
not, I think, without its importance. 
Finding that the theory of the infectious origin of this disease gained no 
support from the values thus discovered, it occurred to me to correlate the cancer 
rates with those of some other disease. For this purpose I chose diabetes, and for 
the following reasons. (1) Both diseases have very much the same age distri- 
bution. (2) They stand almost alone as being on the increase, while other causes 
of death show declining rates. (3) The aetiology of both diseases is obscure. 
(4) Both being diseases of old age, the heredity factor cannot have been increased 
by the results of modern medical skill, for the prolongation of life that might thus 
have been procured would have been mainly at a period when the procreative 
power was passed. In fact the tendency of modern times is rather to postpone 
marriage to a later period of life, and this might be expected to reduce any 
heredity factor there may be. (5) If there were a common factor in the causation 
of the dual increase a correlation between these diseases might be discovered. 
There seemed then sufficient justification for undertaking the labour of 
calculating correction factors for the diabetes rates. As was to be expected both 
correction factors are similar in value. 
Professor Pearson has shown that when correlation tables are formed between 
rates, in such a way that a common factor occurs in both the variables, a "spurious 
correlation" is obtained. In this case therefore the usual formula $r = S(xy)/(N\sigma_x\sigma_y)$ 
can no longer be employed. On his advice* I have therefore used the following 
formula when dealing with the correlation as existing between rates : 
$$\rho_{xy} = \frac{r_{xy} - r_{zx}\,r_{zy}}{\sqrt{(1 - r_{zx}^{2})(1 - r_{zy}^{2})}}$$
Here x = the number of deaths from one disease, y = the number from a second, 
and z = the number of individuals in the district. Throughout this paper where 
the symbol $\rho$ occurs it refers to values found by this partial correlation formula. 
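The computation implied by the partial correlation formula may be sketched as follows. This is an illustration only and not the procedure actually carried out for this paper: the function names `pearson_r` and `partial_r` and the three short series in the usage note are hypothetical, chosen to show the arithmetic. Here x and y stand for the deaths from the two diseases and z for the population of each district, as defined above.

```python
import math


def pearson_r(a, b):
    """Product-moment correlation r = S(ab) / (N * sigma_a * sigma_b).

    The factor N cancels when the sums of squares and products are
    taken about the means, so only those sums are needed.
    """
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def partial_r(x, y, z):
    """Correlation of x and y with the common factor z held constant."""
    r_xy = pearson_r(x, y)
    r_zx = pearson_r(z, x)
    r_zy = pearson_r(z, y)
    return (r_xy - r_zx * r_zy) / math.sqrt((1 - r_zx ** 2) * (1 - r_zy ** 2))
```

For instance, with the made-up series `x = [1, 2, 3, 4, 5]`, `y = [2, 1, 4, 3, 5]` and `z = [1, 3, 2, 5, 4]`, the three total correlations are 0.8, 0.8 and 0.3, and `partial_r(x, y, z)` is then (0.8 - 0.24) divided by the square root of 0.36 × 0.91, or about 0.978.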
In dealing with diseases like cancer and diabetes, a significant correlation 
will probably occur when uncorrected deaths are used, due to their similar age 
incidence. This will be most marked in the value found for the races, and 
occupation groups, and least so for the cities. This will be seen by comparing 
the "Coefficient of Variability" for the cancer correction factors for the different 
groups : 
Cities                   V = 11.46 %
States                   V = 14.57 %
Races                    V = 38.39 %
28 Occupation Groups     V = 43.86 %
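A coefficient of variability of this kind may be computed as in the short sketch below. It assumes the usual definition, V = 100 σ / mean, with the standard deviation taken about the mean with divisor N; the paper does not state which divisor was used, so that choice is an assumption. The function name `coefficient_of_variability` is hypothetical.

```python
import statistics


def coefficient_of_variability(values):
    """Return V = 100 * sigma / mean, expressed as a percentage.

    Assumption: sigma is the population standard deviation (divisor N).
    """
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean is zero; V is undefined")
    return 100.0 * statistics.pstdev(values) / mean
```

Applied to the made-up series `[2, 4, 4, 4, 5, 5, 7, 9]`, whose mean is 5 and standard deviation 2, the function returns 40 per cent. The larger V is for a set of correction factors, the more the age distributions of the groups differ among themselves.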
* [This formula gets rid of the "spurious correlation" due to forming "rates," i.e. the common 
population totals, but it does not of course get over the high spurious correlation of the so-called 
"age correction factors." Ed.] 
