Jan. 3, 1921 Correlation and Causation 559



and $\Delta_{XX}$ is the minor made by deleting row X and column X.
$R^2_{X(ABC \cdots N)}$ measures the degree of determination of X by the whole
set of other factors, and $1 - R^2_{X(ABC \cdots N)} = \Delta / \Delta_{XX}$ is the maximum possible
squared correlation between X and a factor independent of those considered.
This formula for multiple correlation leads to one for multiple

 regression. Letting X', A', B', etc., be the deviations of variables X,

 A, B, etc., from their mean values, Pearson has shown that the most 

 probable value of X' for known values of the other variables is given by 

 the formula 



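$$\frac{X'}{\sigma_X} = -\frac{\Delta_{XA}}{\Delta_{XX}}\,\frac{A'}{\sigma_A} - \frac{\Delta_{XB}}{\Delta_{XX}}\,\frac{B'}{\sigma_B} - \cdots - \frac{\Delta_{XN}}{\Delta_{XX}}\,\frac{N'}{\sigma_N}$$

where $\Delta_{XA}$ is the minor made by deleting row X and column A, taken with its cofactor sign.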



$$\sigma_{X \cdot AB \cdots N} = \sigma_X \sqrt{\frac{\Delta}{\Delta_{XX}}}$$



Analogous but more complex formulae have recently been published 

 by Isserlis (5) for the multiple correlation ratio for use in cases in which 

 the regressions are not necessarily linear. 
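The determinant formulas for linear multiple correlation and regression can be checked numerically. The sketch below is only an illustration under stated assumptions: the correlation values for X, A, and B are hypothetical, the function names are invented for this example, and the off-diagonal terms are taken as signed cofactors so that each standardized regression coefficient is the negative ratio of a cofactor to the minor for X.

```python
def minor(m, i, j):
    """Return matrix m with row i and column j deleted."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]


def det(m):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det(minor(m, 0, j)) for j in range(len(m)))


def cofactor(m, i, j):
    """Signed cofactor of element (i, j)."""
    return (-1) ** (i + j) * det(minor(m, i, j))


def multiple_r_squared(r, x=0):
    """Squared multiple correlation of variable x on all the others."""
    return 1.0 - det(r) / cofactor(r, x, x)


def standardized_coefficients(r, x=0):
    """Coefficients of A'/sigma_A, B'/sigma_B, ... in the estimate of X'/sigma_X."""
    d_xx = cofactor(r, x, x)
    return [-cofactor(r, x, j) / d_xx for j in range(len(r)) if j != x]


# Hypothetical correlations among X, A, B (illustrative values, not from the paper).
r = [[1.0, 0.5, 0.4],
     [0.5, 1.0, 0.2],
     [0.4, 0.2, 1.0]]

r_squared = multiple_r_squared(r)
betas = standardized_coefficients(r)

# Consistency check: R^2 should equal the sum of coefficient * correlation with X.
check = sum(b * r[0][j + 1] for j, b in enumerate(betas))

print(r_squared, betas, check)
```

The Laplace expansion is used here only to stay close to the determinant notation of the text; for more than a handful of variables a linear solver would be the more practical choice.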



CAUSATION 



In all the preceding results no account is taken of the nature of the 

 relationship between the variables. The calculations thus neglect a very 

 important part of the knowledge which we often possess. There are 

 usually a priori or experimental grounds for believing that certain factors 

 are direct causes of variation in others or that other pairs are related as 

 effects of a common cause. In many cases, again, there is an obvious 

 mathematical relationship between variables, as between a sum and its 

 components or between a product and its factors. A correlation between 

 the length and volume of a body is an example of this kind. Just because 

 it involves no assumptions in regard to the nature of the relationship, a 

 coefficient of correlation may be looked upon as a fact pertaining to the 

 description of a particular population only to be questioned on the grounds 

 of inaccuracy in computation. But it would often be desirable to use a 

 method of analysis by which the knowledge that we have in regard to 

 causal relations may be combined with the knowledge of the degree of 

 relationship furnished by the coefficients of correlation. 
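The point that one coefficient of correlation can arise from quite different kinds of relationship may be illustrated with a small calculation. The sketch below assumes, purely for illustration, that every variable is a linear combination of independent sources of unit variance. The coefficients are hypothetical and the helper `correlation` is invented for this example.

```python
import math


def correlation(u, v):
    """Correlation of two variables written as coefficient lists over
    independent, unit-variance sources (an assumption of this sketch)."""
    cov = sum(a * b for a, b in zip(u, v))
    var_u = sum(a * a for a in u)
    var_v = sum(b * b for b in v)
    return cov / math.sqrt(var_u * var_v)


# Sources, in order: common cause C, independent residuals E1 and E2.

# Two effects of a common cause (hypothetical coefficients):
#   X = 0.8*C + 0.6*E1,  Y = 0.6*C + 0.8*E2
x = [0.8, 0.6, 0.0]
y = [0.6, 0.0, 0.8]
r_common = correlation(x, y)

# A sum and one of its components:  S = E1 + E2  compared with  E1
s = [0.0, 1.0, 1.0]
e1 = [0.0, 1.0, 0.0]
r_sum = correlation(s, e1)

print(r_common, r_sum)
```

In the first case the correlation comes entirely from the shared source C; in the second it follows from the arithmetic relation between the sum and its part. The coefficient alone does not distinguish the two situations.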



The problem can best be presented by using a concrete example. In 

 a population of guinea pigs it will be found that the birth weights, early 

 gains, sizes of litters, and gestation periods are all more or less closely 

 correlated with each other. The influences of heredity, environmental

 conditions, health of dam, etc., are also easily shown. In a rough way, 

 at least, it is easy to see why these variables are correlated with each other. 

 These relations can be represented conveniently in a diagram like that 

 in figure 1, in which the paths of influence are shown by arrows.



