Jan. 3, 1921 Correlation and Causation 561



external conditions are favorable, which also favors long gestation periods 

 and vigorous growth. 



The coefficient of correlation is a resultant of all paths connecting the 

 two variables. It would be valuable in many cases to be able to determine

 the relative importance of each particular path. The usual method

 in such cases is to calculate the partial correlation between two variables 

 for a third constant, using Pearson's well-known formula 



$${}_{C}r_{AB} = \frac{r_{AB} - r_{AC}\, r_{BC}}{\sqrt{(1 - r_{AC}^{2})(1 - r_{BC}^{2})}}$$



for correlation between A and B for constant C. Such partial correlations,

 however, must be interpreted with caution. It is true that by

 making constant a connecting link between two variables, whether it is 

 a common cause or the cause of one and effect of the other, we eliminate 

 the path in question. This elimination of connecting paths in which the 

 constant factor is a link is not, however, the only way in which correlation 

 is affected. If an effect of a number of causes is made constant, spurious 

 negative correlations appear among the causes and their other effects. 

 Thus, if weight at 33 days is made constant, the correlation between 

 birth weight and gain necessarily becomes -1. We are simply picking

 out a population in which any deficiencies in birth weight happen to be 

 exactly balanced by excess in gain after birth. This is an extreme case, 

 but where the relations of cause and effect are at all complex it is evident 

 that the correlation between two variables may be changed in more than 

 one way by making a third variable constant, making the interpretation 

 doubtful. 
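The extreme case described above can be checked numerically. The following is a minimal sketch, not part of the original paper: it assumes simulated data (the means, standard deviations, and the names `birth`, `gain`, and `final` are hypothetical), with weight at 33 days taken as exactly birth weight plus gain. It applies Pearson's partial correlation formula to show that two nearly uncorrelated causes acquire a partial correlation of -1 once their common effect is made constant.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(r_ab, r_ac, r_bc):
    """Partial correlation of A and B for constant C (Pearson's formula)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Simulated, hypothetical data: these are NOT measurements from the paper.
rng = random.Random(0)  # fixed seed so the run is repeatable
birth = [rng.gauss(100.0, 10.0) for _ in range(2000)]  # birth weight
gain = [rng.gauss(150.0, 20.0) for _ in range(2000)]   # gain, drawn independently
final = [b + g for b, g in zip(birth, gain)]           # weight at 33 days

r_bg = pearson(birth, gain)    # total correlation of the two causes
r_bf = pearson(birth, final)
r_gf = pearson(gain, final)

# Correlation of birth weight and gain for constant final weight.
r_partial = partial_corr(r_bg, r_bf, r_gf)

print("total correlation, birth weight and gain:", round(r_bg, 3))
print("partial correlation, final weight constant:", round(r_partial, 6))
```

Because the final weight is here an exact sum of the two causes, the partial correlation equals -1 up to rounding error whatever the total correlation between them; in a less extreme system the induced negative correlation would be weaker but would still be present.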



Where there is a network of causes and effects, the interrelations could 

 be grasped best if a coefficient could be assigned to each path in the 

 diagram designed to measure the direct influence along it. The following 

 is an attempt to provide such a coefficient, which may be called a path 

 coefficient. 



DEFINITIONS 



We will start with the assumption that the direct influence along a 

 given path can be measured by the standard deviation remaining in the 

 effect after all other possible paths of influence are eliminated, while 

 variation of the causes back of the given path is kept as great as ever, 

 regardless of their relations to the other variables which have been made 

 constant. Let X be the dependent variable or effect and A the independent

 variable or cause. The expression $\sigma_{X \cdot A}$ will be used for the

 standard deviation of X, which is found under the foregoing conditions,

 and may be read as the standard deviation of X due to A. In a system

 in which variation of X is completely determined by A, B, and C we

 have $\sigma_{X \cdot A} = {}_{\sigma_A CB}\sigma_X$, representing the constant factors, B and C, and

 also the variation of A itself ($\sigma_A$) by subscripts to the left. The path



