706 APPENDIX 



If r = +1, all the individual points of the population will lie on the line 

 of regression, and we can therefore, when one character is given, tell 

 exactly what the associated character is in magnitude. In this case the 

 correlation is said to be perfect positive correlation. Similarly, if r = -1, 

 the correlation would be perfect negative correlation. 
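
The two limiting cases can be checked numerically. The sketch below is only an illustration: the helper name `pearson_r` and the sample values are assumptions, not taken from the text. It computes r for points lying exactly on a rising line and on a falling line.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r of paired measurements (hypothetical helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of the deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up values: every point lies exactly on a straight line.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
rising = [2.0 * x + 1.0 for x in xs]    # line with positive slope
falling = [-0.5 * x + 4.0 for x in xs]  # line with negative slope

r_pos = pearson_r(xs, rising)
r_neg = pearson_r(xs, falling)
print(r_pos, r_neg)
```

In both cases the associated character is fixed exactly once the first is given, which is what perfect correlation means here.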



Three variables. The theory of correlation is easily extended to apply to 

 more than two variables. For example, we might investigate the correlation 

 of the statures of sons with respect to the statures of both parents. This is 

 the case of biparental inheritance treated in the text, page 529, and the 

 formulas there used must be special cases of those which we are about to derive 

 for giving the most probable value of a variable z where z is the numerical 

 value of a character correlated with characters of measurements x and y. 



Suppose that the following system of corresponding deviations from the 

 means have resulted from measurement: $(x_1, y_1, z_1)$, $(x_2, y_2, z_2)$, $(x_3, y_3, z_3)$, 

 . . . , $(x_n, y_n, z_n)$. Represent these measurements with respect to coordinate 

 axes in three dimensions. These axes are to be taken at right angles to 

 each other, as is conventional in analytic geometry, and may be referred 

 to as the x, y, and z axes. It now requires two letters to mark an array of 

 z's. We shall call $(x', y')$ the mark of a class. Now imagine the means 

 plotted for every z-array. If correlation exists, these means will not lie at 

 random in space, but will arrange themselves approximately on a surface 

 called the "surface of regression." The equation of a surface is of the form 

 $z = f(x, y)$. We shall consider only the case where this function $f$ is of 

 the first degree, for the same reasons that we considered only the case of 

 a first degree function in the case of correlation of two variables. 



It results that the required function is 



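$$z = \frac{r_{xz} - r_{xy}\, r_{yz}}{1 - r_{xy}^{2}}\,\frac{\sigma_z}{\sigma_x}\, x + \frac{r_{yz} - r_{xz}\, r_{xy}}{1 - r_{xy}^{2}}\,\frac{\sigma_z}{\sigma_y}\, y \tag{1}$$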



where $r_{yz}$ is the correlation coefficient between the y- and z-characters, and 

 similar meanings are to be given to the other r's, as indicated by the 

 subscripts. This equation gives the mean value of the z-arrays corresponding to 

 given x and y, if they can be given by a linear function. If they cannot be 

 accurately given by a linear function, this equation must merely be looked 

 upon as giving a first approximation. 
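
A minimal sketch of how the coefficients of this first-degree regression of z on x and y might be computed from the three correlation coefficients and the standard deviations. The function names (`deviations`, `sd`, `corr`, `regression_plane`) and the stature figures are hypothetical, introduced only for illustration.

```python
import math

def deviations(values):
    """Deviations of the measurements from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def sd(dev):
    """Standard deviation computed from deviations (divisor n)."""
    return math.sqrt(sum(d * d for d in dev) / len(dev))

def corr(a, b):
    """Correlation coefficient of two series of deviations."""
    return sum(p * q for p, q in zip(a, b)) / (len(a) * sd(a) * sd(b))

def regression_plane(x, y, z):
    """Coefficients (on x, on y) of the plane z = c_x * x + c_y * y.

    All three inputs must already be deviations from their means.
    """
    sx, sy, sz = sd(x), sd(y), sd(z)
    rxy, rxz, ryz = corr(x, y), corr(x, z), corr(y, z)
    denom = 1.0 - rxy ** 2
    c_x = (rxz - rxy * ryz) / denom * sz / sx
    c_y = (ryz - rxz * rxy) / denom * sz / sy
    return c_x, c_y

# Made-up statures in inches: fathers, mothers, sons.
fathers = deviations([66.0, 68.0, 70.0, 72.0, 69.0, 67.0])
mothers = deviations([62.0, 65.0, 63.0, 66.0, 64.0, 61.0])
sons = deviations([67.0, 69.5, 69.0, 72.0, 69.0, 66.5])

c_x, c_y = regression_plane(fathers, mothers, sons)
# Most probable deviation of a son's stature when the father is
# 2 inches and the mother 1 inch above their respective means.
estimate = c_x * 2.0 + c_y * 1.0
print(c_x, c_y, estimate)
```

The same coefficients would result from fitting the plane by least squares; writing them in terms of the r's simply makes the connection with the two-variable case visible.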



Standard deviation of arrays. If the equation (1) be used to estimate the 

 value of the z-character corresponding to a selected x and y, we have the 

 square of standard deviation of each z-array about this estimated value 

 given by the expression 



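$$\frac{1}{n}\sum \left(z - b_1 x - b_2 y\right)^{2}$$



as an average value, where the summation extends to all the observations; 

 and 



$$b_1 = \frac{r_{xz} - r_{xy}\, r_{yz}}{1 - r_{xy}^{2}}\,\frac{\sigma_z}{\sigma_x}, \quad \text{and} \quad b_2 = \frac{r_{yz} - r_{xz}\, r_{xy}}{1 - r_{xy}^{2}}\,\frac{\sigma_z}{\sigma_y}.$$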
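
As a rough numerical check, the average squared deviation of the z's about the value estimated from equation (1) can be computed directly and compared with a closed form in the correlation coefficients. The closed form used below is the standard identity for the residual variance of a two-predictor regression. It is not quoted from this text, and the function name and sample deviations are likewise assumptions.

```python
import math

def mean_square_about_plane(x, y, z):
    """Return (direct, closed_form) mean squared deviation of z about the plane.

    x, y, z are deviations from their respective means.
    """
    n = len(z)
    sxx = sum(a * a for a in x) / n
    syy = sum(b * b for b in y) / n
    szz = sum(c * c for c in z) / n
    rxy = sum(a * b for a, b in zip(x, y)) / n / math.sqrt(sxx * syy)
    rxz = sum(a * c for a, c in zip(x, z)) / n / math.sqrt(sxx * szz)
    ryz = sum(b * c for b, c in zip(y, z)) / n / math.sqrt(syy * szz)
    denom = 1.0 - rxy ** 2
    # Coefficients of x and y in the first-degree regression of z.
    coef_x = (rxz - rxy * ryz) / denom * math.sqrt(szz / sxx)
    coef_y = (ryz - rxz * rxy) / denom * math.sqrt(szz / syy)
    # Direct average of squared deviations over all observations.
    direct = sum((c - coef_x * a - coef_y * b) ** 2
                 for a, b, c in zip(x, y, z)) / n
    # Assumed standard identity, not taken from the text.
    closed = szz * (1.0 - rxy ** 2 - rxz ** 2 - ryz ** 2
                    + 2.0 * rxy * rxz * ryz) / denom
    return direct, closed

# Made-up deviations, each series summing to zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [1.0, -2.0, 0.0, -1.0, 2.0]
z = [-1.5, -1.0, 0.5, 0.0, 2.0]

direct, closed = mean_square_about_plane(x, y, z)
print(direct, closed)
```

The two numbers should agree up to rounding, and both vanish when every observation lies exactly on the plane.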



