22 THE MATHEMATICAL TREATMENT OF DATA 



If we had wanted to fit a more complicated curve, the procedure would 

 have been entirely similar. For example, suppose we had decided to try 

 something quite complicated, such as 



y = a sin bx + c log x. 



Aside from the purely mechanical problem of carrying out the calculus

 operations, nothing changes. We find expressions for

 a, b, and c so as to minimize the sum of the squares of the deviations 

 from this expression, thereby obtaining three equations for the three 

 parameters. We could, in fact, go as high as five parameters with these 

 data, and then we would be able to fit the data exactly at the experimental

 points, although the curve might fluctuate wildly between those

 points. 
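
The three equations mentioned above can be written out as a sketch. The symbols S (the sum of squared deviations), r_i (the deviation of the i-th point), and the index i over the data points are notation assumed here for illustration and may differ from the notation used elsewhere in the text.

```latex
% Sum of squared deviations from the trial curve y = a sin bx + c log x
\begin{align*}
S(a, b, c) &= \sum_i r_i^2,
  \qquad r_i = y_i - a \sin b x_i - c \log x_i \\[4pt]
% Setting each partial derivative to zero gives one equation per parameter
\frac{\partial S}{\partial a} &= -2 \sum_i r_i \,\sin b x_i = 0 \\
\frac{\partial S}{\partial b} &= -2 \sum_i r_i \, a\, x_i \cos b x_i = 0 \\
\frac{\partial S}{\partial c} &= -2 \sum_i r_i \,\log x_i = 0
\end{align*}
```

The equations for a and c are linear in those two parameters once b is fixed, but b enters through the sine and cosine, so in practice the set would usually have to be solved numerically.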



There are other methods of obtaining curves to fit data, but the curves 

 obtained by the least-squares principle are the generally accepted ones. 

 Further, there are many subtleties in the least-squares approach which 

 the student can find in the standard texts. 



2. Correlation 



We can say a few things about the concept of correlation. Here we 

 ask the question of the extent to which variations in one variable are 

 accompanied by a parallel variation in another variable. It must be

 strongly emphasized that a correlation carries not the slightest implication

 of a direct causal connection between the two variables. The pedagogically

 favored illustration of this point is the finding of an almost perfect proportionality

 between teachers' salaries and liquor consumption. The pedagogue then 

 declares that it is manifestly ridiculous to deduce a causative relation,

 namely that the teachers are drinking all that liquor. If we study

 the details of the situation, we discover that teachers' salaries rise during 

 prosperous times and during such times the entire population has extra 

 money, some of which is spent on liquor. 



In the approach to correlation that we will take, we look at the extent

 to which points cluster about the presumed line expressing the possible 

 correlation, and compare it with the clustering around the mean value 

 of all the points. Consider Fig. 9. 



If we call $\bar{y}$ the average value of $y$, then the variation around the

 average is $\sum (y - \bar{y})^2$.



The variation around a proposed line (to be found theoretically) is 



$\sum (y - y_{\mathrm{th}})^2$.
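
The two variations just defined can be computed directly, as in the minimal Python sketch below. The data points, the function names, and the slope and intercept of the proposed line are all hypothetical, chosen only for illustration; they are not the points of Fig. 9.

```python
def variation_about_mean(ys):
    """Sum of squared deviations of the y values from their average."""
    y_bar = sum(ys) / len(ys)
    return sum((y - y_bar) ** 2 for y in ys)


def variation_about_line(xs, ys, slope, intercept):
    """Sum of squared deviations of the y values from a proposed straight line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, not taken from the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

# Hypothetical proposed line y_th = 0.6 x + 2.2 (assumed for illustration;
# it happens to be the least-squares line for these five points).
slope, intercept = 0.6, 2.2

about_mean = variation_about_mean(ys)
about_line = variation_about_line(xs, ys, slope, intercept)
print("variation about the mean:", about_mean)
print("variation about the line:", about_line)
```

For these made-up points the variation about the mean is 6.0 and the variation about the line is roughly 2.4, so the points cluster more tightly about the line than about their average, which is the comparison the text describes.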



