CHAPTER 7 



Linear Regression and Correlation



It often is advantageous to consider two types of numerical measurements simultaneously because they are related to each other. For example, the following table records the mean monthly temperatures from January to July at Topeka, Kansas, along with the month of the year:



Month of the Year:                       Jan.  Feb.  Mar.  Apr.  May   June  July
Mean Temperature (degrees Fahrenheit):   38.0  41.7  54.0  66.0  74.4  83.8  88.7



If the month to which each temperature applies were to be ignored, these temperatures simply would be seven numbers which might fall in the following random order (obtained by drawing them at random): 88.7, 54.0, 66.0, 38.0, 83.8, 74.4, and 41.7. In this form the numbers seem to be quite variable about their arithmetic mean, 63.8°F. However, when considered in conjunction with the month as the second variable, these temperatures follow an orderly pattern. This point is illustrated graphically in Figures 7.01A and B, in which temperatures are first plotted against the random order in which they were drawn, and then against the month to which they apply.
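The arithmetic behind the mean of 63.8°F can be checked with a short sketch such as the following. It is only an illustration: the variable names are hypothetical, and the sum of squared deviations is an assumed yardstick of variability rather than a measure introduced in the text so far.

```python
# Mean monthly temperatures (degrees Fahrenheit) at Topeka, Kansas,
# January through July, copied from the table above.
temps = [38.0, 41.7, 54.0, 66.0, 74.4, 83.8, 88.7]

# Arithmetic mean of the seven temperatures.
mean = sum(temps) / len(temps)

# Deviation of each temperature from the arithmetic mean.
deviations = [t - mean for t in temps]

# Sum of squared deviations about the mean (an assumed measure of
# variability, chosen here only for illustration).
ss_about_mean = sum(d * d for d in deviations)

print(f"mean = {mean:.1f}")
print("deviations:", [round(d, 1) for d in deviations])
print(f"sum of squares about the mean = {ss_about_mean:.2f}")
```

The deviations range from roughly 26 degrees below the mean to roughly 25 degrees above it, which is the variability the paragraph above describes. The order in which the numbers are listed does not affect any of these quantities.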



Figure 7.01A merely re-emphasizes the remarks made above about the excessive variability about the mean, 63.8, and suggests that such an average would be of doubtful utility because the temperatures are too inconsistent. But it appears from Figure 7.01B that the mean temperatures for the first seven months of the year increased in quite an orderly manner from month to month, with little deviation from a linear upward trend. Hence, a better analysis of these data can be obtained by taking proper account of the second variable, time.
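The gain from taking account of time can be sketched numerically. The example below assumes that the months are coded 1 through 7 and that the trend line is fitted by ordinary least squares; both are assumptions made for illustration, since the text has not yet specified how a line is to be fitted, and all names are hypothetical.

```python
# Months coded 1 (January) through 7 (July): an assumed coding.
months = [1, 2, 3, 4, 5, 6, 7]
temps = [38.0, 41.7, 54.0, 66.0, 74.4, 83.8, 88.7]

n = len(temps)
mean_x = sum(months) / n
mean_y = sum(temps) / n

# Sums of squares and products about the means.
sxx = sum((x - mean_x) ** 2 for x in months)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(months, temps))
ss_about_mean = sum((y - mean_y) ** 2 for y in temps)

# Least-squares slope and intercept of temperature on month.
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# Sum of squared deviations of the temperatures from the fitted line.
ss_about_line = sum(
    (y - (intercept + slope * x)) ** 2 for x, y in zip(months, temps)
)

print(f"fitted line: temperature = {intercept:.2f} + {slope:.2f} * month")
print(f"sum of squares about the mean = {ss_about_mean:.2f}")
print(f"sum of squares about the line = {ss_about_line:.2f}")
```

Under these assumptions the fitted line rises by about 9.2 degrees per month, and the sum of squares about the line is only a small fraction of the sum of squares about the mean, which agrees with the impression given by Figure 7.01B.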



A straight line is drawn into Figure 7.01A, 63.8 units above the horizontal axis, to represent the arithmetic mean of the temperatures whose individual magnitudes are indicated by the ordinates of






