206 LINEAR REGRESSION AND CORRELATION Ch. 7 



For example, the slope of the above sample estimate of the linear regression
line is calculated to be b = 0.6072, whereas the true slope is known to be
β = 0.5000. In actual practice only b is known, and it is necessary to measure
its reliability as an estimate of β. This will be done later when the
necessary techniques have been discussed; but it can be stated here that if
the sample has been taken with the X's fixed, as suggested in Figure 7.21, so
that there is no sampling error in X or in x, then b as defined is an unbiased
estimate of the parameter β.
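
The unbiasedness claim can be illustrated by a small simulation. The following is only a sketch under stated assumptions: the fixed X values, the error standard deviation, and the normal error distribution are hypothetical choices that do not come from Figure 7.21, and the intercept 11.0 is inferred from the true averages quoted in this section (13.5 at X = 5 and 15 at X = 8). The slope is computed with the usual least-squares formula, with x and y measured as deviations from their means.

```python
import random


def slope(xs, ys):
    """Least-squares slope: sum(x*y) / sum(x*x), x and y being deviations from their means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


BETA = 0.5                         # true slope, as stated in the text
ALPHA = 11.0                       # inferred: true average is 13.5 at X = 5 with slope 0.5
SIGMA = 1.0                        # ASSUMPTION: error standard deviation (hypothetical)
FIXED_X = [4, 5, 6, 7, 8, 9, 10]   # ASSUMPTION: X values held fixed in every sample


def simulate_mean_slope(n_samples=2000, seed=1):
    """Average the sample slope b over many samples drawn at the same fixed X's."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        ys = [ALPHA + BETA * x + rng.gauss(0.0, SIGMA) for x in FIXED_X]
        total += slope(FIXED_X, ys)
    return total / n_samples


mean_b = simulate_mean_slope()
print(f"average b over repeated samples: {mean_b:.4f} (true slope {BETA})")
```

Any single sample may give a slope as far from 0.5 as the 0.6072 found above, but the average of b over many such samples should settle close to β.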



The value Ŷ which is obtained from formula 7.24 by substituting a particular
value for X is described as the estimated average Y for that X. For example,
if X is taken as 5, Ŷ = 0.6072(5) + 10.04 = 13.1, approximately. By reference
to Figure 7.21 we learn that this estimate is somewhat low because the true
average Y for turkeys weighing 5 pounds at 16 weeks of age is 13.5 pounds. If
X is taken as 8, Ŷ = 14.9 pounds, which is nearer to the true average Y of 15
pounds than was the estimate obtained when X = 5 and the true average Y was
13.5 pounds. It will be seen in a later discussion that greater accuracy in
estimating the true average Y is to be expected for X's near the mean X. One
reason is that there often are more sample data near the mean; another is
that errors in estimating β cause the ends of the trend line to be swung
farther from the true position than is the middle of the line. In the above
example the slope was b = 0.6072 instead of β = 0.5000; hence the line
determined from the sample is too steep and therefore too low at the
left-hand end. This appears to be the major reason why the estimate of the
true average Y for X = 5 was too small. Of course, the general height of the
sample line must be in error to some extent, and this also contributes to the
inaccuracy of any estimate made from the sample trend line.
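
The arithmetic of this comparison can be checked with a short sketch. The sample line uses the coefficients quoted above. The true line is an inference, not a formula given in the text: a slope of 0.5 together with true averages of 13.5 at X = 5 and 15 at X = 8 implies an intercept of 11.0. The function names are illustrative only.

```python
def estimated_average(x):
    """Sample regression line from the text: 0.6072 X + 10.04."""
    return 0.6072 * x + 10.04


def true_average(x):
    """True line inferred from the text: slope 0.5, passing through (5, 13.5)."""
    return 0.5 * x + 11.0


for x in (5, 8):
    est = estimated_average(x)
    tru = true_average(x)
    print(f"X = {x}: estimate {est:.1f}, true average {tru:.1f}, error {est - tru:+.2f}")

# X at which the two lines cross; the sample line lies below the true line
# to the left of this point and above it to the right.
crossing = (11.0 - 10.04) / (0.6072 - 0.5)
print(f"lines cross near X = {crossing:.2f}")
```

Because the sample line is steeper than the true line, the error grows with the distance of X from the crossing point, which is why the estimate at X = 5 is farther off than the estimate at X = 8.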



The method described for obtaining the straight line which fits a linear
trend best is called the method of least squares because it makes the sum of
squares of the vertical deviations of the points of the scatter diagram from
the regression line smaller than it would be for any other straight line.
Table 7.23 has been prepared to illustrate specifically the meaning of this
minimization. Columns 6, 7, and 8 were obtained from the equation given over
the right-hand side of the table. This equation represents a straight line
which appears to the eye to fit the trend of the scatter diagram about as
well as the line obtained by the method of least squares, as can be verified
from Figure 7.22, which shows both lines.
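
The minimizing property can be sketched numerically. The data below are hypothetical and are NOT the values of Table 7.23; the "by eye" line is likewise an arbitrary illustrative choice. The sketch fits the least-squares line and compares its sum of squared vertical deviations with that of the other line.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the line a + b*X minimizing the sum of squared vertical deviations."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def sum_sq_dev(xs, ys, a, b):
    """Sum of squared vertical deviations of the points from the line a + b*X."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# HYPOTHETICAL data for illustration only (not taken from Table 7.23).
xs = [4, 5, 6, 7, 8, 9, 10]
ys = [12.6, 13.9, 13.5, 14.8, 15.4, 15.1, 16.3]

a, b = least_squares_line(xs, ys)
ss_least_squares = sum_sq_dev(xs, ys, a, b)

# HYPOTHETICAL alternative line, chosen "by eye".
a_eye, b_eye = 11.0, 0.5
ss_by_eye = sum_sq_dev(xs, ys, a_eye, b_eye)

print(f"least-squares line: Y = {b:.4f} X + {a:.4f}, sum of squares = {ss_least_squares:.4f}")
print(f"line fitted by eye: Y = {b_eye:.4f} X + {a_eye:.4f}, sum of squares = {ss_by_eye:.4f}")
```

Changing a_eye and b_eye to any other values should never produce a sum of squares below that of the least-squares line, although a well-chosen line can come close.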



It should be noted that the total of the fifth column of Table 7.23 is less
than that of the eighth column. This will always be true no matter which
straight line is used to obtain Yj as long as the equation is



