Sec. 7.2 DETERMINING LINEAR TREND LINES 205 



As stated above, $a$ and $b$ will be calculated in terms of the collective amount by which a line misses the points of the scatter diagram. The $a$ and the $b$ are also considered estimates of the population parameters $\alpha$ and $\beta$ in the true linear regression equation,



(7.22) $Y' = \alpha + \beta(X - \mu)$,



where $\mu$ = the true mean of the $X$'s.



For the reasons given earlier for using $\sum (X - \bar{x})^2$ to measure variation about the mean instead of either $\sum (X - \bar{x})$ or $\sum |X - \bar{x}|$, it is found advisable to use $\sum (Y - Y')^2$ to measure the scatter of the $Y$'s about the trend line. Therefore, the best-fitting straight line has been chosen as the one for which $\sum (Y - Y')^2$ is as small as possible. This choice makes the standard deviation about the trend line as small as possible. The mathematical process of achieving this goal produces formulas from which $a$ and $b$ can be computed. When these values are substituted into formula 7.21, a specific equation of a regression line is obtained. This line has the property that the standard deviation about it is as small as it can be made for any straight line. In other words, the variability of the $Y$'s has been reduced as much as it can be, given their linear trend with $X$.
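The least-squares criterion can be seen at work in a small brute-force sketch. The Python below is a hypothetical illustration, not the book's procedure: it scores every candidate line $Y' = a + b(X - \bar{x})$ on a coarse grid by its sum of squared misses $\sum (Y - Y')^2$ and keeps the line with the smallest total. The five data points, the grid ranges, and the helper names `sum_squared_misses` and `best_line_by_grid` are assumptions made only for this example.

```python
def sum_squared_misses(a, b, xs, ys):
    """Hypothetical helper: Sum of (Y - Y')**2 for the line Y' = a + b*(X - x_bar)."""
    x_bar = sum(xs) / len(xs)
    return sum((y - (a + b * (x - x_bar))) ** 2 for x, y in zip(xs, ys))


def best_line_by_grid(xs, ys, a_values, b_values):
    """Hypothetical helper: return the grid pair (a, b) with the smallest sum of squared misses."""
    best = None
    for a in a_values:
        for b in b_values:
            score = sum_squared_misses(a, b, xs, ys)
            if best is None or score < best[0]:
                best = (score, a, b)
    return best[1], best[2]


# Hypothetical sample, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Candidate a values 3.00, 3.01, ..., 5.00 and b values 0.00, 0.01, ..., 1.50.
a_grid = [3 + i / 100 for i in range(201)]
b_grid = [j / 100 for j in range(151)]

best_a, best_b = best_line_by_grid(xs, ys, a_grid, b_grid)
print(best_a, best_b)
```

For these points the search settles on $a = 4.0$ and $b = 0.6$, the same values the formulas for $a$ and $b$ give directly ($\bar{y} = 4$ and $\sum(xy)/\sum(x^2) = 6/10$), with no search required.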



The formulas for $a$ and $b$ are as follows:

(7.23) $a = \bar{y}$, and $b = \dfrac{\sum (X - \bar{x})(Y - \bar{y})}{\sum (X - \bar{x})^2} = \dfrac{\sum (xy)}{\sum (x^2)}$,



where $\bar{y}$ = the mean of the $Y$'s in the sample and $y$ = the deviation of a $Y$ from the mean, $\bar{y}$. The quantity $\sum (xy)$, which the student has not met before in this book, is

$(X_1 - \bar{x})(Y_1 - \bar{y}) + (X_2 - \bar{x})(Y_2 - \bar{y}) + \cdots + (X_n - \bar{x})(Y_n - \bar{y}) = x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.*
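Formula 7.23 translates almost line for line into code. The Python sketch below computes the deviations $x$ and $y$, accumulates $\sum(xy)$ and $\sum(x^2)$, and returns $a$, $b$, and $\bar{x}$. The function name `regression_coefficients` and the four data points are hypothetical, chosen only so that the arithmetic is easy to follow by hand.

```python
def regression_coefficients(xs, ys):
    """Hypothetical helper: return (a, b, x_bar) for Y' = a + b*(X - x_bar), per formula 7.23."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviations from the means: x = X - x_bar, y = Y - y_bar.
    x_dev = [x - x_bar for x in xs]
    y_dev = [y - y_bar for y in ys]
    # Sum(xy) = x1*y1 + x2*y2 + ... + xn*yn, and Sum(x^2).
    sum_xy = sum(dx * dy for dx, dy in zip(x_dev, y_dev))
    sum_x2 = sum(dx * dx for dx in x_dev)
    a = y_bar
    b = sum_xy / sum_x2
    return a, b, x_bar


# Hypothetical sample, for illustration only.
xs = [2, 4, 6, 8]
ys = [3, 7, 5, 9]
a, b, x_bar = regression_coefficients(xs, ys)
print(a, b, x_bar)
```

Here $\bar{x} = 5$, $\sum(xy) = 16$, and $\sum(x^2) = 20$, so $a = 6$ and $b = 0.8$, giving $Y' = 6 + 0.8(X - 5) = 0.8X + 2$.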



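For the data of Table 7.22, $a = \sum(Y)/n = 439.0/30 = 14.63$, $b = \sum(xy)/\sum(x^2) = 23.0200/37.912 = 0.6072$, and $\bar{x} = \sum(X)/n = 226.8/30 = 7.56$. Therefore, since $\bar{y} + b(X - \bar{x}) = bX + (\bar{y} - b\bar{x})$ and $\bar{y} - b\bar{x} = 14.63 - 0.6072(7.56) = 10.04$,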



(7.24) $Y' = 0.6072X + 10.04$.
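The arithmetic leading to formula 7.24 can be checked from the totals quoted above; the raw observations of Table 7.22 are not needed for this step. A minimal Python sketch:

```python
# Totals quoted in the text for the data of Table 7.22 (raw data not reproduced here).
n = 30
sum_Y = 439.0
sum_X = 226.8
sum_xy = 23.0200  # Sum(xy): sum of products of deviations
sum_x2 = 37.912   # Sum(x^2): sum of squared X deviations

a = sum_Y / n              # a = y_bar
b = sum_xy / sum_x2        # b = Sum(xy) / Sum(x^2)
x_bar = sum_X / n
intercept = a - b * x_bar  # constant term in Y' = bX + (y_bar - b*x_bar)

print(f"a = {a:.2f}, b = {b:.4f}, x_bar = {x_bar:.2f}")
print(f"Y' = {b:.4f}X + {intercept:.2f}")
```

Carried at full precision the constant term is about 10.043, which still rounds to the 10.04 of formula 7.24.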



Students in a statistics course are in an unusually fortunate position because, when they take samples from laboratory populations, they can readily see how well, or how poorly, certain features of their samples agree with the corresponding features of the populations being sampled.



* Experience shows that beginners in this field tend to think that $\sum(xy) = \sum(x) \cdot \sum(y)$. If the reader will recall that $\sum(x) = \sum(X - \bar{x}) = 0$ for any set of data (and likewise for $\sum(y)$), it becomes apparent that $\sum(xy)$ is not the same as $\sum(x) \cdot \sum(y)$, or it would always be zero. This obviously is untrue.
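The footnote's argument is easy to confirm numerically. In the hypothetical sample below (four points chosen only for illustration, with a hypothetical helper `deviation_sums`), the deviations sum to zero for both variables, so $\sum(x) \cdot \sum(y) = 0$, while $\sum(xy)$ does not vanish.

```python
def deviation_sums(xs, ys):
    """Hypothetical helper: return (Sum(x), Sum(y), Sum(xy)), x and y being deviations from the means."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    x_dev = [x - x_bar for x in xs]
    y_dev = [y - y_bar for y in ys]
    return sum(x_dev), sum(y_dev), sum(dx * dy for dx, dy in zip(x_dev, y_dev))


# Hypothetical sample, for illustration only.
xs = [1, 3, 4, 8]
ys = [2, 5, 4, 9]
sum_x, sum_y, sum_xy = deviation_sums(xs, ys)
print(sum_x, sum_y, sum_x * sum_y, sum_xy)
```

It prints `0.0 0.0 0.0 25.0`: the product of the two sums is zero, but $\sum(xy) = 25$.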



