Sec. 7.3 STANDARD DEVIATION ABOUT TREND LINE 213 



lar $X = X_i$. It is true that the variance of $\bar{y} + x_i b$ is the sum of the variances of the two terms, but this will not be proved here.
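The two variances can simply be added because, under the usual least-squares assumptions, $\bar{y}$ and $b$ are uncorrelated. The Python sketch below checks this numerically by simulation. Its assumptions are:

- independent normal errors;
- a hypothetical true line, error standard deviation and set of $X$ values, chosen only for illustration;
- a function name, `simulate_fitted_value_variance`, that is likewise hypothetical.

```python
import random
import statistics


def simulate_fitted_value_variance(xs, alpha, beta, sigma, x0, reps=20000, seed=1):
    """Monte Carlo check that Var(ybar + x_i * b) = Var(ybar) + Var(x_i * b).

    Hypothetical setup: Y = alpha + beta * X + e, with e ~ N(0, sigma^2)
    independent at each fixed X in xs; x_i = x0 - mean(xs).
    """
    rng = random.Random(seed)                 # seeded so the run is repeatable
    n = len(xs)
    xbar = statistics.fmean(xs)
    sxx = sum((x - xbar) ** 2 for x in xs)    # sum of x^2, with x = X - Xbar
    xi = x0 - xbar                            # deviation of the chosen X
    ybars, xibs, fitted = [], [], []
    for _ in range(reps):
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        ybar = statistics.fmean(ys)
        b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
        ybars.append(ybar)
        xibs.append(xi * b)
        fitted.append(ybar + xi * b)
    return {
        "var_ybar": statistics.variance(ybars),
        "var_xib": statistics.variance(xibs),
        "var_fitted": statistics.variance(fitted),
        # sigma^2 * (1/n + x_i^2 / sum x^2): the two variances added together
        "theory": sigma ** 2 * (1 / n + xi ** 2 / sxx),
    }
```

With hypothetical ages `range(20, 70, 5)` and `x0 = 25`, the simulated variance of $\bar{y} + x_i b$ should agree closely with `var_ybar + var_xib` and with the theoretical value.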

 It can be shown that the ratio

$$t = \frac{\hat{Y} - \mu_{y\cdot x}}{s_{\hat{Y}}} \qquad (7.38)$$

where $\mu_{y\cdot x}$ is the true average $Y$ for the given $X$, follows the $t$-distribution with $n - 2$ degrees of freedom. This fact makes it possible to place a confidence interval on $\mu_{y\cdot x}$ with any appropriate confidence coefficient.
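Here is a minimal Python sketch of how (7.38) is used. It fits the least-squares line, forms $\hat{Y}$ at a chosen $X$, and returns the interval $\hat{Y} \pm t\,s_{\hat{Y}}$.

- **Variance of the mean.** The sketch assumes the two-component variance $s_{\hat{Y}}^2 = s_{y\cdot x}^2\left(1/n + (X-\bar{X})^2/\sum x^2\right)$ for the estimated mean.
- **Critical value.** `t_crit` must be read from a $t$ table with $n - 2$ degrees of freedom, for example 3.182 for 95% confidence with 3 degrees of freedom.
- **Names and data.** The function names and the small data set are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares line Y_hat = ybar + b * (X - xbar); returns its ingredients."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)  # sum of x^2, with x = X - xbar
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    # s_{y.x}^2: residual variance about the line, on n - 2 degrees of freedom
    s_yx2 = sum((y - ybar - b * (x - xbar)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    return n, xbar, ybar, sxx, b, s_yx2


def mean_ci(xs, ys, x0, t_crit):
    """Confidence interval for the true mean mu_{y.x} at X = x0 (hypothetical helper)."""
    n, xbar, ybar, sxx, b, s_yx2 = fit_line(xs, ys)
    y_hat = ybar + b * (x0 - xbar)
    # Two sources of variation: the sample mean and the sample regression coefficient
    s_yhat = math.sqrt(s_yx2 * (1 / n + (x0 - xbar) ** 2 / sxx))
    return y_hat - t_crit * s_yhat, y_hat + t_crit * s_yhat
```

For instance, `mean_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 3, 3.182)` gives roughly (2.727, 5.273). At $X = \bar{X}$ the slope term vanishes, so $s_{\hat{Y}} = \sqrt{0.8/5} = 0.4$.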



The meaning of $\mu_{y\cdot x}$ can be made clearer by reference to Figure 7.21. For each particular $X$ there is a frequency distribution of the corresponding $Y$'s. This distribution of $Y$'s has a true arithmetic mean, which is the $\mu_{y\cdot x}$ for that $X$.



If we wish to make an interval estimate that applies to an individual rather than to a group mean, we must take account of the greater variation exhibited by individuals as compared with the group.

For example, suppose that a study has been made of the relationship between the ages of Kansas females and their basal metabolism rates, expressed in calories per square meter of surface area per hour. It is supposed that the age interval chosen is such that a linear relationship exists between these two variables. It is also supposed that the least-squares equation for $\hat{Y}$ has been obtained from a sample. Suppose, furthermore, that the equipment needed to determine the basal rate is not available in a certain area. A Kansas woman 25 years of age there would like, as a matter of interest, an estimate of her basal metabolism rate.

The best point estimate is the $\hat{Y}$ calculated for $X = 25$. An interval estimate, however, is more useful in the present problem. To construct one, we must recognize that this woman is not regarded as an average person representing all Kansas females 25 years of age. She is one particular person who wants an estimate of her own basal rate.

In this circumstance the variance of $\hat{Y}$ used earlier in this section is not correct, because it includes only two sources of variation: one from the sample mean and one from the sample regression coefficient. In the present problem a third source must be included, namely, individual variation about the mean. Once the particular $X$ has been taken into account, this additional variance is just $s_{y\cdot x}^2$. Again it turns out that this variance can be added to the other two components, so we obtain the following formula for the variance of $\hat{Y}$ for an individual:



$$s_{\hat{Y}\,(\text{indiv.})}^2 = s_{y\cdot x}^2\left[1 + \frac{1}{n} + \frac{(X - \bar{X})^2}{\sum x^2}\right]$$
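The Python sketch below applies this formula to show how much wider an interval for one individual is than an interval for the mean.

- **Data.** The data are hypothetical.
- **Critical value.** `t_crit` is assumed to come from a $t$ table with $n - 2$ degrees of freedom.
- **Naming.** The function name `individual_prediction` is chosen only for illustration.

```python
import math


def individual_prediction(xs, ys, x0, t_crit):
    """Return (Y_hat, s_indiv, (lower, upper)) for one individual at X = x0."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)  # sum of x^2, with x = X - xbar
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    # s_{y.x}^2: residual variance about the fitted line, n - 2 degrees of freedom
    s_yx2 = sum((y - ybar - b * (x - xbar)) ** 2 for x, y in zip(xs, ys)) / (n - 2)
    y_hat = ybar + b * (x0 - xbar)
    # Three components: individual scatter (1), the mean (1/n), the slope term
    s_ind = math.sqrt(s_yx2 * (1 + 1 / n + (x0 - xbar) ** 2 / sxx))
    return y_hat, s_ind, (y_hat - t_crit * s_ind, y_hat + t_crit * s_ind)
```

Take the data $X = 1, \dots, 5$ and $Y = 2, 4, 5, 4, 5$, evaluated at $X = 3$ with `t_crit = 3.182`. Then $s^2 = 0.8(1 + 1/5) = 0.96$, so the half-width is about 3.118. The half-width of the interval for the mean alone is about 1.273.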



