    u = y - b = B(x - a)

    y = (b - aB) + Bx



NOTE: The proper estimate of the regression coefficient (B) now forces the
prediction through the point (a, b), as desired. With this procedure, the
correct regression coefficient can be obtained from any familiar computational
routine. The second most frequently reported output of regression programs,
the correlation coefficient (r), is also the correct, unbiased estimator for
Model II.
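
The note above can be checked numerically. The following is a minimal sketch, not the report's own program. It ASSUMES that the extended data set consists of the real points plus their reflections through (a, b), an assumption consistent with the doubled sums of squares described in this section. The data values and the function names (ols, extend, slope_through_point) are hypothetical.

```python
def ols(xs, ys):
    """Ordinary least squares with a free intercept; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extend(xs, ys, a, b):
    """ASSUMPTION: extended data = real points plus reflections through (a, b)."""
    return xs + [2 * a - x for x in xs], ys + [2 * b - y for y in ys]


def slope_through_point(xs, ys, a, b):
    """Least-squares slope of a line forced through (a, b), computed directly."""
    num = sum((x - a) * (y - b) for x, y in zip(xs, ys))
    den = sum((x - a) ** 2 for x in xs)
    return num / den


# Hypothetical sample and constraint point, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
a, b = 0.5, 1.0

ext_x, ext_y = extend(xs, ys, a, b)
intercept, slope = ols(ext_x, ext_y)          # unconstrained fit, extended data
direct = slope_through_point(xs, ys, a, b)    # constrained fit, real data

print("slope from extended data:", slope)
print("slope computed directly :", direct)
print("fitted intercept        :", intercept)
print("b - a*B                 :", b - a * slope)
```

Under the stated assumption, the extended data have mean (a, b), so the unconstrained fit passes through that point and the two slopes agree up to rounding.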



If the regression program provides additional output, corrections may be
necessary before those values are adopted for the real data set. The estimate
of the residual variance will be correct for simple regression (one
independent variable) and can be easily adjusted for multiple regression (see
Table 1). Any sums of squares, cross products, and F-values produced by the
program will be exactly twice the correct values. The standard error of the
estimated slope will be too small by a factor of √2. Therefore, the t-value
for testing the zero-slope hypothesis will be too large by the same factor.
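
The corrections for simple regression can be illustrated as follows. This is a sketch only. It ASSUMES the extended data are the real points plus their reflections through (a, b), and it takes the standard-error factor to be the square root of 2, which follows from the doubled sums of squares. The data and the names (program_output, corrected, direct_model_2) are hypothetical. Multiple regression needs the further adjustment given in Table 1 and is not covered here.

```python
import math


def program_output(xs, ys):
    """What a standard simple-regression program would report for (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    ss_reg = slope * sxy
    ss_res = syy - ss_reg
    s2 = ss_res / (n - 2)               # residual variance as the program sees it
    se = math.sqrt(s2 / sxx)
    return {"slope": slope, "ss_reg": ss_reg, "ss_res": ss_res,
            "s2": s2, "se_slope": se, "t": slope / se, "F": ss_reg / s2}


def corrected(out):
    """Apply the simple-regression corrections described in the text."""
    root2 = math.sqrt(2.0)
    return {"slope": out["slope"],              # already correct
            "s2": out["s2"],                    # already correct (one x variable)
            "ss_reg": out["ss_reg"] / 2.0,      # sums of squares are doubled
            "ss_res": out["ss_res"] / 2.0,
            "F": out["F"] / 2.0,                # F is doubled
            "se_slope": out["se_slope"] * root2,  # too small by sqrt(2)
            "t": out["t"] / root2}              # too large by sqrt(2)


def direct_model_2(xs, ys, a, b):
    """Model II quantities computed directly, with n - 1 residual d.f."""
    u = [x - a for x in xs]
    v = [y - b for y in ys]
    suu = sum(ui * ui for ui in u)
    slope = sum(ui * vi for ui, vi in zip(u, v)) / suu
    ss_res = sum((vi - slope * ui) ** 2 for ui, vi in zip(u, v))
    s2 = ss_res / (len(xs) - 1)
    se = math.sqrt(s2 / suu)
    return {"slope": slope, "ss_res": ss_res, "s2": s2,
            "se_slope": se, "t": slope / se, "F": (slope / se) ** 2}


# Hypothetical sample and constraint point, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
a, b = 0.0, 1.0

# ASSUMPTION: extended data = real points plus reflections through (a, b).
ext_x = xs + [2 * a - x for x in xs]
ext_y = ys + [2 * b - y for y in ys]

fixed = corrected(program_output(ext_x, ext_y))
truth = direct_model_2(xs, ys, a, b)
for key in truth:
    print(key, fixed[key], truth[key])
```

The residual variance needs no correction here because the program divides a doubled residual sum of squares by 2n - 2, which equals the correct sum of squares divided by n - 1.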



Table 1 indicates the corrections for most of the elements produced by
various regression programs. However, employing the described extended data
procedures does not require consideration of any part of the output beyond that
used in the standard unconstrained approach.



IV. SELECTING BETWEEN MODELS I AND II 



If either the true or mean value (whichever interpretation fits the
situation) of the dependent variable (Y) is unknown for all values of the
independent variable in the range of concern, then the customary model (I) may
be appropriate. However, if the postulated physical relationship between X and
Y dictates constraint through any point (a, b) and the relationship is linear
from the maximum observed x to x = a, then Model II should be used. To
proceed with the customary evaluation of Model I would be equivalent to
ignoring what is already known about the relationship between X and Y and,
instead, relying totally on the limited information available in the sample
data. The objective should be to obtain the best interpretation of the data
that does not override any more firmly established understanding of the
situation.



Assuming Model II applies, it may still be useful to evaluate Model I in
order to test, in the conventional way (Draper and Smith, 1966), the
significance of the estimated nonzero intercept. If this test fails to provide
enough evidence to reject the strawman hypothesis (H0: α = 0), then this
failure may be cited as additional evidence, strictly from the data,
substantiating the choice of Model II to estimate B. The results of this
formal test of hypothesis should not, however, be relied on as the criterion
for selecting Model II. It should serve only as a source of auxiliary
information clarifying the extent to which the sample data support the model
choice. The choice should be made on the basis of functional insight and
understanding of the relationship between X and Y.
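
The auxiliary intercept test can be sketched as below. This is an illustration under stated assumptions, not the procedure as given by Draper and Smith: it uses the usual t statistic for the Model I intercept with n - 2 degrees of freedom, takes the hypothesized intercept to be zero (a constraint through the origin), and uses hypothetical data. The critical value 3.182 is the tabulated two-sided 5 percent point of t with 3 degrees of freedom; for other sample sizes it must be looked up. The function name intercept_t is hypothetical.

```python
import math


def intercept_t(xs, ys, alpha0=0.0):
    """t statistic for H0: intercept = alpha0 in ordinary (Model I) regression.

    Returns (intercept, t, degrees_of_freedom).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    s2 = ss_res / (n - 2)
    # Standard error of the fitted intercept.
    se_int = math.sqrt(s2 * (1.0 / n + mean_x ** 2 / sxx))
    return intercept, (intercept - alpha0) / se_int, n - 2


# Hypothetical sample, assumed to be constrained through the origin.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [1.1, 1.9, 3.2, 3.8, 5.1]

intercept, t_value, df = intercept_t(xs, ys)
T_CRIT = 3.182  # two-sided 5 percent point of t with 3 d.f., from tables

print("estimated intercept:", intercept)
print("t statistic        :", t_value, "on", df, "d.f.")
if abs(t_value) < T_CRIT:
    print("H0 not rejected: the data do not contradict the constraint")
else:
    print("H0 rejected: the data conflict with the constraint")
```

As the text stresses, a result of this kind is supporting information only; it does not replace functional insight as the basis for choosing Model II.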



Comparing the correlation coefficients, or r-values, produced using the
real data and the extended data is likewise not a valid method for choosing






