The value of Y may be known for a single value of X (not necessarily 0). The best prediction should then be sought from among the limited subset of lines through this point. All these lines will have a larger sum of squares, Σ(y − ŷ)², than the line that would have been selected by Model I. A simple procedure is described herein for picking from among these restricted candidates the one with the smallest Σ(y − ŷ)². Thus, regressing through the origin is but one specific case that can be solved by a general model forcing regression at an arbitrary point.



III. SOLUTION TO THE PROBLEM 



This report describes a method for obtaining the best fit to all data points (in the sense of least squares) while forcing an exact fit at any known point. A simple procedure for forcing regression through the origin was described by Hawkins (1980), who indicated that the procedure was not well known. The author of this report knows of no references to the general case of an exact fit at an arbitrary point. However, if a fit can be constrained through the origin, then a simple transformation of variables can force the line through any given point. The details of the through-the-origin solution will be explained first.



1. Regression Through the Origin.



For each set of measured dependent and independent variables observed (yᵢ, xᵢ), also enter, or program, a mirror-image set (−yᵢ, −xᵢ). Thus, the computer is given an extended data set consisting of 2n data points, only n of which were observed. By definition of this extended data set, the dependent and all the independent variables each individually sum to zero, forcing a zero intercept:



    â = ȳ − β̂x̄        by the principle of least squares

    â = 0              because Σx = 0 and Σy = 0, and thus

    x̄ = ȳ = 0          on the extended data set



Thus a zero-intercept solution is obtained. Is it still the least squares solution for the observed data set? The principle of least squares by definition minimizes the sum of the squares of the deviations of the observed from the predicted values. Because each squared deviation from the observed data set generates an identical squared deviation in the extended data set, the sum of squares over the extended data set is exactly twice the sum over the observed set; it is therefore minimized over the extended data set only if it is also minimized over both the observed and the mirror-image sets. Thus, the regression coefficient produced in this manner is not only the least squares solution for the artificially extended data set, but for the observed data set as well. By this artifice the proper estimate is obtained for the regression coefficient (β̂) with the prediction forced through the origin.
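As a concrete illustration of the artifice, the sketch below (assuming Python with NumPy; the function name fit_through_origin and the sample data are illustrative, not from this report) builds the mirror-image extended data set, fits ordinary least squares on it, and checks that the intercept vanishes and that the slope agrees with the closed-form through-origin value Σxy/Σx²:

    # A minimal sketch of the mirror-image artifice, assuming NumPy.
    import numpy as np

    def fit_through_origin(x, y):
        # Augment the n observed points with their mirror images (-x, -y),
        # then run ordinary least squares (with intercept) on the 2n points.
        x_ext = np.concatenate([x, -x])
        y_ext = np.concatenate([y, -y])
        beta, alpha = np.polyfit(x_ext, y_ext, deg=1)  # slope, intercept
        return beta, alpha

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])

    beta, alpha = fit_through_origin(x, y)
    print(alpha)                         # ~0: the intercept is forced to zero
    print(beta)                          # slope from the extended data set
    print(np.sum(x * y) / np.sum(x**2))  # closed-form through-origin slope; matches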



2. Regression Through Any Arbitrary Point (a, b).



If the predicted response (Y) must be b when the independent variables (X) are a, then regress an extended data set v on u, where u = x − a and v = y − b. If (a, b) = (0, 0), then this collapses to the exact situation described above. If (a, b) ≠ (0, 0), the direct result, v̂ = β̂u, should be unraveled to produce the y prediction:

    ŷ = b + β̂(x − a)
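A similar sketch (again assuming Python with NumPy; fit_through_point is an illustrative name, not from this report) shifts the data by the known point (a, b), applies the mirror-image origin fit to (u, v), and unravels the result into a y prediction:

    # A minimal sketch of forcing the fit through an arbitrary point (a, b).
    import numpy as np

    def fit_through_point(x, y, a, b):
        u = x - a                          # shift so (a, b) maps to the origin
        v = y - b
        u_ext = np.concatenate([u, -u])    # mirror-image extension
        v_ext = np.concatenate([v, -v])
        beta, _ = np.polyfit(u_ext, v_ext, deg=1)    # intercept comes out zero
        return lambda x_new: b + beta * (x_new - a)  # unravel to y-hat

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.1, 1.9, 3.2, 3.9])
    predict = fit_through_point(x, y, a=2.0, b=2.0)
    print(predict(2.0))   # exactly 2.0: the fit is forced through (2, 2)
    print(predict(4.0))   # least squares prediction at any other x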






