FORCING REGRESSION THROUGH A GIVEN POINT USING ANY FAMILIAR COMPUTATIONAL ROUTINE



by 

 Edward B. Hands 



1. INTRODUCTION TO REGRESSION 



The engineer frequently needs to estimate some response or dependent variable
Y (e.g., sand transport rate, change in shoreline position, or structural
damage) when given the magnitude of other factors, or independent variables X
(e.g., longshore wave energy flux, storm frequency, or elevation of storm
surges). A common approach is to assume a linear model,



Y = α + βX + ε          (Model I)



then adopt the principle of least squares, and use sample data to estimate the
unknown parameters, α and β. Both β and X can be considered as strings
of numbers in the case of multiple regression with several independent
variables; ε indicates that the response is not being thought of as an exact
linear function of X. The ε represents random and unpredictable elements in Y;
therefore, ε does not appear in the prediction equation: ŷ = α̂ + β̂x, where
ŷ, α̂, and β̂ are estimates of the corresponding components in the conceptual
Model I. The assumption that ε has an expected value of zero indicates that
the "average" response is considered linear. If ε varies widely, Model I,
though conceptually correct, may have only limited predictive value. In such
a case the estimated mean value of Y would frequently be thrown off by noise
in the data. If ε varies only slightly, good predictions will be possible
provided good estimates of α and β are available. Adopting the principle
of least squares means one is willing to define the best estimates of α and
β as those that minimize the sum of the squares of the deviations between the
observed and predicted values (i.e., y and ŷ).
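The least squares principle described above can be sketched in a few lines
of Python for the case of a single independent variable. This is only an
illustration under stated assumptions: the function names, the variable
names a_hat and b_hat (standing for the estimated intercept and slope), and
the sample data are hypothetical and do not come from the original report.

```python
def least_squares_line(x, y):
    """Return (a_hat, b_hat) minimizing sum((y_i - a - b * x_i) ** 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # Sums of squares and cross products about the means.
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b_hat = sxy / sxx                 # estimated slope
    a_hat = y_mean - b_hat * x_mean   # estimated intercept
    return a_hat, b_hat


def sum_squared_deviations(x, y, a, b):
    """Sum of squared deviations between observed y and predicted a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical sample data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a_hat, b_hat = least_squares_line(x, y)
best = sum_squared_deviations(x, y, a_hat, b_hat)
print(f"a_hat = {a_hat:.4f}, b_hat = {b_hat:.4f}, sum of squares = {best:.4f}")
```

Any other line, for instance one with a slightly different slope or
intercept, gives a larger value of sum_squared_deviations for the same data.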



Customarily, no constraints are placed on the contenders for the best fit
line. Of all possible lines in the XY plane, the prediction equation is
chosen because it has the smallest sum of squared deviations in y from the
data points. The y-intercept, α̂, is the point where the best fit line
intersects the Y-axis. The intercept α may be of special interest, e.g., in
the regression of current speed against longshore wave energy flux measured
in a field test (Fig. 1). An intercept substantially above zero would suggest
that during the test a component of the longshore current was driven by
mechanisms other than waves (e.g., tides or winds). In this case, the nonzero
intercept would not only be meaningful, but would also provide a good estimate
of the velocity of any steady, nonwave-generated coastal current during the
test.



An additional example of unconstrained regression is the case in which
structural damage increases steadily once the wave forces exceed an
undetermined threshold value. Again Model I applies and produces the correct
regression coefficient (β). In the process it produces a meaningless response
intercept well below zero (Fig. 2). In contrast with the previous example, the
interest here is strictly in the prediction of future damage for given wave
forces, not in the value of the intercept itself. The resulting linear
relationship applies only to values of the independent variable above the
threshold of wave effect.



