FISHERY BULLETIN: VOL. 80, NO. 3 



harvests are regressed against conditions during the same year, 1 yr ago, 2 yr ago, etc.



For species that do not spawn in Maryland, and for which environmental conditions in the Chesapeake Bay would not influence year-class size (i.e., menhaden and bluefish), any significant correlation that arises would reflect either how Bay conditions influence the availability of the species to Maryland fishermen or how those conditions are correlated with critical conditions at the remote spawning site. Oysters and striped bass, being the longer lived of the species of interest, were regressed against conditions up to 9 yr in the past. Conditions affecting the remaining species were investigated over the preceding 5 yr.
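The lagged regressions described above need each year's harvest paired with the environmental value from the same year and from each earlier year up to the maximum lag (9 yr for oysters and striped bass, 5 yr for the other species). A minimal sketch of that bookkeeping follows; the function name, the year-keyed dictionaries, and the numbers are assumptions made for illustration and are not taken from the original analysis.

```python
def lagged_design(harvest, env, max_lag):
    """Pair each year's harvest with env values from 0..max_lag years earlier.

    harvest, env: dicts mapping year -> value (hypothetical input layout).
    Returns (years, rows, y), where rows[i][k] is env at years[i] - k.
    """
    years, rows, y = [], [], []
    for year in sorted(harvest):
        lags = [env.get(year - k) for k in range(max_lag + 1)]
        if None in lags:          # skip years lacking a full lag history
            continue
        years.append(year)
        rows.append(lags)
        y.append(harvest[year])
    return years, rows, y


# Made-up numbers, for illustration only.
env = {1970 + i: float(i) for i in range(8)}        # 1970..1977
harvest = {1970 + i: 10.0 * i for i in range(8)}
years, rows, y = lagged_design(harvest, env, max_lag=5)
```

Years without a complete lag history are dropped here, so a longer maximum lag leaves fewer usable harvest years.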



As mentioned in the introduction, we wished to limit our attention to those variables which are most likely to be good predictors of future harvests. In conventional stepwise regression, that variable which increases the goodness-of-fit by the greatest amount (usually measured by R², the percentage of the variance explained by the model) is included as the next variable in the regression equation. We chose instead to enter that variable which improved the model prediction of independent data points by the greatest amount.



To implement this alternative criterion we randomly chose 25% of our data to be reserved for testing. At each step in the regression, each variable not yet in the equation was entered in turn into a least squares multiple regression using the other 75% of the data (employing subroutine GLH from the Univac⁹ STAT-PAK library). The coefficients derived for each entry were then used to see how well they would predict the test values of the dependent variable. The variable whose inclusion generated the greatest improvement in fitting the test data (as measured by the sum of the squares of the deviations) was entered into the prediction equation.
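One step of this selection rule can be sketched as follows. This is an illustration under stated assumptions, not the original program: a plain normal-equations solver stands in for subroutine GLH, the helper names are hypothetical, and the data are synthetic and noise-free so that the outcome is unambiguous.

```python
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit(rows, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    xs = [[1.0] + list(r) for r in rows]
    p = len(xs[0])
    xtx = [[sum(x[i] * x[j] for x in xs) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * t for x, t in zip(xs, y)) for i in range(p)]
    return solve(xtx, xty)


def sse(coef, rows, y):
    """Sum of squared deviations of predictions from observed values."""
    return sum((t - coef[0] - sum(c * v for c, v in zip(coef[1:], r))) ** 2
               for r, t in zip(rows, y))


def design(data, cols, idx):
    """Rows of the chosen predictor columns for the given observations."""
    return [[data[c][i] for c in cols] for i in idx]


def next_variable(data, y, chosen, train, test):
    """Return (name, test SSE) of the candidate that best predicts the test rows."""
    best = None
    for name in data:
        if name in chosen:
            continue
        cols = chosen + [name]
        coef = fit(design(data, cols, train), [y[i] for i in train])
        err = sse(coef, design(data, cols, test), [y[i] for i in test])
        if best is None or err < best[1]:
            best = (name, err)
    return best


rng = random.Random(0)
n = 16
data = {k: [rng.gauss(0, 1) for _ in range(n)] for k in ("AT1", "CS3", "EW2")}
y = [3.0 + 2.0 * v for v in data["CS3"]]      # synthetic harvest driven by CS3
test = sorted(rng.sample(range(n), n // 4))   # 25% reserved for testing
train = [i for i in range(n) if i not in test]
name, err = next_variable(data, y, [], train, test)
```

The candidates are fitted on the training rows only, and the ranking uses the sum of squared deviations on the reserved rows, which is the departure from conventional stepwise regression described in the text.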



⁹Reference to trade names does not imply endorsement by the National Marine Fisheries Service, NOAA.

Ivakhnenko et al. (1979) suggested that one should continue to include terms until the prediction can no longer be improved. It became apparent during the first few runs, however, that with six or eight degrees of freedom in the test data, statistically insignificant improvements in predicting the independent data were occurring. Accordingly, no variable was added to the prediction equation when its F-to-enter statistic (calculated on the fit to the test data) dropped below 3.5. This somewhat low level of confidence (a little below 90%) was chosen so as not to exclude potential predictors early in the screening process. It should also be pointed out that because of the small number of points in the test data, a relatively large percentage of the error in the test data must be explained to meet this F-to-enter criterion (40-60% in our trials).
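The text does not spell out how the F-to-enter statistic was computed on the test data. The sketch below assumes the usual partial-F form for a single added term: the drop in the test sum of squares divided by the residual mean square after the term is added. The function names and the choice of degrees of freedom are assumptions. With that form, the smallest fractional reduction in error that passes a threshold F_c follows from simple algebra as F_c / (F_c + df).

```python
def f_to_enter(sse_before, sse_after, df_resid):
    """Partial F for one added term: drop in SSE over residual mean square."""
    return (sse_before - sse_after) / (sse_after / df_resid)


def min_fraction_explained(f_crit, df_resid):
    """Smallest fractional drop in SSE that yields F >= f_crit."""
    return f_crit / (f_crit + df_resid)


# Illustrative numbers only: test SSE falls from 10 to 6 with 6 residual df.
f_example = f_to_enter(10.0, 6.0, 6)
passes = f_example >= 3.5
frac_6df = min_fraction_explained(3.5, 6)
frac_8df = min_fraction_explained(3.5, 8)
```

Under this assumed form a threshold of 3.5 requires a reduction of roughly 0.37 at 6 degrees of freedom and 0.30 at 8. The authors' exact convention is not stated, so these minima need not coincide with the 40-60% they observed in their trials, but they illustrate why a small test set demands a large improvement.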



Because test and regression data were separated at random, it was always possible that the set of test data chosen for any single run was, by chance, unduly influenced by high or low production years. Such bias in the test data could result in a predictor accurate only under particular circumstances. Hence, it was necessary to run several (and in the later stages of the screening process, many) trials with different randomly chosen sets of test data. Presumably, the predominance of any single sequence of predictor vectors among the various trials would indicate that the associated model might be a robust tool for forecasting. Once the functional form of the best predictor had been chosen, the parameters of this equation were redetermined using the full data set.
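The repeated-trial procedure can be sketched as a loop over random splits that tallies which predictor sequence each split selects, followed by a refit on all of the data. Everything named here is hypothetical: `tally_sequences` accepts any selection routine as a callback, and the toy `select` below picks a single predictor by simple linear regression on synthetic, noise-free data purely to keep the example short.

```python
import random
from collections import Counter


def tally_sequences(select, n_obs, n_trials, test_frac=0.25, seed=0):
    """Count how often each predictor sequence is chosen across random splits.

    select(train, test) is any routine returning the ordered predictor names
    chosen for that split (hypothetical callback supplied by the caller).
    """
    rng = random.Random(seed)
    counts = Counter()
    n_test = max(1, round(test_frac * n_obs))
    for _ in range(n_trials):
        test = sorted(rng.sample(range(n_obs), n_test))
        held_out = set(test)
        train = [i for i in range(n_obs) if i not in held_out]
        counts[tuple(select(train, test))] += 1
    return counts


def slope_intercept(x, y):
    """Simple linear regression of y on x; returns (slope, intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    b = (sum((u - mx) * (v - my) for u, v in zip(x, y))
         / sum((u - mx) ** 2 for u in x))
    return b, my - b * mx


def make_select(data, y):
    """Toy selector: the one predictor with the smallest test SSE."""
    def select(train, test):
        def test_sse(name):
            x = data[name]
            b, a = slope_intercept([x[i] for i in train], [y[i] for i in train])
            return sum((y[i] - a - b * x[i]) ** 2 for i in test)
        return [min(data, key=test_sse)]
    return select


rng = random.Random(1)
n = 16
data = {k: [rng.gauss(0, 1) for _ in range(n)] for k in ("AT1", "CS3", "EW2")}
y = [1.0 + 2.0 * v for v in data["CS3"]]       # synthetic, noise-free harvest
counts = tally_sequences(make_select(data, y), n_obs=n, n_trials=50)
winner, wins = counts.most_common(1)[0]
slope, intercept = slope_intercept(data[winner[0]], y)   # refit on full data
```

With real, noisy data the tally would normally be split among several sequences, and the share taken by the most common one gives a rough sense of how stable the selection is.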



The sequence of searches outlined above should provide a necessary (although not sufficient) test for prediction formulae.



RESULTS AND DISCUSSION 



To ease recognition of the environmental variables in the regression equations that follow, we adopt a two-letter, one-digit code to designate each of the 260 possible predictor vectors. The first letter will be either A, C, E, or X according to whether the processed variable represents an annual average, cumulative deviation, episode, or extremum, respectively. The second letter will designate air temperature, water temperature, daily precipitation, or salinity by T, W, P, or S, respectively. When it is necessary to distinguish between high and low deviations of these variables, the low values will be designated by writing the second letter in lower case. Finally, the digit will designate the lag, in years, behind the harvest figures. As examples, Cs3 would indicate cumulative low salinity 3 yr in the past, whereas EW2 would denote the longest episode of high water temperatures 2 yr ago.
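The coding scheme is regular enough to decode mechanically. The sketch below is only a reading aid; the function name and the returned tuple layout are assumptions, and a lower-case second letter is reported simply as a flag for "low values".

```python
KINDS = {"A": "annual average", "C": "cumulative deviation",
         "E": "episode", "X": "extremum"}
VARIABLES = {"T": "air temperature", "W": "water temperature",
             "P": "daily precipitation", "S": "salinity"}


def decode(code):
    """Split a code such as 'Cs3' into (kind, variable, is_low, lag in yr)."""
    if len(code) != 3 or code[0] not in KINDS or code[2] not in "0123456789":
        raise ValueError(f"not a predictor code: {code!r}")
    variable = VARIABLES.get(code[1].upper())
    if variable is None:
        raise ValueError(f"unknown variable letter in {code!r}")
    # Lower case marks low values; upper case covers the high or undivided case.
    return KINDS[code[0]], variable, code[1].islower(), int(code[2])


cs3 = decode("Cs3")
ew2 = decode("EW2")
```

Here `cs3` corresponds to cumulative low salinity at a 3-yr lag and `ew2` to an episode of high water temperature at a 2-yr lag, matching the two examples given in the text.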



After the field of predictor variables for each fishery had been narrowed to five or fewer, 1,000






