Prager and MacCall: Contaminant and climate effects on spawning of three pelagic fishes 






cause one of the models contained no contaminant data, agreement between the two models on the importance of contaminant variates was considered an indication of severe and troublesome collinearity between relevant climate and contaminant variables. Alternatively, it might simply indicate no contaminant effects. In the presence of strong contaminant effects, and lacking such severe collinearity, one would expect the explanatory effect from the model including contaminants to be more highly correlated with the contaminant variables.



Selection of explanatory variables 



It is widely recognized by modelers and statisticians (e.g., Gilchrist 1984:11) that a profound source of uncertainty in statistical modeling is the possibility of error in specifying the model's structure. Such specification error biases parameter estimates and renders most confidence intervals and hypothesis tests invalid (Kennedy 1979, Gilchrist 1984). Unfortunately, the possibilities of specification error and its ramifications are often overlooked when statistical models are used in ecology. The models chosen and presented here were undoubtedly misspecified; they may have included unimportant effects or omitted important ones, and they were limited to a linear functional relationship, which is unlikely to be the true one.
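The bias caused by omitting an important variable can be illustrated with a small simulation. The sketch below is not taken from the study: the variable names, coefficients, sample size, and random seed are all assumptions chosen for illustration. It fits a correctly specified linear model and a model that omits a correlated predictor, then compares the estimated coefficient on the retained variable.

```python
import numpy as np

def fit_ols(X, y):
    """Return least-squares coefficients (intercept first) for y regressed on X."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def omitted_variable_demo(n=5000, seed=0):
    """Compare the x1 coefficient with and without a correlated predictor x2.

    All numbers here are hypothetical; the true coefficients on x1 and x2 are 1.0.
    """
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)      # x2 is correlated with x1
    y = 1.0 * x1 + 1.0 * x2 + rng.normal(size=n)  # true model uses both variables
    full = fit_ols(np.column_stack([x1, x2]), y)  # correctly specified model
    reduced = fit_ols(x1, y)                      # misspecified: x2 omitted
    return full[1], reduced[1]

full_b1, reduced_b1 = omitted_variable_demo()
# The full-model estimate should be close to the true value of 1.0; the
# reduced-model estimate absorbs part of the x2 effect and lands near 1.8.
print(f"x1 coefficient, full model:    {full_b1:.3f}")
print(f"x1 coefficient, reduced model: {reduced_b1:.3f}")
```

In this constructed case the direction and size of the bias are known because the data-generating model is known. With field data they are not, which is why misspecification undermines the usual confidence intervals and tests.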



After specifying the linear structure of Eq. 3, choice of variables was the main concern. When no theoretical basis exists to guide it, choice of variables in a regression model must be regarded as heuristic. Neither the theory of the underlying discipline (ecology) nor that of statistics can answer this question unequivocally, so we were forced to use an empirical approach. We started by retaining only the first 10 components from each of the principal-component analyses. (In each case, this retained ~80% of the weighted variance.) To arrive at a parsimonious model, we fit regression models to all combinations of 10 or fewer variables and ranked the many models by C_p, a goodness-of-fit statistic from Mallows (1973). For each combination of stock and data type (climate or combined), we accepted the model with the lowest C_p. However, alternative models of similar fit were similar in structure.
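The all-subsets search described above can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the common textbook form of Mallows' statistic, C_p = SSE_p / s^2 - n + 2p, where SSE_p is the residual sum of squares of a candidate model with p parameters (including the intercept) and s^2 is the residual mean square of the model containing all candidates. The function names and the synthetic data are assumptions, and the principal-component step is replaced by random stand-in columns.

```python
from itertools import combinations
import numpy as np

def sse(X, y):
    """Residual sum of squares from an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(resid @ resid)

def all_subsets_cp(X, y):
    """Return (C_p, column indices) for every non-empty subset, lowest C_p first."""
    n, k = X.shape
    s2 = sse(X, y) / (n - k - 1)  # error variance estimated from the full model
    results = []
    for size in range(1, k + 1):
        for cols in combinations(range(k), size):
            p = size + 1  # number of parameters, counting the intercept
            cp = sse(X[:, list(cols)], y) / s2 - n + 2 * p
            results.append((cp, cols))
    return sorted(results)

# Hypothetical data: 10 candidate components, of which only columns 0 and 2
# actually influence the response.
rng = np.random.default_rng(1)
n, k = 60, 10
X = rng.normal(size=(n, k))
y = 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(size=n)

ranked = all_subsets_cp(X, y)
best_cp, best_cols = ranked[0]
print(f"{len(ranked)} models fitted; lowest C_p = {best_cp:.2f} for columns {best_cols}")
for cp, cols in ranked[1:4]:
    print(f"  runner-up: C_p = {cp:.2f}, columns {cols}")
```

With 10 candidates the search covers 2^10 - 1 = 1023 models. Printing the runners-up alongside the best model is one simple way to check whether models of similar fit share a similar structure.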



Any method of variable selection in which many candidate explanatory variables and combinations of variables are examined may lead the investigator to accept models which fit well solely through chance. This point has recently been emphasized by Flack & Chang (1987). Although the use of C_p is a relatively conservative approach, the true statistical significance of our results is unknown. Thus we view the modeling exercise as one of hypothesis generation, rather than hypothesis testing. The limitations of inference that come from an empirical choice of model structure are not unique to this study; they pertain to all modeling exercises except those in which



