Stockhausen and Fogarty. Removing observational noise from time series data using ARIMA models 






et al., 1999). We derived annual time series of estimated total abundance during the spring and fall seasons for nine finfish species from trawl survey catch records for the Georges Bank area. Reported catches (biomass) from strata encompassing Georges Bank were expanded on a per-tow basis by using species- and length-specific corrections for catchability and coefficients from Edwards (1968) and Harley and Myers (2001). Annual stratified mean (expanded) catch-per-tow was then calculated for each species for both seasonal surveys. Finally, annual total population abundance was calculated by applying a swept area factor to the stratified mean catch-per-tow. Abundance indices corresponding to fall surveys spanned 1963-2002, and those corresponding to spring surveys spanned 1968-2003. Following Pennington (1985), we assumed that changes in catchability induced a lognormal error structure on the observed time series. However, because some time series (notably those for the schooling pelagic fish herring and mackerel) included zeroes, a ln(x) transformation was not applicable to all series. Thus, before further analysis, all time series were ln(x+1) transformed.
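
The chain of calculations described above (stratified mean catch-per-tow, swept-area expansion, and the ln(x+1) transformation) can be sketched in Python as follows. This is only an illustration under stated assumptions: the stratum labels, catches, stratum areas, and area swept per tow are hypothetical numbers, the stratified mean is assumed to be area-weighted, and the function names are not taken from the original analysis.

```python
import math

def stratified_mean(catch_by_stratum, area_by_stratum):
    """Area-weighted mean of the per-stratum mean catch-per-tow (assumed weighting)."""
    total_area = sum(area_by_stratum.values())
    weighted = sum(
        area_by_stratum[s] * (sum(tows) / len(tows))
        for s, tows in catch_by_stratum.items()
    )
    return weighted / total_area

def swept_area_abundance(mean_cpt, total_area, area_per_tow):
    """Scale mean catch-per-tow up to the whole survey area."""
    return mean_cpt * total_area / area_per_tow

def log_transform(series):
    """ln(x + 1), which stays defined when a series contains zeroes."""
    return [math.log1p(x) for x in series]

# Hypothetical expanded catches (kg per tow) in two strata
catches = {"A": [2.0, 4.0], "B": [0.0, 6.0, 3.0]}
areas = {"A": 100.0, "B": 300.0}  # hypothetical stratum areas (km^2)

mean_cpt = stratified_mean(catches, areas)
abundance = swept_area_abundance(mean_cpt, sum(areas.values()), area_per_tow=0.04)
transformed = log_transform([0.0, abundance])
```

The zero entry in the last call shows why ln(x+1) was needed: `math.log1p(0.0)` is 0, whereas ln(0) is undefined.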



For each resulting time series, we used SAS (vers. 8, SAS Institute, Cary, NC) to identify candidate ARIMA models, to estimate parameters for these models, and to extend the time series by using forecasts and hindcasts before applying the smoothing algorithm. Candidate model structures for each time series were based on examination of empirical autocorrelation, partial autocorrelation, and inverse autocorrelation functions for the series (Box and Jenkins, 1976). When these functions indicated that the series was nonstationary, we applied the backward difference operator (1-B) to the series and examined the correlation functions for the new (differenced) series. In addition, because of its special significance in terms of interpretation, we always tried a RWPUN model as a candidate. Once candidate models were identified, we estimated coefficients for each model and calculated the associated Akaike information criterion (AIC; Akaike, 1973). AIC provides an objective criterion based on information theory for selecting the "best" approximating model from among a group of good candidate models (i.e., the criterion selects the model with the minimum AIC).
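
As a rough illustration of the identification tools named above, the following Python sketch applies the backward difference operator (1-B), computes a sample autocorrelation function, and evaluates AIC from a log-likelihood. It is not the SAS procedure used in the study. The example series is hypothetical, and the AIC is written in its usual form, 2k - 2 ln(L), with k the number of estimated parameters.

```python
def backward_difference(series):
    """Apply (1 - B): return y[t] - y[t-1] for t = 1, ..., n-1."""
    return [b - a for a, b in zip(series, series[1:])]

def sample_acf(series, max_lag):
    """Sample autocorrelations at lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(max_lag + 1)
    ]

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

y = [1.0, 3.0, 2.0, 5.0, 4.0]      # hypothetical transformed index
dy = backward_difference(y)         # differenced series
acf_dy = sample_acf(dy, max_lag=2)  # inspect for remaining structure
```

A slowly decaying autocorrelation function in `y` that disappears in `dy` is the kind of pattern that would suggest one round of differencing is enough.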



We examined the residuals and the empirical autocorrelation, inverse autocorrelation, and partial autocorrelation functions of the residual time series for significant deviation from white noise. When significant deviation was indicated, we dropped the model from further consideration. We also dropped models with orders (P,D,Q) that were inconsistent with the assumption of additive (after log transformation) observation noise (Q<P+D). Following this screening procedure, we were left with a group of "good" alternative models. We then selected the ARIMA model with the smallest AIC from among the remaining candidates as the "best" model to smooth the data.
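
The screening and selection steps in this paragraph can be summarized in a short Python sketch. The candidate orders, AIC values, and residual-diagnostic outcomes below are hypothetical, and the `Candidate` type and function names are introduced only for illustration. The order check keeps models with Q >= P+D and drops the rest.

```python
from typing import NamedTuple, Optional, Sequence, Tuple

class Candidate(NamedTuple):
    order: Tuple[int, int, int]  # (P, D, Q)
    aic: float
    residuals_white: bool        # outcome of the residual diagnostics

def consistent_with_observation_noise(order):
    """True when Q >= P + D, as additive observation noise requires."""
    p, d, q = order
    return q >= p + d

def select_best(candidates: Sequence[Candidate]) -> Optional[Candidate]:
    """Drop screened-out models, then return the one with minimum AIC."""
    kept = [
        c for c in candidates
        if c.residuals_white and consistent_with_observation_noise(c.order)
    ]
    return min(kept, key=lambda c: c.aic) if kept else None

# Hypothetical candidates for one time series
candidates = [
    Candidate((1, 0, 0), 95.0, True),    # pure AR: Q < P + D, dropped
    Candidate((0, 1, 1), 101.2, True),
    Candidate((1, 1, 2), 100.4, True),
    Candidate((0, 0, 2), 99.0, False),   # residuals not white, dropped
]
best = select_best(candidates)
```

Note that the model with the smallest AIC overall, (1,0,0), is not selected here because it fails the order check. AIC is compared only among the models that survive screening.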



We developed a MATLAB (vers. 6.5, The Mathworks Inc., Natick, MA) program to perform the noise reduction for each time series based on an extended version of the series and its associated ARIMA model. To allow smoothed estimates to be calculated near the ends of each time series, we extended each time series before smoothing to 40 years before its start by hindcasting with the selected ARIMA model and to 40 years past its end by forecasting with the model. Using the MATLAB program, we then calculated K*, the maximal smoothing weights ω(B) corresponding to a noise variance equal to K*, and the smoothed estimates E(z_t | y), following the approach of Box et al. (1978) outlined previously.
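
To make the extend-then-smooth idea concrete, here is a minimal Python sketch for the simplest case only: an ARIMA(0,1,1) model with MA parameter theta whose underlying signal is assumed to be a pure random walk. Under that assumption the two-sided smoothing weights are ((1 - theta)/(1 + theta)) * theta^|j|. These are not claimed to be the maximal-smoothing weights at K* computed by the MATLAB program, and the way the forecast level is initialized (at the first observation) is also an assumption of this sketch. Hindcasts are obtained by forecasting the time-reversed series.

```python
def forecast_level(series, theta):
    """Flat forecast of an ARIMA(0,1,1) model: an exponentially weighted
    moving average with smoothing constant (1 - theta).
    Initialized at the first observation (an assumption of this sketch)."""
    level = series[0]
    for y in series[1:]:
        level += (1.0 - theta) * (y - level)
    return level

def extend_series(series, theta, n_pad=40):
    """Pad with n_pad hindcasts before the start and n_pad forecasts after the end."""
    forecast = forecast_level(series, theta)
    hindcast = forecast_level(series[::-1], theta)  # forecast the reversed series
    return [hindcast] * n_pad + list(series) + [forecast] * n_pad

def rw_signal_weights(theta, n_pad=40):
    """Symmetric weights for lags j = -n_pad..n_pad (random-walk signal case)."""
    scale = (1.0 - theta) / (1.0 + theta)
    return [scale * theta ** abs(j) for j in range(-n_pad, n_pad + 1)]

def smooth(series, theta, n_pad=40):
    """Apply the two-sided weights to the extended series."""
    ext = extend_series(series, theta, n_pad)
    w = rw_signal_weights(theta, n_pad)
    return [
        sum(wj * ext[t + j] for j, wj in enumerate(w))
        for t in range(len(series))
    ]

y = [2.0, 2.6, 1.9, 2.4, 3.1, 2.2]  # hypothetical ln(x+1) index
smoothed = smooth(y, theta=0.5)
```

Because the weights decay geometrically, truncating them at 40 lags on each side loses very little when theta is well below 1, which is one way to read the choice of a 40-year extension.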



Results 



The abundance indices we derived from fishery-independent bottom trawl surveys for nine finfish species during two seasons displayed a wide variety of trends, as well as a high degree of apparent interannual variability. This variability may be associated with environmentally driven, high-frequency changes in catchability, but we regarded it here as observation noise (Figs. 2 and 3). For example, springtime winter skate (Leucoraja ocellata) biomass appeared to increase by a factor of six during the early 1980s from a lower (but highly variable) mean level of ~140,000 t, to which it subsequently returned in the early 1990s (Fig. 2A). Yellowtail flounder (Limanda ferruginea), in contrast, declined by a factor of 10 during the 1970s and early 1980s from a high at the beginning of the time series and then began to rebound in the latter 1980s (Fig. 2G). Most recently, yellowtail flounder appear close to regaining their earlier peak abundance.



The 18 ARIMA models corresponding to the ln(x+1)-transformed abundance indices formed a diverse set (Tables 2 and 3). Half the time series were found to be nonstationary; however, one application of backward differencing (d=1) sufficed to achieve stationarity in all nine instances. Interestingly, the ARIMA models for all nine of these time series were consistent with RWPUN models (i.e., (0,1,1) models). For all the models, autoregressive orders (p) ranged between 0 and 3, and moving average orders (q) ranged from 1 to 5. The spring and fall models for little skate (L. erinacea) had the highest AR order, and the spring model for silver hake (Merluccius bilinearis) had the highest MA order. Only the models for little skate, Atlantic herring (Clupea harengus), and yellowtail flounder exhibited the same ARIMA order for both the spring and fall. And although none of the models reflected pure AR processes (q = 0, an impossibility given the observation noise assumption), two were found to be pure MA processes (winter flounder and Atlantic mackerel [Scomber scombrus], both in spring). All nine IMA processes were RWPUN processes (i.e., (0,1,1)).
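
The link between a random walk plus uncorrelated noise and the (0,1,1) form can be checked numerically. If the signal follows z_t = z_{t-1} + c_t and the observation is y_t = z_t + b_t, the differenced series is c_t + b_t - b_{t-1}, which is correlated only at lag 1 and is therefore a first-order moving average. The sketch below, with hypothetical variances and an illustrative function name, recovers the implied MA parameter theta for the convention (1-B)y_t = (1 - theta*B)a_t by matching the lag-1 autocorrelation -theta/(1 + theta^2).

```python
import math

def rwpun_theta(signal_var, noise_var):
    """MA(1) parameter of the ARIMA(0,1,1) form of a random walk plus noise.

    signal_var: variance of the random-walk increments c_t (must be > 0)
    noise_var:  variance of the observation noise b_t (must be >= 0)
    """
    # Lag-1 autocorrelation of the differenced observations
    rho1 = -noise_var / (signal_var + 2.0 * noise_var)
    if rho1 == 0.0:
        return 0.0  # no observation noise: a pure random walk
    # Invertible root of rho1 * theta**2 + theta + rho1 = 0
    return (-1.0 + math.sqrt(1.0 - 4.0 * rho1 ** 2)) / (2.0 * rho1)

theta = rwpun_theta(signal_var=1.0, noise_var=2.0)
```

Larger observation noise relative to the signal pushes theta toward 1, and a noise-free random walk gives theta = 0.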



The effect of ARIMA-based noise reduction on these 18 sets of indices varied among series (Figs. 2 and 3). Little or no smoothing occurred for little skate (spring, Fig. 2B), silver hake (spring, Fig. 2D), and haddock (Melanogrammus aeglefinus) (fall, Fig. 3F). Moderate smoothing occurred for winter skate (spring, Fig. 2A; fall, Fig. 3A), little skate (fall, Fig. 3B),



