Fishery Bulletin 119(2-3) 
stock or region specific is common practice (NRC, 1998; Dichmont et al., 2016). Indeed, many of the age-based models used for stock assessments conducted in the United States use software developed for generic use.
In some literature reviews, the minimum data requirements, output, and projection capabilities of stock assessment models in the United States have been compared to facilitate model choice (NRC, 1998; Dichmont et al.¹). In addition, simulation-based research has been designed to evaluate the performance of various models and their ability to meet the needs of fishery managers (Smith et al., 1993; Sampson and Yin, 1998; Cadrin and Dickey-Collas, 2015; Deroba et al., 2015). However, comparisons of assessment models through both side-by-side code comparison and simulation tests remain scarce.
Four age-structured stock assessment models are used most commonly in the United States. We refer to them as models, although in reality they are software packages that may be configured to represent various forms of mathematical models. Three of the models, the Age Structured Assessment Program (ASAP) (Legault and Restrepo, 1999), the Beaufort Assessment Model (BAM) (Williams and Shertzer, 2015), and Stock Synthesis (SS) (Methot and Wetzel, 2013), are frequently used for assessments on the East Coast. On the West Coast of the continental United States, recent assessments are conducted primarily by using SS. The development trajectory for the Assessment Model for Alaska (AMAK) (AFSC, 2015) is similar to that for the ASAP. Alaska fishery scientists created simple age-structured statistical models by using a program that implements automatic differentiation, AD Model Builder (e.g., Ianelli and Fournier, 1998; Fournier et al., 2012), but they mostly tailored them to the characteristics of individual stocks and the types of data available. From these bespoke models, a more general model, the AMAK, was developed and applied to a number of stocks, such as walleye pollock (Gadus chalcogrammus) in the Aleutian Islands region (Barbeaux et al., 2019). Although there are additional models and diagnostic tools that can be used to assess fish stocks (Dichmont et al.¹), these 4 age-structured assessment models are the main approaches applied in the United States for "data-rich" stock assessments, and they share similar conceptual, mathematical, and statistical frameworks.
The varied development of, and regional preferences for, different assessment models may be attributed to special requirements or features of a stock assessment model given the availability of observed data (data collection programs vary regionally) and the length of time of commercial fishery operations (periods of hundreds of years on the East Coast versus periods that began roughly after World War II in Alaska or more recently on the West Coast) that create different states of initial depletion. Additional reasons for different software and modeling approaches include inertia and continuity with past practices (or application to similar stocks), region-specific training, and the presence of local expertise (Cadrin and Dickey-Collas, 2015). The availability of different assessment approaches may provide flexibility, but it also requires testing to determine how different assumptions affect results. A first step is to test whether various models of a given type produce similar estimates when configured similarly, without introducing misspecifications to the models. Then identifying the sources of any differences can inform and improve assumptions used in actual assessments (e.g., NRC, 1998).

¹ Dichmont, C. M., A. R. Deng, A. E. Punt, and L. R. Little. 2017. Stock assessment integration: a review. Fish. Res. Dev. Corp. Rep. 2014-039, 106 p. CSIRO Publ., Hobart, Australia. [Available from website.]
Simulation testing provides a means to evaluate the accuracy of individual assessment models because the true values used to generate the data are known. An operating model (OM) is configured to reflect hypotheses about true stock dynamics and is the basis for generating age-structured stock assessment inputs for each assessment model (referred to as estimation models [EMs]). This OM-EM framework, in which EMs are fit to simulated data (with errors), has been used previously to assess the ability of assessment models to estimate stock conditions (Wetzel and Punt, 2011; Henriquez et al., 2016). Deroba et al. (2015) conducted both self-tests and cross-tests within a simulation-estimation framework to compare the robustness of assessment models to error. A self-test fits an assessment model to simulated data generated from the same assessment model, and a cross-test fits an assessment model to data generated from a different model (Chang et al., 2015; Deroba et al., 2015). They highlighted that a lack of robustness in self-tests may indicate bias and that a lack of robustness in cross-tests may indicate differences in structural assumptions between assessment models. To avoid the bias introduced during the cross-test process, we attempted to develop an OM based on common features of the 4 EMs.
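The self-test versus cross-test distinction can be illustrated with a deliberately simplified sketch. The example below is hypothetical and does not represent any of the 4 actual assessment models: the OM generates equilibrium age-composition data under a known total mortality, and a basic catch-curve regression serves as the EM. In the self-test, the data match the EM's assumption of flat selectivity; in the cross-test, the OM adds a logistic selectivity curve that the EM ignores, producing a structural mismatch. All function names and parameter values are invented for illustration.

```python
# Toy self-test / cross-test illustration (hypothetical; not ASAP, BAM, SS,
# or AMAK). OM: equilibrium numbers at age under total mortality Z.
# EM: catch-curve regression that assumes flat selectivity.
import math
import random

def om_numbers_at_age(z, ages, selectivity=None):
    """OM: equilibrium numbers at age under total mortality z,
    optionally filtered through a selectivity-at-age curve."""
    n = [math.exp(-z * a) for a in ages]
    if selectivity is not None:
        n = [ni * selectivity(a) for ni, a in zip(n, ages)]
    return n

def add_obs_error(n, cv, rng):
    """Add lognormal observation error with the given CV (mean-corrected)."""
    sd = math.sqrt(math.log(1 + cv**2))
    return [ni * math.exp(rng.gauss(-0.5 * sd**2, sd)) for ni in n]

def em_catch_curve(n_obs, ages):
    """EM: estimate Z as the negative OLS slope of log-abundance on age,
    implicitly assuming flat selectivity across ages."""
    y = [math.log(ni) for ni in n_obs]
    a_bar = sum(ages) / len(ages)
    y_bar = sum(y) / len(y)
    slope = (sum((a - a_bar) * (yi - y_bar) for a, yi in zip(ages, y))
             / sum((a - a_bar) ** 2 for a in ages))
    return -slope

rng = random.Random(1)
ages = list(range(1, 11))
z_true = 0.4

# Self-test: data generated under the EM's own structural assumption.
self_est = [em_catch_curve(
                add_obs_error(om_numbers_at_age(z_true, ages), 0.1, rng), ages)
            for _ in range(200)]

# Cross-test: OM includes a structural feature (logistic selectivity
# reducing catches of young fish) that the EM does not model.
logistic = lambda a: 1 / (1 + math.exp(-2 * (a - 4)))
cross_est = [em_catch_curve(
                 add_obs_error(om_numbers_at_age(z_true, ages, logistic), 0.1, rng), ages)
             for _ in range(200)]

self_bias = sum(self_est) / len(self_est) - z_true
cross_bias = sum(cross_est) / len(cross_est) - z_true
print(f"self-test bias:  {self_bias:+.3f}")   # near zero: EM matches OM
print(f"cross-test bias: {cross_bias:+.3f}")  # large: structural mismatch
```

The self-test bias is near zero because the EM's assumptions match the data-generating process, whereas the cross-test bias is substantial even though the same EM is used, mirroring the point made by Deroba et al. (2015) that cross-test failures flag structural differences rather than estimation error.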
The aim of this study was to improve our understanding of both the similarities and differences among the 4 primary age-structured stock assessment models used in the United States. To our knowledge, this evaluation is the first in which a comprehensive comparison of source code and a simulation-estimation analysis of these models have been conducted. This study specifically addressed the following 4 primary questions: What key features and source code need to be examined before developing an OM for comparing multiple EMs? Do the EMs give similar and accurate estimates under a range of cases? What are the sources of differences in estimates, if there are any? What recommendations can be drawn for future model development after examining the similarities and differences of the 4 EMs in our study? Addressing these questions is critical for improving the understanding of current models and for developing next-generation stock assessment models (Punt et al., 2020).
Materials and methods 
General framework 
To compare assessment models, we conducted a comparison 
of key features in the code from the 4 EMs as well as OM-EM 
