Li et al.: A comparison of 4 age-structured stock assessment models 
population can be estimated directly as parameters. In all 
4 EMs, this feature of estimating the initial numbers at 
age is implemented by estimating the initial age compo- 
sition as deviations from an equilibrium age composition. 
The level of recruitment that anchors the equilibrium age 
composition is calculated by using the spawner-recruit 
relationship. In other words, the initial equilibrium 
recruitment is lowered from the unfished recruitment 
level by an initial equilibrium F. However, in the AMAK, 
the initial equilibrium stock size is treated independently 
of historical fishing F because it typically is used in situa- 
tions where this value would be negligible. 
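The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of any of the 4 EMs: it assumes a Beverton-Holt spawner-recruit relationship parameterized by R0 and steepness (h), a plus group at the last age, and spawning at the start of the year. The function names and the arguments `selex`, `fecundity`, and `log_devs` are hypothetical.

```python
import numpy as np

def per_recruit_numbers(M, F, selex):
    """Equilibrium numbers at age per recruit; the last age is a plus group."""
    Z = M + F * np.asarray(selex, dtype=float)  # total mortality at age
    n = np.ones(Z.size)
    n[1:] = np.exp(-np.cumsum(Z[:-1]))          # survivorship to each age
    n[-1] /= 1.0 - np.exp(-Z[-1])               # plus-group accumulation
    return n

def initial_numbers_at_age(R0, h, M, F_init, selex, fecundity, log_devs):
    """Initial numbers at age as deviations from an equilibrium age composition.

    Assumption: Beverton-Holt recruitment with unfished recruitment R0 and
    steepness h (0.2 < h < 1).
    """
    fecundity = np.asarray(fecundity, dtype=float)
    # Spawning output per recruit without fishing and under the initial F.
    phi0 = np.sum(per_recruit_numbers(M, 0.0, selex) * fecundity)
    n_f = per_recruit_numbers(M, F_init, selex)
    phi_f = np.sum(n_f * fecundity)
    # Equilibrium recruitment, lowered from R0 by the initial equilibrium F.
    R_eq = R0 * (4.0 * h * phi_f - (1.0 - h) * phi0) / ((5.0 * h - 1.0) * phi_f)
    # Apply log-scale deviations to the equilibrium age composition.
    return R_eq * n_f * np.exp(np.asarray(log_devs, dtype=float))
```

With `F_init = 0` and all deviations set to zero, the first element equals R0, which corresponds to treating the initial equilibrium stock size independently of historical fishing.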
In this study, results from the comparison between 
case 0 and case 12 indicate that the AMAK, which does not 
specify the initial F, will scale the R0 downwards to match 
the numbers at age 1 in the initial year and, consequently, 
the EM will produce lower MSY and SSBMSY compared 
with the true values. The magnitude of the RE will increase 
as the initial F increases. 
This study was not designed to compare the perfor- 
mance of EMs under different initial population condi- 
tions, and the initial F and recruitment variability were 
fixed at low levels in cases 0 and 12. The initial numbers 
at age in year 1 may be quite different from those for the 
unfished equilibrium populations, especially when fish- 
ing occurred many years prior to the first year of data 
or with large variability in recruitment. When fishery or 
survey composition data are available near the start of 
commercial fishing, the estimates of unfished condition 
or stock status may accurately reflect the true condi- 
tions when initial equilibrium stock size is treated inde- 
pendently of historical fishing F in an EM. In addition to 
having this configuration option, the ASAP, BAM, and SS 
allow the initial numbers at age to be controlled through 
different processes (Legault and Restrepo, 1999; Methot 
and Wetzel, 2013; Williams and Shertzer, 2015). Future 
work on assessment model development should consider 
which options are most accurate and efficient for comput- 
ing initial numbers at age. 
Spawner-recruit parameters: median-unbiased or mean-unbiased 
The results from cases 10 and 11 have clear implications 
for bias adjustment of recruitment. First, we identified 
the fundamental differences between 2 bias adjustment 
methods (Table 4, Suppl. Table 3 [online only]). In the BAM, 
median-unbiased spawner-recruit parameters are used, 
and in SS mean-unbiased parameters are used. Therefore, 
the inputs and outputs of R0 and h from the 2 models are 
comparable only after conversion, for example, by using the 
function introduced here (Equations 11-14). These find- 
ings highlight the importance of clarifying in assessment 
reports and meta-analyses whether estimates of spawner- 
recruit parameters correspond to geometric mean or 
arithmetic mean curves of recruitment and the need to 
develop functions for conversion of mean-unbiased 
parameters to median-unbiased parameters 
(and vice versa) for other spawner-recruit models (e.g., 
Ricker spawner-recruit model; Hilborn, 1985). In various 
studies, bias adjustment of recruitment has been imple- 
mented differently, but no study has clearly demonstrated 
the strengths and weaknesses of different bias adjustment 
methods (Walters, 1990; Chen, 2004; Yin and Sampson, 
2004; Methot and Taylor, 2011; Subbey et al., 2014). We 
recommend further work on bias adjustment to derive 
conversion functions for other spawner-recruit models 
and to provide clear guidance on which estimation process 
(mean-unbiased or median-unbiased) might be preferred 
under different situations. 
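As an illustration of such a conversion, the following Python sketch converts Beverton-Holt R0 and h between median-unbiased and mean-unbiased forms. It is not a reproduction of Equations 11-14. It assumes lognormal recruitment deviations with standard deviation `sigma_R`, so that the mean curve equals the median curve multiplied by exp(sigma_R^2/2), and it assumes that the unfished spawning output per recruit, `phi0`, is known. All function and argument names are hypothetical.

```python
import math

def bh_alpha_beta(R0, h, phi0):
    """Beverton-Holt R = alpha*S / (1 + beta*S) from R0, h, and phi0 (h < 1)."""
    alpha = 4.0 * h / (phi0 * (1.0 - h))
    beta = (5.0 * h - 1.0) / (R0 * phi0 * (1.0 - h))
    return alpha, beta

def bh_R0_h(alpha, beta, phi0):
    """Inverse of bh_alpha_beta: recover R0 and h from alpha, beta, and phi0."""
    R0 = (alpha * phi0 - 1.0) / (beta * phi0)
    h = alpha * phi0 / (4.0 + alpha * phi0)
    return R0, h

def median_to_mean(R0_med, h_med, phi0, sigma_R):
    """Scale the median curve up by exp(sigma_R^2 / 2), then re-derive R0 and h."""
    alpha, beta = bh_alpha_beta(R0_med, h_med, phi0)
    return bh_R0_h(alpha * math.exp(0.5 * sigma_R ** 2), beta, phi0)

def mean_to_median(R0_mean, h_mean, phi0, sigma_R):
    """Scale the mean curve down by exp(-sigma_R^2 / 2), then re-derive R0 and h."""
    alpha, beta = bh_alpha_beta(R0_mean, h_mean, phi0)
    return bh_R0_h(alpha * math.exp(-0.5 * sigma_R ** 2), beta, phi0)
```

Under these assumptions, both R0 and h of the mean-unbiased curve are larger than their median-unbiased counterparts whenever `sigma_R` is greater than zero, which is why values reported under the 2 conventions cannot be compared directly.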
Second, we established that ad hoc bias adjustment of 
recruitment can be implemented in EMs that do not have 
the bias adjustment feature (Suppl. Fig. 2 [online only]). 
The ad hoc adjustment affects recruitment and fishery 
management parameters, such as MSY-based reference 
points. In this study, we found that the AMAK and ASAP 
produced estimates of R0 and MSY-based reference points 
that are similar to the true values, if the estimates from 
those models were adjusted from a median-unbiased rela- 
tionship to a mean-unbiased relationship. 
Limits and future research 
In addition to the specific recommendations coming from 
the issues found in this study, we think the comparison 
design could be extended to address other specific needs, 
such as quantifying the value of estimation of time-varying 
parameters as random effects (e.g., numbers at age, selec- 
tivity, and F), estimation of spawner-recruit parameters, 
data weighting, spatial structure, and other attributes 
to the performance of EMs. Growth was assumed to be 
known in this study because not all EMs have the capa- 
bility to estimate growth. We can further compare EMs 
with more complicated cases, such as those that involve 
estimating growth within the assessments or using lower 
quality weight-at-age data. It would also be useful to con- 
duct comparisons across life history patterns (e.g., pat- 
terns of long-lived versus short-lived species or patterns 
of demersal versus pelagic species), but further work on 
development of more complex OMs for simulation testing 
would be required. Punt et al. (2020) outlined essential 
features that should be considered for the next-generation 
stock assessment model, and they highlighted the impor- 
tance of simulation testing in evaluation of estimation 
performance. Continued development of the OM used in 
this study through addition of essential features would 
result in an OM that can serve as an independent test 
bed to validate existing models as well as next-generation 
stock assessment models. 
The comparison framework used in our study focused 
on age-structured models. Other age-structured stock 
assessment models that were not included in this study 
can be evaluated by using the comparison framework and 
creating connection files that automatically write input 
files, run the model, and save standard outputs. In addi- 
tion, the comparison framework can be further applied 
to include other types of stock assessment models (e.g., 
surplus production, length-based, and catch-only models). 
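The role of a connection file can be sketched as follows. This is a hypothetical Python outline, not the interface of the comparison framework used in this study: the `Connector` class, its 3 callables, and the output file name `standard_outputs.json` are assumptions made for illustration only.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Dict, List

@dataclass
class Connector:
    """Hypothetical adapter between simulated data and one estimation model."""
    name: str
    write_inputs: Callable[[dict, Path], None]          # write model input files
    run_model: Callable[[Path], None]                   # run the model in workdir
    read_outputs: Callable[[Path], Dict[str, float]]    # return standard outputs

def run_comparison(om_data: dict, connectors: List[Connector], root: Path) -> dict:
    """Run each model on the same simulated data and save outputs in one format."""
    results = {}
    for connector in connectors:
        workdir = Path(root) / connector.name
        workdir.mkdir(parents=True, exist_ok=True)
        connector.write_inputs(om_data, workdir)
        connector.run_model(workdir)
        results[connector.name] = connector.read_outputs(workdir)
    # Store all outputs together so that models can be compared directly.
    out_file = Path(root) / "standard_outputs.json"
    out_file.write_text(json.dumps(results, indent=2))
    return results
```

Under this design, adding another model would require only a new `Connector` that maps the simulated data to that model's input format and maps its results back to the shared set of outputs.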
