Fishery Bulletin 109(2)
S = 100 (X_a − X_s) / X_s,    (1)

where X_a and X_s are quantities from the assessment and simulation, respectively. In contrast, for steepness (h) and catchability (q), the absolute discrepancy (S_abs) between the assessment and simulation models was computed by using absolute differences according to the following equation:

S_abs = X_a − X_s.    (2)
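As an illustration only, the two discrepancy measures could be computed as in the following minimal Python sketch. The function and variable names are hypothetical (not from the original study), and Equation 1 is assumed to take the standard percent-difference form 100(X_a − X_s)/X_s.

```python
import numpy as np

def percent_discrepancy(x_assess, x_sim):
    """Percent discrepancy (Eq. 1): 100 * (X_a - X_s) / X_s."""
    x_assess = np.asarray(x_assess, dtype=float)
    x_sim = np.asarray(x_sim, dtype=float)
    return 100.0 * (x_assess - x_sim) / x_sim

def absolute_discrepancy(x_assess, x_sim):
    """Absolute discrepancy (Eq. 2), used for steepness (h) and catchability (q)."""
    return np.asarray(x_assess, dtype=float) - np.asarray(x_sim, dtype=float)

# Hypothetical example values: assessment vs. simulation quantities
print(percent_discrepancy([110.0], [100.0]))  # 10% discrepancy
print(absolute_discrepancy([0.75], [0.70]))   # e.g., a steepness difference
```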
Figure 2. Two selectivity functions (logistic and double normal) used in simulation models to evaluate interactions between mortality and selectivity.
[Figure 3 plot: spawning output (0–50,000) versus year, 1960–2020.]
Figure 3. Time series of spawning outputs from simulation (Sim) and Stock Synthesis (SS3) assessment models from run 1. Closed circles are median simulation outputs and open circles are median assessment outputs. Assessment outputs are barely visible because most of them overlap simulation outputs. Lines are the 2.5% and 97.5% quantiles from simulation (dashed lines) and assessment (solid lines) model outputs.
To test the congruence of the simulation and assessment models, 3000 runs were conducted with the default setup for both models (Table 2). Note that in the default setup
(run 1), simple logistic selectivity was used in both 
the simulation and assessment models. Likewise, 
M was constant and correctly specified in both 
models. Thus, no model specification errors existed 
in fits of the default model. For all other simula- 
tion scenarios, 500 simulations were conducted. 
Early testing runs indicated that 500 simulations 
were sufficient to capture the range of outputs. 
Median values from the simulation-assessment
runs were then computed, along with the 2.5% and
97.5% percentiles.
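The medians and percentile bands described above can be sketched as follows, using random numbers as a hypothetical stand-in for the outputs of the simulation-assessment runs (the array shape and distribution are assumptions, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(1)
# Stand-in for 500 simulation-assessment runs:
# rows = runs, columns = years of spawning output
outputs = rng.lognormal(mean=10.0, sigma=0.3, size=(500, 60))

# Summaries across runs, one value per year
median = np.median(outputs, axis=0)
lo, hi = np.percentile(outputs, [2.5, 97.5], axis=0)
print(median.shape, lo.shape, hi.shape)
```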
Performance of the assessment models was also 
measured by using two performance statistics. 
The first statistic measured the percentage of 
SS3 runs that were completed (% run completed). 
Runs were considered completed whenever the
program finished estimation, regardless of
whether the assessment model produced sensible
results. Incomplete runs were those in which the
program stopped in the middle of the procedure
without producing any SS3 outputs. The second
performance statistic (% maximum gradient
component [MGC] satisfied) measured the per-
centage of runs that not only were completed, but
also satisfied the convergence criteria with MGC 
less than 0.05 and a positive-definite Hessian 
matrix. It should be noted that even when MGC 
was <0.05 there was no assurance that the model 
had reached a global optimum. The MGC threshold
of 0.05 was chosen on the basis of earlier testing
runs with the default setup (run 1), in which
results obtained with an MGC of 0.05 matched
those from runs with smaller MGC values
(e.g., 0.001).
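The two performance statistics could be tallied as in the sketch below. The per-run record fields (completed, max_gradient, hessian_pd) are hypothetical names for illustration, not SS3 output fields:

```python
MGC_THRESHOLD = 0.05  # convergence threshold on the maximum gradient component

def summarize_runs(runs):
    """Tally % run completed and % MGC satisfied across a batch of runs."""
    n = len(runs)
    completed = [r for r in runs if r["completed"]]
    # Converged runs: completed, MGC below threshold, positive-definite Hessian
    converged = [r for r in completed
                 if r["max_gradient"] < MGC_THRESHOLD and r["hessian_pd"]]
    return {"% run completed": 100.0 * len(completed) / n,
            "% MGC satisfied": 100.0 * len(converged) / n}

# Hypothetical batch of four runs
runs = [
    {"completed": True,  "max_gradient": 0.001, "hessian_pd": True},
    {"completed": True,  "max_gradient": 0.20,  "hessian_pd": True},
    {"completed": False, "max_gradient": None,  "hessian_pd": False},
    {"completed": True,  "max_gradient": 0.01,  "hessian_pd": False},
]
print(summarize_runs(runs))  # {'% run completed': 75.0, '% MGC satisfied': 25.0}
```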
Results 
Testing simulation models 
Simulation models were tested by using the default 
setup in the operating model and the assessment 
model (run 1). As expected, the time series of
median spawning output, as well as the 2.5%
and 97.5% percentiles, from the simulation
and assessment models matched very well (Table
3; Fig. 3). Frequency plots of the estimated dif-
ferences in virgin spawning output (B0), depletion,
and OFL between simulation and assessment
models showed that the differences were very 
