Fishery Bulletin 119(1) 
for both the full sample and subsamples. Plots of mean age 
with 95% confidence intervals were created, each containing 
a 1:1 line; such plots are commonly used to examine 
age bias (Campana, 2001). We also produced plots of sample 
sizes by age. To determine whether processing plant location, 
area, or month affected differences in aging for Atlantic and 
Gulf menhaden, we compared the distribution of the samples 
used in this study across those variables with the distribution 
of samples used in production aging for the long-term 
sampling program. 
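The summary behind an age-bias plot can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `age_bias_summary` and the sample ages are hypothetical, and the confidence interval uses a normal approximation, which is an assumption because the text does not state how the intervals were computed.

```python
from collections import defaultdict
from statistics import NormalDist, mean, stdev


def age_bias_summary(reference_ages, comparison_ages, level=0.95):
    """For each reference age, return (n, mean comparison age, CI low, CI high).

    Assumption: normal-approximation interval; the source does not say
    which interval method was used for its plots.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    groups = defaultdict(list)
    for ref, comp in zip(reference_ages, comparison_ages):
        groups[ref].append(comp)

    summary = {}
    for ref in sorted(groups):
        values = groups[ref]
        m = mean(values)
        # A single fish at a given age has no estimable spread.
        if len(values) > 1:
            half = z * stdev(values) / len(values) ** 0.5
        else:
            half = float("nan")
        summary[ref] = (len(values), m, m - half, m + half)
    return summary


# Hypothetical paired ages: first list from one device, second from the other.
device_a = [1, 1, 1, 2, 2, 2, 2, 3]
device_b = [1, 1, 2, 2, 2, 2, 3, 3]
summary = age_bias_summary(device_a, device_b)
```

An interval that contains the reference age itself (the 1:1 line) gives no indication of bias at that age; the first element of each tuple is the sample size by age.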
Several statistical tests were used to determine if the 
age estimates were significantly different between the 2 
devices, between readers, and within readings of the same 
device and reader. The significance level (α) of 0.05 was 
used. Multiple approaches were needed to assess error, 
agreement, bias, precision, and symmetry (Campana 
et al., 1995); therefore, multiple tests are described and 
justified here. 
Average percent error (APE; Beamish and Fournier, 1981; 
Campana et al., 1995), Chang’s average coefficient of varia- 
tion (ACV; Chang, 1982), and percent agreement (PA) were 
calculated to determine precision and agreement within the 
age data (Campana, 2001). All of these statistics are com- 
monly used to compare age estimates. The APE and ACV 
can be artificially inflated by bias (Campana, 2001) and 
are related. In general use, APEs <5% are acceptable, as 
are ACVs of <5%; Campana (2001) calculated an average 
APE of 5.5% and an average ACV of 7.6% across studies. 
Values higher than those averages may indicate that struc- 
tures are difficult to interpret or that readers lack training 
(Morison et al., 1998). 
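As an illustration of how these three statistics are computed, the sketch below uses the standard per-fish definitions: APE averages the mean absolute deviation of the readings divided by their mean, ACV averages the standard deviation divided by the mean, and PA is the share of fish whose readings all agree. The function name, the sample readings, and the handling of fish whose readings are all age 0 (counted as zero error) are assumptions made for this example.

```python
from statistics import mean, stdev


def precision_stats(readings):
    """readings: one list of age estimates (at least 2) per fish."""
    ape_terms, cv_terms, n_agree = [], [], 0
    for ages in readings:
        m = mean(ages)
        if len(set(ages)) == 1:
            n_agree += 1
        if m == 0:
            # Assumption: all readings are age 0, so count the fish as zero error
            # rather than dividing by a mean of zero.
            ape_terms.append(0.0)
            cv_terms.append(0.0)
            continue
        ape_terms.append(sum(abs(a - m) for a in ages) / len(ages) / m)
        cv_terms.append(stdev(ages) / m)
    return {
        "APE": 100 * mean(ape_terms),
        "ACV": 100 * mean(cv_terms),
        "PA": 100 * n_agree / len(readings),
    }


# Hypothetical example: four fish, two readings each, one disagreement.
stats = precision_stats([[1, 1], [2, 2], [2, 3], [3, 3]])
```

With exactly two readings per fish, ACV equals APE multiplied by the square root of 2, which is one way to see that the two statistics are related.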
Three tests were used to determine symmetry within the 
paired age estimates between readers, by the same reader 
on the same device, and by the same reader on different 
devices: Bowker’s (Bowker, 1948; Hoenig et al., 1995), Evans 
and Hoenig’s (Evans and Hoenig, 1998), and McNemar’s 
(McNemar, 1947; Hettmansperger and McKean, 1973) 
tests. These tests were fairly robust in simulation testing 
(McBride, 2015), but Evans and Hoenig’s test performed 
best. Bowker’s test can detect bias with low APE and is 
unpooled, but results from this test have a higher incidence 
of type II errors (McBride, 2015). Evans and Hoenig’s test 
can detect bias at higher APEs and is pooled on the diago- 
nal (McBride, 2015). Evans and Hoenig (1998) found that 
their test was more powerful than Bowker’s test and more 
general than McNemar’s test. Finally, McNemar’s test is a 
pooled test sensitive to small differences on one side of the 
diagonal (McBride, 2015). Each of these tests provides different 
information on precision and is not meant to corroborate 
the results of the other tests. 
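The difference in pooling among the three tests can be sketched on a square age-agreement table. This is a minimal illustration under stated assumptions: the function name and the example table are hypothetical, no continuity correction is applied to McNemar's statistic, and only the chi-square statistics and degrees of freedom are returned (p-values would come from a chi-square distribution).

```python
def symmetry_tests(table):
    """table[i][j]: number of fish aged i by the first reading and j by the second.

    Returns {test name: (chi-square statistic, degrees of freedom)}.
    """
    m = len(table)

    # Bowker: each off-diagonal pair (i, j) vs. (j, i) is compared on its own.
    bowker, bowker_df = 0.0, 0
    for i in range(m):
        for j in range(i + 1, m):
            total = table[i][j] + table[j][i]
            if total > 0:
                bowker += (table[i][j] - table[j][i]) ** 2 / total
                bowker_df += 1

    # Evans-Hoenig: cells are pooled by their distance p from the main diagonal.
    eh, eh_df = 0.0, 0
    for p in range(1, m):
        above = sum(table[i][i + p] for i in range(m - p))
        below = sum(table[i + p][i] for i in range(m - p))
        if above + below > 0:
            eh += (above - below) ** 2 / (above + below)
            eh_df += 1

    # McNemar: everything above the diagonal is pooled against everything below.
    above = sum(table[i][j] for i in range(m) for j in range(i + 1, m))
    below = sum(table[j][i] for i in range(m) for j in range(i + 1, m))
    has_disagreement = above + below > 0
    mcnemar = (above - below) ** 2 / (above + below) if has_disagreement else 0.0

    return {
        "bowker": (bowker, bowker_df),
        "evans_hoenig": (eh, eh_df),
        "mcnemar": (mcnemar, 1 if has_disagreement else 0),
    }


# Hypothetical 3 x 3 agreement table (ages 0-2).
example = [
    [10, 3, 1],
    [1, 12, 4],
    [0, 2, 9],
]
result = symmetry_tests(example)
```

At a significance level of 0.05, each statistic would be compared with the chi-square critical value for its degrees of freedom (3.84 for 1, 5.99 for 2, and 7.81 for 3).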
Simultaneous multinomial confidence intervals were 
estimated to determine if the multinomial distributions 
differed significantly between age readings from the dif- 
ferent instruments (Sison and Glaz, 1995; Zar, 1999). The 
intent of the use of these intervals was to determine if the 
age estimates and other information that would be pro- 
vided to a stock assessment would be similar between the 
devices used for aging. Ultimately, the end products to 
which the age estimates contribute are the stock assess- 
ments for Atlantic and Gulf menhaden. Therefore, if the 
overall age compositions produced for those assessments 
do not differ by the instrument used, either device could 
be used to successfully provide age compositions. The mul- 
tinomial confidence intervals were calculated by using the 
MultinomialCI package (vers. 1.1; Villacorta, 2019) in R 
(vers. 3.5.3; R Core Team, 2019), and 95% confidence inter- 
vals were provided for each age. 
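The idea of comparing age compositions with simultaneous intervals can be sketched as below. Note the substitution: this example does not implement the Sison and Glaz method used by the MultinomialCI package; it uses Bonferroni-adjusted Wilson score intervals as a simpler stand-in. The function names, the age counts, and the use of interval overlap as the comparison rule are all assumptions made for illustration.

```python
from statistics import NormalDist


def bonferroni_wilson_intervals(counts, alpha=0.05):
    """Simultaneous intervals for multinomial proportions.

    Stand-in technique (not Sison-Glaz): a Wilson score interval per
    category, with alpha split across the k categories.
    """
    n_total = sum(counts)
    k = len(counts)
    z = NormalDist().inv_cdf(1 - alpha / (2 * k))
    a = z * z
    intervals = []
    for n_i in counts:
        centre = a + 2 * n_i
        half = (a * (a + 4 * n_i * (n_total - n_i) / n_total)) ** 0.5
        denom = 2 * (n_total + a)
        intervals.append(((centre - half) / denom, (centre + half) / denom))
    return intervals


def intervals_overlap(counts_a, counts_b, alpha=0.05):
    """Per-age flag: True where the two devices' intervals overlap."""
    pairs = zip(
        bonferroni_wilson_intervals(counts_a, alpha),
        bonferroni_wilson_intervals(counts_b, alpha),
    )
    return [lo_a <= hi_b and lo_b <= hi_a for (lo_a, hi_a), (lo_b, hi_b) in pairs]


# Hypothetical counts of fish by age (ages 0-4) for each device.
microscope = [120, 540, 310, 110, 39]
projector = [131, 529, 305, 118, 36]
overlap = intervals_overlap(microscope, projector)
```

If the intervals overlap at every age, the two devices would be judged to give similar age compositions under this rule.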
Measurements of distances on scales made with the 
3 methods (microscope, Eberbach projector, and blue 
cards) for subsamples were examined separately for the 
Atlantic and Gulf menhaden for each distance (e.g., focus 
to first annulus), excluding the final measurement from 
the focus to the edge of the scale. Box plots were gener- 
ated to compare across measurement types. A repeated 
measures analysis of variance was used to determine if 
the distance measurements among the 3 methods for each 
increment were significantly different, given an α of 0.05. 
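The repeated measures analysis of variance can be sketched as a one-way design in which each scale is a subject measured once by each method. This is a minimal illustration: the function name and the measurements are hypothetical, sphericity is not checked, and only the F statistic and its degrees of freedom are returned.

```python
def repeated_measures_anova(data):
    """data[i][j]: measurement of scale i made with method j.

    Returns (F statistic, df for methods, df for error).
    """
    n = len(data)      # number of scales (subjects)
    k = len(data[0])   # number of methods (repeated conditions)
    grand = sum(sum(row) for row in data) / (n * k)

    method_means = [sum(row[j] for row in data) / n for j in range(k)]
    scale_means = [sum(row) / k for row in data]

    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_method = n * sum((m - grand) ** 2 for m in method_means)
    ss_scale = k * sum((m - grand) ** 2 for m in scale_means)
    # Variation left after removing method and scale effects.
    ss_error = ss_total - ss_method - ss_scale

    df_method = k - 1
    df_error = (k - 1) * (n - 1)
    f_stat = (ss_method / df_method) / (ss_error / df_error)
    return f_stat, df_method, df_error


# Hypothetical distances (focus to first annulus) for 3 scales and 3 methods.
distances = [
    [1.0, 2.0, 3.0],
    [2.0, 3.0, 5.0],
    [3.0, 5.0, 6.0],
]
f_stat, df_method, df_error = repeated_measures_anova(distances)
```

For this toy data set the F statistic is 32 on 2 and 4 degrees of freedom, which exceeds the 0.05 critical value of 6.94, so the methods would be judged to differ.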
Results 
Total sample sizes for Atlantic and Gulf menhaden col- 
lected from landings were 1317 and 1569, respectively 
(Table 1). After exclusion of samples, 1119 and 1307 sam- 
ples remained for Atlantic and Gulf menhaden, respec- 
tively. Atlantic menhaden samples were from sampling 
in 2013 (n=605) and 2017 (n=418) and from a reference 
collection (n=96). Gulf menhaden samples were from 
sampling in 2005 (n=759) and 2017 (n=499) and from a 
reference collection (n=49). Samples were excluded at a 
greater rate when examined with the Eberbach projec- 
tor than with the microscope (Table 1). Scales of Atlantic 
menhaden were excluded 14% of the time when observed 
with the Eberbach projector and 8% of the time when 
examined with the microscope. Scales of Gulf menhaden 
were excluded 12% of the time when observed with the 
Eberbach projector and 10% of the time when examined 
with the microscope. A higher percentage of scale subsa- 
mples (n=200) were excluded when they were examined 
with the Eberbach projector than with the microscope 
(Table 2). For both Atlantic and Gulf menhaden, neither 
location of processing plant nor area nor month was 
more indicative of differences in age than the other variables; 
differences were evenly distributed across those variables, 
and we therefore do not provide those specific results here. 
Both the APE and ACV for age data from the full sam- 
ple sets for each species were below the level considered 
a benchmark for aging of fish (6%; Morison et al., 1998). 
The APEs of the age estimates for all of the samples of 
Atlantic menhaden and Gulf menhaden were 2.7% and 
3.5%, respectively (Table 1). The APEs for age estimates 
for samples in the reference collections were as low as 
0.8% for Atlantic menhaden and 0.0% for Gulf menhaden. 
The APEs for age data on Atlantic menhaden were 
3.5% for samples taken in 2013 and 1.8% for samples col- 
lected in 2017. The APEs for age data on Gulf menhaden 
