Barlow and Berkson: Evaluating methods for estimating rare events with zero heavy data 
to represent the first simulated set. An analogous pro- 
cedure was used if the first simulated set in a stratum 
had bycatch. In scenarios with clumped sets, we as- 
signed the 5 simulated sets in a clump with attributes 
from SEFSC-observed sets from a single trip. There- 
fore, SEFSC-observed trips with fewer than 5 sets were 
eliminated from consideration. The remaining trips 
were sorted into 6 groups: 0 sets with take, at least 1 
set with take and 4 sets without take, at least 2 sets 
with take and 3 sets without take, at least 3 sets with 
take and 2 sets without take, at least 4 sets with take 
and 1 set without take, and at least 5 sets with take. 
For the first clump in a stratum, an SEFSC-observed 
trip that had at least as many sets with and without 
take as the sets simulated in the clump was selected at 
random. SEFSC-observed sets from that trip were also 
selected at random to match the simulated number of 
sets with and without take. 
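The trip-matching step for a clump can be sketched as follows. This is a minimal illustration, not the authors' Visual Basic code: the data layout (a dictionary mapping a trip identifier to one boolean per observed set) and the function names `eligible_trips` and `draw_clump` are assumptions made for the example.

```python
import random

CLUMP_SIZE = 5  # sets per clump, as stated in the text


def eligible_trips(trips, n_take, clump_size=CLUMP_SIZE):
    """Return ids of trips that can supply one clump.

    `trips` maps a trip id to a list of booleans, one per observed set
    (True = set with take). Trips with fewer than `clump_size` sets are
    dropped, as are trips lacking enough sets with or without take.
    """
    n_no_take = clump_size - n_take
    keep = []
    for trip_id, sets in trips.items():
        if len(sets) < clump_size:
            continue
        with_take = sum(sets)
        without_take = len(sets) - with_take
        if with_take >= n_take and without_take >= n_no_take:
            keep.append(trip_id)
    return keep


def draw_clump(trips, n_take, rng, clump_size=CLUMP_SIZE):
    """Pick one eligible trip at random, then pick matching sets from it."""
    trip_id = rng.choice(eligible_trips(trips, n_take, clump_size))
    sets = trips[trip_id]
    take_idx = [i for i, t in enumerate(sets) if t]
    no_take_idx = [i for i, t in enumerate(sets) if not t]
    chosen = rng.sample(take_idx, n_take)
    chosen += rng.sample(no_take_idx, clump_size - n_take)
    return trip_id, sorted(chosen)
```

Calling `draw_clump(trips, 2, random.Random(0))` would return a trip identifier and the indices of 2 sets with take and 3 sets without take from that trip.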
Also, variable values for additional simulated sets 
in a stratum were selected from SEFSC-observed sets 
that occurred close in time and space to each other. We 
calculated the distance from the SEFSC-observed set 
chosen to represent the first simulated set in a stra- 
tum to all other SEFSC-observed sets, indexed by s in 
the following equation. Because SEFSC-observed sets 
that were closer to the first-selected set should have 
a greater probability of selection, the reciprocal of the 
distance formula was used. 
\[
\mathrm{Distance}_s = \frac{1}{\sqrt{(\mathrm{Date} - \mathrm{Date}_s)^2 + (\mathrm{Lat} - \mathrm{Lat}_s)^2 + (\mathrm{Long} - \mathrm{Long}_s)^2}}
\]
In scenarios with uniformly random sets, the distance 
values were used to calculate probabilities of selection 
for each SEFSC-observed set. These probabilities were 
used to select SEFSC-observed sets to represent the re- 
maining simulated sets within a stratum. The same set 
could not be selected multiple times within a stratum. 
The probabilities of selection were recalculated from the 
first set in subsequent strata. 
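A minimal sketch of this selection step is given below, under stated assumptions: each set is represented by a (date, latitude, longitude) tuple of numbers, no candidate coincides exactly with the first-selected set, and selection without replacement is done by drawing one set at a time and renormalizing the remaining weights. The text does not specify the units or the exact sampling scheme, and the function names are hypothetical.

```python
import math
import random


def distance_value(first, other):
    """Reciprocal of the Euclidean distance over (date, lat, long)."""
    d = math.sqrt(sum((a - b) ** 2 for a, b in zip(first, other)))
    return 1.0 / d  # assumes the two sets do not coincide (d > 0)


def select_sets(first, candidates, k, rng):
    """Draw k distinct candidate ids, weighted by their distance values.

    `candidates` maps a set id to its (date, lat, long) tuple.
    """
    pool = {cid: distance_value(first, xyz) for cid, xyz in candidates.items()}
    chosen = []
    for _ in range(k):
        ids = list(pool)
        weights = [pool[i] for i in ids]
        pick = rng.choices(ids, weights=weights, k=1)[0]
        chosen.append(pick)
        del pool[pick]  # the same set cannot be reused within a stratum
    return chosen
```

Because `random.choices` normalizes the weights internally, the distance values do not need to be converted to probabilities explicitly; for a new stratum the function would simply be called again with that stratum's first set.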
In scenarios with clumped sets, the average distance 
value for SEFSC-observed sets within a trip was cal- 
culated, and probabilities of selection were calculated 
for each SEFSC-observed trip. We assigned variable 
values to the second clump of simulated sets in a com- 
putational group by tallying the number of simulated 
sets with take in the clump and using the calculated 
probabilities to select an SEFSC-observed trip with the 
corresponding number of sets with take. Once an SEF- 
SC-observed trip was selected, the required numbers of 
sets with take and sets without take were selected ran- 
domly from the trip. This algorithm continued through 
all set clumps until the next stratum was reached. The 
same trip could not be selected multiple times in the 
same stratum, and the probabilities were recalculated 
for the next stratum. 
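The trip-level weighting can be sketched in the same way. The example assumes that the distance values of the sets in each trip are already available, and that the caller has already restricted the candidates to trips with the required number of sets with take; `trip_probabilities` and `select_trip` are illustrative names, not taken from the authors' implementation.

```python
import random


def trip_probabilities(trip_values, used=()):
    """Turn mean per-trip distance values into selection probabilities.

    `trip_values` maps a trip id to the distance values of its sets.
    Trips in `used` (already selected in this stratum) are excluded.
    """
    means = {t: sum(v) / len(v) for t, v in trip_values.items() if t not in used}
    total = sum(means.values())
    return {t: m / total for t, m in means.items()}


def select_trip(trip_values, used, rng):
    """Draw one unused trip according to its selection probability."""
    probs = trip_probabilities(trip_values, used)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]
```

Adding each selected trip to `used` before the next draw prevents a trip from being selected twice in the same stratum; starting a new stratum would reset `used` and recompute the distance values.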
To select the best-fitting GLM, we first fit a saturated 
model and then performed a stepwise procedure based 
on Akaike information criterion (AIC) values. The re- 
sulting model was used to predict the number of sea 
turtles caught on unobserved sets. The GLMs were fitted
in R software, vers. 2.14.1 (R Development Core Team,
2011), with the glm2 package and the glm.nb function of
the MASS package. The glm2 package was used because
some models that fail to converge with the standard glm
function may have greater stability with glm2. The glm.nb
function is a modification of glm with an additional
parameter for a negative binomial GLM. We built the simulation
model in Microsoft Visual Basic 2010 Express (Microsoft 
Corp., Redmond, Washington). 
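The model-selection logic can be illustrated with a short sketch. It shows only a backward-elimination variant of stepwise selection, which may differ from the procedure actually run in R, and it leaves the model fitting to a caller-supplied function `aic` (hypothetical) that returns the AIC, defined as 2k - 2 ln(L) for k parameters and maximized likelihood L, of the model containing a given tuple of terms.

```python
def backward_stepwise(terms, aic):
    """Drop one term at a time for as long as doing so lowers the AIC.

    `terms` lists the terms of the saturated model; `aic` maps a tuple of
    terms to the AIC of the corresponding fitted model.
    """
    current = tuple(terms)
    best = aic(current)
    while current:
        # Score every model that removes exactly one term.
        trials = [tuple(t for t in current if t != drop) for drop in current]
        scored = [(aic(m), m) for m in trials]
        score, model = min(scored, key=lambda pair: pair[0])
        if score >= best:
            break  # no single removal improves the AIC
        current, best = model, score
    return current, best
```

The function returns the retained terms and their AIC; the retained model would then be used to predict bycatch on unobserved sets.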
Delta-lognormal method
In contrast, the only information besides the observed
bycatch required for the delta-lognormal
estimation method was the number of hooks
per set. The number of hooks per set was simulated 
according to the procedure we used to assign explana- 
tory variable values to sets for the GLM. To estimate 
bycatch on unobserved sets, we multiplied the mean 
observed bycatch rate by the total simulated number 
of hooks. 
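A minimal sketch of this calculation follows. The text does not give the estimator's formula, so the example assumes a simple plug-in form of the delta-lognormal mean (the proportion of sets with bycatch multiplied by the lognormal mean of the positive catch rates) and omits any finite-sample correction that the method used by the authors may include. The function names are hypothetical.

```python
import math


def delta_lognormal_rate(bycatch, hooks):
    """Plug-in delta-lognormal mean of catch per hook over observed sets."""
    cpue = [b / h for b, h in zip(bycatch, hooks)]
    positive = [c for c in cpue if c > 0]
    if not positive:
        return 0.0
    p = len(positive) / len(cpue)  # proportion of sets with bycatch
    logs = [math.log(c) for c in positive]
    mu = sum(logs) / len(logs)
    # Sample variance of log catch rates; taken as 0 with a single positive set.
    if len(logs) > 1:
        var = sum((x - mu) ** 2 for x in logs) / (len(logs) - 1)
    else:
        var = 0.0
    return p * math.exp(mu + var / 2)


def expand_to_unobserved(rate, total_hooks):
    """Multiply the mean observed bycatch rate by the total number of hooks."""
    return rate * total_hooks
```

With one positive set among four observed sets, the rate reduces to one quarter of that set's catch per hook, which is then scaled by the simulated number of hooks.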
Estimating bycatch at 2 spatiotemporal scales
The
SEFSC estimates bycatch in each quarter-area stratum 
and sums the estimates across strata to obtain a total 
annual estimate. In 1999, the SEFSC investigated how 
pooling data across strata before estimation affects the 
bycatch estimate (Yeung, 1999). Bycatch point esti- 
mates were relatively insensitive to pooling, but estimate 
precision improved considerably. The only pooling cur- 
rently done by the SEFSC occurs when a stratum has no 
observed sets. If a stratum has no observed sets, then the 
mean bycatch rate of that stratum from previous years is 
used. Pooling data obscures variation among strata, but 
it increases the sample size on which bycatch estimates 
are made. Thus, pooling data addresses the problem of 
little or no observer coverage and wide confidence inter- 
vals. To evaluate the efficacy of a pooling procedure, we 
pooled simulated data across all quarter-area strata first 
and then made an estimate of total annual bycatch. We 
compared this procedure to estimating bycatch in each 
stratum and summing estimates across strata to obtain 
a total annual estimate. 
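The difference between the two scales of estimation can be sketched as follows. For brevity the example uses a plain ratio estimator (observed bycatch per observed hook, scaled to total hooks) as a stand-in for either estimation method, and it assumes every stratum has at least one observed set; the dictionary keys and function names are illustrative.

```python
def ratio_estimate(observed_bycatch, observed_hooks, total_hooks):
    """Bycatch per observed hook, scaled up to the total hooks fished."""
    return sum(observed_bycatch) / sum(observed_hooks) * total_hooks


def stratified_total(strata):
    """Estimate bycatch in each stratum, then sum across strata."""
    return sum(
        ratio_estimate(s["bycatch"], s["hooks"], s["total_hooks"]) for s in strata
    )


def pooled_total(strata):
    """Pool observations across strata first, then make one estimate."""
    bycatch = [b for s in strata for b in s["bycatch"]]
    hooks = [h for s in strata for h in s["hooks"]]
    total = sum(s["total_hooks"] for s in strata)
    return ratio_estimate(bycatch, hooks, total)
```

The two totals agree only when effort and bycatch rates are balanced across strata, so comparing them over many simulations shows how much pooling shifts the point estimate.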
Evaluating estimation method performance 
An estimation method performs well if point estimates 
are unbiased and precise. The performance of each 
estimation method was evaluated under each spatial 
scenario. Incorporating the different potential sea turtle 
distributions, set distributions, estimation methods, and 
spatiotemporal scales of estimation produced 30 poten- 
tial models: 5 spatial scenarios with 6 estimates each 
(Fig. 3). Each of the 30 potential models was simulated 
1000 times. 
We assessed the accuracy of an estimation method by 
estimating bycatch in 1000 simulations of each spatial 
scenario, calculating the relative error for each simula- 
tion, and identifying the median relative error for the 
estimation method in that spatial scenario. If an esti- 
