Perkins and Edwards: A mixture model for estimating discarded bycatch 






where ȳ is the sample mean. In other cases, the simplified forms for the MLE of E[Y] are slightly more complex (Eqs. 7 and 8), and Equations 9 and 10 no longer apply. We did derive expressions, analogous to Equation 9, for the variance of Equations 7 and 8 in terms of the three model parameters p, α, and μ. However, these formulae are so complex as to be of no practical use in estimation, and no expression analogous to Equation 10 seems possible.



Rounding errors in the observations 



The NBAZ model used in this study comprises two components. As noted in the description of the model, zero values derived from the NB component can be interpreted as observations of small amounts of discard, rounded down to zero, whereas zero values from the probability mass component can be interpreted as exact zeros. This interpretation is based on the assumption of an underlying continuous distribution for positive discard amounts (e.g. a gamma distribution) upon which rounding errors have been superimposed.
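The two-component density described above can be written down directly. The sketch below is a minimal illustration assuming the common mean–dispersion parameterization of the negative binomial (variance μ + αμ²); the paper's exact parameterization of p, α, and μ may differ in detail.

```python
import numpy as np
from scipy.stats import nbinom

def nbaz_pmf(y, p, alpha, mu):
    """P(Y = y) under an added-zeros NB model: with probability p the
    observation is an exact zero; otherwise it is a negative binomial
    draw with mean mu and dispersion alpha (Var = mu + alpha * mu**2)."""
    r = 1.0 / alpha                 # NB "size" parameter
    q = r / (r + mu)                # NB success probability
    nb = nbinom.pmf(y, r, q)
    return np.where(y == 0, p + (1 - p) * nb, (1 - p) * nb)
```

Under this parameterization the implied mean is (1 − p)μ, so zeros contributed by the NB component ("rounded zeros") and exact zeros from the point mass enter the mean differently.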



One consequence of this interpretation is that the mean amount of discard that should be associated with "true zeros" is zero, and the mean amount that should be associated with "NB zeros" is nonzero. Thus, strict adherence to this interpretation of zeros leads to the conclusion that Equation 4 may be an underestimate of E[Y]. However, if we assume a strictly decreasing underlying distribution for positive discard, symmetric rounding of amounts larger than one-half ton would tend to increase the estimate. In the absence of a specific model for the rounding errors, we did not attempt to correct for any bias due to rounding.
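The two opposing effects can be seen in a small simulation. The exponential distribution below is only a stand-in for "a strictly decreasing underlying distribution"; it is purely illustrative, since no specific rounding model is adopted here.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=2.0, size=1_000_000)  # strictly decreasing density
y = np.rint(x)                                  # recorded to the nearest ton

big = x >= 0.5                                  # amounts not rounded to zero
# Symmetric rounding of amounts above one-half ton biases them upward,
# because a decreasing density puts more mass on the low side of each bin.
up_bias = y[big].mean() - x[big].mean()
# Small amounts rounded down to zero remove positive mass from the mean.
down_bias = x[~big].mean() * (~big).mean()
```

For this illustrative distribution the two biases are of comparable size and nearly cancel, consistent with the decision not to attempt a correction.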



The EM algorithm for maximizing likelihood 



We used a quasi-Newton algorithm to maximize likelihood for the parameters p, α, and μ. A useful alternative for mixture models, including "added zero" distributions, uses the EM algorithm to maximize likelihood (e.g. McLachlan and Basford, 1988; Lambert, 1992). In situations with many covariates, it provides a well-behaved alternative to the high-dimensional gradient search required by general optimization algorithms. The algorithm can be implemented by using standard regression techniques for generalized linear models. We applied the EM algorithm to the NBAZ using a combination of logistic regression to maximize likelihood for p and quasi-likelihood NB regression for μ and α (Lawless, 1987). However, the logistic regression failed to converge for the current data because the ML estimate of p for log sets was zero.
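The structure of the EM iteration can be conveyed by the covariate-free special case, where the logistic and NB regressions reduce to closed-form updates. For simplicity the sketch below holds the dispersion α fixed rather than estimating it, so it is an illustration of the algorithm's shape, not the implementation used here.

```python
import numpy as np
from scipy.stats import nbinom

def em_nbaz(y, alpha, p=0.5, mu=None, iters=200):
    """EM for a negative binomial with added zeros (no covariates,
    dispersion alpha held fixed for simplicity)."""
    y = np.asarray(y, dtype=float)
    mu = y.mean() if mu is None else mu
    r = 1.0 / alpha
    for _ in range(iters):
        nb0 = nbinom.pmf(0, r, r / (r + mu))   # NB probability of a zero
        # E-step: posterior probability that each zero is an exact zero
        z = np.where(y == 0, p / (p + (1 - p) * nb0), 0.0)
        # M-step: with weights (1 - z), the weighted mean is the NB MLE of mu
        p = z.mean()
        mu = ((1 - z) * y).sum() / (1 - z).sum()
    return p, mu
```

With covariates, the M-step for p becomes a weighted logistic regression and the M-step for μ and α a weighted NB regression, as in the text.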



Alternative models considered 



We considered but rejected two alternatives to the NBAZ model: 1) the Δ-distribution (a mixture of a probability mass at zero with a lognormal [Aitchison, 1955; Pennington, 1983]); and 2) a Γ-distribution mixed with a probability mass at zero (Coe and Stern, 1982). Both have been used in similar cases where the data to be analyzed have contained large numbers of zeros. The Δ-distribution assumes that the natural logs of the positive observations are distributed normally, or can be so transformed, and this assumption was not plausible. The data in this analysis were rounded to the nearest ton, and the mode of the positive observations was one ton. Thus, no transformation could bring these data to even approximate normality. The gamma mixture model was not appropriate for the current data because maximum likelihood estimation for a highly skewed gamma distribution depends heavily upon small (near-zero) observations. In this study, all observations in that region were rounded to either zero or one, implying a large relative measurement error and therefore potentially poor accuracy. Another more fundamental reason why we rejected these two models was that both mix a continuous distribution on the positive numbers with a probability mass at zero and assume that observations from each component remain distinguishable. In the current data set, small positive observations are grouped together with zero observations, and using an NB in the mixture allows the model to distinguish between "true zeros" (actual absence of discard) and "rounded zeros" (discard so small that it was ignored or missed).
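That distinction between "true zeros" and "rounded zeros" is a posterior statement: given fitted parameters, the expected share of observed zeros that are exact zeros follows from Bayes' rule. A minimal sketch, again assuming the mean–dispersion NB parameterization:

```python
from scipy.stats import nbinom

def true_zero_fraction(p, alpha, mu):
    """Expected share of observed zeros that are exact ("true") zeros
    under an added-zeros NB model with mixing weight p, dispersion
    alpha, and mean mu."""
    r = 1.0 / alpha
    nb0 = nbinom.pmf(0, r, r / (r + mu))   # chance the NB part yields a zero
    return p / (p + (1 - p) * nb0)
```

A Δ- or Γ-mixture, by contrast, attributes every observed zero to the point mass, so no such decomposition is possible.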



Conclusions 



The methods developed here were used to model fisheries discard data which were rounded to integer values and which included widely varying numbers of zero observations, depending on one or more covariates. The usual models for integer-valued data (e.g. the Poisson distribution) did not fit the data at all well because of the extreme skewness of some of the observed distributions. The NBAZ is more flexible than the standard models and provided a much better fit. In general, the model is applicable to any set of integer-valued data which exhibit a large proportion of zero observations combined with long positive tails. Both categorical and continuous covariates may be used.



Modelling these data with a parametric probability distribution allowed us to describe patterns in the discard in some detail, for example, in estimating the percentage of "true zeros" in the data. Addi-



