Pennington Estimating the mean and variance from highly skewed marine data 






Figure 2. Frequency plot of the number per tow of juvenile Arcto-Norwegian cod from a 1989 midwater trawl survey in the Barents Sea (Helle, 1994). There were also 62 zeros (n = 161), which are not included in the plot. Axis label: Number per haul.



moderate sample sizes. The median can be much smaller than the mean for a skewed distribution, and, therefore, the sample estimators are not only less efficient but will underestimate the true values of the parameters most of the time (Pennington, 1983; McConnaughey and Conquest, 1992; Conquest et al., 1996). The sampling distribution of s² is much more skewed than is the sampling distribution of x̄, which is the reason that s² underestimates its expected value, often greatly, more often than does x̄ (Pennington, 1986). The sample estimators are unbiased, even though most of the time the estimates are low, because they are very high for the occasional sample that contains a huge catch (McConnaughey and Conquest, 1992).
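The tendency described above can be checked with a small simulation. The following is a minimal sketch, not taken from the paper: it assumes a pure lognormal population (no zeros), and the function name, sample size, and value of sigma are illustrative choices. It counts how often the sample mean and the sample variance fall below their true values.

```python
import math
import random


def fraction_below_expected(n=30, sigma=1.5, trials=2000, seed=1):
    """Return the fractions of simulated lognormal samples whose sample
    mean and sample variance fall below the true mean and variance.

    All parameter values are illustrative assumptions, not survey values.
    """
    rng = random.Random(seed)
    mu = 0.0
    # True mean and variance of a lognormal(mu, sigma) distribution.
    true_mean = math.exp(mu + sigma ** 2 / 2)
    true_var = (math.exp(sigma ** 2) - 1) * math.exp(2 * mu + sigma ** 2)

    low_mean = 0
    low_var = 0
    for _ in range(trials):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        # Unbiased sample variance (divisor n - 1).
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        if xbar < true_mean:
            low_mean += 1
        if s2 < true_var:
            low_var += 1
    return low_mean / trials, low_var / trials


if __name__ == "__main__":
    p_mean, p_var = fraction_below_expected()
    print(f"sample mean below true mean:         {p_mean:.2f}")
    print(f"sample variance below true variance: {p_var:.2f}")
```

Both fractions should exceed one half, with the fraction for the variance the larger of the two, even though both estimators are unbiased on average.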



Discussion 



The estimators of abundance, based on the lognormal model, perform as expected on real survey data if the underlying model for the nonzero values is a lognormal distribution. The estimates are more precise, and the occasional huge catch does not affect the estimates nearly as much as it does the sample average (see also McConnaughey and Conquest, 1992). The Δ-estimators treat these large catches as part of the distribution, as a reflection of how fish are actually distributed spatially, eliminating the need to handle them as "outliers," that is, to discard the points arbitrarily in an analysis of the data. Since all models only approximate reality, an advantage in using lognormal-based estimators for marine data is that they appear to be fairly robust to deviations from the model (Blackwood, 1991; Pennington, 1991; Conquest et al., 1996).
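As an illustration of how a lognormal-based estimator of mean abundance treats zeros and large catches together, the sketch below assumes the commonly cited form of the estimator attributed to Pennington (1983): with n tows, m nonzero catches, and ȳ and s² the mean and variance of the logged nonzero catches, the estimate is (m/n) exp(ȳ) G_m(s²/2). This section does not restate that formula, so the form used here should be checked against the original definition. The function names `g_m` and `delta_mean` are hypothetical.

```python
import math


def g_m(m, x, tol=1e-12, max_terms=200):
    """Series G_m(x) = 1 + (m-1)x/m
    + sum_{j>=2} (m-1)^(2j-1) x^j / (m^j (m+1)(m+3)...(m+2j-3) j!).

    Assumed form; it tends to exp(x) as m grows large.
    """
    total = 1.0 + (m - 1) * x / m
    # j = 2 term of the series.
    term = (m - 1) ** 3 * x ** 2 / (m ** 2 * (m + 1) * 2)
    j = 2
    while abs(term) > tol * abs(total) and j < max_terms:
        total += term
        # Ratio of consecutive terms: term_{j+1} / term_j.
        term *= (m - 1) ** 2 * x / (m * (m + 2 * j - 1) * (j + 1))
        j += 1
    return total


def delta_mean(catches):
    """Lognormal-based estimate of the mean catch per tow, zeros included."""
    n = len(catches)
    nonzero = [c for c in catches if c > 0]
    m = len(nonzero)
    if m == 0:
        return 0.0
    if m == 1:
        return nonzero[0] / n
    logs = [math.log(c) for c in nonzero]
    ybar = sum(logs) / m
    s2 = sum((y - ybar) ** 2 for y in logs) / (m - 1)
    return (m / n) * math.exp(ybar) * g_m(m, s2 / 2)


if __name__ == "__main__":
    # Made-up catches, for illustration only: many zeros and one huge tow.
    tows = [0, 0, 0, 3, 5, 8, 12, 20, 35, 2500]
    print("sample mean:", sum(tows) / len(tows))
    print("delta mean: ", delta_mean(tows))
```

The huge catch enters the estimate only through its logarithm, which is one way to see why it moves the estimate far less than it moves the sample average.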



The Δ-estimators can be much more efficient than the sample estimators but lose this advantage for small samples (see Smith, 1988; Fig. 1). Thus for stratified surveys in which the region is divided into many relatively small strata and only a few stations are selected in each stratum, little would be gained by using the Δ-estimators (Smith, 1988). Only a slight gain in precision is usually achieved by increasing the number of strata beyond 6 (Cochran, 1977). Consequently, it appears that a better survey design would be one that has larger strata with at least 20-30 stations in each stratum (Fig. 1). Not only would this design improve the efficiency of the Δ-estimators, but it would then be possible to exploit optimal sample allocation schemes that may be more efficient (Gavaris and Smith, 1987; Polacheck and Volstad,



