


Abstract.-The spatial distribution of marine organisms is highly patchy. Because of this patchy distribution, data from marine abundance surveys are highly skewed and have a large variance. Compounding the problem of estimating the mean abundance from such data is that occasionally a relatively huge catch will occur. These large catches are not "outliers" but do dominate the estimates of the mean and variance. A lognormal model of the nonzero survey values (a Δ-distribution) is used to model survey data. The estimators based on the lognormal model appear to be much more efficient for marine data than the usual sample estimators. In particular, the lognormal-based estimators provide reasonable estimates for data sets that contain a very large catch. The properties and efficiency of the Δ-distribution estimators are examined, and the techniques are applied to various marine data sets.



Estimating the mean and variance from highly skewed marine data



Michael Pennington 



Woods Hole Laboratory 



National Marine Fisheries Service, NOAA



Woods Hole, Massachusetts 02543



Manuscript accepted 3 April 1996.
Fishery Bulletin 94:498-505 (1996).



Characteristically, the observed distribution of abundance data generated by marine surveys has a large variance, is highly skewed to the right, and contains a substantial proportion of zeros. Because of this large variability, the sample mean has a low level of precision even for relatively intensive surveys (Grosslein, 1971; Godø, 1994; Pennington and Godø, 1995). A common problem in the analysis and interpretation of skewed survey data is that a single immense catch may account for 50% or more of the total catch during a survey (Sissenwine, 1978; Dew, 1990; McConnaughey and Conquest, 1992; Bowering and Brodie, 1994). These extreme values greatly affect not only the estimate of the mean but also that of the variance (Otto, 1986). As McConnaughey and Conquest (1992) observed, although these large values cause much uncertainty for management, they reflect the spatial distribution of the species and are not outliers that should be discarded. In practice, the use of more efficient sampling schemes or estimators is the only realistic way to increase survey precision; the total number of samples that can be taken is limited by the high cost of sampling at sea (Gunderson, 1993).
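As a small worked illustration of how one very large catch can dominate both the sample mean and the sample variance, the following sketch uses a purely hypothetical set of ten tows. The catch values are invented for illustration and are not taken from any survey cited here.

```python
import statistics

# Hypothetical catches (numbers per tow) from a 10-tow survey; NOT real data.
catches = [0, 0, 0, 2, 3, 5, 8, 12, 20, 450]

total = sum(catches)                   # 500
largest = max(catches)                 # 450
share_of_largest = largest / total     # 0.9: one tow holds 90% of the catch

# Usual sample estimators computed from all ten tows.
mean_all = statistics.fmean(catches)   # 50.0
var_all = statistics.variance(catches)

# The same statistics without the largest tow, shown for comparison only.
# The text argues that such tows are real observations, not outliers to drop.
rest = sorted(catches)[:-1]
mean_rest = statistics.fmean(rest)
var_rest = statistics.variance(rest)

if __name__ == "__main__":
    print(f"share of largest tow: {share_of_largest:.2f}")
    print(f"mean with / without largest tow: {mean_all:.2f} / {mean_rest:.2f}")
    print(f"variance with / without largest tow: {var_all:.1f} / {var_rest:.1f}")
```

In this made-up example the single large tow sets the scale of both estimates, which is the situation the cited studies describe for real surveys.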

One possible way to increase the precision of survey estimates is to model the observed distribution of catches and exploit the model's properties to develop more efficient estimators of population parameters (see, e.g., Pennington, 1983; MacLennan and MacKenzie, 1988; Lo et al., 1992; McConnaughey and Conquest, 1992; Conquest et al., 1996; Stefansson, in press). For marine data, the distribution of the nonzero values is often well approximated by a lognormal distribution (e.g., Pennington, 1983; Smith, 1988; McConnaughey and Conquest, 1992; Conquest et al., 1996). Myers and Pepin (1990) found that of the 69 marine data sets they examined, only 5 differed significantly from the lognormal distribution. Thus the lognormal model has been used as a basis for developing survey abundance estimators (e.g., Pennington, 1983, 1986; Lo et al., 1992; McConnaughey and Conquest, 1992) and for estimating commercial catch (Conquest et al., 1996).
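To make the modeling idea concrete, the sketch below simulates survey catches in which a tow is zero with some probability and otherwise lognormal, then compares the usual sample mean with a simple lognormal-based estimate. The function names and all parameter values are assumptions chosen for illustration. The plug-in formula (m/n)·exp(ȳ + s²/2) is a naive stand-in, not the minimum-variance unbiased estimator developed in the lognormal literature, which applies an additional small-sample correction.

```python
import math
import random
import statistics


def simulate_survey(n_tows, p_nonzero, mu, sigma, rng):
    """Draw n_tows catches: zero with probability 1 - p_nonzero, else lognormal."""
    return [
        rng.lognormvariate(mu, sigma) if rng.random() < p_nonzero else 0.0
        for _ in range(n_tows)
    ]


def delta_mean(p_nonzero, mu, sigma):
    """Population mean of the zero-plus-lognormal model: p * exp(mu + sigma^2 / 2)."""
    return p_nonzero * math.exp(mu + sigma ** 2 / 2)


def plug_in_lognormal_mean(catches):
    """Naive plug-in estimate (m/n) * exp(ybar + s^2 / 2).

    m is the number of nonzero tows, n the total number of tows, and ybar and
    s^2 the sample mean and variance of the logged nonzero catches.
    Illustrative only: it omits the small-sample bias correction.
    """
    nonzero = [c for c in catches if c > 0]
    n, m = len(catches), len(nonzero)
    if m == 0:
        return 0.0
    if m == 1:
        # s^2 is undefined for one value; fall back to the sample mean.
        return nonzero[0] / n
    logs = [math.log(c) for c in nonzero]
    ybar = statistics.fmean(logs)
    s2 = statistics.variance(logs)
    return (m / n) * math.exp(ybar + s2 / 2)


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so the run is repeatable
    # Hypothetical parameters: 60% nonzero tows, log-scale mean 1.0, sd 1.5.
    tows = simulate_survey(50, 0.6, 1.0, 1.5, rng)
    print("model mean:     ", delta_mean(0.6, 1.0, 1.5))
    print("sample mean:    ", statistics.fmean(tows))
    print("plug-in estimate:", plug_in_lognormal_mean(tows))
```

Repeating the simulation over many seeds gives a rough feel for how each estimate varies from survey to survey under these assumed parameters.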



It is not surprising that marine abundance data often appear to follow a lognormal distribution. The factors that determine abundance over a region seem to have a multiplicative effect. When this is the case, the logarithm of abundance is a sum of many effects, so survey data will be approximately lognormally distributed by the central limit theorem (see, e.g., Aitchison and Brown, 1957). More generally, the lognormal model has been useful for analyzing a wide range of ecological data. As Dennis and Patil (1988) put it: "Ecological abundance data are intrinsically positive, with a few enormously high data points typically arising in every study. The lognormal distribution is an ideal descriptor of such



