Pennington: Estimating the mean and variance from highly skewed marine data 






data with a positive range, right skewness, heavy tail, and easily
computed parameter estimates."



To estimate efficiently the mean of skewed marine survey data and to be
able to assess its precision, I examined an estimator based on a
lognormal model of the distribution. I present the estimator's
theoretical efficiency, assess its performance by applying it to several
real marine data sets, and give methods for constructing confidence
intervals.



Statistical methods 



Suppose the nonzero catches generated by a survey are lognormally
distributed, i.e. the logged values are normally distributed. If the
distribution contains a proportion of zeros, then it is called a
Δ-distribution (Aitchison and Brown, 1957). If zeros do not occur, then
it is the usual lognormal distribution.
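As an illustration of the model just described, the following is a minimal sketch that simulates catches from a Δ-distribution: each catch is zero with probability 1 - p and lognormal otherwise. The function name `sample_delta`, the parameter values, and the use of NumPy are assumptions made for this example; they do not come from the paper.

```python
import numpy as np

def sample_delta(n, p, mu, sigma, seed=0):
    """Draw n catches: zero with probability 1 - p, else lognormal(mu, sigma)."""
    rng = np.random.default_rng(seed)
    nonzero = rng.random(n) < p          # which hauls caught something
    x = np.zeros(n)
    x[nonzero] = rng.lognormal(mean=mu, sigma=sigma, size=int(nonzero.sum()))
    return x

# Hypothetical survey: 200 hauls, 70% nonzero, log-scale mean 1.0 and sd 1.2.
x = sample_delta(n=200, p=0.7, mu=1.0, sigma=1.2)
y = np.log(x[x > 0])   # logged nonzero values, normal under the model
```

Under this model only the logged nonzero values `y` are assumed to be normally distributed; the zeros are kept apart and never log-transformed.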



Estimating the mean and variance of the Δ-distribution



As is the case for any distribution, the sample average, $\bar{x}$, and
variance, $s^2$, are unbiased estimators of the mean and variance of the
Δ-distribution. Because of the properties of the lognormal distribution,
the minimum variance unbiased estimators (denoted by c and d) of the
mean and variance of the Δ-distribution are given by (Aitchison and
Brown, 1957)



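$$
c = \begin{cases}
\dfrac{m}{n}\,\exp(\bar{y})\,g_m(s^2/2), & m > 1 \\[1ex]
\dfrac{x_1}{n}, & m = 1 \\[1ex]
0, & m = 0
\end{cases}
\tag{1}
$$

and

$$
d = \begin{cases}
\dfrac{m}{n}\,\exp(2\bar{y})\left\{ g_m(2s^2) - \dfrac{m-1}{n-1}\, g_m\!\left(\dfrac{m-2}{m-1}\,s^2\right) \right\}, & m > 1 \\[1ex]
\dfrac{x_1^2}{n}, & m = 1 \\[1ex]
0, & m = 0
\end{cases}
\tag{2}
$$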



where n is the number of observations, m is the number of nonzero
values, $y = \ln(x)$, $\bar{y}$ and $s^2$ are the sample mean and
variance of the logged nonzero values, $x_1$ denotes the single
untransformed value when m equals one, and $g_m(t)$, which is a function
of m and t (e.g. $t = s^2/2$ in Equation 1), is defined by



$$
g_m(t) = 1 + \frac{m-1}{m}\,t + \sum_{j=2}^{\infty} \frac{(m-1)^{2j-1}\, t^{\,j}}{m^{\,j}\,(m+1)(m+3)\cdots(m+2j-3)\; j!}
\tag{3}
$$
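Equations 1 to 3 can be sketched in code as follows. This is a minimal illustration and not the author's implementation: the function names `g` and `delta_estimates`, the truncation tolerance `tol`, and the cap `max_terms` are assumptions introduced here. The infinite series in Equation 3 is summed until the latest term is negligible relative to the running total, using the fact that consecutive terms differ by the factor $(m-1)^2 t / [m\,(m+2j-3)\,j]$.

```python
import math

def g(m, t, tol=1e-12, max_terms=500):
    """Series g_m(t) of Equation 3, truncated once terms become negligible."""
    term = (m - 1) / m * t               # the j = 1 term
    total = 1.0 + term
    for j in range(2, max_terms):
        # ratio of the j-th term to the (j-1)-th term of the series
        term *= (m - 1) ** 2 * t / (m * (m + 2 * j - 3) * j)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def delta_estimates(x):
    """Return (c, d) of Equations 1 and 2 for a sequence of catches x."""
    n = len(x)
    nonzero = [v for v in x if v > 0]
    m = len(nonzero)
    if m == 0:
        return 0.0, 0.0
    if m == 1:
        return nonzero[0] / n, nonzero[0] ** 2 / n
    logs = [math.log(v) for v in nonzero]
    ybar = sum(logs) / m
    s2 = sum((v - ybar) ** 2 for v in logs) / (m - 1)
    c = m / n * math.exp(ybar) * g(m, s2 / 2)
    d = m / n * math.exp(2 * ybar) * (
        g(m, 2 * s2) - (m - 1) / (n - 1) * g(m, (m - 2) / (m - 1) * s2)
    )
    return c, d
```

One sanity check on the sketch: when all nonzero catches are equal, $s^2 = 0$ and $g_m(0) = 1$, so c and d reduce to the ordinary sample mean and sample variance of the full data set, zeros included.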



Estimating the variance of $\bar{x}$ and c



Again for the Δ-distribution, the sample mean, $\bar{x}$, and c are both
unbiased estimators of the mean. Likewise, the sample variance, $s^2$,
and d are unbiased estimators of the population variance. If $\bar{x}$
is used to estimate the mean, then $s^2/n$, the sample variance divided
by the sample size, is an estimate of the variance of $\bar{x}$. But
$s^2$ can be a very inefficient estimator compared with d, and,
therefore, it is frequently recommended that $d/n$ be used to estimate
the variance of $\bar{x}$ (Aitchison and Brown, 1957). The minimum
variance unbiased estimator of the variance of c is given by
(Pennington, 1983)



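$$
\mathrm{var}_{\mathrm{est}}(c) = \begin{cases}
\dfrac{m}{n}\,\exp(2\bar{y})\left\{ \dfrac{m}{n}\, g_m^{2}(s^2/2) - \dfrac{m-1}{n-1}\, g_m\!\left(\dfrac{m-2}{m-1}\,s^2\right) \right\}, & m > 1 \\[1ex]
\left(\dfrac{x_1}{n}\right)^{2}, & m = 1 \\[1ex]
0, & m = 0
\end{cases}
\tag{4}
$$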



If m = n, i.e. there are no zeros, then Equations 1, 2, and 4 reduce to
the usual estimators for the lognormal distribution.
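The estimator in Equation 4 can be sketched in the same way. The block below is a self-contained illustration under stated assumptions: the names `g` and `var_est_c` and the series tolerance are hypothetical choices for this example, and the helper `g` repeats the truncated series of Equation 3 so that the block runs on its own.

```python
import math

def g(m, t, tol=1e-12, max_terms=500):
    """Series g_m(t) of Equation 3, truncated once terms become negligible."""
    term = (m - 1) / m * t               # the j = 1 term
    total = 1.0 + term
    for j in range(2, max_terms):
        term *= (m - 1) ** 2 * t / (m * (m + 2 * j - 3) * j)
        total += term
        if abs(term) <= tol * abs(total):
            break
    return total

def var_est_c(x):
    """Estimate of the variance of c (Equation 4) for a sequence of catches x."""
    n = len(x)
    nonzero = [v for v in x if v > 0]
    m = len(nonzero)
    if m == 0:
        return 0.0
    if m == 1:
        return (nonzero[0] / n) ** 2
    logs = [math.log(v) for v in nonzero]
    ybar = sum(logs) / m
    s2 = sum((v - ybar) ** 2 for v in logs) / (m - 1)
    return m / n * math.exp(2 * ybar) * (
        m / n * g(m, s2 / 2) ** 2
        - (m - 1) / (n - 1) * g(m, (m - 2) / (m - 1) * s2)
    )
```

When the data contain no zeros, m equals n, both ratios m/n and (m-1)/(n-1) become one, and the expression in braces is simply the difference between the squared g-term and the second g-term, which is the lognormal special case mentioned in the text.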



Relative efficiency of $\bar{x}$ and c



For the two estimators of the mean, $\bar{x}$ and c, the one with the
smaller variance is the more efficient estimator. The formulas in the
preceding section give estimates of the variance based on the particular
sample drawn from the distribution. The expected or true variance of
$\bar{x}$ is (Aitchison and Brown, 1957)



$$
\mathrm{var}(\bar{x}) = \frac{\exp(2\mu + \sigma^2)}{n}\,\bigl\{ p\,[\exp(\sigma^2) - 1] + p\,(1 - p) \bigr\},
\tag{5}
$$



where $\mu$ is the mean and $\sigma$ is the standard deviation of the
log-transformed nonzero values, and p is the proportion of nonzeros.
Smith (1988) derived the expected value of
$\mathrm{var}_{\mathrm{est}}(c)$, which, in the same notation as above,
is given by



