FISHERY BULLETIN: VOL. 86, NO. 4



where $N(t)$ is the vector of unobserved population sizes. The concern here is not whether the multiplicative or additive form of the model is more correct, but rather which elements of the model are data, which are parameters, and how to calculate appropriate estimates of each.



If there were all-seeing observers, both the population process $N(t)$ and the observation process $n(t)$ would be available as data. Equations (1) and (2) imply a sequence of conditional probability distributions $f_N(N(t+1) \mid N(t), \theta_1)$ and $f_n(n(t) \mid N(t), \theta_2)$, where $\theta_1$ and $\theta_2$ are parameters of the distributions to be estimated. Assuming additive gaussian errors, $\theta_1$ would include the mean vector $FN(t)$ and a covariance matrix that can be calculated recursively (see below). The mortality rate $m$ serves as a constraint on the form of the estimates of the mean vector, much as in the theory of regression. Similarly, the parameters for the observation process are a mean vector $H(t)N(t)$ and a covariance matrix, where $H(t)$ in this case depends on the parameters $q(t)$ that constrain the estimates of the mean vector. With an all-seeing observer, both $n(t)$ and $N(t)$ are realized values of random vectors and hence are the data to be used to estimate the unknown parameters of the distributions, $\theta_1$ and $\theta_2$.
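For a scalar version of the model, the all-seeing-observer case reduces to two ordinary regressions, one for each conditional distribution. The sketch below is illustrative only and rests on assumptions not in the text: `phi` is a hypothetical survival factor standing in for $F$, `q` is a constant standing in for $q(t)$, and all numbers are made up. It simulates both $N(t)$ and $n(t)$ and then computes the complete-data maximum likelihood estimates, conditional on $N(0)$.

```python
import math
import random


def simulate_complete(T, phi, q, Q, R, N0, seed=0):
    """Draw N(0..T) and n(1..T) from a hypothetical scalar version of (1)-(2)."""
    rng = random.Random(seed)
    N, n = [N0], []
    for _ in range(T):
        N.append(phi * N[-1] + rng.gauss(0.0, math.sqrt(Q)))  # process error
        n.append(q * N[-1] + rng.gauss(0.0, math.sqrt(R)))    # observation error
    return N, n


def complete_data_mle(N, n):
    """Regression-style ML estimates when both N(t) and n(t) are observed."""
    T = len(n)
    ts = range(1, T + 1)
    # Population process: regress N(t) on N(t-1); Q is the mean squared residual.
    phi = sum(N[t] * N[t - 1] for t in ts) / sum(N[t - 1] ** 2 for t in ts)
    Q = sum((N[t] - phi * N[t - 1]) ** 2 for t in ts) / T
    # Observation process: regress n(t) on N(t); R is the mean squared residual.
    q = sum(n[t - 1] * N[t] for t in ts) / sum(N[t] ** 2 for t in ts)
    R = sum((n[t - 1] - q * N[t]) ** 2 for t in ts) / T
    return phi, q, Q, R


N, n = simulate_complete(T=5000, phi=0.8, q=0.5, Q=4.0, R=1.0, N0=50.0)
phi_hat, q_hat, Q_hat, R_hat = complete_data_mle(N, n)
print(f"phi={phi_hat:.3f} q={q_hat:.3f} Q={Q_hat:.2f} R={R_hat:.2f}")
```

Nothing here requires integrating over unknowns, because every $N(t)$ is treated as observed; that is exactly what is lost when the population sizes are not seen.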



Thus, the unobserved population sizes are most appropriately treated as missing data. The estimation scheme proposed by Collie and Sissenwine (1983) treats the $N(t)$ as parameters to be estimated. Little and Rubin (1983, 1987: sec. 5.4) showed that treating missing data as parameters in likelihood equations does not produce maximum likelihood estimates of the parameters unless the proportion of missing data approaches zero as the sample size increases. This is because much of the asymptotic theory of maximum likelihood estimation depends on the number of observations becoming large relative to the number of parameters. Little and Rubin (1983, 1987) showed that for a regression-like situation, the bias due to treating missing data as parameters can be quite large.
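The effect of letting the number of parameters grow with the data can be seen in a classic case that is simpler than Little and Rubin's regression example: the Neyman-Scott problem. In this sketch (function name and numbers hypothetical), each pair of observations has its own unknown mean, so the means play a role analogous to the $N(t)$. Treating them as parameters and maximizing the joint likelihood gives a variance estimate that converges to half the true value no matter how many pairs are observed.

```python
import random


def paired_variance_estimates(pairs, sigma2, seed=0):
    """Neyman-Scott setting: pair i is two draws from Normal(mu_i, sigma2).

    Returns (joint ML estimate of sigma2 treating every mu_i as a parameter,
    the same residual sum of squares divided by its degrees of freedom)."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    rss = 0.0
    for _ in range(pairs):
        mu_i = rng.uniform(0.0, 100.0)  # a new nuisance parameter for every pair
        x1, x2 = rng.gauss(mu_i, sd), rng.gauss(mu_i, sd)
        m = 0.5 * (x1 + x2)             # ML estimate of mu_i from its own pair
        rss += (x1 - m) ** 2 + (x2 - m) ** 2
    ml = rss / (2 * pairs)              # 2 * pairs observations in the denominator
    adjusted = rss / pairs              # only `pairs` residual degrees of freedom
    return ml, adjusted


ml, adjusted = paired_variance_estimates(pairs=20000, sigma2=4.0)
print(f"joint ML: {ml:.2f}, df-adjusted: {adjusted:.2f}")
```

In this toy case dividing by the residual degrees of freedom removes the bias; the point is that the joint maximum likelihood estimate itself is inconsistent when the parameter count grows with the sample.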



The alternative approach discussed by Little and Rubin (1983, 1987) is to integrate the missing data out of the complete-data likelihood and maximize the resulting function over the parameters as usually defined in estimation theory. This is the approach taken by Shumway and Stoffer (1982a), who used the EM (expectation-maximization) algorithm of Dempster et al. (1977) and Kalman filtering to derive maximum likelihood estimates for the parameters of the model and minimum mean square error estimates of the missing data.
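The EM-plus-Kalman-filter approach can be sketched for a scalar version of the model. This is a minimal illustration under stated assumptions, not Shumway and Stoffer's implementation: `phi` stands in for $F$ and `h[t-1]` for $H(t) = q(t)$, both treated as known; only the error variances $Q$ and $R$ are re-estimated; and the E-step uses a Kalman filter followed by a Rauch-Tung-Striebel (fixed-interval) smoother to obtain the conditional means, variances, and lag-one covariances of $N(t)$ given all of $n(1), \ldots, n(T)$. Function names and numbers are hypothetical.

```python
import math
import random


def kalman_smoother(n, phi, h, Q, R, mu0, sigma0):
    """Scalar Kalman filter + Rauch-Tung-Striebel smoother (hypothetical model):
    N(t) = phi*N(t-1) + w, w ~ N(0, Q);  n(t) = h[t-1]*N(t) + v, v ~ N(0, R);
    N(0) ~ N(mu0, sigma0).  Returns smoothed means, variances, lag-one
    covariances of N(0..T) given n(1..T), and the observed-data log-likelihood."""
    T = len(n)
    xp, Pp = [0.0] * (T + 1), [0.0] * (T + 1)  # one-step predictions
    xf, Pf = [0.0] * (T + 1), [0.0] * (T + 1)  # filtered estimates
    xf[0], Pf[0] = mu0, sigma0
    loglik = 0.0
    for t in range(1, T + 1):
        xp[t] = phi * xf[t - 1]
        Pp[t] = phi * phi * Pf[t - 1] + Q
        S = h[t - 1] ** 2 * Pp[t] + R          # innovation variance
        e = n[t - 1] - h[t - 1] * xp[t]        # innovation
        gain = Pp[t] * h[t - 1] / S
        xf[t] = xp[t] + gain * e
        Pf[t] = (1.0 - gain * h[t - 1]) * Pp[t]
        loglik -= 0.5 * (math.log(2.0 * math.pi * S) + e * e / S)
    # Backward pass: condition every N(t) on all of n(1..T).
    xs, Ps = xf[:], Pf[:]
    lag = [0.0] * (T + 1)  # lag[t] = Cov(N(t), N(t-1) | n(1..T))
    for t in range(T, 0, -1):
        J = Pf[t - 1] * phi / Pp[t]
        xs[t - 1] = xf[t - 1] + J * (xs[t] - xp[t])
        Ps[t - 1] = Pf[t - 1] + J * J * (Ps[t] - Pp[t])
        lag[t] = J * Ps[t]
    return xs, Ps, lag, loglik


def em_step(n, phi, h, Q, R, mu0, sigma0):
    """One EM iteration updating Q and R; returns log-likelihood at old values."""
    T = len(n)
    xs, Ps, lag, loglik = kalman_smoother(n, phi, h, Q, R, mu0, sigma0)
    # E-step: expected squared process and observation residuals.
    q_sum = sum((xs[t] - phi * xs[t - 1]) ** 2 + Ps[t]
                - 2.0 * phi * lag[t] + phi * phi * Ps[t - 1]
                for t in range(1, T + 1))
    r_sum = sum((n[t - 1] - h[t - 1] * xs[t]) ** 2 + h[t - 1] ** 2 * Ps[t]
                for t in range(1, T + 1))
    # M-step: closed-form maximizers of the expected complete-data log-likelihood.
    return q_sum / T, r_sum / T, loglik


def simulate(T, phi, h, Q, R, mu0, seed=1):
    """Simulate n(1..T), starting the population at N(0) = mu0 for simplicity."""
    rng = random.Random(seed)
    N, n = mu0, []
    for t in range(T):
        N = phi * N + rng.gauss(0.0, math.sqrt(Q))
        n.append(h[t] * N + rng.gauss(0.0, math.sqrt(R)))
    return n


T = 200
h = [0.5] * T                 # hypothetical constant catchability
obs = simulate(T, phi=0.8, h=h, Q=4.0, R=1.0, mu0=50.0)
Q, R = 1.0, 1.0               # deliberately poor starting values
history = []
for _ in range(50):
    Q, R, ll = em_step(obs, 0.8, h, Q, R, mu0=50.0, sigma0=1.0)
    history.append(ll)
print(f"Q={Q:.2f} R={R:.2f} loglik={history[-1]:.2f}")
```

Each EM iteration cannot decrease the observed-data likelihood, in which the $N(t)$ have been integrated out, and under the gaussian assumptions the smoothed means `xs` are the minimum mean square error estimates of the missing population sizes.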



I can explain some of the problems with estimating Equations (1) and (2) by using the likelihood of Equation (3). Under the gaussian assumptions of the model, the complete-data log-likelihood is given by (Shumway and Stoffer 1982a)



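$$
\begin{aligned}
\log L = {} & -\frac{1}{2}\log|\Sigma| - \frac{1}{2}\bigl(N(0) - \mu(0)\bigr)'\,\Sigma^{-1}\bigl(N(0) - \mu(0)\bigr) \\
& - \frac{T}{2}\log|Q| - \frac{1}{2}\sum_{t=1}^{T}\bigl(N(t) - FN(t-1)\bigr)'\,Q^{-1}\bigl(N(t) - FN(t-1)\bigr) \\
& - \frac{T}{2}\log|R| - \frac{1}{2}\sum_{t=1}^{T}\bigl(n(t) - H(t)N(t)\bigr)'\,R^{-1}\bigl(n(t) - H(t)N(t)\bigr)
\end{aligned}
\tag{4}
$$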



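where $\mu(0)$ and $\Sigma$ are the mean and covariance matrix of the initial population vector $N(0)$, $F$ depends on the mortality rate $m$, and $H(t) = q(t)I$. Similarly, by substitution, the complete-data log-likelihood implied by Equation (3) is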



$$
-\sum_{t=1}^{T}\Bigl(\bigl\lVert N(t) - FN(t-1)\bigr\rVert^{2} + \bigl\lVert n(t) - H(t)N(t)\bigr\rVert^{2}\Bigr)
\tag{5}
$$
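The comparison between Equations (4) and (5) can be checked with a minimal scalar sketch, assuming one population component, a hypothetical survival factor `phi` in place of $F$, and `h[t-1]` in place of $H(t)$. Setting $Q = R = \sigma^2$ and dropping the initial-state terms, which have no counterpart in Equation (5), reduces (4) to $-T\log\sigma^2$ plus $1/(2\sigma^2)$ times (5). For any fixed $\sigma^2$, maximizing (5) over the $N(t)$ is therefore the same as maximizing (4) under an equal-variance, uncorrelated-error assumption. The numbers below are made up solely to exercise the identity.

```python
import math


def loglik_eq4(N, n, phi, h, mu0, sigma0, Q, R, include_initial=True):
    """Scalar version of Equation (4), without the additive 2*pi constant.
    N holds N(0..T), n holds n(1..T); h[t-1] plays the role of H(t)."""
    T = len(n)
    ll = 0.0
    if include_initial:
        ll += -0.5 * math.log(sigma0) - 0.5 * (N[0] - mu0) ** 2 / sigma0
    ll += -0.5 * T * math.log(Q) - 0.5 * sum(
        (N[t] - phi * N[t - 1]) ** 2 for t in range(1, T + 1)) / Q
    ll += -0.5 * T * math.log(R) - 0.5 * sum(
        (n[t - 1] - h[t - 1] * N[t]) ** 2 for t in range(1, T + 1)) / R
    return ll


def objective_eq5(N, n, phi, h):
    """Scalar version of Equation (5): minus the summed squared errors."""
    T = len(n)
    return -sum((N[t] - phi * N[t - 1]) ** 2 + (n[t - 1] - h[t - 1] * N[t]) ** 2
                for t in range(1, T + 1))


N = [10.0, 8.5, 7.9, 6.1]     # hypothetical N(0..3)
n = [4.0, 4.2, 2.9]           # hypothetical n(1..3)
h = [0.5, 0.5, 0.5]
phi, sigma2 = 0.8, 1.7
lhs = loglik_eq4(N, n, phi, h, mu0=0.0, sigma0=1.0, Q=sigma2, R=sigma2,
                 include_initial=False)
rhs = -len(n) * math.log(sigma2) + objective_eq5(N, n, phi, h) / (2 * sigma2)
print(abs(lhs - rhs) < 1e-12)  # True
```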



Collie and Sissenwine (1983) noted that their estimation scheme assumes that the process and observation errors have the same variance. However, from Equations (4) and (5) it can be seen that they make the stronger, and unlikely, assumption that both the errors in the population dynamics and the errors in the observation process are uncorrelated. Further, we can see from Equation (4) that when the $N(t)$ are treated as parameters, the estimates of the $N(t)$ depend on the observed data $n(t)$ for $t = 1, \ldots, T$. Following Shumway and Stoffer (1982a), the expected log-likelihood conditioned on the observed data comprises three parts: a term due to estimating the expected value of the initial population size.






