Blick and Hagen: Use of agreement measures and latent class models to assess the reliability of classifying thermally marked otoliths



ditional latent classes may be added (Christenson et al., 1992; Formann, 1994), e.g. a third class of otoliths from ambiguous sources.



In the previous discussion concerning three or more readers, we implied that readers were different individuals. This need not be so; what is required are three or more independent readings. If it were possible for the same individual to read the same otolith more than once, independently, then the number of different readers could be reduced. If independence could not be met, the dependence could be modeled, as discussed above.



Another critical assumption, but one that should be met most of the time, is that the individual accuracy rates are known to be either greater than or less than the error rates (e.g. $\pi_{H|H} > \pi_{W|H}$ and $\pi_{W|W} > \pi_{H|W}$, which implies that $\pi_{H|H}$ and $\pi_{W|W}$ are either greater than or less than 0.5). The assumption is needed because of an inherent symmetry in the problem: the same likelihood function is generated when the error rates are switched with the accuracy rates.
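The symmetry can be checked numerically. The sketch below assumes the two-class LCM with conditionally independent readers; the names `log_likelihood`, `mirror`, `pi_hh`, and `pi_ww` are hypothetical, and the counts are made up for illustration (they are not data from this study). It evaluates the log-likelihood at one set of parameters and at the "mirror" set in which p becomes 1 − p and each reader's accuracy rates are replaced by the opposite class's error rates. The two values agree, so the data alone cannot distinguish the two solutions.

```python
import math
from typing import Dict, List, Tuple

# Reading pattern across readers: 1 = read as hatchery (H), 0 = read as wild (W).
Pattern = Tuple[int, ...]


def log_likelihood(counts: Dict[Pattern, float], p: float,
                   pi_hh: List[float], pi_ww: List[float]) -> float:
    """Two-class LCM log-likelihood, assuming readers are conditionally independent.

    p: proportion of hatchery otoliths
    pi_hh[j]: P(reader j reads H | otolith is H), an accuracy rate
    pi_ww[j]: P(reader j reads W | otolith is W), an accuracy rate
    """
    total = 0.0
    for pattern, n in counts.items():
        lik_h, lik_w = p, 1.0 - p
        for j, x in enumerate(pattern):
            lik_h *= pi_hh[j] if x == 1 else 1.0 - pi_hh[j]
            lik_w *= 1.0 - pi_ww[j] if x == 1 else pi_ww[j]
        total += n * math.log(lik_h + lik_w)
    return total


def mirror(p: float, pi_hh: List[float], pi_ww: List[float]):
    """Swap the class labels: the new accuracy rates are the old error rates."""
    return 1.0 - p, [1.0 - w for w in pi_ww], [1.0 - h for h in pi_hh]


# Hypothetical counts for three readers (illustration only).
counts = {(1, 1, 1): 120, (0, 0, 0): 300, (1, 0, 1): 15, (0, 1, 0): 10, (1, 1, 0): 25}
params = (0.3, [0.95, 0.9, 0.8], [0.9, 0.95, 0.85])
ll_original = log_likelihood(counts, *params)
ll_mirrored = log_likelihood(counts, *mirror(*params))
print(ll_original, ll_mirrored)  # equal up to floating-point rounding
```

Knowing that accuracies exceed error rates is what lets one pick the first of the two equivalent solutions.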



Computation

Formulas for estimating $\kappa$ and its standard error are straightforward (Fleiss, 1981). Estimates can also be obtained from several software packages, including PROC FREQ in SAS (SAS Institute, 1989).
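For a pair of readers, the calculation can be sketched as follows. This is a minimal illustration, not the PROC FREQ implementation: `cohen_kappa` is a hypothetical helper that takes a square agreement table (rows for one reader's calls, columns for the other's) and returns $\kappa = (p_o - p_e)/(1 - p_e)$ together with a large-sample standard error in the form we understand Fleiss (1981) to give. The variance expression should be checked against that source before it is relied on.

```python
import math
from typing import List, Tuple


def cohen_kappa(table: List[List[float]]) -> Tuple[float, float]:
    """Cohen's kappa and a large-sample standard error for a square agreement table."""
    k = len(table)
    n = sum(sum(row) for row in table)
    p = [[cell / n for cell in row] for row in table]    # cell proportions
    row = [sum(p[i]) for i in range(k)]                   # first reader's marginals
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # second reader's marginals
    p_o = sum(p[i][i] for i in range(k))                  # observed agreement
    p_e = sum(row[i] * col[i] for i in range(k))          # chance agreement
    kappa = (p_o - p_e) / (1.0 - p_e)

    # Large-sample variance: (A + B - C) / (n (1 - p_e)^2).
    a = sum(p[i][i] * (1.0 - (row[i] + col[i]) * (1.0 - kappa)) ** 2 for i in range(k))
    b = (1.0 - kappa) ** 2 * sum(p[i][j] * (col[i] + row[j]) ** 2
                                 for i in range(k) for j in range(k) if i != j)
    c = (kappa - p_e * (1.0 - kappa)) ** 2
    variance = max(a + b - c, 0.0) / (n * (1.0 - p_e) ** 2)  # guard tiny negatives
    return kappa, math.sqrt(variance)


kappa, se = cohen_kappa([[40, 10], [5, 45]])
print(round(kappa, 3), round(se, 4))
```

In this made-up table the two readers agree on 85% of otoliths while chance agreement is 50%, so $\kappa = 0.35/0.5 = 0.7$.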



Maximizing either of the likelihood functions for the LCMs requires a numerical procedure. The most straightforward is to use an optimization routine such as "Solver" in Excel (Microsoft Corporation, 1993) or "nlminb" in S-PLUS (Statistical Sciences, 1995). Alternatively, the EM algorithm (Dempster et al., 1977; Dawid and Skene, 1979; McLachlan and Krishnan, 1997) can easily be used. The simplicity of the EM algorithm follows from the recognition that the LCM is an example of a finite mixture problem, specifically, in this case, a mixture of multivariate Bernoulli distributions with mixing parameter p (Everitt, 1984). Use of the EM algorithm for such mixture problems in fisheries is well documented, e.g. for stock composition estimates (Millar, 1987; Pella et al., 1996) and for age-length keys (Kimura and Chikuni, 1987). A more efficient alternative to the EM algorithm is iteratively reweighted least squares (Agresti, 1990), which is relatively easy to implement in software such as PROC NLIN in SAS (SAS Institute, 1989). Perhaps the most direct and efficient approach would be to use dedicated LCM software. We are not aware of any routines for LCMs in any major statistical package at present, but several independent LCM packages exist (for a review, see Clogg, 1995; for an Internet listing, see http://ourworld.compuserve.com/homepages/jsuebersax/index.htm).
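The EM iteration for the two-class model is short enough to sketch. The version below assumes conditionally independent readers and data summarized as counts of reading patterns; the names `fit_lcm_em`, `expected_counts`, `pi_hh`, and `pi_ww` are hypothetical, and this is not the authors' SAS code. The E-step computes, for each reading pattern, the posterior probability that the otolith is of hatchery origin; the M-step updates p and each reader's accuracy rates as weighted proportions. Starting the accuracies above 0.5 steers the iteration toward the solution in which accuracies exceed error rates. Because the likelihood can have local maxima, rerunning from several starting values is prudent.

```python
import itertools
import math
from typing import Dict, List, Tuple

# Reading pattern across readers: 1 = read as hatchery (H), 0 = read as wild (W).
Pattern = Tuple[int, ...]


def fit_lcm_em(counts: Dict[Pattern, float], p0: float = 0.5, acc0: float = 0.7,
               max_iter: int = 10000, tol: float = 1e-10):
    """EM for a two-class LCM with conditionally independent readers.

    Returns (p, pi_hh, pi_ww, loglik): p is the hatchery proportion,
    pi_hh[j] = P(reader j reads H | H), pi_ww[j] = P(reader j reads W | W),
    and loglik is evaluated at the parameters used in the last E-step.
    """
    n_readers = len(next(iter(counts)))
    total = sum(counts.values())
    p, pi_hh, pi_ww = p0, [acc0] * n_readers, [acc0] * n_readers
    loglik = -math.inf
    for _ in range(max_iter):
        # E-step: posterior probability that an otolith with this pattern is hatchery.
        post_h: Dict[Pattern, float] = {}
        loglik = 0.0
        for pattern, n in counts.items():
            lik_h, lik_w = p, 1.0 - p
            for j, x in enumerate(pattern):
                lik_h *= pi_hh[j] if x == 1 else 1.0 - pi_hh[j]
                lik_w *= 1.0 - pi_ww[j] if x == 1 else pi_ww[j]
            post_h[pattern] = lik_h / (lik_h + lik_w)
            loglik += n * math.log(lik_h + lik_w)
        # M-step: expected class sizes and weighted proportions of correct readings.
        n_h = sum(n * post_h[pat] for pat, n in counts.items())
        n_w = total - n_h
        new_p = n_h / total
        new_hh = [sum(n * post_h[pat] * pat[j] for pat, n in counts.items()) / n_h
                  for j in range(n_readers)]
        new_ww = [sum(n * (1.0 - post_h[pat]) * (1 - pat[j]) for pat, n in counts.items()) / n_w
                  for j in range(n_readers)]
        old = [p] + pi_hh + pi_ww
        p, pi_hh, pi_ww = new_p, new_hh, new_ww
        if max(abs(a - b) for a, b in zip([p] + pi_hh + pi_ww, old)) < tol:
            break
    return p, pi_hh, pi_ww, loglik


def expected_counts(p: float, pi_hh: List[float], pi_ww: List[float],
                    n: float) -> Dict[Pattern, float]:
    """Noise-free pattern counts implied by given parameters (a check on the fit)."""
    out: Dict[Pattern, float] = {}
    for pattern in itertools.product((0, 1), repeat=len(pi_hh)):
        lik_h, lik_w = p, 1.0 - p
        for j, x in enumerate(pattern):
            lik_h *= pi_hh[j] if x == 1 else 1.0 - pi_hh[j]
            lik_w *= 1.0 - pi_ww[j] if x == 1 else pi_ww[j]
        out[pattern] = n * (lik_h + lik_w)
    return out


# Hypothetical parameter values; EM should recover them from noise-free counts.
counts = expected_counts(0.3, [0.95, 0.9, 0.8], [0.9, 0.95, 0.85], 10000)
p_hat, hh_hat, ww_hat, ll = fit_lcm_em(counts)
print(round(p_hat, 3), [round(a, 3) for a in hh_hat], [round(b, 3) for b in ww_hat])
```

With three readers and two classes the model has seven parameters for seven free pattern frequencies, so feeding in exact expected counts is a simple way to confirm that the iteration returns the values used to generate them.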



As with many maximum likelihood problems in which numerical methods must be used, complications can arise. Constraints may at times be needed to ensure that parameter estimates fall in acceptable intervals (e.g. [0,1] for p and [0.5,1] for the $\pi$'s). Also, the likelihood function may have local maxima, so several runs with varying starting values may be necessary to identify the global maximum. Finally, estimates of standard errors may entail additional computing. PROC NLIN in SAS provides asymptotic (i.e. large-sample) standard errors.



Jackknife and bootstrap estimates of standard errors are relatively easy to program, the jackknife being much less computationally intensive.
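A delete-one jackknife is especially cheap when readings are summarized as pattern counts, because deleting any otolith with a given reading pattern yields the same estimate; only one refit per distinct pattern is needed. In the sketch below, `jackknife_se` and `prop_read_hatchery` are hypothetical names. In practice the `estimator` argument would be an LCM fitting routine returning, say, the estimate of p; the simple proportion is used here only because its jackknife standard error has the known closed form $\sqrt{\hat\theta(1-\hat\theta)/(n-1)}$, which makes the sketch easy to check.

```python
import math
from typing import Callable, Dict, Tuple

# Reading pattern across readers: 1 = read as hatchery (H), 0 = read as wild (W).
Pattern = Tuple[int, ...]


def jackknife_se(counts: Dict[Pattern, int],
                 estimator: Callable[[Dict[Pattern, int]], float]) -> float:
    """Delete-one jackknife standard error for data summarized as pattern counts.

    Every otolith sharing a reading pattern gives the same leave-one-out
    estimate, so the estimator is refitted once per distinct pattern and the
    result is weighted by that pattern's count.
    """
    n = sum(counts.values())
    replicates = []  # (leave-one-out estimate, number of otoliths it represents)
    for pattern, m in counts.items():
        if m == 0:
            continue
        reduced = dict(counts)
        reduced[pattern] = m - 1
        replicates.append((estimator(reduced), m))
    mean = sum(est * m for est, m in replicates) / n
    variance = (n - 1) / n * sum(m * (est - mean) ** 2 for est, m in replicates)
    return math.sqrt(variance)


def prop_read_hatchery(counts: Dict[Pattern, int]) -> float:
    """Stand-in estimator: proportion of otoliths that reader 1 reads as hatchery."""
    total = sum(counts.values())
    return sum(m for pattern, m in counts.items() if pattern[0] == 1) / total


data = {(1,): 30, (0,): 70}
print(jackknife_se(data, prop_read_hatchery))  # equals sqrt(0.3 * 0.7 / 99) up to rounding
```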



The Bayesian programs discussed in Joseph et al. (1995) can be found at http://www.epi.mcgill.ca/Joseph/software.html.



Examples 



The first example analyzes the results of three readers examining 570 chum otoliths. The samples were taken from a common location, and the readers were familiar with the patterns. Each reading was made without knowledge of prior readings. The data, along with pairwise $\kappa$ estimates and the LCM parameter estimates (obtained with PROC NLIN in SAS; see appendix for code), are presented in Table 3.



These results indicate that the third reader is significantly ($\alpha = 0.05$) less able to identify a hatchery mark correctly when it is present and that there are no significant differences among readers in their ability to detect a wild mark when it is present. These conclusions are readily apparent from the table of results; the pairwise $\kappa$'s are consistent with them but are more difficult to interpret. With the variance due to sampling estimated to be $(0.7379)(1 - 0.7379)/(570 - 1) = 0.0003399$, misclassification error contributes only 0.36% to the total variance.
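The sampling-variance arithmetic can be reproduced directly. The function names are hypothetical. `misclassification_share` assumes that the total variance of the estimate is the sampling variance plus a misclassification component, so that the share is the excess over the sampling variance divided by the total; the total variance itself comes from the LCM fit and is not reproduced here.

```python
def sampling_variance(p_hat: float, n: int) -> float:
    """Binomial sampling variance of a proportion, with the n - 1 divisor used in the text."""
    return p_hat * (1.0 - p_hat) / (n - 1)


def misclassification_share(total_variance: float, sampling_var: float) -> float:
    """Fraction of total variance not due to sampling (assumes an additive decomposition)."""
    return (total_variance - sampling_var) / total_variance


v = sampling_variance(0.7379, 570)
print(f"{v:.7f}")  # 0.0003399
```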



The second example consists of two readers with four spatial strata. Samples were obtained from sockeye salmon caught in four neighboring gillnet fisheries in central Southeast Alaska. The data and the LCM estimates are shown in Table 4. These estimates indicate that the readers are not statistically different in their ability to detect hatchery marks, whereas the second reader is better able to distinguish wild marks. With eight parameters (a hatchery proportion for each of the four strata plus two accuracy rates for each reader) and 12 df (three for each stratum's four reading patterns), there are 4 df available for a goodness-of-fit test. Pearson's chi-square statistic is 4.83, which, with 4 df, has a p-value of 0.306, indicating an acceptable model fit. Misclassification error contributes from about 8% to 14% of the total variance in the estimates of the proportion of hatchery stock.
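The degrees of freedom and p-value can be checked in a few lines. The sketch assumes that each stratum contributes a 2 × 2 table of reading patterns (four cells, three free) and uses the closed-form upper tail of the chi-square distribution for even degrees of freedom, $P(\chi^2_{2m} > x) = e^{-x/2}\sum_{k=0}^{m-1} (x/2)^k/k!$; `chi2_sf_even_df` is a hypothetical helper, not a library function. It gives about 0.305 for a statistic of 4.83, which agrees with the reported 0.306 to within rounding of the statistic.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail probability P(chi^2_df > x), valid only for positive even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even number of degrees of freedom")
    half = x / 2.0
    term, series = 1.0, 1.0  # k = 0 term of sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    for k in range(1, df // 2):
        term *= half / k
        series += term
    return math.exp(-half) * series


# Degrees of freedom for the two-reader, four-stratum example.
strata = 4
cells_per_stratum = 2 * 2   # two readers, each reading H or W
n_params = 4 + 2 * 2        # one proportion per stratum + two accuracy rates per reader
df = strata * (cells_per_stratum - 1) - n_params  # 12 - 8 = 4

p_value = chi2_sf_even_df(4.83, df)
print(df, round(p_value, 3))  # 4 0.305
```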



Design considerations 



Design of an otolith reading program is complicated by misclassification error. An important consideration is the precision of the estimates, in particular the precision of the estimate of p. Table 5 shows the asymptotic standard error of p for various combinations of p, $\pi_{H|H}$, and $\pi_{W|W}$ for the three-reader model with unknown accuracies and for the one-, two-, and three-reader models with accuracies assumed known. Although this table is derived for a sample of 1000 otoliths, the ratio of any two standard errors within the table would be the same for any sample size (assuming the sample size is large enough to approximate the asymptotic conditions), because each asymptotic standard error is proportional to $1/\sqrt{n}$. It is evident that misclassification inflates the standard error over the usual binomial case (right-most column). The table also makes clear the increase in the uncertainty of estimating p when the accuracies also have



