Fishery Bulletin 100(1) 



hatchery fish, and this would increase the rate of false negatives. Differences between readers in skill and training level, and how they process otoliths, can add to the uncertainty in estimating the accuracy of the readings and the rates of false positives and negatives.



Otolith marking generally takes place without any secondary marking, such as fin-clipping or coded-wire-tagging; therefore the accuracy of a reading cannot directly be determined through conventional methods that make use of a "gold standard" (known-origin sample) or other error-free classification methods. To ensure that the information provided to the Alaskan fisheries managers is accurate, each otolith is independently examined by two readers, and a third reading is used to resolve differences between the first two readings. The resolved readings are used to estimate the contribution of hatchery fish, and the presumption of accuracy is based on the premise that, through multiple readings, all marked fish are either correctly identified or that errors, if present, are inconsequential. Developing the analytical tools to determine the veracity of that assumption is the objective of this investigation, and by establishing such tools, quality control standards for recovering thermal marks can be developed.



In developing the tools to measure the quality of otolith readings, three questions are addressed:

1 How to assess the reliability of otolith readings when no standards are available.

2 How to estimate the proportion of hatchery marks when there is disagreement between two or more readers.

3 How the precision of the estimate of the proportion is influenced by classification error.



We discuss two approaches: 1) indices of agreement typically used in reliability studies, and 2) latent class models where classification errors are estimated for each reader even though the true error rate is considered unknown. The data requirements and their attendant assumptions are presented for each approach. The methods are illustrated by examining among-reader comparisons of chum salmon (Oncorhynchus keta) and sockeye salmon (Oncorhynchus nerka) otoliths collected from programs that monitor inseason contributions of hatchery fish in several commercial fisheries in Southeast Alaska (Hagen et al., 1995). The results are used to provide recommendations for monitoring the quality of otolith readings for thermal marking programs.



Table 1 



Notation used to show the cross-classification of a sample of n otoliths by two readers to either hatchery (H) or wild stock (W) assignment. Row and column sums are indicated by the subscript "·".



Methods

Standard available

A sample of n otoliths, which are examined by two readers, can be cross-classified as hatchery (H) or wild stock (W) as in Table 1. Suppose we wish to estimate the accuracy rate (probability of making a correct classification) or, conversely, the error rate (probability of making a wrong classification). If we know nothing about reader 1, but reader 2 is infallible (or is considered a "gold standard"), unbiased estimates of the accuracy and error rates of reader 1 and the proportion of hatchery stocks (p) are given by

π̂_H|H = n_HH / n_·H,   π̂_W|H = n_WH / n_·H = 1 − π̂_H|H,

π̂_W|W = n_WW / n_·W,   π̂_H|W = n_HW / n_·W = 1 − π̂_W|W,

p̂ = n_·H / n

(where, for example, π_W|H refers to the probability that reader 1 classifies an otolith as W when its true state is H). These estimates reflect the fact that reader 2 is infallible; the accuracy rates (π_H|H, π_W|W) and the error rates (π_W|H, π_H|W) are conditional on the numbers of hatchery or wild stock otoliths as determined by reader 2.

No standard available

If a standard is not available, an unbiased estimate of p can be obtained if the accuracy rates for reader 1 are known. The estimate is

p* = (n_H/n + π_W|W − 1) / (π_H|H + π_W|W − 1),

where n_H is the number of otoliths classified as hatchery otoliths. If the accuracy rates are estimated, then p* will no longer be unbiased, but will be much less biased than the estimator n_H/n and will in general have a much smaller mean-squared error (Rogan and Gladen, 1978). For a Bayesian approach to this problem, see Viana et al. (1993) and Joseph et al. (1995).
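The two estimators above can be sketched in a few lines of code. This is a minimal illustration only: the counts and function names are hypothetical, with the first subscript denoting reader 1's call and the second denoting reader 2's ("gold standard") call.

```python
# Illustrative sketch of the accuracy-rate and corrected-proportion
# estimators. All counts below are hypothetical.

def accuracy_rates(n_hh, n_wh, n_hw, n_ww):
    """Accuracy rates of reader 1, conditional on reader 2's assignments.

    First subscript = reader 1's classification, second = reader 2's.
    """
    pi_h_given_h = n_hh / (n_hh + n_wh)  # P(reader 1 says H | true H)
    pi_w_given_w = n_ww / (n_hw + n_ww)  # P(reader 1 says W | true W)
    return pi_h_given_h, pi_w_given_w

def rogan_gladen(n_h, n, pi_h_given_h, pi_w_given_w):
    """Corrected proportion p* when no standard is available but the
    accuracy rates of reader 1 are known (Rogan and Gladen, 1978)."""
    apparent = n_h / n  # naive proportion classified as hatchery
    return (apparent + pi_w_given_w - 1) / (pi_h_given_h + pi_w_given_w - 1)

# Hypothetical 2x2 counts: 180 H|H, 20 W|H, 10 H|W, 190 W|W.
pi_hh, pi_ww = accuracy_rates(n_hh=180, n_wh=20, n_hw=10, n_ww=190)

# Apply the correction to a sample where reader 1 called 150 of 400 hatchery.
p_star = rogan_gladen(n_h=150, n=400, pi_h_given_h=pi_hh, pi_w_given_w=pi_ww)
```

With these hypothetical counts the correction raises the naive estimate of 0.375 slightly, because the false-negative rate (1 − π_H|H) exceeds the false-positive rate (1 − π_W|W).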



Agreement measures   When accuracy rates are unavailable, statistics that measure "agreement" between readers are often calculated (e.g. Fleiss, 1981). One such index is simply the proportion of observed agreement (P_o), defined as

P_o = (n_HH + n_WW) / n.



Another index, called kappa (κ), corrects P_o for the degree of agreement that is expected by chance alone. It is defined as
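Both agreement indices can be sketched as follows. The kappa computation here uses the standard chance-corrected definition κ = (P_o − P_e)/(1 − P_e), where P_e is the agreement expected if the two readers classified independently (Fleiss, 1981); the counts are hypothetical.

```python
# Sketch of the agreement indices for a 2x2 reader-by-reader table.
# First subscript = reader 1's call, second = reader 2's call.

def observed_agreement(n_hh, n_wh, n_hw, n_ww):
    """Proportion of otoliths on which the two readers agree, P_o."""
    n = n_hh + n_wh + n_hw + n_ww
    return (n_hh + n_ww) / n

def kappa(n_hh, n_wh, n_hw, n_ww):
    """Chance-corrected agreement, kappa = (P_o - P_e) / (1 - P_e),
    with P_e derived from the readers' marginal classification rates."""
    n = n_hh + n_wh + n_hw + n_ww
    p_o = (n_hh + n_ww) / n
    p1_h = (n_hh + n_hw) / n  # reader 1's marginal rate of "hatchery" calls
    p2_h = (n_hh + n_wh) / n  # reader 2's marginal rate of "hatchery" calls
    # Expected agreement if the readers' calls were independent
    p_e = p1_h * p2_h + (1 - p1_h) * (1 - p2_h)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical counts: 180 H-H, 20 W-H, 10 H-W, 190 W-W agreements/disagreements.
p_o = observed_agreement(n_hh=180, n_wh=20, n_hw=10, n_ww=190)
k = kappa(n_hh=180, n_wh=20, n_hw=10, n_ww=190)
```

Note that κ is always smaller than P_o, since even readers who guessed at random would agree on some fraction of otoliths.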



