Blick and Hagen: Use of agreement measures and latent class models to assess the reliability of classifying thermally marked otoliths 



assumption to be checked because there will now be extra
degrees of freedom to assess goodness-of-fit (there are 3
df, but only one parameter, p, needs to be estimated). Estimates
of p can still be obtained with one reader, but then
the assumptions cannot be checked. Also, the uncertainty in
the estimate can increase significantly when only one reader
is used.



Discussion 



There are numerous classification problems in fisheries
that require the judgment of trained individuals. In many
of those situations no "gold standard" is available to test
those judgments, and it becomes necessary to apply other
methods to determine the veracity of the classifications.
Reading thermally marked otoliths is a particularly good
example of this problem because thousands of classification
decisions are needed each year to provide estimates of
hatchery contributions.



The common approach for assessing the quality of the
readings, in the absence of samples of known origin,
has been to collect multiple independent readings of
the samples, and to presume that agreement between readings
can serve as a proxy for reading accuracy. Agreement
indices such as κ are very easy to compute, and they are
useful as flags that indicate reading problems. However,
as was shown here, they are also difficult to interpret.
In addition, the indices by themselves do not provide
inferences about the relative skill of different readers
in recognizing a particular set of patterns.
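One well-known source of this difficulty can be illustrated numerically. The sketch below is not taken from the paper: it computes Cohen's κ from the expected agreement table of two hypothetical readers who err independently given the true class, and it assumes (purely for illustration) that each reader has the same accuracy for marked and unmarked otoliths. The function names and the parameter values are invented for the example.

```python
def cohen_kappa(table):
    """Cohen's kappa for a square table of counts.

    table[i][j] = number of samples placed in class i by reader A
    and in class j by reader B.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n
    row_marg = [sum(table[i]) / n for i in range(k)]
    col_marg = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    expected = sum(row_marg[i] * col_marg[i] for i in range(k))
    return (observed - expected) / (1.0 - expected)


def expected_table(p, acc_a, acc_b, n=1000):
    """Expected 2x2 table for two conditionally independent readers.

    p is the proportion of marked otoliths; acc_a and acc_b are the
    probabilities of a correct call, assumed equal for both classes
    (an illustrative simplification).
    """
    table = [[0.0, 0.0], [0.0, 0.0]]
    for truth, weight in ((1, p), (0, 1.0 - p)):
        for a in (0, 1):
            prob_a = acc_a if a == truth else 1.0 - acc_a
            for b in (0, 1):
                prob_b = acc_b if b == truth else 1.0 - acc_b
                table[a][b] += n * weight * prob_a * prob_b
    return table


# Same two readers (95% accurate), two different proportions of marked fish.
kappa_balanced = cohen_kappa(expected_table(0.50, 0.95, 0.95))
kappa_rare = cohen_kappa(expected_table(0.05, 0.95, 0.95))
print(round(kappa_balanced, 3), round(kappa_rare, 3))
```

Under these assumptions κ falls from about 0.81 to about 0.45 when the marked proportion drops from 0.50 to 0.05, even though the readers' accuracies are unchanged, so a given value of κ cannot be read as a statement about accuracy on its own.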



Latent class models provide an approach with readily
interpretable quantities for a modest computational cost.
Classification accuracies or errors are direct, meaningful
parameters, unlike an index of agreement. In addition,
estimates of p are available. These models can be readily
extended to the case of more than two outcomes, e.g. multiple
hatchery marks. These models could also be useful in other
applications, such as in aging fish or in the identification
of any character for which there is no "gold standard"
(e.g. field identification of species or sex). A somewhat
similar analysis has been proposed for aging (Richards et
al., 1992), although the link to LCMs was not discussed.
LCMs can handle fairly complicated situations, including
ordered classes (Croon, 1990), continuous manifest variables,
and parameter constraints (see Clogg, 1995, and
Krzanowski and Marriott, 1995, for reviews).
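To make the "modest computational cost" concrete, the following is a minimal sketch of fitting a two-class latent class model by the EM algorithm. It is not the authors' implementation. It assumes that p denotes the proportion of marked otoliths, that each reader has its own probability of correctly calling a marked otolith (`sens`) and an unmarked one (`spec`), and that readers err independently given the true class. All function and variable names, and the parameter values used to build the example data, are hypothetical.

```python
from itertools import product


def pattern_likelihoods(pattern, p, sens, spec):
    """Return (P(pattern, marked), P(pattern, unmarked)).

    pattern is a tuple of 0/1 readings, one per reader (1 = "marked").
    """
    like_marked, like_unmarked = p, 1.0 - p
    for r, reading in enumerate(pattern):
        like_marked *= sens[r] if reading == 1 else (1.0 - sens[r])
        like_unmarked *= (1.0 - spec[r]) if reading == 1 else spec[r]
    return like_marked, like_unmarked


def fit_two_class_lcm(counts, n_readers, n_iter=500):
    """EM estimates of (p, sens, spec) from counts of reading patterns."""
    total = sum(counts.values())
    # Starting values above 0.5 tie latent class "marked" to reading 1.
    p, sens, spec = 0.5, [0.8] * n_readers, [0.8] * n_readers
    for _ in range(n_iter):
        # E-step: posterior probability that each pattern is truly marked.
        post = {}
        for pattern in counts:
            lm, lu = pattern_likelihoods(pattern, p, sens, spec)
            post[pattern] = lm / (lm + lu)
        # M-step: weighted proportions.
        n_marked = sum(counts[k] * post[k] for k in counts)
        n_unmarked = total - n_marked
        p = n_marked / total
        sens = [sum(counts[k] * post[k] for k in counts if k[r] == 1) / n_marked
                for r in range(n_readers)]
        spec = [sum(counts[k] * (1.0 - post[k]) for k in counts if k[r] == 0)
                / n_unmarked for r in range(n_readers)]
    return p, sens, spec


# Illustrative data: expected counts for three readers under made-up values.
true_p, true_sens, true_spec = 0.3, [0.95, 0.90, 0.85], [0.98, 0.95, 0.90]
counts = {}
for pattern in product((0, 1), repeat=3):
    lm, lu = pattern_likelihoods(pattern, true_p, true_sens, true_spec)
    counts[pattern] = 10000 * (lm + lu)

p_hat, sens_hat, spec_hat = fit_two_class_lcm(counts, n_readers=3)
print(round(p_hat, 3), [round(s, 3) for s in sens_hat],
      [round(s, 3) for s in spec_hat])
```

With three readers and two classes there are seven free cell probabilities and seven parameters, so this particular configuration is just identified and leaves no degrees of freedom for a goodness-of-fit check; additional readers or parameter constraints would be needed for that.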



We have not discussed the Bayesian approach to these
problems in great detail, but we believe it has much to
offer in that it can incorporate prior information, either



