FISHERY BULLETIN: VOL. 76, NO. 2 



the unknown sample. Thus, imbalance among the 

 misclassified fish will recur, unless the expected 

 accuracy of classification is very good (near 100%). 

 We have devised a method to correct for this. 



Based upon the results of classification of the 

 known test group, the classification matrix, C, is 

 estimated: 



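$$
C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1n} \\
c_{21} & c_{22} & \cdots & c_{2n} \\
\vdots & \vdots &        & \vdots \\
c_{n1} & c_{n2} & \cdots & c_{nn}
\end{bmatrix}
$$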



where $c_{ij}$ is an estimate of the fraction of fish belonging to class $j$ that are allocated to class $i$, such that $\sum_{i=1}^{n} c_{ij} = 1.0$, $\forall j$. (Note that for each $j$ the $c_{ij}$'s are a set of estimated multinomial probabilities and that each test sample size should be

 adequate.) If the discrimination is error-free, C 

 would be an identity matrix. The adjustment of a 

 priori probabilities causes the initially estimated 

 classification matrix to evolve to the point where 



$$
C T = R_t \quad \text{such that} \quad T \approx R_t .
$$



The $i$th component of the vector $T$ is the fraction of fish in the test group from test sample $i$ (class $i$), and the $i$th component of the vector $R_t$ is the fraction of fish in the test group allocated to class $i$ by the adjusted polynomial discriminant method.

 The test samples comprising T are not indepen- 

 dent of the classification scheme since they are 

 used to determine the a priori probabilities used in 

 the decision rule. Hence, the estimated prob- 

 abilities in the classification matrix may not be 

 unbiased. However, we did chi-square tests that 

 show elements of the classification matrix are not 

 significantly different when estimated with either 

 the test samples used to determine the a priori 

 probabilities or a second independent test group. 

 Thus, we prefer to use only one test group to de- 

 termine the a priori probabilities and to estimate 

 the elements of the classification matrix because 

 the test sample sizes will be larger (and the variance of the $c_{ij}$'s smaller) if we do not subdivide

 the fish available. 
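
As a minimal sketch of how the classification matrix might be tabulated from a test group, the Python fragment below column-normalizes a table of allocation counts and checks the relation $C T = R_t$. The counts, the two-class setup, and the function names are hypothetical and chosen only for illustration; they are not data from this study.

```python
def estimate_classification_matrix(counts):
    """Column-normalize counts, where counts[i][j] is the number of test
    fish of true class j that were allocated to class i."""
    n = len(counts)
    col_totals = [sum(counts[i][j] for i in range(n)) for j in range(n)]
    return [[counts[i][j] / col_totals[j] for j in range(n)] for i in range(n)]


def mat_vec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


# Hypothetical test group: 200 fish of class 1 and 100 fish of class 2.
counts = [[180, 15],
          [20, 85]]
n_classes = len(counts)
total = sum(sum(row) for row in counts)

C = estimate_classification_matrix(counts)

# T: fraction of the test group that truly belongs to each class.
T = [sum(counts[i][j] for i in range(n_classes)) / total
     for j in range(n_classes)]

# R_t: fraction of the test group allocated to each class.
R_t = [sum(counts[i]) / total for i in range(n_classes)]

# When C is estimated from the same counts, C T reproduces R_t.
CT = mat_vec(C, T)
print(C, T, R_t, CT)
```

Each column of `C` sums to 1 by construction, which corresponds to the side condition on the $c_{ij}$'s stated above.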



Now, let $u_i$ be the fraction of fish in a sampled group that belong to the $i$th class. The vector $U$ is then unknown except for the obvious side condition $\sum_{i=1}^{n} u_i = 1$. The classification matrix now



operates on U to give: 



$$
C U = R_u
$$

where the $i$th component of $R_u$ is the fraction of fish in the unknown sample allocated to the $i$th class. Since $C$ is estimated, $R_u$ is known, and $C$ is usually nonsingular, we can estimate $U$ by

$$
\hat{U} = C^{-1} R_u .
$$



Each point estimate ($\hat{u}_i$) obtained will have some variance. This variability will depend upon the accuracy with which fish from class $i$ are classified, the accuracy with which the elements of $C$ are estimated, and variance due to sampling error encountered when obtaining the unknown sample. Thus, if any $u_i$ is small, then its estimate ($\hat{u}_i$) may be negative. Such solutions are meaningless. In such cases the classes with negative solutions should be dropped (assume such $u_i \approx 0$) and the analyses repeated.
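
The correction step can be sketched as a small linear solve. The fragment below is an illustrative Python sketch, not the authors' program: it solves $C U = R_u$ by Gauss-Jordan elimination and reports which classes received negative estimates. The matrix and the allocated fractions are hypothetical values, and the function names are assumptions. Repeating the analysis after dropping a class is not implemented here; the sketch only flags the classes concerned.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [[float(v) for v in matrix[i]] + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("classification matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                for k in range(col, n + 1):
                    aug[row][k] -= factor * aug[col][k]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def correct_proportions(C, R_u):
    """Return the corrected estimates of U and the indices of any classes
    whose estimate came out negative."""
    u_hat = solve_linear_system(C, R_u)
    negative_classes = [i for i, u in enumerate(u_hat) if u < 0]
    return u_hat, negative_classes


# Hypothetical two-class classification matrix (columns sum to 1).
C = [[0.90, 0.15],
     [0.10, 0.85]]

# Hypothetical allocated fractions in an unknown sample.
u_hat, negative = correct_proportions(C, [0.30, 0.70])

# A sample in which class 1 is rare can give a negative estimate.
u_hat_rare, negative_rare = correct_proportions(C, [0.12, 0.88])
print(u_hat, negative, u_hat_rare, negative_rare)
```

Because the columns of `C` and the components of `R_u` each sum to 1, the solved estimates also sum to 1, so a negative estimate for one class is offset by an estimate above its true value elsewhere.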



We did simulation work to evaluate the classi- 

 fication matrix correction procedure for the two- 

 and three-class situations. Five hundred simu- 

 lated experiments were done for each situation. 

 For the two-class case the average error of the 

 classification results was 0.100 while that of the 

corrected estimates was 0.055. In 84% of the experiments the corrected estimate was closer to the true value than the classification result. For the

 three-class case the average error of the classifica- 

 tion results was 0.127 while that of the corrected 

 estimates was 0.054. In 89% of the experiments 

 the corrected estimate was closer to the true

 value. The results of these simulations show that 

 the classification correction procedure improves 

 estimates of the true proportion of a class present. 
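
A simulation in this spirit can be sketched as follows for the two-class case. Every setting here is an assumption made for illustration (the true proportion, the allocation accuracies, the sample sizes, the random seed, and the use of absolute error for class 1 as the error measure); the text does not report the settings actually used. The sketch also skips the discriminant analysis itself and draws allocations directly from assumed misclassification rates.

```python
import random


def simulate_two_class(n_experiments=500, true_u1=0.2, acc1=0.85, acc2=0.80,
                       n_test=200, n_unknown=200, seed=1):
    """Return (mean raw error, mean corrected error) for the class 1 fraction."""
    rng = random.Random(seed)
    raw_errors, corrected_errors = [], []
    for _ in range(n_experiments):
        # Estimate c11 (class 1 fish allocated to class 1) and c12 (class 2
        # fish allocated to class 1) from simulated test samples.
        c11 = sum(rng.random() < acc1 for _ in range(n_test)) / n_test
        c12 = sum(rng.random() >= acc2 for _ in range(n_test)) / n_test

        # Draw the unknown sample and allocate each fish using the true rates.
        allocated_to_1 = 0
        for _ in range(n_unknown):
            is_class1 = rng.random() < true_u1
            p_alloc1 = acc1 if is_class1 else 1.0 - acc2
            allocated_to_1 += rng.random() < p_alloc1
        r1 = allocated_to_1 / n_unknown

        denom = c11 - c12
        if denom == 0:
            continue  # estimated C is singular; skip this experiment

        # Two-class form of the correction: r1 = c11*u1 + c12*(1 - u1).
        u1_hat = (r1 - c12) / denom
        raw_errors.append(abs(r1 - true_u1))
        corrected_errors.append(abs(u1_hat - true_u1))
    return (sum(raw_errors) / len(raw_errors),
            sum(corrected_errors) / len(corrected_errors))


raw_error, corrected_error = simulate_two_class()
print(raw_error, corrected_error)
```

Under these assumed settings the uncorrected fraction is biased toward the more abundant class, so the corrected estimate should show the smaller average error, although its variance is larger because the sampling error in `r1`, `c11`, and `c12` is divided by `c11 - c12`.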



This classification matrix correction procedure will reduce to the correction procedure developed for the two-class case by Worlund (1960; see footnote) in the following manner:



*A similar relationship and a least squares solution technique are given by Worlund and Fredin (1962).



Worlund, D. D. 1960. A method for computing the variance of an estimate of the rate of intermingling of two salmon populations. Unpubl. manuscr., 13 p. Bur. Commer. Fish., Biol. Lab., Seattle, Wash.






