


FISHERY BULLETIN OF THE FISH AND WILDLIFE SERVICE 



tive abundance or composition of a mixed popu- 

 lation is desired. In this case there is little 

 interest in the individuals. 



If we are interested in classifying individuals, it is possible to
adjust the classification region to reduce the chance of making
errors. Those individuals that fall close to the division line
(72.52) are the cause of the largest percentage of
misclassifications. If some of these are not classified, the errors
can be reduced. This

amounts to dividing the sample into three groups: Hudson River shad,
Connecticut River shad, and those that could be either with about
equal probability. This third group consists of fish which remain
unclassified because there is insufficient information upon which to
make a positive identification. If only those fish with a Y less than
70.94 are called Connecticut shad and those with a Y greater than
74.10 are called Hudson River shad, the probability of misclassifying
a Connecticut shad would be equal to the area under the normal curve
from 74.10 - 70.94 = 3.16 to infinity. The corresponding normal
deviate is 1.78 and the area above this value is 3.7 percent. Thus by
not classifying approximately one-half of the sample, the number of
wrong classifications is reduced to 3.7 percent.
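
The three-group rule and the tail-area calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the author's procedure: the function names `classify` and `upper_tail` are hypothetical, the cutoffs 70.94 and 74.10 and the normal deviate 1.78 are taken from the text, and the discriminant scores passed to `classify` are invented purely for demonstration.

```python
import math

# Cutoffs quoted in the text; all names here are illustrative assumptions.
CONNECTICUT_BELOW = 70.94
HUDSON_ABOVE = 74.10


def classify(y: float) -> str:
    """Assign a discriminant score Y to one of the three groups."""
    if y < CONNECTICUT_BELOW:
        return "Connecticut"
    if y > HUDSON_ABOVE:
        return "Hudson"
    return "unclassified"


def upper_tail(z: float) -> float:
    """Area under the standard normal curve from z to infinity."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    # Invented scores, used only to exercise the three regions.
    for y in (69.8, 72.5, 75.0):
        print(y, classify(y))
    # 1.78 is the normal deviate quoted in the text.
    print(f"{100 * upper_tail(1.78):.2f} percent")
```

The tail area for a deviate of 1.78 comes out at a little under 3.8 percent, in line with the 3.7 percent quoted in the text. Scores exactly equal to a cutoff are left unclassified here, which is one reading of "less than" and "greater than".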



The area of indecision could be extended even 

 wider to further reduce the chance of error; how- 

 ever, if this procedure is carried too far, fish from 

 other rivers might introduce a bias that would 

 have to be considered. The assumption was 

 made earlier that only fish from the Hudson and 

 Connecticut Rivers were present in the sample; 

 however, any fish that do not belong to one of 

 these populations will be classified as though they 

 did. Therefore, any appreciable number of fish 

 from other rivers would cause additional errors. 

 From the tagging experiments mentioned pre- 

 viously, it would appear that a very small per- 

 centage of shad present off the New Jersey coast 

 do not belong to one of these two populations. If 

 this is of the order of 5 percent, it might have 

 little effect if all of the fish were classified. If a 

 large portion of the sample remains unclassified, 

 the errors introduced by these fish may be more 

 harmful than those due to misclassifying fish from 

 the two populations. 



Estimates of the relative abundance of a mixed population can also be
obtained. Three methods of accomplishing this will be presented. The
most obvious is to use the discriminant function to classify each
fish in the sample and then estimate the composition of the
population from the composition of the sample. If there are only two
populations present, this method may be quite satisfactory, but it
does contain a bias. If a fishery is sampled which contains
individuals from only one of these rivers, 19 percent of these fish
would be classified as coming from the other race and the estimated
composition would be 19 and 81 percent. Thus there would be a bias of
19 percent. If the region is modified so that the relative abundance
is estimated from the individuals which are more likely to be
classified correctly, then this bias will be reduced. By using

the region Hudson > 74.10 > Unclassified > 70.94 > Connecticut, the
estimated composition of a sample which contains only Hudson River
fish is

$$\frac{50}{50 + 3.7} = 93.5 \text{ percent}$$



for a bias of 6.5 percent. If there are equal 

 numbers of Hudson and Connecticut River fish 

 present in a sample, then the errors of classification 

 would cancel and the bias would be zero. The 

 maximum bias would occur when a sample is 

 composed of fish from only one river. 
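
The bias figures in this paragraph can be checked with a short calculation. The sketch below assumes, as the text implies, that the classification rates are the same for both rivers; the function name `estimated_hudson_share` and its parameters are hypothetical. The rates 0.81 and 0.19 (single division line) and 0.50 and 0.037 (three regions) are the percentages quoted in the text.

```python
def estimated_hudson_share(true_hudson_share: float,
                           correct_rate: float,
                           error_rate: float) -> float:
    """Share of classified fish labelled Hudson, assuming symmetric rates.

    correct_rate: fraction of a river's fish assigned to their own river.
    error_rate:   fraction assigned to the other river.
    Any remainder (1 - correct_rate - error_rate) is left unclassified.
    """
    p = true_hudson_share
    called_hudson = p * correct_rate + (1.0 - p) * error_rate
    called_connecticut = (1.0 - p) * correct_rate + p * error_rate
    return called_hudson / (called_hudson + called_connecticut)


if __name__ == "__main__":
    # Single division line: 81 percent right, 19 percent wrong.
    print(estimated_hudson_share(1.0, 0.81, 0.19))
    # Three regions: 50 percent right, 3.7 percent wrong, rest unclassified.
    print(estimated_hudson_share(1.0, 0.50, 0.037))
    # Equal mixture: the two kinds of error cancel.
    print(estimated_hudson_share(0.5, 0.50, 0.037))
```

Evaluated directly, 50/(50 + 3.7) comes to about 93.1 percent, slightly below the printed 93.5 percent; the difference may reflect rounding in the original or a transcription error, and the sketch does not settle which. For an equal mixture the estimate is exactly one-half, matching the statement that the errors cancel.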



Another way of removing the bias is to assume that the error of
classification in the sample is the same as the error in the
discriminant function (i.e., 19 percent). Then the number of fish
classified as Hudson River fish consists of $0.19N_C$ and
$(1-0.19)N_H$, or

$$n_H = 0.19N_C + (1-0.19)N_H$$

where $n_H$ is the number classified as Hudson River fish and $N_C$
and $N_H$ are the numbers present in the population. Similarly, for
the number $n_C$ classified as Connecticut River fish the following
relation exists:

$$n_C = (1-0.19)N_C + 0.19N_H$$

Substituting sample values ($n_C$ and $n_H$), these two equations can
be solved for $N_C$ and $N_H$, which can be used to determine the
relative abundance.
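
Solving the two equations is a matter of inverting a 2 x 2 linear system, which can be written in closed form. The following is a minimal sketch under the text's assumption of a common 19 percent error rate; the function name `correct_counts` and its argument names are hypothetical, and the counts in the usage lines are not data from the study.

```python
def correct_counts(classified_hudson: float,
                   classified_connecticut: float,
                   error_rate: float = 0.19) -> tuple:
    """Solve the two linear equations for the counts present in the population.

    classified_hudson      = error_rate * N_C + (1 - error_rate) * N_H
    classified_connecticut = (1 - error_rate) * N_C + error_rate * N_H
    """
    if error_rate == 0.5:
        raise ValueError("the equations are singular at an error rate of 0.5")
    denom = 1.0 - 2.0 * error_rate
    n_hudson = ((1.0 - error_rate) * classified_hudson
                - error_rate * classified_connecticut) / denom
    n_connecticut = ((1.0 - error_rate) * classified_connecticut
                     - error_rate * classified_hudson) / denom
    return n_hudson, n_connecticut


if __name__ == "__main__":
    # A pure Hudson sample of 100 fish would be classified 81 and 19;
    # the correction should return approximately 100 and 0.
    print(correct_counts(81, 19))
    # Invented counts for a mixed sample of 200 fish.
    print(correct_counts(120, 80))
```

The corrected counts always sum to the sample size, so the relative abundance follows by dividing either count by that total. With small samples the solution can fall slightly below zero for one river, which would need to be handled separately.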


A third estimate is obtained by using the following formula
(Rao 1952, p. 300):

$$P = \frac{\bar{X}_H - \bar{X}_S}{\bar{X}_H - \bar{X}_C}$$

where $\bar{X}_H$, $\bar{X}_C$, and $\bar{X}_S$ are the averages of the

 discriminant function for the Hudson River, the 

 Connecticut River and the sample of the mixed 



