STATISTICS IN CLASSIFYING RACES OF SHAD 






The proportion of Connecticut River fish which lie in the area under the normal curve from $-\infty$ to 72.52 is equal to the probability of a normal deviate of

\[ \frac{72.52 - 70.94}{1.78} = \frac{1.58}{1.78} = 0.89 \]



The probability of this normal deviate is 0.81 

 (table 8); therefore, the error of misclassification 

 for the Connecticut River population is 19 percent. 

 This is also the error of misclassification for the 

 Hudson River population. This function will 

 correctly classify 81 percent or approximately 3 

 percent more than the simpler function first 

 investigated. 
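The arithmetic behind the 19 percent figure can be checked with a short sketch. The cutoff, mean, and standard deviation are the values quoted in the text; the variable names and the helper `normal_cdf` are illustrative assumptions, and the normal probability is computed from the error function rather than read from table 8.

```python
import math

def normal_cdf(z):
    """Cumulative probability of a standard normal deviate (via erf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

cutoff = 72.52            # dividing point between the two populations
mean_connecticut = 70.94  # mean of the function for Connecticut River fish
sd = 1.78                 # standard deviation of the function

z = (cutoff - mean_connecticut) / sd   # normal deviate
p_correct = normal_cdf(z)              # proportion correctly classified
error = 1.0 - p_correct                # error of misclassification

print(round(z, 2), round(p_correct, 2), round(error, 2))  # 0.89 0.81 0.19
```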



Rao (1952) presents a test of significance to 

 determine if the calculated discriminant function 

 is better than some other assigned function. If 

 the assigned function is: 



\[ Z = X_1 + X_2 + X_3 + X_4 + X_5 + X_6 \]



then

\[ D_1^2 = \frac{(\bar{Z}_1 - \bar{Z}_2)^2}{V(Z)} \]

where

\[ V(Z) = V(X_1) + V(X_2) + V(X_3) + V(X_4) + V(X_5) + V(X_6) + 2\,\mathrm{cov}(X_1 X_2) + 2\,\mathrm{cov}(X_1 X_3) + \dots + 2\,\mathrm{cov}(X_5 X_6). \]



Using values of $w_{ij}$,

\[ D_1^2 = \frac{12.9521}{6.0771} = 2.131 \]
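The distance for an assigned function that is a plain sum of the characters can be sketched as follows. The variance of the sum is the total of every entry of the covariance matrix, since each covariance appears twice. The three-character means and covariances below are hypothetical values chosen only to illustrate the arithmetic; they are not the shad measurements, and the function names are assumptions.

```python
def variance_of_sum(cov):
    """V(X1 + ... + Xp): all variances plus twice every covariance,
    i.e. the sum of every entry of the symmetric covariance matrix."""
    return sum(sum(row) for row in cov)

def d_squared_for_sum(mean_a, mean_b, cov):
    """Squared difference of the mean sums divided by V(Z)."""
    diff = sum(mean_a) - sum(mean_b)
    return diff ** 2 / variance_of_sum(cov)

# Hypothetical pooled within-group covariance matrix (not from the paper).
cov = [
    [1.0, 0.2, 0.1],
    [0.2, 2.0, 0.3],
    [0.1, 0.3, 1.5],
]
mean_a = [10.0, 20.0, 30.0]  # hypothetical character means, population A
mean_b = [9.0, 19.5, 29.0]   # hypothetical character means, population B

v_z = variance_of_sum(cov)
d1_squared = d_squared_for_sum(mean_a, mean_b, cov)
print(v_z, d1_squared)
```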



To test if this function is as reliable as the one 

 derived from the data, the following must be 

 calculated: 



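\[ U = \frac{1 + N_1 N_2 D^2 / (N_1 + N_2)(N_1 + N_2 - 2)}{1 + N_1 N_2 D_1^2 / (N_1 + N_2)(N_1 + N_2 - 2)} - 1 = \frac{1 + \dfrac{(104)(91)}{(195)(193)}\, D^2}{1 + \dfrac{(104)(91)}{(195)(193)}\,(2.13)} - 1 = 0.169 \]

\[ F = \frac{U (N_1 + N_2 - 1 - p)}{p - 1} = \frac{(0.169)(188)}{5} = 6.35 \]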



F is a variance ratio with $(p - 1)$ and $(N_1 + N_2 - p - 1)$ degrees of freedom. In the above instance, F has 5 and 188 degrees of freedom. This is a highly significant value, indicating that the calculated function is significantly better than the simpler function.
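A minimal sketch of this test, assuming sample sizes of 104 and 91, six characters, and a distance of 2.13 for the assigned function, as quoted in the text. The distance for the calculated function is not legible in this passage; the value 3.16 used below is an assumption inferred from the difference between the two group means of Y (74.10 and 70.94). The function names are illustrative.

```python
def rao_u(n1, n2, d2_calculated, d2_assigned):
    """Ratio statistic comparing the calculated and assigned functions."""
    k = n1 * n2 / ((n1 + n2) * (n1 + n2 - 2))
    return (1.0 + k * d2_calculated) / (1.0 + k * d2_assigned) - 1.0

def rao_f(u, n1, n2, p):
    """Variance ratio and its two degrees of freedom."""
    df1 = p - 1
    df2 = n1 + n2 - 1 - p
    return u * df2 / df1, df1, df2

n1, n2, p = 104, 91, 6
d2_calculated = 3.16  # ASSUMPTION: inferred as 74.10 - 70.94, not printed here
d2_assigned = 2.13    # value for the simple sum, from the text

u = rao_u(n1, n2, d2_calculated, d2_assigned)
f_value, df1, df2 = rao_f(u, n1, n2, p)
print(round(u, 3), round(f_value, 2), df1, df2)
```

Under these assumptions the sketch gives U close to 0.169 and a variance ratio a little above 6, on 5 and 188 degrees of freedom.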



Since the above discriminant function was based on the 1939 Hudson River sample, and the 1945 Connecticut River sample, the 1940 Hudson River sample (table 12, appendix) can be used to demonstrate how the function works. Values for the Hudson River sample of 1940 were substituted in the formula:

\[ Y = 0.785X_1 + 0.577X_2 + 0.871X_3 + 0.234X_4 + 1.731X_5. \]



The resulting distribution of Y is tabulated in table 9. It can be seen that only 16 of the 105 values (about 15 percent) are below 72.52, which is close to the 19 percent expected. The mean Y for this sample is 74.25, which is in close agreement with the value of 74.10 obtained for 1939.
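The way the function assigns individual fish can be sketched as below. The coefficients and the dividing point of 72.52 are those quoted in the text; the function names, the rule that a value exactly at the dividing point goes to the Hudson River group, and the short list of scores are illustrative assumptions, not data from table 9 or table 12.

```python
COEFFS = (0.785, 0.577, 0.871, 0.234, 1.731)  # coefficients quoted in the text
CUTOFF = 72.52                                # dividing point between groups

def discriminant_score(x):
    """Y for one fish, given its characters in the same order as COEFFS."""
    if len(x) != len(COEFFS):
        raise ValueError("expected one value per coefficient")
    return sum(c * v for c, v in zip(COEFFS, x))

def classify(y):
    """Hudson River fish have the higher mean Y, so high scores go there.
    Assigning a tie at the cutoff to Hudson is an arbitrary assumption."""
    return "Hudson" if y >= CUTOFF else "Connecticut"

def fraction_below(scores, cutoff=CUTOFF):
    """Share of scores that fall on the Connecticut side of the cutoff."""
    return sum(1 for y in scores if y < cutoff) / len(scores)

hypothetical_scores = [70.0, 73.0, 75.0, 72.0]  # made-up values of Y
print([classify(y) for y in hypothetical_scores])
print(fraction_below(hypothetical_scores))
```

Applied to a sample known to come from the Hudson River, `fraction_below` is the observed error of misclassification, the quantity compared with the expected 19 percent.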



Table 9. Frequency distribution of the discriminant function $Y = 0.785X_1 + 0.577X_2 + 0.871X_3 + 0.234X_4 + 1.731X_5$ for the 1940 Hudson River sample



The preceding techniques rest on a number of assumptions. Both populations must be multivariate normal with equal variances and covariances. The samples are assumed to be large, since sample values are substituted for population values when the discriminant function is calculated. Only two populations can be present, and any future individual that is to be assigned to one of these populations must belong to one of them. Of course, if a third population is present with characters considerably different from the two original populations, it may be apparent that it represents a third group when the discriminant function is used.



The calculated discriminant function can be 

 used for two different types of situations. In 

 some studies one is interested in individuals (for 

 example, to obtain scale samples) and would like 

 to be certain that the fish chosen are from an 

 assigned population. In other studies, the rela- 



