characters. For more he recommends an alternative which makes the nature of $D^2$ apparent to those not acquainted with matrix analysis. He starts with the pooled estimates of the intragroup correlations $\lambda_{ij}$ and standard deviations. Then he constructs a table of the normalized mean values $x_1, \ldots, x_p$ for the $p$ characters in each group; in other words, the difference of each mean from the grand mean for the character, divided by the intragroup standard deviation. The normalized mean values are then transformed to values $Y_1, \ldots, Y_p$, which are uncorrelated, and subsequently to other values $y_1, \ldots, y_p$, which have unit standard deviation. The general formulas are



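$$Y_p = x_p - a_{p,p-1}\,Y_{p-1} - \cdots - a_{p1}\,Y_1 \tag{17}$$

$$a_{ij} = \frac{b_{ij}}{V(Y_j)} \tag{18}$$

$$b_{ij} = \lambda_{ij} - \sum_{t=1}^{j-1} a_{jt}\,b_{it} \tag{19}$$

$$V(Y_i) = \lambda_{ii} - \sum_{j=1}^{i-1} a_{ij}\,b_{ij} \tag{20}$$

$$y_i = \frac{Y_i}{\sqrt{V(Y_i)}} \tag{21}$$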



The values $a_{ij}$ and $b_{ij}$ are convenient intermediate steps in the computations, $V(Y_j)$ is the variance of $Y_j$, and $y_i$ is the final transformed value of the normalized mean.
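The recursion in equations (17) to (21) can be sketched in a few lines of Python. This is only an illustrative sketch, not Rao's own tabular procedure: the function names `rao_transform` and `generalized_distance_sq` are hypothetical, indices are 0-based, and it assumes the usual property that $D^2$ between two groups is the sum of squared differences of their transformed values.

```python
import math


def rao_transform(lam, x):
    """Transform normalized means x into uncorrelated, unit-variance values y.

    lam : p x p intragroup correlation matrix (list of lists)
    x   : list of the p normalized means for one group
    Follows equations (17) to (21); indices are 0-based here.
    """
    p = len(x)
    a = [[0.0] * p for _ in range(p)]  # coefficients a_ij, eq. (18)
    b = [[0.0] * p for _ in range(p)]  # intermediate values b_ij, eq. (19)
    var = [0.0] * p                    # variances V(Y_i), eq. (20)
    big_y = [0.0] * p                  # uncorrelated values Y_i, eq. (17)
    for i in range(p):
        for j in range(i):
            b[i][j] = lam[i][j] - sum(a[j][t] * b[i][t] for t in range(j))
            a[i][j] = b[i][j] / var[j]
        var[i] = lam[i][i] - sum(a[i][j] * b[i][j] for j in range(i))
        big_y[i] = x[i] - sum(a[i][j] * big_y[j] for j in range(i))
    # eq. (21): scale each Y_i to unit standard deviation
    return [big_y[i] / math.sqrt(var[i]) for i in range(p)]


def generalized_distance_sq(lam, x_a, x_b):
    """Squared generalized distance between two groups (assumed sum of squares)."""
    y_a = rao_transform(lam, x_a)
    y_b = rao_transform(lam, x_b)
    return sum((u - v) ** 2 for u, v in zip(y_a, y_b))
```

For instance, with two characters correlated at 0.5, normalized means (1, 2) for one group and (0, 0) for another, the sketch gives transformed values $(1, \sqrt{3})$ for the first group and $D^2 = 4$.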



The meaning of the computations is more readily apparent from the simplified formulae for the first two transformed means. The first is

$$y_1 = Y_1 = x_1 \tag{22}$$



or no transformation at all. Then the generalized distance $D$ between two samples using the first character is merely the difference between the normalized means. The second is

$$y_2 = \frac{Y_2}{\sqrt{V(Y_2)}}, \qquad Y_2 = x_2 - \lambda_{21}\,Y_1 \tag{23}$$



or the second character is reduced by the amount of correlation with the first and then adjusted to unit standard deviation by dividing by the square root of the variance (the standard deviation). The subsequent formulae become much more complicated because of the need to account for all of the intercharacter correlations. The reader is referred to Rao (1952) and Mahalanobis et al. (1949) for a complete explanation.
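As a worked illustration of the two-character case, assume the intragroup values are correlations, so that $\lambda_{11} = \lambda_{22} = 1$, and write $d_1$ and $d_2$ (symbols introduced here, not in the original) for the differences between two groups in the normalized means of the first and second characters. Equations (18) to (20) then reduce to

$$a_{21} = b_{21} = \lambda_{21}, \qquad V(Y_2) = 1 - \lambda_{21}^2, \qquad y_2 = \frac{x_2 - \lambda_{21}\,x_1}{\sqrt{1 - \lambda_{21}^2}}$$

and, taking $D^2$ as the sum of squared differences of the transformed values,

$$D^2 = d_1^2 + \frac{(d_2 - \lambda_{21}\,d_1)^2}{1 - \lambda_{21}^2} = \frac{d_1^2 - 2\lambda_{21}\,d_1 d_2 + d_2^2}{1 - \lambda_{21}^2}$$

When $\lambda_{21} = 0$ this is simply $d_1^2 + d_2^2$; the closer the correlation is to unity, the less the second character adds beyond what the first already shows, unless the two differences disagree with the direction of the correlation.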



The applications of the generalized distance have all concerned studies which used counted characters or measured characters considered to be independent of total size. In other words, no regression was involved. In our tuna studies, where regression is involved, we substitute the estimated mean length at a given size for the mean, the standard deviation from regression for the standard deviation, and the intragroup partial correlation coefficient (independent of total length) for the simple correlation. The multivariate analysis of tuna populations will be the subject of ensuing papers.
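The text does not state how the partial correlation is computed. A minimal sketch, assuming the standard first-order partial correlation formula with total length as the variable held constant, is given below; the function name and argument names are hypothetical.

```python
import math


def partial_correlation(r_12, r_1t, r_2t):
    """First-order partial correlation of characters 1 and 2, total length held constant.

    r_12 : simple intragroup correlation between characters 1 and 2
    r_1t : simple correlation between character 1 and total length
    r_2t : simple correlation between character 2 and total length
    """
    return (r_12 - r_1t * r_2t) / math.sqrt((1.0 - r_1t ** 2) * (1.0 - r_2t ** 2))
```

If neither character is correlated with total length, the partial correlation equals the simple correlation; if all three simple correlations are 0.5, it falls to one third.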



THE SAMPLING PROBLEMS 



Before we turn to a comparison of morphological and tagging studies it is necessary to discuss the sampling problems. No matter how good our statistical treatment of the data, the inferences which we draw can be no better than the sampling. Here is a problem of special difficulty not always carefully considered by those concerned with tuna morphometric data. Most of our discussion is based on the recent treatment of sampling by Cochran (1953) and Cochran et al. (1954).



A good sample must be a random one or some modification which does not change the basic principle that every individual in the population has an equal or known chance of selection. A random sample is a mathematically precise concept. Its importance is becoming widely recognized because not-quite-random samples are found to be unreliable.



If we are to obtain precisely a random sample we must first accurately specify the population which is to be sampled. In the usual taxonomic study this cannot be done for the biological population because the limits are not known and the purpose of the study is frequently to describe them. We can, however, call the biological populations






