VENRICK: PERCENT SIMILARITY: THE PREDICTION OF BIAS 



APPENDIX 



Derivation of Formulae for $\hat{I}$ and $\sigma^2(\hat{I})$, the Percent Similarity Index and Its Variance



The Percent Similarity Index 

 General Case 



In the equation defining the percentage similarity 

 index, 



$$I = 1 - 0.5 \sum_{i=1}^{n} \left| p_{i1} - p_{i2} \right| \tag{1}$$



where n is the total number of species in samples 1 and 2 and $p_{i1}$ and $p_{i2}$ are the proportions of species i in samples 1 and 2, the expression $|p_{i1} - p_{i2}|$ is the range (w) of a sample of size two. Its expected value can be related to the standard deviation of the underlying normal population by the equation $\sigma = 0.8862\,w$ (Dixon and Massey 1969, table A-8b (2)).

 Thus 



$$E\left[\,\left|p_{i1} - p_{i2}\right|\,\right] = \sigma(p_i)/0.8862.$$
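The constant 0.8862 can be checked numerically. The sketch below is only an illustration, not part of the original derivation: it assumes two independent draws from a normal population, for which the expected range is $2\sigma/\sqrt{\pi}$, and compares that with a Monte Carlo estimate. The function name `mean_abs_difference` and the trial count are hypothetical choices.

```python
import math
import random

def mean_abs_difference(sigma, trials=200_000, seed=0):
    """Monte Carlo estimate of E|X1 - X2| for two independent N(0, sigma^2) draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(rng.gauss(0.0, sigma) - rng.gauss(0.0, sigma))
    return total / trials

sigma = 1.0
expected_range = 2.0 * sigma / math.sqrt(math.pi)  # analytic E(w) for a sample of size two
ratio = sigma / expected_range                     # sqrt(pi)/2, the tabled 0.8862
print(round(ratio, 4), round(0.5 / ratio, 4))      # 0.8862 0.5642
print(round(mean_abs_difference(sigma), 3))        # simulated E(w), close to expected_range
```

The second printed value, 0.5/0.8862, is the factor 0.5642 that appears once the expectation is substituted into Equation (1).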



Substitution of this expression in Equation (1) 

 gives 



$$\hat{I} = 1 - 0.5642 \sum_{i=1}^{n} \sigma(p_i). \tag{2}$$
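As an illustration of Equations (1) and (2), the following minimal sketch computes the index between two observed samples and the expected index from a list of standard deviations of the proportions. The function names and the example counts are hypothetical and are not taken from the paper.

```python
def percent_similarity(counts1, counts2):
    """Equation (1): 1 - 0.5 * sum |p_i1 - p_i2| for two samples of species counts."""
    t1, t2 = sum(counts1), sum(counts2)
    return 1.0 - 0.5 * sum(abs(a / t1 - b / t2) for a, b in zip(counts1, counts2))

def expected_similarity(sigma_p):
    """Equation (2): 1 - 0.5642 * sum of sigma(p_i) over species."""
    return 1.0 - 0.5642 * sum(sigma_p)

# Hypothetical counts for three species in two samples.
sample1 = [50, 30, 20]
sample2 = [40, 40, 20]
print(percent_similarity(sample1, sample2))
print(expected_similarity([0.05, 0.05, 0.02]))
```

Both functions assume the species are listed in the same order in each input.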



The proportional abundance of species i, $p_i$, is the ratio of the abundance of that species, $x_i$ (or $\mu_i$), to the total number of individuals in the sample, T (or $\tau$). The variability of $p_i$ is a function of the variance of $x_i$ and the variance of T. When variances are small relative to mean values, the variance of $p_i$ may be approximated by



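$$\sigma^2(p_i) = \left[\tau^2\sigma^2(x_i) - 2\mu_i\tau\,\sigma^2(x_i, T) + \mu_i^2\sigma^2(T)\right]/\tau^4 \tag{3}$$

and

$$\sigma(p_i) = \left\{\left[\tau^2\sigma^2(x_i) - 2\mu_i\tau\,\sigma^2(x_i, T) + \mu_i^2\sigma^2(T)\right]/\tau^4\right\}^{1/2} \tag{4}$$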



(Yates 1953; the equation may also be derived using 

 the differential theory of variances, or delta method, 

 Seber 1973). 
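The ratio approximation in Equations (3) and (4) can be sketched as a small function. This is an illustration under the stated small-variance assumption; the name `sigma_p` and its argument names are hypothetical, with `cov_xT` standing for the covariance term written $\sigma^2(x_i, T)$ in the text.

```python
import math

def sigma_p(mu_i, tau, var_x, cov_xT, var_T):
    """Delta-method standard deviation of p_i = x_i / T, following Equations (3) and (4)."""
    var_p = (tau**2 * var_x - 2.0 * mu_i * tau * cov_xT + mu_i**2 * var_T) / tau**4
    # Guard against tiny negative values from rounding before taking the square root.
    return math.sqrt(max(var_p, 0.0))

# Hypothetical species with mean 20 in a sample of mean total 100,
# with variance equal to the mean and cov(x_i, T) equal to var(x_i).
print(sigma_p(mu_i=20.0, tau=100.0, var_x=20.0, cov_xT=20.0, var_T=100.0))
```

With these inputs the numerator is 160000 and the denominator is 10^8, so the variance is 0.0016 and the standard deviation is 0.04, up to floating-point rounding.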



The substitution of Equation (4) into Equation (2) gives an equation for $\hat{I}$, Equation (5).



Single Sample Case 



In order to estimate $\hat{I}$ from a single sample, some independent method of estimating $\sigma^2(x_i)$, $\sigma^2(T)$, and $\sigma^2(x_i, T)$ must be available. In the following derivation, two assumptions are made: 1) The variance can be expressed as a function of the mean, e.g., $\sigma^2(x_i) \approx (q)(\mu_i)$; and 2) species are independently distributed so that $\sigma^2(x_i, T) = \sigma^2(x_i)$. Values of $x_i$ and T from a single sample are unbiased estimates of $\mu_i$ and $\tau$.



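When the above approximations are introduced into Equation (4), the expression for $\sigma(p_i)$ becomes

$$\sigma(p_i) = \left[(q/T^3)(T x_i - x_i^2)\right]^{1/2}$$

and

$$\sum_{i=1}^{n} \sigma(p_i) = \sum_{i=1}^{n} \left[(q/T^3)(T x_i - x_i^2)\right]^{1/2}. \tag{6}$$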
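The algebra between Equation (4) and Equation (6) is not written out in the text. A sketch of the intermediate step follows, under the two stated assumptions plus one further assumption made here for illustration: that independence of species also gives $\sigma^2(T) = \sum_i \sigma^2(x_i) \approx q\tau$.

```latex
\begin{aligned}
\sigma^2(p_i)
  &\approx \frac{\tau^2\,(q\mu_i) - 2\mu_i\tau\,(q\mu_i) + \mu_i^2\,(q\tau)}{\tau^4} \\
  &= \frac{q\tau\,(\tau\mu_i - \mu_i^2)}{\tau^4}
   = \frac{q}{\tau^3}\,(\tau\mu_i - \mu_i^2).
\end{aligned}
```

Replacing $\mu_i$ and $\tau$ by their single-sample estimates $x_i$ and T and taking the square root gives the expression for $\sigma(p_i)$ that is summed in Equation (6).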



The accuracy of Equation (6) was examined over a spectrum of values of q and T using computer simulation. Associations of 10 species with prescribed means and variances were sampled 10 times. The abundances of the species in each sample were converted to proportions and, for each species, the standard deviation of these proportions within the 10 samples was calculated. These observed standard deviations were then summed over all species to give one simulated value of $\sum_{i=1}^{n} \sigma(p_i)$. For comparison, the observed values of $x_i$ and T from each sample and the prescribed value of q were entered into Equation (6) to give 10 predicted estimates of $\sum_{i=1}^{n} \sigma(p_i)$. Each set of 10 samples was repeated 10 times in a run. Over 44 runs, sampling associations with a broad range of diversities and values of q from 0.1 to 50, the mean relative error and bias of the estimate of $\sum_{i=1}^{n} \sigma(p_i)$ were 2.9 and -2.5%, respectively.
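A simulation of this kind might be sketched as follows. The text does not say which distribution generated the abundances, so the sketch assumes, purely for illustration, normally distributed abundances with variance q times the mean, floored at a small positive value. The function names, the species means, and the seed are all hypothetical.

```python
import math
import random
import statistics

def predicted_sum_sigma(x, q):
    """Equation (6): sum over species of [(q/T^3)(T*x_i - x_i^2)]^(1/2)."""
    T = sum(x)
    return sum(math.sqrt(q / T**3 * (T * xi - xi * xi)) for xi in x)

def simulate(means, q, n_samples=10, seed=1):
    """Return (observed, predicted) sums of sigma(p_i) for one set of samples."""
    rng = random.Random(seed)
    # Assumption: abundance ~ Normal(mean, q*mean), floored just above zero.
    samples = [[max(rng.gauss(m, math.sqrt(q * m)), 1e-9) for m in means]
               for _ in range(n_samples)]
    # Convert each sample to proportions, then take the standard deviation per species.
    props = [[xi / sum(s) for xi in s] for s in samples]
    observed = sum(statistics.stdev(col) for col in zip(*props))
    # Average the single-sample predictions from Equation (6).
    predicted = statistics.mean([predicted_sum_sigma(s, q) for s in samples])
    return observed, predicted

means = [200, 150, 100, 80, 60, 40, 30, 20, 10, 10]  # hypothetical 10-species association
obs, pred = simulate(means, q=1.0)
print(obs, pred)
```

With only 10 samples per set the observed value is noisy, which is presumably why the procedure described above repeats each set and averages over many runs.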



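Substitution of Equation (4) into Equation (2) gives the general expression

$$\hat{I} = 1 - \frac{0.5642}{\tau^2} \sum_{i=1}^{n} \left[\tau^2\sigma^2(x_i) - 2\mu_i\tau\,\sigma^2(x_i, T) + \mu_i^2\sigma^2(T)\right]^{1/2} \tag{5}$$

Substitution of Equation (6) into Equation (2) gives an expression for $\hat{I}$ in which all parameters may be estimated from one sample:

$$\hat{I} = 1 - 0.5642\,(q/T^3)^{1/2} \sum_{i=1}^{n} (T x_i - x_i^2)^{1/2}.$$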
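The single-sample expression can be written as a short function. This is a minimal sketch rather than the author's program: the function name is hypothetical, and q must be supplied from an independent estimate of the variance-to-mean relationship, as the text requires.

```python
import math

def similarity_from_single_sample(x, q):
    """Expected similarity: 1 - 0.5642 * (q/T^3)^(1/2) * sum (T*x_i - x_i^2)^(1/2)."""
    T = sum(x)
    spread = sum(math.sqrt(T * xi - xi * xi) for xi in x)
    return 1.0 - 0.5642 * math.sqrt(q / T**3) * spread

# Hypothetical sample of three species with q = 1 (variance equal to the mean).
print(similarity_from_single_sample([50, 30, 20], q=1.0))
```

For a fixed sample, a larger q lowers the expected similarity, since the subtracted term scales with the square root of q.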



The factor 0.5642 is expected to be increased somewhat by the demonstrated bias in the estimation of






