420 FAGER [CHAP. 19 



require normalization of the data. Based on these, Tryon (1955) has published 

 a grouping procedure which involves matching matrix columns by eye. The 

 matching is then tested by determining how well correlation within and between 

 clusters reproduces the original correlations between individual variables. 



One result of all methods based on correlation coefficients is that they 

 separate species whose abundances are negatively correlated or uncorrelated 

 even though these species may be very often or always found together. Such 

 separations are reasonable in the case of the materials for which the methods 

 were developed — anatomical, sociological or psychological measurements — 

 but, for the purpose of identifying assemblages, it seems better to the writer to 

 use methods which put together species which are frequently part of each 

 other's environment and then to look at abundance relations within these 

 groups. This results if presence and absence are used as the basis of the index. 
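
The contrast drawn above can be illustrated with a minimal sketch. The abundance counts are invented for illustration and the function names `pearson` and `co_occurrence_fraction` are hypothetical; the presence and absence measure used here is only one simple choice, not necessarily the index the writer has in mind.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / sqrt(var_x * var_y)


def co_occurrence_fraction(x, y):
    """Fraction of samples containing either species that contain both."""
    both = sum(1 for xi, yi in zip(x, y) if xi > 0 and yi > 0)
    either = sum(1 for xi, yi in zip(x, y) if xi > 0 or yi > 0)
    return both / either


# Hypothetical counts of two species in the same eight samples.
# Both species are present in every sample, but one is abundant
# where the other is scarce.
abundance_a = [50, 40, 30, 20, 10, 5, 2, 1]
abundance_b = [1, 2, 5, 10, 20, 30, 40, 50]

r = pearson(abundance_a, abundance_b)
shared = co_occurrence_fraction(abundance_a, abundance_b)

print(f"correlation of abundances: {r:.2f}")
print(f"fraction of occurrences shared: {shared:.2f}")
```

With these made-up data the correlation is negative, so a correlation-based method would tend to separate the two species, while the presence and absence measure shows that they are always found together.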



For this purpose, no doubt the simplest indices are based on those proposed 

 by Jaccard (1912) and Sorensen (1948) for measuring the similarity between 

 floras. Both have a theoretical range from 0 to 1 and measure the proportion 

 of total occurrences of two species which are co-occurrences: Jaccard's, 

 c/(a + b - c); Sorensen's, 2c/(a + b); where a is the number of occurrences of species 

 A, b is the same for species B and c is the number of joint occurrences of the 

 two species. Both indices have several serious limitations: if a and b are not 

 equal, the maximum possible value is less than 1; even if a and b are equal, 

 the maximum observable value of the index will not be 1 unless both species 

 are very abundant or the samples are very large because species of moderate to 

 low abundance will often be absent from small samples by chance; neither 

 takes account of the number of occurrences, so that a value of 0.5 based on ten 

 occurrences of each species is not distinguished from one based on several hundred 

 occurrences of each species, although the latter is certainly more meaningful; 

 as long as c and the sum of a and b remain constant, the values of the indices 

 do not change with changes in the relative sizes of a and b, yet the probability 

 of co-occurrence of randomly distributed organisms (probably also of non- 

 randomly distributed organisms) decreases as the frequencies a and b depart 

 from equality; there is at present no way of calculating an expected value for 

 either index. The preceding list of limitations suggests that, despite their 

 simplicity, these two indices are apt to be misleading. 
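
Several of the limitations listed above can be checked numerically. The following is a minimal sketch using the two formulas as given in the text; the function names and the counts are hypothetical and chosen only for illustration.

```python
def jaccard(a, b, c):
    """Jaccard's index, c / (a + b - c)."""
    if not 0 <= c <= min(a, b):
        raise ValueError("joint occurrences c cannot exceed a or b")
    return c / (a + b - c)


def sorensen(a, b, c):
    """Sorensen's index, 2c / (a + b)."""
    if not 0 <= c <= min(a, b):
        raise ValueError("joint occurrences c cannot exceed a or b")
    return 2 * c / (a + b)


# 1. Unequal frequencies cap the index below 1: with a = 10 and b = 40,
#    c can be at most 10, even if species A never occurs without B.
print(jaccard(10, 40, 10), sorensen(10, 40, 10))

# 2. The number of occurrences is ignored: ten occurrences of each
#    species give the same values as four hundred of each.
print(jaccard(10, 10, 5), jaccard(400, 400, 200))
print(sorensen(10, 10, 5), sorensen(400, 400, 200))

# 3. With c and a + b held constant, the split between a and b
#    has no effect on either index.
print(jaccard(20, 20, 5), jaccard(5, 35, 5))
print(sorensen(20, 20, 5), sorensen(5, 35, 5))
```

In general, because c cannot exceed the smaller of a and b, Jaccard's index cannot exceed min(a, b)/max(a, b) and Sorensen's cannot exceed 2 min(a, b)/(a + b).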



Indices based on a 2 × 2 contingency table and the associated χ² value avoid 

 many of the difficulties outlined above (Cole, 1949). They have, however, 

 certain properties which make them unsuitable as a basis for grouping: if two 

 species always occur together and also are found in most of the samples taken, 

 they will be unassociated by this procedure although they should certainly be 

 grouped together on the basis of being a frequent part of each other's environment; 

 conversely, if two species seldom occur together but are sufficiently rare, 

 they will be found to be associated. In order to overcome these drawbacks, 

 Fager (1957) proposed a modified version of the 2 × 2 table. Correspondence 

 with W. H. Kruskal has made it evident that the parent distribution of the 

 modified table is not the simple hypergeometric and, therefore, significance 



