


Swofford (1993) for definitions and descriptions of each]. These indices can refer to the fit to a given tree topology either of individual characters or of the data matrix as a whole (where they are referred to as ensemble indices). Unless specified otherwise, the goodness-of-fit statistics quoted herein always refer to the optimal, and not the consensus, solutions of an analysis.



The utility of the CI (and presumably the HI) is limited by its inflation by autapomorphic features (which can be corrected for, as is done herein) and by its dependence on both the number of states a character possesses and the size of the data set (Farris 1989; Wiley et al. 1991; Swofford 1993). Both the RI and the RC were designed to avoid these shortcomings; however, the dependence of the CI on data set size does allow expected CIs to be calculated for data sets of various sizes (see Sanderson & Donoghue 1989), and hence provides a more objective means of judging the quality of a solution.
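The arithmetic behind these ensemble indices can be sketched briefly. The text defers to Swofford (1993) for the definitions, so the formulas below are the commonly cited ones and are stated here as an assumption: with M, S and G the minimum possible, observed, and maximum possible numbers of steps summed over characters, CI = M/S, HI = 1 - CI, RI = (G - S)/(G - M), and RC = CI x RI. The function name, the tuple layout, and the example step counts are all hypothetical, and the special handling of polymorphic taxa is ignored.

```python
def ensemble_indices(chars, exclude_uninformative=False):
    """Compute ensemble CI, HI, RI and RC from per-character step counts.

    chars: list of (m, s, g) tuples, one per character, where
      m = minimum possible steps, s = observed steps on the tree,
      g = maximum possible steps on any tree.
    exclude_uninformative: drop characters with g == m (for example,
      autapomorphies), which cannot show homoplasy and so inflate the CI.
    """
    if exclude_uninformative:
        chars = [(m, s, g) for (m, s, g) in chars if g > m]
    M = sum(m for m, _, _ in chars)
    S = sum(s for _, s, _ in chars)
    G = sum(g for _, _, g in chars)
    ci = M / S
    # RI is undefined when no character can vary in length (G == M).
    ri = (G - S) / (G - M) if G > M else None
    rc = None if ri is None else ci * ri
    return {"CI": ci, "HI": 1 - ci, "RI": ri, "RC": rc}


# Hypothetical data: one autapomorphy (1, 1, 1), one homoplastic
# character (1, 2, 3), and one perfectly fitting character (1, 1, 2).
chars = [(1, 1, 1), (1, 2, 3), (1, 1, 2)]
all_chars = ensemble_indices(chars)
informative = ensemble_indices(chars, exclude_uninformative=True)
```

In this made-up example the CI drops from 0.75 to about 0.67 once the autapomorphy is excluded, whereas the RI is the same in both cases, which illustrates the inflation described above.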

 Although Swofford (1993) indicates that the HI behaves slightly differently when multistate taxa are interpreted with the "polymorphic" option (as change is now allowed within the taxon terminals), this appears to be true for the other three indices as well. Presumably, this derives in part from PAUP's failure to designate multistate ancestral nodes under this option (see above). Therefore, identically polymorphic taxa within a clade will each gain their identical second states by convergence within their respective terminals, rather than via inheritance from a similarly polymorphic common ancestor. However, as this scenario is the proper interpretation for distantly related taxa, the overall effect on a given index will be dependent on the distribution of polymorphisms among the taxa.

 These indices are merely different ways of expressing the level of homoplasy in a solution. Unfortunately, the tendency in phylogenetic studies based on a parsimony criterion is to automatically equate increased homoplasy with a poorer solution. However, it is reasonable to expect that different groups will be characterized by different levels of homoplasy, so that a high level of homoplasy may be diagnostic of the group under study, rather than of a poor solution. Therefore, these indices are best limited to comparing different solutions for the same group.



The bootstrap (Felsenstein 1985) 



The bootstrap is a non-parametric statistical procedure adopted for use in phylogenetic analysis by Felsenstein (1985). It aims to infer the variability of an unknown distribution (the true phylogeny) from which data were taken (the characters) by resampling with replacement from the data, with each replicate data set containing the same number of characters as the original. By taking a large number of replicates, one can estimate the confidence interval of the original unknown distribution. Groups that are supported by a large number of characters will be found in most solutions. The bootstrap frequency of a clade is the proportion of all replicate solutions in which that clade was found.
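The resampling and tallying steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented in PAUP: the function names, the matrix layout (a dict mapping each taxon to a list of character states), and the `infer_clades` callback are hypothetical. In a real analysis `infer_clades` would be a parsimony search; here a toy stand-in simply groups taxa sharing state 1 in a character.

```python
import random
from collections import Counter


def bootstrap_frequencies(matrix, infer_clades, replicates=100, seed=0):
    """Return the proportion of replicates in which each clade is found.

    matrix: dict mapping taxon name -> list of character states.
    infer_clades: hypothetical callable that takes such a matrix and
      returns the clades (frozensets of taxa) of its solution.
    """
    rng = random.Random(seed)
    n_chars = len(next(iter(matrix.values())))
    counts = Counter()
    for _ in range(replicates):
        # Resample characters (columns) with replacement, keeping the
        # replicate the same size as the original data set.
        cols = [rng.randrange(n_chars) for _ in range(n_chars)]
        resampled = {t: [states[c] for c in cols] for t, states in matrix.items()}
        # Count each clade at most once per replicate.
        counts.update(set(infer_clades(resampled)))
    return {clade: n / replicates for clade, n in counts.items()}


def toy_clades(matrix):
    """Toy stand-in for a tree search (NOT parsimony): each character
    proposes the group of taxa that share state 1."""
    taxa = list(matrix)
    clades = set()
    for c in range(len(matrix[taxa[0]])):
        group = frozenset(t for t in taxa if matrix[t][c] == 1)
        if 1 < len(group) < len(taxa):
            clades.add(group)
    return clades


# Hypothetical matrix in which every character supports the group A + B.
matrix = {"A": [1, 1, 1], "B": [1, 1, 1], "C": [0, 0, 0], "D": [0, 0, 0]}
freqs = bootstrap_frequencies(matrix, toy_clades, replicates=50, seed=1)
```

Because every character in this made-up matrix supports A + B, that group is recovered in every replicate, which mirrors the statement that groups supported by many characters appear in most solutions.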

 Despite its widespread use, the bootstrap has shown some problems in its adaptation to phylogenetic analysis. [It apparently has a larger problem in that, despite concerted effort, it has never been demonstrated to be a valid technique for those applications in which it is supposed to be used (L.R. Linton pers. comm.).] These problems derive largely from the key assumption that the data be independently drawn and identically distributed (i.e., a representative, random sample of all possible characters) (Felsenstein 1985; Sanderson



