Seyoum et al.: Genetically determined population structure of Lachnolaimus maximus in the southeastern United States 
Data analysis 
Standard genetic measures and distances. Data files for
use in GENEPOP, vers. 4.3 (Rousset, 2008) were gen- 
erated from fragment sizes recorded with the Micro- 
satellite Toolkit add-on, vers. 3.1.1 (Park, 2001; avail- 
able at website) for Microsoft Excel (Microsoft Corp., 
Redmond, WA); this GENEPOP data file was converted 
to other formats as needed with the conversion tool 
PGDspider, vers. 2.0.1.9 (Lischer and Excoffier, 2012).
Pairwise genetic distances (F_ST) between sampling areas
(Weir and Cockerham, 1984) were estimated with
10,000 permutations with the software program GENETIX
(Belkhir et al., 2000). Departures from Hardy-
Weinberg equilibrium were determined with GENE- 
POP. Sequential Bonferroni corrections were applied to 
multiple tests of hypotheses (Rice, 1989). Observed (H_O)
and expected heterozygosity (H_e, with and without a
bias correction), averaged over all loci, were obtained
from GENETIX (Belkhir et al., 2000). Null allelism was
investigated by using the randomization test of Guo 
and Thompson (1992) and the U-test statistic of Raymond
and Rousset (1995), with the software program
ML-NullFreq (available at website). For each locus, mi- 
crosatellite variation was quantified in terms of genetic 
diversity, number of alleles, and allelic richness (a diversity
measure that corrects for differences in sample
size) with the program FSTAT, vers. 2.9.3.2 (Goudet,
2001). Chi-square tests were performed to determine
whether sampling areas differed significantly in the
previously described standard genetic measures.
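As a rough illustration of two of the measures named above, the following sketch computes observed and expected heterozygosity for a single locus and applies a sequential Bonferroni correction to a set of P-values. It is not the GENETIX or GENEPOP implementation. The function names and the genotype format are hypothetical, and the bias correction is assumed here to be the common small-sample factor 2n/(2n − 1), which may differ from the correction used by the authors' software.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (H_O, H_e uncorrected, H_e bias-corrected) for one locus.

    genotypes: list of (allele_a, allele_b) tuples, one per diploid individual
    (hypothetical input format chosen for this sketch).
    """
    n = len(genotypes)
    # Observed heterozygosity: fraction of individuals with two different alleles.
    h_obs = sum(a != b for a, b in genotypes) / n
    # Allele frequencies over the 2n gene copies.
    counts = Counter(allele for pair in genotypes for allele in pair)
    homozygosity = sum((c / (2 * n)) ** 2 for c in counts.values())
    h_exp = 1.0 - homozygosity
    # Assumed small-sample correction: multiply by 2n / (2n - 1).
    h_exp_corrected = h_exp * (2 * n) / (2 * n - 1)
    return h_obs, h_exp, h_exp_corrected


def sequential_bonferroni(p_values, alpha=0.05):
    """Sequential (step-down) Bonferroni: one reject/retain flag per test."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        # The smallest P-value is compared with alpha/m, the next with alpha/(m-1), ...
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # once one test is retained, all larger P-values are retained
    return reject


# Hypothetical fragment sizes for four individuals at one microsatellite locus.
example = [(100, 100), (100, 104), (104, 108), (100, 104)]
print(heterozygosity(example))
print(sequential_bonferroni([0.001, 0.04, 0.03, 0.2]))
```

In practice these values would be averaged over all loci, as described in the text.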
Genetic structure. Genetic data from specimens collected
from the 9 sampling areas were examined with 3
analytical approaches. The first was based on principal 
coordinate analysis (PCA) to discriminate genetic clus- 
ters within the data by using the program GenAlEx, 
vers. 6.5 (Peakall and Smouse, 2006, 2012). The data 
were plotted on the first 2 principal coordinates on the
basis of pairwise F_ST values for sampling areas (Latter,
1972) computed without sample size bias correction
(uncorrected F_ST; Nei, 1973) and with sample size bias
correction (corrected F_ST; Nei and Roychoudhury, 1974;
Nei, 1987) with the software POPTREE2 (Takezaki et
al., 2010).
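The ordination step can be sketched as classical principal coordinate analysis on a pairwise distance matrix. The code below is a minimal sketch with NumPy, not the GenAlEx procedure, which offers its own standardization options. It assumes that the pairwise F_ST values can be treated directly as distances, and the 4-area matrix is invented for illustration.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinate analysis of a symmetric distance matrix.

    Returns the coordinates on the first n_axes axes and the share of the
    positive eigenvalue sum carried by each of those axes.
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances (Gower's centering).
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)
    # Keep the axes with the largest eigenvalues.
    order = np.argsort(eigvals)[::-1][:n_axes]
    vals = eigvals[order]
    # Negative eigenvalues (non-Euclidean distances) are clipped to zero.
    coords = eigvecs[:, order] * np.sqrt(np.clip(vals, 0.0, None))
    explained = vals / eigvals[eigvals > 0].sum()
    return coords, explained


# Hypothetical pairwise F_ST values among four sampling areas.
fst = np.array([
    [0.000, 0.004, 0.030, 0.034],
    [0.004, 0.000, 0.028, 0.031],
    [0.030, 0.028, 0.000, 0.006],
    [0.034, 0.031, 0.006, 0.000],
])
coords, explained = pcoa(fst)
print(coords)
print(explained)
```

Sampling areas that plot close together on the first 2 axes are candidates for the same genetic cluster.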
The second method of examining the genetic struc- 
ture was based on analysis of molecular variance 
(AMOVA) as implemented in the software program ARLEQUIN,
vers. 3.5.1.3, with 100,000 permutations (Excoffier
and Lischer, 2010). Essentially a method to determine
the strength of the PCA groupings, AMOVA assesses 
the best grouping of sampling areas into clusters. In 
the a priori hierarchical approach with AMOVA, correlations
among genotypes at various levels are partitioned
as F-statistics. Initially, the a priori hierarchical
structure that was analyzed was based on the genetic
groupings revealed by PCA. To find the greatest F_ST
between groupings, we constructed 2 combinations of
3 clusters that placed sampling area 6 in either cluster
1 or cluster 2 on the basis of corrected or uncorrected
F_ST values indicated by PCA. After this analysis, F_ST
values for the 2-cluster combinations were also assessed
by omitting one cluster at a time. The proportions
of variation were computed among clusters (F_CT),
within clusters (F_SC), and within sampling areas (F_ST),
and each F-statistic was assessed by the permutation
method of Excoffier et al. (1992).
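The relationship among the three hierarchical statistics can be made concrete with a small sketch. It assumes the standard AMOVA definitions, in which the total variance is split into a component among clusters, a component among sampling areas within clusters, and a component within sampling areas. The function name and the variance values are hypothetical, and estimating the components from genotypes (the part done by ARLEQUIN) is not shown.

```python
def amova_f_statistics(var_among_clusters, var_among_areas, var_within_areas):
    """Convert three AMOVA variance components into (F_CT, F_SC, F_ST).

    var_among_clusters: variance among clusters
    var_among_areas:    variance among sampling areas within clusters
    var_within_areas:   variance within sampling areas
    """
    total = var_among_clusters + var_among_areas + var_within_areas
    f_ct = var_among_clusters / total
    f_sc = var_among_areas / (var_among_areas + var_within_areas)
    f_st = (var_among_clusters + var_among_areas) / total
    return f_ct, f_sc, f_st


# Hypothetical variance components for one candidate grouping.
f_ct, f_sc, f_st = amova_f_statistics(0.02, 0.01, 0.97)
print(f_ct, f_sc, f_st)
# Under these definitions, (1 - F_ST) equals (1 - F_SC) * (1 - F_CT).
print(abs((1 - f_st) - (1 - f_sc) * (1 - f_ct)) < 1e-12)
```

Comparing such values across alternative groupings is one way to see which placement of a sampling area yields the strongest differentiation among clusters.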
The third analytical approach was based on the 
Bayesian population-assignment algorithm as imple- 
mented in the program STRUCTURE, vers. 2.3.4 
(Pritchard et al., 2000). With this algorithm, individu- 
als were probabilistically and proportionally assigned 
to one or more genetic clusters (K) in a manner that 
minimized Hardy-Weinberg and linkage disequilibria
among their multilocus genotypes. For K=1 through
K=9, 10 simulations were conducted by using 2 million
Markov chain Monte Carlo replicates after a burn-in 
period of 1 million runs. We adopted the admixture 
model and the independent allele frequency option to 
minimize the chance of overestimating the number of 
clusters present in the data (Pritchard et al., 2009). 
We used STRUCTURE HARVESTER, vers. 0.6.93 
(Earl and vonHoldt, 2012) with each of the previously 
described replicate runs to compute the ad hoc statistics
L(K) and ΔK so that we could determine the most
plausible base value for K clusters (i.e., the upper-level
hierarchy). L(K) denotes the log probability of the data
at a given modeled K value; ΔK is based on the rate of
change in L(K) between successively modeled K values.
Simulation studies (Evanno et al., 2005) have shown
that ΔK provides the most accurate indication of genetic
structure under a variety of modeling conditions.
We then used CLUMPP, vers. 1.1.2 (Jakobsson and 
Rosenberg, 2007) to determine the optimal alignment 
for replicate analyses and mean genomic membership 
coefficients across replicate runs for sampling areas 
and individuals. The coefficients of the CLUMPP out- 
put were plotted with Microsoft Excel. 
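The ΔK statistic described above can be sketched in a few lines. This is a simplified version that assumes ΔK is the absolute second-order difference of the mean L(K) across replicate runs, divided by the standard deviation of L(K) at that K. It is not the STRUCTURE HARVESTER code, and the log-probability values are invented for illustration.

```python
import numpy as np


def evanno_delta_k(log_probs):
    """Compute delta K from replicate L(K) values.

    log_probs: dict mapping K to a list of L(K) values, one per replicate run.
    Returns a dict mapping K to delta K for every K that has both neighbors
    (K - 1 and K + 1) and a nonzero standard deviation.
    """
    ks = sorted(log_probs)
    mean = {k: np.mean(log_probs[k]) for k in ks}
    sd = {k: np.std(log_probs[k], ddof=1) for k in ks}
    delta = {}
    for k in ks:
        if k - 1 in mean and k + 1 in mean and sd[k] > 0:
            # Absolute second-order rate of change of mean L(K).
            second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
            delta[k] = second_diff / sd[k]
    return delta


# Hypothetical L(K) values from two replicate runs per K.
runs = {
    1: [-1000.0, -1002.0],
    2: [-900.0, -902.0],
    3: [-890.0, -892.0],
    4: [-885.0, -887.0],
}
delta = evanno_delta_k(runs)
print(delta)
print(max(delta, key=delta.get))  # K with the largest delta K
```

Because ΔK needs a neighbor on each side, it cannot be evaluated for the smallest or largest K modeled.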
Mantel test. To determine whether genetic relationships
among sampling areas conformed to a pattern
of genetic isolation by distance (Wright, 1943; Malecot,
1955), we computed the Mantel correlation coefficient
(r) between F_ST and geographic distance (measured in
kilometers) with the program GenAlEx, vers. 6.5 (Peak- 
all and Smouse, 2006, 2012). The significance of r was 
tested by using 9000 random permutations. 
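A permutation-based Mantel test of this kind can be sketched as follows. The sketch assumes a one-tailed test for a positive correlation, with rows and columns of one matrix permuted together. It is not the GenAlEx implementation, and the distance matrices are hypothetical.

```python
import numpy as np


def mantel(genetic, geographic, n_perm=9000, seed=0):
    """Mantel correlation between two symmetric distance matrices.

    Returns (r, p), where p is the one-tailed permutation P-value for r > 0.
    """
    x = np.asarray(genetic, dtype=float)
    y = np.asarray(geographic, dtype=float)
    n = x.shape[0]
    upper = np.triu_indices(n, k=1)  # each pair of sampling areas counted once
    r_obs = np.corrcoef(x[upper], y[upper])[0, 1]
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns together so the matrix stays symmetric.
        x_perm = x[perm][:, perm]
        if np.corrcoef(x_perm[upper], y[upper])[0, 1] >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    p_value = (hits + 1) / (n_perm + 1)
    return r_obs, p_value


# Hypothetical sampling areas along a coastline (positions in kilometers).
positions = np.array([0.0, 120.0, 300.0, 450.0, 700.0, 980.0])
geo = np.abs(np.subtract.outer(positions, positions))
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.002, size=geo.shape)
gen = np.abs(0.00005 * geo + (noise + noise.T) / 2)
np.fill_diagonal(gen, 0.0)
r, p = mantel(gen, geo)
print(r, p)
```

A significantly positive r would be consistent with isolation by distance; a nonsignificant r would not support it.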
Effective population size. The effective population size
(N_e) of each cluster was estimated with the program
NeEstimator, vers. 2 (Do et al., 2013) under the model 
option with the Burrows method to estimate linkage 
disequilibrium (Hill, 1981; Waples, 2006). This ap- 
proach has been shown to give generally unbiased es- 
timates of linkage disequilibrium from which estimates 
of N_e can be derived (Robinson and Moyer, 2012) with
95% confidence intervals on the basis of the parametric
procedure of Waples (2006). Bias due to low-frequency
alleles was avoided by estimating N_e from alleles with
frequencies greater than 1% and 2% and also by omit- 
