118 
Fishery Bulletin 112(2-3) 
For each locus $k$, the observed relative frequencies, $p_{ik}$ and $q_{ik}$, of the 2 SNP alleles were calculated for each population $i$ in the training set. These values then were used to compute the expected probability of misassignment, $P(\mathrm{Mis}_{ij}^{k})$, between every pair of populations $i$ and $j$ with only a single locus $k$:
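$$
P(\mathrm{Mis}_{ij}^{k}) = 0.5\Big[\delta(p_{ik} \le p_{jk})\,p_{ik}^{2} + \delta(p_{ik}q_{ik} \le p_{jk}q_{jk})\,2p_{ik}q_{ik} + \delta(q_{ik} \le q_{jk})\,q_{ik}^{2}
+ \delta(p_{ik} > p_{jk})\,p_{jk}^{2} + \delta(p_{ik}q_{ik} > p_{jk}q_{jk})\,2p_{jk}q_{jk} + \delta(q_{ik} > q_{jk})\,q_{jk}^{2}\Big],
$$
for all $k$, where $\delta(x) = 1$ if condition $x$ is true and 0 otherwise.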
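As a worked illustration (the allele frequencies are hypothetical, not taken from the baseline), take a locus with $p_{ik} = 0.9$ and $p_{jk} = 0.1$, so the Hardy-Weinberg genotype frequencies are 0.81, 0.18, and 0.01 in population $i$ and 0.01, 0.18, and 0.81 in population $j$. Because $p_{ik} > p_{jk}$, $p_{ik}q_{ik} = p_{jk}q_{jk}$, and $q_{ik} < q_{jk}$, only the second, third, and fourth terms are nonzero:

$$
P(\mathrm{Mis}_{ij}^{k}) = 0.5\,\big[\,2(0.9)(0.1) + (0.1)^{2} + (0.1)^{2}\,\big] = 0.5\,(0.18 + 0.01 + 0.01) = 0.10.
$$

More generally, for each genotype the paired $\delta$ terms select the smaller of its two Hardy-Weinberg frequencies, so the expression is equivalent to half the summed genotype-frequency overlap, writing $f_{ig}$ (our notation) for the frequency of genotype $g$ in population $i$:

$$
P(\mathrm{Mis}_{ij}^{k}) = 0.5 \sum_{g} \min(f_{ig}, f_{jg}).
$$

Populations with identical allele frequencies therefore give 0.5 (assignment is a coin flip), and populations fixed for different alleles give 0.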
The values of $P(\mathrm{Mis}_{ij}^{k})$ were used to rank the loci by their suitability for resolving populations $i$ and $j$ in GSI; a lower $P(\mathrm{Mis}_{ij}^{k})$ indicates better resolving power.
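A minimal Python sketch of the single-locus calculation and of ranking loci for one population pair, assuming biallelic loci with allele frequencies supplied as dictionaries keyed by locus name. The function names and data layout are hypothetical, not taken from the original analysis pipeline. Because each pair of $\delta$ terms keeps the smaller of the two Hardy-Weinberg genotype frequencies, the sketch writes the sum with `min()`, which also avoids sensitivity to floating-point ties in the comparisons.

```python
from typing import Dict, List, Tuple


def p_mis_single_locus(p_i: float, p_j: float) -> float:
    """Expected misassignment probability between populations i and j at one
    biallelic SNP, given the frequency of the same allele in each population."""
    q_i, q_j = 1 - p_i, 1 - p_j
    # Hardy-Weinberg genotype frequencies: (p^2, 2pq, q^2)
    geno_i = (p_i * p_i, 2 * p_i * q_i, q_i * q_i)
    geno_j = (p_j * p_j, 2 * p_j * q_j, q_j * q_j)
    # Each pair of delta terms contributes the smaller of the two genotype frequencies
    return sum(min(fi, fj) for fi, fj in zip(geno_i, geno_j)) / 2


def rank_loci_for_pair(
    freqs_i: Dict[str, float], freqs_j: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Rank loci present in both populations, lowest P(Mis) (best resolving) first."""
    scores = {
        k: p_mis_single_locus(freqs_i[k], freqs_j[k]) for k in freqs_i if k in freqs_j
    }
    return sorted(scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical allele frequencies for three loci in two populations
    pop_i = {"locus1": 0.9, "locus2": 0.5, "locus3": 0.6}
    pop_j = {"locus1": 0.1, "locus2": 0.5, "locus3": 0.4}
    for locus, p_mis in rank_loci_for_pair(pop_i, pop_j):
        print(f"{locus}\t{p_mis:.3f}")
```

With these inputs the ranking is locus1 (0.100), locus3 (0.400), locus2 (0.500); locus2 has identical frequencies in both populations and so carries no information for this pair.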
The rankings derived from $P(\mathrm{Mis}_{ij}^{k})$ values were combined with other criteria in a nonautomated process to select the final panel of loci (Table 2). Each SNP assay was evaluated for scorability and for evidence of Hardy-Weinberg disequilibrium and linkage disequilibrium (LD). Assays with overly dispersed clusters, more than 3 clusters, or inadequate spacing between clusters were excluded. Loci with significant deviations from equilibrium expectations also were removed. SNPs with large
differences in allele frequencies between populations 
are particularly effective for GSI, whereas SNPs with 
high minor allele frequencies (MAFs) are most useful 
for parentage analysis (Anderson and Garza, 2006). 
The remaining 168 loci were then ranked by their MAFs in hatchery populations to identify loci for inclusion in pedigree reconstruction studies (see Discussion section).
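The MAF ranking step can be sketched in Python as follows, assuming each fish's genotype at a locus is coded as the number of copies (0, 1, or 2) of one allele, with `None` for a failed call. The coding scheme and the function names `minor_allele_frequency` and `rank_loci_by_maf` are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def minor_allele_frequency(genotypes: Sequence[Optional[int]]) -> float:
    """MAF from diploid genotypes coded as allele copies (0, 1, 2); None = missing."""
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    p = sum(called) / (2 * len(called))  # frequency of the counted allele
    return min(p, 1 - p)


def rank_loci_by_maf(
    genotypes_by_locus: Dict[str, Sequence[Optional[int]]]
) -> List[Tuple[str, float]]:
    """Rank loci by MAF in one population, most informative (highest MAF) first."""
    mafs = {locus: minor_allele_frequency(g) for locus, g in genotypes_by_locus.items()}
    return sorted(mafs.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical genotypes of four hatchery fish at three loci
    hatchery = {
        "locusA": [0, 1, 2, 1],
        "locusB": [0, 0, 0, 1],
        "locusC": [2, 2, None, 1],
    }
    for locus, maf in rank_loci_by_maf(hatchery):
        print(f"{locus}\t{maf:.3f}")
```

Missing calls are dropped per locus rather than imputed, so loci with many failures are ranked on fewer fish.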
Previous simulations indicated that about 100 loci 
with an MAF >0.2 would be required to achieve the 
necessary statistical power to assign parentage with 
sufficiently low false-negative and false-positive rates 
(Anderson and Garza, 2006). However, the observed 
MAFs for many loci were in fact >0.2 (and as high as 
0.5), indicating that the desired statistical power could 
be achieved with fewer loci. Therefore, we selected the 
70 loci with the highest MAF in the Feather River population, the primary target for subsequent parentage investigations. We then used the $P(\mathrm{Mis}_{ij}^{k})$ rankings to select 25 additional loci that were useful for distinguishing between difficult-to-resolve populations and reporting units. Finally, an assay to discriminate between Chinook and Coho salmon was included as the 96th assay for genotyping with the Fluidigm 96.96 Dynamic Arrays.
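Because the panel was assembled by hand, the following Python sketch only illustrates the bookkeeping (70 + 25 + 1 = 96 assays). It assumes the MAF ranking and the per-pair $P(\mathrm{Mis}_{ij}^{k})$ rankings are already available as ordered lists of locus names; the round-robin rule for drawing the 25 GSI loci, the function name `assemble_panel`, and the placeholder name of the species assay are all hypothetical.

```python
from typing import Dict, List, Sequence


def assemble_panel(
    maf_ranked: Sequence[str],
    pmis_ranked_by_pair: Dict[str, Sequence[str]],
    n_maf: int = 70,
    n_gsi: int = 25,
    species_assay: str = "species_id_assay",  # hypothetical placeholder name
) -> List[str]:
    """Top-MAF loci, then GSI loci drawn round-robin from per-pair P(Mis)
    rankings (best first), then one species-identification assay."""
    panel = list(maf_ranked[:n_maf])
    chosen = set(panel)
    iterators = [iter(ranking) for ranking in pmis_ranked_by_pair.values()]
    added = 0
    while added < n_gsi and iterators:
        still_active = []
        for it in iterators:
            for locus in it:  # advance to this pair's best locus not yet chosen
                if locus not in chosen:
                    panel.append(locus)
                    chosen.add(locus)
                    added += 1
                    still_active.append(it)
                    break
            if added == n_gsi:
                break
        iterators = still_active  # rankings with no unused loci left drop out
    panel.append(species_assay)
    return panel


if __name__ == "__main__":
    # Hypothetical rankings: 100 loci by MAF and two difficult population pairs
    maf_ranked = [f"m{i}" for i in range(100)]
    pmis = {
        "pairA": [f"g{i}" for i in range(40)],
        "pairB": ["m5"] + [f"g{i}" for i in range(20, 60)],
    }
    panel = assemble_panel(maf_ranked, pmis)
    print(len(panel))  # 96
```

Round-robin drawing keeps any single difficult pair from absorbing all 25 slots; in the example, locus m5 is skipped for pairB because it is already in the MAF set.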
Population genetics analyses 
The 7669 samples that were not in the training set for 
locus selection were genotyped with the final panel of 
96 SNPs and used as the holdout set in subsequent 
power analyses (see the next section). This holdout 
set also was used for standard population genetics 
analyses. We tested each locus-population pair for deviations from Hardy-Weinberg equilibrium (HWE) with the complete enumeration method (Louis and Dempster, 1987) in GENEPOP software, vers. 4.0 (Rousset,
2008). Similarly, in each population, all pairwise locus 
combinations were investigated for LD. Default Markov 
chain parameters were used, except for the number of 
batches, which was increased to 500 to reduce the standard error to acceptable levels (<0.02; Rousset, 2008).
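For a biallelic SNP, complete enumeration amounts to listing every genotype table that has the observed allele counts (the tables differ only in the number of heterozygotes) and summing the probabilities of all tables no more probable than the observed one. The Python sketch below follows that logic for one locus in one population; it is an illustration under those assumptions, not GENEPOP's code, it does not cover the Markov chain used for the LD tests, and the function name `hwe_exact_pvalue` is hypothetical.

```python
import math


def hwe_exact_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Exact HWE test at one biallelic locus by complete enumeration of tables
    with the same allele counts as the observed genotype counts."""
    n = n_aa + n_ab + n_bb
    n_a = 2 * n_aa + n_ab
    n_b = 2 * n_bb + n_ab

    def log_prob(het: int) -> float:
        # P(het | n, n_a) = n! n_a! n_b! 2^het / (hom_a! het! hom_b! (2n)!)
        hom_a = (n_a - het) // 2
        hom_b = (n_b - het) // 2
        return (
            math.lgamma(n + 1) - math.lgamma(hom_a + 1) - math.lgamma(het + 1)
            - math.lgamma(hom_b + 1) + het * math.log(2)
            + math.lgamma(n_a + 1) + math.lgamma(n_b + 1) - math.lgamma(2 * n + 1)
        )

    # Heterozygote counts must share the parity of n_a and cannot exceed the rarer allele count
    log_probs = [log_prob(h) for h in range(n_a % 2, min(n_a, n_b) + 1, 2)]
    observed = log_prob(n_ab)
    # Sum over tables no more probable than the observed one (small tolerance for rounding)
    p_value = sum(math.exp(lp) for lp in log_probs if lp <= observed + 1e-9)
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, AB, BB)
    print(f"{hwe_exact_pvalue(1, 0, 1):.3f}")  # 0.333
    print(hwe_exact_pvalue(50, 0, 50))  # strong heterozygote deficit: tiny p-value
```

Working on the log scale with `math.lgamma` keeps the factorials from overflowing at baseline sample sizes.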
Genetic differentiation ($F_{\mathrm{ST}}$) was estimated as $\theta$ of Weir and Cockerham (1984) between all pairs of populations with the software package GENETIX, vers. 4.05 (Belkhir⁵). The data set was permuted 1000 times to determine the significance of $F_{\mathrm{ST}}$ estimates. Phylogeographic trees were constructed with the chord distance ($D_{\mathrm{CE}}$) of Cavalli-Sforza and Edwards (1967) and the neighbor-joining algorithm in the software package PHYLIP, vers. 3.69 (Felsenstein⁶) and were visualized with Dendroscope software, vers. 3.2.10 (Huson et al.,
2007). Majority-rule consensus values were calculated 
from 10,000 bootstrap samples of the data through the 
use of the PHYLIP component CONSENSE. The $F_{\mathrm{ST}}$ values and genetic distances computed here are expected to overestimate the isolation between populations because the SNP loci used in our analyses were not a random sample from the genome; some SNP loci were chosen for their power in resolving specific population pairs in our baseline. Nonetheless, these estimates are useful for assessing the relative genetic differentiation among the populations described here.
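To make the two statistics concrete, here is a Python sketch for biallelic loci. `wc_theta` follows the Weir and Cockerham (1984) variance-component estimator, taking genotype counts per population and forming the ratio of summed components across loci. `chord_distance` uses one common multilocus form of the Cavalli-Sforza and Edwards distance, $\frac{2}{\pi L}\sum_{k=1}^{L}\sqrt{2\left(1-\sum_{u}\sqrt{x_{ku}\,y_{ku}}\right)}$ over $L$ loci with allele frequencies $x$ and $y$, which may be scaled differently from PHYLIP's implementation. The function names and input layout are hypothetical, and the permutation test, neighbor joining, and bootstrap consensus are not shown.

```python
import math
from typing import Sequence, Tuple

# (n_AA, n_AB, n_BB) genotype counts at one biallelic locus in one population
GenotypeCounts = Tuple[int, int, int]


def wc_components(pops: Sequence[GenotypeCounts]) -> Tuple[float, float, float]:
    """Weir and Cockerham (1984) variance components a, b, c for one locus.
    Needs at least 2 populations, each with more than one genotyped fish."""
    r = len(pops)
    n = [aa + ab + bb for aa, ab, bb in pops]
    p = [(2 * aa + ab) / (2 * ni) for (aa, ab, _), ni in zip(pops, n)]  # allele A frequency
    h = [ab / ni for (_, ab, _), ni in zip(pops, n)]  # observed heterozygosity
    n_bar = sum(n) / r
    n_c = (r * n_bar - sum(ni * ni for ni in n) / (r * n_bar)) / (r - 1)
    p_bar = sum(ni * pi for ni, pi in zip(n, p)) / (r * n_bar)
    s2 = sum(ni * (pi - p_bar) ** 2 for ni, pi in zip(n, p)) / ((r - 1) * n_bar)
    h_bar = sum(ni * hi for ni, hi in zip(n, h)) / (r * n_bar)
    pq = p_bar * (1 - p_bar)
    a = n_bar / n_c * (s2 - (pq - (r - 1) / r * s2 - h_bar / 4) / (n_bar - 1))
    b = n_bar / (n_bar - 1) * (pq - (r - 1) / r * s2 - (2 * n_bar - 1) / (4 * n_bar) * h_bar)
    c = h_bar / 2
    return a, b, c


def wc_theta(loci: Sequence[Sequence[GenotypeCounts]]) -> float:
    """Multilocus theta: sum of a over sum of (a + b + c) across loci.
    Undefined (division by zero) if every locus is monomorphic."""
    num = den = 0.0
    for pops in loci:
        a, b, c = wc_components(pops)
        num += a
        den += a + b + c
    return num / den


def chord_distance(p_x: Sequence[float], p_y: Sequence[float]) -> float:
    """Chord distance averaged over biallelic loci (allele A frequencies given)."""
    total = 0.0
    for px, py in zip(p_x, p_y):
        cos_theta = math.sqrt(px * py) + math.sqrt((1 - px) * (1 - py))
        total += math.sqrt(2 * max(0.0, 1 - cos_theta))  # clamp rounding error
    return 2 / (math.pi * len(p_x)) * total


if __name__ == "__main__":
    # Hypothetical genotype counts at two loci for two populations
    loci = [
        [(30, 15, 5), (10, 20, 20)],
        [(20, 25, 5), (18, 24, 8)],
    ]
    print(f"theta = {wc_theta(loci):.4f}")
    # Allele A frequencies implied by the counts above
    print(f"D_CE = {chord_distance([0.75, 0.65], [0.40, 0.60]):.4f}")
```

Summing components before dividing, rather than averaging per-locus ratios, keeps nearly monomorphic loci from dominating the multilocus estimate.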
Power analyses 
We used 3 different methods to assess the power of the SNP baseline for GSI. First, we performed a self-assignment analysis, and subsequently we generated and analyzed simulated mixtures with 2 different procedures.
In self-assignment analysis, allele frequencies for each potential source population generally are estimated from the samples. Then, for each individual, the probability that its genotype would occur in each population (assuming Hardy-Weinberg and linkage equilibria) is calculated, and the individual is assigned to the population for which its genotype probability is highest. We used the likelihood method of Rannala and Mountain (1997), implemented in the software gsi_sim⁷ (Anderson et al., 2008), to compute the genotype prob-
5 Belkhir, K., P. Borsa, L. Chikhi, N. Raufaste, and F. Bonhomme. 1996-2004. GENETIX 4.05, logiciel sous Windows™ pour la génétique des populations [Windows™ software for population genetics]. Laboratoire Génome, Populations, Interactions, CNRS UMR 5000, Université de Montpellier II, Montpellier, France. [Available from http://kimura.univ-montp2.fr/genetix.]
6 Felsenstein, J. 2005. PHYLIP (Phylogeny Inference Package), vers. 3.6. Department of Genome Sciences, Univ. Washington, Seattle. [Available from http://evolution.genetics.washington.edu/phylip.html.]
7 Available from http://swfsc.noaa.gov/textblock.aspx?Division=FED&ParentMenuID=54&id=12964.
