Clemento et al.: Evaluation of a single nucleotide polymorphism baseline for genetic stock identification of Oncorhynchus tshawytscha 113 
discriminate salmon populations, including mitochondrial
DNA polymorphisms (Cronin et al., 1993), minisatellites
(Beacham et al., 1996; Miller et al., 1996),
microsatellites (Seeb et al., 2007; Moran et al., 2013),
amplified-fragment length polymorphisms (Flannery et
al., 2007) and, most recently, single nucleotide polymorphisms
(SNPs; Smith et al., 2005a, 2005b; Aguilar and
Garza, 2008; Narum et al., 2008; Abadía-Cardoso et al.,
2011; Clemento et al., 2011).
Genetic stock identification (GSI) typically proceeds 
in 2 steps. First, samples are collected from potential 
source populations and genotyped with a set of genetic 
markers in order to estimate population allele frequen- 
cies. These genotypes are called the “baseline.” Then, 
data from individuals sampled from a mixed-stock col- 
lection (called a “mixture”) and genotyped with the 
same set of genetic markers are compared with the 
baseline to estimate the relative proportions of indi- 
viduals that came from each of the represented source 
populations. Single individuals of unknown origin also 
can be assigned to specific populations. Maximum like- 
lihood or Bayesian methods typically are used to carry 
out GSI inference (Smouse et al., 1990; Pella and
Masuda, 2000).
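The two steps described above can be illustrated with a minimal sketch. It is not the software or model used in this study. It assumes bi-allelic markers scored as 0, 1, or 2 copies of a reference allele, Hardy-Weinberg proportions within populations, independent loci, and a simple expectation-maximization (EM) routine for the maximum likelihood mixing proportions. All function names, the pseudocount, and the toy genotypes are hypothetical.

```python
import math

def allele_frequencies(baseline, n_loci, pseudo=0.5):
    """Step 1: estimate the reference-allele frequency per population and locus.

    baseline maps population name -> list of genotypes; each genotype is a
    tuple of allele counts (0, 1, 2), with None marking a missing call.
    Half a pseudo-copy of each allele keeps frequencies away from 0 and 1
    (an assumption of this sketch, not a value from the study).
    """
    freqs = {}
    for pop, genotypes in baseline.items():
        pop_freqs = []
        for locus in range(n_loci):
            calls = [g[locus] for g in genotypes if g[locus] is not None]
            pop_freqs.append((sum(calls) + pseudo) / (2 * len(calls) + 2 * pseudo))
        freqs[pop] = pop_freqs
    return freqs

def genotype_log_likelihood(genotype, pop_freqs):
    """log P(genotype | population), assuming Hardy-Weinberg and independent loci."""
    total = 0.0
    for count, p in zip(genotype, pop_freqs):
        if count is None:  # skip missing calls
            continue
        q = 1.0 - p
        total += math.log((q * q, 2 * p * q, p * p)[count])
    return total

def estimate_mixture(mixture, freqs, n_iter=200):
    """Step 2: EM estimate of the proportion of the mixture from each population."""
    pops = sorted(freqs)
    props = {pop: 1.0 / len(pops) for pop in pops}
    liks = []
    for g in mixture:
        logs = {pop: genotype_log_likelihood(g, freqs[pop]) for pop in pops}
        top = max(logs.values())  # rescale per fish to avoid underflow
        liks.append({pop: math.exp(v - top) for pop, v in logs.items()})
    for _ in range(n_iter):
        totals = {pop: 0.0 for pop in pops}
        for lik in liks:
            weights = {pop: props[pop] * lik[pop] for pop in pops}
            norm = sum(weights.values())
            for pop in pops:
                totals[pop] += weights[pop] / norm  # posterior membership
        props = {pop: totals[pop] / len(liks) for pop in pops}
    return props

# Toy data (hypothetical): two source populations, three SNP loci.
baseline = {
    "north": [(2, 2, 1), (2, 1, 2), (2, 2, 2), (1, 2, 2)],
    "south": [(0, 0, 1), (0, 1, 0), (0, 0, 0), (1, 0, 0)],
}
mixture = [(2, 2, 2), (2, 1, 2), (1, 2, 2), (0, 0, 0)]

freqs = allele_frequencies(baseline, n_loci=3)
proportions = estimate_mixture(mixture, freqs)
print(proportions)
```

The posterior membership weights computed inside the EM loop are the same quantities that would be used to assign a single fish of unknown origin to its most likely source population.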
The first large-scale baseline for GSI of Chinook
Salmon used allozyme markers (Teel et al.¹), but
technical and logistical issues limited their future
appeal. The allozyme database was supplanted in Canada
by a microsatellite baseline developed by the Department
of Fisheries and Oceans (Beacham et al., 2006)
and more broadly by a microsatellite baseline database
developed through a large, international collaboration
(Seeb et al., 2007). This collaboration required enormous
effort to standardize data across laboratories
because microsatellite allele names and sizes usually
are not consistent between different laboratories and
genotyping equipment.
The Seeb et al. (2007) microsatellite baseline has
been an effective tool for GSI but has a number of
disadvantages: genotyping and scoring of microsatellites is
labor-intensive; genotyping error rates can be relatively
high, making the 13 microsatellites in that baseline
inadequate for applications such as pedigree reconstruction
(Anderson and Garza, 2006; Garza and Anderson²;
Abadía-Cardoso et al., 2013); missing data rates also
can be quite high; and, finally, any new laboratory that
wishes to use that baseline must undertake a costly
standardization process. Additionally, it now has been
demonstrated that SNPs, despite typically having only
2 alleles per locus, do have sufficient power to be
employed successfully in a GSI context with a modest
number of genetic markers (Smith et al., 2007; Narum et al.,
2008; Templin et al., 2011; Larson et al., 2013).
¹ Teel, D. J., P. A. Crane, C. M. Guthrie III, A. R. Marshall, D.
M. Van Doornik, W. Templin, N. V. Varnavskaya, and L. W.
Seeb. 1999. Comprehensive allozyme database discriminates
Chinook salmon around the Pacific Rim. NPAFC document
440, 25 p. [Available from Alaska Department of Fish
and Game, Division of Commercial Fisheries, 333 Raspberry
Rd., Anchorage, AK 99518.]
² Garza, J. C., and E. C. Anderson. 2007. Large scale parentage
inference as an alternative to coded-wire tags for salmon
fishery management. In PSC genetic stock identification
workshop: Logistics Workgroup final report and
recommendations; Portland, OR, 15-17 May 2007 and Vancouver,
Canada, 11-13 September 2007, p. 48-55. [Available from
Pacific Salmon Commission, 600-1155 Robson St., Vancouver,
BC V6E 1B5, Canada.]
Early simulation studies indicated that the bi-allel- 
ic nature of SNPs would make them less useful than 
highly polymorphic microsatellites for population dis- 
crimination (Bernatchez and Duchesne, 2000; Kalin- 
owski, 2004). However, SNPs are located throughout 
the genome and may be discovered in genetic regions 
with higher than average divergence (Nosil et al.,
2009), increasing their utility for GSI. Moreover, SNPs
do not suffer from many of the disadvantages of us- 
ing microsatellites: SNP markers are amenable to the 
automated, high-throughput genotyping required for 
large projects; SNP genotyping error rates are very low, 
making them suitable for pedigree reconstruction; and, 
importantly, SNP assays typically do not require stan- 
dardization between labs and, therefore, a SNP base- 
line is immediately useful to any group or agency that 
genotypes a mixture sample with the markers used in 
that baseline (Seeb et al., 2011).
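The per-locus contrast between bi-allelic SNPs and highly polymorphic microsatellites noted at the start of this paragraph can be made concrete with expected heterozygosity, used here only as a rough proxy for the variability available at a single locus. It is not the measure used in the cited simulation studies. Expected heterozygosity is 1 minus the sum of squared allele frequencies, so it cannot exceed 0.5 for 2 alleles, whereas a locus with 20 equally frequent alleles reaches 0.95. The function names and the choice of 20 alleles are illustrative assumptions.

```python
def expected_heterozygosity(allele_freqs):
    """Expected heterozygosity at one locus: 1 minus the sum of squared frequencies."""
    return 1.0 - sum(p * p for p in allele_freqs)

def max_expected_heterozygosity(n_alleles):
    """Upper bound, reached when all alleles are equally frequent."""
    return 1.0 - 1.0 / n_alleles

# A bi-allelic SNP at its most variable (both alleles at frequency 0.5).
snp_h = expected_heterozygosity([0.5, 0.5])

# A hypothetical microsatellite with 20 equally frequent alleles.
microsat_h = expected_heterozygosity([1.0 / 20] * 20)

print(snp_h, microsat_h)
```

This per-locus gap is one reason a SNP panel generally needs more loci than a microsatellite panel, and why choosing SNPs from regions of high divergence, as described above, can help to offset it.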
Here, we describe the development and evaluation of 
a new baseline of SNP marker data for Chinook Salm- 
on in the southern part of their native range for use in 
ecological investigation in the California Current large 
marine ecosystem (and its tributaries) and in fisheries 
managed by the Pacific Fishery Management Council 
(PFMC). We introduce a panel of 96 SNP markers and a 
baseline of more than 8000 salmon from 68 populations 
of Chinook Salmon ranging from California to Alaska 
and a single collection of Coho Salmon (Oncorhynchus 
kisutch) from California. We describe the procedures 
used to select these SNP markers from among a larger 
number of candidates and document the resulting pat- 
terns of genetic differentiation between various popu- 
lations. We evaluate the power of this new baseline 
for GSI by both self-assignment (genetic identification 
of the most likely population of origin) and simulated 
mixture analyses, focusing on stocks commonly encoun- 
tered in PFMC fisheries. Finally, we analyze 2090 fish 
sampled in 2010 from the sport and commercial fisher- 
ies off the coast of California and compare the results 
of these analyses with the coded wire tag (CWT) data 
from these fish to demonstrate the effectiveness of this 
baseline for classifying individuals to specific manage- 
ment units. 
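Self-assignment of baseline fish can be sketched as follows. This is a simplified illustration, not the procedure used in this study. It assumes bi-allelic markers scored as 0, 1, or 2 copies of a reference allele, Hardy-Weinberg proportions, independent loci, no missing data, and a leave-one-out correction in which each fish is removed from its own population before allele frequencies are estimated. The leave-one-out step, the pseudocount, the function names, and the toy genotypes are all assumptions of the sketch.

```python
import math

def allele_freq(calls, pseudo=0.5):
    """Reference-allele frequency with half a pseudo-copy of each allele added."""
    return (sum(calls) + pseudo) / (2 * len(calls) + 2 * pseudo)

def log_likelihood(genotype, freqs):
    """log P(genotype | population) under Hardy-Weinberg and independent loci."""
    total = 0.0
    for count, p in zip(genotype, freqs):
        q = 1.0 - p
        total += math.log((q * q, 2 * p * q, p * p)[count])
    return total

def self_assign(baseline):
    """Return (true population, assigned population) for every baseline fish."""
    results = []
    for true_pop, genotypes in baseline.items():
        for i, fish in enumerate(genotypes):
            scores = {}
            for pop, members in baseline.items():
                # Leave the focal fish out of its own population so that it
                # does not inflate the likelihood of its true origin.
                ref = members[:i] + members[i + 1:] if pop == true_pop else members
                freqs = [allele_freq([g[locus] for g in ref])
                         for locus in range(len(fish))]
                scores[pop] = log_likelihood(fish, freqs)
            results.append((true_pop, max(scores, key=scores.get)))
    return results

def accuracy(results):
    """Fraction of fish assigned back to their population of origin."""
    return sum(true == assigned for true, assigned in results) / len(results)

# Toy baseline (hypothetical): two populations, three SNP loci.
baseline = {
    "north": [(2, 2, 1), (2, 1, 2), (2, 2, 2), (1, 2, 2)],
    "south": [(0, 0, 1), (0, 1, 0), (0, 0, 0), (1, 0, 0)],
}

results = self_assign(baseline)
print(accuracy(results))
```

The same tabulation of true versus assigned labels could be applied to fishery samples with known origin, such as fish carrying coded wire tags, by replacing the baseline label with the tag-derived stock.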
Materials and methods 
Baseline populations 
Populations were selected for inclusion in the new 
baseline to provide broad geographic coverage across 
