Clemento et al.: Evaluation of a single nucleotide polymorphism baseline for genetic stock identification of Oncorhynchus tshawytscha 125 
the mixture proportions of Central Valley fall, Central 
Valley winter, California Coast, Klamath River, and 
Rogue River reporting units (Fig. 2) were no more vari- 
able than were estimates that would have been obtained 
if every fish had carried an unambiguous reporting-unit 
tag. Estimates of mixing proportions for Central Valley 
spring, North California/South Oregon, and Mid Oregon 
Coast reporting units were somewhat more variable but 
appeared to be nearly unbiased. In the ocean fishery 
sample, assignments of more than 1000 individuals to 
reporting unit, determined with our baseline, were high- 
ly concordant (98.95%) with the CWTs recovered from 
the same fish. This SNP baseline, therefore, represents 
an important addition to the technologies available to 
managers and researchers. 
Methodological considerations 
Management of salmon fisheries in the Pacific Ocean 
off North America can be roughly divided into 3 fisher- 
ies by region: California and Oregon fisheries, managed 
by the Pacific Fishery Management Council (PFMC); 
Washington, British Columbia, Canada, and southeast- 
ern Alaska fisheries, subject to the international Pacific 
Salmon Treaty, reporting to and regulated by the Pa- 
cific Salmon Commission; and fisheries farther north 
and west in Alaska that are managed by the state, 
with salmon bycatch under the purview of the North 
Pacific Fishery Management Council. The genetic base- 
line described here was designed primarily to identify 
fish caught in PFMC ocean fisheries and in ecological 
investigations in the southern portion of the California 
Current ecosystem and its associated tributary rivers 
and streams. We have shown that it performs well in 
this area but, because of an ascertainment strategy 
during SNP discovery that included individuals from 
the Columbia River and British Columbia (Clemento 
et al., 2011), the baseline also has sufficient statistical
power to identify the source of some fish from elsewhere
in the North American range of this species.
We observed high rates of self-assignment to report- 
ing unit for all regions represented in the baseline, 
although some reporting units clearly were composed 
of populations with minimal differentiation from each 
other. Moreover, the utility of our baseline could be
extended effectively by simply genotyping the same panel
of SNPs on additional populations in those regions, even
though populations from Canada and Alaska show reduced
heterozygosity and mean number of alleles (Table 1), and
presumably reduced statistical power, in our baseline.
Other SNP baselines for Chinook Salmon also have 
been described or are being constructed. Templin et al. 
(2011) described a 45 SNP locus baseline for popula- 
tions in the northern and western parts of the Chinook 
Salmon range, designed primarily for GSI of popula- 
tions from western and southcentral Alaska. This same 
baseline was used also to probe the seasonal distribu- 
tion and migration pattern of Chinook Salmon in the 
Bering Sea and North Pacific Ocean (Larson et al., 
2013). Despite the presence of 14 populations from 
California, Oregon, and Washington in that baseline, 
Larson et al. (2013) appropriately emphasized that 
resolution of those southern populations is sufficient 
only for broad-scale assignments. Similarly, Warheit et 
al.³ described the marker selection for eventual development
of a SNP baseline for application to fisheries
managed by the Pacific Salmon Commission. 
Although the number of regional baselines is likely
to grow, the entire community of fishery managers and
scientists will still benefit from the careful design of
marker panels with as much overlap as possible. It is
conceivable that 2 or 3 panels of 96 SNPs
could provide the level of resolution needed for identi- 
fication throughout the range of Chinook Salmon. Al- 
ternatively, as next-generation sequencing techniques 
mature, genotyping-by-sequencing (GBS) approaches 
might yield data for GSI at a lower cost than that with 
current genotyping techniques. A GBS approach could 
be used to simultaneously genotype all of the SNPs in 
each of the regional baselines, allowing mixed-stock 
analysis throughout the range of this species. 
Inclusion of the species-diagnostic marker and Coho 
Salmon sample in the baseline provided insight into 
the prevalence of misidentification of Coho Salmon in 
ocean fisheries. In the 2010 fishery off California, 7 fish 
sampled as Chinook Salmon were found to be Coho 
Salmon. Without methods to identify Coho Salmon, 
the baseline would assign them with erroneously high 
confidence to a northern, low-heterozygosity Chinook 
Salmon population (data not shown). This problem is 
characteristic of most statistical methods for perform- 
ing GSI: if an individual’s true population of origin is 
not included in the baseline, then even if all the pop- 
ulations in the baseline are very poor candidates for 
that fish’s origin, that fish might still be assigned with 
high posterior probability to one of the populations. 
This situation occurs when one population is much
more likely to be the population of origin than any of
the other incorrect populations, even if it is not a likely
origin for that individual on an absolute scale.
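
The effect described above can be illustrated numerically. The following is a minimal sketch only: the population labels and log-likelihood values are hypothetical (they are not taken from this study), equal prior weights are assumed, and the helper `posterior_from_loglik` is introduced here purely for illustration.

```python
import math

def posterior_from_loglik(logliks, priors=None):
    """Turn per-population log-likelihoods into posterior probabilities."""
    names = list(logliks)
    if priors is None:
        # Assumption: equal prior weight for every baseline population.
        priors = {name: 1.0 / len(names) for name in names}
    # Log of (prior x likelihood) for each candidate population.
    scores = {name: math.log(priors[name]) + logliks[name] for name in names}
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(scores.values())
    weights = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}

# Hypothetical log-likelihoods of one fish's genotype under three baseline
# populations. All three values are meant to be very low in absolute terms,
# i.e. none of the populations is a plausible origin for this fish.
logliks = {"pop_A": -140.0, "pop_B": -155.0, "pop_C": -160.0}
posterior = posterior_from_loglik(logliks)
```

Because posterior probabilities are normalized across the baseline populations only, the least-poor candidate (here the hypothetical `pop_A`) receives nearly all of the posterior mass, even though its likelihood is low on an absolute scale.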
We introduced a simulation-based z-score method, 
implemented in gsi_sim, to identify fish that likely 
have not originated from populations in the baseline. 
An alternative, Bayesian nonparametric approach to 
dealing with fish from populations not in the baseline 
identifies those fish and estimates the allele frequen- 
cies in their (unrepresented) source population (Pella 
and Masuda, 2000). That approach is particularly
appropriate when large numbers of fish are sampled
from each of the populations that are not included in 
the baseline and when the unrepresented populations 
are quite divergent from all of the populations in the 
baseline. 
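
The general idea behind a simulation-based z-score can be sketched as follows. This is not the gsi_sim implementation; it is a minimal illustration under stated assumptions: loci are biallelic SNPs in Hardy-Weinberg proportions, the z-score compares a fish's observed log-likelihood under its assigned population with log-likelihoods of genotypes simulated from that population's allele frequencies, and all function names, allele frequencies, and genotypes are hypothetical.

```python
import math
import random

def genotype_loglik(genotype, freqs):
    """Log-likelihood of a SNP genotype (0, 1, or 2 copies of the reference
    allele at each locus) given reference-allele frequencies, assuming
    Hardy-Weinberg proportions and independent loci."""
    total = 0.0
    for copies, p in zip(genotype, freqs):
        q = 1.0 - p
        total += math.log((q * q, 2.0 * p * q, p * p)[copies])
    return total

def simulate_genotype(freqs, rng):
    """Draw one genotype by sampling two alleles independently at each locus."""
    return [int(rng.random() < p) + int(rng.random() < p) for p in freqs]

def z_score(genotype, freqs, n_sims=2000, seed=1):
    """Standardize the observed log-likelihood against log-likelihoods of
    genotypes simulated from the same allele frequencies."""
    rng = random.Random(seed)
    sims = [genotype_loglik(simulate_genotype(freqs, rng), freqs)
            for _ in range(n_sims)]
    mean = sum(sims) / n_sims
    var = sum((x - mean) ** 2 for x in sims) / (n_sims - 1)
    return (genotype_loglik(genotype, freqs) - mean) / math.sqrt(var)

# Hypothetical population: 96 SNPs, reference-allele frequency 0.8 at each.
freqs = [0.8] * 96
resident = simulate_genotype(freqs, random.Random(42))  # drawn from the population
outsider = [0] * 96  # homozygous for the rarer allele at every locus

z_resident = z_score(resident, freqs)
z_outsider = z_score(outsider, freqs)
```

In this sketch, a genotype drawn from the population yields a z-score near zero, whereas a genotype that is improbable under the population's allele frequencies yields a strongly negative z-score and could be flagged as a fish that likely did not originate from the baseline. Any cutoff used for flagging would be a choice of the analyst and is not specified here.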
We chose the z-score approach over the Bayesian 
nonparametric approach for 3 main reasons: 1) it is 
computationally fast and simple (there are no conver- 
gence problems that might be difficult to detect); 2) our 
baseline was sufficiently comprehensive for stocks that 
