Abstract — A new method of finding 
the optimal group membership and 
number of groupings to partition 
population genetic distance data is 
presented. The software program Par- 
titioning Optimization with Restricted 
Growth Strings (PORGS) visits all
possible set partitions and deems 
acceptable partitions to be those 
that reduce mean intracluster dis- 
tance. The optimal number of groups 
is determined with the gap statistic,
which compares PORGS results
with a reference distribution. The 
PORGS method was validated by a 
simulated data set with a known dis- 
tribution. For efficiency, where values 
of n were larger, restricted growth 
strings (RGS) were used to bipar- 
tition populations during a nested 
search (bi-PORGS). Bi-PORGS was 
applied to a set of genetic data from 
18 Chinook salmon (Oncorhynchus
tshawytscha) populations from the
west coast of Vancouver Island. The 
optimal grouping of these populations 
corresponded to four geographic loca- 
tions: 1) Quatsino Sound, 2) Nootka 
Sound, 3) Clayoquot+Barkley sounds,
and 4) southwest Vancouver Island. 
However, assignment of populations 
to groups did not strictly reflect the 
geographical divisions; fish of Barkley 
Sound origin that had strayed into 
the Gold River and close genetic simi- 
larity between transferred and donor 
populations meant groupings crossed 
geographic boundaries. Overall, stock 
structure determined by this parti- 
tioning method was similar to that 
determined by the unweighted pair- 
group method with arithmetic aver- 
ages (UPGMA), an agglomerative 
clustering algorithm. 
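
The exhaustive search summarized above can be illustrated with a short Python sketch. It enumerates every set partition of n items as a restricted growth string (a label sequence a with a[0] = 0 and a[i] no greater than 1 + max(a[0..i-1])) and scores each partition by its mean intracluster distance. The function names, the toy distance matrix, and the exact scoring rule (a plain average over within-group pairs) are assumptions made for illustration and are not taken from the PORGS software; the gap-statistic step is omitted.

```python
from itertools import combinations


def restricted_growth_strings(n):
    """Yield every restricted growth string of length n as a tuple.

    Each string labels one set partition of n items: a[0] = 0 and
    a[i] <= 1 + max(a[:i]).
    """
    def extend(prefix, current_max):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        # The next label may reuse an existing group or open one new group.
        for label in range(current_max + 2):
            prefix.append(label)
            yield from extend(prefix, max(current_max, label))
            prefix.pop()

    if n > 0:
        yield from extend([0], 0)


def mean_intracluster_distance(labels, dist):
    """Average distance over all pairs that share a group label.

    Assumed scoring rule for this sketch; returns 0.0 when every
    group is a singleton (no within-group pairs).
    """
    pairs = [dist[i][j]
             for i, j in combinations(range(len(labels)), 2)
             if labels[i] == labels[j]]
    return sum(pairs) / len(pairs) if pairs else 0.0


def best_partition(dist, k):
    """Return the k-group partition with the smallest mean intracluster distance."""
    n = len(dist)
    candidates = (s for s in restricted_growth_strings(n) if max(s) + 1 == k)
    best = min(candidates, key=lambda s: mean_intracluster_distance(s, dist))
    return best, mean_intracluster_distance(best, dist)


# Hypothetical distances: items 0 and 1 are close, as are items 2 and 3.
toy_dist = [
    [0, 1, 9, 9],
    [1, 0, 9, 9],
    [9, 9, 0, 1],
    [9, 9, 1, 0],
]
best_labels, best_score = best_partition(toy_dist, 2)
print(best_labels, best_score)  # (0, 0, 1, 1) 1.0
```

The number of strings visited equals the Bell number of n (15 for n = 4, 52 for n = 5), so the count grows very quickly with n; this is consistent with the nested bipartition search being used for larger n.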
Manuscript submitted 26 March 2008. 
Manuscript accepted 5 September 2008. 
Fish. Bull. 107:45-56(2009). 
The views and opinions expressed 
or implied in this article are those 
of the author and do not necessarily 
reflect the position of the National 
Marine Fisheries Service, NOAA. 
Dividing population genetic distance data 
with the software Partitioning Optimization 
with Restricted Growth Strings (PORGS): 
an application for Chinook salmon 
(Oncorhynchus tshawytscha), 
Vancouver Island, British Columbia 
John R. Candy (contact author) 1
R. Gregory Bonnell 2 
Terry D. Beacham 1 
Colin G. Wallace 1 
Ruth E. Withler 1
Email address for contact author: John.Candy@dfo-mpo.gc.ca 
1 Molecular Genetics Laboratory 
Department of Fisheries and Oceans, Pacific Biological Station 
3190 Hammond Bay Road 
Nanaimo, British Columbia, Canada V9T 6N7 
2 Oceans and Habitat Enhancement Branch 
Department of Fisheries and Oceans 
4166 Departure Bay Road 
Nanaimo, British Columbia, Canada V9T 4B7 
Genetic diversity in salmon species 
is thought to be maintained through 
high homing fidelity, which limits 
gene flow between spawning sites 
(Ricker, 1972; Quinn and Dittman, 
1990). As a general rule, populations 
that are geographically close tend to 
be genetically similar, creating natu- 
ral clusters of similar populations. 
Identification of genetically similar 
salmonid populations is important 
for fisheries management initiatives 
directed at conserving genetic diversity
(Riddell, 1993; Waples et al.,
2001). Consequently, managers are 
faced with the challenge of defining 
the number and size of these genetic 
groups. Furthermore, determining 
valid groupings of populations at a 
fine scale allows managers to make 
informed decisions regarding harvest 
levels and population-enhancement 
strategies. For British Columbia Chi- 
nook salmon (Oncorhynchus tshaw- 
ytscha) populations, genetic markers 
have been used to determine genetic 
distance between populations and to 
provide considerable power for defin- 
ing regional stock structure (Teel et 
al., 2000; Beacham et al., 2006a).
Clustering or grouping data is
useful in many disciplines; as a result,
there is a wide assortment of
methods available for representing 
data, measuring proximity between 
data elements, and grouping elements 
(e.g., Jain et al., 1999). For Pacific 
salmon, population-specific allelic fre- 
quencies are ascertained from spawn- 
ing ground samples by using genetic 
markers at a number of loci. From 
these allelic frequencies, a metric of 
overall genetic difference between 
populations is used to estimate pairwise
genetic distances. Three commonly
used distance measures are
Nei’s distance, D_S (Nei, 1987), Nei’s
modified Cavalli-Sforza chord distance,
D_A (Cavalli-Sforza and Edwards,
1967; Nei et al., 1983), and Weir and
Cockerham’s (1984) estimator of F_ST,
the coancestry coefficient θ. Once a
distance measure is selected, a proximity
matrix is created that shows
genetic distance between each pair of 
populations. 
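
As a rough illustration of how such a proximity matrix can be assembled, the Python sketch below computes Nei's modified chord distance between populations from per-locus allele frequencies, using the standard form D_A = 1 - (1/L) * sum over loci of sum over alleles of sqrt(x * y), where L is the number of loci. The population labels and frequencies are hypothetical, the function names are invented for this example, and the code is not taken from the software used in this study.

```python
import math


def nei_da(freqs_x, freqs_y):
    """Nei's D_A between two populations.

    Each argument is a list of loci; each locus is a list of allele
    frequencies, in the same allele order for both populations.
    """
    total = 0.0
    for locus_x, locus_y in zip(freqs_x, freqs_y):
        total += sum(math.sqrt(x * y) for x, y in zip(locus_x, locus_y))
    return 1.0 - total / len(freqs_x)


def proximity_matrix(populations):
    """Return (names, matrix) with a symmetric matrix of pairwise D_A values."""
    names = list(populations)
    n = len(names)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = nei_da(populations[names[i]], populations[names[j]])
            matrix[i][j] = matrix[j][i] = d
    return names, matrix


# Hypothetical allele frequencies at two loci (not real salmon data).
example_freqs = {
    "pop_A": [[0.5, 0.5], [1.0, 0.0]],
    "pop_B": [[0.5, 0.5], [0.0, 1.0]],
    "pop_C": [[0.9, 0.1], [0.8, 0.2]],
}
names, matrix = proximity_matrix(example_freqs)
for name, row in zip(names, matrix):
    print(name, [round(value, 3) for value in row])
```

In this toy data, pop_A and pop_B are identical at the first locus and share no alleles at the second, so their distance is 0.5; the diagonal is zero because each population is compared only with the others.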
Clustering is often used to group 
populations, either by merging small 
clusters into larger ones (agglomera- 
tive) or by splitting larger clusters 
