Candy et al.: Dividing population genetic distance data by partitioning optimization
indicated that considerably more iterations would be 
required to visit all set partitions. 
Compared with random set partitions, PORGS reduces the
number of cost function evaluations by eliminating
redundant cluster combinations. However, if PORGS were
to pass through all set partitions of the Chinook salmon
data set, it would generate 5.8 × 10¹² evaluations of the
cost function (the 19th Bell number). Compared with the
simulated data, where n = 10, the Chinook salmon data
set requires a much larger search (Fig. 5). Evidently,
the number of combinations grows rapidly with an
increasing number of populations, but not as fast as n
factorial (Knuth, 2005). An exhaustive search would
still take a long time to execute, even with recent
advances in computer processing speed. However, with
bi-PORGS, the largest bipartition occurred for the first
cluster with n = 19 members, where 262,142 evaluations
were generated.
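
The growth in search size described above can be checked with a short calculation. The following is a minimal Python sketch, not the PORGS implementation: the function names `bell_numbers` and `restricted_growth_strings` are illustrative assumptions. It computes Bell numbers with the Bell triangle and enumerates restricted growth strings, each of which encodes one set partition exactly once.

```python
from math import factorial


def bell_numbers(n_max):
    """Return the list [B_0, ..., B_n_max] computed with the Bell triangle."""
    bells = [1]  # B_0 = 1
    row = [1]    # first row of the Bell triangle
    for _ in range(n_max):
        # A new row starts with the last entry of the previous row; each
        # later entry is the entry to its left plus the entry above-left.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells


def restricted_growth_strings(n):
    """Yield every restricted growth string of length n as a tuple.

    The first label is 0 and each later label is at most one greater than
    the largest label used so far, so relabelled copies of the same
    partition are never generated.
    """
    def extend(prefix, current_max):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(current_max + 2):
            prefix.append(label)
            yield from extend(prefix, max(current_max, label))
            prefix.pop()

    if n > 0:
        yield from extend([0], 0)


bells = bell_numbers(19)
print(bells[10])        # set partitions for n = 10 (simulated data)
print(bells[19])        # set partitions for n = 19, about 5.8e12
print(factorial(19))    # n factorial, for comparison
print(sum(1 for _ in restricted_growth_strings(5)), bells[5])
```

For small n, the number of strings produced by the generator equals the corresponding Bell number, which is why an exhaustive pass over all restricted growth strings becomes impractical well before n = 19.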
During the bi-PORGS analysis for each value of k, the
cost function takes the mean intracluster distance of
each cluster and sums these means across all clusters;
bi-PORGS minimizes this sum. Because we were using the
co-ancestry coefficient, θ, as a distance measure for
the Chinook salmon data set, the minimized cost function
was referred to as the “mean sum theta” (Σθ). For k = 1
to 19, bi-PORGS values (\CF) represent
optimal membership for each of these groups (Fig. 6A).
The two most northerly Chinook salmon populations,
Marble River and Colonial River (within the Quatsino
Sound grouping), formed the first bipartition when
k = 2 (Σθ = 0.23). When k = 4 (Σθ = 0.11), the remaining
single cluster divided into 1) Nootka Sound, 2)
Clayoquot+Barkley sounds, and 3) southwest Vancouver
Island groups. Next, San Juan separated from the
southwest Vancouver Island group, and the Quatsino
group of Marble River and Colonial River split at k = 6
(Σθ = 0.076). When eight groupings were optimized
(Σθ = 0.046), the Robertson-Creek-derived populations
separated from the Gold River and Nahmint River
populations, as well as from the Clayoquot Sound
populations, and Sarita River populations split from the
southwest Vancouver Island populations. The Burman River
population split from the Nootka Sound populations at
k = 10 (Σθ = 0.029), and Clayoquot Sound populations
(Tranquil River and Kennedy River) separated from the
Barkley Sound populations (Gold River and Nahmint
River). At k = 12 (Σθ = 0.019), Thornton Creek split
from Robertson Creek and Somas River, and the Gold and
Nahmint Rivers split apart. At k = 14 (Σθ = 0.011),
Sooke River split from the Nitinat River and Toquart
River populations, and the Tranquil River and Kennedy
River populations split. The last few remaining splits
separated Tahsis River and Conuma River, Nitinat River
and Toquart River, and finally Somas River and Robertson
Creek populations (Fig. 6A).
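
The cost function described at the start of this paragraph, and a single bipartition step, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names, the treatment of a single-member cluster as contributing zero, and the toy distance matrix are all hypothetical.

```python
from itertools import combinations


def mean_intracluster_distance(cluster, dist):
    """Mean pairwise distance within one cluster.

    Assumption: a cluster with a single member contributes 0.0.
    """
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dist[i][j] for i, j in pairs) / len(pairs)


def cost(clusters, dist):
    """Sum of the mean intracluster distances across all clusters."""
    return sum(mean_intracluster_distance(c, dist) for c in clusters)


def best_bipartition(cluster, dist):
    """Exhaustively split one cluster into two non-empty parts of minimum cost."""
    members = list(cluster)
    first, rest = members[0], members[1:]
    best = None
    # Keeping `first` in the left part avoids scoring mirror-image splits twice.
    for size in range(len(rest)):  # size < len(rest) keeps the right part non-empty
        for chosen in combinations(rest, size):
            left = [first, *chosen]
            right = [m for m in rest if m not in chosen]
            value = cost([left, right], dist)
            if best is None or value < best[0]:
                best = (value, left, right)
    return best


# Toy symmetric distance matrix (invented values): populations 0 and 1 are
# close to each other, as are populations 2 and 3.
toy = [
    [0.00, 0.01, 0.20, 0.20],
    [0.01, 0.00, 0.20, 0.20],
    [0.20, 0.20, 0.00, 0.02],
    [0.20, 0.20, 0.02, 0.00],
]
value, left, right = best_bipartition([0, 1, 2, 3], toy)
print(value, left, right)
```

In this toy case the minimum-cost split keeps each close pair together. Repeating such a step on the resulting clusters gives one more cluster per step, which is the sense in which the groupings above are reported for successive values of k.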
As with the simulated data, the relationship be- 
tween the number of groups and bi-PORGS evalua- 
tions decreases monotonically with increasing values 
of k (Fig. 6B); however, unlike the simulated situation, 
there appears to be more than one optimal point. The 
gap statistic indicates that the first optimum num- 
[Figure 3 labels: P1, P10; Quatsino Sound; Clayoquot; Barkley Sound; Southwest Vancouver Island; k = 3]
Figure 3 
Dendrograms derived from (A) simulated genetic distance
data clustered by the unweighted pair-group method using
arithmetic averages (UPGMA) and from (B) genetic
distance data for 18 populations of Chinook salmon
(Oncorhynchus tshawytscha) from the west coast of
Vancouver Island clustered by using the co-ancestry
coefficient θ. The dotted lines indicate corresponding
groupings determined by bi-partitioning optimization
using restricted growth strings (bi-PORGS).
