Fishery Bulletin 99(1)
and Teel, 1990), as would stock-mixture samples if the 
populations segregated (McKinnell et al., 1997). The 
Bayes method opens the way for checking that the models 
fit. Lack of fit is indicated if the observed samples are 
unusual realizations of the Bayes posterior predictive 
distribution (Gelman et al., 1995). Test statistics should 
be designed to detect suspected problems in stock-mix- 
ture analysis: unrepresentative samples, unsatisfactory 
priors, presence of extra stocks, etc. In particular, the hap- 
lotype, allele, or genotype counts in the actual baseline 
samples should not be outliers of their corresponding predictive
distribution. When violations of the assumptions
are detected, the posterior distribution of stock propor- 
tions and baseline genetic parameters would be mislead- 
ing. New samples drawn by improved design, or alternate 
sampling models, could be needed to make the stock-mix- 
ture analysis trustworthy. 
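The check described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the two-sided cutoff `alpha` are assumptions introduced here, and the test statistic is taken to be a single count (for example, the count of one haplotype in a baseline sample) that has already been computed for the observed sample and for each predictive sample.

```python
def predictive_tail_probs(observed, predictive):
    """Proportion of predictive draws at or below, and at or above, the observed value."""
    n = len(predictive)
    lower = sum(1 for x in predictive if x <= observed) / n
    upper = sum(1 for x in predictive if x >= observed) / n
    return lower, upper


def is_outlier(observed, predictive, alpha=0.05):
    """Flag the observed statistic if it falls in either extreme tail.

    alpha is an illustrative cutoff (an assumption), not a value from the paper.
    """
    lower, upper = predictive_tail_probs(observed, predictive)
    return min(lower, upper) < alpha / 2


# Hypothetical predictive counts 0..99 for one haplotype in one baseline sample.
predictive_counts = list(range(100))
print(is_outlier(50, predictive_counts))   # a central value is not flagged
print(is_outlier(150, predictive_counts))  # a value beyond every draw is flagged
```

An observed count that is flagged for many haplotypes or stocks would suggest one of the problems listed above, although the sketch itself cannot say which.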
Samples are easily drawn from the posterior predictive 
distribution. The kth predictive baseline sample for the 
HAG counts in a stock is simply a multinomial sample 
with size equal to that of the actual sample and with probabilities
equal to the HAG RFs in the kth posterior sample.
The kth predictive stock-mixture sample is obtained
in two steps. First, a multinomial sample of M individuals
identified to stock is drawn, with probabilities equal to the
stock proportions in the kth posterior sample. Second, the
stock-mixture genotype of each individual is generated by
sequentially drawing the HAGs of the multiple characters
by using the HAG RFs for its stock from the kth posterior
sample. 
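The two sampling procedures just described can be sketched as follows. The sketch uses only the Python standard library; the function names, the nested-list layout of the RFs, and the numerical values in the usage lines are assumptions made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def multinomial(n, probs, rng):
    """Draw category counts for n trials with the given probabilities."""
    draws = rng.choices(range(len(probs)), weights=probs, k=n)
    tally = Counter(draws)
    return [tally.get(i, 0) for i in range(len(probs))]


def predictive_baseline(sample_size, hag_rfs, rng):
    """One predictive baseline sample: HAG counts for one character in one stock."""
    return multinomial(sample_size, hag_rfs, rng)


def predictive_mixture(m, stock_props, rfs_by_stock, rng):
    """One predictive stock-mixture sample of m individuals.

    rfs_by_stock[s][c] is the RF vector for character c in stock s,
    all taken from the same (kth) posterior sample.
    """
    # Step 1: number of individuals contributed by each stock.
    stock_counts = multinomial(m, stock_props, rng)
    individuals = []
    for stock, count in enumerate(stock_counts):
        for _ in range(count):
            # Step 2: draw the HAG of each character in turn from the stock's RFs.
            genotype = tuple(
                rng.choices(range(len(rf)), weights=rf, k=1)[0]
                for rf in rfs_by_stock[stock]
            )
            individuals.append((stock, genotype))
    return individuals


rng = random.Random(1)
baseline = predictive_baseline(50, [0.5, 0.3, 0.2], rng)
mixture = predictive_mixture(
    41,
    [0.7, 0.3],
    [[[1.0, 0.0], [0.2, 0.8]], [[0.0, 1.0], [0.5, 0.5]]],
    rng,
)
```

Repeating both draws for every posterior sample k gives the predictive distribution against which the actual samples are compared.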
Applications 
Two applications are considered next to illustrate use of 
the Bayesian method. In the first application, large num- 
bers of mtDNA haplotypes are present in the baseline and 
stock-mixture samples and pose special difficulty in analy- 
sis. The fairly common availability of mtDNA data makes 
this application of general interest. In the second applica- 
tion, only one of two populations in a stock mixture could 
be sampled separately. The Bayesian solution for the miss- 
ing baseline samples from the second population should be 
of special interest to biologists concerned with assessing 
stock mixtures of anadromous and resident populations 
in streams (Busby et al., 1996; Michael, 1983), and of gen- 
eral interest for extensions to the standard stock-mixture 
analysis. 
Example 1: mtDNA samples from harbor porpoise (Phocoena
phocoena) of the northwest Atlantic Ocean (Rosel et al., 1999)
Rosel et al. (1999) obtained mtDNA sequence
data for samples from four summer breeding popula- 
tions — Gulf of Maine-Bay of Fundy, Gulf of St. Lawrence, 
Newfoundland, and West Greenland — of harbor porpoises 
in the northwest Atlantic and from a wintering group 
along the mid-Atlantic states. The authors were reason- 
ably certain that the wintering group comprised one or 
more of the summer populations. Because of special con- 
servation concerns for the Gulf of Maine-Bay of Fundy 
population, the authors wished to determine if it alone 
could have been the wintering group. Contingency table 
analysis of the mtDNA haplotype frequencies indicated 
only that the Gulf of Maine-Bay of Fundy population was 
almost surely not alone (P<0.06), if at all present. Rosel 
et al. (1999) used a stock-mixture analysis by the CML 
method to attempt to delimit the population contributions 
better with the mtDNA data. Here the Bayesian method is 
applied to the same data for comparison. Summer sample 
sizes for each of the populations were between 40 and 80 
individuals, and the winter sample size was 41. A total of 
67 distinct haplotypes was observed in the summer sam- 
ples, and the winter sample of 41 individuals included 
an additional 8 singleton haplotypes previously unseen. 
Among the total of 253 individuals of all samples, the five 
most numerous haplotypes were represented by 45 (18%), 
42 (17%), 15 (6%), 9 (4%), and 7 (3%) individuals. Most 
haplotypes were sporadic in samples, the most common
counts in the summer and winter samples being 0 and 1.
The occurrence of a few fairly common and many scarce 
haplotypes is characteristic of mtDNA data (Xu et al., 
1994) and poses special difficulty in estimation. For exam- 
ple, under the CML method, stock-mixture haplotypes 
contributed by a particular stock have another apparent 
source if absent from its baseline sample. 
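A small numerical sketch may make the difficulty concrete. All counts, the prior values, and the function names below are hypothetical. The sketch treats the baseline RFs under CML as fixed at the baseline sample proportions, and it uses a Dirichlet posterior mean as a simplified stand-in for the Bayesian treatment, which is not the full data augmentation algorithm used in this paper.

```python
def sample_rfs(counts):
    """Baseline RFs fixed at the sample proportions, as assumed here for CML."""
    n = sum(counts)
    return [c / n for c in counts]


def dirichlet_posterior_mean(counts, prior):
    """Posterior mean RFs under a Dirichlet prior with the given 'count' parameters."""
    total = sum(counts) + sum(prior)
    return [(c + b) / total for c, b in zip(counts, prior)]


def stock_assignment_probs(haplotype, rfs_by_stock, stock_props):
    """Probability that an individual with this haplotype came from each stock."""
    weights = [p * rfs[haplotype] for p, rfs in zip(stock_props, rfs_by_stock)]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical baseline counts for three haplotypes in two stocks.
stock_a = [30, 10, 0]   # haplotype 2 was never seen in the baseline sample of stock A
stock_b = [20, 15, 5]
props = [0.5, 0.5]      # hypothetical stock proportions
prior = [0.5, 0.5, 0.5]  # hypothetical prior "counts"

cml = stock_assignment_probs(2, [sample_rfs(stock_a), sample_rfs(stock_b)], props)
bayes = stock_assignment_probs(
    2,
    [dirichlet_posterior_mean(stock_a, prior), dirichlet_posterior_mean(stock_b, prior)],
    props,
)
print(cml)    # stock A receives zero probability
print(bayes)  # stock A keeps a small, nonzero probability
```

With RFs fixed at sample proportions, an individual carrying haplotype 2 is attributed entirely to stock B even if stock A contributed it, which is the apparent change of source described above.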
Four chains of samples were generated by data augmen- 
tation with both the empirical Bayes and pseudo-Bayes 
methods for specifying the baseline prior “count” parameters.
The total prior “sample size” computed by
the methods was 22 (pseudo-Bayes) and nearly 2000 (em- 
pirical Bayes). An initial pilot chain of 235 samples was 
analyzed by using the Fortran implementation of gibbsit, 
which indicated that chains of 2012 samples should be 
run (given q=0.975, r=0.02, and s=0.95). The four chains
were begun with diverse values for population propor- 
tions: one chain was begun for each population, with it 
composing 0.95 of the stock mixture and the other three 
populations composing equal parts (thirds) of the remainder
(0.05). The four chains had mixed sufficiently, or converged,
by their second halves so that the Gelman-Rubin
shrink factors were less than 1.03 for any one population. 
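The chain setup and convergence check described so far can be sketched as follows. This is an illustration under stated assumptions rather than the software used here: the function names are invented, the pilot length uses the standard Raftery-Lewis minimum for independent draws, and the shrink factor is the basic Gelman-Rubin form without the degrees-of-freedom correction that some implementations apply.

```python
from math import ceil
from statistics import NormalDist, mean, variance


def raftery_lewis_min_length(q, r, s):
    """Minimum pilot length to estimate the q quantile within +/- r with probability s."""
    z = NormalDist().inv_cdf((s + 1) / 2)
    return ceil(z * z * q * (1 - q) / (r * r))


def dispersed_starts(n_stocks, dominant=0.95):
    """One starting vector per stock: that stock at `dominant`, the rest sharing the remainder."""
    rest = (1 - dominant) / (n_stocks - 1)
    return [
        [dominant if j == i else rest for j in range(n_stocks)]
        for i in range(n_stocks)
    ]


def shrink_factor(chains):
    """Basic Gelman-Rubin shrink factor for one scalar parameter.

    chains is a list of equal-length lists of draws (here, second halves of chains).
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # average within-chain variance
    between = n * variance(chain_means)            # between-chain variance
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


print(raftery_lewis_min_length(0.975, 0.02, 0.95))  # 235
for start in dispersed_starts(4):
    print(start)
```

The first call reproduces the pilot length of 235 samples. The shrink factor would be computed separately for the proportion of each population, with values near 1 indicating that the chains no longer reflect their dispersed starting values.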
The samples from the second halves were pooled to rep- 
resent 4024 draws from the posterior distribution. Predic- 
tive baseline samples were generated from the posterior 
samples for haplotype RFs, and indicated lack of fit only 
for the empirical Bayes method (Fig. 1). Therefore, only 
the posterior distribution from the pseudo-Bayes method 
will be described further. Parameters for population pro- 
portions computed from the posterior sample (Fig. 2) in- 
clude the mean, mode, median, standard deviations, and 
equal-tail bounds of posterior intervals (Table 1). Condi- 
tional maximum likelihood estimates for the winter sam- 
ple were computed for comparison, along with bootstrap 
evaluation of their precision from 1000 resamplings. Cor- 
responding statistics of the bootstrap sample for the CML 
method are the means, standard errors, and 95% confi- 
dence bounds (Table 2). This CML analysis differs from 
that of Rosel et al. (1999) by using 1) the counts of all 
individual haplotypes instead of pooling to form subsets 
with larger counts, and 2) an alternate method for con- 
structing confidence bounds. Rosel et al. (1999) used the 
