154 
Fishery Bulletin 99(1)
If a prior draw of $p \sim D(\alpha_1, \alpha_2, \ldots, \alpha_c)$ (the notation “$x \sim f$”
means “x is distributed as the probability density or prob- 
ability function f”) was obtained for the stock proportions 
of a stock mixture, and then a stock-mixture sample of size 
M was drawn such that the individuals could be correctly 
identified to stock origin, their counts, $Z = (z_1, z_2, \ldots, z_c)$,
would have a conditional multinomial distribution,
\[ \pi(Z \mid p, M) = \frac{M!}{z_1!\, z_2! \cdots z_c!}\; p_1^{z_1} p_2^{z_2} \cdots p_c^{z_c}, \]
or $Z \mid p \sim \mathrm{Mult}(M, p)$. The posterior for p, given Z, would
be the Dirichlet (computational convenience),
$p \mid Z \sim D(z_1 + \alpha_1, \ldots, z_c + \alpha_c)$.
Notice that the prior parameters enter
the posterior density in parallel with the sample counts 
and therefore could be viewed as counts obtained before 
the stock mixture was sampled (additional data) (sec. 3.5, 
Gelman et al., 1995). In fact, the mixture individuals are 
identified to stock origin (with unavoidable random error) 
during each cycle of the data augmentation algorithm 
later when samples are generated from the posterior. With 
the stock origins identified at a cycle, the uncertainty in 
p is described by the Dirichlet posterior with parameters 
equal to the sums of stock counts and prior parameters 
($z_i + \alpha_i$).
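The conjugate update described above can be sketched in a few lines. This is a minimal illustration, not the authors' software: it assumes NumPy, and the function name `dirichlet_posterior_params`, the prior values, and the stock counts are all hypothetical.

```python
import numpy as np

def dirichlet_posterior_params(counts, prior):
    """Dirichlet prior + multinomial counts -> Dirichlet posterior parameters.

    The prior parameters are simply added to the stock counts, which is
    why they can be read as counts obtained before sampling.
    """
    counts = np.asarray(counts, dtype=float)
    prior = np.asarray(prior, dtype=float)
    if counts.shape != prior.shape:
        raise ValueError("counts and prior must have the same length")
    return counts + prior

# Hypothetical example: c = 3 stocks, M = 50 correctly identified individuals.
rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5, 0.5])   # illustrative prior parameters
z = np.array([30, 15, 5])           # illustrative stock counts, sum = M
post = dirichlet_posterior_params(z, alpha)

# Draws of p | Z; each row is one set of stock proportions on the simplex.
p_draws = rng.dirichlet(post, size=1000)
```

Within a data augmentation cycle, `z` would be replaced by the stock counts imputed at that cycle, so the draw of p changes from cycle to cycle.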
With equal values summing to 1 assigned to its parameters
or “prior counts,” $\alpha_1 = \alpha_2 = \cdots = \alpha_c = c^{-1}$, the Dirichlet
prior meets our initial requirements. Specifically, the den- 
sity is defined over the stock composition simplex, and the 
additional data, which is neutral in the sense of favoring 
equal stock proportions (mean stock proportions are $c^{-1}$),
would be equivalent to adding just a single individual to 
the stock-mixture sample. Means, variances, and covariances
(substitute $z_i + c^{-1}$ for $\alpha_i$ in Eq. 3) of the resulting
posterior distribution of $p \mid Z$, $D(z_1 + c^{-1}, \ldots, z_c + c^{-1})$,
approximate increasingly closely, as the stock-mixture sample
size grows, the observed stock proportions, their estimated
variances, and their estimated covariances, respectively, from
standard frequentist analysis of the multinomial sample, 
Z. Therefore, given the stock assignments of the mixture 
individuals, the posterior distribution for p will be a rea- 
sonable description of its uncertainty for both Bayesian 
and frequentist statisticians. 
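The agreement between the posterior moments and the frequentist estimates can be checked numerically. The sketch below assumes NumPy and uses the standard Dirichlet mean and covariance formulas, which presumably correspond to Eq. 3 (that equation is not reproduced in this section). The frequentist covariance is the plug-in estimate with divisor M, and the counts are hypothetical.

```python
import numpy as np

def dirichlet_moments(a):
    """Mean vector and covariance matrix of a Dirichlet(a) distribution."""
    a = np.asarray(a, dtype=float)
    a0 = a.sum()
    mean = a / a0
    # Var(p_i) = m_i (1 - m_i) / (a0 + 1), Cov(p_i, p_j) = -m_i m_j / (a0 + 1)
    cov = (np.diag(mean) - np.outer(mean, mean)) / (a0 + 1.0)
    return mean, cov

def multinomial_estimates(z):
    """Observed proportions and their plug-in covariance estimate."""
    z = np.asarray(z, dtype=float)
    m = z.sum()
    p_hat = z / m
    cov_hat = (np.diag(p_hat) - np.outer(p_hat, p_hat)) / m
    return p_hat, cov_hat

base = np.array([6, 3, 1])          # hypothetical stock counts per 10 fish
gaps = []
for scale in (1, 10, 100):          # mixture sample sizes M = 10, 100, 1000
    z = base * scale
    c = z.size
    post_mean, post_cov = dirichlet_moments(z + 1.0 / c)
    p_hat, cov_hat = multinomial_estimates(z)
    gaps.append((np.abs(post_mean - p_hat).max(),
                 np.abs(post_cov - cov_hat).max()))
```

Each entry of `gaps` holds the largest absolute difference in means and in covariances for one sample size; both differences shrink as M increases.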
Prior for genetic parameters given baseline samples,
$\pi(Q \mid Y)$
The genetic compositions (haplotype or multilocus genotype
RFs) of the separate stocks are determined by their
RFs of haplotypes, alleles, or genotypes, Q. An estimate of
Q from the baseline samples must be used in place of the 
unknown Q to estimate the stock genetic compositions. 
When baseline samples are large, the observed and unbi- 
ased value of Q, together with measures of precision, may 
be sufficient to anchor the stock-mixture analysis. Com- 
monly, baseline sample sizes are more limited and some 
tradeoff between bias and precision (sec. 1.4.2 of Carlin
and Louis, 1996; Bishop et al., 1975) in estimation of Q 
may well be advisable. The essential idea is to shrink the 
observed RFs of HAGs for individual stocks toward cen- 
tral values that are more reliably determined and are 
consistent with the genetic similarity of the stocks. An 
informative Bayes prior distribution for these unknown 
genetic parameters underlying the stock-mixture sample 
can be derived from the baseline samples and would pro- 
vide for such shrinkage. The statistical modeling begins 
with the allele RFs at a single locus but applies equally 
to haplotypes, alleles, or genotypes. Later, the modeling is 
extended to cover multiple loci. 
The Bayesian scenario begins with an imaginary experi- 
ment in which the RFs of the T distinct alleles for a single 
locus are drawn for each of the c baseline stocks (sec. 3.7 of 
Lange, 1997). Denote the resulting unobserved RFs for the 
$i$th stock by $q_i = (q_{i1}, q_{i2}, \ldots, q_{iT})$. The draws from the stocks
are independent and from a common Dirichlet probability
density, which is the Bayes prior for baseline sampling,
$\pi(q_i) = D(\beta_1, \beta_2, \ldots, \beta_T)$. The justification for the Dirichlet prior
for baseline sampling parallels that for the earlier
stock-mixture composition prior, $\pi(p)$, that is, computational
convenience and its simple interpretation as additional data.
Next, baseline samples of $n_1, n_2, \ldots, n_c$ alleles of the locus
are available from the c stocks. The counts of the different
alleles, $y_i = (y_{i1}, y_{i2}, \ldots, y_{iT})$ for the $i$th stock, have the
multinomial distribution, $\mathrm{Mult}(n_i, q_i)$, and therefore the
baseline posterior for the unknown allele RFs in each stock is
also a Dirichlet distribution,
$q_i \mid y_i \sim D(\beta_1 + y_{i1}, \beta_2 + y_{i2}, \ldots, \beta_T + y_{iT})$.
The posterior mean of $q_i \mid y_i$ can be written as
a weighted average of the observed and prior mean RFs
(Bishop et al., 1975; Sutherland et al., 1975),
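\[ E(q_{it} \mid y_i) = \frac{y_{it} + \beta_t}{n_i + \beta_\cdot}
 = \frac{n_i}{n_i + \beta_\cdot}\left[\frac{y_{it}}{n_i}\right]
 + \frac{\beta_\cdot}{n_i + \beta_\cdot}\left[\frac{\beta_t}{\beta_\cdot}\right],
 \qquad t = 1, 2, \ldots, T, \tag{4} \]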
where the observed RF is $y_{it}/n_i$, its prior mean is $\beta_t/\beta_\cdot$,
and $\beta_\cdot = \sum_t \beta_t$. If the baseline sample is missing ($n_i = 0$), the
posterior mean equals the prior mean. Otherwise, the posterior
mean ranges between the observed and prior mean
RFs (as a function of $\beta_\cdot > 0$). Shrinkage from the usual
estimator of $q_i$, the observed allele RFs, toward the prior mean
increases with the prior “sample size,” $\beta_\cdot$, but so does bias
in estimates of the allele RFs. Therefore, the magnitude of 
the prior parameters should be no larger than necessary 
to satisfactorily control estimation error. 
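The two forms of the posterior mean in Eq. 4 can be compared directly. The following is a minimal sketch assuming NumPy; the function names and the allele counts and prior parameters are hypothetical, chosen only to show the shrinkage and the missing-sample case.

```python
import numpy as np

def posterior_mean_rf(y, beta):
    """Posterior mean of allele RFs: (y_t + beta_t) / (n + beta_total)."""
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    return (y + beta) / (y.sum() + beta.sum())

def shrinkage_form(y, beta):
    """Same quantity as a weighted average of observed and prior mean RFs."""
    y = np.asarray(y, dtype=float)
    beta = np.asarray(beta, dtype=float)
    n, b = y.sum(), beta.sum()
    prior_mean = beta / b
    if n == 0:                      # missing baseline sample
        return prior_mean
    w = n / (n + b)                 # weight on the observed RFs
    return w * (y / n) + (1.0 - w) * prior_mean

# Hypothetical locus with T = 3 alleles; the third allele was not observed.
y = np.array([8, 2, 0])             # baseline allele counts, n = 10
beta = np.array([1.0, 0.6, 0.4])    # illustrative prior parameters, sum = 2

direct = posterior_mean_rf(y, beta)
weighted = shrinkage_form(y, beta)
no_sample = posterior_mean_rf(np.zeros(3), beta)
```

Here `direct` and `weighted` agree, `no_sample` equals the prior mean `beta / beta.sum()`, and the unobserved third allele still receives a small positive posterior mean.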
Although the choice of a Dirichlet baseline prior was 
partly for convenience, the resulting posterior density has 
good properties. The posterior mean is a reliable estima- 
tor for the unknown allele RFs: it is strongly consistent, 
becomes unbiased as baseline sample size grows, and
moderates the extremes of the usual estimates (the observed
RFs) among baseline stocks. All posterior means for the
allele RFs are positive, so that absence of an allele from
a stock’s baseline sample implies only that the allele is rare
and was missed in sampling, not that it is nonexistent.
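The bias of the posterior mean can be written out explicitly. The short derivation below is a worked step, not taken from the source; it assumes only the multinomial expectation $E(y_{it}) = n_i q_{it}$ and the posterior mean of Eq. 4.

```latex
\begin{align*}
E\!\left[\frac{y_{it} + \beta_t}{n_i + \beta_\cdot}\right] - q_{it}
  &= \frac{n_i q_{it} + \beta_t}{n_i + \beta_\cdot} - q_{it} \\
  &= \frac{\beta_\cdot}{n_i + \beta_\cdot}
     \left(\frac{\beta_t}{\beta_\cdot} - q_{it}\right).
\end{align*}
```

For fixed prior parameters the bias vanishes as $n_i \to \infty$. For fixed $n_i$ its magnitude grows with the prior “sample size” $\beta_\cdot$, unless the prior mean $\beta_t/\beta_\cdot$ already equals $q_{it}$.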
The values for the baseline prior parameters, $\beta_1, \beta_2, \ldots, \beta_T$,
have been arbitrary. To complete the specification of the
baseline posterior, which will serve as the stock-mixture 
