


Abstract — Molecular markers have been demonstrated to be useful for the estimation of stock mixture proportions where the origin of individuals is determined from baseline samples. Bayesian statistical methods are widely recognized as providing a preferable strategy for such analyses. In general, Bayesian estimation is based on standard latent class models using data augmentation through Markov chain Monte Carlo techniques. In this study, we introduce a novel approach based on recent developments in the estimation of genetic population structure. Our strategy combines analytical integration with stochastic optimization to identify stock mixtures. An important enhancement over previous methods is the possibility of appropriately handling data where only partial baseline sample information is available. We address the potential use of nonmolecular, auxiliary biological information in our Bayesian model.



A Bayesian method for identification of stock mixtures from molecular marker data



Jukka Corander (contact author) 



Pekka Marttinen 



Samu Mäntyniemi



Department of Mathematics and Statistics 



P.O. Box 68



Fin-00014 



University of Helsinki 



Helsinki, Finland 



Email address for J. Corander: jukka.corander@helsinki.fi



Manuscript submitted 13 February 2005 to the Scientific Editor's Office.

Manuscript approved for publication 14 December 2005 by the Scientific Editor.



Fish. Bull. 104:550-558 (2006). 



Stock mixture analysis using multilocus genotypes of fish is recognized as a versatile tool in fisheries management. The efficiency of combining polymorphic molecular markers, such as microsatellites, with a model-based approach to estimate stock mixtures has been clearly demonstrated in the literature (Kalinowski, 2004; Reynolds and Templin, 2004). Since the beginning of the 21st century, Bayesian methods have largely replaced the earlier applied maximum likelihood approach based on latent class mixture models (Pella and Masuda, 2001). A similar trend has held for the estimation of genetic population structure in general (e.g., Pritchard et al., 2000; Corander et al., 2003, 2004; Beaumont and Rannala, 2004). For an earlier approach to mixture analysis with incomplete information about source populations, see Smouse et al. (1990).



Bayesian methods for estimation of stock mixtures have generally been based on data augmentation through Markov chain Monte Carlo (MCMC), where latent origins of caught individuals and values of the other model parameters are successively simulated from the corresponding posterior distributions. Such an approach is capable of avoiding certain estimation problems caused by missing data and rare alleles, which severely affect the maximum likelihood method. However, because of numerical deficiencies, there are situations where the MCMC-based method for the latent class mixture model may easily fail to provide appropriate estimates. First, in the presence of very small groups of individuals representing some stock sources, the posterior distribution of the origins for these particular individuals and the corresponding posteriors of the source allele frequencies will typically carry a high level of uncertainty. Consequently, the resulting MCMC simulation error in the estimates may be considerable. Second, when there are baseline samples available only for a subset of potential stock sources, estimation of origins is not feasible (Pella and Masuda, 2001). Use of the standard approach with a fixed number of sources, based on the available baseline samples, may easily lead to spurious estimates when there are individuals representing several additional sources in the data. Similarly, it is difficult to detect outlier individuals with the latent class approach with a fixed number of sources (Pritchard et al., 2000) because they are unlikely to be identified in the MCMC simulation for data sets of moderate to large size. Third, under partial baseline information, it is difficult to appropriately infer a suitable number of stock sources to represent a particular data set.
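To make the data-augmentation scheme described in the preceding paragraph concrete, the following Python sketch alternates between simulating allele frequencies, mixture proportions, and latent origins from their conditional posteriors. It is only an illustration of the standard latent class approach under simplifying assumptions, and not the method proposed in this paper: every source is assumed to have a baseline sample, there are no missing genotypes, loci are treated as independent, and symmetric Dirichlet priors with hypothetical parameters `alpha` and `beta` are used. All function names are illustrative.

```python
import math
import random


def dirichlet(params, rng):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(g)
    return [x / total for x in g]


def simulate_genotypes(freqs, n, rng):
    """Draw n diploid genotypes; freqs[l] lists allele frequencies at locus l."""
    return [[tuple(rng.choices(range(len(f)), weights=f, k=2)) for f in freqs]
            for _ in range(n)]


def allele_counts(genotypes, n_alleles):
    """Tabulate allele counts per locus for one baseline sample."""
    counts = [[0] * n_alleles for _ in genotypes[0]]
    for fish in genotypes:
        for l, (a1, a2) in enumerate(fish):
            counts[l][a1] += 1
            counts[l][a2] += 1
    return counts


def gibbs_stock_mixture(mixture, baseline_counts, n_iter=300, burn_in=100,
                        alpha=1.0, beta=1.0, seed=0):
    """Data-augmentation Gibbs sampler for stock mixture proportions.

    mixture[i][l]            -> (a1, a2), allele indices of fish i at locus l
    baseline_counts[k][l][a] -> count of allele a at locus l in baseline k
    alpha, beta              -> assumed Dirichlet prior parameters (hypothetical)
    Returns the post-burn-in draws of the mixture proportions.
    """
    rng = random.Random(seed)
    K = len(baseline_counts)
    n = len(mixture)
    z = [rng.randrange(K) for _ in range(n)]  # latent origins, random start
    draws = []
    for it in range(n_iter):
        # Step 1: allele frequencies given origins (baseline + assigned fish).
        counts = [[[c + beta for c in locus] for locus in src]
                  for src in baseline_counts]
        for i, k in enumerate(z):
            for l, (a1, a2) in enumerate(mixture[i]):
                counts[k][l][a1] += 1
                counts[k][l][a2] += 1
        log_theta = [[[math.log(p) for p in dirichlet(locus, rng)]
                      for locus in src] for src in counts]
        # Step 2: mixture proportions given the current origins.
        pi = dirichlet([alpha + z.count(k) for k in range(K)], rng)
        # Step 3: origins given proportions and allele frequencies.
        for i in range(n):
            logp = [math.log(pi[k])
                    + sum(log_theta[k][l][a1] + log_theta[k][l][a2]
                          for l, (a1, a2) in enumerate(mixture[i]))
                    for k in range(K)]
            top = max(logp)  # subtract the maximum for numerical stability
            weights = [math.exp(v - top) for v in logp]
            z[i] = rng.choices(range(K), weights=weights)[0]
        if it >= burn_in:
            draws.append(pi)
    return draws
```

Averaging the returned draws gives a Monte Carlo estimate of the posterior mean mixture proportions. Because the number of sources is fixed by the baseline samples, a sketch of this kind inherits the limitations listed above: it cannot represent fish from unsampled sources, and estimates for very small source groups remain uncertain.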



Partition-based Bayesian alternatives to latent class models for identification of genetic mixtures without baseline samples have recently been introduced (Dawson and Belkhir, 2001; Corander et al., 2003, 2004). Corander et al. (2003, 2004) used an analytical integration strategy combined with stochastic search methods to make Bayesian estimation



