Corander et al A Bayesian method for identification of stock mixtures from molecular marker data 






categorized in this respect as neutral (no biasing effect; 

 Table 4, A and B), positive (strengthens the inference; 

 Table 4C), or negative (biases the inference; Table 4, D 

 and E). 



Our results indicate that commonly occurring levels 

 (<5%) of missing marker data do not inhibit the ability 

 of our method to detect the correct stocks, assuming 

 that the missing values are randomly distributed over 

 loci and individuals (Table 5). As an overall conclusion 

 from the simulations, it is clear that the genetic dis- 

 similarities of the stocks matter most for identification 

 performance. When baseline samples are available for 

 all stocks, most individuals can be correctly assigned 

 to their origin even when the genetic distance between 

 the stocks is negligible (such as between Tornionjoki 

 and Iijoki rivers). Usefulness of the conditional posterior 

 probabilities for characterization of the allocation 

 uncertainty is exemplified in Table 6. 
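
The missing-data assumption above (values missing at random over loci and individuals) can be illustrated with a minimal sketch. It is not taken from the authors' software: the function name `mask_at_random`, the `MISSING` marker, and the genotype layout (one tuple of alleles per individual and locus) are hypothetical choices made only for this example.

```python
import random

MISSING = None  # hypothetical marker for a missing genotype


def mask_at_random(genotypes, rate, seed=0):
    """Return a copy of `genotypes` with a fraction `rate` of cells set to MISSING.

    `genotypes` is a list of individuals, each a list with one entry per locus.
    Cells are drawn uniformly over all (individual, locus) pairs, which is the
    "randomly distributed over loci and individuals" assumption.
    """
    rng = random.Random(seed)
    n_ind = len(genotypes)
    n_loci = len(genotypes[0])
    cells = [(i, j) for i in range(n_ind) for j in range(n_loci)]
    n_missing = round(rate * len(cells))
    masked = [list(row) for row in genotypes]  # leave the input untouched
    for i, j in rng.sample(cells, n_missing):
        masked[i][j] = MISSING
    return masked
```

A simulation study along these lines would mask a complete data set at a chosen rate (for example 0.05) and compare assignments before and after masking.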



The number of inferred putative stocks generally agreed 

 well with the true underlying number, and 

 there was no tendency to overestimate k. However, when 

 the number of available marker loci was decreased to 

 five (Table 7), the probability of obtaining additional pu- 

 tative stocks was slightly increased. Because it is widely 

 known that the level of polymorphism of the markers 

 affects their usefulness in origin identification, it is 

 difficult to specify very clear boundaries with respect 

 to the number of loci necessary for acceptable performance 

 of any assignment method. It is important to 

 notice that an acceptable characterization of uncertainty 

 inherently depends on the real biological context in a 

 particular modeling situation. As a simple rule of thumb 

 for our method, we would suggest that N_L < 10 might be 

 regarded as an insufficient value for reliable estimation. 

 However, when auxiliary information is available such 

 that the sample data can be grouped before analysis (as 

 in Table 8), the statistical power to detect correct ori- 

 gins and k increases considerably. This situation would 

 correspond to a geographical sampling scheme where 

 the individuals assigned to the same sampling unit are 

 caught simultaneously at a specific location. 



Discussion 



We have introduced a novel Bayesian method for an 

 investigation of stock mixtures using molecular marker 

 data by suitably modifying existing partition-based 

 Bayesian models for estimation of genetic population 

 structure. To make the method easy to apply, the implementation 

 is freely available as user-friendly 

 software. One particular advantage of our method is the 

 possibility of appropriately analyzing data in a situation 

 where only partial baseline information is available for 

 the potential stocks. Use of an analytical integration 

 approach considerably enhances the numerical performance 

 when the stock mixture structure is challenging 

 (e.g., in the presence of small stocks for which no base- 

 line samples have been collected). 
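
To illustrate what analytical integration can look like in a partition-based model, the sketch below computes a log marginal likelihood for the allele counts at one locus in one putative stock, with the unknown allele frequencies integrated out under a Dirichlet prior. This is the standard Dirichlet-multinomial result and is offered as an assumption about the general form, not as the authors' exact expression; the function name `log_marginal` and the symmetric prior value `alpha=1.0` are hypothetical choices for this example.

```python
import math


def log_marginal(counts, alpha=1.0):
    """Log probability of an observed allele sequence with the given counts.

    Allele frequencies are integrated out under a symmetric Dirichlet(alpha)
    prior (an assumed prior), so no frequencies need to be sampled:
        Gamma(a0) / Gamma(a0 + n) * prod_j Gamma(alpha + n_j) / Gamma(alpha)
    where a0 = alpha * (number of alleles) and n = sum of counts.
    """
    a0 = alpha * len(counts)
    n = sum(counts)
    out = math.lgamma(a0) - math.lgamma(a0 + n)
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out
```

Because the result is in closed form, a stock represented by only a few individuals and no baseline sample still receives a well-defined score, without any simulation over its allele frequencies.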



In contrast to the earlier Bayesian methods introduced 

 in Corander et al. (2003, 2004), we have exploited a considerably 

 less computationally intensive strategy that 

 is based on stochastic optimization instead of MCMC 

 simulation. To obtain stable estimates for moderate to 

 large data sets, many long parallel MCMC chains would 

 be needed, but the process for obtaining these chains 

 often is not feasible under a single CPU architecture. 

 Our intelligent search strategy, instead of the random 

 search used in MCMC, seems to resolve this problem 

 very efficiently. A disadvantage of stochastic optimization 

 compared with MCMC simulation is that a 

 statistically consistent estimate of the number of stocks 

 contributing to the sample cannot be derived. Neverthe- 

 less, our novel method has performed satisfactorily in 

 this respect under realistic sampling scenarios. We are 

 currently exploring possibilities for using intelligent 

 proposals in MCMC and an online-based parallel imple- 

 mentation of the method, both of which would provide 

 an ideal framework for biologists using molecular data 

 in stock mixture estimation. 
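
The contrast between stochastic optimization and MCMC can be made concrete with a small sketch. The code below is a generic greedy stochastic search over partitions, not the authors' search strategy: it visits individuals in random order and moves each one to another cluster only when the move raises a partition score. The score, the function names (`log_marginal`, `partition_score`, `stochastic_search`), the symmetric Dirichlet prior, and the upper bound `max_k` are all assumptions made for illustration, and baseline samples are ignored.

```python
import math
import random


def log_marginal(counts, alpha=1.0):
    """Dirichlet-multinomial log marginal likelihood (assumed symmetric prior)."""
    a0 = alpha * len(counts)
    out = math.lgamma(a0) - math.lgamma(a0 + sum(counts))
    for c in counts:
        out += math.lgamma(alpha + c) - math.lgamma(alpha)
    return out


def partition_score(data, labels, n_alleles):
    """Sum of log marginal likelihoods over clusters and loci.

    `data[i][j]` is a tuple of allele indices for individual i at locus j.
    Loci are treated as independent (nonlinkage).
    """
    n_loci = len(data[0])
    tables = {}
    for individual, k in zip(data, labels):
        table = tables.setdefault(k, [[0] * n_alleles for _ in range(n_loci)])
        for j, alleles in enumerate(individual):
            for a in alleles:
                table[j][a] += 1
    return sum(log_marginal(row) for t in tables.values() for row in t)


def stochastic_search(data, n_alleles, max_k, sweeps=20, seed=0):
    """Greedy search: accept a single-individual move only if the score improves."""
    rng = random.Random(seed)
    labels = [0] * len(data)  # start with everyone in one cluster
    best = partition_score(data, labels, n_alleles)
    for _ in range(sweeps):
        improved = False
        order = list(range(len(data)))
        rng.shuffle(order)  # random visiting order is the stochastic element
        for i in order:
            current = labels[i]
            for k in range(max_k):
                if k == current:
                    continue
                labels[i] = k
                score = partition_score(data, labels, n_alleles)
                if score > best + 1e-12:
                    best, current, improved = score, k, True
            labels[i] = current  # keep the best label found for individual i
        if not improved:
            break  # local optimum: no single move raises the score
    return labels, best
```

Unlike an MCMC sampler, this search returns a single high-scoring partition and not a sample from the posterior, which is consistent with the limitation noted above regarding consistent estimation of the number of stocks. A practical search would also need moves that split and merge whole clusters to escape local optima.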



The most relevant biological assumptions used in our 

 approach are HWE and nonlinkage of the marker loci. 

 The latter assumption is generally valid, at least ap- 

 proximately, for the microsatellite markers often used in 



