Abstracts: Informatics



Computational Support for the Mapping of Complex Genomes



Karl M. Sirotkin, Daniel S. Caugherty, David C. Tomey, and Frederic R. Fairfield

Center for Human Genome Studies, Group T-10, Mail Stop K710, Los Alamos National
Laboratory, Los Alamos, NM 87545
(505) 667-7510, FTS 855-0479



At Los Alamos National Laboratory (LANL), we are determining how to construct
contigs (contiguous segments of DNA) from fingerprint data and are building tools to
evaluate methods for optimizing the rate of contig extension and map completion.
From the fingerprint data generated at LANL, contigs are assembled by using
calculated overlap-probability values for pairs of cosmid clones. These overlap
probabilities are determined from a statistical model that incorporates the positions of
the fingerprint data on a gel and the reproducibility of these data. Extensions of these
results will include calculations of the likely structure of a contig (including clone
positions and partial restriction maps) and analysis of the position dependence of the
chromosome-16 fingerprint data.
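
The statistical model is not specified in this abstract. As a rough illustration of scoring a clone pair from fingerprint data, the Python sketch below counts fragment lengths that agree within a relative tolerance. The function name `shared_fragment_fraction`, the `tolerance` parameter, and the score itself are assumptions made for illustration; this is a simple stand-in, not the LANL overlap-probability model.

```python
def shared_fragment_fraction(fp_a, fp_b, tolerance=0.02):
    """Fraction of the smaller fingerprint that is matched in the other.

    Two fragment lengths match when they differ by at most `tolerance`
    times the larger length. Each fragment is used in at most one match
    (greedy two-pointer pass over the sorted lengths).
    """
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tolerance * max(a[i], b[j]):
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too short to match anything left in b
        else:
            j += 1  # b[j] is too short to match anything left in a
    smaller = min(len(a), len(b))
    return matches / smaller if smaller else 0.0
```

A score like this only ranks clone pairs; turning it into a probability would require a model of band positions and reproducibility such as the one the abstract refers to.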



To evaluate mapping methods, we have created a set of modular programs that
determine the limitations on contig size when a particular mapping strategy is used.
A user of these programs can create a test genomic segment, generate clones from
this segment, extract fingerprint data from each clone, test various strategies for
determining pairwise overlap between clones, reassemble the genome from these
overlaps, display the resulting contigs, and evaluate the success of the reassembly. The
programs have been structured both to evaluate many mapping strategies and to assemble
contigs from current data. To facilitate the use of these programs by others, both
program and data files are human-readable, and the same symbolic parameter names
are used throughout the data and program files. In addition, we have designed the
program structure so that the installation is easy to tailor for individual users.
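
The workflow described above (test segment, clones, fingerprints, pairwise overlap, reassembly, evaluation) can be sketched in a few small Python functions. This is a minimal sketch under stated assumptions, not the LANL programs: every function name, the uniform random placement of sites and clones, the exact-match overlap rule with a `min_shared` threshold, and the grouping of clones by connected components are illustrative choices.

```python
import random
from bisect import bisect_left, bisect_right

def make_segment(length, n_sites, rng):
    """Test genomic segment: sorted restriction-site positions (assumed uniform)."""
    return sorted(rng.sample(range(1, length), n_sites))

def make_clones(length, n_clones, clone_len, rng):
    """Clones as (start, end) intervals placed uniformly at random."""
    starts = [rng.randrange(0, length - clone_len + 1) for _ in range(n_clones)]
    return [(s, s + clone_len) for s in starts]

def fingerprint(clone, sites):
    """Exact lengths of fragments bounded by sites that lie inside the clone."""
    start, end = clone
    inside = sites[bisect_left(sites, start):bisect_right(sites, end)]
    return [b - a for a, b in zip(inside, inside[1:])]

def overlaps(fp_a, fp_b, min_shared):
    """One possible strategy: enough distinct fragment lengths shared exactly."""
    return len(set(fp_a) & set(fp_b)) >= min_shared

def assemble(fingerprints, min_shared=3):
    """Group clones into contigs as connected components of the overlap graph."""
    parent = list(range(len(fingerprints)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(fingerprints)):
        for j in range(i + 1, len(fingerprints)):
            if overlaps(fingerprints[i], fingerprints[j], min_shared):
                parent[find(i)] = find(j)
    contigs = {}
    for i in range(len(fingerprints)):
        contigs.setdefault(find(i), []).append(i)
    return list(contigs.values())

def is_true_contig(members, clones):
    """Evaluation: do the member clones really cover one unbroken stretch?"""
    spans = sorted(clones[i] for i in members)
    reach = spans[0][1]
    for start, end in spans[1:]:
        if start > reach:
            return False  # gap in the true positions: a false join
        reach = max(reach, end)
    return True

# Example run with arbitrary, hypothetical parameters.
rng = random.Random(0)
sites = make_segment(200_000, 400, rng)
clones = make_clones(200_000, 60, 20_000, rng)
fps = [fingerprint(c, sites) for c in clones]
contigs = assemble(fps)
false_joins = sum(not is_true_contig(c, clones) for c in contigs)
```

Because each stage is a separate function, a different overlap strategy can be tested by swapping `overlaps` while leaving the other stages unchanged, which mirrors the modular design described above.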



These modeling programs have been implemented in two phases. The first phase used
only exact fingerprint data; in the second phase, controlled levels of error were
introduced into the fingerprint data. With exact data, four different strategies
reassembled similar percentages of a genome segment into contigs, although the
contigs themselves differed. Combining information from all of these methods
increased the coverage of the genome by contigs. With errors in the fragment lengths,
it is not yet clear which strategies for reassembly of fragments into contigs make
better use of the data. Small changes in the experimental errors seem to have large
effects on contig generation. With this project, we hope to optimize map construction
rates by making contigs, monitoring our progress, and circumventing potential
problems with map completion.
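
The abstract does not state how the controlled errors were generated. One way to introduce a controlled level of error into fragment lengths is sketched below in Python, assuming Gaussian noise proportional to fragment length. The error model, the function names, and the `rel_sd` parameter are all assumptions for illustration.

```python
import random

def add_length_error(fingerprint, rel_sd, rng):
    """Perturb each fragment length with Gaussian noise of SD rel_sd * length."""
    noisy = []
    for length in fingerprint:
        value = rng.gauss(length, rel_sd * length)
        noisy.append(max(1, round(value)))  # keep lengths as positive integers
    return noisy

def exact_shared(fp_a, fp_b):
    """Number of distinct fragment lengths found in both fingerprints."""
    return len(set(fp_a) & set(fp_b))

# Hypothetical fingerprint, compared with noisy copies at two error levels.
rng = random.Random(1)
true_fp = [480, 910, 1500, 2300, 4100]
for rel_sd in (0.0, 0.01):
    noisy_fp = add_length_error(true_fp, rel_sd, rng)
    print(rel_sd, exact_shared(true_fp, noisy_fp))
```

Sweeping `rel_sd` over a range of values and rerunning an assembly strategy at each level is one way to examine how sensitive contig generation is to experimental error.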






