Applications of Logic Programming and Parallel Computation in Genetic Sequence Analysis



Ross Overbeek, Ian Foster, and Steve Winker 



Mathematics and Computer Science Division, Argonne National Laboratory, 



Argonne, IL 60439 



(312) 972-7856, FTS 972-7856 



Several applications of logic programming and parallel computation to genome 

 research problems are being pursued in close collaboration with G. Church (Harvard 

 Medical School), C. Woese and G. Olsen (University of Illinois at Urbana), and M. 

 Liebman (Amoco Technology Co.). (1) For interpretation of sequencing gels, a program 

 has been developed that mimics human expert behavior and includes extensive 

 backtracking to explore sets of alternative interpretations. (2) For multiple sequence 

 alignment, a new algorithm has been devised. It implements several heuristic methods 

 to produce alignments of up to 500 sequences with lengths of 2000 nucleotides. With 

 the use of bilingual programming (logic programming with C routines that handle 

 well-defined, computationally intensive subcomputations), prototype algorithms could be 

 rapidly tested and executed on multiprocessors. (3) For prediction of RNA secondary 

 structure, an approach based on covariance analysis has been 

 developed that relies heavily on multiple sequence alignment. Current applications are 

 the explorations of possible interactions between 5S, 16S, and 23S RNAs. (4) For the 

 problem of constructing databases with very diverse data types, uses of logic 

 programming are being explored. To increase the capacity to handle large data sets on 

 disk, standard logic programming environments have been extended with additional 

 predicates. The result is a prototyping environment that can manage the increasing 

 volume of sequence data. (5) To compare relative cost and performance, 

 sequence similarity search algorithms are being run on several commercially 

 produced multiprocessors. These include shared-memory MIMD (multiple 

 instruction, multiple data) machines, SIMD (single instruction, multiple data) machines 

 [similar to the Connection Machine™ (Thinking Machines, Inc.) and DAP™ (Active 

 Memory Technology)], and distributed memory machines. Logic programming 

 technology is also being used here to explore solutions that are portable over a wide 

 range of machine environments. 
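
The covariance analysis mentioned in item (3) can be illustrated with a minimal sketch. The abstract does not say which covariation statistic is used, so the example below assumes mutual information between alignment columns, a common choice for this kind of analysis. The function names, the threshold, and the toy alignment are hypothetical and serve only as illustration; they are not the authors' implementation.

```python
from collections import Counter
from math import log2


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length strings (assumed already aligned).
    """
    n = len(alignment)
    pair_counts = Counter((seq[i], seq[j]) for seq in alignment)
    left_counts = Counter(seq[i] for seq in alignment)
    right_counts = Counter(seq[j] for seq in alignment)

    mi = 0.0
    for (a, b), count in pair_counts.items():
        p_ab = count / n
        p_a = left_counts[a] / n
        p_b = right_counts[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


def covarying_pairs(alignment, threshold=1.0):
    """Return (i, j, score) for column pairs whose score meets the threshold.

    The threshold value is an arbitrary illustrative choice.
    """
    length = len(alignment[0])
    hits = []
    for i in range(length):
        for j in range(i + 1, length):
            score = mutual_information(alignment, i, j)
            if score >= threshold:
                hits.append((i, j, score))
    # Strongest covariation first.
    return sorted(hits, key=lambda hit: -hit[2])


# Toy alignment (hypothetical data): columns 0 and 3 vary together
# as complementary bases, while columns 1 and 2 are invariant.
ALIGNMENT = ["GAAC", "CAAG", "AAAU", "UAAA"]

if __name__ == "__main__":
    for i, j, score in covarying_pairs(ALIGNMENT):
        print(f"columns {i} and {j}: {score:.2f} bits")
```

In this toy case only columns 0 and 3 are reported, because invariant columns carry no covariation signal. Column pairs that score highly are candidates for base-paired positions, which is why the quality of the underlying multiple sequence alignment matters so much for this approach.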






