Abstracts: Informatics



Identification of Genes in Anonymous DNA Sequences



C. A. Fields and C. A. Soderlund 



Center for Advanced Computing in Molecular and Cellular Biology, Computing Research Laboratory, New Mexico State University, Las Cruces, NM 88003-0001



(505)646-2848 



The objective of this project is to develop practical software that automates the identification of genes in anonymous DNA sequences from the human genome and other higher eukaryotic genomes. A prototype automated sequence analysis system, gm, has been implemented in C for Unix version 4.2. The system accepts as input: DNA sequences; consensus matrices for locating splice sites, translational start sites, and polyadenylation sites; match-quality cutoffs for the consensus searches; and base-frequency and codon-usage standards for coding regions and introns.
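A consensus search with a match-quality cutoff can be illustrated with a short sketch. The C code below is a minimal sketch, assuming a consensus matrix stored as a 4 x width table of per-position weights (rows A, C, G, T) and a match score equal to the sum of the weights of the observed bases; the function names, the matrix layout, and the additive scoring rule are illustrative assumptions, not gm's actual data structures or scoring method.

```c
#include <stddef.h>
#include <string.h>

/* Map a nucleotide to a matrix row: A=0, C=1, G=2, T=3; -1 for anything else. */
static int base_index(char b) {
    switch (b) {
    case 'A': case 'a': return 0;
    case 'C': case 'c': return 1;
    case 'G': case 'g': return 2;
    case 'T': case 't': return 3;
    default: return -1;
    }
}

/* Score the window seq[start .. start+width-1] against a 4 x width
 * consensus matrix stored row-major: weights[row * width + pos].
 * Returns 0 (no score) if the window contains an ambiguous base such as N. */
static int score_window(const char *seq, size_t start, size_t width,
                        const double *weights, double *score) {
    double s = 0.0;
    for (size_t pos = 0; pos < width; pos++) {
        int row = base_index(seq[start + pos]);
        if (row < 0) return 0;
        s += weights[(size_t)row * width + pos];
    }
    *score = s;
    return 1;
}

/* Slide the matrix along the sequence and record the start of every
 * window whose score meets the (hypothetical) match-quality cutoff.
 * Returns the number of hits stored, at most max_hits. */
size_t scan_consensus(const char *seq, const double *weights, size_t width,
                      double cutoff, size_t *hits, size_t max_hits) {
    size_t len = strlen(seq), n = 0;
    if (width == 0 || len < width) return 0;
    for (size_t i = 0; i + width <= len && n < max_hits; i++) {
        double s;
        if (score_window(seq, i, width, weights, &s) && s >= cutoff)
            hits[n++] = i;
    }
    return n;
}
```

For example, a width-2 matrix that gives weight 1 to G at the first position and to T at the second, scanned with a cutoff of 2.0, reports every GT dinucleotide, the invariant core of a 5' splice-site consensus. Real splice-site matrices are wider and use graded weights, so lowering the cutoff admits weaker matches; this is the stringency trade-off that governs run time.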

As output, it produces schematic models of the possible genes contained in the sequence, showing the locations of the coding sequences, introns, and control signals. Extensively tested on sequences in the 10-kb size range containing known genes of up to 10 exons, gm generates complete and correct analyses that show all possible alternative splicing patterns. Run times for such analyses on a Sun 3/60 workstation range from less than 1 min to about 45 min, depending on the stringency of the search parameters. Current effort is focused on implementing procedures for analyzing sequences that contain only partial genes and on implementing a more efficient algorithm for first-pass analysis using low-stringency parameters.
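Producing all possible alternative splicing patterns amounts to enumerating every chain of candidate exons that fits together into an uninterrupted reading frame. The C sketch below assumes hypothetical inputs: candidate exons already located by the splice-site and start-site searches, each described by an illustrative `Exon` record, an assumed minimum intron length (`MIN_INTRON`, 60 bases), and a simple reading-frame phase rule. The exhaustive depth-first enumeration shown is one straightforward approach, not a description of gm's internal algorithm.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical candidate-exon record; the array is sorted by start. */
typedef struct {
    long start, end;  /* 0-based, inclusive coordinates */
    int phase_in;     /* codon position at the exon's 5' end (0, 1, or 2) */
    int is_first;     /* exon begins at a translational start site */
    int is_last;      /* exon ends with a stop codon */
} Exon;

#define MIN_INTRON 60L  /* assumed minimum intron length, in bases */

/* Codon position at the 3' end, from the entry phase and exon length. */
static int phase_out(const Exon *e) {
    return (int)((e->phase_in + (e->end - e->start + 1)) % 3);
}

/* Can exon b follow exon a across an intron without a frameshift? */
static int compatible(const Exon *a, const Exon *b) {
    return !b->is_first &&
           b->start - a->end - 1 >= MIN_INTRON &&
           phase_out(a) == b->phase_in;
}

/* Extend the partial model path[0..depth-1]; return the number of
 * complete models reachable from it, printing each one if print != 0. */
static size_t extend(const Exon *ex, size_t n, size_t *path, size_t depth,
                     int print) {
    const Exon *tail = &ex[path[depth - 1]];
    if (tail->is_last) {
        /* A terminal exon closes the model; it must end on a codon boundary. */
        if (phase_out(tail) != 0) return 0;
        if (print) {
            for (size_t k = 0; k < depth; k++)
                printf("%s[%ld-%ld]", k ? " " : "",
                       ex[path[k]].start, ex[path[k]].end);
            printf("\n");
        }
        return 1;
    }
    size_t count = 0;
    for (size_t j = path[depth - 1] + 1; j < n; j++) {
        if (compatible(tail, &ex[j])) {
            path[depth] = j;
            count += extend(ex, n, path, depth + 1, print);
        }
    }
    return count;
}

/* Count (and optionally print) every gene model that starts with an
 * initial exon in phase 0. path must have room for n indices. */
size_t enumerate_models(const Exon *ex, size_t n, size_t *path, int print) {
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (ex[i].is_first && ex[i].phase_in == 0) {
            path[0] = i;
            total += extend(ex, n, path, 1, print);
        }
    }
    return total;
}
```

Exhaustive enumeration can grow exponentially with the number of candidate exons, which is why low-stringency searches, which admit many more candidates, are the expensive case. Dynamic programming over the same compatibility relation, for instance, can count or score exon chains in time quadratic in the number of candidates without listing each one.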






