GENOMIC SEQUENCING 
George M. Church, Ph.D., Assistant Investigator 
Several laboratories are sequencing small genomes
(1-15 Mbp) from each phylogenetic kingdom.
Comparisons of these sequences will define
consensuses for most classes of protein domains and
reveal patterns of evolutionary conservation and change.
Substitution rates up to 20-fold higher in
nonconserved than in coding nucleotides allow the
discrimination of random open reading frames from
those encoding proteins that confer a selective
advantage. Operon and regulon gene organization can
reveal physiological relationships and hence
contribute clues to possible functions for the newly
discovered genes. The genome closest to completion
is Escherichia coli, with 20% of its 4.7 Mbp
completed by 2,000 biologists. Toward the goal of
completing the E. coli and Salmonella typhimurium
genome sequences, over 1,700 films have been
produced in the past year by the methods described
below (see also Science 240:185-188). About 10%
have been digitally scanned and proofread using
the sequence-reading and assembly software
REPLICA. In collaboration with Drs. Ken Rudd
(Food and Drug Administration) and Jim Ostell
(National Library of Medicine) and co-workers, an
E. coli genetic map/restriction map/DNA sequence
database has been created.
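The use of substitution rates to separate random open reading frames from
protein-coding ones can be illustrated with a minimal sketch. It assumes two
pre-aligned sequences of the same region from related genomes and a background
substitution rate measured elsewhere in nonconserved sequence. The function
names and the 5-fold threshold are hypothetical choices for illustration and
are not taken from the work described here.

```python
def substitution_rate(seq_a: str, seq_b: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    return differing / compared


def looks_conserved(orf_a: str, orf_b: str, background_rate: float,
                    fold_threshold: float = 5.0) -> bool:
    """True if the ORF changes at least `fold_threshold` times more slowly
    than the background rate (the threshold is an illustrative assumption)."""
    return substitution_rate(orf_a, orf_b) * fold_threshold <= background_rate


# Example: one difference in ten aligned positions, background rate of 0.6.
print(looks_conserved("ACGTACGTAC", "ACGTACGTAA", background_rate=0.6))
```

An open reading frame whose substitution rate is far below the background
would be a candidate for encoding a selected protein; a real analysis would
also need to account for alignment quality and sequence length.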
I. New DNA-sequencing Methods. 
In multiplex DNA sequencing, 480 sequencing
reaction sets, each tagged with specific
oligonucleotides, are run on a single gel in 12 pools
of 40 and transferred to a membrane. Up to 75 such
membranes are hybridized simultaneously. The resulting
sequence images are digitized, and sequence
interpretations are superimposed on the enhanced
two-dimensional images for editing. The computer
program (REPLICA) uses internal standards from
multiplexing to establish lane alignment and
lane-specific reaction rules by discriminant analysis.
To facilitate decision making, images with overlapping
data can be viewed side by side: the program integrates
automatic base-assignment routines with high-resolution
image display and interactive multisequence alignments.
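The multiplexing figures quoted above (12 pools of 40 tagged reaction sets per
gel, up to 75 membranes hybridized together) imply the following throughput.
This is a small arithmetic sketch only. The class and property names are
hypothetical, and it assumes that every membrane carries a full gel and that
one tag-specific probe reveals one reaction set per pool on each membrane.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiplexRun:
    """Throughput arithmetic for the multiplex figures quoted in the text."""
    pools_per_gel: int = 12        # pools run on a single gel
    tags_per_pool: int = 40        # tagged reaction sets mixed in each pool
    membranes_per_batch: int = 75  # membranes hybridized simultaneously

    @property
    def reaction_sets_per_gel(self) -> int:
        return self.pools_per_gel * self.tags_per_pool

    @property
    def reaction_sets_per_batch(self) -> int:
        # Assumption: each membrane in the batch carries a full gel.
        return self.reaction_sets_per_gel * self.membranes_per_batch

    @property
    def reaction_sets_per_probing(self) -> int:
        # Assumption: one tag-specific probe reveals one reaction set
        # per pool on each membrane.
        return self.pools_per_gel * self.membranes_per_batch


run = MultiplexRun()
print(run.reaction_sets_per_gel)      # 12 * 40
print(run.reaction_sets_per_batch)    # 480 * 75
print(run.reaction_sets_per_probing)  # 12 * 75
```

Under these assumptions, 40 successive probings would be needed to read every
reaction set on a batch of membranes.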
II. Specific Genomic Regions. 
Genomic regions of special interest can in many
cases be easily included in large-scale projects.
These regions have been selected by genetic
complementation, DNA-protein interactions, or
subtractive hybridization. The largest contiguous
sequence obtained by the above methods is 18 kbp
(from 140 kbp of raw data) covering the Salmonella
cobalamin biosynthesis operon, originally selected by
complementation (in collaboration with Dr. John Roth).
Regions of putative DNA-protein interactions have
been cloned en masse by virtue of differential
enzyme cutting and then sequenced. One site studied
in detail appears to be responsive to pyrimidines.
Those few regions that are conserved between two
distant genomes, or that differ between two closely
related genomes, may be selected out of the whole by a
few rounds of differential hybridization. Work has
begun on the genomic sequence differences
between pathogenic and nonpathogenic strains of
E. coli and Shigella (in collaboration with Drs.
Donald Straus and Fred Ausubel).
Dr. Church is also Assistant Professor of Genetics 
at the Harvard Medical School. 
