NC State University

Department of Computer Science Colloquia 2001-2002

** Cross-listing a speaker from the Bioinformatics Seminar Series **

Date: Tuesday, November 6, 2001
Time: 12:00 noon (talk)
Place: Toxicology Conf. Room 304, Toxicology Building, NCSU Centennial Campus

Speaker: Robert Mau, University of Wisconsin-Madison

Strategy for multiple whole-genome alignment of several E. coli and Y. pestis strains

Abstract: We present a time- and space-efficient algorithm that aligns large sub-intervals of DNA sequence from closely related organisms into segments we call backbone. Backbone represents sequence that can putatively be identified with the genome of the most recent common ancestor of the aligned organisms.

The method is ideally suited for aligning a group of genomes that share large regions of high sequence similarity punctuated by numerous instances of horizontally transferred sequence. Not surprisingly, this fits the profile of Enterobacteriaceae, the primary focus of our sequencing effort. Our methodology automatically identifies and places large relative inversions, translocations, and inverted translocations. A recent implementation has successfully aligned five strains of E. coli: K-12 MG1655, K-12 W3110, O157:H7 EDL933, O157:H7 Sakai (enterohemorrhagic E. coli), and CFT073 (uropathogenic E. coli). The prototype software finished in under 5 CPU minutes and used less than 7 MB of memory. The key is a novel representation of the genomic match coordinates that partitions maximal matches into disjoint groups called Canonical Maximal Exact Match (MEM) Equivalence Classes.
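For readers unfamiliar with the term, a maximal exact match (MEM) between two sequences is an identical substring shared by both that cannot be extended one character to the left or right without a mismatch. The sketch below is only an illustration of that definition, not the speaker's method: it is a naive quadratic scan over the forward strand (no reverse complements, so inversions are not detected), and the function name `maximal_exact_matches` and the `min_len` threshold are hypothetical. Practical whole-genome tools instead find MEMs with suffix-tree or suffix-array indexes.

```python
def maximal_exact_matches(a: str, b: str, min_len: int) -> list[tuple[int, int, int]]:
    """Naive sketch (hypothetical helper): return (start_in_a, start_in_b, length)
    for every maximal exact match of at least min_len characters between a and b.
    Forward strand only; runs in O(len(a) * len(b) * match_length) time."""
    matches = []
    for i in range(len(a)):
        for j in range(len(b)):
            # A match must start on equal characters.
            if a[i] != b[j]:
                continue
            # Left-maximal: skip if the match could be extended one step to the left.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend right until a mismatch or the end of either sequence,
            # which makes the match right-maximal.
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k >= min_len:
                matches.append((i, j, k))
    return matches


if __name__ == "__main__":
    print(maximal_exact_matches("GATTACA", "ATTAC", 3))  # [(1, 0, 5)]
```

Note that repeated sequence yields many overlapping MEMs (for example, `"AAAA"` against `"AAAA"`), which is one reason a scheme for grouping matches into equivalence classes is useful at genome scale.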

We sketch how to refine this "rough draft" into a full-fledged multiple alignment. Furthermore, we describe how some not-so-minor modifications will allow the basic method to be extended to more divergent genomes.

Short Bio: Robert Mau received his Ph.D. in Statistics from the University of Wisconsin-Madison in 1996. He is currently an assistant scientist at the E. coli Genome Project. His research topics include whole-genome alignment of related bacteria, applications of modified suffix arrays, phylogenetic inference, and applications of Markov chain Monte Carlo theory.

Host: J. Thorne, Department of Statistics, NCSU