Archived Talks
University of Pittsburgh Department of Biomedical Informatics Lecture Series
“Application of Signal Processing and Language Processing to Bioinformatics and Biomedical Informatics”
Speaker: Madhavi Ganapathiraju, PhD
Carnegie Mellon University
Friday, October 12, 2007
11:00 AM – 12:00 NOON
Room M-184 VALE
200 Meyran Avenue
Abstract: In this talk I will introduce example applications of signal and language processing in their “home areas”. Then I will present how mature algorithms from these areas, together with machine learning, can be applied to knowledge discovery and information extraction from biological sequences. As examples, two unrelated applications of this approach in bioinformatics will be presented: (1) genome signature discovery and (2) transmembrane segment prediction in protein sequences.
I will describe the Biological Language Modeling Toolkit (http://www.cs.cmu.edu/~blmt/source/), which I developed for efficient processing of genome and proteome sequence data. “N-gram” comparisons across genomes using this toolkit showed that the language of biological sequences differs from organism to organism, and led to the identification of genome signatures. Next, we addressed the question of what constitutes a “word” or a “vocabulary” in the language of protein sequences. Conclusions from this study, combined with a text-summarization approach and a wavelet signal-processing technique, were applied to a biologically important but technically challenging and unsolved problem: predicting transmembrane segments. Drawing on my background in a novel way, I developed an algorithm called TMpro (http://linzer.blm.cs.cmu.edu/tmpro/), which reduced the prediction error rate by 30–50% compared with previous state-of-the-art methods. The protein sequence analysis method itself, decomposing a primary sequence into its property sequence for analysis, was also novel; following our work, others have already applied it to protein remote homology detection and protein classification.
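To make the idea of an “n-gram comparison across genomes” concrete, the sketch below counts overlapping n-grams in two sequences, normalizes them to frequencies, and compares the resulting profiles. It is a minimal illustration under stated assumptions, not the Biological Language Modeling Toolkit or its API: the function names are hypothetical, the toy sequences are made up, and cosine similarity is a swapped-in comparison measure chosen for simplicity (the abstract does not say which measure was used).

```python
from collections import Counter
from math import sqrt


def ngram_counts(seq: str, n: int) -> Counter:
    """Count overlapping n-grams (substrings of length n) in a sequence.

    Hypothetical helper for illustration; not part of any real toolkit.
    """
    seq = seq.upper()
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def ngram_frequencies(seq: str, n: int) -> dict:
    """Normalize n-gram counts to relative frequencies (empty if no n-grams)."""
    counts = ngram_counts(seq, n)
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()} if total else {}


def cosine_similarity(p: dict, q: dict) -> float:
    """Cosine similarity between two sparse frequency profiles.

    Assumption: cosine similarity stands in for whatever comparison
    measure a real genome-signature analysis would use.
    """
    dot = sum(p[k] * q[k] for k in set(p) & set(q))
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


if __name__ == "__main__":
    # Toy sequences (made up): one GC-rich, one AT-rich.
    genome_a = "ATGCGCGCATATGCGCGGCC"
    genome_b = "ATATATATTTAAATATTAAT"
    profile_a = ngram_frequencies(genome_a, 3)
    profile_b = ngram_frequencies(genome_b, 3)
    print(f"trigram profile similarity: {cosine_similarity(profile_a, profile_b):.3f}")
```

On real genomes, a profile that stays similar across many regions of one organism but differs from other organisms' profiles is what would make it usable as a signature; a toolkit built for whole genomes would also need a far more memory-efficient counting strategy than an in-memory `Counter`.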
If time permits, I will discuss examples in Biomedical Informatics, which can be studied using signal, language and informatics approaches.
For more information: jxc3@pitt.edu or 412.647.7113