Brown University | Center for Computational Molecular Biology

CCMB Distinguished Lectures Series 2008-2009

Events

To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending email to listserv@listserv.brown.edu with the message body "subscribe computational-biology"

CCMB Distinguished Lecture Series
Daphne Koller
Stanford University
Gene Regulation and Individual Genetic Variation: From Networks to Mechanism

Gene expression data of genetically diverse individuals (eQTL data) provide a unique perspective on the effect of genetic variation on cellular pathways, and help identify both cellular mechanisms and polymorphisms with phenotypic effect. However, the large number of possible hypotheses regarding regulatory interactions makes it difficult to correctly determine true regulatory relationships and causal polymorphisms. Intuitively, we have many cues for selecting among the plausible hypotheses: we might favor polymorphisms that are more conserved, that lead to biochemically significant amino acid change, or that reside in genes involved in regulatory functions. But how do we know how much weight to attribute to these different characteristics? This talk describes a novel model, called Lirnet, for identifying regulatory networks from eQTL data. Lirnet automatically learns from eQTL data how to weight regulatory characteristics and induce a regulatory potential for candidate sequence variations. Lirnet assesses these weights simultaneously with learning a regulatory network, finding weights that lead to a more predictive network. Lirnet can flexibly use any regulatory features, including sequence features that are available for any sequenced organism, and automatically learn their weights in a dataset-specific way. This feature, combined with Lirnet's ability to learn the importance of these features automatically, makes it especially advantageous for mammalian systems, where many forms of prior knowledge used in simple model organisms are incomplete or unavailable. We apply Lirnet to eQTL data in yeast, mouse, and human, and provide statistical and biological results demonstrating that Lirnet produces significantly better regulatory programs than other recent approaches, and can also help identify specific causal sequence variation within a large, linked chromosomal region. We also present novel hypotheses suggested by Lirnet in both yeast and mouse. We also describe ways in which Lirnet can be used to help elucidate the relationships between genotype and phenotype.

Wednesday, April 29, 2009
12:00 Noon
CIT Building, Room 241 – SWIG Boardroom

Refreshments will be served at 11:45 am

CCMB Distinguished Lecture Series
Richard K. Wilson
Washington University, School of Medicine
Director, Genome Sequencing Center
Human Genome Sequencing: Disease and Discovery

New technology has recently facilitated the complete sequencing of individual human genomes. As the cost and efficiency of this approach continue to improve, we can envision a powerful new means for the study of genes and other genome elements and mechanisms that underlie cancer and other human diseases. I will discuss some of the discoveries made to date with emerging genome sequencing technologies, and how these methods will allow us to better understand both basic biology and human disease.

Wednesday, April 15, 2009
12:00 Noon
CIT Building, Room 241 – SWIG Boardroom

Hosted by: Benjamin Raphael
Refreshments will be served at 11:45 am

CCMB Distinguished Lecture Series
Isaac S. Kohane
Harvard Medical School
Chair, Informatics Program
Why Information Science and Information Technology in the Genomic Era is Central to the NIH’s 2009 "Stimulus" Efforts

Large numbers of subjects are needed to obtain reproducible results relating disease characteristics to rare events or weak effects such as those measured for common genetic variants. Similarly large numbers are required to identify adverse events in currently marketed pharmaceuticals, identify new constellations of disease, and measure efficacy and quality in healthcare. Addressing the challenge of studying these large numbers will require use of information technology in ways that recognize the centrality of information processing at the heart of healthcare and biomedical research. This will be illustrated by reviewing our experience in three domains: a) genomic and pharmacovigilance studies of the National Center for Biomedical Computing entitled “Informatics for Integrating Biology and the Bedside” (i2b2), b) mining the Internet for just-in-time public health intelligence (e.g. Healthmap.org), and c) using personal health records to allow patients greater autonomy in healthcare and greater participation in and benefit from the research on their own data and biomaterials. These cases will illustrate why the recently announced NIH funding arising from the American Recovery & Reinvestment Act of 2009 has these technologies at its core.

Wednesday, April 1, 2009
4:00 pm
CIT Building, Room 241 – SWIG Boardroom

Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm

CCMB Distinguished Lecture Series
Anders Krogh
University of Copenhagen
The Bioinformatics Centre
Department of Biology
Fast searching of DNA sequences with position weight matrices applied to next-generation sequencing data

Several next-generation sequencing techniques produce very large numbers of short sequences (reads), which need to be mapped to a genome. Most existing methods use fast indexing and match the reads with up to N mismatches, where N=2 is typical. However, often a sequence read comes with a quality score for each nucleotide, which can be translated to a probability of error. In this talk, I describe how such probabilities can be used directly in the search through position weight matrices, and a data structure that makes it feasible to use the weight matrices in the search. Finally, I show results on simulated data.
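
As a rough illustration of the idea in this abstract, the sketch below turns per-base quality scores into a position weight matrix for a read and scores genome windows against it. It is not the speaker's method or data structure: it assumes Phred-scaled qualities (error probability 10^(-Q/10)), spreads the error probability evenly over the three other bases, and scans the genome by brute force. All function names are hypothetical.

```python
import math

BASES = "ACGT"

def phred_to_error(q):
    """Convert a Phred quality score to an error probability (assumed scaling)."""
    return 10 ** (-q / 10)

def read_to_pwm(read, quals):
    """Build one probability column per read position.

    The called base gets probability 1 - p_err; the error mass is split
    evenly across the other three bases (a simplifying assumption).
    """
    pwm = []
    for base, q in zip(read, quals):
        p_err = phred_to_error(q)
        column = {b: p_err / 3 for b in BASES}
        column[base] = 1 - p_err
        pwm.append(column)
    return pwm

def log_score(pwm, window):
    """Log-probability of a genome window under the read's weight matrix."""
    return sum(math.log(col[b]) for col, b in zip(pwm, window))

def best_match(pwm, genome):
    """Brute-force scan; returns (best log score, start position)."""
    n = len(pwm)
    scores = [(log_score(pwm, genome[i:i + n]), i)
              for i in range(len(genome) - n + 1)]
    return max(scores)
```

With this scoring, a mismatch at a low-quality position costs far less than one at a high-quality position, which a fixed "up to N mismatches" rule cannot express.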

Wednesday, March 18, 2009
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Charles Lawrence
Refreshments will be served at 3:45 pm

CCMB Distinguished Lecture Series
David Mathews
University of Rochester Medical Center
The Statistical Mechanics of RNA Structure Prediction

This talk will introduce the importance of RNA and the use of partition functions to predict the ensemble behavior of RNA structure formation. RNA secondary structure is the set of canonical base pairs (A-U, G-C, and G-U) in the structure. A set of nearest neighbor parameters, derived from experiments, exists for predicting the stability, as measured by free energy change, of a given secondary structure. The nearest neighbor parameters can be used in conjunction with dynamic programming algorithms to find the lowest free energy structure or the probabilities of all possible pairs in the folding ensemble.

I will discuss our recent work with predicting structures that maximize expected accuracy, where expected accuracy is defined as the sum of the base pairing probabilities for pairs and the single-stranded probabilities for unpaired nucleotides. Maximizing expected accuracy improves the quality of structure prediction.
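
The definition of expected accuracy given above can be sketched as a small dynamic program. This is an illustrative toy, not the speaker's implementation: it assumes the base pairing probabilities have already been computed (for example from a partition function) and are supplied as a dictionary, it counts each pair once with no relative weighting between paired and unpaired terms, and it allows only nested pairs. The function name and input format are hypothetical.

```python
from functools import lru_cache

def max_expected_accuracy(pair_prob, n):
    """Find the nested structure with the highest expected accuracy.

    pair_prob: dict mapping (i, j) with i < j (0-based) to a pairing probability.
    n: sequence length.
    Returns (score, pairs), where score is the sum of pairing probabilities
    for chosen pairs plus single-stranded probabilities for unpaired positions.
    """
    # Single-stranded probability: 1 minus the total pairing probability.
    unpaired = [1.0] * n
    for (i, j), p in pair_prob.items():
        unpaired[i] -= p
        unpaired[j] -= p

    @lru_cache(maxsize=None)
    def best(i, j):
        # Best score and pair list for positions i..j inclusive.
        if i > j:
            return 0.0, ()
        # Option 1: position i stays unpaired.
        score, pairs = best(i + 1, j)
        result = (score + unpaired[i], pairs)
        # Option 2: position i pairs with some k, splitting the interval.
        for k in range(i + 1, j + 1):
            p = pair_prob.get((i, k), 0.0)
            if p == 0.0:
                continue
            inner_score, inner_pairs = best(i + 1, k - 1)
            outer_score, outer_pairs = best(k + 1, j)
            candidate = p + inner_score + outer_score
            if candidate > result[0]:
                result = (candidate, ((i, k),) + inner_pairs + outer_pairs)
        return result

    return best(0, n - 1)
```

For a six-nucleotide toy input with probable pairs (0, 5) and (1, 4), the function selects both pairs and leaves positions 2 and 3 single-stranded. A production method would likely add constraints such as a minimum hairpin loop size, which are omitted here.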

I will also discuss our recent work with predicting effective siRNA sequences using a full equilibrium approach. An siRNA can silence the expression of messenger RNA by hybridizing to the target and directing cleavage via the RNA interference pathway. Here we use the statistical mechanics of hybridization to select sequences that are most likely to lead to effective silencing of a given message.

Wednesday, October 15th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Charles Lawrence
Refreshments will be served at 3:45 pm

CCMB Distinguished Lecture Series
Sean Eddy
Janelia Farm
Howard Hughes Medical Institute
HMMER: a new generation of homology search software

Abstract: Database homology searching may be the most important application in computational molecular biology, and since the 1990s, BLAST has been our main workhorse. Since BLAST's introduction, theoretical advances have been made in applying full probabilistic inference to homology searches by using hidden Markov model (HMM) approaches. These methods have been deployed in some important niches, notably in protein domain analysis (as in the Pfam and SMART databases). More general adoption has been limited by the fact that the popular HMM implementations (including my HMMER software) are slow; they use dynamic programming algorithms without heuristic acceleration, which results in running times comparable to Smith/Waterman as opposed to BLAST. I will describe progress on HMMER3, a new generation of HMMER that aims to more fully deploy probabilistic inference technology on homology searches, while at the same time attaining BLAST's speed. I will describe HMMER3's statistical inference framework, its probabilistic model of local sequence alignment, new statistical theory for log-likelihood ratio scores summed over all alignments that extends Karlin/Altschul theory for optimal alignment scores, and an implementation of HMMER3's core algorithms that has accelerated HMMER3 200-fold relative to HMMER2. HMMER3's prototypes are currently faster than WU-BLAST, while being more sensitive than HMMER2.

Wednesday, October 1st, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom