CCMB Distinguished Lecture Series 2008-2009
To receive CCMB seminar announcements by email,
sign up for the computational biology mailing list
by sending email to listserv@listserv.brown.edu with
the message body "subscribe computational-biology"
CCMB
Distinguished Lecture Series |
Daphne Koller
Stanford University
Gene Regulation and Individual Genetic Variation:
From Networks to Mechanism |
|
Gene expression data of genetically diverse individuals (eQTL data) provide a
unique perspective on the effect of genetic variation on cellular pathways,
and help identify both cellular mechanisms and polymorphisms with phenotypic
effect. However, the large number of possible hypotheses regarding
regulatory interactions makes it difficult to correctly determine true regulatory
relationships and causal polymorphisms. Intuitively, we have many cues for
selecting among the plausible hypotheses: we might favor polymorphisms that
are more conserved, that lead to biochemically significant amino acid change,
or that reside in genes involved in regulatory functions.
But how do we know how much weight to attribute to these different
characteristics? This talk describes a novel model, called Lirnet, for
identifying regulatory networks from eQTL data. Lirnet automatically learns
from eQTL data how to weight regulatory characteristics and induce a
regulatory potential for candidate sequence variations. Lirnet assesses these
weights simultaneously with learning a regulatory network, finding weights that
lead to a more predictive network. Lirnet can flexibly use any regulatory
features, including sequence features that are available for any sequenced
organism, and automatically learn their weights in a dataset-specific way.
This feature, combined with Lirnet's ability to learn the importance of these
features automatically, makes it especially advantageous for mammalian
systems, where many forms of prior knowledge used in simple model
organisms are incomplete or unavailable. We apply Lirnet to eQTL data in
yeast, mouse, and human, and provide statistical and biological results
demonstrating that Lirnet produces significantly better regulatory programs
than other recent approaches, and can also help identify specific causal
sequence variation within a large, linked chromosomal region. We also
present novel hypotheses suggested by Lirnet in both yeast and mouse, and
describe ways in which Lirnet can be used to help elucidate the
relationships between genotype and phenotype.
Wednesday,
April 29, 2009
12:00 Noon
CIT Building, Room 241 – SWIG Boardroom
Refreshments will be served at 11:45 am
CCMB
Distinguished Lecture Series |
Richard K. Wilson
Washington University, School of Medicine
Director, Genome Sequencing Center
Human Genome Sequencing: Disease and Discovery |
|
New technology has recently facilitated the complete sequencing of individual
human genomes. As the cost and efficiency of this approach continue to
improve, we can envision a powerful new means for the study of genes and
other genome elements and mechanisms that underlie cancer and other human
diseases. I will discuss some of the discoveries made to date with emerging
genome sequencing technologies, and how these methods will allow us to
better understand both basic biology and human disease.
Wednesday,
April 15, 2009
12:00 Noon
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Benjamin Raphael
Refreshments will be served at 11:45 am
CCMB
Distinguished Lecture Series |
Isaac S. Kohane
Harvard Medical School
Chair, Informatics Program
Why Information Science and Information Technology in the Genomic Era is Central to the NIH’s 2009 "Stimulus" Efforts |
|
Large numbers of subjects are needed to obtain reproducible results relating disease characteristics to rare events or weak effects such as those measured for common genetic variants. Similarly large numbers are required to identify adverse events in currently marketed pharmaceuticals, identify new constellations of disease, and measure efficacy and quality in healthcare. Addressing the challenge of studying these large numbers will require use of information technology in ways that recognize the centrality of information processing at the heart of healthcare and biomedical research. This will be illustrated by reviewing our experience in three domains: a) genomic and pharmacovigilance studies of the National Center for Biomedical Computing entitled “Informatics for Integrating Biology and the Bedside” (i2b2); b) mining the Internet for just-in-time public health intelligence (e.g., Healthmap.org); and c) using personal health records to allow patients greater autonomy in healthcare and greater participation in, and benefit from, the research on their own data and biomaterials. These cases will illustrate why the recently announced NIH funding arising from the American Recovery & Reinvestment Act of 2009 has these technologies at its core.
Wednesday,
April 1, 2009
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm
CCMB
Distinguished Lecture Series |
Anders Krogh
University of Copenhagen
The Bioinformatics Centre
Department of Biology
Fast searching of DNA sequences with position weight matrices applied to next-generation sequencing data |
|
Several next-generation sequencing techniques produce very large numbers of short sequences (reads), which need to be mapped to a genome. Most existing methods use fast indexing and match the reads with up to N mismatches, where N=2 is typical. However, a sequence read often comes with a quality score for each nucleotide, which can be translated to a probability of error. In this talk, I describe how such probabilities can be used directly in the search through position weight matrices, and a data structure that makes it feasible to use the weight matrices in the search. Finally, I show results on simulated data.
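As a rough illustration of the idea in this abstract (not the speaker's implementation), the sketch below turns a read and its per-base quality scores into a position weight matrix and scores a genome window against it. It assumes Phred-scaled qualities, where the error probability is 10^(-Q/10), and assumes that an error is spread evenly over the three other bases. The function names are hypothetical, and the fast search data structure mentioned in the abstract is not shown.

```python
import math

BASES = "ACGT"

def phred_to_error(q):
    """Error probability for a Phred-scaled quality score (assumed encoding)."""
    # Cap at 0.75 so the called base is never less likely than the others
    # and every matrix entry stays strictly positive.
    return min(10 ** (-q / 10), 0.75)

def read_to_pwm(read, quals):
    """One column per read position: P(true base = b | called base, quality)."""
    pwm = []
    for base, q in zip(read, quals):
        e = phred_to_error(q)
        # Assumption: an error is equally likely to be any of the 3 other bases.
        pwm.append({b: (1 - e) if b == base else e / 3 for b in BASES})
    return pwm

def log_score(pwm, window):
    """Log-probability of a genome window under the read's weight matrix."""
    return sum(math.log(col[b]) for col, b in zip(pwm, window))
```

With this scoring, a mismatch at a low-quality position costs less than a mismatch at a high-quality position, which a fixed "up to N mismatches" rule cannot express.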
Wednesday,
March 18, 2009
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Charles Lawrence
Refreshments will be served at 3:45 pm
CCMB
Distinguished Lecture Series |
David Mathews
University of Rochester Medical Center
The Statistical Mechanics of RNA Structure Prediction |
|
This talk will introduce the importance of RNA
and the use of partition functions to predict the
ensemble behavior of RNA structure formation. RNA
secondary structure is the set of canonical base
pairs (A-U, G-C, and G-U) in the structure. A set
of nearest neighbor parameters, derived from experiments,
exists for predicting the stability, as measured
by free energy change, of a given secondary structure.
The nearest neighbor parameters can be used in
conjunction with dynamic programming algorithms
to find the lowest free energy structure or the
probabilities of all possible pairs in the folding
ensemble.
I will discuss our recent work with predicting
structures that maximize expected accuracy, where
expected accuracy is defined as the sum of the
base pairing probabilities for pairs and the single-stranded
probabilities for unpaired nucleotides. Maximizing
expected accuracy improves the quality of structure
prediction.
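The definition of expected accuracy given above can be sketched in a few lines. This is only an illustration of the stated definition, not the speaker's software: the pair probabilities would normally come from a partition function calculation, whereas here they are a toy dictionary, and the function name is hypothetical.

```python
def expected_accuracy(pair_prob, pairs, n):
    """Sum of pairing probabilities for the structure's pairs plus
    single-stranded probabilities for its unpaired nucleotides.

    pair_prob: dict mapping (i, j) with i < j to the pairing probability
    pairs:     list of (i, j) base pairs in the candidate structure
    n:         sequence length (positions are 0-based)
    """
    paired = set()
    score = 0.0
    for i, j in pairs:
        score += pair_prob.get((i, j), 0.0)
        paired.update((i, j))
    for k in range(n):
        if k not in paired:
            # Probability that k is single-stranded = 1 - P(k pairs with anything).
            p_paired = sum(p for (i, j), p in pair_prob.items() if k in (i, j))
            score += 1.0 - p_paired
    return score
```

For example, with toy probabilities {(0, 3): 0.8, (1, 2): 0.1} on a four-nucleotide sequence, the structure containing only the pair (0, 3) scores higher than the fully unpaired structure, so it would be preferred under this criterion.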
I will also discuss our recent work with predicting
effective siRNA sequences using a full equilibrium
approach. An siRNA can silence the expression of
messenger RNA by hybridizing to the target and directing
cleavage via the RNA interference pathway. Here
we use the statistical mechanics of hybridization
to select sequences that are most likely to lead
to effective silencing of a given message.
Wednesday,
October 15th, 2008
4:00 pm
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Charles Lawrence
Refreshments will be served at 3:45 pm
CCMB
Distinguished Lecture Series |
Sean Eddy
Janelia Farm
Howard Hughes Medical Institute
HMMER: a new generation of homology search software |
|
Abstract: Database
homology searching may be the most important application
in computational molecular biology, and since the
1990s, BLAST has been our main workhorse. Since
BLAST's introduction, theoretical advances have
been made in applying full probabilistic inference
to homology searches by using hidden Markov model
(HMM) approaches. These methods have been deployed
in some important niches, notably in protein domain
analysis (as in the Pfam and SMART databases).
More general adoption has been limited by the fact
that the popular HMM implementations (including
my HMMER software) are slow; they use dynamic programming
algorithms without heuristic acceleration, which
results in running times comparable to Smith/Waterman
as opposed to BLAST. I will describe progress on
HMMER3, a new generation of HMMER that aims to
more fully deploy probabilistic inference technology
on homology searches, while at the same time attaining
BLAST's speed. I will describe HMMER3's statistical
inference framework, its probabilistic model of
local sequence alignment, new statistical theory
for log-likelihood ratio scores summed over all
alignments that extends Karlin/Altschul theory
for optimal alignment scores, and an implementation
of HMMER3's core algorithms that has accelerated
HMMER3 200-fold relative to HMMER2. HMMER3's prototypes
are currently faster than WU-BLAST, while being
more sensitive than HMMER2.
Wednesday,
October 1st, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
|