CCMB Seminar Series 2007-2008
_______________________________________________________ Events
CCMB Seminar Series
Richard Watson
Natural Systems Group
Southampton University
Compositional Evolution:
Alternative algorithms underlying natural evolutionary adaptation
Abstract:
Darwin's theory of evolution can be understood as a simple algorithm, a formal step-by-step
procedure or mechanism, that produces adaptation in biological systems. Computer
science studies a broad range of algorithms and seeks to understand which algorithms
work best for which types of problems. Is the algorithm that Darwin described the
only algorithm relevant to natural evolution, and if there are others, can they
solve adaptive problems that are not solvable with Darwin's model?
It is well known
that there have been numerous events of horizontal gene transfer, important cases
of genetic encapsulation and symbiogenesis, and occasional 'major evolutionary transitions'
where "entities that were capable of independent replication before the transition
can replicate only as part of a larger whole after the transition" (Maynard Smith
and Szathmary). It has been suggested before that these events present a challenge
to Darwinian gradualism because the results of these events are not small genetic
changes, but this somewhat misses the important point. What is algorithmically interesting
about these phenomena is not that the genetic changes are large rather than small,
but that the genetics involve the union of one genetic lineage with another lineage
evolved in parallel rather than the sequential modification of a single lineage
that Darwin described. Using algorithmic concepts we can understand why it makes
a difference to evolve things in parallel and then bring them together, rather than
evolve things sequentially in a single lineage. In computational terms the difference
couldn't be more fundamental: Darwinian linear incremental improvement is analogous
to 'stochastic local search', a very basic form of optimisation, but 'compositional'
mechanisms, as I refer to them, are analogous to 'divide and conquer' optimisation,
a fundamentally different class of algorithm based on problem decomposition.
Wednesday, October 31st, 2007
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom
Hosted by: Daniel Weinreich
CCMB Seminar Series
Daniel Weinreich, Ph.D.
Ecology and Evolutionary Biology
Center for Computational Molecular Biology
Brown University
The Combinatorics of Molecular Evolution
Abstract:
The number of mutational trajectories between ancestral and derived sequences grows
doubly-exponentially with gene length. I hypothesize however that the formal properties
of fitness landscapes impose constraints on the subset of trajectories that may
simultaneously be selectively accessible, even absent any biological data. The permutahedron
is a discrete mathematical object that represents the space of all possible mutation
trajectories and thus provides a framework in which to explore these constraints
and the properties that unite selectively accessible trajectories.
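As a rough illustration of the counting involved (a sketch, not the speaker's analysis): if a derived sequence differs from its ancestor at n sites, the single mutations can be fixed in n! different orders, and an order can be called selectively accessible when fitness increases at every step. The function name and the toy fitness values below are hypothetical.

```python
from itertools import permutations


def accessible_trajectories(n_sites, fitness):
    """Return the mutation orderings along which fitness strictly increases.

    `fitness` maps a frozenset of mutated sites to a number. This encoding
    is an assumption made for the example, not something from the abstract.
    """
    accessible = []
    for order in permutations(range(n_sites)):
        genotype = frozenset()
        increasing = True
        for site in order:
            next_genotype = genotype | {site}
            if fitness[next_genotype] <= fitness[genotype]:
                increasing = False
                break
            genotype = next_genotype
        if increasing:
            accessible.append(order)
    return accessible


# Hypothetical two-site landscape with sign epistasis: mutation 1 is
# deleterious on its own but beneficial once mutation 0 is present.
TOY_FITNESS = {
    frozenset(): 1.0,
    frozenset({0}): 1.2,
    frozenset({1}): 0.9,
    frozenset({0, 1}): 1.5,
}

if __name__ == "__main__":
    print(accessible_trajectories(2, TOY_FITNESS))
```

In this toy case only one of the two possible orderings is accessible, which is the kind of constraint the abstract refers to.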
Wednesday, September 26th, 2007
4.00pm, CIT Bldg, Room 241 SWIG Boardroom
CCMB
Seminar Series
Ward
Wheeler, Ph.D.
American Museum of Natural History
New York, NY
"Kolmogorov Complexity,
Phylogenetic Analysis,
and the Unity of Systematic Methods"
Abstract:
Ideas of computational complexity are applied to
phylogenetic analysis, yielding a new, more
fundamental optimality criterion. This criterion,
Minimum Descriptive Length, has far-reaching
implications for systematics and the context
in which we view other optimality criteria such
as parsimony, likelihood, and posterior probability.
Wednesday, December 5th, 2007
4:00 p.m.
CIT Bldg, Room 241 - SWIG Boardroom
Host: David Rand
CCMB
Seminar Series
Zhijin Wu, Ph.D.
Center for Statistical Sciences
Center for Computational Molecular Biology
Brown University
Expanding the Dynamic
Range of Gene Expression Measures on DNA Microarrays
Abstract:
DNA microarray technology has been widely used
in measuring gene expression levels in biomedical
research. The potential ability of monitoring tens
of thousands of genes simultaneously makes the
microarray approach an efficient tool. However,
the raw measurements are noisy. The data is a combination
of specific binding and background noise. A considerable
proportion of genes appear constantly expressed
because the variation in signal is small and seemingly
within the variation of background noise alone.
The dynamic range of the gene expression level
is thus limited to highly expressed genes. This
is a crucial limit because often more than half
of the genes in an experiment do not reach the
level at which differential expression can be reliably
detected. Our previous work has shown that probes
have sequence-specific background levels and that
probe-specific background adjustment can improve the
dynamic range. Since a large number of observations
on the same probe is usually not available in a
single study, information is borrowed across probes
within each sample to estimate the probe-specific
background. The accumulation of microarray
data in public repositories has enabled us to
approach the problem from a different angle. With
a large number of samples, we now observe data
on the same probe across a large variety of experimental
conditions. We start with a database of human gene
expression arrays of 500 samples. Based on the
expression profile of each gene across experiments,
we evaluate each probe's tendency to non-specific
binding and ability to track target expression
variation. We demonstrate further extension of
the dynamic range for gene expression measures.
Wednesday,
January 23rd, 2008
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom
CCMB
Seminar Series
Nathan Edwards
Georgetown University, Washington DC
Improving the Reliability
of Peptide Identification by Tandem Mass Spectrometry
for Clinical Proteomics and Genome Association
Abstract: Peptide
identification by tandem mass spectrometry is the
dominant proteomics workflow for protein characterization
in complex samples. Coupled with well established
high-throughput proteomics workflows, tandem mass
spectrometry search engines make identifying the
major constituent proteins in clinical samples
straightforward. Driven by increasingly sensitive
protein chemistry protocols and mass spectrometers,
and a new perspective on the importance of alternative
splicing and coding SNP protein isoforms, however,
the shortcomings of the existing tools are becoming
more and more apparent.
We use a variety of
computational techniques to improve the reliability
of peptide identification analyses, as we seek
to address the limitations of current tandem mass
spectrometry search tools. First, we aggressively
enumerate an inclusive set of potential peptide
sequences from transcript evidence, particularly
ESTs, to ensure that evidence of novel, unexpected,
or unannotated protein isoforms is not missed by
tandem mass spectrometry search engines. We use
a novel compression technique to ensure that the
resulting sequence database can be searched quickly
and easily using existing tools, and demonstrate
that novel peptides, representing coding SNPs,
alternative splicing, and novel mutations, can
be observed in publicly available datasets. Second,
we apply hidden Markov models to spectral matching
of tandem mass spectra of previously identified
peptides, improving on the sensitivity and specificity
of peptide identification by sequence database
search engines and traditional spectral matching
techniques. Lastly, we post-process search engine
peptide identification results using an unsupervised,
model-free, result-combining machine-learning approach
that achieves better sensitivity and specificity
than either result combining or machine learning
alone. Using this technique on datasets derived
from standard protein mixtures, we demonstrate
that the performance of the commercial search engine
Mascot can be bested by combining the results of
two open-source search engines, X!Tandem and OMSSA;
but that using all three search engines is better
still.
Such a reliable, sensitive, and specific peptide
identification analysis platform has the potential
to not only explore a largely untapped source of
potential cancer biomarkers from clinical cancer
samples and cancer cell-lines, but also to inform
functional genomics and genome annotation. The characterization
of expressed proteins using tandem mass-spectrometry
provides direct evidence for the amino-acid sequence
of functional proteins and their isoforms, evidence
which is not available using other high-throughput
experimental techniques. We conclude with a discussion
of unconventional experimental workflows for peptide
identification and their potential to inform functional
genome annotation.
Wednesday,
March 5th, 2008
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom
CCMB
Seminar Series Lecture
CCMB (Center
for Computational Molecular Biology)
and MCB (Department of Molecular Biology,
Cell Biology and Biochemistry) present:
Colin Collins, Ph.D.
UCSF Cancer Center
University of California, San Francisco
The Promise and
Challenge of Translational Oncogenomics
Abstract: Dr.
Collins will present on the use of array comparative
genomic hybridization (aCGH) and end sequence profiling
(ESP) for the development of novel biomarkers and
therapeutic targets for cancer. Prostate
cancer is among the most prevalent cancers in the
Western world and is increasing in incidence. PSA
screening for prostate cancer has resulted in stage
migration so that increasingly tumors are detected
at an earlier stage. Nonetheless the percentage
of men diagnosed with tumors at an intermediate
risk of progressing to metastasis has remained
relatively constant at ~ 25% and there is little
change in the outcome statistics for this group.
Therefore, it is critical that biomarkers be developed
that can identify men in this group who can safely
delay or avoid definitive therapy, alleviating
the problem of overtreatment. Dr. Collins's laboratory
has employed unique tumor cohorts and aCGH to identify
genome-based biomarkers that may be capable of
dichotomizing this group of patients. To refine
and advance this assay to the clinic, whole-genome
amplification methods are being developed to enable
analysis of tumor biopsies on Agilent oligonucleotide
arrays. Dr. Collins will review progress and discuss
the challenges of this type of research. A problem
with array-based analyses is that it is very difficult
for them to detect tumor-specific biomarkers and
drug targets such as the BCR-ABL fusion in CML.
In addition, they are blind to heterogeneity. To
overcome these limitations Drs. Collins and Volik
invented ESP with the explicit goals of determining
the structural organization of tumor genomes and
transcriptomes. Progress made in the application
of ESP to breast cancer cell lines and multiple
tumor types will be presented. Specifically, data
will be presented showing the structural organization
of tumor genomes, the frequency and spectrum of
mutations, molecular heterogeneity, validation
of genome breakpoints, and detection of tumor-specific
fusion genes and transcripts. In addition, ideas
will be explored for a genome project based on
ESP, and advancement of ESP to the clinic using
next generation sequencing technologies.
Wednesday,
February 13th, 2008
4.00pm, Sidney Frank Hall, Room #220
CCMB
Seminar Series Lecture
Nicholas
Eriksson, Ph.D.
University of Chicago
Department of Statistics
Faculty Search Candidate
Center for Computational Molecular Biology
Combinatorial Methods
in Evolutionary Biology
Abstract: I'll
talk about three areas of evolutionary biology
using a combination of statistics and discrete
math: viral population diversity, the evolution
of drug resistance, and phylogenetics.
Knowledge of the diversity of viral populations
is important for understanding disease progression,
vaccine design, and drug resistance, yet it is
poorly understood. New technologies (pyrosequencing)
allow us to read short, error-prone DNA sequences
from an entire population at once. I will show
how to assemble the reads into genomes using graph
theory, allowing us to determine the population
structure.
Next,
I will describe a new class of graphical models
inspired by poset theory that describe the accumulation
of (genetic) events with constraints on the order
of occurrence. Applications of these models include
calculating the risk of drug resistance in HIV
and understanding cancer progression.
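To make the idea of order constraints concrete, here is a minimal sketch (not the speaker's model) that lists which combinations of events are possible when some events can occur only after others. The event names and the `prerequisites` encoding are hypothetical.

```python
from itertools import combinations


def compatible_genotypes(events, prerequisites):
    """List every subset of `events` that respects the order constraints.

    `prerequisites[e]` is the set of events that must already have occurred
    before event `e` can occur (a hypothetical encoding of the partial order).
    A subset is compatible when it contains the prerequisites of each member.
    """
    genotypes = []
    for size in range(len(events) + 1):
        for subset in combinations(events, size):
            present = set(subset)
            if all(prerequisites.get(e, set()) <= present for e in present):
                genotypes.append(present)
    return genotypes


# Hypothetical constraints: B and C each require A; D requires both B and C.
EVENTS = ["A", "B", "C", "D"]
PREREQS = {"B": {"A"}, "C": {"A"}, "D": {"B", "C"}}

if __name__ == "__main__":
    for genotype in compatible_genotypes(EVENTS, PREREQS):
        print(sorted(genotype))
```

Of the 16 possible subsets of four events, only 6 respect these constraints. A probabilistic model over such a poset then needs to assign probability only to the compatible subsets.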
Finally, I'll describe a polyhedral
method for determining the sensitivity of phylogenetic
algorithms to changes in the parameters. We will
analyze several datasets where small changes in
parameters lead to completely different trees and
see how discrete geometry can be used to average
out the uncertainty in parameter choice.
Monday, March
3rd, 2008
4.00pm
CIT Building ~ Room 241 ~ SWIG Boardroom
CCMB
Seminar Series Lecture
Antonio
Piccolboni
Quantcast, San Francisco, CA
"Multivariate
segmentation in the analysis of transcription
tiling array data"
Abstract: Tiling
DNA microarrays extend current microarray technology
by probing the non-repeat portion of a genome at
regular intervals in an unbiased fashion. A fundamental
problem in the analysis of these data is the detection
of genomic regions that are differently transcribed
across multiple conditions. We propose a linear
time algorithm based on segmentation techniques
and linear modeling that can work at a user-selected
false discovery rate. It also attains a four-fold
sensitivity gain over the only competing algorithm
when applied to a whole genome transcription data
set spanning the embryonic development of Drosophila
melanogaster.
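The abstract does not say how the false discovery rate is controlled. One common generic choice is the Benjamini-Hochberg step-up procedure, sketched below on hypothetical per-region p-values; it is offered only as an illustration of working "at a user-selected false discovery rate", not as the speaker's algorithm.

```python
def benjamini_hochberg(p_values, fdr):
    """Return the indices of hypotheses rejected at the requested FDR.

    Standard step-up rule: sort the p-values, find the largest rank k with
    p_(k) <= k * fdr / m, and reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff = rank
    return sorted(order[:cutoff])


if __name__ == "__main__":
    # Hypothetical p-values, one per candidate genomic region.
    region_p_values = [0.001, 0.008, 0.039, 0.041, 0.9]
    print(benjamini_hochberg(region_p_values, fdr=0.05))
```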
Monday, March
10, 2008
3:00 p.m.
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 2:45 p.m.
To receive CCMB seminar
announcements by email, sign up for the computational
biology mailing list by sending email to listserv@listserv.brown.edu with
the message body "subscribe computational-biology".
CCMB
Seminar Series Lecture
Daniel
Weinreich, Ph.D.
Ecology and Evolutionary Biology
Center for Computational Molecular Biology
Brown University
"Predicting
Evolutionary Trajectories in Sexual Populations"
Abstract: Epistasis
means that the functional consequence of mutations
varies with genetic background, and the evolutionary
consequences of epistasis are profound in asexual
populations because a novel mutation’s fate
is then determined by its fitness effect on only
one of many alternative genetic backgrounds. This
possibility motivates interest in Sewall Wright’s
adaptive landscape, the projection of genotypic
fitness values over a discrete, multidimensional
nucleotide sequence space. In this framework,
populations follow a temporal succession of individual
points through this space determined by the interplay
between mutational pressure, the local selective
gradient defined by the landscape, and stochastic
loss of novel genotypes. Several recent empirical
characterizations of small regions of this landscape
have demonstrated that in asexual populations functional
epistasis i) sharply limits the number
of mutational trajectories to high-fitness genotypes
that are selectively accessible and ii)
gives rise to a very sharp non-uniform probability
distribution among selectively accessible trajectories. To
date however, the absence of an analogous formal
framework in which to characterize selectively
accessible recombinational trajectories has limited
understanding of this problem in sexual populations. I
will describe a novel definition of the adaptive
landscape appropriate for this problem: the vector
field reflecting the joint pressures of mutation,
selection and recombination over a continuous multidimensional
space that represents both allele frequencies and
linkage disequilibrium among alleles. Populations
are again regarded as occupying a temporal succession
of points in the underlying space, and I will illustrate
the potential of this approach by describing how
recombination influences a population’s evolutionary
trajectory in single- and multi-peaked fitness models.
Wednesday,
March 19th, 2008
4:00 p.m.
CIT Bldg, Room 241, SWIG Boardroom
CCMB
Seminar Lecture Series
Gad
Kimmel
University of California, Berkeley
Faculty Candidate
Center for Computational Molecular Biology
"Computational
Problems in Human Genetics"
Abstract: The
question of how genetic variation and personal health
are linked is one of the compelling puzzles facing
scientists today. The ultimate goal is to
exploit human variability to find genetic causes
for multi-factorial diseases such as cancer and
coronary heart disease. Recent technological improvements
enable the typing of millions of single nucleotide
polymorphisms (SNPs) for a large number of individuals. Consequently,
there is a great need for efficient and accurate
computational tools for rigorous and powerful analysis
of these data. In my talk I am going to concentrate
on two computational problems, which are an essential
step in studying the data obtained by this technology:
accurate and efficient significance testing with
a correction for population stratification and
estimating local ancestries in admixed populations.
Wednesday,
March 31, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Ben Raphael
CCMB
Seminar Lecture Series
Maria
Pilar Francino, Ph.D.
DOE Joint Genome Institute
Faculty Candidate
Center for Computational Molecular Biology
"Comparative
Analyses of the Distribution of Promoter Motifs
across Bacterial Genomes"
Abstract:
Because binding of RNA polymerase (RNAP) to misplaced
sites could compromise the efficiency of transcription,
natural selection for the optimization of gene
expression should regulate the distribution of
DNA motifs capable of RNAP-binding across the genome. We
have analyzed the distribution of –10 promoter
motifs recognized by the σ70 subunit of RNAP in
42 bacterial genomes. We show that selection
on these motifs operates across the genome, maintaining
an over-representation of –10 motifs in regulatory
sequences while eliminating them from the nonfunctional
and, in most cases, from the protein coding regions. In
some genomes, however, –10 sites are over-represented
in the coding sequences; these sites could induce
pauses effecting regulatory roles throughout the
length of a transcriptional unit. For nonfunctional
sequences, the extent of motif under-representation
varies across genomes in a manner that broadly
correlates with the number of tRNA genes, a good
indicator of translational speed and growth rate. This
suggests that minimizing the time invested in gene
transcription is an important selective pressure
against spurious binding. However, selection
against spurious binding is also detectable in
the reduced genomes of host-restricted bacteria
that grow at slow rates, indicating that components
of efficiency other than speed may also be important. Minimizing
the number of RNAP molecules per cell required
for transcription, and the corresponding energetic
expense, may be most relevant in slow growers.
These results indicate that genome-level properties
affecting the efficiency of transcription and translation
can respond in an integrated manner to optimize gene
expression. The detection of selection against
promoter motifs in nonfunctional regions also indicates
that no sequence may evolve free of selective constraints,
at least in the relatively small and unstructured
genomes of bacteria.
Wednesday,
April 2, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Daniel Weinreich
CCMB
Seminar Series Lecture
Bjarni
Halldorsson
deCODE genetics
Reykjavik University
"Detecting
genomic copy number variants using the
Illumina platform"
Abstract: Apart
from a relatively small number of variants the
genomes of two individuals are identical. These
small differences explain much of human diversity. The
most common form of variation between individuals
is the single nucleotide polymorphism (SNP), representing
a single letter change in an otherwise conserved
sequence. In the past couple of years, tremendous
progress has been made in identifying the genetic
causes of a number of diseases and phenotypic traits
using arrays, such as those made by Illumina, that
simultaneously assay a large number of SNPs.
Copy number variations (CNVs) occur when a segment
of the genome is either copied or deleted so that
individuals have different numbers of copies of
that variant. Although copy number variations
are much less common than SNPs they have the potential
to have a much greater impact on an individual,
since a functional element of the genome may be
missing or occur more often than desired.
We use the Illumina arrays to detect copy number
variations and designed an array of SNP and univariant
probes in the genome. We have assayed a large number
of individuals for these variations. In this talk
we consider the problem of translating these assay
results into a determination of the number of copies
an individual has of a copy number variation.
Wednesday,
April 16th, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm
CCMB
Seminar Series
Yves
Moreau
SymBioSys Center for Computational Systems
Biology
University of Leuven (ESAT-SCD), Belgium
"Candidate
gene prioritization by genomic data fusion"
Abstract: One
of the main challenges of systems biology is to
cope with the overwhelming amount and diversity
of omics data and a key problem is that of 'candidate
gene prioritization' - i.e., selecting among a
large list of candidate genes those that are most
promising for further biological validation. We
present ENDEAVOUR, a generic computational strategy
to prioritize candidate genes based on their similarity
(across multiple types of data, including sequence,
expression, literature, annotation, etc.) to a
set of genes already implicated in the process
under scrutiny. We first validate the overall performance
through a statistical cross-validation of 29 diseases
and 3 biological pathways. Next, we validate a
novel candidate for DiGeorge syndrome in a zebrafish
model. Finally, we present an alternative machine
learning strategy for gene prioritization using
kernel methods. The key advantage of kernel methods
in this context is that they provide an elegant
framework for the fusion of data - by relying only
on positive semi-definite kernel similarity matrices
for the representation of heterogeneous data sources.
Kernel-based novelty detection outperforms our
previous method on our disease gene benchmark.
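The data-fusion idea can be sketched in a few lines: each data source yields a positive semi-definite similarity (kernel) matrix over the same genes, and a weighted sum with non-negative weights is again positive semi-definite. The example below fuses two linear kernels and ranks candidates by mean fused similarity to known genes. That ranking rule is a deliberately simple stand-in, not the kernel-based novelty detection mentioned in the abstract, and all feature values and function names are hypothetical.

```python
def linear_kernel(features):
    """Gram matrix of inner products; positive semi-definite by construction."""
    return [[sum(a * b for a, b in zip(x, y)) for y in features]
            for x in features]


def fuse_kernels(kernels, weights=None):
    """Weighted sum of same-sized kernel matrices (uniform weights by default)."""
    if weights is None:
        weights = [1.0 / len(kernels)] * len(kernels)
    n = len(kernels[0])
    return [[sum(w * k[i][j] for w, k in zip(weights, kernels))
             for j in range(n)] for i in range(n)]


def rank_candidates(fused, known, candidates):
    """Order candidates by mean fused similarity to known genes, highest first."""
    def score(c):
        return sum(fused[c][k] for k in known) / len(known)
    return sorted(candidates, key=score, reverse=True)


# Hypothetical features for four genes from two data sources.
EXPRESSION = [[1.0, 0.0], [0.9, 0.1], [0.8, 0.2], [0.0, 1.0]]
ANNOTATION = [[1.0, 1.0], [1.0, 0.8], [0.9, 1.0], [0.0, 0.1]]

if __name__ == "__main__":
    fused = fuse_kernels([linear_kernel(EXPRESSION), linear_kernel(ANNOTATION)])
    # Genes 0 and 1 are treated as already implicated; 2 and 3 are candidates.
    print(rank_candidates(fused, known=[0, 1], candidates=[3, 2]))
```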
Bio: My
research interest falls in the broad field of bioinformatics
and, more specifically, into what I call Computational
Systems Biomedicine, which is the application of
computational methods in Systems Biology towards
the understanding and modulation of developmental
and pathological processes relevant to human health.
The area of application in which I am currently
most active is diagnosis and gene discovery in
congenital disorders.
Tuesday, July
29th, 2008
11:00 a.m.
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Charles E. Lawrence
Refreshments will be served at 10:45 a.m.
Dr. Moreau will be
on campus Tuesday, July 29th. Individuals interested
in meeting privately with him are encouraged to
contact Louise Patterson at Louise_Patterson@Brown.edu or
863-3178.