Brown University | Center for Computational Molecular Biology


CCMB Seminar Series 2007-2008


CCMB Seminar Series
Richard Watson
Natural Systems Group
Southampton University

"Compositional Evolution: Alternative algorithms underlying natural evolutionary adaptation"

Abstract:
Darwin's theory of evolution can be understood as a simple algorithm, a formal step-by-step procedure or mechanism, that produces adaptation in biological systems. Computer science studies a broad range of algorithms and seeks to understand which algorithms work best for which types of problems. Is the algorithm that Darwin described the only algorithm relevant to natural evolution, and if there are others, can they solve adaptive problems that are not solvable with Darwin's model?
It is well known that there have been numerous events of horizontal gene transfer, important cases of genetic encapsulation and symbiogenesis, and occasional 'major evolutionary transitions' where "entities that were capable of independent replication before the transition can replicate only as part of a larger whole after the transition" (Maynard Smith and Szathmary). It has been suggested before that these events present a challenge to Darwinian gradualism because the results of these events are not small genetic changes, but this somewhat misses the important point. What is algorithmically interesting about these phenomena is not that the genetic changes are large rather than small, but that the genetics involve the union of one genetic lineage with another lineage evolved in parallel rather than the sequential modification of a single lineage that Darwin described. Using algorithmic concepts we can understand why it makes a difference to evolve things in parallel and then bring them together, rather than evolve things sequentially in a single lineage. In computational terms the difference couldn't be more fundamental: Darwinian linear incremental improvement is analogous to 'stochastic local search', a very basic form of optimisation, but 'compositional' mechanisms, as I refer to them, are analogous to 'divide and conquer' optimisation, a fundamentally different class of algorithm based on problem decomposition.
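The contrast between sequential local search and divide-and-conquer can be sketched on a toy problem. The sketch below uses a concatenated "trap" function, a standard deceptive test problem from the genetic-algorithm literature (not the model used in this talk). Each block of K bits scores K when all its bits are 1, and otherwise scores K - 1 minus its number of 1s. The module size and count are arbitrary. A single-bit-flip hill climber (standing in for one lineage improving step by step) started from all zeros is stuck at a local optimum. Solving each module on its own and then composing the module solutions finds the global optimum.

```python
from itertools import product

K = 4           # bits per module (arbitrary choice for illustration)
NUM_BLOCKS = 3  # number of modules (arbitrary choice for illustration)

def trap(block):
    """Deceptive trap: all-ones is best, but fewer ones looks better locally."""
    ones = sum(block)
    return len(block) if ones == len(block) else len(block) - 1 - ones

def fitness(bits):
    """Sum of independent trap modules of size K."""
    return sum(trap(bits[i:i + K]) for i in range(0, len(bits), K))

def hill_climb(bits):
    """Sequential improvement of a single lineage: accept any improving single-bit flip."""
    bits = list(bits)
    improved = True
    while improved:
        improved = False
        for i in range(len(bits)):
            candidate = bits.copy()
            candidate[i] ^= 1
            if fitness(candidate) > fitness(bits):
                bits, improved = candidate, True
    return bits

def divide_and_conquer(num_blocks, k):
    """Solve one module exhaustively in isolation, then compose the module solutions."""
    best_block = max(product((0, 1), repeat=k), key=trap)
    return list(best_block) * num_blocks

start = [0] * (K * NUM_BLOCKS)
local = hill_climb(start)
composed = divide_and_conquer(NUM_BLOCKS, K)
print(fitness(local), fitness(composed))  # 9 12
```

The hill climber cannot leave the all-zeros genotype because every single flip lowers fitness. The compositional search is only possible here because the decomposition into modules is known in advance.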

Wednesday, October 31st, 2007
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom

Hosted by: Daniel Weinreich

CCMB Seminar Series
Daniel Weinreich, Ph.D.
Ecology and Evolutionary Biology
Center for Computational Molecular Biology
Brown University

"The Combinatorics of Molecular Evolution"

Abstract:
The number of mutational trajectories between ancestral and derived sequences grows doubly exponentially with gene length. I hypothesize, however, that the formal properties of fitness landscapes impose constraints on the subset of trajectories that may simultaneously be selectively accessible, even absent any biological data. The permutahedron is a discrete mathematical object that represents the space of all possible mutational trajectories and thus provides a framework in which to explore these constraints and the properties that unite selectively accessible trajectories.
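The following sketch shows what "selectively accessible" means on a small example. For two sequences that differ at d sites, each shortest mutational trajectory is one ordering of the d mutations, which is one vertex of the permutahedron, so there are d! such trajectories. A trajectory is taken here to be selectively accessible when fitness strictly increases at every step. The fitness values are hypothetical. They were chosen only to show how sign epistasis closes off some paths.

```python
from itertools import permutations

# Hypothetical fitness values for the 2^3 genotypes at three sites (0 = ancestral, 1 = derived).
FITNESS = {
    (0, 0, 0): 1.0, (1, 0, 0): 1.2, (0, 1, 0): 0.9, (0, 0, 1): 1.1,
    (1, 1, 0): 1.3, (1, 0, 1): 1.4, (0, 1, 1): 1.0, (1, 1, 1): 1.5,
}

def trajectories(num_sites):
    """Each shortest path from ancestor to derived sequence is an ordering of the sites."""
    return list(permutations(range(num_sites)))

def is_accessible(order, fitness):
    """True if fitness strictly increases after every mutation in the given order."""
    genotype = [0] * len(order)
    current = fitness[tuple(genotype)]
    for site in order:
        genotype[site] = 1
        following = fitness[tuple(genotype)]
        if following <= current:
            return False
        current = following
    return True

paths = trajectories(3)
accessible = [p for p in paths if is_accessible(p, FITNESS)]
print(len(paths), len(accessible))  # 6 3
print(accessible)  # [(0, 1, 2), (0, 2, 1), (2, 0, 1)]
```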

Wednesday, September 26th, 2007
4:00 p.m.
CIT Bldg, Room 241, SWIG Boardroom

CCMB Seminar Series
Ward Wheeler, Ph.D.
American Museum of Natural History
New York, NY

"Kolmogorov Complexity, Phylogenetic Analysis, and the Unity of Systematic Methods"

Abstract:
Ideas from computational complexity are applied to phylogenetic analysis, yielding a new, more fundamental optimality criterion. This criterion, Minimum Description Length, has far-reaching implications for systematics and for the context in which we view other optimality criteria such as parsimony, likelihood, and posterior probability.
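Kolmogorov complexity cannot be computed, so in practice it is approximated by the length of a compressed encoding. The sketch below illustrates the description-length idea with a general technique that is not the criterion presented in the talk. It uses `zlib` compressed size to compute the normalized compression distance (NCD) between sequences. A sequence that shares most of its information with another adds few bytes when the two are compressed together. All sequences are randomly generated, hypothetical data.

```python
import random
import zlib

def compressed_len(data: bytes) -> int:
    """Compressed size in bytes, a crude upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: str, y: str) -> float:
    """Normalized compression distance: small for shared content, near 1 for unrelated."""
    cx, cy = compressed_len(x.encode()), compressed_len(y.encode())
    cxy = compressed_len((x + y).encode())
    return (cxy - min(cx, cy)) / max(cx, cy)

rng = random.Random(0)
ancestor = "".join(rng.choice("ACGT") for _ in range(2000))
# A close relative: up to 20 random substitutions (a replacement may equal the original base).
relative = list(ancestor)
for pos in rng.sample(range(len(ancestor)), 20):
    relative[pos] = rng.choice("ACGT")
relative = "".join(relative)
unrelated = "".join(rng.choice("ACGT") for _ in range(2000))

print(round(ncd(ancestor, relative), 3), round(ncd(ancestor, unrelated), 3))
```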

Wednesday, December 5th, 2007
4:00 p.m.
CIT Bldg, Room 241 - SWIG Boardroom
Host: David Rand

CCMB Seminar Series
Zhijin Wu, Ph.D.
Center for Statistical Sciences
Center for Computational Molecular Biology
Brown University

"Expanding the Dynamic Range of Gene Expression Measures on DNA Microarrays"

Abstract: DNA microarray technology has been widely used to measure gene expression levels in biomedical research. The potential to monitor tens of thousands of genes simultaneously makes the microarray an efficient tool. However, the raw measurements are noisy: the data are a combination of specific binding and background noise. A considerable proportion of genes appear constantly expressed because the variation in their signal is small and seemingly within the variation of background noise alone. The dynamic range of gene expression measures is thus limited to highly expressed genes. This is a crucial limitation, because often more than half of the genes in an experiment do not reach the level at which differential expression can be reliably detected. Our previous work has shown that probes have sequence-specific background levels and that probe-specific background adjustment can improve the dynamic range. Since a large number of observations on the same probe is usually not available in a single study, information is borrowed across probes within each sample to estimate the probe-specific background. The accumulation of microarray data in public data repositories has enabled us to approach the problem from a different angle. With a large number of samples, we now observe data on the same probe across a large variety of experimental conditions. We start with a database of 500 human gene expression array samples. Based on the expression profile of each gene across experiments, we evaluate each probe's tendency toward non-specific binding and its ability to track variation in target expression. We demonstrate a further extension of the dynamic range of gene expression measures.
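Having many samples makes it possible to score how well each probe tracks its target. The sketch below is a hypothetical illustration of one simple approach, not the authors' model. It correlates each probe's intensities across samples with the per-sample median of all probes for the same gene. A probe dominated by non-specific binding varies little and tracks the gene-level profile poorly. The intensities are invented. The median includes the probe being scored, which is a simplification.

```python
from statistics import mean, median

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # no variation: treat as not tracking the target
    return cov / (var_a * var_b) ** 0.5

def probe_tracking_scores(probe_matrix):
    """probe_matrix[p][s] = log intensity of probe p on sample s (one gene's probe set).

    Returns each probe's correlation with the per-sample median of the probe set.
    """
    num_samples = len(probe_matrix[0])
    gene_profile = [median(row[s] for row in probe_matrix) for s in range(num_samples)]
    return [pearson(row, gene_profile) for row in probe_matrix]

# Hypothetical log intensities for one gene: three probes across five samples.
probes = [
    [6.0, 7.1, 8.0, 6.4, 9.2],  # tracks the target
    [5.5, 6.8, 7.7, 6.0, 8.9],  # tracks the target
    [7.0, 7.1, 6.9, 7.0, 7.0],  # mostly background: little variation
]
print([round(r, 2) for r in probe_tracking_scores(probes)])
```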

Wednesday, January 23rd, 2008
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom

CCMB Seminar Series
Nathan Edwards
Georgetown University, Washington, DC

"Improving the Reliability of Peptide Identification by Tandem Mass Spectrometry for Clinical Proteomics and Genome Association"

Abstract: Peptide identification by tandem mass spectrometry is the dominant proteomics workflow for protein characterization in complex samples. Coupled with well established high-throughput proteomics workflows, tandem mass spectrometry search engines make identifying the major constituent proteins in clinical samples straightforward. Driven by increasingly sensitive protein chemistry protocols and mass spectrometers, and a new perspective on the importance of alternative splicing and coding SNP protein isoforms, however, the shortcomings of the existing tools are becoming more and more apparent.

We use a variety of computational techniques to improve the reliability of peptide identification analyses, as we seek to address the limitations of current tandem mass spectrometry search tools. First, we aggressively enumerate an inclusive set of potential peptide sequences from transcript evidence, particularly ESTs, to ensure that evidence of novel, unexpected, or unannotated protein isoforms is not missed by tandem mass spectrometry search engines. We use a novel compression technique to ensure that the resulting sequence database can be searched quickly and easily using existing tools, and demonstrate that novel peptides, representing coding SNPs, alternative splicing, and novel mutations, can be observed in publicly available datasets. Second, we apply hidden Markov models to spectral matching of tandem mass spectra of previously identified peptides, improving on the sensitivity and specificity of peptide identification by sequence database search engines and traditional spectral matching techniques. Lastly, we post-process search engine peptide identification results using an unsupervised, model-free, result-combining machine-learning approach that achieves greater sensitivity and specificity than either result combining or machine learning alone. Using this technique on datasets derived from standard protein mixtures, we demonstrate that the performance of the commercial search engine Mascot can be bested by combining the results of two open-source search engines, X!Tandem and OMSSA, but that using all three search engines is better still.
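A much simpler rule than the machine-learning approach described above can illustrate the idea of combining search engines. The rule accepts a spectrum's peptide assignment only when enough engines agree on it. The engine outputs below are hypothetical. A real pipeline would also weigh scores and estimate false discovery rates.

```python
from collections import Counter

def consensus_ids(results, min_votes=2):
    """results: {engine_name: {spectrum_id: peptide}}.

    Returns {spectrum_id: peptide} for spectra where at least `min_votes`
    engines report the same peptide.
    """
    votes = {}
    for assignments in results.values():
        for spectrum, peptide in assignments.items():
            votes.setdefault(spectrum, Counter())[peptide] += 1
    accepted = {}
    for spectrum, counter in votes.items():
        peptide, count = counter.most_common(1)[0]
        if count >= min_votes:
            accepted[spectrum] = peptide
    return accepted

# Hypothetical peptide-spectrum assignments from three engines.
results = {
    "Mascot":   {"s1": "PEPTIDEK", "s2": "LVNELTEFAK", "s3": "AAAAGR"},
    "X!Tandem": {"s1": "PEPTIDEK", "s2": "LVNELTEFAK", "s3": "GGGGSR"},
    "OMSSA":    {"s1": "PEPTIDEK", "s3": "VVVVTR"},
}
print(consensus_ids(results))  # {'s1': 'PEPTIDEK', 's2': 'LVNELTEFAK'}
```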

Such a reliable, sensitive, and specific peptide identification analysis platform has the potential to not only explore a largely untapped source of potential cancer biomarkers from clinical cancer samples and cancer cell-lines, but also to inform functional genomics and genome annotation. The characterization of expressed proteins using tandem mass-spectrometry provides direct evidence for the amino-acid sequence of functional proteins and their isoforms, evidence which is not available using other high-throughput experimental techniques. We conclude with a discussion of unconventional experimental workflows for peptide identification and their potential to inform functional genome annotation.

Wednesday, March 5th, 2008
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom

CCMB Seminar Series Lecture

CCMB (Center for Computational Molecular Biology) and MCB (Department of Molecular Biology, Cell Biology and Biochemistry) present:

Colin Collins, Ph.D.
UCSF Cancer Center
University of California, San Francisco

The Promise and Challenge of Translational Oncogenomics

Abstract: Dr. Collins will present on the use of array comparative genomic hybridization (aCGH) and end sequence profiling (ESP) for the development of novel biomarkers and therapeutic targets for cancer. Prostate cancer is among the most prevalent cancers in the Western world, and its incidence is increasing. PSA screening for prostate cancer has resulted in stage migration, so that tumors are increasingly detected at an earlier stage. Nonetheless, the percentage of men diagnosed with tumors at intermediate risk of progressing to metastasis has remained relatively constant at ~25%, and there has been little change in the outcome statistics for this group. It is therefore critical to develop biomarkers that can identify men in this group who can safely delay or avoid definitive therapy, alleviating the problem of overtreatment. Dr. Collins's laboratory has employed unique tumor cohorts and aCGH to identify genome-based biomarkers that may be capable of dichotomizing this group of patients. To refine this assay and advance it to the clinic, whole-genome amplification methods are being developed to enable analysis of tumor biopsies on Agilent oligonucleotide arrays. Dr. Collins will review progress and discuss the challenges of this type of research. A problem with array-based analyses is that it is very difficult for them to detect tumor-specific biomarkers and drug targets such as the BCR-ABL fusion in CML. In addition, they are blind to heterogeneity. To overcome these limitations, Drs. Collins and Volik invented ESP with the explicit goal of determining the structural organization of tumor genomes and transcriptomes. Progress made in the application of ESP to breast cancer cell lines and multiple tumor types will be presented.
Specifically, data will be presented showing the structural organization of tumor genomes, the frequency and spectrum of mutations, molecular heterogeneity, validation of genome breakpoints, and detection of tumor-specific fusion genes and transcripts. In addition, ideas will be explored for a genome project based on ESP, and advancement of ESP to the clinic using next generation sequencing technologies.

Wednesday, February 13th, 2008
4:00 p.m.
Sidney Frank Hall, Room 220

CCMB Seminar Series Lecture

Nicholas Eriksson, Ph.D.
University of Chicago
Department of Statistics

Faculty Search Candidate
Center for Computational Molecular Biology

Combinatorial Methods in Evolutionary Biology

Abstract: I'll talk about three areas of evolutionary biology using a combination of statistics and discrete math: viral population diversity, the evolution of drug resistance, and phylogenetics.
Knowledge of the diversity of viral populations is important for understanding disease progression, vaccine design, and drug resistance, yet it is poorly understood. New technologies (pyrosequencing) allow us to read short, error-prone DNA sequences from an entire population at once. I will show how to assemble the reads into genomes using graph theory, allowing us to determine the population structure.

Next, I will describe a new class of graphical models inspired by poset theory that describe the accumulation of (genetic) events with constraints on the order of occurrence. Applications of these models include calculating the risk of drug resistance in HIV and understanding cancer progression.
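In these poset-based models, an event can occur only after all of the events it depends on have occurred. The genotypes the model allows are therefore exactly the downward-closed subsets (order ideals) of the poset. The sketch below enumerates them for a hypothetical three-event example in which event c requires both a and b.

```python
from itertools import combinations

# Hypothetical cover relations: each event maps to the events that must precede it.
REQUIRES = {"a": set(), "b": set(), "c": {"a", "b"}}

def is_compatible(genotype, requires):
    """A set of occurred events is allowed if every event's prerequisites are present."""
    return all(requires[event] <= genotype for event in genotype)

def compatible_genotypes(requires):
    """Enumerate all downward-closed subsets (order ideals) of the poset."""
    events = sorted(requires)
    return [
        set(subset)
        for size in range(len(events) + 1)
        for subset in combinations(events, size)
        if is_compatible(set(subset), requires)
    ]

for g in compatible_genotypes(REQUIRES):
    print(sorted(g))
```

Of the 2^3 = 8 possible genotypes, only 5 respect the ordering constraint. Models built on such lattices therefore assign zero probability to genotypes like {c} or {a, c}.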

Finally, I'll describe a polyhedral method for determining the sensitivity of phylogenetic algorithms to changes in the parameters. We will analyze several datasets where small changes in parameters lead to completely different trees and see how discrete geometry can be used to average out the uncertainty in parameter choice.

Monday, March 3rd, 2008
4:00 p.m.
CIT Building ~ Room 241 ~ SWIG Boardroom

CCMB Seminar Series Lecture
Antonio Piccolboni
Quantcast, San Francisco, CA

"Multivariate segmentation in the analysis of transcription tiling array data"

Abstract: Tiling DNA microarrays extend current microarray technology by probing the non-repeat portion of a genome at regular intervals in an unbiased fashion. A fundamental problem in the analysis of these data is the detection of genomic regions that are differentially transcribed across multiple conditions. We propose a linear-time algorithm based on segmentation techniques and linear modeling that can work at a user-selected false discovery rate. It also attains a four-fold sensitivity gain over the only competing algorithm when applied to a whole-genome transcription data set spanning the embryonic development of Drosophila melanogaster.
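The classic maximum-scoring-segment (Kadane) recurrence gives a linear-time scan for a single differentially transcribed region. This sketch is a deliberately simplified, univariate stand-in for the multivariate method in the talk. It assumes the input is a per-probe difference between two conditions, ordered along the genome. The threshold is a hypothetical tuning parameter that makes probes below it count against a segment.

```python
def best_segment(signal, threshold):
    """Return (start, end_exclusive, score) of the contiguous run of probes that
    maximizes sum(x - threshold), found in one left-to-right pass."""
    best = (0, 0, float("-inf"))
    run_start, run_sum = 0, 0.0
    for i, x in enumerate(signal):
        if run_sum <= 0:
            # A non-positive prefix can only lower the score: restart here.
            run_start, run_sum = i, 0.0
        run_sum += x - threshold
        if run_sum > best[2]:
            best = (run_start, i + 1, run_sum)
    return best

print(best_segment([0, 0, 5, 6, 5, 0, 0], 1.0))  # (2, 5, 13.0)
```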

Monday, March 10, 2008
3:00 p.m.
CIT Bldg, Room 241, SWIG Boardroom

Hosted by: Sorin Istrail
Refreshments will be served at 2:45 p.m.

To receive CCMB seminar announcements by email, sign up for the computational biology mailing list by sending email to listserv@listserv.brown.edu with the message body "subscribe computational-biology".

CCMB Seminar Series Lecture
Daniel Weinreich, Ph.D.
Ecology and Evolutionary Biology
Center for Computational Molecular Biology
Brown University

"Predicting Evolutionary Trajectories in Sexual Populations"

Abstract: Epistasis means that the functional consequence of mutations varies with genetic background, and the evolutionary consequences of epistasis are profound in asexual populations because a novel mutation’s fate is then determined by its fitness effect on only one of many alternative genetic backgrounds. This possibility motivates interest in Sewall Wright’s adaptive landscape, the projection of genotypic fitness values over a discrete, multidimensional nucleotide sequence space. In this framework, populations follow a temporal succession of individual points through this space determined by the interplay between mutational pressure, the local selective gradient defined by the landscape, and stochastic loss of novel genotypes. Several recent empirical characterizations of small regions of this landscape have demonstrated that in asexual populations functional epistasis i) sharply limits the number of mutational trajectories to high-fitness genotypes that are selectively accessible and ii) gives rise to a very sharp non-uniform probability distribution among selectively accessible trajectories. To date, however, the absence of an analogous formal framework in which to characterize selectively accessible recombinational trajectories has limited understanding of this problem in sexual populations. I will describe a novel definition of the adaptive landscape appropriate for this problem: the vector field reflecting the joint pressures of mutation, selection and recombination over a continuous multidimensional space that represents both allele frequencies and linkage disequilibrium among alleles. Populations are again regarded as occupying a temporal succession of points in the underlying space, and I will illustrate the potential of this approach by describing how recombination influences a population’s evolutionary trajectory in single- and multi-peaked fitness models.
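The deterministic part of such a vector field can be sketched for the simplest case. The model has two biallelic loci in a haploid population, tracked by the four gamete frequencies (AB, Ab, aB, ab). The sketch applies one generation of selection followed by recombination at rate r, using the standard textbook recursion. It omits mutation and drift, and the fitness values are hypothetical. The change between generations is the vector at that point in the space of allele frequencies and linkage disequilibrium D.

```python
def linkage_disequilibrium(x):
    """D = x_AB * x_ab - x_Ab * x_aB for gamete frequencies (AB, Ab, aB, ab)."""
    return x[0] * x[3] - x[1] * x[2]

def next_generation(x, w, r):
    """One generation of haploid selection (fitnesses w), then recombination at rate r."""
    mean_w = sum(xi * wi for xi, wi in zip(x, w))
    y = [xi * wi / mean_w for xi, wi in zip(x, w)]
    d = linkage_disequilibrium(y)
    return [y[0] - r * d, y[1] + r * d, y[2] + r * d, y[3] - r * d]

def vector_field(x, w, r):
    """Per-generation change in gamete frequencies at point x."""
    return [b - a for a, b in zip(x, next_generation(x, w, r))]

x0 = [0.5, 0.0, 0.0, 0.5]
neutral = [1.0, 1.0, 1.0, 1.0]
x1 = next_generation(x0, neutral, r=0.5)
print(linkage_disequilibrium(x0), linkage_disequilibrium(x1))  # 0.25 0.125
```

Without selection, recombination leaves allele frequencies unchanged and shrinks D by a factor of (1 - r) each generation. With unequal fitnesses, selection and recombination together determine the direction of each vector.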

Wednesday, March 19th, 2008
4:00 p.m.
CIT Bldg, Room 241, SWIG Boardroom

CCMB Seminar Lecture Series

Gad Kimmel
University of California, Berkeley

Faculty Candidate
Center for Computational Molecular Biology

"Computational Problems in Human Genetics"

Abstract: The question of how genetic variation and personal health are linked is one of the most compelling puzzles facing scientists today. The ultimate goal is to exploit human variability to find genetic causes of multifactorial diseases such as cancer and coronary heart disease. Recent technological improvements enable the typing of millions of single nucleotide polymorphisms (SNPs) in a large number of individuals. Consequently, there is a great need for efficient and accurate computational tools for rigorous and powerful analysis of these data. In my talk I will concentrate on two computational problems that are essential steps in studying the data obtained by this technology: accurate and efficient significance testing with a correction for population stratification, and estimation of local ancestry in admixed populations.
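A permutation test that shuffles case/control labels only within each population stratum can illustrate stratification-aware significance testing. Under this scheme, differences in allele frequency and disease rate between populations cannot by themselves produce a significant result. This is a generic sketch on hypothetical data, not the specific method presented in the talk. The test statistic (difference in mean minor-allele count) is an assumption chosen for simplicity.

```python
import random

def association_stat(genotypes, phenotypes):
    """Absolute difference in mean minor-allele count between cases (1) and controls (0)."""
    cases = [g for g, p in zip(genotypes, phenotypes) if p == 1]
    controls = [g for g, p in zip(genotypes, phenotypes) if p == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def stratified_permutation_pvalue(genotypes, phenotypes, strata, n_perm=1000, seed=0):
    """Permute phenotypes only within each stratum, preserving each stratum's case rate."""
    rng = random.Random(seed)
    observed = association_stat(genotypes, phenotypes)
    groups = {}
    for idx, s in enumerate(strata):
        groups.setdefault(s, []).append(idx)
    hits = 0
    for _ in range(n_perm):
        permuted = list(phenotypes)
        for members in groups.values():
            labels = [phenotypes[i] for i in members]
            rng.shuffle(labels)
            for i, label in zip(members, labels):
                permuted[i] = label
        if association_stat(genotypes, permuted) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical confounded data: population A has both more cases and more minor alleles.
genotypes = [2] * 10 + [0] * 10
phenotypes = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
strata = ["A"] * 10 + ["B"] * 10
naive = stratified_permutation_pvalue(genotypes, phenotypes, ["all"] * 20)
stratified = stratified_permutation_pvalue(genotypes, phenotypes, strata)
# naive p is small; stratified p is 1.0 because genotype is constant within each stratum
print(naive, stratified)
```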

Wednesday, March 31, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Ben Raphael

CCMB Seminar Lecture Series

Maria Pilar Francino, Ph.D.
DOE Joint Genome Institute

Faculty Candidate
Center for Computational Molecular Biology

"Comparative Analyses of the Distribution of Promoter Motifs across Bacterial Genomes"

Abstract:

Because binding of RNA polymerase (RNAP) to misplaced sites could compromise the efficiency of transcription, natural selection for the optimization of gene expression should regulate the distribution of DNA motifs capable of RNAP binding across the genome. We have analyzed the distribution of –10 promoter motifs recognized by the σ70 subunit of RNAP in 42 bacterial genomes. We show that selection on these motifs operates across the genome, maintaining an over-representation of –10 motifs in regulatory sequences while eliminating them from the nonfunctional and, in most cases, from the protein-coding regions. In some genomes, however, –10 sites are over-represented in the coding sequences; these sites could induce pauses that play regulatory roles throughout the length of a transcriptional unit. For nonfunctional sequences, the extent of motif under-representation varies across genomes in a manner that broadly correlates with the number of tRNA genes, a good indicator of translational speed and growth rate. This suggests that minimizing the time invested in gene transcription is an important selective pressure against spurious binding. However, selection against spurious binding is also detectable in the reduced genomes of host-restricted bacteria that grow at slow rates, indicating that components of efficiency other than speed may also be important. Minimizing the number of RNAP molecules per cell required for transcription, and the corresponding energetic expense, may be most relevant in slow growers.
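Over- and under-representation of a motif is usually judged by comparing its observed count with the count expected from base composition alone. The sketch below does this for the consensus –10 hexamer TATAAT on a single strand. It assumes a simple independent-nucleotide (zero-order) background model. The analysis in the talk may use a different background model and motif definition.

```python
from collections import Counter

def count_motif(seq, motif):
    """Count (possibly overlapping) occurrences of motif on the given strand."""
    k = len(motif)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == motif)

def expected_count(seq, motif):
    """Expected occurrences if bases were independent with the sequence's own composition."""
    freqs = Counter(seq)
    prob = 1.0
    for base in motif:
        prob *= freqs[base] / len(seq)
    return (len(seq) - len(motif) + 1) * prob

def observed_over_expected(seq, motif="TATAAT"):
    """Ratio > 1 suggests over-representation, < 1 under-representation."""
    exp = expected_count(seq, motif)
    return count_motif(seq, motif) / exp if exp > 0 else float("nan")

print(count_motif("TTATAATGCTATAATC", "TATAAT"))  # 2
```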

These results indicate that genome-level properties affecting the efficiency of transcription and translation can respond in an integrated manner to optimize gene expression. The detection of selection against promoter motifs in nonfunctional regions also indicates that no sequence may evolve free of selective constraints, at least in the relatively small and unstructured genomes of bacteria.

Wednesday, April 2, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Daniel Weinreich

CCMB Seminar Series Lecture

Bjarni Halldorsson

deCODE genetics
Reykjavik University

"Detecting genomic copy number variants using the Illumina platform"

Abstract: Apart from a relatively small number of variants, the genomes of two individuals are identical. These small differences explain much of human diversity. The most common form of variation between individuals is the single nucleotide polymorphism (SNP), a single-letter change in an otherwise conserved sequence. In the past couple of years, tremendous progress has been made in identifying the genetic causes of a number of diseases and phenotypic traits using arrays, such as those made by Illumina, that simultaneously assay a large number of SNPs.

Copy number variations (CNVs) occur when a segment of the genome is either duplicated or deleted, so that individuals carry different numbers of copies of that segment. Although copy number variations are much less common than SNPs, they have the potential to have a much greater impact on an individual, since a functional element of the genome may be missing or present in more copies than usual.

We have used Illumina arrays to detect copy number variations, designing an array of SNP and univariant probes in the genome, and have assayed a large number of individuals for these variations. In this talk we consider the problem of translating these assay results into a determination of the number of copies of a copy number variant that an individual carries.
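The sketch below shows one minimal way to turn probe intensities into copy-number calls. It assumes, hypothetically, that each copy-number state has a known mean normalized intensity and Gaussian noise with the same spread in every state. Under that assumption the most likely state is simply the nearest mean. The state means are invented for illustration. Real callers also model per-probe behaviour and information from neighbouring probes.

```python
# Hypothetical mean normalized intensity for each copy-number state.
STATE_MEANS = {0: 0.1, 1: 0.55, 2: 1.0, 3: 1.4, 4: 1.8}

def call_copy_number(intensity, state_means=STATE_MEANS):
    """Most likely copy number under equal-variance Gaussian noise: the nearest state mean."""
    return min(state_means, key=lambda cn: abs(intensity - state_means[cn]))

def call_region(intensities, state_means=STATE_MEANS):
    """Call a region covered by several probes by averaging them first to reduce noise."""
    return call_copy_number(sum(intensities) / len(intensities), state_means)

print([call_copy_number(v) for v in (0.08, 0.6, 1.05, 1.5)])  # [0, 1, 2, 3]
print(call_region([0.5, 0.62, 0.48]))  # 1
```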

Wednesday, April 16th, 2008
4:00 pm
CIT Building, Room 241 – SWIG Boardroom
Hosted by: Sorin Istrail
Refreshments will be served at 3:45 pm

CCMB Seminar Series
Yves Moreau
SymBioSys Center for Computational Systems Biology
University of Leuven (ESAT-SCD), Belgium

"Candidate gene prioritization by genomic data fusion"

Abstract: One of the main challenges of systems biology is to cope with the overwhelming amount and diversity of omics data and a key problem is that of 'candidate gene prioritization' - i.e., selecting among a large list of candidate genes those that are most promising for further biological validation. We present ENDEAVOUR, a generic computational strategy to prioritize candidate genes based on their similarity (across multiple types of data, including sequence, expression, literature, annotation, etc.) to a set of genes already implicated in the process under scrutiny. We first validate the overall performance through a statistical cross-validation of 29 diseases and 3 biological pathways. Next, we validate a novel candidate for DiGeorge syndrome in a zebrafish model. Finally, we present an alternative machine learning strategy for gene prioritization using kernel methods. The key advantage of kernel methods in this context is that they provide an elegant framework for the fusion of data - by relying only on positive semi-definite kernel similarity matrices for the representation of heterogeneous data sources. Kernel-based novelty detection outperforms our previous method on our disease gene benchmark.
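The kernel-fusion idea can be sketched in a few lines. A non-negative weighted sum of positive semi-definite kernel matrices is itself positive semi-definite, so similarity matrices from different data sources can be combined into one kernel. Candidates can then be ranked by their average fused similarity to the training genes. This is a simplified illustration, not ENDEAVOUR or the kernel novelty-detection method from the talk. The gene names, feature vectors and weights are hypothetical. The kernels are unnormalized linear (dot-product) kernels, which are PSD by construction.

```python
def linear_kernel(features):
    """Gram matrix of dot products: positive semi-definite by construction."""
    return [[sum(a * b for a, b in zip(u, v)) for v in features] for u in features]

def fuse_kernels(kernels, weights):
    """Weighted sum of same-sized kernel matrices (non-negative weights keep PSD)."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for k, w in zip(kernels, weights)) for j in range(n)]
        for i in range(n)
    ]

def rank_candidates(kernel, genes, training, candidates):
    """Rank candidates by mean fused similarity to the known (training) genes."""
    index = {g: i for i, g in enumerate(genes)}

    def score(c):
        return sum(kernel[index[c]][index[t]] for t in training) / len(training)

    return sorted(candidates, key=score, reverse=True)

genes = ["T1", "T2", "C1", "C2"]  # two known genes, two candidates (hypothetical)
expression = [[1.0, 0.0], [1.0, 0.2], [0.9, 0.1], [0.0, 1.0]]
annotation = [[1, 1], [1, 0], [1, 1], [0, 1]]
fused = fuse_kernels([linear_kernel(expression), linear_kernel(annotation)], [0.5, 0.5])
print(rank_candidates(fused, genes, ["T1", "T2"], ["C2", "C1"]))  # ['C1', 'C2']
```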

Bio: My research interest falls in the broad field of bioinformatics and, more specifically, into what I call Computational Systems Biomedicine: the application of computational methods in Systems Biology toward the understanding and modulation of developmental and pathological processes relevant to human health. The area of application in which I am currently most active is diagnosis and gene discovery in congenital disorders.

Tuesday, July 29th, 2008
11:00 a.m.
CIT Bldg, Room 241, SWIG Boardroom
Hosted by: Charles E. Lawrence
Refreshments will be served at 10:45 a.m.

Dr. Moreau will be on campus Tuesday, July 29th. Individuals interested in meeting privately with him are encouraged to contact Louise Patterson at Louise_Patterson@Brown.edu or 863-3178.
