Brown University | Center for Computational Molecular Biology

CCMB Seminar Series 2006-2007

Center For Computational Molecular Biology Seminar Series

Charles DeLisi

Boston University

Computational Identification of
Regulatory Markers for Human Tumors

Abstract:
I will outline two convergent lines of research in our lab: one a general computational-experimental strategy for identifying epigenetic alterations in upstream regions which can serve as markers for incipient neoplastic lesions; the other the development and application of supervised and unsupervised approaches to identifying the targets of transcriptional regulators. Results for Wilms' tumor and clear cell renal carcinoma will be briefly discussed.

Wednesday, May 9th, 2007
4:00 p.m.
CIT Building, Room 241 ~ SWIG Boardroom

Mark Gerstein

Gerstein Lab, Yale University

Understanding Protein Function
on a Genome-scale using Networks

Abstract:
My talk will be concerned with topics in proteomics, in particular predicting protein function on a genomic scale. We approach this through the prediction and analysis of biological networks -- both of protein-protein interactions and transcription-factor-target relationships. I will describe how these networks can be determined through integration of many genomic features and how they can be analyzed in terms of various simple topological statistics. I will discuss the accuracy of various reconstructed quantities.

Wednesday, May 2nd, 2007
4:00 p.m.
CIT Building, Room 241 ~ SWIG Boardroom

Rob Kulathinal

Harvard University
Cambridge, MA

Parallel sequencing, synthetic assembly
and comparative analysis of the
Drosophila mauritiana genome

Abstract:
New sequencing technologies are beginning to transform the very approaches that biologists employ. For geneticists interested in speciation or, as Charles Darwin termed it, "that mystery of mysteries", the availability of genomic sequence from closely related species of sequenced model organisms holds great promise. Using 454 sequencing, we have generated 1.5X coverage of the Drosophila mauritiana genome. D. mauritiana is a sibling species of the genetics workhorse, D. melanogaster, as well as of the more closely related species D. simulans and D. sechellia. The genomes of these species were used both to assemble the D. mauritiana genome and to shed light on its divergence at the molecular level. In addition to the comparative analyses, a number of challenges will be highlighted, including the short and error-prone nature of 454 reads relative to Sanger sequencing reads. We hold that such laboratory-scale genomics provides an emerging entry point to successfully answer outstanding biological questions.

Wednesday, April 25th, 2007
4:00 p.m.
CIT Building, Room 241 ~ SWIG Boardroom

Bruce Hendrickson

Sandia National Laboratory
Albuquerque, New Mexico

High Performance Computing in Biology:
Applications, Algorithms and Architectures

Abstract:
In its relatively short lifespan, computational biology has had an enormous impact on high performance computing. The computing requirements of biological applications have altered the research and business landscape of advanced computing in myriad ways. In this talk, I will review some of these synergistic developments and speculate about the even more dramatic changes ahead.

Wednesday, April 18th, 2007
4:00 p.m.
CIT Bldg, Room 241 SWIG Boardroom

Wolfgang Peti

Brown University, Providence RI

Folded or Unfolded

Abstract:
Our research challenges the status quo in protein science, which says that only proteins with a 3-dimensional structure can fulfill a function. Linus Pauling stated in 1946: "Answers to many basic problems of biology -- nature of growth, mechanism of duplication of viruses and genes, action of enzymes, mechanism of physiological activity of drugs, hormones, and vitamins, structure and action of nerve and brain tissue -- may lie in knowledge of molecular structure and intermolecular reactions". However, in recent years it has been shown that unstructured or partially structured proteins also fulfill essential functions. Usually these proteins fold upon binding to their targeting protein and, in this way, are critical for mediating numerous regulatory events. My laboratory has identified a large number of these intrinsically unstructured proteins or protein domains which are, significantly, particularly important in the brain. Our current efforts are focused on identifying and developing methods that will enable us to determine the 3-dimensional structure of these unstructured proteins. These structures are very different than usual protein structures, since they depend on populations of states rather than defined distances. This, in turn, will allow us to elucidate the rules that govern their folding-upon-binding interaction and advance our understanding of this new paradigm of protein-protein interactions.

Wednesday, April 11th, 2007
4:00 p.m.
CIT Bldg, Room 241 SWIG Boardroom

Mayetri Gupta, Ph.D.

University of North Carolina at Chapel Hill
Department of Biostatistics

Faculty Search Candidate
Center for Computational Molecular Biology

Statistical Methods for Deciphering Transcription Regulatory Processes

Abstract:
Understanding gene regulation within living cells is one of the major scientific challenges in the post-genome era. Short segments of DNA, known as transcription factor binding sites (TFBSs) or motifs, are believed to be instrumental in initiating the process of gene regulation. However, accurate motif discovery, especially in complex genomes, is a challenging task. New experimental methods studying genome-wide protein-DNA interactions suggest that certain periodic features of chromatin packaging are associated with binding of TFs. In this talk, I will present some of our recent work towards understanding gene regulatory processes, including (1) discovery of transcription regulatory modules from genomic sequence; (2) assessment of nucleosome positioning from high density tiling array experiments and (3) relating chromatin packaging to the presence of TF binding sites and other features of the DNA sequence. For (1), we introduce a Bayesian hidden Markov model framework for sequences containing binding site clusters, with a Markovian dependence structure for both motif ordering and motif site occurrences. For (2), we develop a generalized hierarchical hidden Markov model that allows for length-restricted features, as well as variability in probe behavior. Novel recursion-based Monte Carlo algorithms are devised for efficient model selection and parameter estimation, and applied with considerable degrees of success to yeast and mammalian data sets. In addition to the attractiveness of these models for their ability to capture important characteristics of the underlying biology, the sampling based inference allows us to assess the stochastic variation in the parameters of interest instead of providing a single point estimate. I will conclude with a description of the current and future extensions of this work, towards the overall goal of utilizing multiple sources of genomic data in elucidating regulatory pathways.

Lecture: Thursday, March 15th 12:00pm CIT, SWIG Room 241
Chalk Talk: Friday, March 16th 12:00pm CIT, SWIG Room 241

Yun S. Song, Ph.D.

University of California, Davis
Department of Computer Science

Faculty Search Candidate
Center for Computational Molecular Biology

Computational and Mathematical
Aspects of Meiotic Recombination

Abstract:
Meiotic recombination creates a mosaic genome from the two homologous genomes of an individual. In addition to being a major mechanism that can create new genetic types in a population, recombination has far-reaching consequences on the genealogy of chromosomes; as a result of recombination, different regions in a chromosome can have different evolutionary histories. Estimating the frequency and the location of historical recombination is relevant to a wide range of practical and theoretical problems in genomics. In this talk, I will address a couple of statistical problems related to meiotic recombination and sketch some algorithmic work on reconstructing parsimonious evolutionary histories with recombination.

Lecture: Tuesday, March 13th 12:00 noon CIT, SWIG Room 241
Chalk Talk: Wednesday, March 14th 12:00 noon CIT, SWIG Room 241

Hue Sun Chan, Professor of Biochemistry

Departments of Biochemistry and Medical Genetics & Microbiology,
Faculty of Medicine, University of Toronto

Cooperativity Principles in
Protein Folding, Entropic and Enthalpic Barriers

Abstract:

Many small single-domain proteins undergo cooperative, switch-like folding/unfolding transitions with very low populations of intermediate, i.e., partially folded, conformations. This phenomenon is referred to as cooperative folding. For most natural proteins, cooperativity is likely an evolved trait to guard against disease-causing aggregation. From a biophysical standpoint, cooperativity is a remarkable molecular-recognition feat that has not yet been achieved by de novo experimental design. Therefore, knowing the biophysical basis of cooperativity is central to addressing many questions in protein folding and design and to progress in understanding diseases of misfolding. However, cooperativity is not readily accounted for by common notions about driving forces for folding. I will discuss how common protein chain models with pairwise additive interactions are insufficient to account for the folding cooperativity of natural proteins, and how models with nonadditive local-nonlocal coupling are able to rationalize cooperative folding rates that are well correlated with native topology. The traditional formulation of folding transition states entails a folding free energy barrier with both enthalpic and entropic components. I will explore the microscopic origins of these thermodynamic signatures in terms of conformational entropy as well as desolvation (dewetting) effects. Intriguingly, the existence of significant enthalpic folding barriers raises fundamental questions about the validity of the funnel picture of protein folding, because such enthalpic barriers appear to imply that there are substantial uphill moves along a folding trajectory. Using results from extensive atomic simulations, I will show how the paradox can be resolved by a dramatic entropy-enthalpy compensation at the rate-limiting step of folding. In this perspective, the height of the enthalpic barrier is seen as related to the degree of cooperativity of the folding process.

Wednesday, March 7th, 2007
4:00 p.m.
CIT Building, Room 241 ~ SWIG Boardroom

Jennifer Knies

University of North Carolina

Thermal adaptation in the bacteriophage G4

Abstract:

The likelihood and consequences of adaptation to novel environments can be predicted if there are common patterns of adaptation to novel environments. Thermal adaptation is a useful model for studying these common patterns, because the proximate mechanisms (e.g. biochemical rate processes) that determine the effects of temperature on growth or performance and the evolutionary causes of thermal adaptation are known. Two specific hypotheses about thermal adaptation are: (1) Thermal constraints on reaction rates will cause cold-adapted species to have lower maximal growth rates than hotter-adapted species at their thermal optima (i.e. "Hotter is better"), and (2) Adaptation to a novel temperature will result in trade-offs in performance at different temperatures (i.e. "Trade-off" Hypothesis). Both of these predictions stem from the idea that there are functional constraints on biochemical reactions that limit or reduce an organism's performance at different temperatures. These predictions were investigated by using the bacteriophage G4 as a model experimental system. The growth rate of natural and lab isolates of G4 was examined over a wide temperature range and evidence was found for both the "hotter is better" and the "trade-off" hypotheses. In addition, the proximate mechanisms underlying these observed patterns are also being investigated. Higher temperatures are predicted to select for increased protein stability. This hypothesis is being tested by adapting multiple phage populations to temperatures above and below their thermal optima, identifying the genetic bases of adaptation of these populations, and then assaying the thermostability of the evolved genotypes.

Wednesday, February 28, 2007 4:00 p.m.
CIT Building, Room 241 ~ SWIG Boardroom

Eran Halperin, Ph.D.

International Computer Science Institute, Berkeley, California

Faculty Search Candidate
Center for Computational Molecular Biology

Whole-Genome Disease Association Studies:
Challenges and Solutions

Abstract:

The recent data release of the Haplotype Mapping project, and the rapid reduction in genotyping costs, open new directions and opportunities in the study of complex diseases via the analysis of single nucleotide polymorphism (SNP) data. At the same time, the increased size of the SNP datasets sets new computational and statistical challenges.

In this talk I will discuss some of the challenges set by the large scale of these studies, and the current solutions to these challenges. In particular, I will describe recent results on whole-genome haplotype analysis, including haplotype inference, and the incorporation of the HapMap data in haplotype analysis of case-control studies. I will also discuss potential drawbacks of these methods due to population substructure, and suggest solutions that are scalable to the coming large-scale studies.

Lecture: Monday, February 26th 4:00p.m. CIT, SWIG Room 241
Chalk Talk: Tuesday, February 27th 12:00 noon CIT, SWIG Room 241

Andrew Kern, Ph.D.

University of California, Santa Cruz
Center for Biomolecular Science and Engineering

Faculty Search Candidate
Center for Computational Molecular Biology

Human Ultraconserved Elements are Ultraselected

Abstract:
Ultraconserved elements in the human genome, which have remained nearly unchanged since our divergence from chicken nearly 310 million years ago, remain an evolutionary mystery. Two competing explanatory hypotheses are strong selection to maintain their largely unknown functions and mutational cold spots. Using human population re-sequencing data of 332 such elements and their flanking sequences in 96 individuals, and a hierarchical Bayesian Markov Chain Monte Carlo (MCMC) analysis of the segregating site frequency spectrum, we find that extremely strong selection within ultraconserved elements continues in the current human population. Indeed, the strength of selection acting on ultraconserved elements is approximately four times greater than that which constrains nonsynonymous protein coding bases.

Lecture: Tuesday, February 20th 12:00 noon CIT, SWIG Room 241
Chalk Talk: Wednesday, February 21st 12:00 noon CIT, SWIG Room 241

Christina Leslie, Ph.D.

Center for Computational Learning Systems, Columbia University

Faculty Search Candidate
Center for Computational Molecular Biology

Learning predictive models of gene regulation

Abstract:
Studying the behavior of gene regulatory networks by learning from high-throughput genomic data has become one of the central problems in computational systems biology. Most work in this area has focused on learning structure from data -- e.g. finding clusters or modules of potentially co-regulated genes, or building a graph of putative regulatory "edges" between genes -- and has been successful at generating qualitative hypotheses about regulatory networks.

Instead of adopting the structure learning viewpoint, our focus is to build predictive models of gene regulation that allow us both to make accurate quantitative predictions on new or held-out experiments (test data) and to capture mechanistic information about transcriptional regulation. Our algorithm, called MEDUSA, integrates promoter sequence, mRNA expression, and transcription factor occupancy data to learn gene regulatory programs that predict the differential expression of target genes. Instead of using clustering or correlation of expression profiles to infer regulatory relationships, the algorithm learns to predict up/down expression of target genes by identifying condition-specific regulators and discovering regulatory motifs that may mediate their regulation of targets. We use boosting, a technique from statistical learning, to help avoid overfitting as the algorithm searches through the high dimensional space of potential regulators and sequence motifs. We will report computational results on the yeast environmental stress response, where MEDUSA achieves high prediction accuracy on held-out experiments and retrieves key stress-related transcriptional regulators, signal transducers, and transcription factor binding sites. We will also describe recent results on the hypoxic response in yeast, where we used MEDUSA to propose the first global model of the oxygen sensing and regulatory network, including new putative context-specific regulators. Through our experimental collaborator on this project, the Zhang Lab at Columbia University, we are in the process of validating our computational predictions with wet lab experiments, with encouraging preliminary results.

Lecture: Thursday, February 22nd 12:00 noon CIT, SWIG Room 241
Chalk Talk: Friday, February 23rd 12:00 noon CIT, SWIG Room 241

Niko Beerenwinkel, Ph.D.

Harvard University, Program for Evolutionary Dynamics
Faculty Search Candidate, CCMB

Evolutionary Escape on Fitness Landscapes

Abstract:

The evolution of HIV within individual patients is associated with disease progression and failure of antiretroviral drug therapy. Using graphical models we describe the development of HIV drug resistance mutations and show how these models improve predictions of the clinical outcome of combination therapy. We present combinatorial algorithms for computing the risk of escape of an evolving population on a given fitness landscape. The geometry of fitness landscapes and the underlying gene interactions are analyzed in an attempt to generalize the notion of pairwise epistasis to higher-order genetic systems. Finally, we discuss the new and exciting prospects for analyzing viral genetic variation that arise from recent pyro-sequencing technology.

Lecture: Thursday, February 1st, 4:00 pm, CIT SWIG Room 241
Chalk Talk: Friday, February 2nd, 12:00pm, CIT SWIG Room 241

John Moult

Center for Advanced Research in Biotechnology
University of Maryland Biotechnology Institute

SNPs, Protein Structure, and Disease

Abstract:

How does genetic variation influence disease susceptibility? To partly address this question we have developed structure and sequence based models of the impact of SNPs on protein function in vivo. The models have been applied to a set of single nucleotide variants known to cause monogenic disease, and to a set found in the human population, not known to be associated with disease. There are two surprising findings: First, most monogenic disease-causing variants act by mildly destabilizing protein structure. The results imply that most proteins are only just sufficiently stable to operate effectively in vivo, and suggest a possible general strategy for developing therapeutics. Second, about a quarter of the population SNPs appear to seriously impair function at the molecular level. Examination of a set of these cases suggests a variety of mechanisms that make the larger scale system robust with respect to component defects. Network level robustness analysis has the potential to identify those SNPs that most likely contribute to susceptibility to complex diseases. To facilitate this, we have integrated all the pertinent data into a 'knowledgenet' interface (www.snps3d.org), allowing rapid assessment of the known relationships between proteins relevant to a particular disease, as well as access to molecule level information and to the supporting literature.

January 31st, 2007
4:00 p.m.
CIT Building, Room 241 ~ SWIG Boardroom
Refreshments will be served at 3:45pm

John Quackenbush

Dana-Farber Cancer Institute and
the Harvard School of Public Health

Extracting Biological Meaning from
High-Dimensional Datasets

Abstract:

The revolution of genomics has come not from the "completed" genome sequences of human, mouse, rat, and other species. Nor has it come from the preliminary catalogues of genes that have been produced in these species. Rather, the genomic revolution has been in the creation of technologies - transcriptomics, proteomics, metabolomics - that allow us to rapidly assemble data on large numbers of samples that provide information on the state of tens of thousands of biological entities. Although the gene-by-gene hypothesis testing approach remains the standard for dissecting biological function, 'omic technologies have become a standard laboratory tool for generating new, testable hypotheses. The challenge is no longer generating the data, but rather analyzing and interpreting it. Although new statistical and data mining techniques are being developed, they continue to wrestle with the problem of having far fewer samples than necessary to constrain the analysis. One way to deal with this problem is to use the existing body of biological data, including genotype, phenotype, the genome, its annotation and the vast body of biological literature. Through examples, we will show how diverse datasets can be used in conjunction with computational tools to constrain 'omics datasets and extract meaningful results that reveal new features of the underlying biology.

Wednesday, January 24th, 2007
4:00 p.m.
CIT Bldg, Room 241 SWIG Boardroom

Paola Oliveri, Ph.D.
California Institute of Technology

The genomic control of embryo development

Abstract:
The development from a single fertilized cell to a complex organism is an inherited property and is entirely encoded in the genome. The developmental program is controlled by large gene regulatory networks (GRNs). The sea urchin endomesoderm gene regulatory network is one of the most comprehensively understood regulatory apparatuses for control of spatial and temporal gene expression in any complex developmental system. At present, more than 50 genes have been linked into this GRN. The architecture of the network is emerging from a system-level approach, in which computational analysis is applied to high resolution spatial and temporal gene expression data and large-scale quantitative perturbation data obtained by gene expression knockouts and other methods, combined with experimental embryology. In the endomesoderm domain the primary mesenchyme cell (PMC) GRN is so far the most complete model for early cell specification in that it extends convincingly from maternal inputs to cell-type differentiation. We are using the data emerging from the recently sequenced Strongylocentrotus purpuratus genome together with the transcriptome data and the spatio-temporal profiles of the sea urchin regulatory genes to address the question of completeness of the endomesoderm GRN. We identified all genes encoding transcription factors relevant for this embryological domain, determined their spatio-temporal profiles, and we are in the process of integrating them into the GRN by perturbation and cis-regulatory analysis. The predictive model constructed from such perturbation data specifies the key cis-regulatory inputs into regulatory genes and their key outputs terminating at other regulatory genes. Hence, it is directly testable at the cis-regulatory level. We identified by computational and experimental methods, and isolated, some central cis-regulatory nodes of the GRN.
In gene transfer experiments these genomic fragments displayed the same responses to the appropriate perturbations as do the endogenous genes in the whole embryo; and when the genomic target sites for the relevant inputs were mutated, the cis-regulatory constructs behaved in the expected ways. To date, this system-wide authentication indicates that the perturbation analysis provides the encoded linkages with surprising accuracy. A predictive model of the GRN provides the essential "transformation function" between the static genomic regulatory code that is hardwired into the DNA sequence, and the dynamic events of spatial and temporal gene expression. In addition, by comparative analysis we show that GRNs are modular in structure and we have identified some subcircuit design principles or network architectural subcomponents that qualitatively execute specific regulatory logic functions regardless of the specific regulatory genes that take part.

Wednesday, December 13th, 2006
4:00pm
CIT Bldg ~ Swig Boardroom # 241

Advances in Genomic Tools Construction: Saturday, Dec. 9, 2006

Liliana Florea
George Washington University

Spaced Seeds for Cross-Species cDNA-to-Genome Alignment