Speakers & Events - Academic Year 2012


Spring Semester 2011-2012

Wednesday, February 1st, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Ion Mandoiu

Associate Professor, Computer Science & Engineering
University of Connecticut 

"Inferring Viral Quasispecies Spectra from NGS Reads"

RNA viruses infecting a host usually exist as a set of closely related sequences, referred to as quasispecies. The genomic diversity of viral quasispecies infecting a host is of great interest, particularly for chronic infections, since it can lead to resistance to existing antiviral therapies.  By eliminating time-consuming cloning steps and providing unprecedented sequencing depth, next-generation sequencing (NGS) technologies promise to enhance our ability to characterize quasispecies spectra of infected hosts.  Unfortunately, standard assembly software was originally designed for haploid genome reconstruction, and cannot be used to simultaneously assemble and estimate the abundance of multiple closely related quasispecies sequences.  In this talk I will present several algorithms for quasispecies spectrum reconstruction and frequency estimation from both shotgun and amplicon NGS reads. Results of empirical comparisons with existing methods on simulated and real 454 pyrosequencing reads will also be presented.

Wednesday, February 8th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Erica Larschan
Brown University
Molecular Biology, Cell Biology, and Biochemistry

"Establishing domains of coordinate gene regulation"

Dosage compensation is an important model system for defining the mechanisms of coordinate gene regulation because all of the genes on a single chromosome are specifically identified and co-regulated. The Male-Specific Lethal (MSL) complex is the key regulator of Drosophila dosage compensation because it increases transcript levels from active genes on the single male X chromosome to equalize gene dosage with females who have two copies of each X chromosome (Belote). Both cis-acting DNA sequences called MSL Recognition Elements (MREs) (Ref) and X-linked roX (RNA on X) non-coding RNA components (Ref) have been implicated in distinguishing the X chromosome from autosomes. However, the way in which the MSL complex specifically targets MRE sequences on the male X chromosome remained elusive because MREs are only two-fold X-enriched and known MSL components are insufficient for direct recognition of MREs (Ref). We recently identified the CLAMP (Coupling Lethal Adaptor for MSL Proteins) zinc-finger protein as one of many candidate MSL-regulators (Larschan et al., submitted). Here, we demonstrate that CLAMP serves as a critical link between MSL complex and MREs by directly interacting with MREs and targeting MSL complex to its high affinity sites. Furthermore, CLAMP and MSL complex exhibit an inter-dependent binding interaction that strongly increases occupancy of both factors at MSL complex high affinity sites. Even in the absence of MSL complex, CLAMP is highly enriched at potential ‘seed’ sites distributed along the length of the male X chromosome. Therefore, we propose the following novel mechanism for X chromosome recognition: 1) CLAMP directly recognizes MRE elements and is enriched at seed sites including the roX loci that serve as initial targets during X-identification; 2) A CLAMP-MSL inter-dependent association at high affinity sites concentrates MSL complex at these seed sites to generate X-specificity from a two-fold X-enrichment of MRE sequences. 
In this way, we provide key insight into how a single chromosome can be specifically recognized within a complex eukaryotic genome.

Wednesday, February 15th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Collin Stultz

Associate Professor of Health Sciences & Technology and Electrical Engineering & Computer Science
W.M. Keck Associate Professor of Biomedical Engineering

Massachusetts Institute of Technology

"Are Models of Intrinsically Disordered Proteins Correct?"

The characterization of intrinsically disordered proteins is challenging because accurate models of these systems require a description of both their thermally accessible conformers and the associated relative stabilities or weights. These structures and weights are typically chosen such that calculated ensemble averages agree with some set of pre-specified experimental measurements; however, the large number of degrees of freedom in these systems typically leads to multiple conformational ensembles that are degenerate with respect to any given set of experimental observables. Moreover, our recent work demonstrates that estimates of the relative stabilities of conformers within an ensemble are often incorrect when one does not account for the underlying uncertainty in the estimates themselves. Therefore, we have developed a method for modeling the conformational properties of disordered proteins that estimates the uncertainty in the weights of each conformer. A unique and powerful feature of the approach is that it provides a built-in error measure that allows one to assess the accuracy of the ensemble. Using this approach we constructed an ensemble that characterizes the accessible states of the IDP, tau protein. These data led to new insights into intramolecular interactions that may play a role in promoting tau aggregation, a process which has been linked to neuronal death and dysfunction in patients with Alzheimer's disease. More generally, we derive an order parameter that quantifies the extent of disorder within a protein. Although protein disorder is normally thought of as a binary phenomenon (i.e., a protein is either disordered or not), we suggest that the concept of protein disorder should be treated like a continuous variable, and that not all unfolded states are created equal.

Wednesday, February 22nd, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Xi Luo

Assistant Professor of Biostatistics
Brown University

"Graphical Models for Gene Networks and Their Use in Classification"

Graphical models use network graphs to represent the statistical dependence structures of multiple variables, and they have been employed to study gene regulatory networks. In this talk, I will first describe a few statistical methods for estimating such large gene networks when the sample size is much smaller. These approaches emphasize direct associations, and thus encourage parsimoniously connected networks to enhance interpretability. Our approaches are based on convex optimization, and we will provide efficient algorithms (and R packages) for computation. Mathematical analysis of our methods has demonstrated the improvement in both computation and network recovery, and we will further illustrate these merits using publicly available microarray data in a breast cancer study and in an HIV brain tissue study. In the second part of the talk, I will describe a likelihood approach to utilize the estimated gene networks for subject classification. This approach yields a simple risk score criterion based on the Bayes decision rules, and it is robust against the high correlations among those important gene factors. Its prediction performance will be compared with other popular classification methods.


Wednesday, February 29th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Chris Bailey-Kellogg


Associate Professor
Department of Computer Science
Dartmouth College

"Optimization Algorithms for the Design of Immunotolerant Biotherapies"

The explosive growth of biotherapeutic agents is revolutionizing treatment of numerous diseases, but innovations in biotherapies have also created new challenges for drug design and development.  One distinguishing risk factor of therapeutic proteins is the prospect of eliciting an immune response in humans.  To meet this challenge, we have developed optimization algorithms that minimize a protein’s T cell epitope content while simultaneously ensuring that the engineered variant maintains a high level of stability and activity.  Our algorithms assess immunogenicity using T cell epitope predictors that score peptide binding potential to class II MHC molecules.  The structural and functional consequences of deimmunizing mutations are evaluated with statistical sequence potentials and molecular mechanics force fields.  The development and implementation of these algorithms will be highlighted through comparative analysis with previously published deimmunization efforts as well as our own experimental validation using beta-lactamase, a model therapeutic candidate with utility in ADEPT cancer therapies.


Wednesday, March 14th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Joel Weltman

Clinical Professor Emeritus of Medicine
Brown University

Pandemic Influenza Bioinformatics

Epidemic influenza is a significant threat to public health. Because of this threat, thousands of influenza viral sequences from the world-wide epidemic, i.e., pandemic, of 2009-2010 have been archived at the NCBI-NIH Influenza Virus Resource. An analysis of this archive based upon the molecular bioinformatics of viral subsets sorted according to a reference nucleotide position of maximum information entropy will be presented. The results of this analysis reveal the presence of non-random, intergenic biological forces acting at the nucleotide level of viral organization. Evidence will be presented that these organizational and evolutionary forces may be of epidemiological and clinical significance.
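
As a rough illustration of what a "nucleotide position of maximum information entropy" could mean, the Python sketch below computes the Shannon entropy of each column in a small alignment and reports the most variable column. The toy sequences and the function names are made up for this example and are not taken from the talk; the speaker's actual analysis may differ.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the nucleotides observed in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def max_entropy_position(sequences):
    """Return (index, entropy) of the alignment column with the highest entropy."""
    entropies = [column_entropy(col) for col in zip(*sequences)]
    best = max(range(len(entropies)), key=entropies.__getitem__)
    return best, entropies[best]


# Toy aligned sequences (hypothetical, for illustration only).
toy_alignment = ["ACGT", "ACGA", "ATGC", "ACGG"]
position, entropy = max_entropy_position(toy_alignment)
```

Sequences could then be grouped by the nucleotide they carry at that position, which is one way to read the "viral subsets" mentioned in the abstract.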


Monday, March 19th, 2012
12:00 noon - SWIG Boardroom (CIT 241)

Ivo Grosse

Professor, Institut für Informatik, Martin-Luther-Universität

De-Novo Discovery of Differentially Abundant DNA Binding Sites Including Their Positional Preference

The identification of DNA binding sites has been a challenge since the early days of computational biology, and its importance has been increasing with the development of new experimental techniques and the ensuing flood of large-scale genomics and epigenomics data yielding approximate regions of binding. Many binding sites have a pronounced positional preference in their target regions, which makes them hard to find as this preference is typically unknown, and many of them are weak and cannot be found from target regions alone but only by comparison with carefully selected control sets. Several de-novo motif discovery programs have been developed that can either learn positional preferences from target regions or differentially abundant motifs in target versus control regions, but the combination of both ideas has been neglected.

Here, we introduce Dispom, a de-novo motif discovery program for learning differentially abundant motifs and their positional preferences simultaneously. Dispom outperforms existing programs based on benchmark data and succeeded in detecting a novel auxin-responsive element (ARE) substantially more auxin-specific than the canonical ARE. Since its publication, we have endowed Dispom with more complex motif models and extended it to handle weighted input data such as ChIP-seq or BS-seq data. We have been applying Dispom to in-house and publicly available data of different transcription factors and insulators in yeasts, plants, and mammals as well as to protein-binding microarrays, where it turned out to be one of the top-scoring approaches in the corresponding DREAM challenge.


Wednesday, March 21, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Chaolong Wang

PhD Candidate in Bioinformatics
University of Michigan, Ann Arbor

"Statistical approaches for studying human genetic variation in diverse populations"

The spatial pattern of human genetic variation provides a basis for investigating the history of human migrations. Statistical techniques such as principal components analysis (PCA) and multidimensional scaling (MDS) have been widely used to summarize spatial patterns of human genetic variation. Although similarity between these statistical maps of genetic variation and the geographic maps of sampling locations is often observed, it has not been assessed systematically and quantitatively across different parts of the world. In the first part of this presentation, I will present my recent work on quantitatively comparing the similarity between genes and geography using a Procrustes analysis approach. We combine genome-wide SNP data from over 100 populations worldwide to perform a systematic analysis on the geographic structure of human genetic variation in different regions. We find that significant similarity between genes and geography exists in general in different geographic regions and at different geographic levels, supporting a view that geography plays a strong role in giving rise to human population structure.
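
For readers unfamiliar with Procrustes analysis, the following Python sketch shows the basic idea for two-dimensional maps: both point sets are centered, and the best rotation and scaling of one onto the other is found in closed form. It is a simplified illustration under stated assumptions (2-D points, no reflection, hypothetical function name and toy coordinates) and is not the speaker's implementation.

```python
import math


def procrustes_similarity(points_a, points_b):
    """Similarity in [0, 1] between two 2-D point sets after optimal
    translation, rotation, and scaling (reflection is not considered)."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("point sets must be non-empty and of equal size")
    # Represent each 2-D point as a complex number so rotation+scaling
    # becomes multiplication by a single complex constant.
    a = [complex(x, y) for x, y in points_a]
    b = [complex(x, y) for x, y in points_b]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    a = [p - mean_a for p in a]  # remove translation
    b = [p - mean_b for p in b]
    norm_a = sum(abs(p) ** 2 for p in a)
    norm_b = sum(abs(p) ** 2 for p in b)
    if norm_a == 0 or norm_b == 0:
        raise ValueError("each point set needs at least two distinct points")
    cross = sum(p.conjugate() * q for p, q in zip(a, b))
    # Equals sqrt(1 - D), where D is the normalized residual after the best fit.
    return abs(cross) / math.sqrt(norm_a * norm_b)


# Toy example: a "geographic" square and a rotated, scaled, shifted copy of it.
geo = [(0, 0), (1, 0), (1, 1), (0, 1)]
gen = [(5 - 2 * y, 3 + 2 * x) for x, y in geo]
similarity = procrustes_similarity(geo, gen)
```

A value near 1 means the two maps agree up to translation, rotation, and scale; a value near 0 means there is essentially no linear correspondence between them.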

In the second part of the presentation, I will briefly go through another topic of my research: correcting for allelic dropout in microsatellite data. Allelic dropout is a commonly observed source of missing data in microsatellite genotypes, which can substantially compromise the data accuracy and affect many microsatellite-based studies. Traditional solutions for allelic dropout often require replicate genotyping, which is costly and often impossible in population-genetic studies. We have therefore proposed a maximum likelihood approach to estimate allelic dropout rates and correct for allelic dropout when only one set of nonreplicated genotypes is available. Through simulations, we show that our method is both accurate and fairly robust to some violations of model assumptions. 


Wednesday, April 4th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Ira Hall


Assistant Professor, Biochemistry & Molecular Sciences
University of Virginia School of Medicine

"Next-generation sequencing of structural variation in germline and somatic genomes"


Wednesday, April 11th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Eric Morrow

Assistant Professor in Biology and Psychiatry & Human Behavior
Brown University 

"Novel genomic methods for mutation discovery in disorders of cognitive development"


Friday, April 13th, 2012
2:00 p.m. - SWIG Boardroom (CIT 241)

Iman Hajirasouliha

CCMB Postdoctoral Seminar
PhD Candidate at Simon Fraser University
Lab for Computational Biology, School of Computing Science


"Algorithmic Methods for Structural Variation Detection"

We discuss the challenges of detecting structural variations in sequenced genomes. We present our methods for finding deletions, novel sequence insertions, and mobile element insertions. We also present our recent approach: a shift in genomic structural variation (SV) studies away from the conventional two-step approach of i) independent SV discovery and ii) pairwise comparison of structural variation to a simultaneous SV discovery framework in multiple genomes. Tests of our novel framework on the genomes of mother-father-child trios sequenced by Illumina show that the conventional strategy works poorly in providing meaningful biological results through comparative analysis. Our framework not only significantly reduces the number of incorrect de novo variations for the same number of total variations but also predicts more known true positives. We also present our results on the 1000 GP data and discuss future directions.


Wednesday, April 18th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Ali Bashir


Assistant Professor, Genetics & Genomic Sciences
Mount Sinai School of Medicine 

"Hybrid Assembly of Bacterial Genomes"

Next-generation sequencing technologies have dramatically improved our ability to characterize novel genomes. Despite this, de novo assembly of genomes remains a challenging problem. The long read length and quick time to result of Single Molecule Real Time (SMRT®) sequencing make it an ideal platform for the rapid assembly and typing of bacterial pathogens. Specifically, by leveraging high quality short read data (i.e. PacBio® Circular Consensus Sequences (CCS), Illumina® sequences, or 454® sequences) with SMRT sequencing, one can obtain highly accurate assemblies with reduced fragmentation and larger N50s. Here, we present two different assembly methods used to assemble the bacteria from the recent outbreaks of V. cholerae in Haiti and E. coli in Germany. First, we present a method to combine high-quality short read contigs with SMRT sequencing to create scaffolds that order the contigs and resolve repeats. We show how this method allowed the complete assembly of the Haitian V. cholerae by merging high quality Illumina/454 contigs with PacBio continuous long read (CLR) and strobe read data, without the need for experimental finishing techniques such as PCR. Second, we present an error correction approach which uses high quality short read sequences to improve the accuracy of CLR data that can then be passed into conventional overlap-layout-consensus assembly algorithms. This method is applied to the recent outbreak of E. coli, merging high-quality PacBio CCS with PacBio CLR data. These methods alone or together offer high promise for improving assemblies of increasingly complex genomes.
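
The abstract refers to "larger N50s" as a sign of a less fragmented assembly. N50 is commonly defined as the contig length L such that contigs of length L or longer together cover at least half of the total assembled length. The short Python sketch below computes it under that common definition; the function name and the toy contig lengths are illustrative only.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of the
    assembly's total length.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must be non-empty")


# Two toy assemblies with the same total length (200) but different fragmentation.
contiguous = n50([100, 50, 30, 20])
fragmented = n50([40, 40, 40, 40, 40])
```

With the same total length, the assembly made of fewer, longer contigs yields the larger N50, which is why the statistic is used as a rough measure of contiguity.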


Wednesday, April 25th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Niall Howlett

Assistant Professor, College of the Environment & Life Sciences
University of Rhode Island

Maintaining Genome Stability: Role of the Fanconi Anemia Pathway

The primary research focus of the Howlett laboratory is the eukaryotic DNA damage response. Specifically, we study the molecular etiology of the rare chromosome instability disease Fanconi anemia (FA), using biochemical, cytogenetic, and genomic approaches. FA is clinically characterized by congenital defects, progressive pediatric bone marrow failure, and pronounced cancer susceptibility. At the cellular level, FA is characterized by chromosome instability and hypersensitivity to DNA crosslinking agents. To date 15 FA genes have been identified, the protein products of which function cooperatively in the FA-BRCA pathway to repair DNA damage. Importantly, somatic and epigenetic inactivation of the FA-BRCA pathway is a frequent occurrence in cancer and bone marrow failure in the general (non-FA) population. Therefore the study of this rare disease stands to impart a greater understanding of the molecular origins of abnormal hematopoiesis and cancer susceptibility in general.

Thursday, April 26th, 2012
4:00 p.m. - SWIG Boardroom

Robert Heckendorn

Associate Professor, Computer Science Department & College of Engineering
University of Idaho

Nonclassical Mathematical Tools for Conceptualizing Epistasis

BEACON, A Center for the Study of Evolution in Action, at Michigan
State University provides an opportunity for evolutionary biologists and
computer scientists studying evolutionary computation to exchange
ideas and techniques from their parallel universes.  While many of the
views of evolution are very similar between the two groups, the views
of epistasis stand out as particularly distant from one another.  In
this talk I will outline a general mathematical model for the
structure of a fitness landscape used in evolutionary computation and
discuss the strengths and weaknesses of this model with respect to
the needs of biologists and the data they have available.  I will
discuss some preliminary work on repackaging algorithms from
evolutionary computation for use with biological data.  I will give some
examples on biological data.


Tuesday, May 15th, 2012
11:00 a.m. - CIT 345

Hannah Carter

CCMB Postdoctoral Seminar

Graduate Student - Karchin Lab
Institute for Computational Medicine
Johns Hopkins University

"Identifying driver missense mutations in tumor sequencing data with CHASM"

Large-scale sequencing of cancer genomes is uncovering thousands of DNA alterations, but the functional relevance of the majority of these mutations to tumorigenesis is unknown. Identifying which of these mutations contribute to cancer is critical for understanding tumor biology, and for finding new diagnostic biomarkers and therapeutic targets. We have developed a computational method, called Cancer-specific High-throughput Annotation of Somatic Mutations (CHASM), to identify and prioritize the missense mutations most likely to generate functional changes in proteins that enhance tumor cell proliferation. CHASM uses a supervised machine learning technique called a Random Forest and more than 80 quantitative features describing amino acid changes to predict candidate driver mutations. The method has high sensitivity and specificity when discriminating between known driver missense mutations and randomly generated missense mutations, and performs well relative to other computational methods applied to this problem. CHASM has been applied to over 15 tumor sequencing studies to prioritize missense mutations for further study and initial results are promising; however, further experimental validation is needed to confirm CHASM predictions.


Tuesday, May 29th, 2012
4:00 p.m. - SWIG Boardroom (CIT 241)

Joachim Krug


Professor, Institute for Theoretical Physics
University of Cologne

"Evolutionary accessibility and genetic architecture"

The adaptive dynamics of a population in the space of genotypes is constrained by epistatic interactions between mutations at different genetic loci. Recent empirical studies have shown that this strongly reduces the number of evolutionary trajectories that are accessible under the common conditions of weak mutation and strong selection. In this talk I will describe several statistical models for fitness landscapes that quantify evolutionary accessibility under different assumptions on the amount of epistasis as well as on the underlying genetic architecture. Comparison of model predictions to empirical data provides an estimate of the overall amount of sign epistasis but does not, so far, allow one to distinguish between different models. The talk is based on joint work with Jasper Franke, Martijn Schenk, Ivan Szendro and Arjan de Visser.



Fall Semester 2011-2012

Wednesday, November 30th, 2011
4:00 p.m. - SWIG Boardroom (CIT 241)

Nicola Neretti


Brown University
Molecular Biology, Cell Biology, and Biochemistry 

"Computational Biology of Transcriptional Networks in Aging"

Aging is a universal phenomenon and one of the most complex phenotypes: it takes different forms in different species, individuals, and tissues, and its mechanisms are multiple, complex and stochastic in nature. I will discuss several computational and experimental approaches we have applied to study the effects of specific genetic and environmental interventions that extend life span in model organisms. We have developed a collection of bioinformatics pipelines to investigate potential mechanisms of life span extension, extract aging biomarkers from high throughput gene expression datasets, and identify novel longevity genes. I will then present a novel statistical algorithm for the comparison of ranked lists of genes we have used to find conserved genetic signatures across genetic, environmental and drug interventions in multiple species.

Wednesday, October 5th, 2011
4:00 p.m. - SWIG Boardroom (CIT 241)

Eric Nawrocki

Howard Hughes Medical Institute
Eddy/Rivas Lab - Janelia Farm Research Campus

"Structural RNA homology search and alignment"

Some RNAs do not encode proteins, but rather function directly as
RNAs. Many of these RNAs form stable, evolutionarily conserved
three-dimensional structures that are crucial to their functions in
various fundamental cellular processes including protein synthesis,
gene expression, splicing, protein transport, and more. Two important
problems in RNA sequence analysis are (1) given known homologs of an RNA
family, to find other homologs by searching databases (homology search)
and (2) computing accurate multiple sequence alignments of homologous
RNAs (alignment).

When searching for and aligning structural RNAs it is useful to score
both primary sequence and secondary structure similarity. Covariance
models (CMs) are probabilistic models well-suited for RNA sequence
analysis that are used by the Rfam database to annotate RNAs in
genomes. However, the high computational complexity of CM dynamic
programming alignment algorithms makes CM methods slow, and has
limited their practical application. We have implemented several
acceleration strategies involving banded dynamic programming and HMM
filtering which accelerate CM alignment by 1-3 orders of magnitude and
CM homology search by 2-3 orders of magnitude while sacrificing very
little sensitivity.