Reverse Ecology:
Computational Integration of Genomes, Organisms, and Environments

IGERT Vision, Goals, Themes, Impacts

Reverse Ecology is the application of genomic approaches to organisms, complex traits, or ecosystems to uncover the genetic bases of functional variation in nature (Borenstein et al. 2008; Li et al. 2008). This genomic revolution is for ecology, evolution, and environmental sciences what “reverse genetics” was for molecular biology and biochemistry decades ago. In reverse genetics, if one isolates a unique protein, that protein can be sequenced and reverse-translated into a DNA probe, and the gene for the protein can be cloned without doing any (Mendelian) genetics. The regulation of that gene can then be explored in the context of organismal function. In “Reverse Ecology”, by discovering the genetic markers that are associated with an environmental gradient, a particular habitat, or an evolved trait, one can find the targets of natural selection without knowing the details of how selection acted on that trait. This approach can be successful when one has many random genetic markers distributed across the genome and only a subset of the markers shows strong associations between genotype and phenotype in independent populations or habitats (Wood et al. 2008). The power of this approach follows from population genetic predictions of the neutral theory of molecular evolution: because gene flow and random genetic drift affect the entire genome, but selection acts on a subset of all genes, genomic regions showing repeatable allelic associations with specific traits are candidates for the action of selection.
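
The inter-locus contrast described above can be illustrated with a minimal sketch. The choice of Wright's F_ST as the differentiation statistic, the function names, and the toy allele frequencies are all assumptions made for illustration; the text does not prescribe a particular statistic, and a real genome scan would compare each locus against a null distribution from neutral expectations rather than rely on a simple ranking.

```python
def fst(p1, p2):
    """F_ST at one biallelic locus for two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    (Assumed statistic; chosen here only to illustrate the logic.)
    """
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                         # pooled expected heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2    # mean within-population value
    if h_total == 0:                                          # allele fixed or absent everywhere
        return 0.0
    return (h_total - h_within) / h_total


def top_outliers(freqs_a, freqs_b, n_top=1):
    """Rank loci by F_ST and return the indices of the n_top most differentiated."""
    scores = [fst(a, b) for a, b in zip(freqs_a, freqs_b)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_top], scores


# Hypothetical allele frequencies at five marker loci in two habitats.
# Drift and gene flow leave most loci similar; locus 3 is strongly differentiated.
habitat_a = [0.50, 0.30, 0.70, 0.10, 0.45]
habitat_b = [0.52, 0.28, 0.69, 0.90, 0.47]

candidates, scores = top_outliers(habitat_a, habitat_b)
print(candidates)  # indices of candidate loci for the action of selection
```

In this toy case the genome-wide background is weakly differentiated and a single locus stands out. The argument in the text additionally requires that the same loci stand out in independent populations or habitats, which would mean repeating the scan on each population pair and intersecting the candidate lists.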

Reverse Ecology approaches are transforming how scientists probe the natural world, and the tools are now in place to extend this approach to communities and ecosystems. The logic of inter-locus contrasts within a genome can be applied to a parallel neutral model in ecology. Consider a bacterial null model where dispersal is widespread and ‘everything is everywhere’. If a subset of organisms, or microbial barcodes, changes frequency along an environmental gradient, but a different subset of microbial barcodes does not change frequency along this gradient, one can identify ‘non-neutral’ species in the environment (e.g., Etienne 2009). For organisms or traits not amenable to laboratory crosses or analysis, one can now undertake a search for the genetic bases of functional differences in the wild, which redefines the meaning of a “model” organism.
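
The community-level version of this contrast can be sketched in the same spirit. The example below is a simplification under stated assumptions: it uses a plain Pearson correlation between each barcode's relative abundance and the gradient as a stand-in for a formal neutral-model test, and the barcode names, gradient values, abundances, and the 0.9 threshold are all hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gradient_responders(gradient, abundances, threshold=0.9):
    """Return barcodes whose abundance tracks the gradient (|r| >= threshold)."""
    return sorted(
        name
        for name, series in abundances.items()
        if abs(pearson(gradient, series)) >= threshold
    )


# Hypothetical sampling stations along an environmental gradient.
gradient = [0, 5, 10, 15, 20]

# Hypothetical relative abundances of three microbial barcodes at each station.
abundances = {
    "barcode_A": [0.02, 0.10, 0.21, 0.29, 0.41],  # increases along the gradient
    "barcode_B": [0.20, 0.19, 0.21, 0.20, 0.20],  # essentially flat
    "barcode_C": [0.40, 0.31, 0.18, 0.11, 0.02],  # decreases along the gradient
}

print(gradient_responders(gradient, abundances))
```

Barcodes that track the gradient are the candidate ‘non-neutral’ taxa, while the flat barcode behaves as the ‘everything is everywhere’ null would predict. A formal analysis would replace the correlation threshold with an explicit neutral community model.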

However, accomplishing this goal requires the integration of expertise from multiple disciplines that most ecologists, genomicists, or computational biologists lack individually. Sampling the environment for genomic analysis requires ecological insight that lab scientists have not developed; getting molecular biological analyses to work on environmental samples requires lab skills that field biologists often lack; separating false positives from true positives (i.e., distinguishing mere correlation from true causation) among 2 million sequence reads requires computational and statistical controls that most empirical biologists have never encountered.
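
As one illustration of the kind of statistical control meant here, the sketch below applies the Benjamini-Hochberg procedure, a standard method for limiting the false discovery rate when many tests are run at once. The choice of this procedure and the p-values are assumptions for illustration; the text does not name a specific method.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return the indices of tests declared significant at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # test indices, smallest p first
    largest_passing_rank = 0
    for rank, i in enumerate(order, start=1):
        # Step-up rule: find the largest rank k with p_(k) <= k * alpha / m.
        if p_values[i] <= rank * alpha / m:
            largest_passing_rank = rank
    return sorted(order[:largest_passing_rank])


# Hypothetical p-values from ten marker-by-environment association tests.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]

naive = [i for i, p in enumerate(p_values) if p < 0.05]  # uncorrected threshold
controlled = benjamini_hochberg(p_values)                # corrected for ten tests

print(len(naive), len(controlled))
```

With these toy values the uncorrected threshold accepts five tests while the corrected procedure accepts two, and the gap widens rapidly as the number of tests grows toward the scale of a sequencing run.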

The IGERT training program proposed here aims to integrate students into a community of experts in genomics, environmental science, and computationally intensive statistics, training those students to communicate and operate broadly across disciplinary lines while also supporting their in-depth development as experts able to work at the interfaces of these disciplines.

Intellectual Merit - The Current Challenge

A generation of scientists is emerging from graduate programs across the country where new genomic technologies are applied to biological problems across all scales in nature. The vast majority of these questions are approached with a rapidly growing array of ‘post-genomic’ technologies that have been spawned by the last decade of genome projects. The opportunities are many, the tools are expensive and powerful, and the data sets are vast and difficult to interpret. A huge gap is opening up in training in the fundamentals of how these data sets are generated, interpreted, and integrated into real working knowledge. Most graduate students today who are doing “genomics” prepare DNA or RNA using a kit purchased from a biotech company. The recipes of the buffers are not published, and the contents of the special spin columns are not available. These kits often work well, but when they do not, there is little understanding of why. When the nucleic acid is handed to the Core Facility, the student gets back a huge data file as a spreadsheet. Commercially available software is used to identify “significant” genes or interactions among genes. The sequences of these genes are then submitted to a BLAST search, which the student initiates by clicking a link on a Web page. Results are obtained, yet most students have only a minimal working knowledge of the biochemistry, molecular biology, statistical inference, or computational algorithms that have allowed them to obtain those results. To top it off, these capabilities are spreading into research groups where the professors or mentors have less experience in these technologies than the students.

This revolution has, in turn, been upended by the advent of new high-throughput sequencing technologies that are completely changing how genomics is practiced. These “Next-Generation” short-read sequencing technologies are redefining how reverse ecology is practiced, and have the potential to make all organisms “model organisms” for something. One can now acquire “deep” sequence information from virtually any (formerly) non-model organism and ask biological questions at virtually any scale: ecological, environmental, physiological, developmental, transcriptional, etc. Indeed, these new technologies are blurring the intellectual boundaries between ecosystems ecologists, microbial geneticists, biogeochemists, and computational biologists. A common theme that will integrate these disciplines in the next 10 years is a native working knowledge of high-throughput DNA sequence data generation, analysis, and interpretation. As detailed below, we stress that this revolution is not just a new technique in search of a question. Several major biotech firms have invested huge resources in competing short-read sequencing platforms that have similar, but complementary, applications. This technology is not going away, and graduate programs that fail to train students to use these tools efficiently, effectively, and creatively will fall behind.

The intellectual merit of the Brown-MBL IGERT program in Reverse Ecology lies in the novel ways we will focus these advances in genomic sciences on truly fascinating questions, and empower a cohort of PhDs who can integrate a flood of information into novel insights into how the biosphere functions. This IGERT will launch a new graduate program that unites the resources from recent, parallel investments in intellectual centers at Brown University and The Marine Biological Laboratory. At Brown, investments in the Centers for Genomics and Proteomics (CGP), Computational Molecular Biology (CCMB), and the Environmental Change Initiative (ECI) have greatly strengthened the basic life sciences. At MBL, the Josephine Bay Paul Center (BPC) for Comparative Molecular Biology and Evolution and the Ecosystem Center (EC) have built world-class research programs in comparative molecular biology and in ecosystems analyses, with remarkable research profiles.

Brown and MBL entered into an institutional collaboration five years ago to build new research and educational programs. The Brown-MBL graduate program allows graduate students in individual Brown departments to move on to MBL labs to complete their graduate training. While this administrative arrangement is in place, there is no common intellectual theme that unites the small number of students who have exercised this option in the past five years. The Brown-MBL IGERT program is novel in that it proposes an integrated, cross-disciplinary program at the interface between genomics, environmental science, and computational biology. The major unifying theme of the training program will be using high-throughput genomic technologies to discover functionally important characteristics of natural environments, or Reverse Ecology. The strong advantage we have in this new arena is a tightly knit group of scientists who can inform one another about how to sample environments for genomic analysis, or which genomic assembly approach makes sense for inferring ecological patterns. The dissemination of this local knowledge can make the reverse ecology endeavor more successful.

Broader Impacts - Why This IGERT Is Different

IGERT training programs exist that have components similar to aspects of the Reverse Ecology theme we envision. Our core faculty and the IGERT Fellows will engage in topics that are themes in existing IGERTs, such as ‘comparative genomics’, ‘ecological genomics’, ‘geomicrobiology’, ‘computational biology’, and ‘systems biology’, and will work in a ‘two institution’ model. But the vision that unites us is the way our two institutions approach research excellence. Both Brown and MBL are relatively small institutions, and to excel at the graduate research level they have chosen to focus on a few areas of strength. Brown’s University-College model with flexible curricula and close mentoring is very appealing to students in the sciences. It is proving equally appealing to new faculty, as they find the level of interaction to be dramatically higher than at larger institutions. Other programs seek to unite molecular biologists with ecologists, or computer scientists with genomics researchers. The Brown-MBL collaboration allows us to integrate ecologists, genomicists, and computer scientists through a web of mutual interactions. We will bring this highly collaborative, closely mentored training approach to our joint program.

A further novel opportunity of the program is exposure to the distinct cultures of an academic university and an externally funded research institution. By working in both of these environments, IGERT Fellows will have a better understanding of future career choices. To complement these opportunities, we have engaged two outside institutional partners that will add breadth to our graduate training program. Brown-MBL IGERT Fellows can work as interns at the J. Craig Venter Institute (JCVI) in Rockville, MD and at the IBM Deep Computing Systems group in Yorktown Heights, NY. Fellows will be able to participate in environmental and organismal genomics projects that employ advanced computational and informatics tools. As stated in the Letters of Support, JCVI and IBM are intrigued by our Reverse Ecology theme and see clear opportunities for creative training. IGERT Fellows thus have an opportunity to bridge the gap between academia and industry and to see how science is done in different settings. This will broaden their understanding of the career options that are open to them.