Research in the lab lies in the fields of evolutionary biology and population genetics. We use mathematical modeling, applied statistical methods, and computer simulations to infer aspects of population history from the genetic variation of living individuals. Most projects in the lab fall under the following themes:
Genotypes of present-day humans carry signatures of events throughout our history as a species. Using genotype data from living individuals, we aim to identify human population structure and to understand how it correlates with covariates such as language, geographic location, and climate-related variables; to this end, we are developing software for post-hoc analysis of model-based clustering results from genotype data. We are also developing new methods for inferring changes in population size over time, using coalescent theory and models of recombination.
A central goal in population genetics is to identify loci and genetic pathways harboring adaptive mutations. We are developing interpretable methods for localizing the genomic sites of adaptive mutations and for computing evidence of selection at the gene and pathway levels. These methods draw on machine-learning classification techniques as well as hidden Markov models to leverage the correlations among summary statistics commonly used to measure signatures of selection. We are also interested in inferring the relative roles of different modes of selection (e.g., positive, balancing, and background selection) in shaping human genomic variation.
The current GWAS framework assumes that single mutations of large effect increase disease susceptibility. However, complex phenotypes may be caused by multiple mutations within a single gene or pathway. In collaboration with the Raphael Lab at Princeton University, we developed a new method for gene-level association testing that accounts for empirical linkage disequilibrium (LD) and more accurately identifies the genes and pathways underlying common diseases and phenotypes. We are now developing multiple model-based approaches, in collaboration with the Crawford Lab at Brown University, to determine the genetic architecture of complex traits at the gene and pathway levels.
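One common way to build a gene-level test that accounts for LD can be sketched as follows; this is a minimal illustration under stated assumptions, not necessarily the lab's published method. It assumes the gene score is the sum of squared SNP z-scores and that, under the null, the z-scores are multivariate normal with covariance equal to the LD (correlation) matrix among the gene's SNPs. The score's null distribution is then a sum of independent chi-square (1 df) variables weighted by the eigenvalues of the LD matrix; here the p-value is approximated by Monte Carlo simulation rather than by an exact numerical method. The function name `gene_score_pvalue` and its inputs are hypothetical.

```python
import numpy as np


def gene_score_pvalue(z, ld, n_sim=100_000, seed=0):
    """Hypothetical gene-level test: score = sum of squared SNP z-scores.

    Assumes that under the null z ~ N(0, ld), so sum(z**2) is distributed as
    sum_k lambda_k * chi2_1, where lambda_k are the eigenvalues of the LD
    matrix. The p-value is approximated by simulating that weighted mixture.
    """
    z = np.asarray(z, dtype=float)
    ld = np.asarray(ld, dtype=float)
    score = float(np.sum(z ** 2))

    # Eigenvalues of the symmetric LD matrix; clip tiny negative values
    # that can arise from rounding or an imperfect LD estimate.
    lam = np.clip(np.linalg.eigvalsh(ld), 0.0, None)

    # Each row holds independent chi2_1 draws; weighting by the eigenvalues
    # and summing gives one draw from the null distribution of the score.
    rng = np.random.default_rng(seed)
    null = rng.chisquare(df=1, size=(n_sim, lam.size)) @ lam

    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    pval = (1 + np.sum(null >= score)) / (n_sim + 1)
    return score, float(pval)


if __name__ == "__main__":
    # Toy gene with three SNPs: the first two are in strong LD (r = 0.8).
    ld = np.array([[1.0, 0.8, 0.0],
                   [0.8, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
    score, p = gene_score_pvalue([2.5, 2.3, 0.1], ld)
    print(f"score = {score:.2f}, p = {p:.4f}")
```

Because the null weights come from the LD matrix, two SNPs in strong LD contribute correlated evidence and are not treated as independent signals, which is the reason for accounting for empirical LD in a gene-level test.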
Genome-wide association studies (GWAS) have identified thousands of significant genetic associations with a range of complex traits in humans; however, the vast majority of these studies have been conducted in datasets of predominantly European ancestry (Popejoy & Fullerton 2016 Nature). It has generally been assumed that the genetic architecture of complex traits is transferable across populations of different ancestries, but recent work has revealed a number of differences between ethnic groups, including heterogeneity both in the causal variants discovered and in the effect-size estimates for many overlapping variants (Martin et al. 2017 AJHG; Wojcik et al. 2017 bioRxiv). In light of these results, we have begun to investigate the polygenic architecture of multiple traits in multi-ethnic datasets, as well as formal approaches for properly conducting multi-ethnic GWAS. We are also investigating how our original assumptions about polygenic architecture might have influenced, and confounded, current GWAS approaches in the context of multi-ethnic differences.
The X chromosome is particularly interesting because of its haplodiploid pattern of inheritance in humans: males carry a single X chromosome, inherited from their mothers, while females carry two X chromosomes. Differences in patterns of genetic variation between the X chromosome and the autosomes may reflect past differences between males and females in demographic parameters such as population size and migration rate; in addition, X-linked genes likely experience different levels of selection in males than in females. We are currently investigating the relative roles of demographic processes and natural selection in shaping X-linked genetic variation across human populations.
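As an illustration of how sex-specific population sizes alone can shift X-linked diversity relative to the autosomes, the classical effective-size formulas for an idealized, randomly mating population with $N_m$ breeding males and $N_f$ breeding females (a simplifying assumption that ignores selection, migration, and changes in population size) give the ratio below. With equal numbers of males and females the ratio is 3/4, reflecting three X chromosomes for every four copies of each autosome; it ranges from 9/16 when males greatly outnumber females to 9/8 when females greatly outnumber males.

```latex
N_{e,A} = \frac{4 N_m N_f}{N_m + N_f}, \qquad
N_{e,X} = \frac{9 N_m N_f}{4 N_m + 2 N_f}

\frac{N_{e,X}}{N_{e,A}} = \frac{9\,(N_m + N_f)}{8\,(2 N_m + N_f)}
\quad\Rightarrow\quad
\left.\frac{N_{e,X}}{N_{e,A}}\right|_{N_m = N_f} = \frac{9 \cdot 2}{8 \cdot 3} = \frac{3}{4}
```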
We have devised a new, graph-theoretic approach to the problem of inferring and analyzing population structure from multilocus genotype data. We implemented this algorithm in the software package pong (Behr et al. 2016), which significantly outperforms existing tools (e.g., CLUMPP; Jakobsson and Rosenberg 2007) in quality of results, computational time (<1 s vs. minutes to hours for most datasets), and ease of use.
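To illustrate the general idea of a graph-theoretic treatment of clustering output, the sketch below aligns cluster labels between two runs with the same number of clusters K by treating clusters as the nodes of a complete bipartite graph and choosing the maximum-weight perfect matching. This is a simplified illustration, not pong's implementation: the similarity measure (one minus the mean absolute difference in membership coefficients) and the function name `align_clusters` are assumptions, and the matching is found by exhaustive search over permutations, which is feasible only for small K.

```python
from itertools import permutations

import numpy as np


def align_clusters(q_ref, q_other):
    """Hypothetical helper: relabel the clusters of q_other to match q_ref.

    q_ref, q_other: (n_individuals, K) membership-coefficient matrices
    (rows sum to 1) from two clustering runs with the same K.
    Edge weight between cluster i of q_ref and cluster j of q_other is an
    assumed similarity: 1 minus the mean absolute difference in membership.
    Returns (perm, weight) where q_other[:, perm] is aligned with q_ref.
    """
    q_ref = np.asarray(q_ref, dtype=float)
    q_other = np.asarray(q_other, dtype=float)
    k = q_ref.shape[1]

    # K x K edge weights of the complete bipartite graph between the runs.
    sim = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            sim[i, j] = 1.0 - np.mean(np.abs(q_ref[:, i] - q_other[:, j]))

    # Exhaustive search for the maximum-weight perfect matching:
    # perm[i] is the cluster of q_other matched to cluster i of q_ref.
    best_perm, best_weight = None, -np.inf
    for perm in permutations(range(k)):
        weight = sum(sim[i, perm[i]] for i in range(k))
        if weight > best_weight:
            best_perm, best_weight = perm, weight
    return list(best_perm), float(best_weight)


if __name__ == "__main__":
    q_ref = np.array([[0.9, 0.1, 0.0],
                      [0.1, 0.8, 0.1],
                      [0.0, 0.2, 0.8]])
    # Same clustering with permuted labels, as another run might report it.
    q_other = q_ref[:, [2, 0, 1]]
    perm, weight = align_clusters(q_ref, q_other)
    print(perm, weight)
```

For larger K, the Hungarian algorithm (for example, `scipy.optimize.linear_sum_assignment` with `maximize=True`) solves the same matching problem in polynomial time instead of enumerating all K! permutations.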
pong’s front end provides an interactive visualization tool built with D3.js, a powerful and increasingly popular web-based, data-driven visualization library. With its dramatic increase in algorithmic efficiency and its new dynamic visualization tool, pong will be useful in molecular ecology, landscape genetics, and population genetics studies.