Research in the lab lies in the fields of evolutionary biology and population genetics. We use mathematical modeling, applied statistical methods, and computer simulations to make inferences about aspects of population histories from extant individuals’ genetic variation. Most projects in the lab fall under the following themes:
Genotypes in extant humans contain signatures of events throughout our history as a species. Using genotype data from individuals living today, we aim to identify human population structure and to determine how it correlates with covariates such as language spoken, geographic location, and climate-related variables; to this end, we are developing software for post-hoc analysis of model-based clustering results from genotype data. We are also developing new methods for inferring population size changes over time, using coalescent theory while modeling recombination.
The X chromosome is particularly interesting because of its haplodiploid inheritance in humans: males carry one X chromosome, inherited from their mothers, while females carry two, one from each parent. Differences in patterns of genetic variation between the X chromosome and the autosomes may reflect past differences between males and females in demographic parameters such as population size and migration rate; in addition, X-linked genes likely experience different levels of selection in males than in females. We are currently investigating the relative roles of demographic processes and natural selection in shaping X-linked genetic variation across human populations.
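As a rough illustration of why sex-specific demography leaves a signature on the X chromosome, the sketch below uses the standard textbook expressions for effective population size at autosomal and X-linked loci when the numbers of breeding males and females differ. It is a minimal example under idealized assumptions (random mating, constant size, no selection), not a method used in the lab; the function names are hypothetical.

```python
from fractions import Fraction


def autosomal_ne(n_males, n_females):
    """Effective size at autosomal loci: 4*Nm*Nf / (Nm + Nf)."""
    return Fraction(4 * n_males * n_females, n_males + n_females)


def x_linked_ne(n_males, n_females):
    """Effective size at X-linked loci: 9*Nm*Nf / (4*Nm + 2*Nf)."""
    return Fraction(9 * n_males * n_females, 4 * n_males + 2 * n_females)


def x_to_autosome_ratio(n_males, n_females):
    """Expected ratio of X-linked to autosomal effective size."""
    return x_linked_ne(n_males, n_females) / autosomal_ne(n_males, n_females)


# Equal numbers of breeding males and females give the familiar 3/4.
print(x_to_autosome_ratio(1000, 1000))  # 3/4
# Fewer breeding males than females pushes the ratio above 3/4.
print(x_to_autosome_ratio(500, 1500))   # 9/10
# Fewer breeding females than males pushes it below 3/4.
print(x_to_autosome_ratio(1500, 500))   # 9/14
```

Because neutral diversity scales with effective size, an observed X-to-autosome diversity ratio that departs from 3/4 is compatible with sex-biased demography, but selection on X-linked genes can produce similar departures, which is why the two processes have to be disentangled.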
A central goal in population genetics is to identify loci with adaptive mutations. Many existing methods have limited power because they rely on a single statistic associated with genomic signatures of adaptation. Other methods combine statistics to increase power but rely on arbitrary thresholds to classify loci as neutrally evolving versus containing adaptive mutations. We are developing novel, probabilistically interpretable methods that use machine learning techniques to identify and localize sites of adaptive mutations conditional on demographic history. We are especially interested in incorporating correlations among the statistics used to identify genomic sites of adaptation into our methods.
In collaboration with the Raphael Lab at Brown University, we are developing a new method for scoring genes in case-control genome-wide association studies. The goal of this project is to accurately identify genes and pathways underlying common diseases. The current GWAS framework assumes single mutations increase disease susceptibility, and we are curious about the extent to which genetic heterogeneity in causal mutations underlies common disease phenotypes.
We have devised a new, graph-theoretic approach to the problem of inferring and analyzing population structure from multilocus genotype data. We have implemented this algorithm in a software package (in preparation for release) that significantly outperforms current tools (e.g., CLUMPP by Jakobsson and Rosenberg, 2007) in terms of quality of results, computational time (under 1 s vs. minutes to hours for most datasets), and ease of use.
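For readers unfamiliar with the post-processing problem that tools such as CLUMPP address, the sketch below shows its simplest form: two clustering runs on the same individuals can return the same clusters under different, arbitrary labels, so the membership columns must be matched before runs can be compared. This is a brute-force permutation search written purely for illustration; it is not the graph-theoretic algorithm described above, whose details are not given here, and the function names and toy membership matrices are hypothetical.

```python
from itertools import permutations


def squared_distance(q_a, q_b):
    """Sum of squared differences between two membership matrices."""
    return sum(
        (a - b) ** 2
        for row_a, row_b in zip(q_a, q_b)
        for a, b in zip(row_a, row_b)
    )


def permute_columns(q, order):
    """Reorder the cluster columns of q according to `order`."""
    return [[row[k] for k in order] for row in q]


def align_clusters(q_ref, q_other):
    """Find the column order of q_other that best matches q_ref.

    Tries all K! orderings, so it is only practical for small K.
    """
    k = len(q_ref[0])
    best = min(
        permutations(range(k)),
        key=lambda order: squared_distance(q_ref, permute_columns(q_other, order)),
    )
    return best, permute_columns(q_other, best)


# Toy data: rows are individuals, columns are cluster membership coefficients.
q_ref = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
# Same memberships, but with the cluster labels shuffled.
q_other = [[0.1, 0.0, 0.9], [0.8, 0.1, 0.1], [0.2, 0.8, 0.0]]

order, aligned = align_clusters(q_ref, q_other)
print(order)             # (2, 0, 1)
print(aligned == q_ref)  # True
```

The exhaustive search grows factorially with the number of clusters, which is one reason run time is a practical concern for this class of tools.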
To take advantage of the additional information this method provides, we are building a novel interactive visualization tool using D3.js, a powerful and increasingly popular web-based library for data-driven visualization. We are very excited about the ability to provide users with much more information than was previously available. We expect the resulting software package, with its dramatic increase in algorithmic efficiency and new dynamic visualization tool, to be widely useful in both molecular ecology and population genetics.