Sohini Ramachandran Lab: projects

X-chromosomal and autosomal data from the Human Genome Diversity Panel, analyzed in S Ramachandran, NA Rosenberg, MW Feldman, and J Wakeley (2008), "Population differentiation and migration: coalescence times in a two-sex island model for autosomal and X-linked loci". Theor Pop Biol. Vol. 74:291-301

readme [txt]
Plot of fraction of heterozygous loci out of those loci with non-missing data, from 36 X-linked loci genotyped in 1064 individuals. [pdf]
Archive of X-linked and autosomal genotype data files used in this study. [tar archive] [zip archive]
Note: the archives generate a new directory when extracted, and also include the readme and plot available above.

Outbreak data, analyzed in KF Smith et al. (2014), "Global rise in human infectious disease outbreaks". J Roy Soc Interface Vol. 11: 20140950

Outbreak data: including README and Excel workbook [zip archive]
Sample disease parser: including README, as well as example input and output files [zip archive]

A new approach for inferring population size changes over time, developed and implemented in JA Palacios, J Wakeley, and S Ramachandran (2015) "Bayesian nonparametric inference of population size changes from sequential genealogies". Genetics Vol. 201: 281-304

R code: including a dynamic document compiled using the knitr R package and test data. [zip archive]

Phoneme data for 2082 languages, analyzed in N Creanza et al. (2015), "A comparison of worldwide phonemic and genetic variation in human populations". Proc Natl Acad Sci USA Vol. 112: 1265-1272

includes: README, presence-absence data for 728 phonemes in 2082 languages, along with metadata for languages studied and phonemes compiled by Merritt Ruhlen. [zip archive]

pong: fast analysis and visualization of latent clusters in population genetic data

pong is a freely available software package, released by Behr et al. (2016, Bioinformatics), for post-processing output from clustering inference using population genetic data. It combines a network-graphical approach for analyzing membership in latent clusters with an interactive D3.js-based visualization. pong runs more than an order of magnitude faster than current solutions while providing a user-friendly, interactive visualization of population structure that is more accurate than those produced by current tools. pong thus enables analysis of population structure from multilocus genotype data at a scale and accuracy not previously possible.
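One core step in post-processing clustering output is matching cluster labels across runs, because programs such as ADMIXTURE assign arbitrary labels to the K clusters in each run. The sketch below is not pong's algorithm: pong uses its own similarity measure and the network-graphical matching described in Behr et al. (2016). It is a brute-force illustration that tries every permutation of one run's columns and keeps the one most similar to a reference run, scoring similarity as 1 minus the mean absolute difference between membership coefficients (a hypothetical choice).

```python
from itertools import permutations

import numpy as np


def align_runs(q_ref, q_other):
    """Return the column permutation of q_other that best matches q_ref.

    q_ref, q_other: (individuals x K) membership matrices whose rows sum to 1.
    Similarity (hypothetical choice): 1 - mean absolute difference.
    """
    k = q_ref.shape[1]
    best_perm, best_sim = None, -np.inf
    for perm in permutations(range(k)):
        cols = list(perm)
        sim = 1.0 - np.abs(q_ref - q_other[:, cols]).mean()
        if sim > best_sim:
            best_perm, best_sim = cols, sim
    return best_perm, best_sim


# Example: the second run is the first run with its cluster labels shuffled.
q1 = np.array([[0.9, 0.1, 0.0],
               [0.2, 0.7, 0.1],
               [0.0, 0.1, 0.9]])
q2 = q1[:, [2, 0, 1]]
perm, sim = align_runs(q1, q2)
print(perm, sim)  # q2[:, perm] recovers q1
```

Brute force costs K! comparisons per pair of runs, which is fine for small K but grows quickly; that cost is one reason dedicated tools use smarter matching.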

pong requires Python 3 and a modern web browser. pong is hosted on PyPI and can thus be easily installed with pip by running:

pip3 install pong

Resources

Behr et al. (2016, Bioinformatics)
pong's git repository.
pong's README: A quick-start guide to installing and running pong.
pong's manual: A manual detailing all options available for customizing pong's algorithm and visualization.
Example dataset: Using data from the 1000 Genomes Project Phase 3 (2,426 individuals), we performed 8 runs of ADMIXTURE at each value of K from K=2 to K=8.
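pong reads a filemap that lists the Q matrices to analyze. As we understand pong's manual, each line is tab-delimited with three columns: a unique run ID, the value of K, and the path to that run's Q matrix. Please confirm this against the manual before use. For a design like the example dataset (8 runs at each K from 2 to 8, so 56 Q matrices), the filemap can be generated rather than typed by hand. The run-ID scheme, directory layout, and file prefix below are hypothetical; ADMIXTURE itself names its output PREFIX.K.Q.

```python
from pathlib import Path


def filemap_lines(k_values, n_runs, q_template):
    """Build tab-delimited filemap lines: run ID, K, path to the Q matrix.

    q_template is a hypothetical path pattern with {run} and {k} fields.
    """
    lines = []
    for k in k_values:
        for run in range(1, n_runs + 1):
            run_id = f"k{k}r{run}"  # hypothetical ID scheme; IDs must be unique
            q_path = q_template.format(run=run, k=k)
            lines.append(f"{run_id}\t{k}\t{q_path}")
    return lines


def write_filemap(path, lines):
    """Write the lines to disk, one per line."""
    Path(path).write_text("\n".join(lines) + "\n")


# Hypothetical layout: run1/ ... run8/, each holding kg.2.Q ... kg.8.Q
lines = filemap_lines(range(2, 9), n_runs=8, q_template="run{run}/kg.{k}.Q")
print(len(lines))  # 7 values of K x 8 runs
print(lines[0])
# write_filemap("pong_filemap.txt", lines)
```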

Feel free to open an issue if you run into any problems or have questions. We actively monitor user inquiries and work to fix problems as they arise.

PEGASUS: the Precise, Efficient Gene Association Score Using SNPs

PEGASUS is a freely available software package, released by Nakka et al. (2016, Genetics), for combining SNP-level p-values into gene scores and conducting gene-level association tests with a phenotype of interest. PEGASUS computes gene scores of association analytically and produces gene scores with as much as 10 orders of magnitude higher numerical precision than competing methods.
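To make the idea of a gene score concrete, assume (as we read Nakka et al. 2016) that a gene score sums the 1-df chi-square statistics of the SNPs in a gene. Under the null hypothesis, with SNP correlation (LD) matrix Σ, that sum is distributed as a weighted sum of independent 1-df chi-squares whose weights are the eigenvalues of Σ. PEGASUS evaluates this null distribution analytically; its dependency CompQuadForm implements exact methods such as Davies' algorithm. The Python sketch below instead swaps in the much cruder Satterthwaite approximation, which matches the mean and variance with a scaled chi-square, purely to show the pieces. It does not reproduce PEGASUS's precision.

```python
import numpy as np
from scipy.stats import chi2


def gene_score_pvalue(snp_pvalues, ld_corr):
    """Gene score and an approximate p-value (Satterthwaite, not PEGASUS's exact method)."""
    p = np.asarray(snp_pvalues, dtype=float)
    # Each SNP p-value becomes a 1-df chi-square statistic; the gene score is their sum.
    score = chi2.isf(p, df=1).sum()
    # Under the null, score ~ sum_i lam_i * chi2_1, with lam_i the eigenvalues of the LD matrix.
    lam = np.linalg.eigvalsh(np.asarray(ld_corr, dtype=float))
    lam = lam[lam > 1e-10]  # discard numerically zero eigenvalues
    # Match mean (sum lam) and variance (2 * sum lam^2) with a * chi2_nu.
    a = (lam ** 2).sum() / lam.sum()
    nu = lam.sum() ** 2 / (lam ** 2).sum()
    return score, chi2.sf(score / a, df=nu)


# Three uncorrelated SNPs: the score is exactly chi-square with 3 df.
score, pval = gene_score_pvalue([0.01, 0.2, 0.5], np.eye(3))
print(score, pval)

# Two SNPs in perfect LD carry the information of a single SNP.
print(gene_score_pvalue([0.01, 0.01], [[1.0, 1.0], [1.0, 1.0]])[1])  # about 0.01
```

With uncorrelated SNPs the approximation happens to be exact, and with perfectly correlated SNPs the gene behaves like one SNP; between those extremes an exact method is needed for the kind of precision PEGASUS reports.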

PEGASUS requires Perl 5, R (3.0.2 or higher), PLINK (1.07; 1.9 beta 3, 7 Jun is also okay), and the R packages corpcor and CompQuadForm.

Resources

the PEGASUS git repository, which contains source code, example data, and a README that walks through analyzing the example data with PEGASUS.
the PEGASUS_flies git repository, which allows PEGASUS to be run on Drosophila population-genetic data using gene annotations from the Drosophila Genome Research Project. PEGASUS_flies was released as part of Spierer et al. (2020 preprint).

SWIF(r): SWeep Inference Framework (controlling for correlation)

SWIF(r) is freely available software, released by Sugden et al. (2018, Nature Communications), for calculating SNP-based probabilities of adaptation based on training simulations from a demographic model. Code for training and running SWIF(r), as well as for calibrating the probabilistic output and visualizing learned distributions, can be found at the SWIF(r) git repository.
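SWIF(r) turns summary statistics learned from training simulations into a posterior probability that a site is under selection. The published method uses an averaged one-dependence estimator (AODE), which accounts for dependence between statistics (hence "controlling for correlation"). The Python sketch below shows only the simpler naive Bayes version of that calculation, which assumes the statistics are independent given the class. The class names, the prior, the toy data, and the use of one-dimensional Gaussian kernel density estimates are illustrative assumptions, not SWIF(r)'s code.

```python
import numpy as np
from scipy.stats import gaussian_kde


def fit_class_densities(train):
    """train: {class_name: array (n_simulations, n_statistics)}.

    Returns one 1-D kernel density estimate per statistic per class.
    """
    return {c: [gaussian_kde(X[:, j]) for j in range(X.shape[1])]
            for c, X in train.items()}


def posterior(densities, priors, x):
    """Naive Bayes posterior over classes for one vector of statistics x."""
    log_scores = {}
    for c, kdes in densities.items():
        # Independence assumption: the joint likelihood is a product of marginals.
        loglik = sum(np.log(kde(xj)[0] + 1e-300) for kde, xj in zip(kdes, x))
        log_scores[c] = np.log(priors[c]) + loglik
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    weights = {c: np.exp(s - top) for c, s in log_scores.items()}
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}


# Toy "simulations": two statistics per site, shifted upward under a sweep.
rng = np.random.default_rng(0)
train = {"neutral": rng.normal(0.0, 1.0, size=(500, 2)),
         "sweep": rng.normal(4.0, 1.0, size=(500, 2))}
dens = fit_class_densities(train)
priors = {"neutral": 0.99, "sweep": 0.01}
print(posterior(dens, priors, [4.0, 4.0])["sweep"])  # close to 1
print(posterior(dens, priors, [0.0, 0.0])["sweep"])  # close to 0
```

AODE relaxes the independence assumption by averaging over a set of models, in each of which one statistic serves as a parent that all the other statistics may depend on. This matters because many statistics used to detect selection are strongly correlated with one another.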

SWIF(r) requires Python v2.7, Matplotlib v1.7, SciPy v0.16, and Scikit-learn v0.17.

Resources

the SWIF(r) git repository, which contains example training data and output, code for training and running SWIF(r), and for calibrating probabilistic output and visualizing trained distributions.

WINGS: Ward clustering to identify Internal Node branch length outliers using Gene Scores

WINGS is freely available software, released by McGuirl, Smith et al. (2020, Genetics), for identifying groups of phenotypes sharing a core set of genes enriched for mutations in cases. Code in MATLAB for running WINGS can be found at the WINGS git repository.
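WINGS clusters phenotypes by their gene scores with Ward's method and looks for internal nodes of the resulting tree whose branch lengths are outliers. The Python sketch below illustrates that pipeline with SciPy's Ward linkage. Defining a node's branch length as its parent's merge height minus its own, and flagging outliers with Tukey's fence (Q3 + 1.5 × IQR), are our hypothetical choices; the MATLAB code in the WINGS repository defines its own criterion.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage


def internal_branches(X):
    """Ward-cluster the rows of X (phenotypes x genes).

    Returns {node_id: (branch_length, set_of_phenotype_indices)} for every
    internal node except the root.
    """
    n = X.shape[0]
    Z = linkage(X, method="ward")
    members = {i: {i} for i in range(n)}
    height, parent = {}, {}
    # Row i of Z merges clusters a and b into new node n + i at height h.
    for i, (a, b, h, _) in enumerate(Z):
        node, a, b = n + i, int(a), int(b)
        members[node] = members[a] | members[b]
        height[node] = h
        parent[a] = parent[b] = node
    root = n + len(Z) - 1
    return {node: (height[parent[node]] - height[node], members[node])
            for node in range(n, root)}


def tukey_outliers(lengths, k=1.5):
    """Nodes whose branch length exceeds Q3 + k * IQR (hypothetical rule)."""
    vals = np.array(list(lengths.values()))
    q1, q3 = np.percentile(vals, [25, 75])
    return [node for node, v in lengths.items() if v > q3 + k * (q3 - q1)]


# Toy gene scores: two tight groups of three phenotypes, far apart.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [10.0, 10.0], [10.1, 10.0], [10.0, 10.1]])
branches = internal_branches(X)
longest = sorted(branches, key=lambda nd: branches[nd][0], reverse=True)[:2]
print([sorted(branches[nd][1]) for nd in longest])  # the two groups

# Outlier rule on a toy set of branch lengths keyed by hypothetical node IDs.
print(tukey_outliers({7: 0.1, 8: 0.2, 9: 0.15, 10: 0.12, 11: 5.0}))  # [11]
```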

WINGS requires MATLAB with the Statistics and Machine Learning Toolbox.

Resources

the WINGS git repository
