X-chromosomal and autosomal data from the Human Genome Diversity Panel, analyzed in S Ramachandran, NA Rosenberg, MW Feldman, and J Wakeley (2008), "Population differentiation and migration: coalescence times in a two-sex island model for autosomal and X-linked loci". Theor Pop Biol Vol. 74: 291-301

  • readme [txt]
  • Plot of the fraction of heterozygous loci among loci with non-missing data, computed from 36 X-linked loci genotyped in 1064 individuals. [pdf]
  • Archive of X-linked and autosomal genotype data files used in this study. [tar archive] [zip archive]
  • Note: each archive extracts into a new directory and also includes the readme and plot listed above.
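
Read per individual, the plotted quantity is (number of heterozygous genotype calls) / (number of non-missing calls). A minimal sketch follows, assuming a hypothetical encoding in which each genotype is a pair of allele labels and a missing call is `None`; the function name `heterozygous_fraction` is illustrative and is not part of the archived files. At X-linked loci, males carry a single X outside the pseudoautosomal regions, so their fraction is expected to be near zero.

```python
import math
from typing import Optional, Sequence, Tuple

# Hypothetical encoding (assumption): a genotype is a pair of allele labels, or None when missing.
Genotype = Optional[Tuple[str, str]]


def heterozygous_fraction(genotypes: Sequence[Genotype]) -> float:
    """Fraction of heterozygous calls among one individual's non-missing calls."""
    called = [g for g in genotypes if g is not None]  # drop loci with missing data
    if not called:
        return math.nan  # no usable loci, so the fraction is undefined
    heterozygous = sum(1 for a, b in called if a != b)
    return heterozygous / len(called)


# One heterozygous call out of three non-missing calls.
print(heterozygous_fraction([("A", "G"), ("C", "C"), None, ("T", "T")]))  # 0.3333333333333333
```

Returning `nan` rather than 0 keeps individuals with no usable calls from looking like fully homozygous samples.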

Outbreak data, analyzed in KF Smith et al. (2014), "Global rise in human infectious disease outbreaks". J Roy Soc Interface Vol. 11: 20140950

  • Outbreak data: including README and Excel workbook [zip archive]
  • Sample disease parser: including README, as well as example input and output files [zip archive]

A new approach for inferring population size changes over time, developed and implemented in JA Palacios, J Wakeley, and S Ramachandran (2015) "Bayesian nonparametric inference of population size changes from sequential genealogies". Genetics Vol. 201: 281-304

  • R code: includes test data and a dynamic document compiled with the knitr R package. [zip archive]

Phoneme data for 2082 languages, analyzed in N Creanza et al. (2015) "A comparison of worldwide phonemic and genetic variation in human populations". Proc Natl Acad Sci USA Vol. 112: 1265-1272

  • Includes: README, presence-absence data for 728 phonemes in 2082 languages, along with metadata for languages studied and phonemes compiled by Merritt Ruhlen. [zip archive]

pong: fast analysis and visualization of latent clusters in population genetic data

pong is a freely available software package, released by Behr et al. (2016, Bioinformatics), for post-processing output from clustering inference using population genetic data. It combines a network-graphical approach for analyzing and visualizing membership in latent clusters with an interactive D3.js-based visualization. pong outpaces current solutions by more than an order of magnitude in runtime while providing a user-friendly, interactive visualization of population structure that is more accurate than those produced by current tools. Thus, pong enables unprecedented levels of scale and accuracy in the analysis of population structure from multilocus genotype data.
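
pong's own matching procedure is described in Behr et al. (2016). As a rough illustration of what aligning cluster labels between two clustering runs involves, the sketch below treats each cluster of a membership (Q) matrix as a vector of memberships across individuals, scores cluster pairs by 1 minus the Jensen-Shannon distance, and picks the label permutation with the highest total similarity. The similarity choice, the brute-force search over all K! permutations (practical only for small K), and the names `js_distance` and `align_runs` are assumptions for illustration; this is not pong's implementation or API.

```python
import math
from itertools import permutations
from typing import List, Sequence, Tuple

# A Q matrix: one row per individual, one column per cluster.
QMatrix = Sequence[Sequence[float]]


def js_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon distance (log base 2, so in [0, 1]) after normalizing each vector to sum to 1."""
    sp, sq = sum(p), sum(q)
    if sp == 0 or sq == 0:
        return 1.0  # assumption: an empty cluster is maximally dissimilar
    p = [x / sp for x in p]
    q = [x / sq for x in q]
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a: List[float], b: List[float]) -> float:
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, (kl(p, m) + kl(q, m)) / 2))


def align_runs(q1: QMatrix, q2: QMatrix) -> Tuple[Tuple[int, ...], float]:
    """Return (perm, mean similarity), where cluster perm[k] of run 2 matches cluster k of run 1."""
    k = len(q1[0])
    cols1 = [[row[j] for row in q1] for j in range(k)]
    cols2 = [[row[j] for row in q2] for j in range(k)]
    sim = [[1 - js_distance(a, b) for b in cols2] for a in cols1]
    # Brute force over all label permutations; fine for small K only.
    best = max(permutations(range(k)), key=lambda perm: sum(sim[i][perm[i]] for i in range(k)))
    return best, sum(sim[i][best[i]] for i in range(k)) / k


# Two runs that found the same structure but with swapped cluster labels.
run1 = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
run2 = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1]]
print(align_runs(run1, run2))  # ((1, 0), 1.0)
```

A mean similarity near 1 indicates that the two runs agree up to relabeling; for larger K, an assignment algorithm would replace the permutation search.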

pong requires Python 2.7 and a modern web browser (e.g. Chrome, Firefox, Safari). pong is not compatible with Internet Explorer. pong is hosted on PyPI and can thus be easily installed with pip by running:

pip install pong


PEGASUS: the Precise, Efficient Gene Association Score Using SNPs

PEGASUS is a freely available software package, released by Nakka et al. (in press, Genetics), for combining SNP-level p-values into gene scores and conducting gene-level association tests with a phenotype of interest. PEGASUS computes gene scores of association analytically and produces gene scores with as much as 10 orders of magnitude higher numerical precision than competing methods.
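
The paper and repository are authoritative on PEGASUS's exact method. The sketch below illustrates the general idea under stated assumptions. Each SNP p-value becomes a 1-df chi-square statistic, and the gene score is their sum. Because SNPs in linkage disequilibrium are correlated, the null distribution of that sum is a weighted sum of independent 1-df chi-square variables whose weights are the eigenvalues of the SNP correlation matrix. Its tail probability can therefore be computed analytically rather than by simulation. The sketch assumes the LD matrix is the correlation matrix of the SNP z-scores and evaluates the tail with Imhof's integral using a crude fixed-step midpoint rule and Jacobi eigenvalues. The R package CompQuadForm, which PEGASUS depends on, provides more careful routines (for example, Davies' method). All function names are hypothetical and are not part of PEGASUS.

```python
import math
from statistics import NormalDist
from typing import List, Sequence


def chi2_from_p(p: float) -> float:
    """1-df chi-square statistic for a two-sided p-value (0 < p <= 1): z^2 with z = Phi^-1(p/2)."""
    return NormalDist().inv_cdf(p / 2) ** 2


def jacobi_eigenvalues(a: Sequence[Sequence[float]], sweeps: int = 50) -> List[float]:
    """Eigenvalues of a small symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    m = [[float(x) for x in row] for row in a]
    for _ in range(sweeps):
        off = sum(m[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < 1e-22:
            break  # off-diagonal mass is negligible: the diagonal holds the eigenvalues
        for p in range(n - 1):
            for q in range(p + 1, n):
                if m[p][q] == 0.0:
                    continue
                theta = (m[q][q] - m[p][p]) / (2 * m[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # columns p, q: M <- M J
                    mkp, mkq = m[k][p], m[k][q]
                    m[k][p], m[k][q] = c * mkp - s * mkq, s * mkp + c * mkq
                for k in range(n):  # rows p, q: M <- J^T M
                    mpk, mqk = m[p][k], m[q][k]
                    m[p][k], m[q][k] = c * mpk - s * mqk, s * mpk + c * mqk
    return [m[i][i] for i in range(n)]


def imhof_sf(x: float, weights: Sequence[float], upper: float = 100.0, steps: int = 50_000) -> float:
    """P(sum_j w_j * chi2_1 > x) via Imhof's formula, integrated by a midpoint rule on [0, upper]."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        u = (i + 0.5) * h  # midpoints avoid the removable singularity at u = 0
        theta = 0.5 * sum(math.atan(w * u) for w in weights) - 0.5 * x * u
        rho = math.prod((1 + (w * u) ** 2) ** 0.25 for w in weights)
        total += math.sin(theta) / (u * rho)
    return min(1.0, max(0.0, 0.5 + total * h / math.pi))


def gene_score_pvalue(snp_pvalues: Sequence[float], ld: Sequence[Sequence[float]]) -> float:
    """Gene score = sum of 1-df chi-square statistics; null = LD-eigenvalue-weighted chi-square sum."""
    score = sum(chi2_from_p(p) for p in snp_pvalues)
    weights = [max(0.0, lam) for lam in jacobi_eigenvalues(ld)]  # clip tiny negative round-off
    return imhof_sf(score, weights)


# Two independent SNPs (identity LD matrix), each with p = 0.05.
print(gene_score_pvalue([0.05, 0.05], [[1.0, 0.0], [0.0, 1.0]]))
# The same SNPs in moderate LD (r = 0.5).
print(gene_score_pvalue([0.05, 0.05], [[1.0, 0.5], [0.5, 1.0]]))
```

With an identity LD matrix the weights are all 1, so the first result should be close to the chi-square(2) tail probability exp(-score/2), about 0.021. The fixed integration range and step trade accuracy for simplicity and are not suited to very small p-values, which is where the analytical precision emphasized above matters.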

PEGASUS requires Perl 5, R (3.0.2 or higher), PLINK (1.07; 1.9 beta 3, 7 Jun, is also okay), and the R packages corpcor and CompQuadForm.


  • The PEGASUS git repository, which contains source code, example data, and a README demonstrating how to analyze the example data with PEGASUS.