(Version 1.0)

The data files generated according to this description were used for the paper "Population differentiation and migration: coalescence times in a two-sex island model for autosomal and X-linked loci" by S Ramachandran, NA Rosenberg, MW Feldman, and J Wakeley (Theor Pop Biol, 2008, Vol. 74: 291-301).

Created by S Ramachandran, Aug 12, 2008

---------------------------------------

HGDP.Xaut.datafiles.tar is an archive containing:

1. combined_aut_1048.stru (HGDP autosomal data - 1048 individuals, 783 microsatellites)
2. combined_aut_952.stru (HGDP autosomal data - 952 individuals, 783 microsatellites)
3. combined_X_1048.stru (HGDP X-chromosomal data - 1048 individuals, 36 microsatellites)
4. combined_X_952.stru (HGDP X-chromosomal data - 952 individuals, 36 microsatellites)

Common traits among files 1-4:

The files are in structure format; the first line contains locus names. All other rows represent individual data. Each row also contains six columns of labels preceding the genotype data.

Col I: sex of individual (1 is male, 2 is female)
Col II: HGDP individual ID number
Col III: numeric code for population
Col IV: name of population
Col V: country of origin
Col VI: geographic region of origin

The rest of the columns represent individual genotype data. Each individual is represented by two rows of data; -9 indicates missing data, and we code males as hemizygous at X-linked loci by a second row of genotype data that is entirely missing.

Note all Bantu individuals (Bantus from Kenya and from southern Africa) are grouped into one Bantu population in these files, with population code 999.

-------------------------------------------------

Inference of sex:

We took the Screening Set 10 and 52 Diversity Genotype STRP files from Marshfield's website (http://research.marshfieldclinic.org/genetics/GenotypingData_Statistics/humanDiversityPanel.asp). In these files, diploid genotypes are given for all individuals, including males at X-linked loci.
Using the X-linked loci from both files, we counted the fraction of loci where each individual was heterozygous, across the 36 loci in the two screening sets. No individual had more than 15 loci with missing data; at least 21 loci were included in the calculation for each individual.

Sex labels are given by Marshfield in both files. The Screening Set 52 sex labels are an update of the Set 10 sex labels that correct most earlier labeling errors. All individuals labeled as male (denoted by a 1) in the Set 52 file, except two (#139 and #920), were heterozygous at <15% of the loci at which they had scores. All individuals labeled as female in the Set 52 file, except one (#1239), were heterozygous at >19% of loci at which they had scores.

We examined Y-chromosomal data for these same individuals from the Marshfield screening sets, also available at the same website. Individual #139 had data for all nine Y-chromosome markers across Set 10 and Set 52, and individual #1239 amplified eight Y-chromosome markers. We concluded these individuals were male. Individual #920 amplified only one of the Y-chromosome markers. We concluded this individual was female.

After altering the sex labels of #920 and #1239 (to female and male, respectively), all Marshfield Set 52 sex labels agree with the sex inferred on the basis of X-chromosomal genotypes at 294 loci in the study of Conrad et al. (2006), for the 1039 HGDP individuals genotyped for that study.

We note that Marshfield's Set 52 sex labels are incorrect for individuals #920 and #1239, but these errors are corrected in the files at http://www.people.fas.harvard.edu/~sramach/datasets.html. A plot of the proportion of X-linked loci where male and female individuals are heterozygous can also be found there; individual #139 is labeled to show that it has a much higher proportion of heterozygous loci on the X chromosome than other males.
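
The heterozygosity check described in this section can be sketched in a few lines of Python. This is only an illustration, not the script used to build the files: the function names are hypothetical, the example genotypes are made up, and the sketch assumes each individual's two alleles per locus have already been read into two equal-length lists with -9 marking missing data. The 15% and 19% cutoffs are the empirical bounds reported above, used here as illustrative thresholds for flagging a label worth checking against Y-chromosomal data.

```python
MISSING = -9  # missing-data code, as in the files described above


def heterozygous_fraction(alleles_a, alleles_b, missing=MISSING):
    """Return (fraction of scored loci that are heterozygous, number of scored loci).

    A locus counts as scored only when both alleles are present.
    """
    if len(alleles_a) != len(alleles_b):
        raise ValueError("allele rows must have the same number of loci")
    scored = 0
    het = 0
    for a, b in zip(alleles_a, alleles_b):
        if a == missing or b == missing:
            continue  # skip loci with missing data
        scored += 1
        if a != b:
            het += 1
    if scored == 0:
        return float("nan"), 0
    return het / scored, scored


def label_is_discordant(sex_label, fraction, male_max=0.15, female_min=0.19):
    """Flag a sex label whose heterozygosity falls outside the range seen for that sex.

    The default cutoffs are assumptions taken from the bounds quoted in the text.
    """
    if sex_label == 1:  # 1 = male: expected to look homozygous at X-linked loci
        return fraction >= male_max
    if sex_label == 2:  # 2 = female
        return fraction <= female_min
    raise ValueError("sex label must be 1 (male) or 2 (female)")


# Made-up genotypes for one individual at five X-linked loci.
row_a = [120, 134, -9, 210, 98]
row_b = [120, 138, 140, 210, 98]
fraction, n_scored = heterozygous_fraction(row_a, row_b)
print(fraction, n_scored)                # 0.25 4
print(label_is_discordant(1, fraction))  # True
```

A flagged label is only a prompt for further checking; in the procedure above, the final call for #139, #920, and #1239 rested on the Y-chromosome markers.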
-------------------------------------------------

The generation of files 1 and 2 (autosomal data):

To generate file 1, we used the data from Rosenberg et al. (2005), combinedmicrosats-1048.stru, available at http://rosenberglab.bioinformatics.med.umich.edu/diversity.html.

File 2 contains individuals in set H952 from Rosenberg (2006). The individuals in file 1 that are not contained in file 2 are those in Supplementary Table 23 from Rosenberg (2006).

The generation of files 3 and 4 (X-chromosomal data):

File 3 was generated by combining the X-linked genotype data from Screening Set 10 and 52 Diversity Genotype STRP files on Marshfield's website (http://research.marshfieldclinic.org/genetics/GenotypingData_Statistics/humanDiversityPanel.asp). Homozygous males were made hemizygous, and loci where males were scored as heterozygous were coded as missing data.

File 4 contains individuals in set H952 from Rosenberg (2006); the individuals listed in Supplementary Table 23 from Rosenberg (2006) are excluded from this file.

-------------------------------------------------

References:

Conrad, D.F., Jakobsson, M., Coop, G., Wen, X., Wall, J.D., Rosenberg, N.A., Pritchard, J.K., 2006. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genet. 38, 1251-1260.

Ramachandran, S., Rosenberg, N.A., Feldman, M.W., Wakeley, J., 2008. Population differentiation and migration: coalescence times in a two-sex island model for autosomal and X-linked loci. Theor. Pop. Biol. 74, 291-301.

Rosenberg, N.A., Mahajan, S., Ramachandran, S., Zhao, C., Pritchard, J.K., Feldman, M.W., 2005. Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet. 1, e70.