This dataset is expected to be useful for a variety of purposes,
including demonstrating software and workflows and developing probe-level analysis methods that make genotype
calls from probe intensity data.
The dataset consists of 48 samples, each hybridized to both the Nsp and Sty arrays (a total of 48 x 2 = 96 hybridizations).
The samples comprise 13 trios (5 HapMap CEPH trios, 5 HapMap Yoruba trios and 3 non-HapMap trios)
and 9 unrelated HapMap Asian samples. In total, 39 of the 48 samples are among those used in the
International HapMap Project.
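The sample counts above can be cross-checked with a few lines of arithmetic. This is only a sketch; the variable names are illustrative and the numbers are taken directly from the description of the dataset.

```python
# Sample composition as described above (all counts come from the text).
hapmap_ceph_trios = 5
hapmap_yoruba_trios = 5
non_hapmap_trios = 3
hapmap_asian_unrelated = 9
arrays_per_sample = 2  # one Nsp array and one Sty array per sample

trios = hapmap_ceph_trios + hapmap_yoruba_trios + non_hapmap_trios
samples = 3 * trios + hapmap_asian_unrelated
# Only the CEPH and Yoruba trios and the unrelated Asian samples are HapMap samples.
hapmap_samples = 3 * (hapmap_ceph_trios + hapmap_yoruba_trios) + hapmap_asian_unrelated
hybridizations = samples * arrays_per_sample

print(trios, samples, hapmap_samples, hybridizations)  # 13 48 39 96
```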
Of particular use is the fact that the HapMap Project has made available a large number of reference genotypes that
can be used in conjunction with this dataset. The HapMap data access policy
limits redistribution rights on these genotypes, so they cannot be made available directly by Affymetrix; instead, the reference data
can be downloaded directly from the HapMap Project. As of HapMap release 16c1,
reference genotypes are available for a total of 124,624 SNPs in the samples shared here (65,246 SNPs for Nsp and 59,378 SNPs for Sty).
These numbers increase with each HapMap release.
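As one illustration of using the HapMap reference genotypes alongside this dataset, the following minimal sketch computes the concordance between a set of genotype calls and the reference genotypes over the SNPs they share. It assumes both inputs are plain dictionaries keyed by SNP identifier with genotypes already written in the same encoding; the function name `concordance` and the missing-value markers `"NoCall"` and `"NN"` are assumptions for illustration, not part of the dataset or of GTYPE.

```python
from typing import Dict, Iterable, Tuple

# Assumed missing-genotype markers: "NoCall" for array calls, "NN" for HapMap.
DEFAULT_MISSING = ("NoCall", "NN")


def concordance(calls: Dict[str, str], reference: Dict[str, str],
                missing: Iterable[str] = DEFAULT_MISSING) -> Tuple[float, int]:
    """Return (fraction of matching genotypes, number of SNPs compared).

    Both dicts map SNP identifiers to genotypes that are assumed to be in
    the same encoding already (for example "AA", "AB", "BB"). SNPs absent
    from either dict, or marked missing on either side, are skipped.
    """
    skip = set(missing)
    compared = agree = 0
    for snp, ref in reference.items():
        call = calls.get(snp)
        if call is None or call in skip or ref in skip:
            continue
        compared += 1
        agree += int(call == ref)
    rate = agree / compared if compared else float("nan")
    return rate, compared
```

In practice the HapMap genotypes may need to be converted into the same allele encoding as the array calls before comparison; that conversion step is not shown here.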
The details of the analysis method used by GTYPE to determine genotype calls based on probe intensity data have been published in
Bioinformatics.
The dataset has been split into 13 zip files for convenient download. All of them can be extracted into the same directory, each on top of the others. The file with the word 'base' in its filename is required; each of the other 12 zip files contains a distinct collection of chip data, so users who want only part of the data can download just the zips they need.
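The extraction step can be sketched with Python's standard `zipfile` module. This sketch assumes the downloaded zips sit together in one folder and that exactly one file name contains "base"; it extracts that file first and then unpacks whichever other parts are present into the same destination. The function name `extract_parts` is hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_parts(zip_dir: Path, dest: Path) -> List[Path]:
    """Extract the required 'base' zip, then every other downloaded part, into dest.

    Assumes all downloaded zips are in zip_dir and exactly one file name
    contains 'base'. Returns the zips in the order they were extracted.
    """
    zips = sorted(zip_dir.glob("*.zip"))
    base = [z for z in zips if "base" in z.name]
    if len(base) != 1:
        raise FileNotFoundError("expected exactly one zip with 'base' in its name")
    ordered = base + [z for z in zips if z not in base]
    dest.mkdir(parents=True, exist_ok=True)
    for z in ordered:
        with zipfile.ZipFile(z) as archive:
            # Every part unpacks into the same tree, on top of the earlier ones.
            archive.extractall(dest)
    return ordered
```

Because only the zips actually present are extracted, downloading a subset of the 12 data parts is handled without any extra options.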
The data is provided in two versions that contain the same data in different file formats. Version 1 (Table 1) contains raw CEL, CHP and EXP files and is suitable for use outside the GCOS/GTYPE framework; it is likely to be of most interest to users working on low-level probe analysis. Version 2 (Table 2) contains DTT format files and is intended for users who want to integrate the data with the GCOS/GTYPE applications.
In either case, the 'base' zip file includes a file named README.txt with detailed instructions on how to use the data. MD5 checksums are provided in the tables below for verifying the integrity of downloaded data.
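A downloaded file can be checked against its published MD5 checksum with Python's standard `hashlib` module. This is a minimal sketch; the function names `md5sum` and `verify` are hypothetical, and the expected checksum is assumed to be copied by hand from the download tables.

```python
import hashlib
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare a file's digest with a checksum copied from the tables (case-insensitive)."""
    return md5sum(path) == expected.strip().lower()
```

Reading in fixed-size chunks keeps memory use constant even for large zip files.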