Latin Square Data for Expression Algorithm Assessment

Human Genome U133 Data Set

This data set consists of 3 technical replicates of 14 separate hybridizations of 42 spiked transcripts in a complex human background at concentrations ranging from 0.125pM to 512pM. Thirty of the spikes are isolated from a human cell line, four spikes are bacterial controls, and eight spikes are artificially engineered sequences believed to be unique in the human genome.

The data set is expected to be useful for the development and comparison of expression analysis methods. Unlike the Human Genome U95 Data Set below, this data set includes many more spikes, a lower spike concentration (0.125pM), a larger background population, 18-micron features scanned with the Affymetrix GeneChip® Scanner 3000, and several foreign and artificial clones expected to exhibit little, if any, specific cross-hybridization.

This data set requires a special, alternate chip description file (CDF), available below, that contains information about the eight artificial clones. The exact spiked sequences are listed in the Excel file describing the experimental design.

Microarray Suite (MAS) users can analyze this data by downloading the U133 Complete library file listed below.

The DAT files for this data set are not available.

File Name              Size
U133 Description       19 KB
U133 CDF               6.9 MB
U133 Data              141 MB
U133 Probe Tabular     4.2 MB
U133 Complete          14 MB

Human Genome U95 Data Set

The human data set consists of a series of genes spiked in at known concentrations and arrayed in a Latin Square format. These data represent a subset of the data used to develop and validate the Affymetrix Microarray Suite (MAS) 5.0 algorithm.

These data are provided for use in conjunction with data from other groups to establish a set of common or standardized data sets that can be used by the scientific community to develop and validate expression algorithms. For other available data sets, please see: http://www.stat.berkeley.edu/users/terry/zarray/Affy/affy_index.html

The Latin Square design for the human data set consists of 14 spiked-in gene groups in 14 experimental groups. In the first experiment, the concentrations of the 14 gene groups are 0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, and 1024pM, respectively. Each subsequent experiment rotates the spike-in concentrations by one group: experiment 2 begins with 0.25pM and ends with 0pM, and so on through experiment 14, which begins with 1024pM and ends with 512pM. Each experiment contains at least 3 replicates. Additional information can be found in the data files below.
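The rotation described above can be sketched in a few lines. This is a minimal illustration, not Affymetrix's own tooling; it assumes 1-based experiment and group numbers and reads "rotates by one group" as shifting the first experiment's concentration list left by (experiment - 1) positions. The function names are hypothetical.

```python
# Minimal sketch of the U95 Latin Square concentration layout.
# Assumption: experiments and gene groups are numbered 1..14, and each
# experiment shifts the first experiment's list left by one position.

CONCENTRATIONS_PM = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]
N = len(CONCENTRATIONS_PM)  # 14 gene groups and 14 experiments


def concentration(experiment: int, group: int) -> float:
    """Spike-in concentration (pM) of a gene group in a given experiment."""
    if not (1 <= experiment <= N and 1 <= group <= N):
        raise ValueError("experiment and group must be in 1..14")
    # Wrap around so the last groups pick up the low concentrations.
    return CONCENTRATIONS_PM[(group - 1 + experiment - 1) % N]


def design_matrix() -> list[list[float]]:
    """Rows are experiments 1..14; columns are gene groups 1..14."""
    return [[concentration(e, g) for g in range(1, N + 1)] for e in range(1, N + 1)]


if __name__ == "__main__":
    # Print the first and last group's concentration for each experiment.
    for e, row in enumerate(design_matrix(), start=1):
        print(f"experiment {e:2d}: group 1 = {row[0]} pM, group 14 = {row[-1]} pM")
```

Because every row and every column of the resulting matrix is a permutation of the same 14 concentrations, each gene group is observed at every concentration exactly once across the 14 experiments, which is the defining property of a Latin square.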

The human data set contains 14 human genes, distributed among the 14 gene groups and spiked into each of the 14 experimental groups. Most gene groups contain 1 gene. The exceptions are group 1, which contains 2 genes, and group 12, which is empty: transcript 407_at, although listed as present in group 12, is actually included in group 1 (together with 37777_at). Replicates within each experimental group result in a total of 59 CEL files.
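When mapping transcripts to concentrations, the 407_at correction should be applied before any lookup. The sketch below is hypothetical: it includes only the probe sets named in this section (the other genes' identifiers are not reproduced here), and it reuses the one-group-per-experiment rotation, assuming 1-based numbering.

```python
# Hypothetical sketch: apply the documented group correction for 407_at
# (listed in group 12, actually in group 1 with 37777_at) before looking
# up a transcript's spike-in concentration.

CONCENTRATIONS_PM = [0, 0.25, 0.5, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Documented correction: probe set -> (listed group, actual group)
GROUP_CORRECTIONS = {"407_at": (12, 1)}

# Genes per gene group after the correction: 2 in group 1, none in group 12.
GENES_PER_GROUP = {g: 1 for g in range(1, 15)}
GENES_PER_GROUP[1] = 2   # 37777_at and 407_at
GENES_PER_GROUP[12] = 0  # empty once 407_at is moved to group 1


def actual_group(probe_set: str, listed_group: int) -> int:
    """Return the gene group a transcript actually belongs to."""
    fix = GROUP_CORRECTIONS.get(probe_set)
    if fix is not None and fix[0] == listed_group:
        return fix[1]
    return listed_group


def spike_concentration(probe_set: str, listed_group: int, experiment: int) -> float:
    """Concentration (pM) of a transcript in an experiment, after correction."""
    group = actual_group(probe_set, listed_group)
    return CONCENTRATIONS_PM[(group - 1 + experiment - 1) % len(CONCENTRATIONS_PM)]
```

With the correction in place, 407_at and 37777_at follow the same concentration sequence across all 14 experiments, and the per-group gene counts still sum to 14.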

Certain probe pairs for transcripts 407_at and 36889_at have been found to perform poorly and should be excluded from the analysis.

The DAT files for this data set are not available.

File Name              Size
U95 Description        38 KB
U95 - Part 1           24 MB
U95 - Part 2           30 MB
U95 - Part 3           32 MB
U95 - Part 4           26 MB
U95 - Part 5           27 MB
U95 - Part 6           25 MB
U95 - Part 7           7 MB
Total Size             ~171 MB

