
Affymetrix® Analysis Data Model
The Affymetrix® Analysis Data Model (AADM) is the relational database schema Affymetrix uses to store experiment results. It includes tables to support mapping and expression results.

Affymetrix publishes AADM to support open access to experiment information generated and managed by Affymetrix software so the results may be filtered and mined with compatible analysis tools.


 

1. Definitions, Acronyms, Abbreviations
2. General Description
2.1. Overview
2.2. Schema
2.2.1. Schema, Physical View
2.2.2. Array Design Sub-Schema, Physical View
2.2.3. Experiment Setup Sub-Schema, Physical View
2.2.4. Analysis Results Sub-Schema, Physical View
2.2.5. Protocol Parameters, Physical View
3. Chip Design Tables
3.1. CHIP_DESIGN
3.2. ANALYSIS_SCHEME
3.3. UNIT_TYPE
3.4. SCHEME_UNIT
3.5. SCHEME_BLOCK
3.6. SCHEME_ATOM
3.7. SCHEME_CELL
3.8. BIOLOGICAL_ITEM
4. Experiment Setup Tables
4.1. EXPERIMENT
4.2. TARGET
4.3. TARGET_TYPE
4.4. PHYSICAL_CHIP
5. Analysis Results Tables
5.1. ANALYSIS
5.2. ANALYSIS_DATA_SET_COLLECTION
5.3. ANALYSIS_DATA_SET
5.4. ANALYSIS_DATA_SET_TYPE
5.5. ANALYSIS_ALGORITHM
5.6. ALGORITHM_TYPE
5.7. MEASUREMENT_ELEMENT_RESULT
5.8. ABS_GENE_EXPR_RESULT
5.9. ABS_GENE_EXPR_RES_STAT
5.10. ABS_GENE_EXPR_RESULT_TYPE
5.11. ABS_GENE_EXPR_ATOM_RESULT
5.12. REL_GENE_EXPR_RESULT
5.13. REL_GENE_EXPR_RES_STAT
5.14. REL_GENE_EXPR_RESULT_TYPE
5.15. GENOTYPE_RESULT_TYPE
5.16. GENOTYPE_RESULT_MAPPING_10K
5.17. GENOTYPE_RESULT_MAPPING_100K
6. Protocol Parameter Tables
6.1. PROTOCOL
6.2. PARAMETER
6.3. PROTOCOL_TEMPLATE
6.4. PARAMETER_TEMPLATE
6.5. TEMPLATE_TYPE
6.6. PARAMETER_UNITS
6.7. PARAMETER_TYPE
7. Example Queries
7.1. Analysis Results
7.2. Protocol Parameters
7.3. Experiment Setup
8. Creation Scripts
8.1. Oracle® 8.1.7
8.2. SQL Server 2000
9. Demo Data
9.1. Oracle® 8.1.7
9.2. SQL Server 2000


 

1. Definitions, Acronyms, Abbreviations

Term/Acronym

Definition/Description

AK

Alternate Key

Antisense

An "antisense" oligonucleotide is designed to be complementary to an expressed sequence within a sample.

Atom

An atom (probe pair for expression arrays) is a set of cells that are used to interrogate a base position.

Block

A block is a subset of a unit. The cells of a block have similar characteristics.

CDF

Chip Description File. A type of library file used by the Affymetrix® Microarray Suite or GCOS software.

CEL

The file that contains cell intensities.

Cell

The smallest division of a chip: an area on the chip in which every probe has the same sequence. A cell contains several thousand copies of a probe sequence. Also called a "feature".

CHP

The file that contains analysis results for a chip. Sometimes called the "chip" file.

Control Cell

A cell that is used for quality control, grid alignment or other non-expression-level purposes.

DAT

The file that contains pixel intensities; the image file.

ERwin®

A data modeling software tool used to draw Entity Relationship diagrams for database design.

EST

Expressed Sequence Tag.

FK

Foreign Key

Probe

A probe is a single-stranded nucleic acid sequence.

Probe Array

Means the same as chip. A physical device used to detect specific DNA or RNA sequences in a sample.

Sense

A "sense" oligonucleotide is designed to be the same sequence as an expressed sequence within a sample.

Target

A target is a single stranded DNA or RNA sequence that is interrogated by the probe array. The targets are extracted from the samples that are being studied.

Unit

A unit is a subset of a chip. A unit usually contains cells that have some similar characteristics. A unit is known as a "probe set" in the expression assay.



2. General Description

2.1. Database Schema Overview

The database schema can be divided into four related sub-schemas as illustrated below.

2.1.1. Chip Design

Holds data equivalent to the CDF (library) file. The chip design contains an overall chip description: the chip name, the number of rows and columns of cells, the number of units, and so on. The unit description contains the number of blocks in the unit, the direction of the unit (sense or antisense), and so on.

2.1.2. Experiment Setup

Holds information on the chip used and the target applied in any experiment analyzed with GCOS software.

2.1.3. Analysis Results

Stores results from any expression and genotyping (mapping) analysis experiment, including Cell Intensities, Absolute Gene Expression and Comparative Gene Expression.

2.1.4. Protocol Parameters

Contains any parameters that are captured during target preparation, experiment setup, and chip analysis. Tables as well as their constituent fields within each of these modules are detailed below.


2.2. Schema

Entity-relationship diagrams for the database appear in the sections below. One diagram shows all the tables. Another diagram shows how the tables are grouped into four sub-schemas. Additional diagrams show the four sub-schemas.

The schema diagrams were made using the ERwin® software from Logic Works, Inc. The diagram above illustrates how the schema diagrams are interpreted. A CHIP_DESIGN has zero, one, or more PHYSICAL_CHIPs. The dashed connecting line indicates a non-identifying relationship.


2.2.1. Schema, Physical View


2.2.2. Array Design Sub-Schema, Physical View


2.2.3. Experiment Setup Sub-Schema, Physical View


2.2.4. Analysis Results Sub-Schema, Physical View


2.2.5. Protocol Parameters, Physical View


3. Chip Design Tables

3.1. CHIP_DESIGN

CHIP_DESIGN contains data describing the physical layout of a chip.

Column

Definition/Description

ID

Primary key.

NAME

The name of the probe array type or chip type. Same as "NAME" in the ANALYSIS_SCHEME table.

NUMBER_X

The number of cells along the X axis.

NUMBER_Y

The number of cells along the Y axis.



3.2. ANALYSIS_SCHEME

Logical layout of a chip type. A logical layout consists of a hierarchical assembly of units, blocks, atoms, and cells, each of which is detailed in a separate table.

Column

Definition/Description

ID

The primary key.

CHIP_DESIGN_ID

Foreign key to the CHIP_DESIGN table.

NAME

The name of the probe array type, or chip type. Same as "NAME" in the CHIP_DESIGN table.



3.3. UNIT_TYPE

Several unit types exist, and each has one record in this table. The table can be used to map between the unit type name and the internal unit type ID.

Column

Definition/Description

ID

The primary key.

NAME

The name of the unit type (for example, "Expression"). The name describes the purpose of units of this type.



3.4. SCHEME_UNIT

Contains one record for each unit defined for the probe array type.

Column

Definition/Description

SCHEME_ID

Foreign key to the ANALYSIS_SCHEME table.

UNIT_IDX

Index number for the unit. Ranges from 1 to the total number of units on the array.

TYPE_ID

Foreign key to the UNIT_TYPE table.

NAME

The name of the unit. For mapping units, this is the name of the marker. A value of "NONE" indicates an unnamed unit.

DIRECTION

The direction (sense or antisense) the unit interrogates. This is not used for mapping units.

MUTATION_ID

The field is not used.



3.5. SCHEME_BLOCK

Contains one record for every block on the chip. For gene expression units, there is exactly one block in each unit. Each gene expression block interrogates the activity of a single probe set. For mapping units, there may be one or more blocks, which together are used to interrogate a marker.

Column

Definition/Description

SCHEME_ID

Foreign key to the ANALYSIS_SCHEME table.

UNIT_IDX

Index number for the unit. Ranges from 1 to the total number of units on the array.

BLOCK_IDX

Index number for blocks. Ranges from 1 to the number of blocks in the unit.

ITEM_ID

Foreign key to the BIOLOGICAL_ITEM table.



3.6. SCHEME_ATOM

Contains one record for every atom on the chip. For gene expression units, the number of atoms in each block varies.

Column

Definition/Description

SCHEME_ID

Foreign key to the ANALYSIS_SCHEME table.

UNIT_IDX

Index number for the unit. Ranges from 1 to the total number of units on the array.

BLOCK_IDX

Index number for blocks. Ranges from 1 to the number of blocks in the unit.

ATOM_IDX

Index number for atoms. Ranges from 1 to the number of atoms in the block.

POSITION

The substitution position of the probe.

TBASE

The target base at the substitution position.

ATOM_NO

The atom number, which gives positional information within the unit.



3.7. SCHEME_CELL

Contains one record for every cell on the chip. For gene expression units, there are two cells in each atom.

Column

Definition/Description

SCHEME_ID

Foreign key to the ANALYSIS_SCHEME table.

UNIT_IDX

Index number for the unit. Ranges from 1 to the total number of units on the array.

BLOCK_IDX

Index number for blocks. Ranges from 1 to the number of blocks in the unit.

ATOM_IDX

Index number for atoms. Ranges from 1 to the number of atoms in the block.

CELL_IDX

Index number for cells. Ranges from 1 to the number of cells in the atom.

LOCATION_X

The x-coordinate of the cell.

LOCATION_Y

The y-coordinate of the cell.

PBASE

The probe base at the substitution position.

FEATURE

A string that describes some aspect of the probe.

QUALIFIER

An additional string that describes some aspect of the probe.

PROBE_LENGTH

Number of bases making up the probe.

FLAG

A bitwise flag: bit 1 is set if the cell's probe is a perfect match and unset if the probe is a mismatch.
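
The unit and cell tables share the SCHEME_ID and UNIT_IDX columns, so the cells belonging to a named unit can be listed with an ordinary join. The following is a minimal sketch, not taken from the published creation scripts: it uses an in-memory SQLite database as a stand-in for the Oracle or SQL Server database, creates only the columns it needs with assumed types, and loads invented sample rows. It also assumes that "bit 1" of FLAG is the least significant bit. The function name cells_for_unit is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real Oracle or SQL Server database.
# Column types and all sample rows are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SCHEME_UNIT (SCHEME_ID INTEGER, UNIT_IDX INTEGER, NAME TEXT);
CREATE TABLE SCHEME_CELL (SCHEME_ID INTEGER, UNIT_IDX INTEGER,
                          BLOCK_IDX INTEGER, ATOM_IDX INTEGER,
                          CELL_IDX INTEGER, LOCATION_X INTEGER,
                          LOCATION_Y INTEGER, FLAG INTEGER);
INSERT INTO SCHEME_UNIT VALUES (1, 1, 'demo_probe_set');
-- One block, two atoms, two cells per atom (perfect match and mismatch).
INSERT INTO SCHEME_CELL VALUES (1, 1, 1, 1, 1, 10, 20, 1);
INSERT INTO SCHEME_CELL VALUES (1, 1, 1, 1, 2, 10, 21, 0);
INSERT INTO SCHEME_CELL VALUES (1, 1, 1, 2, 1, 11, 20, 1);
INSERT INTO SCHEME_CELL VALUES (1, 1, 1, 2, 2, 11, 21, 0);
""")


def cells_for_unit(conn, scheme_id, unit_name):
    """Return (atom_idx, cell_idx, x, y, is_perfect_match) for each cell.

    is_perfect_match is 1 or 0 and is read from the lowest bit of FLAG,
    which assumes that "bit 1" means the least significant bit.
    """
    return conn.execute(
        """
        SELECT c.ATOM_IDX, c.CELL_IDX, c.LOCATION_X, c.LOCATION_Y,
               (c.FLAG & 1)
        FROM SCHEME_UNIT u
        JOIN SCHEME_CELL c
          ON c.SCHEME_ID = u.SCHEME_ID AND c.UNIT_IDX = u.UNIT_IDX
        WHERE u.SCHEME_ID = ? AND u.NAME = ?
        ORDER BY c.BLOCK_IDX, c.ATOM_IDX, c.CELL_IDX
        """,
        (scheme_id, unit_name),
    ).fetchall()


cells = cells_for_unit(conn, 1, "demo_probe_set")
```

With the sample rows above, cells holds four tuples, two per atom, which mirrors the statement that each gene expression atom has two cells.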



3.8. BIOLOGICAL_ITEM

Holds probe set names and marker names interrogated by all array types.

Column

Definition/Description

ID

Primary key.

ITEM_NAME

The name of a probe set or marker.

 

4. Experiment Setup Tables

4.1. EXPERIMENT

Contains one record for each experiment run, that is, whenever a DAT (image) file is produced. Ties together information on the chip used, the target applied, and the parameters captured.

Column

Definition/Description

ID

Primary key.

PROTOCOL_ID

Foreign key to the PROTOCOL table.

TARGET_ID

Foreign key to the TARGET table.

PHYSICAL_CHIP_ID

Foreign key to the PHYSICAL_CHIP table.

DAT_FILE_NAME

The full UNC path of the DAT file on the GCOS server.

NAME

The name of the experiment.



4.2. TARGET

Describes the target applied to a physical chip in an experiment.

Column

Definition/Description

ID

Primary key.

TARGET_TYPE_ID

Foreign key to the TARGET_TYPE table.

PROTOCOL_ID

Foreign key to the PROTOCOL table.

CONCENTRATION

Not used in the current system.

DATE_PREPARED

Not used in the current system.

PREPARED_BY

The name of the person (the user's NT logon name) who prepared the target.



4.3. TARGET_TYPE

Describes all target types, for example, "blood", "saliva", etc.

Column

Definition/Description

ID

Primary key.

NAME

The "Sample Type" as captured in the GCOS system.



4.4. PHYSICAL_CHIP

Describes the actual, physical chip to which a target was applied.

Column

Definition/Description

ID

Primary key.

DESIGN_ID

Foreign key to the CHIP_DESIGN table.

EXPIRATION_DATE

Not used in the current system.
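
The Experiment Setup tables above link to each other and to CHIP_DESIGN through the foreign keys listed in their column descriptions. As a hedged illustration of how those keys might be followed, the sketch below joins them to report, for each experiment, the chip design used and the sample type applied. It runs against an in-memory SQLite database rather than Oracle or SQL Server, creates only a subset of columns with assumed types, and uses invented sample rows. The function name experiment_summary is hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real database. Column types and all
# sample rows are assumptions; only the columns needed here are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CHIP_DESIGN (ID INTEGER PRIMARY KEY, NAME TEXT,
                          NUMBER_X INTEGER, NUMBER_Y INTEGER);
CREATE TABLE PHYSICAL_CHIP (ID INTEGER PRIMARY KEY, DESIGN_ID INTEGER);
CREATE TABLE TARGET_TYPE (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TARGET (ID INTEGER PRIMARY KEY, TARGET_TYPE_ID INTEGER,
                     PREPARED_BY TEXT);
CREATE TABLE EXPERIMENT (ID INTEGER PRIMARY KEY, TARGET_ID INTEGER,
                         PHYSICAL_CHIP_ID INTEGER, NAME TEXT);
INSERT INTO CHIP_DESIGN VALUES (1, 'DemoArray', 4, 4);
INSERT INTO PHYSICAL_CHIP VALUES (1, 1);
INSERT INTO TARGET_TYPE VALUES (1, 'blood');
INSERT INTO TARGET VALUES (1, 1, 'jdoe');
INSERT INTO EXPERIMENT VALUES (1, 1, 1, 'demo_experiment');
""")


def experiment_summary(conn):
    """Return (experiment, chip design, sample type, cell count) rows."""
    return conn.execute(
        """
        SELECT e.NAME, d.NAME, tt.NAME, d.NUMBER_X * d.NUMBER_Y
        FROM EXPERIMENT e
        JOIN PHYSICAL_CHIP pc ON pc.ID = e.PHYSICAL_CHIP_ID
        JOIN CHIP_DESIGN d    ON d.ID = pc.DESIGN_ID
        JOIN TARGET t         ON t.ID = e.TARGET_ID
        JOIN TARGET_TYPE tt   ON tt.ID = t.TARGET_TYPE_ID
        ORDER BY e.ID
        """
    ).fetchall()


summary = experiment_summary(conn)
```

The cell count is simply NUMBER_X multiplied by NUMBER_Y from CHIP_DESIGN, so a 4 by 4 demo design reports 16 cells.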



5. Analysis Results Tables

5.1. ANALYSIS

Contains one record for each GCOS analysis run (whenever a CEL or CHP file is produced). Ties together information on the data being analyzed, the algorithm used, the results, and the parameters captured.

Column

Definition/Description

ID

Primary key.

ALGORITHM_ID

Foreign key to the ANALYSIS_ALGORITHM table.

PROTOCOL_ID

Foreign key to the PROTOCOL table.

ANALYST_ID

Name of the person (NT logon name) who performed the analysis.

SCHEME_ID

Foreign key to the ANALYSIS_SCHEME table.

ANALYSIS_DATE

Date of the analysis.

DATA_SET_COLLECTION_ID

Foreign key to the ANALYSIS_DATA_SET_COLLECTION table.

NAME

The name of the analysis.



5.2. ANALYSIS_DATA_SET_COLLECTION

Provides a many-to-many relationship between an analysis (table ANALYSIS) and the data upon which an analysis can be run (table ANALYSIS_DATA_SET).

Column

Definition/Description

ID

Primary key.



5.3. ANALYSIS_DATA_SET

Provides foreign key IDs to the data upon which an analysis is run. The data can either be an experiment or an earlier analysis. The data type (experiment or analysis) and therefore the field used (EXPT_ID or ANALYSIS_ID) is determined by the TYPE_ID field.

Column

Definition/Description

ID

Primary key.

COLLECTION_ID

Foreign key to the ANALYSIS_DATA_SET_COLLECTION table.

TYPE_ID

Foreign key to the ANALYSIS_DATA_SET_TYPE table.

ANALYSIS_ID

Foreign key to the ANALYSIS table.

EXPT_ID

Foreign key to the EXPERIMENT table.
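
Because TYPE_ID decides whether EXPT_ID or ANALYSIS_ID is the meaningful column, a query that lists the inputs of an analysis has to branch on the data set type. The sketch below shows one way this might be done. It is only an illustration under stated assumptions: SQLite stands in for Oracle or SQL Server, column types are guessed, the type names 'Experiment' and 'Analysis' are hypothetical values for ANALYSIS_DATA_SET_TYPE.NAME, and all rows are invented. The function name inputs_for is also hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real database. Column types, the two
# type names, and all sample rows are assumptions made for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ANALYSIS_DATA_SET_TYPE (ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE ANALYSIS_DATA_SET (ID INTEGER PRIMARY KEY,
                                COLLECTION_ID INTEGER, TYPE_ID INTEGER,
                                ANALYSIS_ID INTEGER, EXPT_ID INTEGER);
CREATE TABLE ANALYSIS (ID INTEGER PRIMARY KEY,
                       DATA_SET_COLLECTION_ID INTEGER, NAME TEXT);
INSERT INTO ANALYSIS_DATA_SET_TYPE VALUES (1, 'Experiment');
INSERT INTO ANALYSIS_DATA_SET_TYPE VALUES (2, 'Analysis');
-- Analysis 1 reads experiment 7; analysis 2 reads the output of analysis 1.
INSERT INTO ANALYSIS VALUES (1, 10, 'demo_cell_analysis');
INSERT INTO ANALYSIS VALUES (2, 20, 'demo_expression_analysis');
INSERT INTO ANALYSIS_DATA_SET VALUES (1, 10, 1, NULL, 7);
INSERT INTO ANALYSIS_DATA_SET VALUES (2, 20, 2, 1, NULL);
""")


def inputs_for(conn, analysis_id):
    """Return (data type name, source ID) for each input of an analysis."""
    return conn.execute(
        """
        SELECT t.NAME,
               CASE WHEN t.NAME = 'Experiment' THEN d.EXPT_ID
                    ELSE d.ANALYSIS_ID END
        FROM ANALYSIS a
        JOIN ANALYSIS_DATA_SET d
          ON d.COLLECTION_ID = a.DATA_SET_COLLECTION_ID
        JOIN ANALYSIS_DATA_SET_TYPE t ON t.ID = d.TYPE_ID
        WHERE a.ID = ?
        ORDER BY d.ID
        """,
        (analysis_id,),
    ).fetchall()


first_inputs = inputs_for(conn, 1)
second_inputs = inputs_for(conn, 2)
```

In this sample data the first analysis resolves to an experiment and the second to an earlier analysis, the two cases the TYPE_ID field distinguishes.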



5.4. ANALYSIS_DATA_SET_TYPE

Describes the data type. Has exactly two records: one for the experiment data type and one for the analysis data type.

Column

Definition/Description

ID

Primary key.

NAME

The name of the data type.



5.5. ANALYSIS_ALGORITHM

Holds information for algorithms used for analyses.

Column

Definition/Description

ID

Primary key.

NAME

The name of the algorithm.

TYPE_ID

Foreign key to the ALGORITHM_TYPE table.



5.6. ALGORITHM_TYPE

Algorithms can be one of three types: cell averaging, absolute expression calling, or comparative expression calling. The ALGORITHM_TYPE determines what kind of results are produced and, therefore, which table will store the results. Cell-averaging results are held in MEASUREMENT_ELEMENT_RESULT, absolute expression results in ABS_GENE_EXPR_RESULT, ABS_GENE_EXPR_RES_STAT and ABS_GENE_EXPR_ATOM_RESULT, and comparative expression results in REL_GENE_EXPR_RESULT and REL_GENE_EXPR_RES_STAT.

Column

Definition/Description

ID

Primary key.

NAME

The name of the algorithm type.
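
The relationship between algorithm type and result tables described above can be captured in a small lookup, which a client tool might use to decide which tables to query for a given analysis. This is a sketch only: the dictionary keys are hypothetical labels derived from the description, and the NAME values actually stored in ALGORITHM_TYPE may differ. The function name result_tables is likewise an assumption.

```python
# Keys are hypothetical labels for ALGORITHM_TYPE.NAME; the values actually
# stored in a given database may differ. Table names come from the schema.
RESULT_TABLES = {
    "cell averaging": ["MEASUREMENT_ELEMENT_RESULT"],
    "absolute expression": [
        "ABS_GENE_EXPR_RESULT",
        "ABS_GENE_EXPR_RES_STAT",
        "ABS_GENE_EXPR_ATOM_RESULT",
    ],
    "comparative expression": [
        "REL_GENE_EXPR_RESULT",
        "REL_GENE_EXPR_RES_STAT",
    ],
}


def result_tables(algorithm_type_name):
    """Return the result tables that hold output for an algorithm type."""
    key = algorithm_type_name.strip().lower()
    if key not in RESULT_TABLES:
        raise KeyError(f"unknown algorithm type: {algorithm_type_name!r}")
    # Return a copy so callers cannot modify the lookup table.
    return list(RESULT_TABLES[key])


tables = result_tables("Absolute Expression")
```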



5.7. MEASUREMENT_ELEMENT_RESULT

Holds results from a cell averaging analysis, one record for each cell on the chip.

Column

Definition/Description

ANALYSIS_ID

Foreign key to the ANALYSIS table.

LOCATION_X

The x-coordinate of the cell.

LOCATION_Y

The y-coordinate of the cell.

INTENSITY

Calculated intensity value of the cell.

STATISTIC

Standard deviation corresponding to the intensity value.

PIXELS

The number of pixels used in calculating the intensity value.

INTENSITY_ORIG

If the cell intensity has been modified, this field is set to the original calculated intensity; otherwise it is set to -1.

FLAG

A bitwise flag: bit 1 is set if the cell has been masked out of the analysis, bit 2 if the cell is determined to be an outlier, and bit 3 if the intensity has been modified.
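
A client reading MEASUREMENT_ELEMENT_RESULT needs to unpack the FLAG column into its three meanings. The following minimal sketch shows how that could look. It assumes that "bit 1" is the least significant bit (mask 0x1), so bits 2 and 3 correspond to masks 0x2 and 0x4; the schema description does not state the bit numbering explicitly. The constant and function names are hypothetical.

```python
# Assumed bit layout: "bit 1" is taken to be the least significant bit.
MASKED = 0x1    # bit 1: cell was masked out of the analysis
OUTLIER = 0x2   # bit 2: cell was determined to be an outlier
MODIFIED = 0x4  # bit 3: intensity was modified


def decode_cell_flag(flag):
    """Unpack a MEASUREMENT_ELEMENT_RESULT.FLAG value into named booleans."""
    return {
        "masked": bool(flag & MASKED),
        "outlier": bool(flag & OUTLIER),
        "modified": bool(flag & MODIFIED),
    }


decoded = decode_cell_flag(5)  # binary 101: masked and modified
```

Keeping the masks as named constants makes it easy to adjust the layout if the bit numbering in a particular database turns out to differ from the assumption.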



5.8. ABS_GENE_EXPR_RESULT

Holds results of an absolute gene expression analysis from the empirical gene expression algorithm. There is one record for each gene on the chip.

Column

Definition/Description

ANALYSIS_ID

Foreign key to the ANALYSIS table.

ITEM_ID

Foreign key to the BIOLOGICAL_ITEM table.

TYPE_ID

Foreign key to the ABS_GENE_EXPR_RESULT_TYPE table.
The call in an absolute analysis that indicates if the transcript was present (P), absent (A), or marginal (M).

NUMBER_POSITIVE

Number of positive probe pairs.

NUMBER_NEGATIVE

Number of negative probe pairs.

NUMBER_USED

Number of probe pairs used in the analysis.

NUMBER_ALL

Total number of probe pairs for the probe set.

AVG_LOG_RATIO

Average log ratio.

PM_EXCESS

Perfect match excess.

NUMBER_IN_AVG

Number of probe pairs used in computing the average intensity difference.

MM_EXCESS