| AK | Alternate Key |
| Antisense | An "antisense" oligonucleotide
is designed to be complementary to an expressed sequence within a sample. |
| Atom | An atom (probe pair for expression
arrays) is a set of cells that are used to interrogate a base position. |
| Block | A block is a subset of a unit. The cells of a block have similar characteristics. |
| CDF | Chip Description File. A type of library file used by the Affymetrix® Microarray Suite or GCOS software. |
| CEL | The file that contains cell intensities. |
| Cell | The smallest division of a chip.
It is an area on the chip that has the same sequences. A cell contains several
thousand copies of a probe sequence. Also called a "feature". |
| CHP | The file that contains analysis results for a chip. Sometimes called the "chip" file. |
| Control Cell | A cell that is used for quality control, grid alignment or other non-expression-level purposes. |
| DAT | The file that contains pixel intensities; the image file. |
| ERwin® | A methodology for drawing Entity Relationship diagrams in data modeling and database design. |
| EST | Expressed Sequence Tag. |
| FK | Foreign Key |
| Probe | A probe is a single-stranded nucleic acid sequence. |
| Probe Array | Means the same as chip. A physical device used to detect specific DNA or RNA sequences in a sample. |
| Sense | A "sense" oligonucleotide is designed to be the same sequence as an expressed sequence within a sample. |
| Target | A target is a single-stranded DNA or RNA sequence that is interrogated by the probe array. The targets are extracted from the samples that are being studied. |
| Unit | A unit is a subset of a chip. A unit usually contains cells that have some similar characteristics. A unit is known as a "probe set" in the expression assay. |
2. General Description
2.1. Database Schema Overview
| The database schema can be divided into four related sub-schemas as illustrated below. |
2.1.1. Chip Design
| Holds data equivalent to the CDF (Library) File. The Chip Design contains an overall chip description: the chip name, the number of rows and columns of cells, the number of units, etc. The unit description contains the number of blocks in the unit, the direction of the unit (sense or anti-sense), etc. |
2.1.2. Experiment Setup
| Holds information on the chip used and the target applied in any experiment analyzed with GCOS software. |
2.1.3. Analysis Results
| Stores results from any expression and genotyping (mapping) analysis experiment, including Cell Intensities, Absolute Gene Expression and Comparative Gene Expression. |
2.1.4. Protocol Parameters
| Contains any parameters that are captured
during target preparation, experiment setup, and chip analysis. Tables as well
as their constituent fields within each of these modules are detailed below. |

2.2. Schema
Entity-relationship diagrams for the database
appear on the next pages. One diagram shows all the tables. Another diagram shows
how the tables are grouped into four sub-schemas. Additional diagrams show the
four sub-schemas. The schema diagrams were made using the ERwin®
software from Logic Works, Inc. The diagram above illustrates how the schema diagrams
are interpreted. A CHIP_DESIGN has zero, one, or more PHYSICAL_CHIPS. The dashed
connecting line indicates that the relationship is a non-identifying relationship.

2.2.1. Schema, Physical View
2.2.2. Array Design Sub-Schema, Physical View
2.2.3. Experiment Setup Sub-Schema, Physical View
2.2.4. Analysis Results Sub-Schema, Physical View
2.2.5. Protocol Parameters, Physical View
3. Chip Design Tables
3.1. CHIP_DESIGN
| CHIP_DESIGN contains data describing
the physical layout of a chip. |
| Column | Definition/Description |
| ID | Primary key. |
| NAME | The name of the probe array type or chip type. Same as "NAME" in the ANALYSIS_SCHEME table. |
| NUMBER_X | The number of cells along the X axis. |
| NUMBER_Y | The number of cells along the Y axis. |
3.2. ANALYSIS_SCHEME
| Logical layout of a chip type. A logical
layout consists of a hierarchical assembly of units, blocks, atoms, and cells,
each of which is detailed in a separate table. |
| Column | Definition/Description |
| ID | The primary key. |
| CHIP_DESIGN_ID | Foreign key to the CHIP_DESIGN table. |
| NAME | The name of the probe array type, or chip type. Same as "NAME" in the CHIP_DESIGN table. |
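The relationship between CHIP_DESIGN and ANALYSIS_SCHEME can be sketched with an in-memory SQLite database. This is an illustration only: the column types, the chip name "DemoChip", and the sample values are assumptions, and the GCOS database itself may declare these tables differently.

```python
import sqlite3

# Assumed column types; only the column names come from the tables above.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CHIP_DESIGN (
        ID       INTEGER PRIMARY KEY,
        NAME     TEXT NOT NULL,
        NUMBER_X INTEGER,
        NUMBER_Y INTEGER
    );
    CREATE TABLE ANALYSIS_SCHEME (
        ID             INTEGER PRIMARY KEY,
        CHIP_DESIGN_ID INTEGER REFERENCES CHIP_DESIGN(ID),
        NAME           TEXT NOT NULL
    );
    """
)

# Hypothetical chip type; NAME is repeated in both tables as described above.
conn.execute("INSERT INTO CHIP_DESIGN VALUES (1, 'DemoChip', 640, 640)")
conn.execute("INSERT INTO ANALYSIS_SCHEME VALUES (10, 1, 'DemoChip')")

# Join the logical layout to its physical layout and compute the cell count.
row = conn.execute(
    """
    SELECT s.ID, d.NAME, d.NUMBER_X * d.NUMBER_Y
    FROM ANALYSIS_SCHEME s
    JOIN CHIP_DESIGN d ON d.ID = s.CHIP_DESIGN_ID
    WHERE s.NAME = d.NAME
    """
).fetchone()
print(row)
```

The WHERE clause simply checks the documented property that both tables carry the same NAME for a given chip type.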
3.3. UNIT_TYPE
| There are several unit types; each has a record in this table. The table can be used to map between the unit type name and the internal unit type ID. |
| Column | Definition/Description |
| ID | The primary key. |
| NAME | The name of the unit type (for example, "Expression"). The name describes the purpose of the unit. |
3.4. SCHEME_UNIT
| Contains one record for each unit
defined for the probe array type. |
| Column | Definition/Description |
| SCHEME_ID | Foreign key to the ANALYSIS_SCHEME table. |
| UNIT_IDX | Index number for the unit. Ranges from 1 to the total number of units on the array. |
| TYPE_ID | Foreign key to the UNIT_TYPE table. |
| NAME | The name of the unit. For mapping units, this is the name of the marker. A value of "NONE" indicates an unnamed unit. |
| DIRECTION | The direction (sense or anti-sense) the unit interrogates. This is not used for mapping units. |
| MUTATION_ID | The field is not used. |
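As a rough sketch of how UNIT_TYPE serves as a lookup for SCHEME_UNIT, the following counts the units of each type in one scheme. The column types, the composite primary key, the "Mapping" type name, the text encoding of DIRECTION, and all sample rows are assumptions made for illustration; only the table and column names come from this document.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE UNIT_TYPE (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE SCHEME_UNIT (
        SCHEME_ID   INTEGER,
        UNIT_IDX    INTEGER,
        TYPE_ID     INTEGER REFERENCES UNIT_TYPE(ID),
        NAME        TEXT,
        DIRECTION   TEXT,      -- assumed encoding; not specified in the source
        MUTATION_ID INTEGER,   -- documented as unused
        PRIMARY KEY (SCHEME_ID, UNIT_IDX)  -- assumed key
    );
    INSERT INTO UNIT_TYPE VALUES (1, 'Expression'), (2, 'Mapping');
    INSERT INTO SCHEME_UNIT VALUES
        (10, 1, 1, 'demo_probe_set', 'antisense', NULL),
        (10, 2, 2, 'demo_marker', NULL, NULL),
        (10, 3, 1, 'NONE', 'sense', NULL);
    """
)


def units_by_type(conn, scheme_id):
    """Return {unit type name: number of units} for one analysis scheme."""
    rows = conn.execute(
        """
        SELECT t.NAME, COUNT(*)
        FROM SCHEME_UNIT u
        JOIN UNIT_TYPE t ON t.ID = u.TYPE_ID
        WHERE u.SCHEME_ID = ?
        GROUP BY t.NAME
        ORDER BY t.NAME
        """,
        (scheme_id,),
    ).fetchall()
    return dict(rows)


counts = units_by_type(conn, 10)
print(counts)
```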
3.5. SCHEME_BLOCK
| Contains one record for every block on the chip. For gene expression units, there is exactly one block to each unit. Each gene expression block interrogates the activity of a single probe set. For mapping units, there may be one or more blocks, which together are used to interrogate a marker. |
| Column | Definition/Description |
| SCHEME_ID | Foreign key to the ANALYSIS_SCHEME table. |
| UNIT_IDX | Index number for the unit. Ranges from 1 to the total number of units on the array. |
| BLOCK_IDX | Index number for blocks. Ranges from 1 to the number of blocks in the unit. |
| ITEM_ID | Foreign key to the BIOLOGICAL_ITEM table. |
3.6. SCHEME_ATOM
| Contains one record for every atom
on the chip. For gene expression units, there are a variable number of atoms to
each block. |
| Column | Definition/Description |
| SCHEME_ID | Foreign key to the ANALYSIS_SCHEME table. |
| UNIT_IDX | Index number for the unit. Ranges from 1 to the total number of units on the array. |
| BLOCK_IDX | Index number for blocks. Ranges from 1 to the number of blocks in the unit. |
| ATOM_IDX | Index number for atoms. Ranges from 1 to the number of atoms in the block. |
| POSITION | The substitution position of the probe. |
| TBASE | The target base at the substitution position. |
| ATOM_NO | The atom number, which gives positional information within the unit. |
3.7. SCHEME_CELL
| Contains one record for every cell
on the chip. For gene expression units, there are two cells in each atom. |
| Column | Definition/Description |
| SCHEME_ID | Foreign key to the ANALYSIS_SCHEME table. |
| UNIT_IDX | Index number for the unit. Ranges from 1 to the total number of units on the array. |
| BLOCK_IDX | Index number for blocks. Ranges from 1 to the number of blocks in the unit. |
| ATOM_IDX | Index number for atoms. Ranges from 1 to the number of atoms in the block. |
| CELL_IDX | Index number for cells. Ranges from 1 to the number of cells in the atom. |
| LOCATION_X | The x-coordinate of the cell. |
| LOCATION_Y | The y-coordinate of the cell. |
| PBASE | The probe base at the substitution position. |
| FEATURE | A string that describes some aspect of the probe. |
| QUALIFIER | An additional string that describes some aspect of the probe. |
| PROBE_LENGTH | Number of bases making up the probe. |
| FLAG | A bitwise flag: bit 1 is set if the cell's probe is a perfect match and unset if the probe is a mismatch. |
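The unit, block, atom, and cell indexes together locate a cell in the logical layout, and FLAG distinguishes the two cells of an expression atom. The sketch below groups the cells of one unit into probe pairs. It assumes that "bit 1" means the least significant bit (mask 0x1); the column types, the reduced column list, and the sample rows are likewise assumptions.

```python
import sqlite3

# Assumption: "bit 1" of FLAG is the least significant bit.
PERFECT_MATCH_BIT = 0x1

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE SCHEME_CELL (
        SCHEME_ID INTEGER, UNIT_IDX INTEGER, BLOCK_IDX INTEGER,
        ATOM_IDX INTEGER, CELL_IDX INTEGER,
        LOCATION_X INTEGER, LOCATION_Y INTEGER, FLAG INTEGER
    )
    """
)
# Hypothetical expression unit: one block, two atoms, two cells per atom.
conn.executemany(
    "INSERT INTO SCHEME_CELL VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (1, 1, 1, 1, 1, 0, 0, 1),  # perfect match
        (1, 1, 1, 1, 2, 0, 1, 0),  # mismatch
        (1, 1, 1, 2, 1, 1, 0, 1),  # perfect match
        (1, 1, 1, 2, 2, 1, 1, 0),  # mismatch
    ],
)


def probe_pairs(conn, scheme_id, unit_idx):
    """Map (BLOCK_IDX, ATOM_IDX) to the pm/mm cell coordinates of that atom."""
    rows = conn.execute(
        """
        SELECT BLOCK_IDX, ATOM_IDX, LOCATION_X, LOCATION_Y, FLAG
        FROM SCHEME_CELL
        WHERE SCHEME_ID = ? AND UNIT_IDX = ?
        ORDER BY BLOCK_IDX, ATOM_IDX, CELL_IDX
        """,
        (scheme_id, unit_idx),
    )
    pairs = {}
    for block_idx, atom_idx, x, y, flag in rows:
        kind = "pm" if flag & PERFECT_MATCH_BIT else "mm"
        pairs.setdefault((block_idx, atom_idx), {})[kind] = (x, y)
    return pairs


pairs = probe_pairs(conn, scheme_id=1, unit_idx=1)
print(pairs)
```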
3.8. BIOLOGICAL_ITEM
| Holds probe set names and marker names
interrogated by all array types. |
| Column | Definition/Description |
| ID | Primary key. |
| ITEM_NAME | The name of a probe set or marker. |
4. Experiment Setup Tables
4.1. EXPERIMENT
| Contains one record for each experiment
run, that is, whenever a DAT (image) file is produced. Ties together information
on the chip used, the target applied, and the parameters captured. |
| Column | Definition/Description |
| ID | Primary key. |
| PROTOCOL_ID | Foreign key to the PROTOCOL table. |
| TARGET_ID | Foreign key to the TARGET table. |
| PHYSICAL_CHIP_ID | Foreign key to the PHYSICAL_CHIP table. |
| DAT_FILE_NAME | The full UNC path of the DAT file on the GCOS server. |
| NAME | The name of the experiment. |
4.2. TARGET
| Describes the target applied to a physical chip
in an experiment. |
| Column | Definition/Description |
| ID | Primary key. |
| TARGET_TYPE_ID | Foreign key to the TARGET_TYPE table. |
| PROTOCOL_ID | Foreign key to the PROTOCOL table. |
| CONCENTRATION | Not used in the current system. |
| DATE_PREPARED | Not used in the current system. |
| PREPARED_BY | The name of the person (user's NT logon name) who prepared the target. |
4.3. TARGET_TYPE
| Describes all target types, for example,
"blood", "saliva", etc. |
| Column | Definition/Description |
| ID | Primary key. |
| NAME | The "Sample Type" as captured in the GCOS system. |
4.4. PHYSICAL_CHIP
| Describes the actual, physical chip
on which a target was applied. |
| Column | Definition/Description |
| ID | Primary key. |
| DESIGN_ID | Foreign key to the CHIP_DESIGN table. |
| EXPIRATION_DATE | Not used in the current system. |
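The Experiment Setup tables can be followed from an experiment to its chip type and sample type through the foreign keys listed above. The following sketch uses a reduced subset of the columns (PROTOCOL_ID, DAT_FILE_NAME, and the unused fields are omitted); the column types and all sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE CHIP_DESIGN (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE PHYSICAL_CHIP (
        ID        INTEGER PRIMARY KEY,
        DESIGN_ID INTEGER REFERENCES CHIP_DESIGN(ID)
    );
    CREATE TABLE TARGET_TYPE (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE TARGET (
        ID             INTEGER PRIMARY KEY,
        TARGET_TYPE_ID INTEGER REFERENCES TARGET_TYPE(ID),
        PREPARED_BY    TEXT
    );
    CREATE TABLE EXPERIMENT (
        ID               INTEGER PRIMARY KEY,
        TARGET_ID        INTEGER REFERENCES TARGET(ID),
        PHYSICAL_CHIP_ID INTEGER REFERENCES PHYSICAL_CHIP(ID),
        NAME             TEXT
    );
    -- Hypothetical sample data.
    INSERT INTO CHIP_DESIGN VALUES (1, 'DemoChip');
    INSERT INTO PHYSICAL_CHIP VALUES (5, 1);
    INSERT INTO TARGET_TYPE VALUES (2, 'blood');
    INSERT INTO TARGET VALUES (7, 2, 'demo_user');
    INSERT INTO EXPERIMENT VALUES (1, 7, 5, 'demo_experiment');
    """
)


def describe_experiment(conn, experiment_id):
    """Return (experiment name, chip type name, sample type) or None."""
    return conn.execute(
        """
        SELECT e.NAME, d.NAME, tt.NAME
        FROM EXPERIMENT e
        JOIN PHYSICAL_CHIP pc ON pc.ID = e.PHYSICAL_CHIP_ID
        JOIN CHIP_DESIGN d    ON d.ID = pc.DESIGN_ID
        JOIN TARGET t         ON t.ID = e.TARGET_ID
        JOIN TARGET_TYPE tt   ON tt.ID = t.TARGET_TYPE_ID
        WHERE e.ID = ?
        """,
        (experiment_id,),
    ).fetchone()


summary = describe_experiment(conn, 1)
print(summary)
```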
5. Analysis Results Tables
5.1. ANALYSIS
| Contains one record for each GCOS analysis run (whenever a CEL or CHP file is produced). Ties together information
on the data being analyzed, the algorithm used, the results, and the parameters
captured. |
| Column | Definition/Description |
| ID | Primary key. |
| ALGORITHM_ID | Foreign key to the ANALYSIS_ALGORITHM table. |
| PROTOCOL_ID | Foreign key to the PROTOCOL table. |
| ANALYST_ID | Name of the person (NT logon name) who performed the analysis. |
| SCHEME_ID | Foreign key to the ANALYSIS_SCHEME table. |
| ANALYSIS_DATE | Date of the analysis. |
| DATA_SET_COLLECTION_ID | Foreign key to the ANALYSIS_DATA_SET_COLLECTION table. |
| NAME | The name of the analysis. |
5.2. ANALYSIS_DATA_SET_COLLECTION
| Provides a many-to-many relationship
between an analysis (table ANALYSIS) and the data upon which an analysis
can be run (table ANALYSIS_DATA_SET). |
| Column | Definition/Description |
| ID | Primary key. |
5.3. ANALYSIS_DATA_SET
| Provides foreign key IDs to the data
upon which an analysis is run. The data can either be an experiment or an earlier
analysis. The data type (experiment or analysis) and therefore the field used
(EXPT_ID or ANALYSIS_ID) is determined by the TYPE_ID field. |
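The TYPE_ID rule described above can be sketched as a query that picks EXPT_ID or ANALYSIS_ID depending on the data type. The field names EXPT_ID, ANALYSIS_ID, and TYPE_ID come from the description; the COLLECTION_ID column, the type names 'Experiment' and 'Analysis', the column types, and the sample rows are assumptions, since this document does not list the columns of ANALYSIS_DATA_SET.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE ANALYSIS_DATA_SET_TYPE (ID INTEGER PRIMARY KEY, NAME TEXT);
    CREATE TABLE ANALYSIS_DATA_SET (
        COLLECTION_ID INTEGER,  -- assumed link to ANALYSIS_DATA_SET_COLLECTION
        TYPE_ID       INTEGER REFERENCES ANALYSIS_DATA_SET_TYPE(ID),
        EXPT_ID       INTEGER,
        ANALYSIS_ID   INTEGER
    );
    -- Assumed type names and IDs.
    INSERT INTO ANALYSIS_DATA_SET_TYPE VALUES (1, 'Experiment'), (2, 'Analysis');
    -- One collection built from an experiment and an earlier analysis.
    INSERT INTO ANALYSIS_DATA_SET VALUES (100, 1, 7, NULL), (100, 2, NULL, 3);
    """
)


def data_sources(conn, collection_id):
    """Return [(data type name, source ID)] for one data set collection."""
    return conn.execute(
        """
        SELECT t.NAME,
               CASE t.NAME WHEN 'Experiment' THEN d.EXPT_ID
                           ELSE d.ANALYSIS_ID END
        FROM ANALYSIS_DATA_SET d
        JOIN ANALYSIS_DATA_SET_TYPE t ON t.ID = d.TYPE_ID
        WHERE d.COLLECTION_ID = ?
        ORDER BY t.ID
        """,
        (collection_id,),
    ).fetchall()


sources = data_sources(conn, 100)
print(sources)
```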
5.4. ANALYSIS_DATA_SET_TYPE
| Describes the data type. Has exactly
two records: one for the experiment data type and one for the analysis data type. |
| Column | Definition/Description |
| ID | Primary key. |
| NAME | The name of the data type. |
5.5. ANALYSIS_ALGORITHM
| Holds information for algorithms used for analyses. |
| Column | Definition/Description |
| ID | Primary key. |
| NAME | The name of the algorithm. |
| TYPE_ID | Foreign key to the ALGORITHM_TYPE table. |
5.6. ALGORITHM_TYPE
| Column | Definition/Description |
| ID | Primary key. |
| NAME | The name of the algorithm type. |
5.7. MEASUREMENT_ELEMENT_RESULT
| Holds results from a cell averaging
analysis, one record for each cell on the chip. |
| Column | Definition/Description |
| ANALYSIS_ID | Foreign key to the ANALYSIS table. |
| LOCATION_X | The x-coordinate of the cell. |
| LOCATION_Y | The y-coordinate of the cell. |
| INTENSITY | Calculated intensity value of the cell. |
| STATISTIC | Standard deviation corresponding to the intensity value. |
| PIXELS | The number of pixels used in calculating the intensity value. |
| INTENSITY_ORIG | If the cell intensity has been modified, this field is set to the original calculated intensity; otherwise it is set to -1. |
| FLAG | A bitwise flag: bit 1 is set if the cell has been masked out of the analysis, bit 2 if the cell is determined to be an outlier, and bit 3 if the intensity has been modified. |
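The FLAG bits in this table can be decoded with simple bit masks. The sketch below assumes that bits are numbered from the least significant bit, so that bit 1, bit 2, and bit 3 correspond to the masks 0x1, 0x2, and 0x4; that numbering is not stated in this document. The helper names are hypothetical.

```python
from typing import NamedTuple

# Assumption: bit 1 is the least significant bit of FLAG.
MASKED_BIT = 0x1    # bit 1: cell masked out of the analysis
OUTLIER_BIT = 0x2   # bit 2: cell determined to be an outlier
MODIFIED_BIT = 0x4  # bit 3: intensity has been modified


class CellStatus(NamedTuple):
    masked: bool
    outlier: bool
    modified: bool


def decode_flag(flag: int) -> CellStatus:
    """Split the FLAG value of a MEASUREMENT_ELEMENT_RESULT row into booleans."""
    return CellStatus(
        masked=bool(flag & MASKED_BIT),
        outlier=bool(flag & OUTLIER_BIT),
        modified=bool(flag & MODIFIED_BIT),
    )


def original_intensity(intensity: float, intensity_orig: float, flag: int) -> float:
    """Return the intensity as first calculated.

    INTENSITY_ORIG is only meaningful when the modified bit is set, so the
    flag is consulted instead of comparing against a sentinel value.
    """
    return intensity_orig if decode_flag(flag).modified else intensity


status = decode_flag(5)
print(status)
```

Checking the modified bit rather than the placeholder stored in INTENSITY_ORIG keeps the helper independent of how the unmodified case is encoded.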
5.8. ABS_GENE_EXPR_RESULT
| Holds results of an absolute gene
expression analysis from the empirical gene expression algorithm. There is one
record for each gene on the chip. |
| Column | Definition/Description |
| ANALYSIS_ID | Foreign key to the ANALYSIS table. |
| ITEM_ID | Foreign key to the BIOLOGICAL_ITEM table. |
| TYPE_ID | Foreign key to the ABS_GENE_EXPR_RESULT_TYPE table. The call in an absolute analysis that indicates if the transcript was present (P), absent (A), or marginal (M). |
| NUMBER_POSITIVE | Number of positive probe pairs. |
| NUMBER_NEGATIVE | Number of negative probe pairs. |
| NUMBER_USED | Number of probe pairs used in the analysis. |
| NUMBER_ALL | Total number of probe pairs for the probe set. |
| AVG_LOG_RATIO | Average log ratio. |
| PM_EXCESS | Perfect match excess. |
| NUMBER_IN_AVG | Number of probe pairs used in computing the average intensity difference. |
| MM_EXCESS | | |