Affymetrix® CEL Data File Format

CEL FILE

Description

The CEL file stores the results of the intensity calculations on the pixel values of the DAT file. For each feature on the probe array it stores an intensity value, the standard deviation of the intensity, the number of pixels used to calculate the intensity value, a flag indicating an outlier as calculated by the algorithm, and a user-defined flag indicating that the feature should be excluded from future analysis. This document describes the following versions:

  • Version 3 is generated by the MAS software. This was also known as the ASCII version.
  • Version 4 is generated by the GCOS software. This was also known as the binary or XDA version.
  • Command Console version 1 is generated by the Command Console software. This is stored in the Command Console "generic" data file format.

Version 3 Format

The version 3 CEL file is an ASCII text file similar to the Windows INI format.

The file is divided into sections. Each section starts with a line containing the section name enclosed in square brackets. The section names are: "CEL", "HEADER", "INTENSITY", "MASKS", "OUTLIERS" and "MODIFIED". The data in each section is of the format TAG=VALUE.
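The section layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the vendor's parser: the function names `parse_cel_v3_sections` and `tags` and the sample text are made up for the example, and lines without an "=" (the data rows of the later sections) are simply left out of the tag dictionary.

```python
def parse_cel_v3_sections(text):
    """Split version 3 CEL text into {section name: list of raw lines}."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A section starts with its name in square brackets.
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def tags(lines):
    """Collect the TAG=VALUE lines of one section into a dict."""
    result = {}
    for line in lines:
        tag, sep, value = line.partition("=")
        if sep:  # skip data rows, which contain no "="
            result[tag] = value
    return result


# Made-up sample text, for illustration only.
sample = """[CEL]
Version=3

[HEADER]
Cols=2
Rows=2
"""

sections = parse_cel_v3_sections(sample)
print(tags(sections["CEL"]))     # {'Version': '3'}
print(tags(sections["HEADER"]))  # {'Cols': '2', 'Rows': '2'}
```

Values are kept as strings here; converting them to numbers is left to the caller because the meaning of each tag differs by section.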

The "CEL" section contains the version number of the file. The TAGS are:

TAG      Description
Version  The version number. Always set to 3.

The "HEADER" section contains miscellaneous header information. The TAGS are:

TAG                  Description
Cols                 The number of columns in the array (of cells).
Rows                 The number of rows in the array (of cells).
TotalX               Same as Cols.
TotalY               Same as Rows.
OffsetX              Not used, always 0.
OffsetY              Not used, always 0.
GridCornerUL         XY coordinates of the upper left grid corner in pixel coordinates.
GridCornerUR         XY coordinates of the upper right grid corner in pixel coordinates.
GridCornerLR         XY coordinates of the lower right grid corner in pixel coordinates.
GridCornerLL         XY coordinates of the lower left grid corner in pixel coordinates.
Axis-invertX         Not used, always 0.
AxisInvertY          Not used, always 0.
swapXY               Not used, always 0.
DatHeader            The header from the DAT file.
Algorithm            The algorithm name used to create the CEL file.
AlgorithmParameters  The parameters used by the algorithm. The format is TAG:VALUE pairs separated by semi-colons or TAG=VALUE pairs separated by spaces.
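The two AlgorithmParameters layouts can be told apart with a simple heuristic. The sketch below assumes that a value containing a semi-colon, or containing no "=" at all, uses the TAG:VALUE form, and that anything else uses the space-separated TAG=VALUE form. The function name and the sample values are hypothetical.

```python
def parse_algorithm_parameters(value):
    """Parse 'TAG:VALUE;TAG:VALUE' or 'TAG=VALUE TAG=VALUE' into a dict."""
    if ";" in value or "=" not in value:
        # Semi-colon separated TAG:VALUE pairs.
        pairs, sep = value.split(";"), ":"
    else:
        # Space separated TAG=VALUE pairs.
        pairs, sep = value.split(), "="
    params = {}
    for pair in pairs:
        tag, found, val = pair.strip().partition(sep)
        if found:
            params[tag] = val
    return params


# Made-up values, for illustration only.
print(parse_algorithm_parameters("Percentile:75;CellMargin:2"))
print(parse_algorithm_parameters("Percentile=75 CellMargin=2"))
```

Both calls yield the same dictionary, so downstream code does not need to know which layout the file used.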

The "INTENSITY" section contains intensity information. The TAGS are:

TAG          Description
NumberCells  The total number of cells in the array (Rows*Cols).
CellHeader   The header for the remainder of the data in this section. The header is always set to: "X Y MEAN STDV NPIXELS".
NA           The remaining lines in this section contain the XY coordinates, the intensity, the standard deviation value and the number of pixels used to compute the intensity value for each cell in the array. The order is defined by the header.
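As an illustration, the data rows of the INTENSITY section could be read as follows. The sketch assumes the five fields of each row are separated by whitespace (tabs or spaces) in the order given by the header; the `Cell` type, the function name and the sample rows are invented for the example.

```python
from typing import NamedTuple


class Cell(NamedTuple):
    x: int
    y: int
    mean: float
    stdv: float
    npixels: int


def parse_intensity_lines(lines):
    """Turn 'X Y MEAN STDV NPIXELS' rows into Cell records."""
    cells = []
    for line in lines:
        if "=" in line:
            continue  # NumberCells=... and CellHeader=... are tags, not data
        fields = line.split()
        if len(fields) != 5:
            continue  # blank or malformed line
        x, y, mean, stdv, npixels = fields
        cells.append(Cell(int(x), int(y), float(mean), float(stdv), int(npixels)))
    return cells


# Made-up section content, for illustration only.
lines = [
    "NumberCells=2",
    "CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS",
    "  0\t  0\t152.0\t20.5\t 16",
    "  1\t  0\t8200.3\t1050.1\t 16",
]
cells = parse_intensity_lines(lines)
print(len(cells), cells[0].mean)
```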

The "MASKS" section specifies which cells have been masked by the user. The TAGS are:

TAG          Description
NumberCells  The number of masked cells.
CellHeader   The header for the remainder of the data in this section. The header is always set to: "X Y".
NA           The remaining lines in this section contain the XY coordinates of those cells masked by the user.

The "OUTLIERS" section specifies which cells were called outliers by the software. The TAGS are:

TAG          Description
NumberCells  The number of outlier cells.
CellHeader   The header for the remainder of the data in this section. The header is always set to: "X Y".
NA           The remaining lines in this section contain the XY coordinates of those cells called outliers by the software.

The "MODIFIED" section specifies which cells were modified by the user. This feature was dropped in MAS 4, so the number of cells in this section should always be 0. The TAGS are:

TAG          Description
NumberCells  The number of modified cells.
CellHeader   The header for the remainder of the data in this section. The header is always set to: "X Y ORIGMEAN".
NA           The remaining lines in this section contain the XY coordinates and the original intensity value (calculated by the software) of those cells modified by the user.

Version 4 Format

The version 4 CEL file is a binary file in which values are stored in little-endian format.

The file contents are defined by:

Item  Type                        Description
1     integer                     Magic number. Always set to 64.
2     integer                     Version number. Always set to 4.
3     integer                     Number of columns.
4     integer                     Number of rows.
5     integer                     Number of cells (rows*cols).
6     integer                     Header length.
7     char[length defined above]  Header as defined in the HEADER section of the version 3 CEL files. The string contains TAG=VALUE pairs separated by a space, where the TAG names are defined in the version 3 HEADER section.
8     integer                     Algorithm name length.
9     char[length defined above]  The algorithm name used to create the CEL file.
10    integer                     Algorithm parameters length.
11    char[length defined above]  The parameters used by the algorithm. The format is TAG:VALUE pairs separated by semi-colons or TAG=VALUE pairs separated by spaces.
12    integer                     Cell margin used for computing the cell's intensity value.
13    DWORD                       Number of outlier cells.
14    DWORD                       Number of masked cells.
15    integer                     Number of sub-grids.
16    (float, float, short)       Cell entries - this consists of an intensity value, standard deviation value and pixel count for each cell in the array. The values are stored by row then column starting with the X=0, Y=0 cell. As an example, the first five entries are for cells defined by XY coordinates: (0,0), (1,0), (2,0), (3,0), (4,0).
17    (short, short)              Masked entries - this consists of the XY coordinates of those cells masked by the user.
18    (short, short)              Outlier entries - this consists of the XY coordinates of those cells called outliers by the software.
19    (see below)                 Sub-grid entries - this is the sub-grid definition. There are as many sub-grids in the file as defined by the number of sub-grids above. Each sub-grid is defined as:

- row number (integer)
- column number (integer)
- upper left x coordinate in pixels (float)
- upper left y coordinate in pixels (float)
- upper right x coordinate in pixels (float)
- upper right y coordinate in pixels (float)
- lower left x coordinate in pixels (float)
- lower left y coordinate in pixels (float)
- lower right x coordinate in pixels (float)
- lower right y coordinate in pixels (float)
- left cell position (integer)
- top cell position (integer)
- right cell position (integer)
- bottom cell position (integer)

(integer, integer, float, float, float, float, float, float, float, float, integer, integer, integer, integer)
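A sub-grid entry therefore occupies 56 bytes: two 32-bit integers, eight 32-bit floats and four 32-bit integers. The following sketch decodes one entry with Python's `struct` module, assuming the little-endian layout stated for version 4 files. The field names in `SUBGRID_FIELDS` are labels chosen for this example, not names from the file format.

```python
import struct

# "<" selects little-endian byte order with no padding between fields.
SUBGRID = struct.Struct("<2i8f4i")

# Illustrative labels, in the order the fields are listed above.
SUBGRID_FIELDS = (
    "row", "col",
    "ul_x", "ul_y", "ur_x", "ur_y",
    "ll_x", "ll_y", "lr_x", "lr_y",
    "left", "top", "right", "bottom",
)


def unpack_subgrid(buf, offset=0):
    """Decode one sub-grid entry from a bytes-like object."""
    return dict(zip(SUBGRID_FIELDS, SUBGRID.unpack_from(buf, offset)))


# Round trip with made-up values, for illustration only.
raw = SUBGRID.pack(0, 1, 10.5, 20.0, 110.5, 20.0, 10.5, 120.0, 110.5, 120.0, 0, 0, 49, 49)
entry = unpack_subgrid(raw)
print(SUBGRID.size, entry["ur_y"], entry["bottom"])  # 56 20.0 49
```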

Types used are defined as: integer (a 32-bit signed integer), DWORD (a 32-bit unsigned integer), float (a 32-bit floating-point number), short (a 16-bit signed integer).
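With these type sizes, the items preceding the cell entries can be read sequentially, and a single cell entry can then be located by its row-major index (y * cols + x). The sketch below is one possible reading of the layout, not reference code: the function names, the dictionary keys and the in-memory sample built by `build_sample` are all assumptions made for the example.

```python
import io
import struct

CELL = struct.Struct("<ffh")  # intensity, standard deviation, pixel count: 10 bytes


def read_int(f):
    return struct.unpack("<i", f.read(4))[0]


def read_dword(f):
    return struct.unpack("<I", f.read(4))[0]


def read_string(f):
    # A string is stored as a 32-bit length followed by that many chars.
    length = read_int(f)
    return f.read(length).decode("ascii", errors="replace")


def read_cel_v4_header(f):
    """Read items 1-15; the stream is left at the first cell entry."""
    magic, version = read_int(f), read_int(f)
    if magic != 64 or version != 4:
        raise ValueError("not a version 4 CEL file")
    header = {"cols": read_int(f), "rows": read_int(f), "cells": read_int(f)}
    header["header"] = read_string(f)
    header["algorithm"] = read_string(f)
    header["parameters"] = read_string(f)
    header["cell_margin"] = read_int(f)
    header["outliers"] = read_dword(f)
    header["masked"] = read_dword(f)
    header["subgrids"] = read_int(f)
    return header


def read_cell(f, header, data_start, x, y):
    """Return (intensity, stdev, pixel count) for the cell at (x, y)."""
    index = y * header["cols"] + x  # stored by row, starting at X=0, Y=0
    f.seek(data_start + index * CELL.size)
    return CELL.unpack(f.read(CELL.size))


def build_sample():
    """Build a tiny made-up 2x2 file in memory, for illustration only."""
    def s(text):
        data = text.encode("ascii")
        return struct.pack("<i", len(data)) + data
    blob = struct.pack("<5i", 64, 4, 2, 2, 4)
    blob += s("Cols=2 Rows=2") + s("Percentile") + s("Percentile:75")
    blob += struct.pack("<iIIi", 2, 0, 0, 0)
    for i in range(4):
        blob += CELL.pack(100.0 + i, 1.0, 9)
    return blob


f = io.BytesIO(build_sample())
header = read_cel_v4_header(f)
data_start = f.tell()
print(header["cols"], header["algorithm"])   # 2 Percentile
print(read_cell(f, header, data_start, 1, 1))  # (103.0, 1.0, 9)
```

The masked, outlier and sub-grid entries follow the cell entries, so their offsets can be derived from `data_start`, the cell count and the entry sizes.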

Command Console Format

The format of the CEL file generated by the Command Console software uses the Command Console generic data format. The following describes the data sets and groups in the file.

The generic data header shall include:

The data type identifier is set to "affymetrix-calvin-intensity"

The parameters are dependent on the algorithm used to create the CEL file. For the percentile algorithm these include the following parameters:

Parameter Name                          Definition
affymetrix-algorithm-param-Percentile   The percentile value used.
affymetrix-algorithm-param-CellMargin   The number of pixels around the border to ignore.
affymetrix-algorithm-param-OutlierHigh  The high threshold for the outlier calculation.
affymetrix-algorithm-param-OutlierLow   The low threshold for the outlier calculation.
affymetrix-algorithm-param-GridULX      The X coordinate of the upper left corner of the global grid.
affymetrix-algorithm-param-GridULY      The Y coordinate of the upper left corner of the global grid.
affymetrix-algorithm-param-GridURX      The X coordinate of the upper right corner of the global grid.
affymetrix-algorithm-param-GridURY      The Y coordinate of the upper right corner of the global grid.
affymetrix-algorithm-param-GridLRX      The X coordinate of the lower right corner of the global grid.
affymetrix-algorithm-param-GridLRY      The Y coordinate of the lower right corner of the global grid.
affymetrix-algorithm-param-GridLLX      The X coordinate of the lower left corner of the global grid.
affymetrix-algorithm-param-GridLLY      The Y coordinate of the lower left corner of the global grid.
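Once the header parameters are available as name/value pairs, the eight grid parameters can be gathered into four corner points. The sketch assumes the parameters have already been decoded into a Python dict (decoding the generic data file itself is not shown) and that each Y parameter is named like its X counterpart with a trailing Y, following the GridULY pattern. The function name and sample values are hypothetical.

```python
PREFIX = "affymetrix-algorithm-param-"


def grid_corners(params):
    """Return {corner: (x, y)} for the corners present in params."""
    corners = {}
    for corner in ("UL", "UR", "LR", "LL"):
        x = params.get(f"{PREFIX}Grid{corner}X")
        y = params.get(f"{PREFIX}Grid{corner}Y")
        if x is not None and y is not None:
            corners[corner] = (float(x), float(y))
    return corners


# Made-up parameter values, for illustration only.
params = {
    PREFIX + "GridULX": 220.0, PREFIX + "GridULY": 231.0,
    PREFIX + "GridURX": 4501.0, PREFIX + "GridURY": 229.0,
    PREFIX + "Percentile": 75,
}
print(grid_corners(params))
```

Corners with a missing coordinate are skipped rather than guessed, which keeps the result safe to use when a file was written by a different algorithm.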

Other parameters include:

Parameter Name             Definition
affymetrix-array-type      The probe array type.
affymetrix-algorithm-name  The name of the algorithm.
affymetrix-cel-cols        The number of columns of features (cells).
affymetrix-cel-rows        The number of rows of features (cells).
affymetrix-file-version    File version.

The DAT file parameters (if available) will be stored within the parent data header object.

The intensity data is stored in a single group called Default Group with 5 data sets. The data sets are defined as:

Data Set Name  Number of Columns  Description
Intensity      1                  The intensity values for each feature.
StdDev         1                  The standard deviations of the intensity values.
Pixel          1                  The number of pixels used to calculate the intensity values.
Outlier        2                  The X/Y coordinates of those features called as outliers by the algorithm.
Mask           2                  The X/Y coordinates of the user masked features.

The columns of each data set are defined as:

Data Set Name  Column Name  Column Type  Description
Intensity      Intensity    FLOAT        The intensity value. The row order is the same as defined in the GCOS CEL file.
StdDev         StdDev       FLOAT        The standard deviation value. The row order is the same as defined in the GCOS CEL file.
Pixel          Pixel        SHORT        The pixel count value. The row order is the same as defined in the GCOS CEL file.
Outlier        X            SHORT        The X coordinate of the outlier cell.
Outlier        Y            SHORT        The Y coordinate of the outlier cell.
Mask           X            SHORT        The X coordinate of the masked cell.
Mask           Y            SHORT        The Y coordinate of the masked cell.
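Because the Intensity, StdDev and Pixel data sets share the row order of the GCOS CEL file (by row, starting at X=0, Y=0), the coordinate lists in the Outlier and Mask data sets can be turned into per-cell flags with the index y * cols + x. The sketch assumes the data sets have already been read into plain Python lists; the function name and sample coordinates are invented for the example.

```python
def flag_cells(cols, rows, outliers, masks):
    """Return (is_outlier, is_masked) lists in CEL row order."""
    is_outlier = [False] * (cols * rows)
    is_masked = [False] * (cols * rows)
    for x, y in outliers:
        is_outlier[y * cols + x] = True
    for x, y in masks:
        is_masked[y * cols + x] = True
    return is_outlier, is_masked


# Made-up 3x2 array, for illustration only.
is_outlier, is_masked = flag_cells(3, 2, outliers=[(1, 0), (2, 1)], masks=[(0, 1)])
print(is_outlier)  # [False, True, False, False, False, True]
print(is_masked)   # [False, False, False, True, False, False]
```

The resulting lists line up element by element with the Intensity data set, so flagged features can be filtered with a single pass.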