Affymetrix® CEL Data File Format
CEL FILE
Description
The CEL file stores the results of the intensity calculations performed on the pixel values of the DAT file. For each feature on the probe array it stores an intensity value, the standard deviation of the intensity, the number of pixels used to calculate the intensity value, a flag indicating whether the algorithm called the feature an outlier, and a user-defined flag indicating that the feature should be excluded from future analysis. The following versions are described below:
- Version 3 is generated by the MAS software. This was also known as the ASCII version.
- Version 4 is generated by the GCOS software. This was also known as the binary or XDA version.
- Command Console version 1 is generated by the Command Console software. This is stored in the Command Console "generic" data file format.
In version 3, the CEL file is an ASCII text file similar to the Windows INI format.
The file is divided into sections. Each section starts with a line containing the section name enclosed in square brackets. The section names are: "CEL", "HEADER", "INTENSITY", "MASKS", "OUTLIERS" and "MODIFIED". The data in each section is of the format TAG=VALUE.
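As an illustration of the section layout described above, the following minimal Python sketch groups the lines of a version 3 file by section name. The function name `split_sections` and the sample text are hypothetical and not part of the format. Lines are kept as raw strings because the data rows in some sections are not TAG=VALUE pairs.

```python
def split_sections(text):
    """Group the lines of a version 3 CEL file by section name.

    Returns a dict mapping each section name (without the square
    brackets) to the list of non-empty lines that follow it.
    """
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between sections
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


# Made-up sample text, for illustration only.
sample = """[CEL]
Version=3

[HEADER]
Cols=2
Rows=2
"""
sections = split_sections(sample)
print(sections["CEL"])  # the lines of the "CEL" section
```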
The "CEL" section contains the version number of the file. The TAGS are:
| TAG | Description |
|---|---|
| Version | The version number. Always set to 3. |
The "HEADER" section contains miscellaneous header information. The TAGS are:
| TAG | Description |
|---|---|
| Cols | The number of columns in the array (of cells). |
| Rows | The number of rows in the array (of cells). |
| TotalX | Same as Cols. |
| TotalY | Same as Rows. |
| OffsetX | Not used, always 0. |
| OffsetY | Not used, always 0. |
| GridCornerUL | XY coordinates of the upper left grid corner in pixel coordinates. |
| GridCornerUR | XY coordinates of the upper right grid corner in pixel coordinates. |
| GridCornerLR | XY coordinates of the lower right grid corner in pixel coordinates. |
| GridCornerLL | XY coordinates of the lower left grid corner in pixel coordinates. |
| Axis-invertX | Not used, always 0. |
| AxisInvertY | Not used, always 0. |
| swapXY | Not used, always 0. |
| DatHeader | The header from the DAT file. |
| Algorithm | The algorithm name used to create the CEL file. |
| AlgorithmParameters | The parameters used by the algorithm. The format is TAG:VALUE pairs separated by semi-colons or TAG=VALUE pairs separated by spaces. |
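The two AlgorithmParameters layouts described in the table can be handled with a small helper. This is a sketch only: the function name `parse_algorithm_parameters` is hypothetical, the sample values are made up, and it assumes that a value containing a semi-colon or colon uses the TAG:VALUE layout. All values are returned as strings.

```python
def parse_algorithm_parameters(value):
    """Parse 'TAG:VALUE;TAG:VALUE' or 'TAG=VALUE TAG=VALUE' into a dict."""
    value = value.strip()
    if ";" in value or ":" in value:
        # Assumption: a semi-colon or colon signals the TAG:VALUE layout.
        pairs, sep = value.split(";"), ":"
    else:
        pairs, sep = value.split(), "="
    params = {}
    for pair in pairs:
        pair = pair.strip()
        if not pair:
            continue  # tolerate a trailing separator
        tag, _, val = pair.partition(sep)
        params[tag] = val
    return params


# Made-up parameter strings, for illustration only.
colon_form = parse_algorithm_parameters("Percentile:75;CellMargin:2")
equals_form = parse_algorithm_parameters("Percentile=75 CellMargin=2")
```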
The "INTENSITY" section contains intensity information. The TAGS are:
| TAG | Description |
|---|---|
| NumberCells | The total number of cells in the array (Rows*Cols) |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y MEAN STDV NPIXELS" |
| NA | The remaining lines in this section contain the intensity, standard deviation value and the number of pixels used to compute the intensity value for each cell in the array. The order is defined by the header. |
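A reader for the "INTENSITY" section might look like the sketch below. It assumes the data rows are whitespace-separated and follow the column order given by CellHeader. The function name `parse_intensity_section` and the sample rows are hypothetical.

```python
def parse_intensity_section(lines):
    """Parse the lines of an INTENSITY section (section name line excluded).

    Returns the TAG=VALUE entries as a dict and the data rows as a list of
    (x, y, mean, stdv, npixels) tuples.
    """
    tags = {}
    cells = []
    for line in lines:
        if "=" in line:
            tag, _, value = line.partition("=")
            tags[tag.strip()] = value.strip()
        else:
            # Assumption: whitespace-separated fields in CellHeader order.
            x, y, mean, stdv, npixels = line.split()
            cells.append((int(x), int(y), float(mean), float(stdv), int(npixels)))
    if len(cells) != int(tags["NumberCells"]):
        raise ValueError("row count does not match NumberCells")
    return tags, cells


# Made-up rows, for illustration only.
sample_lines = [
    "NumberCells=2",
    "CellHeader=X\tY\tMEAN\tSTDV\tNPIXELS",
    "  0\t  0\t120.5\t14.2\t 16",
    "  1\t  0\t98.0\t10.1\t 16",
]
tags, cells = parse_intensity_section(sample_lines)
```

Checking the row count against NumberCells is a cheap way to detect a truncated file.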
The "MASKS" section specifies which cells have been masked by the user. The TAGS are:
| TAG | Description |
|---|---|
| NumberCells | The number of masked cells. |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y". |
| NA | The remaining lines in this section contain the XY coordinates of those cells masked by the user. |
The "OUTLIERS" section specifies which cells were called outliers by the software. The TAGS are:
| TAG | Description |
|---|---|
| NumberCells | The number of outlier cells. |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y". |
| NA | The remaining lines in this section contain the XY coordinates of those cells called outliers by the software. |
The "MODIFIED" section specifies which cells were modified by the user. This feature was dropped in MAS 4 thus the number of cells in this section should always be 0. The TAGS are:
| TAG | Description |
|---|---|
| NumberCells | The number of modified cells. |
| CellHeader | The header for the remainder of the data in this section. The header is always set to: "X Y ORIGMEAN". |
| NA | The remaining lines in this section contain the XY coordinates and the original intensity value (calculated by the software) of those cells modified by the user. |
In version 4, the CEL file is a binary file in which values are stored in little-endian format.
The file contents are defined by:
| Item | Description | Type |
|---|---|---|
| 1 | Magic number. Always set to 64. | integer |
| 2 | Version number. Always set to 4. | integer |
| 3 | Number of columns. | integer |
| 4 | Number of rows. | integer |
| 5 | Number of cells (rows*cols). | integer |
| 6 | Header length | integer |
| 7 | Header as defined in the HEADER section of the version 3 CEL files. The string contains TAG=VALUE separated by a space where the TAG names are defined in the version 3 HEADER section. | char[ length defined above] |
| 8 | Algorithm name length. | integer |
| 9 | The algorithm name used to create the CEL file. | char[ length defined above] |
| 10 | Algorithm parameters length. | integer |
| 11 | The parameters used by the algorithm. The format is TAG:VALUE pairs separated by semi-colons or TAG=VALUE pairs separated by spaces. | char[ length defined above] |
| 12 | Cell margin used for computing the cell's intensity value. | integer |
| 13 | Number of outlier cells. | DWORD |
| 14 | Number of masked cells. | DWORD |
| 15 | Number of sub-grids. | integer |
| 16 | Cell entries - this consists of an intensity value, standard deviation value and pixel count for each cell in the array. The values are stored by row then column starting with the X=0, Y=0 cell. As an example, the first five entries are for cells defined by XY coordinates: (0,0), (1,0), (2,0), (3,0), (4,0). | (float, float, short) |
| 17 | Masked entries - this consists of the XY coordinates of those cells masked by the user. | (short, short) |
| 18 | Outlier entries - this consists of the XY coordinates of those cells called outliers by the software. | (short, short) |
| 19 | Sub-grid entries - This is the sub-grid definition. There are as many sub-grids in the file as defined by the number of sub-grids above. Each sub-grid is defined as a sequence of fields starting with the row number (integer); the field types are listed in the Type column. | (integer, integer, float, float, float, float, float, float, float, float, integer, integer, integer, integer) |
Types used are defined as: integer (a 32-bit signed integer), DWORD (a 32-bit unsigned integer), float (a 32-bit floating-point number), short (a 16-bit signed integer).
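The item list above maps fairly directly onto Python's `struct` module. The following is a minimal sketch, not a reference reader. It assumes that strings are ASCII, that each cell entry is packed without padding (4 + 4 + 2 = 10 bytes), and it stops before the sub-grid entries. All function names are hypothetical, and the sample bytes are synthetic.

```python
import io
import struct


def read_int(f):
    """Read a little-endian 32-bit signed integer."""
    return struct.unpack("<i", f.read(4))[0]


def read_dword(f):
    """Read a little-endian 32-bit unsigned integer."""
    return struct.unpack("<I", f.read(4))[0]


def read_string(f):
    """Read an integer length followed by that many characters."""
    length = read_int(f)
    return f.read(length).decode("ascii", errors="replace")


def read_cel_v4(f):
    """Read items 1 to 18 of a version 4 CEL file from a binary stream."""
    magic, version = read_int(f), read_int(f)
    if magic != 64 or version != 4:
        raise ValueError("not a version 4 CEL file")
    cols, rows, n_cells = read_int(f), read_int(f), read_int(f)
    header = read_string(f)
    algorithm = read_string(f)
    parameters = read_string(f)
    cell_margin = read_int(f)
    n_outliers = read_dword(f)
    n_masked = read_dword(f)
    n_subgrids = read_int(f)
    # Assumption: cell entries are tightly packed (float, float, short).
    cells = [struct.unpack("<ffh", f.read(10)) for _ in range(n_cells)]
    # Masked entries come before outlier entries in the file.
    masked = [struct.unpack("<hh", f.read(4)) for _ in range(n_masked)]
    outliers = [struct.unpack("<hh", f.read(4)) for _ in range(n_outliers)]
    return {
        "cols": cols, "rows": rows, "header": header,
        "algorithm": algorithm, "parameters": parameters,
        "cell_margin": cell_margin, "n_subgrids": n_subgrids,
        "cells": cells, "masked": masked, "outliers": outliers,
    }


def pack_string(s):
    """Helper for the demo: length-prefixed ASCII string."""
    data = s.encode("ascii")
    return struct.pack("<i", len(data)) + data


# Synthetic 2x1 array with one outlier at (1, 0), for illustration only.
blob = (
    struct.pack("<5i", 64, 4, 2, 1, 2)
    + pack_string("Cols=2 Rows=1")
    + pack_string("Percentile")
    + pack_string("Percentile:75;CellMargin:2")
    + struct.pack("<iIIi", 2, 1, 0, 0)  # margin, outliers, masked, sub-grids
    + struct.pack("<ffh", 100.0, 5.0, 16)
    + struct.pack("<ffh", 200.0, 8.0, 16)
    + struct.pack("<hh", 1, 0)
)
cel = read_cel_v4(io.BytesIO(blob))
```

Using the `<` prefix in every format string selects little-endian byte order with standard sizes and no alignment padding, which is what the tightly packed assumption requires.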
The format of the CEL file generated by the Command Console software uses the Command Console generic data format. The following describes the data sets and groups in the file.
The generic data header shall include:
The data type identifier is set to "affymetrix-calvin-intensity"
The parameters are dependent on the algorithm used to create the CEL file. For the percentile algorithm these include the following parameters:
| Parameter Name | Definition |
|---|---|
| affymetrix-algorithm-param-Percentile | The percentile value used. |
| affymetrix-algorithm-param-CellMargin | The number of pixels around the border to ignore. |
| affymetrix-algorithm-param-OutlierHigh | The high threshold for the outlier calculation. |
| affymetrix-algorithm-param-OutlierLow | The low threshold for the outlier calculation. |
| affymetrix-algorithm-param-GridULX | The X coordinate of the upper left corner of the global grid. |
| affymetrix-algorithm-param-GridULY | The Y coordinate of the upper left corner of the global grid. |
| affymetrix-algorithm-param-GridURX | The X coordinate of the upper right corner of the global grid. |
| affymetrix-algorithm-param-GridURY | The Y coordinate of the upper right corner of the global grid. |
| affymetrix-algorithm-param-GridLRX | The X coordinate of the lower right corner of the global grid. |
| affymetrix-algorithm-param-GridLRY | The Y coordinate of the lower right corner of the global grid. |
| affymetrix-algorithm-param-GridLLX | The X coordinate of the lower left corner of the global grid. |
| affymetrix-algorithm-param-GridLLY | The Y coordinate of the lower left corner of the global grid. |
Other parameters include:
| Parameter Name | Definition |
|---|---|
| affymetrix-array-type | The probe array type |
| affymetrix-algorithm-name | The name of the algorithm. |
| affymetrix-cel-cols | The number of columns of features (cells) |
| affymetrix-cel-rows | The number of rows of features (cells) |
| affymetrix-file-version | File version. |
The DAT file parameters (if available) will be stored within the parent data header object.
The intensity data is stored in a single group called Default Group with 5 data sets. The data sets are defined as:
| Data Set Name | Data Set Description | Number of Columns | Column Name | Column Type | Column Description |
|---|---|---|---|---|---|
| Intensity | The intensity values for each feature. | 1 | Intensity | FLOAT | The intensity value. The row order is the same as defined in the GCOS CEL file. |
| StdDev | The standard deviations of the intensity values. | 1 | StdDev | FLOAT | The standard deviation value. The row order is the same as defined in the GCOS CEL file. |
| Pixel | The number of pixels used to calculate the intensity values. | 1 | Pixel | SHORT | The pixel count value. The row order is the same as defined in the GCOS CEL file. |
| Outlier | The X/Y coordinates of those features called as outliers by the algorithm. | 2 | X, Y | SHORT, SHORT | The X coordinate of the outlier cell. The Y coordinate of the outlier cell. |
| Mask | The X/Y coordinates of the user masked features. | 2 | X, Y | SHORT, SHORT | The X coordinate of the masked cell. The Y coordinate of the masked cell. |
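Because the Outlier and Mask data sets hold coordinates while the Intensity, StdDev and Pixel data sets hold one row per feature, it can be convenient to turn the coordinates into per-feature flags. The sketch below assumes the GCOS row order (by row, then column, starting at X=0, Y=0), so that the feature at (x, y) sits at index `y * cols + x`. The function name `flags_from_coordinates` is hypothetical.

```python
def flags_from_coordinates(coords, cols, rows):
    """Return one boolean per feature, True where (x, y) appears in coords.

    Assumes the GCOS row order: index = y * cols + x.
    """
    flags = [False] * (cols * rows)
    for x, y in coords:
        if not (0 <= x < cols and 0 <= y < rows):
            raise ValueError(f"coordinate ({x}, {y}) is outside the array")
        flags[y * cols + x] = True
    return flags


# Made-up outlier coordinates on a 2x2 array, for illustration only.
outlier_flags = flags_from_coordinates([(1, 0), (0, 1)], cols=2, rows=2)
```

The resulting list lines up index for index with the Intensity data set, so a flagged feature can be skipped while iterating over the intensities.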