Affymetrix® Historical CHP Data File Format
CHP FILE
Description
CHP files in the format described in this document were used to store expression, resequencing and genotyping results from algorithms implemented in the MAS5 and GCOS 1.1 software applications.
Format
The CHP file is a binary file in which values are stored in little-endian byte order. The format defined below applies to CHP files generated by algorithms in MAS 5.0 and above. CHP files generated by older versions of the MAS software are not described here.
The file contents are defined by:
| Item | Description | Type |
|---|---|---|
| 1 | Label. Always set to "GeneChip Sequence File". | char[22] |
| 2 | Version number. 12 for MAS 5 CHP files, 13 for GCOS 1.0 CHP files. | integer |
| 3 | Algorithm name length. | integer |
| 4 | Algorithm name. | char[ length defined above] |
| 5 | Algorithm version length. | integer |
| 6 | Algorithm version. | char[ length defined above] |
| 7 | Algorithm parameters length. | integer |
| 8 | Algorithm parameters (the string is a space delimited TAG=VALUE list of algorithm parameters). | char[ length defined above] |
| 9 | Algorithm summary length. | integer |
| 10 | Algorithm summary values (the string is a space delimited TAG=VALUE list of summary values). An example of a summary value is "RawQ". | char[ length defined above] |
| 11 | The number of rows (of features) in the array. | integer |
| 12 | The number of columns (of features) in the array. | integer |
| 13 | The number of probe sets defined in the array. | integer |
| 14 | A value that defines the maximum probe set number in the array. The probe set number is not displayed or used by the software and will be deprecated in future versions. | integer |
| 15 | The number of QC probe sets stored in the file. | integer |
| 16 | The probe set number for each probe set in the array. This information is not displayed or used by the software and will be deprecated in future versions. | integer[ number of probe sets] |
| 17 | The number of probe pairs in each probe set. If the max probe set number is greater than the number of probe sets, the value is 0 for the array entries after the number of probe sets. | integer[ maximum probe set number] |
| 18 | An indicator to the type of probe set (expression, genotyping, etc.). See the definition of UnitType in the CDF section. If the max probe set number is greater than the number of probe sets, the value is 0 for the array entries after the number of probe sets. | integer[ maximum probe set number] |
| 19 | The number of probes per element of a probe set. This value is 2 for expression tiles indicating probe pairs. | integer[ number of probe sets] |
| 20 | Probe array type | char[ 256] |
| 21 | Parent CEL file name. | char[ 256] |
| 22 | Programmatic identifier length. | integer |
| 23 | Programmatic identifier of the COM component that implements the algorithm. | char[ length defined above] |
| 24 | Probe set results. This item is repeated for each probe set in the array. | see table below. |
| 25 | Resequencing array results | see table below. |
| 26 | QC probe set information. This item is repeated for each QC probe set in the array. | see table below. |
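The first items of the table above can be read with a few helpers. The following is a minimal sketch, not a reference implementation: the function names (`read_int`, `read_string`, `parse_tag_values`, `read_header_start`) and the dictionary keys are illustrative choices, and it assumes that TAG=VALUE values contain no embedded spaces and that strings are single-byte text. It covers items 1 through 12 only.

```python
import io
import struct

LABEL = b"GeneChip Sequence File"  # item 1, exactly 22 bytes


def read_int(stream):
    """Read one 32-bit signed little-endian integer."""
    data = stream.read(4)
    if len(data) != 4:
        raise EOFError("unexpected end of CHP data")
    return struct.unpack("<i", data)[0]


def read_string(stream):
    """Read an integer length followed by that many characters."""
    length = read_int(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of CHP data")
    # Assumption: single-byte text; latin-1 never fails to decode.
    return data.decode("latin-1")


def parse_tag_values(text):
    """Split a space delimited TAG=VALUE list into a dict.

    Assumption: values contain no embedded spaces.
    """
    return dict(pair.split("=", 1) for pair in text.split() if "=" in pair)


def read_header_start(stream):
    """Read items 1-12 of the file contents table."""
    if stream.read(len(LABEL)) != LABEL:
        raise ValueError("label mismatch: not a CHP file of this format")
    header = {"version": read_int(stream)}             # item 2
    header["algorithm_name"] = read_string(stream)     # items 3-4
    header["algorithm_version"] = read_string(stream)  # items 5-6
    header["algorithm_parameters"] = parse_tag_values(read_string(stream))  # items 7-8
    header["algorithm_summary"] = parse_tag_values(read_string(stream))     # items 9-10
    header["rows"] = read_int(stream)                  # item 11
    header["cols"] = read_int(stream)                  # item 12
    return header
```

The helpers accept any binary stream, so they can be tried on an `io.BytesIO` buffer as well as on a file opened in `"rb"` mode.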
Probe Set Results (Expression)
The following table defines the results for an expression probe set.
| Item | Description | Type |
|---|---|---|
| 1 | The number of probe pairs. | integer |
| 2 | The number of probe pairs used by the algorithm. | integer |
| 3 | Unused* | integer |
| 4 | The number of probe pairs used by the algorithm (repeated). | integer |
| 5 | Unused* | integer |
| 6 | Unused* | integer |
| 7 | Unused* | integer |
| 8 | Detection p-value | float |
| 9 | Unused* | float |
| 10 | Signal | float |
| 11 | Detection (0 - present, 1 - marginal, 2 - absent, 3 - no call). | integer |
| 12 | Probe pair data, repeated for each probe pair in the set: Background - float | see description |
| 13 | Comparison analysis results exists flag. | integer |
| 14 | Comparison analysis results (only if exists). Number of probe pairs in common - integer | see description |
* Only stored in version 12 files.
** The floating point value is stored as an integer. Divide the integer value by 1000 to obtain the floating point value.
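The fixed-size part of an expression result (items 1 through 11) differs between versions because the items marked "Unused*" are stored only in version 12 files. The sketch below illustrates that branching. The function name, the returned keys and the `DETECTION_CALLS` lookup are illustrative, and it assumes that any version other than 12 omits the starred items.

```python
import io
import struct

# Item 11 codes, as listed in the table above.
DETECTION_CALLS = {0: "present", 1: "marginal", 2: "absent", 3: "no call"}


def read_expression_fixed(stream, version):
    """Read items 1-11 of an expression probe set result."""

    def unpack(fmt):
        size = struct.calcsize(fmt)
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("unexpected end of CHP data")
        return struct.unpack(fmt, data)

    # Assumption: only version 12 stores the items marked "Unused*".
    has_unused = version == 12

    num_pairs, num_used = unpack("<ii")   # items 1-2
    if has_unused:
        unpack("<i")                      # item 3
    unpack("<i")                          # item 4 (repeat of item 2)
    if has_unused:
        unpack("<iii")                    # items 5-7
    (p_value,) = unpack("<f")             # item 8
    if has_unused:
        unpack("<f")                      # item 9
    (signal,) = unpack("<f")              # item 10
    (detection,) = unpack("<i")           # item 11
    return {
        "num_pairs": num_pairs,
        "num_pairs_used": num_used,
        "detection_p_value": p_value,
        "signal": signal,
        "detection": DETECTION_CALLS.get(detection, "unknown"),
    }
```

Items 12 through 14 follow these fields in the file and are not handled by this sketch.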
Probe Set Results (Genotyping)
The following table defines the results for a genotyping probe set.
| Item | Description | Type |
|---|---|---|
| 1 | The number of blocks in the probe set. | integer |
| 2 | Block information, repeated for each block, consisting of: unused - integer | see description |
| 3 | Flag indicating if results exist. If this value is 0 then skip to item #20 (items 4-19 will not be stored for the probe set). | unsigned char |
| 4 | Unused | integer |
| 5 | Unused string length. | integer |
| 6 | Unused string. | char[ length defined above] |
| 7 | Unused string length. | integer |
| 8 | Unused string. | char[ length defined above] |
| 9 | Unused string length. | integer |
| 10 | Unused string. | char[ length defined above] |
| 11 | Unused | integer |
| 12 | Unused | integer |
| 13 | The allele call (6 - AA, 7 - BB, 8 - AB, 9 - AB_A, 10 - AB_B, 11 - no call) | unsigned char |
| 14 | Confidence | float ** (version 12) or float (version 13) |
| 15 | Unused | float |
| 16 | Unused | float |
| 17 | Unused | float |
| 18 | RAS1 | float |
| 19 | RAS2 | float |
| 20 | Unused string length | integer |
| 21 | Unused string | char[ length defined above] |
| 22 | Unused string length | integer |
| 23 | Unused string | char[ length defined above] |
| 24 | Number of probe pairs | integer |
| 25 | Probe pair data, repeated for each probe pair in the set: Background - float** | see description |
* Only stored in version 12 files.
** The floating point value is stored as an integer. Divide the integer value by 1000 to obtain the floating point value.
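The confidence value (item 14) is the one field whose encoding depends on the file version: a scaled integer in version 12 and a plain float in version 13. A minimal sketch of decoding it, together with a lookup for the allele call codes of item 13, might look like this. The names `ALLELE_CALLS` and `decode_confidence` are illustrative, and versions other than 12 are assumed to use the plain float encoding.

```python
import struct

# Item 13 codes, as listed in the table above.
ALLELE_CALLS = {6: "AA", 7: "BB", 8: "AB", 9: "AB_A", 10: "AB_B", 11: "no call"}


def decode_confidence(raw, version):
    """Decode the 4 bytes of item 14 (Confidence)."""
    if len(raw) != 4:
        raise ValueError("confidence is stored in exactly 4 bytes")
    if version == 12:
        # Stored as an integer; divide by 1000 to recover the value.
        return struct.unpack("<i", raw)[0] / 1000.0
    # Assumption: every other version stores a 32-bit float directly.
    return struct.unpack("<f", raw)[0]


def decode_allele_call(raw):
    """Decode the single unsigned byte of item 13."""
    (code,) = struct.unpack("<B", raw)
    return ALLELE_CALLS.get(code, "unknown")
```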
Probe Set Results (Resequencing)
There are no probe set results for resequencing analyses.
Resequencing Array Results
The following table defines the results for the combination of all probe sets for a resequencing array.
| Item | Description | Type |
|---|---|---|
| 1 | Sequence length. | integer |
| 2 | Sequence result.* | char[ length defined above] |
| 3 | Empty storage intended to store manual edits to the sequence result.* | char[ length defined above] |
| 4 | Array of confidence values. Each value is stored as an 8 bit signed value.* | char[ length defined above] |
| 5 | An array to store the probe set index which resulted in the base call. These values are not used by the CustomSeq analysis algorithm.* | short[ length defined above] |
| 6 | Algorithm name length.* | integer |
| 7 | The algorithm name.* | char[ length defined above] |
| 8 | Algorithm parameter length. This value is only stored if the algorithm name length is greater than 0.* | integer |
| 9 | The algorithm parameters stored as a space delimited TAG=VALUE list of parameters. This string is only stored if the parameter length is greater than 0.* | char[ length defined above] |
| 10 | Array of confidence values. Each value is stored as a 32 bit signed floating point value.* | float[ length defined above] |
* This item is only defined in the file if the sequence length is greater than 0.
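The conditional layout of the resequencing results can be sketched as follows. This is an illustration under stated assumptions rather than a verified reader: it takes "length defined above" for items 2 through 5 and for item 10 to mean the sequence length from item 1, and it treats the algorithm parameter length as absent when the algorithm name length is 0. The function name and dictionary keys are hypothetical.

```python
import io
import struct


def read_resequencing_results(stream):
    """Read items 1-10 of the resequencing array results."""

    def take(size):
        data = stream.read(size)
        if len(data) != size:
            raise EOFError("unexpected end of CHP data")
        return data

    def read_int():
        return struct.unpack("<i", take(4))[0]

    n = read_int()                                    # item 1
    if n <= 0:
        # Items 2-10 are only defined when the sequence length is > 0.
        return {"sequence_length": 0}

    result = {"sequence_length": n}
    result["sequence"] = take(n).decode("latin-1")    # item 2
    result["edits"] = take(n)                         # item 3 (empty storage)
    result["confidence_8bit"] = struct.unpack(f"<{n}b", take(n))      # item 4
    result["probe_set_index"] = struct.unpack(f"<{n}h", take(2 * n))  # item 5

    name_length = read_int()                          # item 6
    result["algorithm_name"] = take(name_length).decode("latin-1")    # item 7
    result["algorithm_parameters"] = ""
    if name_length > 0:
        param_length = read_int()                     # item 8
        if param_length > 0:
            result["algorithm_parameters"] = take(param_length).decode("latin-1")  # item 9

    # Assumption: one float per base, i.e. the count is the sequence length.
    result["confidence_float"] = struct.unpack(f"<{n}f", take(4 * n))  # item 10
    return result
```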
QC Probe Set Information
The following table defines information for a QC probe set.
| Item | Description | Type |
|---|---|---|
| 1 | Number of probes in the set. | integer |
| 2 | The type of QC probe set. See the CDF file section for details on the type. | integer |
| 3 | Probe information, repeated for each probe in the set, consisting of: X coordinate on array - integer | see description |
Types used are defined as: integer (a 32-bit signed integer), float (a 32-bit floating-point number), short (a 16-bit signed integer) and char (an 8-bit character).
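As an illustration, these types map onto format codes of Python's standard `struct` module, where the `<` prefix selects little-endian byte order with no padding. The `STRUCT_CODES` table is an illustrative name; the `unsigned char` entry is included because the genotyping table uses that type, although it is not listed in the definitions above.

```python
import struct

# Little-endian struct format codes for the types used in this document.
STRUCT_CODES = {
    "integer": "<i",        # 32-bit signed integer
    "float": "<f",          # 32-bit floating-point number
    "short": "<h",          # 16-bit signed integer
    "char": "<c",           # 8-bit character
    "unsigned char": "<B",  # used in the genotyping table
}

# The version number 13 as it would appear on disk: low byte first.
version_bytes = b"\x0d\x00\x00\x00"
version = struct.unpack(STRUCT_CODES["integer"], version_bytes)[0]
print(version)  # prints 13
```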