Affymetrix® Historical CHP Data File Format

CHP FILE

Description

The CHP file format described in this document was used to store expression, resequencing, and genotyping results from algorithms implemented in the MAS5 and GCOS 1.1 software applications.

Format

The CHP file is a binary file in which values are stored in little-endian byte order. The format defined below applies to CHP files generated by algorithms in MAS 5.0 and above. CHP files generated by older versions of the MAS software are not described.
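As an illustration of the little-endian storage, the following Python sketch reads a 32-bit integer and a 32-bit float with the standard `struct` module. The helper names `read_int32` and `read_float32` are hypothetical and are not part of any Affymetrix software. The demonstration uses an in-memory buffer rather than a real CHP file.

```python
import io
import struct

def read_int32(stream):
    """Read one 32-bit signed integer stored in little-endian order."""
    return struct.unpack("<i", stream.read(4))[0]

def read_float32(stream):
    """Read one 32-bit floating-point number stored in little-endian order."""
    return struct.unpack("<f", stream.read(4))[0]

# Synthetic buffer: the integer 12 followed by the float 0.5.
demo = io.BytesIO(struct.pack("<if", 12, 0.5))
version = read_int32(demo)
value = read_float32(demo)
```

The `<` prefix in the format string selects little-endian order regardless of the byte order of the machine running the code.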

The file contents are defined by:

Item Description Type
1 Label. Always set to "GeneChip Sequence File". char[22]
2 Version number. 12 for MAS 5 CHP files, 13 for GCOS 1.0 CHP files. integer
3 Algorithm name length. integer
4 Algorithm name. char[ length defined above]
5 Algorithm version length. integer
6 Algorithm version. char[ length defined above]
7 Algorithm parameters length. integer
8 Algorithm parameters (the string is a space delimited TAG=VALUE list of algorithm parameters). char[ length defined above]
9 Algorithm summary length. integer
10 Algorithm summary values (the string is a space delimited TAG=VALUE list of summary values). An example of a summary value is "RawQ". char[ length defined above]
11 The number of rows (of features) in the array. integer
12 The number of columns (of features) in the array. integer
13 The number of probe sets defined in the array. integer
14 A value that defines the maximum probe set number in the array. The probe set number is not displayed or used by the software and will be deprecated in future versions. integer
15 The number of QC probe sets stored in the file. integer
16 The probe set number for each probe set in the array. This information is not displayed or used by the software and will be deprecated in future versions. integer[ number of probe sets]
17 The number of probe pairs in each probe set. If the max probe set number is greater than the number of probe sets, the value is 0 for the array entries after the number of probe sets. integer[ maximum probe set number]
18 An indicator of the type of probe set (expression, genotyping, etc.). See the definition of UnitType in the CDF section. If the max probe set number is greater than the number of probe sets, the value is 0 for the array entries after the number of probe sets. integer[ maximum probe set number]
19 The number of probes per element of a probe set. This value is 2 for expression tiles indicating probe pairs. integer[ number of probe sets]
20 Probe array type char[ 256]
21 Parent CEL file name. char[ 256]
22 Programmatic identifier length. integer
23 Programmatic identifier of the COM component that implements the algorithm. char[ length defined above]
24 Probe set results. This item is repeated for each probe set in the array. see table below.
25 Resequencing array results see table below.
26 QC probe set information. This item is repeated for each QC probe set in the array. see table below.
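The first items of the table above (1 through 15) can be read with a sketch such as the following. It assumes the fields are packed back to back with no padding and that strings can be decoded as Latin-1; neither point is stated in this document. The function names, the algorithm name "ExampleAlg", and the tag names in the sample data are made up for illustration. The TAG=VALUE splitting assumes that values contain no spaces.

```python
import io
import struct

LABEL = b"GeneChip Sequence File"  # item 1, 22 characters

def read_int(stream):
    return struct.unpack("<i", stream.read(4))[0]

def read_string(stream):
    """Read an integer length followed by that many characters."""
    length = read_int(stream)
    return stream.read(length).decode("latin-1")

def parse_tag_values(text):
    """Split a space delimited TAG=VALUE string into a dictionary."""
    pairs = (token.split("=", 1) for token in text.split() if "=" in token)
    return {tag: value for tag, value in pairs}

def read_header_start(stream):
    """Read items 1 to 15 of the file contents table."""
    if stream.read(len(LABEL)) != LABEL:
        raise ValueError("not a CHP file of this format")
    header = {"version": read_int(stream)}
    header["algorithm_name"] = read_string(stream)
    header["algorithm_version"] = read_string(stream)
    header["parameters"] = parse_tag_values(read_string(stream))
    header["summary"] = parse_tag_values(read_string(stream))
    for key in ("rows", "cols", "num_probe_sets",
                "max_probe_set_number", "num_qc_probe_sets"):
        header[key] = read_int(stream)
    return header

def pack_string(text):
    """Helper for building the synthetic sample below."""
    data = text.encode("latin-1")
    return struct.pack("<i", len(data)) + data

# Synthetic header, not taken from a real CHP file.
sample = (LABEL + struct.pack("<i", 12)
          + pack_string("ExampleAlg") + pack_string("1.0")
          + pack_string("Alpha1=0.05 Tau=0.015")
          + pack_string("RawQ=2.5")
          + struct.pack("<5i", 640, 640, 100, 100, 3))
header = read_header_start(io.BytesIO(sample))
```

The remaining header items (16 through 23) follow the same pattern: integer arrays sized by the counts read here, two fixed 256-character fields, and one more length-prefixed string.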

Probe Set Results (Expression)

The following table defines the results for an expression probe set.

Item Description Type
1 The number of probe pairs. integer
2 The number of probe pairs used by the algorithm. integer
3 Unused* integer
4 The number of probe pairs used by the algorithm (repeated). integer
5 Unused* integer
6 Unused* integer
7 Unused* integer
8 Detection p-value float
9 Unused* float
10 Signal float
11 Detection (0 - present, 1 - marginal, 2 - absent, 3 - no call). integer
12 Probe pair data, repeated for each probe pair in the set:

Background - float
Flag indicating if the pair is used in the analysis (0 - not used, 2 - used) - integer
PM X coordinate - integer (version 12), unsigned short (version 13)
PM Y coordinate - integer (version 12), unsigned short (version 13)
PM intensity - float*
PM stdev - float*
PM pixel count - integer*
PM masked - char*
PM outlier - char*
MM X coordinate - integer (version 12), unsigned short (version 13)
MM Y coordinate - integer (version 12), unsigned short (version 13)
MM intensity - float*
MM stdev - float*
MM pixel count - integer*
MM masked - char*
MM outlier - char*

Type: see description.
13 Comparison analysis results exists flag. integer
14 Comparison analysis results (only if exists).

Number of probe pairs in common - integer
Unused* - integer
Unused* - integer
Unused* - integer
Change (1 - increase, 2 - decrease, 3 - moderate increase, 4 - moderate decrease, 5 - no change, 6 - no call) - integer
Baseline absent flag - char
Unused*  - char
Unused* - integer
Unused* - integer
Signal log ratio high - float**
Unused - integer
Unused* - integer
Signal log ratio - float**
Unused* - integer
Signal log ratio low - float**
Change p-value  - float ** (version 12) or float (version 13)

Type: see description.

* Only stored in version 12 files.

** The floating point value is stored as an integer. Divide the integer value by 1000 to obtain the floating point value.
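The version differences in the expression probe pair data and the integer-encoded values marked ** can be sketched as follows. The two `struct` layouts are an interpretation of item 12 that assumes the fields are packed with no padding and that each char occupies one byte. Treating every version other than 12 like version 13 is also an assumption. The names `PAIR_V12`, `PAIR_V13`, `decode_scaled`, and `read_probe_pair` are hypothetical.

```python
import io
import struct

DETECTION = {0: "present", 1: "marginal", 2: "absent", 3: "no call"}
CHANGE = {1: "increase", 2: "decrease", 3: "moderate increase",
          4: "moderate decrease", 5: "no change", 6: "no call"}

# Version 12: background, flag, then for PM and again for MM:
# x, y, intensity, stdev, pixel count, masked, outlier.
PAIR_V12 = struct.Struct("<fiiiffiBBiiffiBB")
# Version 13: background, flag, PM x, PM y, MM x, MM y.
PAIR_V13 = struct.Struct("<fiHHHH")

def decode_scaled(raw):
    """Convert a value stored as an integer (marked **) into a float."""
    return raw / 1000.0

def read_probe_pair(stream, version):
    """Read one probe pair record and return a few of its fields."""
    if version == 12:
        f = PAIR_V12.unpack(stream.read(PAIR_V12.size))
        return {"background": f[0], "used": f[1] == 2,
                "pm_xy": (f[2], f[3]), "mm_xy": (f[9], f[10])}
    f = PAIR_V13.unpack(stream.read(PAIR_V13.size))
    return {"background": f[0], "used": f[1] == 2,
            "pm_xy": (f[2], f[3]), "mm_xy": (f[4], f[5])}

# Synthetic version 13 record, not taken from a real file.
record = PAIR_V13.pack(1.5, 2, 10, 20, 10, 21)
pair = read_probe_pair(io.BytesIO(record), 13)
log_ratio = decode_scaled(-1500)  # a signal log ratio stored as -1500
```

Under these assumptions a version 12 probe pair occupies 52 bytes and a version 13 probe pair occupies 16 bytes.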

Probe Set Results (Genotyping)

The following table defines the results for a genotyping probe set.

Item Description Type
1 The number of blocks in the probe set. integer
2 Block information, repeated for each block, consisting of:

unused - integer
unused string length - integer
unused string - char[ length defined above]
unused - char
unused - integer[3]

Type: see description.
3 Flag indicating if results exist. If this value is 0 then skip to item #20 (items 4-19 will not be stored for the probe set). unsigned char
4 Unused integer
5 Unused string length. integer
6 Unused string. char[ length defined above]
7 Unused string length. integer
8 Unused string. char[ length defined above]
9 Unused string length. integer
10 Unused string. char[ length defined above]
11 Unused integer
12 Unused integer
13 The allele call (6 - AA, 7 - BB, 8 - AB, 9 - AB_A, 10 - AB_B, 11 - no call) unsigned char
14 Confidence float ** (version 12) or float (version 13)
15 Unused float
16 Unused float
17 Unused float
18 RAS1 float
19 RAS2 float
20 Unused string length integer
21 Unused string char[ length defined above]
22 Unused string length integer
23 Unused string char[ length defined above]
24 Number of probe pairs integer
25 Probe pair data, repeated for each probe pair in the set:

Background - float**
PM X coordinate - integer (version 12), unsigned short (version 13)
PM Y coordinate - integer (version 12), unsigned short (version 13)
PM intensity - float* **
PM stdev - float* **
PM pixel count - integer*
PM masked - char*
PM outlier - char*
MM X coordinate - integer (version 12), unsigned short (version 13)
MM Y coordinate - integer (version 12), unsigned short (version 13)
MM intensity - float* **
MM stdev - float* **
MM pixel count - integer*
MM masked - char*
MM outlier - char*

Type: see description.

* Only stored in version 12 files.

** The floating point value is stored as an integer. Divide the integer value by 1000 to obtain the floating point value.
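A short sketch of how the allele call codes and the version-dependent confidence value might be decoded. The function names are hypothetical, and the fallback text for unlisted codes is an illustration choice rather than something defined by the format.

```python
import struct

ALLELE_CALLS = {6: "AA", 7: "BB", 8: "AB",
                9: "AB_A", 10: "AB_B", 11: "no call"}

def call_name(code):
    """Translate the allele call code of item 13 into its label."""
    return ALLELE_CALLS.get(code, "unknown (%d)" % code)

def read_confidence(raw4, version):
    """Decode item 14 from its 4 raw bytes.

    Version 12 stores an integer that must be divided by 1000;
    version 13 stores a 32-bit float directly.
    """
    if version == 12:
        return struct.unpack("<i", raw4)[0] / 1000.0
    return struct.unpack("<f", raw4)[0]

# Synthetic values for illustration only.
conf_v12 = read_confidence(struct.pack("<i", 250), 12)
conf_v13 = read_confidence(struct.pack("<f", 0.25), 13)
```

Both calls above yield 0.25, which shows that the same four bytes must be interpreted differently depending on the version number read from the file header.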

Probe Set Results (Resequencing)

There are no probe set results for resequencing analyses.

Resequencing Array Results

The following table defines the results for the combination of all probe sets for a resequencing array.

Item Description Type
1 Sequence length. integer
2 Sequence result.* char[ length defined above]
3 Empty storage intended to store manual edits to the sequence result.* char[ length defined above]
4 Array of confidence values. Each value is stored as an 8 bit signed value.* char[ length defined above]
5 An array to store the probe set index which resulted in the base call. These values are not used by the CustomSeq analysis algorithm.* short[ length defined above]
6 Algorithm name length.* integer
7 The algorithm name.* char[ length defined above]
8 Algorithm parameter length. This value is only stored if the algorithm name length is greater than 0.* integer
9 The algorithm parameters stored as a space delimited TAG=VALUE list of parameters. This string is only stored if the parameter length is greater than 0.* char[ length defined above]
10 Array of confidence values. Each value is stored as a 32 bit signed floating point value.* float[ length defined above]

* This item is only defined in the file if the sequence length is greater than 0.
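The conditional layout of the resequencing results can be sketched as below. The sketch assumes packed fields with no padding and Latin-1 compatible strings. The function and dictionary key names are hypothetical, and the sample data is synthetic.

```python
import io
import struct

def read_int(stream):
    return struct.unpack("<i", stream.read(4))[0]

def read_resequencing(stream):
    """Read items 1 to 10 of the resequencing array results."""
    n = read_int(stream)
    result = {"length": n}
    if n <= 0:
        return result  # items 2 to 10 are absent
    result["calls"] = stream.read(n).decode("latin-1")
    result["edits"] = stream.read(n)  # empty storage for manual edits
    result["confidence8"] = struct.unpack("<%db" % n, stream.read(n))
    result["probe_set_index"] = struct.unpack("<%dh" % n, stream.read(2 * n))
    name_len = read_int(stream)
    result["algorithm"] = stream.read(name_len).decode("latin-1")
    result["parameters"] = ""
    if name_len > 0:  # item 8 is stored only in this case
        param_len = read_int(stream)
        if param_len > 0:  # item 9 is stored only in this case
            result["parameters"] = stream.read(param_len).decode("latin-1")
    result["confidence"] = struct.unpack("<%df" % n, stream.read(4 * n))
    return result

# Synthetic three-base result, not taken from a real file.
sample = (struct.pack("<i", 3) + b"ACG" + b"\x00\x00\x00"
          + struct.pack("<3b", 10, -1, 7) + struct.pack("<3h", 0, 1, 2)
          + struct.pack("<i", 4) + b"Demo"
          + struct.pack("<i", 5) + b"Q=0.5"
          + struct.pack("<3f", 0.5, 0.25, 1.0))
reseq = read_resequencing(io.BytesIO(sample))
empty = read_resequencing(io.BytesIO(struct.pack("<i", 0)))
```

A sequence length of 0 leaves only the length itself in the file, which is why the reader returns early in that case.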

QC Probe Set Information

The following table defines information for a QC probe set.

Item Description Type
1 Number of probes in the set. integer
2 The type of QC probe set. See the CDF file section for details on the type. integer
3 Probe information, repeated for each probe in the set, consisting of:

X coordinate on array - integer
Y coordinate on array - integer
Intensity value from CEL file - float
Stdev value from CEL file - float
Number of pixels from CEL file - integer
Background value - float

Type: see description.
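A QC probe set record could be read along the following lines. The sketch assumes packed fields with no padding; the names `QC_PROBE` and `read_qc_probe_set` are hypothetical, and the QC type value 1 in the sample is arbitrary.

```python
import io
import struct

# One probe: x, y, intensity, stdev, pixel count, background.
QC_PROBE = struct.Struct("<iiffif")

def read_qc_probe_set(stream):
    """Read items 1 to 3 of the QC probe set information."""
    count, qc_type = struct.unpack("<ii", stream.read(8))
    probes = [QC_PROBE.unpack(stream.read(QC_PROBE.size))
              for _ in range(count)]
    return {"type": qc_type, "probes": probes}

# Synthetic QC probe set with two probes.
sample = (struct.pack("<ii", 2, 1)
          + QC_PROBE.pack(0, 0, 100.0, 2.5, 9, 30.0)
          + QC_PROBE.pack(1, 0, 120.5, 3.0, 9, 31.0))
qc = read_qc_probe_set(io.BytesIO(sample))
```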

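Types used are defined as: integer (a 32-bit signed integer), float (a 32-bit floating-point number), short (a 16-bit signed integer) and char (an 8-bit character). The unsigned short and unsigned char types that appear in the tables are the unsigned 16-bit and 8-bit counterparts.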
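For readers using Python, the types named in this document map onto format codes of the standard `struct` module roughly as follows. The dictionary names are hypothetical. The mapping for unsigned short and unsigned char covers the unsigned types that appear in the probe pair and genotyping tables.

```python
import struct

STRUCT_CODES = {
    "integer": "i",         # 32-bit signed integer
    "float": "f",           # 32-bit floating-point number
    "short": "h",           # 16-bit signed integer
    "char": "c",            # 8-bit character
    "unsigned short": "H",  # 16-bit unsigned integer
    "unsigned char": "B",   # 8-bit unsigned value
}

# Size in bytes of each type when read with little-endian, unpadded layout.
SIZES = {name: struct.calcsize("<" + code)
         for name, code in STRUCT_CODES.items()}
```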