Affymetrix® XDA CHP Data File Format

CHP FILE

Description

The CHP file format described in this document was used to store expression, resequencing, and genotyping results from the algorithms implemented in the GCOS 1.2, 1.3, and 1.4 and BRLMM Analysis Tool software applications.

Format

This version of the CHP file is a binary format designed for fast access and a small file size. All multi-byte values in the file are stored in little-endian byte order.
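
As an illustration of the little-endian layout, the following Python sketch decodes the fixed-size items at the start of the file (items 1 through 7 in the table below) with the standard struct module. It assumes, because this document does not give byte sizes, that an integer is a 4-byte signed value and an unsigned short is 2 bytes. The `<` prefix in the format string selects little-endian order with no alignment padding. `read_prefix`, `PREFIX_FORMAT`, and `RESULT_TYPES` are hypothetical names, not part of any Affymetrix SDK.

```python
import struct

# Items 1-7 of the file, in order. Assumptions (not stated in this document):
# "integer" is a 4-byte signed int and "unsigned short" is a 2-byte unsigned int.
# The "<" prefix selects little-endian byte order with no alignment padding.
PREFIX_FORMAT = "<iiHHiii"
PREFIX_SIZE = struct.calcsize(PREFIX_FORMAT)  # 24 bytes under these assumptions

# Item 7 values as listed in this document.
RESULT_TYPES = {0: "Expression", 1: "Genotyping", 2: "Resequencing", 3: "Universal"}


def read_prefix(data: bytes) -> dict:
    """Hypothetical helper: decode items 1-7 from the start of an XDA CHP file."""
    (magic, version, cols, rows,
     num_units, num_qc_units, result_type) = struct.unpack_from(PREFIX_FORMAT, data, 0)
    if magic != 65:
        raise ValueError(f"not an XDA CHP file: magic number {magic}, expected 65")
    return {
        "version": version,
        "cols": cols,
        "rows": rows,
        "num_units": num_units,
        "num_qc_units": num_qc_units,
        "result_type": RESULT_TYPES.get(result_type, f"unknown ({result_type})"),
    }


# Usage with a synthetic 24-byte prefix (values chosen for illustration only).
# For a real file: with open(path, "rb") as f: read_prefix(f.read(PREFIX_SIZE))
sample = struct.pack(PREFIX_FORMAT, 65, 2, 100, 120, 10, 2, 0)
print(read_prefix(sample))
```

Checking the magic number first is a cheap way to reject files that are not in this format before any variable-length fields are read.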

The file contains the following items, stored in the order listed:

Item Description Type
1 Magic number. Always set to 65. integer
2 Version number. integer
3 The number of columns of cells on the array. unsigned short
4 The number of rows of cells on the array. unsigned short
5 The number of units on the array, not including QC units. "Unit" is an internal term meaning probe set. integer
6 The number of QC units. integer
7 Results type. Values are:

0 - Expression
1 - Genotyping
2 - Resequencing
3 - Universal (tag arrays)

integer
8 Prog ID string length. integer
9 Prog ID of the COM component that created the file. char [length defined above]
10 CEL file name string length. integer
11 Name of the CEL file used to create the CHP file. char [length defined above]
12 Probe array type string length. integer
13 Probe array type. char [length defined above]
14 Algorithm name string length. integer
15 Algorithm name. char [length defined above]
16 Algorithm version string length. integer
17 Algorithm version. char [length defined above]
18 Number of algorithm parameters. integer
19 Algorithm parameters. The following is defined for each parameter:

Parameter name length - integer
Parameter name - char [length defined above]
Parameter value length - integer
Parameter value - char [length defined above]

see description
20 Number of summary statistic parameters. integer
21 Summary statistics (such as "RawQ"). The following is defined for each statistic:

Statistic name length - integer
Statistic name - char [length defined above]
Statistic value length - integer
Statistic value - char [length defined above]

see description
22 Total number of zones used for the background calculation. integer
23 The smooth factor used in the background calculation. float
24 Background zone information. The following items are repeated for each zone:

The feature's X coordinate - float
The feature's Y coordinate - float
The background value - float

see description
25 Size, in bytes, of each results object stored in the file. integer
26 Resequencing results (only stored in the file if the results type is set to resequencing (2)):

Called sequence length - integer
Called sequence - char [length defined above]
Base call scores - float [length defined above]

The remaining items are stored only in version 2 and later files.

Force calls length - integer
Force calls array [length defined above]. Each force call consists of:

Index to the called sequence array (zero based) - integer
Force call at the index position - char
Reason for the force call - unsigned char

Original calls length - integer
Original calls array [length defined above]. This array holds the base calls originally made by the algorithm that were later replaced by user-edited calls. The edited calls are stored in the called sequence array, and the original calls are stored here. Each original call consists of:

Index to the called sequence array (zero based) - integer
Call at the index position - char

see description
27 An indicator of the type of expression analysis (only stored in the file if the results type is set to expression (0)). The possible values are:

0 - Absolute analysis using the empirical algorithm (MAS 4).
1 - Comparison analysis using the empirical algorithm (MAS 4).
2 - Absolute analysis using the statistical (MAS5) or PLIER algorithms.
3 - Comparison analysis using the statistical (MAS5) or PLIER algorithms.

unsigned char
28 Expression results for each unit (only stored in the file if the results type is set to expression (0)):

Non-empirical (statistical MAS5 or PLIER) results:

Detection - unsigned char
Detection p-value - float
Signal - float
Number of pairs - unsigned short
Number of pairs used - unsigned short
Change - unsigned char*
Change p-value - float*
Signal Log Ratio - float*
Signal Log Ratio Low - float*
Signal Log Ratio High - float*
Common Pairs - unsigned short*

* Only stored if a comparison analysis was performed.

see description
29 Genotyping results for each unit (only stored in the file if the results type is set to genotyping (1)):

Call - unsigned char (see above for numeric values)
Confidence - float
RAS1 (for 10K arrays) or p-value AA call (for 100K arrays) - float
RAS2 (for 10K arrays) or p-value AB call (for 100K arrays) - float
p-value BB call (for 100K arrays) - float
p-value No Call call (for 100K arrays) - float

see description
30 Tag array results for each unit (only stored in the file if the results type is set to universal (3)):

Background value - float

see description
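
Because every variable-length field is preceded by its length or count, the header after item 7 (items 8 through 25) and the expression results (items 27 and 28) can be read strictly sequentially. The Python sketch below shows one way to do this. It assumes, since this document does not say so, that integers are 4-byte signed values, floats are 4-byte IEEE 754 values, strings are single-byte characters with no terminator, and each expression record is packed with no padding; the comparison against the record size from item 25 is there to catch a violation of that last assumption. The `Reader` class and the function names are hypothetical. Only the non-empirical records listed under item 28 are handled, and the number of records to read is left to the caller.

```python
import struct

# Assumptions (not stated in this document): "integer" is a 4-byte signed int,
# "float" is a 4-byte IEEE 754 value, "unsigned short" is 2 bytes, and strings
# are single-byte characters whose length prefix gives the exact byte count.


class Reader:
    """Hypothetical sequential little-endian reader over an in-memory buffer."""

    def __init__(self, data: bytes, offset: int = 0):
        self.data = data
        self.offset = offset

    def unpack(self, fmt: str) -> tuple:
        # "<" = little-endian, standard sizes, no alignment padding.
        full = "<" + fmt
        values = struct.unpack_from(full, self.data, self.offset)
        self.offset += struct.calcsize(full)
        return values

    def int32(self) -> int:
        return self.unpack("i")[0]

    def float32(self) -> float:
        return self.unpack("f")[0]

    def string(self) -> str:
        # An integer length followed by that many characters.
        n = self.int32()
        raw = self.data[self.offset:self.offset + n]
        if len(raw) != n:
            raise ValueError("string runs past the end of the buffer")
        self.offset += n
        return raw.decode("latin-1")

    def name_value_pairs(self) -> dict:
        # A count, then (name, value) pairs of length-prefixed strings.
        pairs = {}
        for _ in range(self.int32()):
            name = self.string()
            pairs[name] = self.string()
        return pairs


def read_header_tail(r: Reader) -> dict:
    """Decode items 8-25; r must be positioned just after item 7."""
    header = {}
    header["prog_id"] = r.string()                  # items 8-9
    header["cel_file"] = r.string()                 # items 10-11
    header["array_type"] = r.string()               # items 12-13
    header["algorithm"] = r.string()                # items 14-15
    header["algorithm_version"] = r.string()        # items 16-17
    header["parameters"] = r.name_value_pairs()     # items 18-19
    header["summary_stats"] = r.name_value_pairs()  # items 20-21
    n_zones = r.int32()                             # item 22
    header["smooth_factor"] = r.float32()           # item 23
    # Item 24: one (x, y, background) triple per zone.
    header["zones"] = [r.unpack("fff") for _ in range(n_zones)]
    header["result_size"] = r.int32()               # item 25
    return header


ABSOLUTE_FMT = "BffHH"     # detection, detection p-value, signal, pairs, pairs used
COMPARISON_FMT = "BffffH"  # change, change p-value, SLR, SLR low, SLR high, common pairs


def read_expression_results(r: Reader, num_units: int, result_size: int) -> list:
    """Decode item 27, then num_units item-28 records (non-empirical only)."""
    analysis_type = r.unpack("B")[0]  # item 27
    if analysis_type not in (2, 3):
        raise NotImplementedError("empirical (MAS 4) record layout is not covered")
    comparison = analysis_type == 3
    fmt = ABSOLUTE_FMT + (COMPARISON_FMT if comparison else "")
    if struct.calcsize("<" + fmt) != result_size:
        raise ValueError("record size differs from item 25; packing assumption fails")
    fields = ["detection", "detection_p", "signal", "num_pairs", "num_pairs_used"]
    if comparison:
        fields += ["change", "change_p", "signal_log_ratio",
                   "slr_low", "slr_high", "common_pairs"]
    return [dict(zip(fields, r.unpack(fmt))) for _ in range(num_units)]
```

Under these assumptions an absolute-analysis record occupies 13 bytes and a comparison record 32 bytes. The other results types could be decoded in the same style by building format strings from items 26, 29, and 30.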