The Expression Console software stores the probe set summarizations (signal, detection, p-values, etc.) from the MAS5, RMA and PLIER algorithms in a binary file with a .CHP extension. This file is referred to as the CHP ("chip") file. The Expression Console software supports two main CHP file formats, known as the GCOS/XDA format and the Command Console format. Documentation on the file formats is available on the DevNet section of the Affymetrix web site.
The contents of the CHP file depend on the format of the CHP file, the algorithm used (MAS5, RMA, PLIER) and the controls (the set of probe sets designated as 3', 5' or middle spike and housekeeping controls) specified by the user.
The MAS5 algorithm can perform either an absolute analysis or a comparison analysis. The absolute analysis results include a signal, detection call, detection p-value, number of probe pairs in the probe set and the number of probe pairs used for the analysis. The comparison analysis results include those for the absolute analysis plus a change call, change p-value, signal log ratio, signal log ratio low, signal log ratio high and the number of probe pairs in common between the two files being analyzed. The RMA and PLIER algorithms generate only a signal value.
The GCOS/XDA file format was designed to store results from the MAS5 algorithm which has a fixed number of columns of results for the absolute and comparison analyses.
The Command Console file format is more flexible: it can hold any number of result columns, and each column can be one of a variety of data types (for example integer, float, 8 byte string or 16 byte string). When the Expression Console software uses this format for RMA and PLIER, its results table needs only two columns: probe set name and signal value.
One thing to note about the CHP files is that the older format (GCOS/XDA) only stores the analysis results, not the probe set names. To obtain the associated probe set names you will need to read either the CDF or PSI files. The PSI file is an ASCII text file with the probe set name and number of probe pairs. This file is smaller and easier to parse than the CDF file. The CDF file contains, in addition to the probe set names, the list of associated features (X/Y feature coordinate on the array and other attributes) for each probe set. The format of this file is either ASCII or binary.
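As an illustration of why the PSI file is the easier of the two to parse, here is a minimal sketch of a reader that does not use the Fusion SDK. It assumes a layout that the text above does not spell out: optional header lines beginning with `#`, followed by one line per probe set holding an index, the probe set name and the number of probe pairs, separated by whitespace. The names `PsiEntry` and `ReadPsi` are hypothetical; check the file format documentation before relying on this layout.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one PSI line: probe set name and number of probe pairs.
struct PsiEntry {
    std::string name;
    int numPairs = 0;
};

// Reads PSI-style text from a stream.
// ASSUMPTION: lines starting with '#' are header lines, and every data line is
// "<index> <probe set name> <number of pairs>" separated by tabs or spaces.
std::vector<PsiEntry> ReadPsi(std::istream& in)
{
    assert(in.good()); // the caller must pass a successfully opened stream

    std::vector<PsiEntry> entries;
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '#')
            continue; // skip blank lines and header lines

        std::istringstream fields(line);
        int index = 0;
        PsiEntry entry;
        // Lines that do not match the assumed three-field layout are ignored.
        if (fields >> index >> entry.name >> entry.numPairs)
            entries.push_back(entry);
    }
    return entries;
}
```

Entries are returned in file order, so the position of an entry in the vector can serve as the probe set index.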
The PSI and CDF files are named using the array type (also known as chip type) with a .PSI or .CDF extension. The array/chip type is stored in the header of the CHP file. Given the full path to the CHP file and the full path of the library directory you can determine the CHP file's associated PSI/CDF file.
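The naming rule above can be sketched as a small helper that builds the library file path from the chip type. `LibraryFilePath` is a hypothetical name, not a Fusion SDK function, and the chip type string is assumed to have already been read from the CHP header. On case-sensitive file systems the extension case must match the file on disk.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns "<libraryDir>/<chipType><extension>".
// chipType is the array type taken from the CHP header; extension is
// ".PSI" or ".CDF". A separator is added only if libraryDir lacks one.
std::string LibraryFilePath(const std::string& libraryDir,
                            const std::string& chipType,
                            const std::string& extension)
{
    assert(!chipType.empty()); // a CHP header without a chip type cannot be resolved

    std::string path = libraryDir;
    if (!path.empty() && path.back() != '/' && path.back() != '\\')
        path += '/';
    return path + chipType + extension;
}
```

For example, a library directory of `/data/library` and a chip type of `HG-U133A` give `/data/library/HG-U133A.PSI` for the PSI file.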
The order of the probe sets is the same in the PSI, CDF and CHP files. You can therefore open each file and use the index of a probe set to join its data across the files.
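Because the files share one ordering, the join is a positional one. The sketch below assumes the probe set names (from the PSI or CDF file) and the signal values (from the CHP file) have already been read into two vectors; `ProbeSetSignal` and `JoinByIndex` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical joined record: one probe set name with its signal value.
struct ProbeSetSignal {
    std::string name;
    float signal;
};

// Pairs names[i] with signals[i]. This relies on both inputs being in the
// same probe set order, as the PSI/CDF and CHP files are.
std::vector<ProbeSetSignal> JoinByIndex(const std::vector<std::string>& names,
                                        const std::vector<float>& signals)
{
    // A size mismatch means the files do not belong to the same array type.
    assert(names.size() == signals.size());

    std::vector<ProbeSetSignal> joined;
    joined.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i)
        joined.push_back(ProbeSetSignal{names[i], signals[i]});
    return joined;
}
```

A production reader would likely report a size mismatch as an error rather than assert, since it usually indicates that the wrong library file was selected.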
Parsers in the form of C++ and Java source code are available from Affymetrix to parse the CHP, CDF and PSI files. These parsers, along with sample code and documentation, are contained within the Fusion SDK.
The classes/interfaces provided within the Fusion SDK provide the ability to parse the different types of CHP files.
There are two main classes to use for reading 3' IVT CHP files generated by the Expression Console software.
These are:
| Class | Description |
|---|---|
| FusionCHPLegacyData | This class provides the interfaces for the "legacy" type results. The "legacy" results are those values (detection, p-value, signal, etc.) defined as outputs of the MAS5 algorithm stored in either the GCOS/XDA format or Command Console format CHP file. |
| FusionCHPQuantificationData | This class provides the interfaces for the "quantification" type results. The "quantification" results are only the probe set name and quantification (signal) value. This type of data is created by the RMA or PLIER algorithm when stored in a Command Console format CHP file. This class also provides parsing capabilities for WT array CHP files. The difference between the two is that the probe set result (ProbeSetQuantificationData class) stores an "id" for WT results and a "name" for 3' IVT results. The "name" property will be empty for WT results. |
The following table shows the class to use for each of the different algorithms and CHP file types:
| Algorithm | GCOS/XDA Format CHP File | Command Console Format CHP File |
|---|---|---|
| MAS5 | FusionCHPLegacyData | FusionCHPLegacyData |
| RMA | | FusionCHPQuantificationData |
| PLIER | | FusionCHPQuantificationData |
The following is an example of C++ code using the Fusion SDK to extract the signal values from a CHP (XDA or Command Console) file. The top level function is ReadCHPFile.

    #include "FusionCHPData.h" // header file for reading CHP files.
    using namespace std;

    /*! This function extracts the signal value from the CHP file and probe set name from the PSI file. */
    // The chip type is stored in the header of the CHP file.
    // The probe set names are stored in either the PSI or CDF file. Use the ...
    // Now loop over the probe sets to get the signal values.

    /*! This function extracts the signal value from the CHP file and probe set name from the PSI file. */

    /*! This will read the CHP file, determine the type and extract the results. */
    // Read the CHP file. This function will read any type of CHP file whose parsers (from Fusion) ...
    // The following function will determine if the CHP file read contains "legacy" format data. This ...
    // The following function will determine if the CHP file read is a "quantification" type file. This file contains ...
    // The CHP file was read, but not of the type we wanted.


