Expression Console™
Parsing Affymetrix 3' IVT Expression CHP Files


The Expression Console software stores the probe set summarizations (signal, detection, p-values, etc.) from the MAS5, RMA and PLIER algorithms in a binary file with a .CHP extension. This file is referred to as the CHP ("chip") file. The Expression Console software supports two main formats of the CHP file: the GCOS/XDA format and the Command Console format. Documentation on the file formats is available in the DevNet section of the Affymetrix web site.

The contents of the CHP file depend on the format of the CHP file, the algorithm used (MAS5, RMA, PLIER) and the controls (the set of probe sets designated as 3', 5' or middle spike and housekeeping controls) specified by the user.

The MAS5 algorithm can perform either an absolute analysis or a comparison analysis. The absolute analysis results include a signal, detection call, detection p-value, number of probe pairs in the probe set and the number of probe pairs used for the analysis. The comparison analysis results include those for the absolute analysis plus a change call, change p-value, signal log ratio, signal log ratio low, signal log ratio high and the number of probe pairs in common between the two files being analyzed. The RMA and PLIER algorithms generate only a signal value.

The GCOS/XDA file format was designed to store results from the MAS5 algorithm, which has a fixed number of result columns for the absolute and comparison analyses.

The Command Console file format is more flexible in that there can be any number of columns of results, and each column can be one of a variety of data types (for example, integer, float, 8-byte string or 16-byte string). When the Expression Console software uses this format for RMA and PLIER, it needs to allocate a results table with only two columns: probe set name and signal value.

One thing to note about the CHP files is that the older format (GCOS/XDA) only stores the analysis results, not the probe set names. To obtain the associated probe set names you will need to read either the CDF or PSI files. The PSI file is an ASCII text file with the probe set name and number of probe pairs. This file is smaller and easier to parse than the CDF file. The CDF file contains, in addition to the probe set names, the list of associated features (X/Y feature coordinate on the array and other attributes) for each probe set. The format of this file is either ASCII or binary.
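For readers who want to see what reading a PSI file by hand could look like, the following is a minimal sketch. The line layout is an assumption, not something this document specifies: header lines are taken to start with '#', and each data line is taken to hold an index, a probe set name and a probe pair count separated by whitespace. PsiEntry and ParsePsiText are hypothetical names, not part of the Fusion SDK; the SDK's own PSI parser remains the supported route.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record for one PSI line (not a Fusion SDK type).
struct PsiEntry
{
    std::string name;   // probe set name
    int numPairs;       // number of probe pairs in the probe set
};

// Reads PSI-style text from a stream.
// ASSUMPTION: lines starting with '#' are headers, and each data line is
// "<index> <probe set name> <number of probe pairs>" separated by whitespace.
std::vector<PsiEntry> ParsePsiText(std::istream &in)
{
    std::vector<PsiEntry> entries;
    std::string line;
    while (std::getline(in, line))
    {
        if (line.empty() || line[0] == '#')
            continue;                       // skip blank and header lines
        std::istringstream fields(line);
        int index = 0;
        PsiEntry entry;
        if (fields >> index >> entry.name >> entry.numPairs)
            entries.push_back(entry);       // lines that do not parse are skipped
    }
    return entries;
}
```

The entries are returned in file order, so position i in the vector corresponds to the i-th probe set read from the text.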

The PSI and CDF files are named using the array type (also known as chip type) with a .PSI or .CDF extension. The array/chip type is stored in the header of the CHP file. Given the full path to the CHP file and the full path of the library directory you can determine the CHP file's associated PSI/CDF file.
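The naming rule above can be sketched as a small path-building helper. BuildLibraryFilePath is a hypothetical function, not part of the Fusion SDK, and the chip type is assumed to have already been read from the CHP header. The helper adds a separator only when the library directory does not already end with one.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not a Fusion SDK function): builds the path of the
// library file for a given array/chip type, e.g. "<libDir>/<chipType>.psi".
std::string BuildLibraryFilePath(const std::string &libDir,
                                 const std::string &chipType,
                                 const std::string &extension)
{
    assert(!chipType.empty());   // the chip type comes from the CHP header

    std::string path = libDir;
    // Append a separator only if the directory does not already end with one.
    if (!path.empty() && path.back() != '/' && path.back() != '\\')
        path += '/';
    return path + chipType + extension;
}
```

For example, a library directory of "/data/library" and a chip type of "HG-U133A" would give "/data/library/HG-U133A.psi" for the ".psi" extension. Whether the extension on disk is upper or lower case matters on case-sensitive file systems, so the caller chooses it.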

The order of the probe sets is the same in the PSI, CDF and CHP files. Because of this, you can open each file and use the index of the probe set to join the probe set data across the files.
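The index-based join can be sketched independently of the SDK. In the sketch below, the names are assumed to have been read from the PSI (or CDF) file and the signal values from the CHP file, each in file order. JoinByIndex is a hypothetical helper, not a Fusion SDK function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a Fusion SDK function): pairs the i-th probe set
// name with the i-th signal value. This relies on both files listing the
// probe sets in the same order.
std::vector<std::pair<std::string, float> >
JoinByIndex(const std::vector<std::string> &names,
            const std::vector<float> &signalValues)
{
    // Files for the same array type should hold the same number of probe sets.
    assert(names.size() == signalValues.size());

    std::vector<std::pair<std::string, float> > joined;
    joined.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i)
        joined.emplace_back(names[i], signalValues[i]);
    return joined;
}
```

A mismatch in the two counts usually means the library file does not belong to the CHP file's array type, which is why the sketch checks it before joining.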

Parsers in the form of C++ and Java source code are available from Affymetrix to parse the CHP, CDF and PSI files. These parsers, along with sample code and documentation, are contained within the Fusion SDK.

The classes/interfaces provided within the Fusion SDK provide the ability to parse the different types of CHP files.
There are two main classes to use for reading 3' IVT CHP files generated by the Expression Console software.

These are:


Class: FusionCHPLegacyData
Description: This class provides the interfaces for the "legacy" type results. The "legacy" results are those values (detection, p-value, signal, etc.) defined as outputs of the MAS5 algorithm stored in either the GCOS/XDA format or Command Console format CHP file.

Class: FusionCHPQuantificationData
Description: This class provides the interfaces for the "quantification" type results. The "quantification" results are only the probe set name and quantification (signal) value. This type of data is created by the RMA or PLIER algorithm when stored in a Command Console format CHP file.

This class also provides parsing capabilities for WT array CHP files. The difference between the two is that the probe set result (ProbeSetQuantificationData class) stores an "id" for WT results and a "name" for 3' IVT results. The "name" property will be empty for WT results.


The following table shows the class to use for each of the different algorithms and CHP file types:


Algorithm   GCOS/XDA Format CHP File   Command Console Format CHP File
MAS5        FusionCHPLegacyData        FusionCHPLegacyData
RMA         -                          FusionCHPQuantificationData
PLIER       -                          FusionCHPQuantificationData

The following is an example of C++ code using the Fusion SDK to extract the signal values from a CHP (XDA or Command Console) file. The top level function is ReadCHPFile.

   

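#include "FusionCHPData.h"                  // base class and reader for CHP files
#include "FusionCHPLegacyData.h"            // MAS5 type CHP files
#include "FusionCHPQuantificationData.h"    // RMA/PLIER type CHP files
#include "StringUtils.h"                    // wide to multi-byte string conversion
// The Fusion SDK header that declares FusionPSIFile must also be included.
#include <string>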

using namespace std;
using namespace affymetrix_calvin_utilities;
using namespace affymetrix_calvin_data;
using namespace affymetrix_fusion_io;

/*! This function extracts the signal value from the CHP file and probe set name from the PSI file.
 * @param legchp  The CHP file object.
 * @param libPath  The full path to the library file directory.
 * @return True if successfully read.
 */

bool ExtractData(FusionCHPLegacyData *legchp, const char *libPath)
{
    if (legchp->GetHeader().GetAssayType() != FusionExpression)
        return false;

    // The chip type is stored in the header of the CHP file.
    string chipType = StringUtils::ConvertWCSToMBS(legchp->GetHeader().GetChipType());

    // The probe set names are stored in either the PSI or CDF file. Use the
    // FusionPSIFile class in the Fusion SDK to parse the PSI file.
    FusionPSIFile psi;
    // Note: libPath must end with a path separator for this concatenation to form a valid path.
    string psiFile = libPath + chipType + ".psi";
    psi.SetFileName(psiFile.c_str());
    if (psi.Read() == false)
        return false;

    // Now loop over the probe sets to get the signal values.
    // Detection call and p-values are also available from the
    // psResults object. The probe set name comes from the PSI file.

    float signal;
    string name;
    FusionExpressionProbeSetResults psResults;
    int n = legchp->GetHeader().GetNumProbeSets();
    for (int i = 0; i < n; i++)
    {
        legchp->GetExpressionResults(i, psResults);
        signal = psResults.GetSignal();
        name = psi.GetProbeSetName(i);
    }
    return true;
}

/*! This function extracts the signal value and probe set name from the CHP file.
 * @param sigchp  The CHP file object.
 * @return True if successfully read.
 */

bool ExtractData(FusionCHPQuantificationData *sigchp)
{
    // Now loop over the probe sets to get the signal values.
    // The probe set names are stored in the CHP file (or you can get them from the PSI file).

    float signal;
    string name;
    ProbeSetQuantificationData psResults;
    int n = sigchp->GetEntryCount();
    for (int i = 0; i < n; i++)
    {
        sigchp->GetQuantificationEntry(i, psResults);
        signal = psResults.quantification;
        name = psResults.name;
    }
    return true;
}

/*! This will read the CHP file, determine the type and extract the results.
 * @param fileName The full path to the CHP file.
 * @param libPath The full path to the library file directory.
 * @return True if successfully read.
 */

bool ReadCHPFile(const char *fileName, const char *libPath)
{

    // Read the CHP file. This function will read any type of CHP file whose parsers (from Fusion)
    // have been compiled and linked into the program.

    FusionCHPData *chp = FusionCHPDataReg::Read(fileName);
    if (chp == NULL)
        return false;

    // The following function will determine if the CHP file read contains "legacy" format data. This
    // can be either a GCOS/XDA file or a Command Console file. The "legacy" format data is that
    // which contains a signal, detection call, detection p-value, probe pairs, probe pairs used and
    // comparison results if a comparison analysis was performed. This will be from the MAS5 algorithm.

    // Note: The file may also contain genotyping results from the GTYPE software. The ExtractData function
    // will perform a check to ensure it is an expression CHP file.
    bool status = false;
    FusionCHPLegacyData *legchp = FusionCHPLegacyData::FromBase(chp);
    if (legchp != NULL)
    {
        status = ExtractData(legchp, libPath);
        delete legchp;
        return status;
    }

    // The following function will determine if the CHP file read is a "quantification" type file. This file contains
    // the probe set name and the quantification value. This will be a Command Console format
    // file with RMA or PLIER results.
    FusionCHPQuantificationData *sigchp = FusionCHPQuantificationData::FromBase(chp);
    if (sigchp != NULL)
    {
        status = ExtractData(sigchp);
        delete sigchp;
        return status;
    }

    // The CHP file was read, but not of the type we wanted.
    delete chp;
    return false;
}

