The Affymetrix GTYPE, BRLMM Analysis Tool and Genotyping Console genotyping
software packages store the probe set summarizations (allele call, confidence,
etc.) from the MPAM, Dynamic Model, BRLMM, and other algorithms in a binary
file with a .CHP extension. The newer BRLMM Analysis Tool and Genotyping Console software
also include the algorithm name as part of the CHP file name. This file
is referred to as the CHP ("chip") file. The Affymetrix genotyping software
supports two main formats of the CHP file: the GCOS/XDA format (generated by
GTYPE and BRLMM Analysis Tool) and the Command Console format (generated by
BRLMM Analysis Tool and Genotyping Console). Documentation on these file
formats is available on the DevNet section of the Affymetrix web site.
Note that the older CHP format (GCOS/XDA) stores only the analysis results,
not the probe set names. To obtain the associated probe set names you will need
to read either the CDF or the PSI file (both are library files).
The PSI file is an ASCII text file containing the probe set name and the number
of probe pairs for each probe set. It is smaller and easier to parse than the
CDF file. The CDF file contains, in addition to the probe set names, the list
of associated features (X/Y feature coordinates on the array and other
attributes) for each probe set. The CDF file may be either ASCII or binary.
The PSI and CDF files are named using the array type (also known as chip type)
with a .PSI or .CDF extension. The array/chip type is stored in the header of
the CHP file. Given the full path to the CHP file and the full path of the library
directory you can determine the CHP file's associated PSI/CDF file.
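As a minimal sketch of that lookup, the helper below joins the library directory and the chip type into a library file path. The function name BuildLibraryFilePath and the array type used in the usage note are illustrative only and are not part of the Fusion SDK. The sketch assumes the library file name is exactly the chip type followed by the extension.

```cpp
#include <string>

// Hypothetical helper (not part of the Fusion SDK): builds the path of the
// library file for a chip type, i.e. "<libDir>/<chipType>.<extension>".
// The extension is passed without the leading dot ("psi" or "cdf").
std::string BuildLibraryFilePath(const std::string &libDir,
                                 const std::string &chipType,
                                 const std::string &extension)
{
    std::string path = libDir;
    // Add a separator only if the directory does not already end with one.
    if (!path.empty())
    {
        const char last = path[path.size() - 1];
        if (last != '/' && last != '\\')
            path += '/';
    }
    return path + chipType + "." + extension;
}
```

For example, BuildLibraryFilePath("/data/lib", "Mapping250K_Nsp", "psi") yields "/data/lib/Mapping250K_Nsp.psi". On a case-sensitive file system the case of the extension must match the file on disk.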
The order of the probe sets is the same in the PSI file, the CDF file and the
XDA format CHP file. You can therefore open each file and use the index of a
probe set to join its data across the files.
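The index-based join can be sketched without the SDK as follows. The CallResult and NamedResult structures and the JoinByIndex function are stand-ins invented for this illustration; in a real program the names would come from the PSI or CDF parser and the results from the CHP parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for one XDA CHP result (hypothetical, not an SDK type).
struct CallResult
{
    int call;          // allele call code
    float confidence;  // confidence value of the call
};

// A probe set name paired with its result.
struct NamedResult
{
    std::string name;
    CallResult result;
};

// Joins names (in PSI/CDF order) with results (in CHP order) by index.
// Returns an empty vector when the sizes differ, which most likely means
// the library file does not belong to this CHP file.
std::vector<NamedResult> JoinByIndex(const std::vector<std::string> &names,
                                     const std::vector<CallResult> &results)
{
    std::vector<NamedResult> joined;
    if (names.size() != results.size())
        return joined;
    joined.reserve(names.size());
    for (std::size_t i = 0; i < names.size(); ++i)
    {
        NamedResult r;
        r.name = names[i];
        r.result = results[i];
        joined.push_back(r);
    }
    return joined;
}
```

Checking the sizes first is a cheap guard against pairing a CHP file with the library file of a different array type.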
The newer Command Console format CHP files do contain the probe set names (also
known as the SNPID or Probe Set ID in the software). This format allows for
multiple tables of data, so control probe set results can be stored separately
from the SNP probe set results. Because of this, the order of the probe sets
in the CDF file may not match that of the CHP file.
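Since the order may differ, data from a Command Console CHP table is more safely matched to library file data by probe set name than by index. The following is a minimal sketch of such a lookup; BuildNameIndex and FindRow are hypothetical helpers, not SDK functions, and the names are assumed to have been read from the CHP file already.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

typedef std::map<std::string, std::size_t> NameIndex;

// Builds a name -> row index lookup for one table of a CHP file.
// If a name occurs more than once, the last row wins.
NameIndex BuildNameIndex(const std::vector<std::string> &chpNames)
{
    NameIndex index;
    for (std::size_t i = 0; i < chpNames.size(); ++i)
        index[chpNames[i]] = i;
    return index;
}

// Looks up the row of a probe set name. Returns false (and leaves row
// untouched) when the name is not present in the table.
bool FindRow(const NameIndex &index, const std::string &name, std::size_t &row)
{
    NameIndex::const_iterator it = index.find(name);
    if (it == index.end())
        return false;
    row = it->second;
    return true;
}
```

Building the map once and querying it for each probe set of interest avoids a linear scan of the table per lookup.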
Parsers in the form of C++ and Java source code are available from Affymetrix
to parse the CHP, CDF and PSI files. These parsers, along with sample code and
documentation, are contained within the Fusion SDK.
The classes and interfaces in the Fusion SDK can parse the different types of
CHP files, including genotyping, expression and tiling CHP files. The
FusionCHPLegacyData class provides support for XDA format CHP files, and the
FusionCHPMultiDataData class provides support for Command Console format CHP
files.
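A program that accepts either format has to pick the handler that matches the concrete type of the file object it was given. The sketch below shows that pattern with dynamic_cast on stand-in classes. ChpBase, LegacyChp, MultiDataChp and DescribeChp are invented for this illustration and are not the Fusion SDK types; whether the SDK implements its own type checks this way is an assumption.

```cpp
#include <string>

// Stand-in class hierarchy for illustration only (not the Fusion SDK types).
class ChpBase
{
public:
    virtual ~ChpBase() {}
};
class LegacyChp : public ChpBase {};     // plays the role of an XDA CHP object
class MultiDataChp : public ChpBase {};  // plays the role of a Command Console CHP object

// Returns a label for the concrete type behind a base pointer.
// dynamic_cast yields a null pointer when the object is not of the
// requested type (or when chp itself is null).
std::string DescribeChp(ChpBase *chp)
{
    if (dynamic_cast<LegacyChp *>(chp) != 0)
        return "GCOS/XDA";
    if (dynamic_cast<MultiDataChp *>(chp) != 0)
        return "Command Console";
    return "unsupported";
}
```

The final fallback matters in practice: a reader that supports several file types can return an object that is valid but is neither of the two types the caller handles.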
The following is an example of C++ code using the Fusion SDK to extract the
genotyping results from a CHP (GCOS/XDA or Command Console) file. The top level
function is ReadCHPFile.
#include "FusionCHPData.h"          // header file for reading CHP files
#include "FusionCHPLegacyData.h"    // XDA CHP files
#include "FusionCHPMultiDataData.h" // Command Console CHP files
#include "StringUtils.h"
#include <string>
using namespace std;
using namespace affymetrix_calvin_utilities;
using namespace affymetrix_calvin_data;
using namespace affymetrix_fusion_io;
using namespace affymetrix_calvin_io;
/*! This function extracts the results from the CHP file and the probe set
 *  names from the PSI file.
 * @param legchp The CHP file object.
 * @param libPath The full path to the library file directory, including a
 *        trailing path separator.
 * @return True if successfully read.
 */
bool ExtractData(FusionCHPLegacyData *legchp, const char *libPath)
{
    if (legchp->GetHeader().GetAssayType() != FusionGenotyping)
        return false;

    // The chip type is stored in the header of the CHP file.
    string chipType = StringUtils::ConvertWCSToMBS(legchp->GetHeader().GetChipType());

    // The probe set names are stored in either the PSI or CDF file. Use the
    // FusionPSIFile class in the Fusion SDK to parse the PSI file.
    // libPath is concatenated as-is, so it must end with a path separator.
    FusionPSIFile psi;
    string psiFile = libPath + chipType + ".psi";
    psi.SetFileName(psiFile.c_str());
    if (psi.Read() == false)
        return false;

    // Now loop over the probe sets to get the call and confidence values.
    // p-values are also available from the psResults object.
    // The probe set name comes from the PSI file.
    // Note that the call can be one of the following constants defined in
    // CHPFileData.h:
    // ALLELE_A_CALL, ALLELE_AB_CALL, ALLELE_B_CALL, ALLELE_NO_CALL
    float conf;
    u_int8_t call;
    string name;
    FusionGenotypeProbeSetResults psResults;
    int n = legchp->GetHeader().GetNumProbeSets();
    for (int i = 0; i < n; i++)
    {
        legchp->GetGenotypingResults(i, psResults);
        conf = psResults.GetConfidence();
        call = psResults.GetCall();
        name = psi.GetProbeSetName(i);
        // Use conf, call and name here (for example, store or print them).
    }
    return true;
}
/*! This function extracts the results from the CHP file.
 * @param mchp The CHP file object.
 * @return True if successfully read.
 */
bool ExtractData(FusionCHPMultiDataData *mchp)
{
    // Multiple tables of results are stored in the multi data CHP file.
    // The SNP analysis results are stored in the "genotype" table.
    int n = mchp->GetEntryCount(GenotypeMultiDataType);

    // Now loop over the probe sets to get the call and confidence values.
    // Other values, such as the contrast and strength for the BRLMM
    // algorithm, are also stored. Use the GetGenotypeEntry function to
    // retrieve all of the result columns for a given SNP.
    float conf;
    u_int8_t call;
    string name;
    for (int i = 0; i < n; i++)
    {
        call = mchp->GetGenoCall(GenotypeMultiDataType, i);
        conf = mchp->GetGenoConfidence(GenotypeMultiDataType, i);
        name = mchp->GetProbeSetName(GenotypeMultiDataType, i);
        // Use conf, call and name here (for example, store or print them).
    }
    return true;
}
/*! This will read the CHP file, determine the type and extract the results.
 * @param fileName The full path to the CHP file.
 * @param libPath The full path to the library file directory.
 * @return True if successfully read.
 */
bool ReadCHPFile(const char *fileName, const char *libPath)
{
    // Read the CHP file. This function will read any type of CHP file whose
    // parsers (from Fusion) have been compiled and linked into the program.
    FusionCHPData *chp = FusionCHPDataReg::Read(fileName);
    if (chp == NULL)
        return false;

    // Determine if the CHP file read is of the GCOS/XDA format. The "legacy"
    // format data is that which contains call, confidence, p-value, RAS1 and
    // RAS2 results.
    // Note: The XDA file format also provides for storage of expression
    // results. The ExtractData function performs an additional check to
    // ensure it is a genotyping CHP file.
    bool status = false;
    FusionCHPLegacyData *legchp = FusionCHPLegacyData::FromBase(chp);
    if (legchp != NULL)
    {
        status = ExtractData(legchp, libPath);
        delete legchp;
        return status;
    }

    // Determine if the CHP file read is of the Command Console format.
    // The library path does not need to be passed into the function as the
    // probe set names (SNPIDs) are stored in the CHP file.
    FusionCHPMultiDataData *mchp = FusionCHPMultiDataData::FromBase(chp);
    if (mchp != NULL)
    {
        status = ExtractData(mchp);
        delete mchp;
        return status;
    }

    // The CHP file was read, but is not of a type we wanted/expected. This
    // will happen when other CHP file parsers are compiled and linked into
    // your application.
    delete chp;
    return false;
}