Affymetrix® Genotyping Console CYCHP Data File Format

CYCHP FILES

Description

CYCHP files contain analysis results generated by the Chromosome Analysis Suite (CHAS).

Format

The format of the CYCHP files generated by the CHAS software uses the Command Console binary data format.

The data type identifier is set to: "affymetrix-multi-data-type-analysis"

The parameters stored in the header of the file include algorithm parameters (those whose names are prefixed with "affymetrix-algorithm-param-") and summary statistics (those whose names are prefixed with "affymetrix-chipsummary-"). These parameters are algorithm specific.
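As a minimal sketch of how these two kinds of header parameters might be separated, the following assumes the parameters have already been read into a plain dictionary of name/value pairs. The `split_header_params` helper is hypothetical and is not part of any Affymetrix API; only the two prefixes come from this document.

```python
ALGORITHM_PREFIX = "affymetrix-algorithm-param-"
SUMMARY_PREFIX = "affymetrix-chipsummary-"


def split_header_params(params):
    """Split header parameters into algorithm parameters and summary statistics.

    `params` is assumed to be a dict of parameter name -> value that some
    reader has already extracted from the file header (hypothetical input).
    The prefix is stripped from each returned key; parameters with neither
    prefix are returned separately and unchanged.
    """
    algorithm, summary, other = {}, {}, {}
    for name, value in params.items():
        if name.startswith(ALGORITHM_PREFIX):
            algorithm[name[len(ALGORITHM_PREFIX):]] = value
        elif name.startswith(SUMMARY_PREFIX):
            summary[name[len(SUMMARY_PREFIX):]] = value
        else:
            other[name] = value
    return algorithm, summary, other
```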

Analysis results are retained in four data groups. Each data group contains one or more data sets as shown below.

Data Group Name Description Number of Data Sets
Chromosomes Summary analysis results for each chromosome 1
ProbeSets Analysis results for each probeset 2
AlgorithmData Data produced during algorithm development 1
Segments Analysis results for the segments found in the CYCHP file 5

Each data set header will contain a set of parameters to define the column labels for the data set.
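The overall layout can be written down as a small lookup table, which is one way to check that a file contains every expected data set. This is a sketch only: the group names are spelled as in the section headings of this document, the data set names are those given in the per-group descriptions, and `missing_data_sets` is a hypothetical helper that expects the names found in a file to be supplied by whatever reader is in use.

```python
# Data group -> data sets, as described in this document.
EXPECTED_LAYOUT = {
    "Chromosomes": ["Summary"],
    "ProbeSets": ["CopyNumber", "AllelePeaks"],
    "AlgorithmData": ["MarkerABSignal"],
    "Segments": ["CN", "LOH", "CNNeutralLOH", "NormalDiploid", "Mosaicism"],
}


def missing_data_sets(found):
    """Return the (group, data set) pairs that are expected but absent.

    `found` maps each data group name present in a file to an iterable of
    its data set names (hypothetical input from some file reader).
    """
    missing = []
    for group, data_sets in EXPECTED_LAYOUT.items():
        present = set(found.get(group, ()))
        for name in data_sets:
            if name not in present:
                missing.append((group, name))
    return missing
```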

The following describes the data groups and data sets stored in the file.

Chromosomes Data Group

The Chromosomes data group consists of a single data set.

The Summary data set contains summary information for each chromosome. This data set will contain the following columns:

Column Name Column Type Description
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
Display Ascii Text label for the chromosome
StartIndex UInt Zero-based reference to the row in the CopyNumber data set of the ProbeSets data group containing results for this chromosome
MarkerCount UInt The number of markers in the chromosome
MinSignal Float Minimum log2 ratio value found in the chromosome*
MaxSignal Float Maximum log2 ratio value found in the chromosome*
MedianCNState Float Median calibrated log2 ratio
HomFrequency Float Frequency of Homozygosity
HetFrequency Float Frequency of Heterozygosity
Mosaicism Float Median mosaicism mixture value
LOH Float Proportion of genomic distance of LOH calls per chromosome

* The log2 ratios are trimmed log2 ratios by default, but the values might go beyond the trimmed values due to the GC correction and/or wave correction steps.
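The Chromosome codes and the StartIndex/MarkerCount pair can be illustrated with a short sketch. The code mapping below is hard-coded from the column description; a real reader might prefer the display values stored in the data set header. The slicing helper assumes that the CopyNumber rows for a chromosome are contiguous and that MarkerCount equals the number of those rows, which this document does not state explicitly. Both function names are hypothetical.

```python
NO_VALUE = 255
SPECIAL_CHROMOSOMES = {24: "X", 25: "Y", 26: "MT"}


def chromosome_label(code):
    """Map a stored Chromosome UByte value to its display label.

    Returns None for the "no value" code (255).
    """
    if code == NO_VALUE:
        return None
    if 1 <= code <= 22:
        return str(code)
    if code in SPECIAL_CHROMOSOMES:
        return SPECIAL_CHROMOSOMES[code]
    raise ValueError(f"unexpected chromosome code: {code}")


def rows_for_chromosome(copy_number_rows, start_index, marker_count):
    """Select the CopyNumber rows belonging to one chromosome.

    Assumes the rows are contiguous, starting at the zero-based StartIndex
    and spanning MarkerCount rows.
    """
    return copy_number_rows[start_index:start_index + marker_count]
```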

ProbeSets Data Group

The ProbeSets data group consists of two data sets, containing analysis results specific to the probesets. The data sets included in this group are:

  • CopyNumber
  • AllelePeaks

The CopyNumber data set will contain the following columns:

Column Name Column Type Description
ProbeSetName Ascii The name of the probeset
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
Position UInt Genomic position in the chromosome
Log2Ratio Float Log2 ratio (signal / median of Reference signal)
WeightedLog2Ratio Float Running median of log2 ratios for n markers (default: n = 5)
SmoothSignal Float Each value is the result of a kernel smooth over a window of genomically contiguous log2 ratios. By default this is converted to a copy number value using a log-linear model that fits each CN state to its expected log2 ratio. This is useful for seeing mosaics.
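The Log2Ratio and WeightedLog2Ratio definitions can be sketched as follows. This is an illustration of the arithmetic only, not the CHAS implementation: the document does not say how the running-median window is aligned or how the ends of a chromosome are handled, so the centered window and the truncation at the edges are assumptions, and the function names are hypothetical.

```python
import math
from statistics import median


def log2_ratio(signal, reference_median):
    """Log2 of the sample signal over the median reference signal."""
    return math.log2(signal / reference_median)


def running_median(values, n=5):
    """Centered running median over a window of n markers.

    Assumption: windows are truncated at both ends of the sequence rather
    than padded. n must be odd so that the window is symmetric.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    half = n // 2
    return [
        median(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]
```

With these assumptions, a single outlying marker in an otherwise flat run is suppressed by the default window of five markers, which is the usual reason for smoothing log2 ratios this way.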

The AllelePeaks data set houses data summarizing allele patterns and contains the following columns:

Column Name Column Type Description
ProbesetName Ascii The name of the probeset
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
Position UInt Genomic position in the chromosome
AllelePeaks0 Float First three encoded allele peak values
AllelePeaks1 Float Next three encoded allele peak values

AlgorithmData Data Group

The AlgorithmData data group contains data produced during algorithm development. It is not intended for external use and is not read or used by the Chromosome Analysis Suite User Interface (the Browser). The data is subject to change and/or removal at any time in future releases of APT and/or Chromosome Analysis Suite.

There is one data set in the AlgorithmData data group called MarkerABSignal.

The MarkerABSignal data set will contain the following columns:

Column Name Column Type Description
Index UInt The index of the probeset in the CopyNumber data set of the ProbeSets data group
ASignal Float A Signal value from PLIER
BSignal Float B Signal value from PLIER
SCAR Float Scaled centered allelic ratio values, LOH metric

Segments Data Group

The Segments data group contains five data sets. The data sets included in this data group are:

  • CN
  • LOH
  • CNNeutralLOH
  • NormalDiploid
  • Mosaicism

The CN data set will contain the following columns:

Column Name Column Type Description
SegmentID UInt Segment identifier unique to the CYCHP file
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
StartPosition UInt Start genomic position of the segment
StopPosition UInt Ending genomic position of the segment
MarkerCount Int Number of markers in the segment
MeanMarkerDistance UInt Mean distance between markers in the segment
State Float Copy Number State
Confidence Float Indicator score for non-normal copy number
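To make the column types concrete, the following sketch decodes one CN row from raw bytes with Python's `struct` module. It rests on assumptions that should be checked against the Command Console format documentation: values are taken to be big-endian, UInt, Int and Float are taken to be 4 bytes each, UByte 1 byte, and rows are taken to be packed without padding. The `CNSegment` type and `unpack_cn_row` function are hypothetical names.

```python
import struct
from typing import NamedTuple

# Assumed row layout: big-endian, no padding.
# I = UInt, B = UByte, i = Int, f = Float, in the column order listed above.
CN_ROW = struct.Struct(">IBIIiIff")


class CNSegment(NamedTuple):
    segment_id: int
    chromosome: int
    start_position: int
    stop_position: int
    marker_count: int
    mean_marker_distance: int
    state: float
    confidence: float


def unpack_cn_row(buffer, offset=0):
    """Decode one CN data set row starting at `offset` in `buffer`."""
    return CNSegment(*CN_ROW.unpack_from(buffer, offset))
```

Under these assumptions each row occupies `CN_ROW.size` bytes, so row k of a data set would start at k times that size from the start of the row data.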

The LOH data set will contain the following columns:

Column Name Column Type Description
SegmentID UInt Segment identifier unique to the CYCHP file
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
StartPosition UInt Start genomic position of the segment
StopPosition UInt Ending genomic position of the segment
MarkerCount Int Number of markers in the segment
MeanMarkerDistance UInt Mean distance between markers in the segment
LOH UByte The value of this column is 1 when a Loss of Heterozygosity is found, 0 when not found.
Confidence Float The ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models.
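The LOH Confidence definition is a simple normalized ratio, sketched below. How the two model probabilities are obtained from the SCAR measurements is not described in this document, so they are taken as given inputs; the function name is hypothetical.

```python
def loh_confidence(p_loh, p_non_loh):
    """Probability under the LOH model divided by the sum over both models.

    `p_loh` and `p_non_loh` are the probabilities of the SCAR measurements
    under the LOH and non-LOH models respectively (assumed to be supplied).
    """
    total = p_loh + p_non_loh
    if total <= 0:
        raise ValueError("probabilities must sum to a positive value")
    return p_loh / total
```

A value near 1 means the LOH model explains the measurements much better than the non-LOH model, and a value of 0.5 means the two models are equally likely.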

The CNNeutralLOH data set will contain the following columns:

Column Name Column Type Description
SegmentID UInt Segment identifier unique to the CYCHP file
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
StartPosition UInt Start genomic position of the segment
StopPosition UInt Ending genomic position of the segment
MarkerCount Int Number of markers in the segment
MeanMarkerDistance UInt Mean distance between markers in the segment
CNNeutralLOH UByte The value of this column is 1 when CN State is 2 and LOH is 1. Otherwise, the value of this column is 0.
Confidence Float The ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models. 

The NormalDiploid data set will contain the following columns:

Column Name Column Type Description
SegmentID UInt Segment identifier unique to the CYCHP file
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
StartPosition UInt Start genomic position of the segment
StopPosition UInt Ending genomic position of the segment
MarkerCount Int Number of markers in the segment
MeanMarkerDistance UInt Mean distance between markers in the segment
NormalDiploid UByte The value of this column is 1 when CN State is 2 and LOH is 0. Otherwise, the value of this column is 0.
Confidence Float Confidence
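The CNNeutralLOH and NormalDiploid flags follow directly from the CN State and LOH values, as the column descriptions state. The sketch below restates the two rules; the function names are hypothetical, and how CN and LOH segments are aligned before the rules are applied is not covered here.

```python
def cn_neutral_loh_flag(cn_state, loh):
    """Return 1 when CN State is 2 and LOH is 1, otherwise 0."""
    return 1 if cn_state == 2 and loh == 1 else 0


def normal_diploid_flag(cn_state, loh):
    """Return 1 when CN State is 2 and LOH is 0, otherwise 0."""
    return 1 if cn_state == 2 and loh == 0 else 0
```

For a region with CN State 2 exactly one of the two flags is set, and for any other CN State both are 0.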

The Mosaicism data set will contain the following columns:

Column Name Column Type Description
SegmentID UInt Segment identifier unique to the CYCHP file
Chromosome UByte The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value)
StartPosition UInt Start genomic position of the segment
StopPosition UInt Ending genomic position of the segment
MarkerCount Int Number of markers in the segment
MeanMarkerDistance UInt Mean distance between markers in the segment
Mosaicism UByte More than one CN call found in the segment 
Confidence Float Proportion of markers that are above or below the thresholds required to make a CN change call for a running median segment size of 251.
State Float CN State
Mixture Float Estimated mosaicism mixture (typically one of -1, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 1.0)