Affymetrix® Genotyping Console CYCHP Data File Format
CYCHP FILES
Description
CYCHP files contain analysis results generated by the Chromosome Analysis Suite (CHAS).
Format
The format of the CYCHP files generated by the CHAS software uses the Command Console binary data format.
The data type identifier is set to: "affymetrix-multi-data-type-analysis"
The parameters stored in the header of the file include algorithm parameters (those whose names are prefixed with "affymetrix-algorithm-param-") and summary statistics (those whose names are prefixed with "affymetrix-chipsummary-"). These parameters are algorithm specific.
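
The two prefixes above are enough to sort header parameters into their categories. The following is a minimal sketch, assuming the header parameters have already been read into a plain dict of name to value by some Command Console reader; the function name `split_header_parameters` and the dict representation are illustrative assumptions, not part of the format.

```python
ALGORITHM_PARAM_PREFIX = "affymetrix-algorithm-param-"
CHIP_SUMMARY_PREFIX = "affymetrix-chipsummary-"


def split_header_parameters(params):
    """Split header parameters into (algorithm, summary, other) dicts.

    `params` is assumed to be a dict of parameter name -> value that a
    reader has already extracted from the file header (hypothetical input).
    The prefix is stripped from the keys of the first two dicts.
    """
    algorithm, summary, other = {}, {}, {}
    for name, value in params.items():
        if name.startswith(ALGORITHM_PARAM_PREFIX):
            algorithm[name[len(ALGORITHM_PARAM_PREFIX):]] = value
        elif name.startswith(CHIP_SUMMARY_PREFIX):
            summary[name[len(CHIP_SUMMARY_PREFIX):]] = value
        else:
            other[name] = value
    return algorithm, summary, other
```

Parameters that carry neither prefix are kept in a third dict rather than dropped, since the header may hold other entries.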
Analysis results are retained in four data groups. Each data group contains one or more data sets as shown below.
| Data Group Name | Description | Number of Data Sets |
|---|---|---|
| Chromosomes | Summary analysis results for each chromosome | 1 |
| ProbeSets | Analysis results for each probeset | 2 |
| AlgorithmData | Data produced during algorithm development | 1 |
| Segments | Analysis results for the segments found in the CYCHP file | 5 |
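
As a quick reference, the layout in the table can be written down as a mapping and used to check what a reader found in a file. This is only a sketch: the data set names are taken from the sections of this document, the exact strings stored in a file are assumed to match them, and `missing_data_sets` is a hypothetical helper.

```python
# Data group -> data set names, as described in this document.
EXPECTED_LAYOUT = {
    "Chromosomes": ["Summary"],
    "ProbeSets": ["CopyNumber", "AllelePeaks"],
    "AlgorithmData": ["MarkerABSignal"],
    "Segments": ["CN", "LOH", "CNNeutralLOH", "NormalDiploid", "Mosaicism"],
}


def missing_data_sets(found):
    """Return {group: [missing data set names]}.

    `found` is assumed to map each data group name read from a file to an
    iterable of the data set names it contains.
    """
    missing = {}
    for group, expected in EXPECTED_LAYOUT.items():
        present = set(found.get(group, ()))
        absent = [name for name in expected if name not in present]
        if absent:
            missing[group] = absent
    return missing
```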
Each data set header will contain a set of parameters to define the column labels for the data set.
The following describes the data groups and data sets stored in the file.
Chromosomes Data Group
The Chromosomes data group consists of a single data set, named Summary.
The Summary data set contains summary information for each chromosome. This data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| Display | Ascii | Text label for the chromosome |
| StartIndex | UInt | Zero-based reference to the row in the CopyNumber data set of the ProbeSets data group containing results for this chromosome |
| MarkerCount | UInt | The number of markers in the chromosome |
| MinSignal | Float | Minimum log2 ratio value found in the chromosome* |
| MaxSignal | Float | Maximum log2 ratio value found in the chromosome* |
| MedianCNState | Float | Median calibrated log2 ratio |
| HomFrequency | Float | Frequency of Homozygosity |
| HetFrequency | Float | Frequency of Heterozygosity |
| Mosaicism | Float | Median mosaicism mixture value |
| LOH | Float | Proportion of genomic distance of LOH calls per chromosome |
* The log2 ratios are trimmed by default, but the values might go beyond the trimmed range due to the GC correction and/or wave correction steps.
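
The Chromosome codes and the StartIndex column can be used together to pull out the per-marker rows for one chromosome. The sketch below assumes that rows are available as Python dicts keyed by column name, and that the CopyNumber rows for a chromosome are contiguous and number exactly MarkerCount; both are assumptions made for illustration, and the function names are hypothetical.

```python
NO_CHROMOSOME = 255  # stored value meaning "no value"
_SPECIAL_CODES = {24: "X", 25: "Y", 26: "MT"}


def chromosome_label(code):
    """Map a stored UByte chromosome code to its display value.

    Returns None for 255 (no value). Codes outside the documented set
    (1-22, 24, 25, 26, 255) raise ValueError.
    """
    if code == NO_CHROMOSOME:
        return None
    if 1 <= code <= 22:
        return str(code)
    if code in _SPECIAL_CODES:
        return _SPECIAL_CODES[code]
    raise ValueError(f"unexpected chromosome code: {code}")


def rows_for_chromosome(summary_row, copy_number_rows):
    """Slice the CopyNumber rows that belong to one Summary row.

    Assumes the rows are contiguous, starting at the zero-based StartIndex
    and spanning MarkerCount rows.
    """
    start = summary_row["StartIndex"]
    return copy_number_rows[start:start + summary_row["MarkerCount"]]
```

In a real file the display values are stored in the data set header, so a reader may prefer those over the hard-coded mapping used here.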
ProbeSets Data Group
The ProbeSets data group consists of two data sets containing analysis results specific to the probesets. The data sets included in this group are:
- CopyNumber
- AllelePeaks
The CopyNumber data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| ProbeSetName | Ascii | The name of the probeset |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| Position | UInt | Genomic position in the chromosome |
| Log2Ratio | Float | Log2 ratio (signal / median of Reference signal) |
| WeightedLog2Ratio | Float | Running median of log2 ratios for n markers (default: n = 5) |
| SmoothSignal | Float | Each value is the result of a kernel smooth over a window of genomically contiguous log2 ratios. By default this is converted to a copy number value using a log-linear model that fits each CN state to its expected log2 ratio. This is useful for seeing mosaics. |
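
The Log2Ratio and WeightedLog2Ratio descriptions can be illustrated with a small sketch. The table does not say how the running median window is positioned or how the ends of a chromosome are handled, so the centred window and the truncation at the edges below are assumptions, as are the function names; this is not the CHAS implementation.

```python
import math
from statistics import median


def log2_ratio(signal, reference_signals):
    """log2(signal / median of the reference signals)."""
    return math.log2(signal / median(reference_signals))


def running_median(values, n=5):
    """Running median over n markers (default n = 5).

    Assumption: the window is centred on each marker and is simply
    truncated where it would run past either end of the list.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    half = n // 2
    return [
        median(values[max(0, i - half):i + half + 1])
        for i in range(len(values))
    ]
```

A median is less sensitive to a single outlying marker than a mean over the same window, which is the usual reason for choosing it as a smoother.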
The AllelePeaks data set houses data summarizing allele patterns and contains the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| ProbesetName | Ascii | The name of the probeset |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| Position | UInt | Genomic position in the chromosome |
| AllelePeaks0 | Float | First three encoded allele peak values |
| AllelePeaks1 | Float | Next three encoded allele peak values |
AlgorithmData Data Group
The AlgorithmData data group contains data produced during algorithm development. It is not intended for external use and is not read or used by the Chromosome Analysis Suite User Interface (the Browser). The data is subject to change and/or removal at any time in future releases of APT and/or Chromosome Analysis Suite.
There is one data set in the AlgorithmData data group called MarkerABSignal.
The MarkerABSignal data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| Index | UInt | Index of the corresponding probeset row in the CopyNumber data set of the ProbeSets data group |
| ASignal | Float | A Signal value from PLIER |
| BSignal | Float | B Signal value from PLIER |
| SCAR | Float | Scaled centered allelic ratio values, LOH metric |
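
Because MarkerABSignal rows carry only an Index, a consumer has to look up the probeset in the CopyNumber data set. The sketch below assumes the Index is a zero-based row number into CopyNumber (by analogy with StartIndex in the Summary data set) and that rows are dicts keyed by column name; `attach_probeset_names` is a hypothetical helper.

```python
def attach_probeset_names(marker_rows, copy_number_rows):
    """Join MarkerABSignal rows to CopyNumber rows through the Index column.

    Assumes Index is a zero-based row number into `copy_number_rows`.
    """
    joined = []
    for row in marker_rows:
        probeset = copy_number_rows[row["Index"]]
        joined.append({
            "ProbeSetName": probeset["ProbeSetName"],
            "ASignal": row["ASignal"],
            "BSignal": row["BSignal"],
            "SCAR": row["SCAR"],
        })
    return joined
```

Since this data group is not intended for external use, any code built on it should expect the columns to change between releases.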
Segments Data Group
The Segments data group contains five data sets. The data sets included in this data group are:
- CN
- LOH
- CNNeutralLOH
- NormalDiploid
- Mosaicism
The CN data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| SegmentID | UInt | Segment identifier unique to the CYCHP file |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| StartPosition | UInt | Start genomic position of the segment |
| StopPosition | UInt | Ending genomic position of the segment |
| MarkerCount | Int | Number of markers in the segment |
| MeanMarkerDistance | UInt | Mean distance between markers in the segment |
| State | Float | Copy Number State |
| Confidence | Float | Indicator score for non-normal copy number |
The LOH data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| SegmentID | UInt | Segment identifier unique to the CYCHP file |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| StartPosition | UInt | Start genomic position of the segment |
| StopPosition | UInt | Ending genomic position of the segment |
| MarkerCount | Int | Number of markers in the segment |
| MeanMarkerDistance | UInt | Mean distance between markers in the segment |
| LOH | UByte | The value of this column is 1 when a Loss of Heterozygosity is found, 0 when not found. |
| Confidence | Float | The ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models. |
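
The Confidence description above amounts to a simple ratio. As a sketch, assuming the two model probabilities for a segment's SCAR measurements are already available (how they are computed is not described here, and `loh_confidence` is a hypothetical name):

```python
def loh_confidence(p_loh, p_non_loh):
    """Ratio of the LOH-model probability to the sum over both models.

    `p_loh` and `p_non_loh` are the probabilities of the SCAR measurements
    under the LOH and non-LOH models respectively.
    """
    total = p_loh + p_non_loh
    if total <= 0:
        raise ValueError("the two probabilities must sum to a positive value")
    return p_loh / total
```

The result lies between 0 and 1, with values near 1 meaning the LOH model explains the measurements much better than the non-LOH model.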
The CNNeutralLOH data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| SegmentID | UInt | Segment identifier unique to the CYCHP file |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| StartPosition | UInt | Start genomic position of the segment |
| StopPosition | UInt | Ending genomic position of the segment |
| MarkerCount | Int | Number of markers in the segment |
| MeanMarkerDistance | UInt | Mean distance between markers in the segment |
| CNNeutralLOH | UByte | The value of this column is 1 when CN State is 2 and LOH is 1. Otherwise, the value of this column is 0. |
| Confidence | Float | The ratio of the probability of the SCAR measurements under the LOH model to the sum of the probability under each of the LOH and non-LOH models. |
The NormalDiploid data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| SegmentID | UInt | Segment identifier unique to the CYCHP file |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| StartPosition | UInt | Start genomic position of the segment |
| StopPosition | UInt | Ending genomic position of the segment |
| MarkerCount | Int | Number of markers in the segment |
| MeanMarkerDistance | UInt | Mean distance between markers in the segment |
| NormalDiploid | UByte | The value of this column is 1 when CN State is 2 and LOH is 0. Otherwise, the value of this column is 0. |
| Confidence | Float | Confidence |
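
The CNNeutralLOH and NormalDiploid flags are both defined in terms of the CN State and the LOH value. A minimal sketch of those two rules, with hypothetical function names:

```python
def cn_neutral_loh_flag(cn_state, loh):
    """1 when CN State is 2 and LOH is 1, otherwise 0."""
    return 1 if cn_state == 2 and loh == 1 else 0


def normal_diploid_flag(cn_state, loh):
    """1 when CN State is 2 and LOH is 0, otherwise 0."""
    return 1 if cn_state == 2 and loh == 0 else 0
```

For a given region at most one of the two flags is 1, and both are 0 whenever the CN State is not 2.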
The Mosaicism data set will contain the following columns:
| Column Name | Column Type | Description |
|---|---|---|
| SegmentID | UInt | Segment identifier unique to the CYCHP file |
| Chromosome | UByte | The chromosome display values (1-22, X, Y, MT) are stored in the data set header. The values in this column are defined as 1-22, 24 (for X), 25 (for Y) and 26 (for MT) and 255 (for no value) |
| StartPosition | UInt | Start genomic position of the segment |
| StopPosition | UInt | Ending genomic position of the segment |
| MarkerCount | Int | Number of markers in the segment |
| MeanMarkerDistance | UInt | Mean distance between markers in the segment |
| Mosaicism | UByte | More than one CN call found in the segment |
| Confidence | Float | Proportion of markers that are above or below the thresholds required to make a CN change call for a running median segment size of 251. |
| State | Float | CN State |
| Mixture | Float | Estimated mosaicism mixture (typically one of -1, -0.7, -0.5, -0.3, 0, 0.3, 0.5, 0.7, 1.0) |