|
Update to convention regarding probe and interval coordinates produced by Affymetrix® Tiling Analysis Software (TAS)

GeneChip® Tiling Arrays use a library file called the BPMAP (binary probe map) to associate the physical location of each probe on an array with the position to which that probe maps on a genome of interest. The convention used in the BPMAP to describe the genome location of a probe is the zero-indexed left end of the probe. Thus a probe formed of the first 25 bases of a particular chromosome is described as mapping to position 0 on that chromosome, and a probe mapping to the last 25 bases of a chromosome of length L is described as mapping to position L-25. This convention has been used in all versions of BPMAP files and there is no plan to change it.

The Tiling Analysis Software (TAS) uses the BPMAP to associate experimental probe intensities with genomic positions. One of the analysis workflows provided by TAS derives a p-value or estimated signal for each probe coordinate, and these results are written to a file in the BAR (binary array) format. The BAR file format does not specify a convention for recording the location of a probe, and BAR files written by TAS to date have used the same convention as the BPMAP file. However, a change is being made in TAS: from version 1.0.14 onwards, probe locations will be recorded as the 0-indexed center of the probe. For example, a probe formed of the first 25 bases of a chromosome would be assigned coordinate 12, and a probe formed of the last 25 bases of a chromosome of length L would be assigned coordinate L-13 (see Figure 1).
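The relationship between the two probe conventions can be sketched in a few lines of Python. This is an illustration only, not code from TAS: the function names are hypothetical, the 25-base probe length is taken from the examples above, and rounding down for even-length probes is an assumption that the text does not address.

```python
PROBE_LENGTH = 25  # probe length used in the examples in the text


def probe_center(left_end: int, probe_length: int = PROBE_LENGTH) -> int:
    """Convert a 0-indexed left-end coordinate (BPMAP convention) to the
    0-indexed center of the probe (BAR convention from TAS 1.0.14 onwards).

    Assumption: for an even probe length the lower of the two middle
    bases is used; the source only describes 25-base probes.
    """
    return left_end + (probe_length - 1) // 2


def probe_left_end(center: int, probe_length: int = PROBE_LENGTH) -> int:
    """Inverse of probe_center: recover the 0-indexed left end."""
    return center - (probe_length - 1) // 2


if __name__ == "__main__":
    chrom_length = 1000  # hypothetical chromosome length L
    print(probe_center(0))                  # first 25 bases
    print(probe_center(chrom_length - 25))  # last 25 bases
```

For a chromosome of length L, the first call corresponds to coordinate 12 and the second to L-13, matching the worked examples in the text.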
Figure 1: Illustration of various conventions which could be used to describe positions of probes in a hypothetical sequence of length 60 bases. The conventions used by TAS are indicated.

TAS also provides a workflow in which the values in a BAR file are thresholded, and runs of consecutive probes whose scores exceed a particular threshold are grouped together to define a region of significant expression or fold change. These regions are written to a BED file, which by convention uses a 0-indexed value for the first base of a region and a 1-indexed value for the last. In previous versions of TAS the convention used to define the region boundaries was inherited directly from the BPMAP file convention, so the first base of a derived region was taken to be the start of the first probe in the region and the last base of a derived region was the start of the last probe in the region. A change to this convention is being made in TAS: from version 1.0.14 onwards, the first base of a region will be defined as the center base of the first probe in the region, and the last base of a region will be the center base of the last probe in the region.
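The effect of the change on region boundaries can be illustrated with a short Python sketch. It is not taken from TAS: the function name, the `use_center` flag, and the probe positions in the usage lines are hypothetical, and it assumes 25-base probes and that the 1-indexed last base equals the 0-indexed last base plus one, as the BED convention described above implies.

```python
from typing import Sequence, Tuple


def bed_interval(
    left_ends: Sequence[int],
    probe_length: int = 25,
    use_center: bool = True,
) -> Tuple[int, int]:
    """Return (start, end) for a region formed by a run of probes.

    left_ends  : 0-indexed left ends of the probes in the run (BPMAP style).
    use_center : True  -> boundaries at probe centers (TAS 1.0.14 onwards),
                 False -> boundaries at probe starts (earlier versions).

    start is the 0-indexed first base; end is the 1-indexed last base.
    """
    if not left_ends:
        raise ValueError("a region needs at least one probe")
    offset = (probe_length - 1) // 2 if use_center else 0
    first_base = min(left_ends) + offset  # 0-indexed
    last_base = max(left_ends) + offset   # 0-indexed
    return first_base, last_base + 1      # end converted to 1-indexed


if __name__ == "__main__":
    run = [10, 30]  # hypothetical left ends of two consecutive probes
    print(bed_interval(run, use_center=False))  # earlier convention
    print(bed_interval(run, use_center=True))   # convention from 1.0.14
```

Under these assumptions the two conventions give intervals of the same width; the newer one is simply shifted downstream by 12 bases.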
Figure 2: Possible conventions which can be used to define regions formed from consecutive runs of probes. In this example a region is formed of a run of two probes. Note that whichever convention is used, the BED format requires that the coordinate of the first base in the region is 0-indexed and the coordinate of the last base of the region is 1-indexed. The conventions used by TAS are indicated.

Note that the above changes in conventions used in creation of BAR and BED
files by TAS will change only the coordinates with which results are associated;
the results themselves (ROCs, signals and p-values) do not change. The changes
in convention are implemented solely in the TAS software and do not require
any changes in library files. BAR and BED files created according to the new
conventions will have information embedded within their tag-value pairs to indicate
the convention and version of TAS that was used in their creation.
|