Affymetrix® BPMAP File Format
BPMAP FILE
Description
The BPMAP file contains information relating to the design of the Affymetrix tiling arrays.
Version 2 added the ability to store a version, group and parameters associated with each sequence item.
Version 3 added the ability to store perfect match probes in addition to probe pairs.
Format
The BPMAP file is a binary file with data stored in big-endian format. The sections are listed below in the order in which they appear in the file. Each section is defined in detail under Section Definitions.
File Header
Sequence Description for sequence #1
Sequence Description for sequence #2
...
Sequence Description for sequence #N
Sequence Header for sequence #1
Position Information for probe/probe pair #1 of sequence #1
Position Information for probe/probe pair #2 of sequence #1
...
Position Information for probe/probe pair #M of sequence #1
Sequence Header for sequence #2
Position Information for probe/probe pair #1 of sequence #2
Position Information for probe/probe pair #2 of sequence #2
...
Position Information for probe/probe pair #M of sequence #2
...
Here N is the number of sequences and M_i is the number of probes/probe pairs in sequence i (written as M in the layout above).
Section Definitions
File Header
| Item | Description | Type | Size |
|---|---|---|---|
| 1 | Magic number. A value to identify the file type. The value is set to 'PHT7\r\n\032\n' | char | 8 bytes |
| 2 | The version number of the file. The version number is either 1.0, 2.0 or 3.0. Due to a bug with the BPMAP file writer for early access arrays, this value may not be stored as a big endian float. To read this value: | float | 4 bytes |
| 3 | Number of sequences stored in the file. | unsigned int | 4 bytes |
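The header table above can be read with a fixed 16-byte layout. The following is a minimal sketch in Python, not a reference implementation: the function name `parse_file_header` is hypothetical, and it assumes the version field is a standard big-endian float, so it does not handle the early access writer bug mentioned in item 2.

```python
import struct

# Magic number from item 1; '\032' is octal notation for the byte 0x1A.
MAGIC = b"PHT7\r\n\x1a\n"


def parse_file_header(data: bytes) -> dict:
    """Parse the 16-byte file header (items 1-3 of the File Header table).

    Assumption: the version is stored as a normal big-endian float.
    Files affected by the early access writer bug are not handled here.
    """
    if len(data) < 16:
        raise ValueError("file header needs at least 16 bytes")
    # ">" selects big-endian with no padding: 8s (magic), f (version), I (count).
    magic, version, n_sequences = struct.unpack(">8sfI", data[:16])
    if magic != MAGIC:
        raise ValueError("not a BPMAP file: bad magic number")
    return {"version": version, "n_sequences": n_sequences, "size": 16}
```

A caller would typically read the first 16 bytes of the file, pass them to this function, and then parse `n_sequences` sequence descriptions starting at byte 16.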
Sequence Description
| Item | Description | Type | Size |
|---|---|---|---|
| 1 | Length of the sequence name. | unsigned int | 4 bytes |
| 2 | Sequence name. | char | Specified by item #1. |
| 3 | Probe mapping type. (only for version 3.0 and above files) 0 indicates a (PM/MM) probe pair tiling across the sequence. | unsigned int | 4 bytes |
| 4 | Sequence file offset. (only for version 3.0 and above files) The offset (in bytes), from the beginning of the file, of the probe position information. This is intended to enable fast look-up. | unsigned int | 4 bytes |
| 5 | Number of probes/probe pairs in the sequence. | unsigned int | 4 bytes |
| 6 | Length of the group name (only for version 2.0 and above files) | unsigned int | 4 bytes |
| 7 | Group name (only for version 2.0 and above files) | char | Specified by item #6. |
| 8 | Length of the version number (only for version 2.0 and above files) | unsigned int | 4 bytes |
| 9 | Version number (only for version 2.0 and above files) | char | Specified by item #8. |
| 10 | Number of parameters (only for version 2.0 and above files) | unsigned int | 4 bytes |
| 11 | Parameters name/value. The number of parameters is specified by item #10. (only for version 2.0 and above files). Each parameter is defined as a pair of name/value strings where the strings are stored as the following: | see the description. | see the description. |
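Because several items are present only in certain file versions, a reader has to branch on the version from the file header. The sketch below decodes items 1 through 10 of one sequence description. The function name `parse_sequence_description` is hypothetical, strings are decoded as Latin-1 as an assumption (the table only says `char`), and item 11 is deliberately left undecoded because the string layout of the parameters is not given above.

```python
import struct


def parse_sequence_description(data: bytes, offset: int, version: float):
    """Parse items 1-10 of one sequence description starting at `offset`.

    Returns (fields, new_offset). Item 11 (the parameter name/value
    pairs) is NOT decoded; new_offset points at its first byte.
    """
    def read_uint(pos):
        # 4-byte big-endian unsigned int
        return struct.unpack_from(">I", data, pos)[0], pos + 4

    def read_string(pos):
        # 4-byte length followed by that many characters.
        # Assumption: Latin-1 text; the table only specifies `char`.
        length, pos = read_uint(pos)
        return data[pos:pos + length].decode("latin-1"), pos + length

    fields = {}
    fields["name"], pos = read_string(offset)               # items 1-2
    if version >= 3.0:
        fields["probe_mapping_type"], pos = read_uint(pos)   # item 3
        fields["file_offset"], pos = read_uint(pos)          # item 4
    fields["n_probes"], pos = read_uint(pos)                 # item 5
    if version >= 2.0:
        fields["group_name"], pos = read_string(pos)         # items 6-7
        fields["version_number"], pos = read_string(pos)     # items 8-9
        fields["n_parameters"], pos = read_uint(pos)         # item 10
    return fields, pos
```

When `n_parameters` is 0, or the file is version 1.0, the returned offset is also the start of the next section in the file.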
Sequence Header
| Item | Description | Type | Size |
|---|---|---|---|
| 1 | Sequence ID | unsigned int | 4 bytes |
Position Information
| Item | Description | Type | Size |
|---|---|---|---|
| 1 | X coordinate on array of the perfect match (PM) probe (note: array coordinates are 0 based). | unsigned int | 4 bytes |
| 2 | Y coordinate on array of the PM probe | unsigned int | 4 bytes |
| 3 | X coordinate on array of the mismatch (MM) probe (only if the probe mapping type indicates PM/MM tiling) | unsigned int | 4 bytes |
| 4 | Y coordinate on array of the MM probe (only if the probe mapping type indicates PM/MM tiling) | unsigned int | 4 bytes |
| 5 | Length of the PM probe (and MM if a pair). | unsigned char | 1 byte |
| 6 | Probe sequence. The 25 base probe sequence is packed into a 7 byte character sequence. Each byte represents up to 4 bases (so the format can handle probes of length up to 25bp). | char | 7 bytes |
| 7 | Match score. Note: The current BPMAP files are based on perfect match so the scores are 1. See the bug description in the version number field above. | float | 4 bytes |
| 8 | Position of the PM probe within the sequence. Note: The position is the 0-based position of the lower coordinate of the 25-mer aligned to the target. | unsigned int | 4 bytes |
| 9 | 1 if the matching target (not the probe) is on the forward strand, 0 if on the reverse. | unsigned char | 1 byte |
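Adding up the sizes in the table gives a fixed record length: 33 bytes for a PM/MM pair (4 + 4 + 4 + 4 + 1 + 7 + 4 + 4 + 1) and 25 bytes when the two MM coordinates are absent. The sketch below unpacks one record under those sizes. The function name `parse_position` and the `has_mismatch` flag are hypothetical, the match score is assumed to be a standard big-endian float (see the bug note in item 7), and the 7-byte probe sequence is returned still packed because the base-to-bit encoding is not described above.

```python
import struct

# ">" = big-endian, no padding between fields.
PAIR_RECORD = struct.Struct(">IIIIB7sfIB")   # PM/MM tiling: 33 bytes
PM_ONLY_RECORD = struct.Struct(">IIB7sfIB")  # no MM coordinates: 25 bytes


def parse_position(data: bytes, offset: int, has_mismatch: bool):
    """Parse one position record; returns (record, new_offset).

    `has_mismatch` should be True when the sequence's probe mapping
    type indicates PM/MM tiling (value 0 in version 3.0 files).
    """
    if has_mismatch:
        (pm_x, pm_y, mm_x, mm_y, length, packed,
         score, position, forward) = PAIR_RECORD.unpack_from(data, offset)
        size = PAIR_RECORD.size
    else:
        (pm_x, pm_y, length, packed,
         score, position, forward) = PM_ONLY_RECORD.unpack_from(data, offset)
        mm_x = mm_y = None
        size = PM_ONLY_RECORD.size
    record = {
        "pm": (pm_x, pm_y),
        "mm": None if mm_x is None else (mm_x, mm_y),
        "probe_length": length,
        "packed_sequence": packed,      # 7 raw bytes, left undecoded
        "match_score": score,           # assumes a normal big-endian float
        "position": position,           # 0-based, lower coordinate
        "target_on_forward_strand": forward == 1,
    }
    return record, offset + size
```

Since the records are fixed-size, the k-th record of a sequence (0-based) starts `k * 33` or `k * 25` bytes after the first one, which allows direct seeking instead of a sequential scan.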