|
It is not necessary to understand the format of these files to
use them with IGB. This section is included for completeness only.
The alignment data files are provided in the PSL format,
described at UCSC.
Our files begin with comment lines indicating the array and
genome version, followed by four data sections. Each section is
introduced by a track line, so the four track lines separate the
data into four parts.
The file for the "HG-U133A" chip, for example, contains the
following sections:
- The section "HG-U133A netaffx consensus" contains mappings of
consensus/exemplar sequences onto the genome. Each consensus
sequence is labeled by the name of the corresponding probe set,
in this format: "HG-U133A:200732_S_AT".
When a consensus sequence maps to
the genome in multiple locations, each location is
listed on a separate line in the file.
- The section "HG-U133A netaffx probesets" indicates where the
probesets are located on the consensus sequence, not where
they are located on the genome. For example, it
maps the sequence "P.HG-U133A:200732_S_AT" onto 11 locations,
each of length 25, on the sequence named "HG-U133A:200732_S_AT".
- The section "HG-U133A netaffx poly_a_sites" indicates where the
poly-A sites are located on the consensus sequence. Poly-A sites
have length 1. Not all consensus sequences have poly-A sites.
- The section "HG-U133A netaffx poly_a_stacks" indicates where the
poly-A stacks are located on the consensus sequence. Poly-A stacks
have length 1. Not all consensus sequences have poly-A stacks.
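
The layout described above (comment lines, then four sections that each start with a track line) can be sketched in Python as follows. This is only an illustration, not IGB's own parser. It assumes that comment lines start with "#", that track lines start with the word "track" and carry a name="..." attribute, and that data lines are tab-separated, as is usual for PSL. The function name split_link_psl is hypothetical.

```python
import re


def split_link_psl(lines):
    """Group the lines of a ".link.psl" file by the track line above them.

    Assumptions (not confirmed by the file documentation):
    comment lines start with "#", track lines start with "track" and
    contain name="...", and data lines are tab-separated.
    """
    comments = []   # leading comment lines (array and genome version)
    sections = {}   # track name -> list of data rows (lists of fields)
    current = None  # name of the section currently being read
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            comments.append(line)
        elif line.startswith("track"):
            match = re.search(r'name="([^"]*)"', line)
            # Fall back to the whole line if no name attribute is found.
            current = match.group(1) if match else line
            sections[current] = []
        elif current is not None:
            sections[current].append(line.split("\t"))
    return comments, sections
```

Called on an open file, for example split_link_psl(open(path)), this would return the comment lines and a dictionary keyed by section name, such as "HG-U133A netaffx consensus". A general PSL reader would typically stop being useful after the first of those sections.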
Any program that can read PSL files should understand the first section
of these files -- the portion that maps consensus or exemplar sequences onto
the genome. Most programs will ignore the sections
mapping probe locations and poly-A locations onto the consensus sequence.
IGB will recognize that the files have this particular structure based
on the filename extension ".link.psl".
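
As a small illustration of that naming convention, a script could apply the same check before treating a file as having the four-section structure. This is a sketch only, not IGB's actual code. The function name is_link_psl and the example file names are hypothetical, and the comparison is case-sensitive by assumption.

```python
def is_link_psl(filename):
    """Return True if the file name ends with the ".link.psl" extension.

    Case-sensitive by assumption; how IGB treats other capitalizations
    is not stated in this documentation.
    """
    return filename.endswith(".link.psl")


# Hypothetical file names, used only for illustration.
for name in ["HG-U133A.link.psl", "alignments.psl"]:
    kind = "link.psl structure" if is_link_psl(name) else "plain PSL or other"
    print(name, "->", kind)
```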
|