Alignments of the consensus and exemplar sequences of probesets to genomic coordinates are provided in the NetAffx Analysis Center. For species with well-characterized genomes, these alignments are also available as downloadable PSL files. To find these files, select the chip you are interested in from the list on the support by product page.
Downloadable files for each array will be updated when the alignments on the NetAffx Analysis Center are updated for that array. This generally happens only when a new version of the genome in question is released.
Please refer to our Terms and Conditions for details on the acceptable use of these files.
Using the PSL Files in IGB
The PSL files containing alignment data are designed to be viewed with the Integrated Genome Browser (IGB). This document contains only brief information on how to use these particular files in IGB. For full documentation of IGB, and for links to a user discussion forum, please see the Affymetrix Tools page.
To load an alignment file into IGB, first select and load the correct genome version for the correct species using the "QuickLoad" feature. You may also load "RefSeq" annotations, or other annotations of your choice with "QuickLoad". Next, use the "Open" item in the "File" menu to load the PSL alignment file for your GeneChip array. Make sure that the "Merge" checkbox is selected when loading the file. It is not necessary to unzip the PSL file before loading it into IGB; IGB can unzip files during loading.
You should not modify the names of the files if you intend to load them into IGB. IGB recognizes that these PSL files have a special structure based on the filename ending ".link.psl" or ".link.psl.zip".
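Other tools cannot rely on IGB's internal filename handling. The sketch below shows one way a script could apply the same filename rule and read either the plain or the zipped file without unpacking it first. The function names are hypothetical. Two further assumptions: the suffix match is case-sensitive, and a ".zip" download contains exactly one PSL file.

```python
import io
import zipfile
from typing import Iterator

# Suffixes that, per the text above, IGB uses to recognize the "link" PSL layout.
LINK_PSL_SUFFIXES = (".link.psl", ".link.psl.zip")


def is_link_psl(filename: str) -> bool:
    """Return True if the filename ends in one of the link-PSL suffixes
    (case-sensitive match; an assumption, not documented IGB behavior)."""
    return filename.endswith(LINK_PSL_SUFFIXES)


def iter_link_psl_lines(path: str) -> Iterator[str]:
    """Yield text lines from a .link.psl or .link.psl.zip file.

    Hypothetical helper. For zip archives it assumes the first (and only)
    member is the PSL file.
    """
    if not is_link_psl(path):
        raise ValueError(f"not a .link.psl file: {path}")
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            member = zf.namelist()[0]  # assumption: one PSL file per archive
            with zf.open(member) as raw:
                yield from io.TextIOWrapper(raw, encoding="utf-8")
    else:
        with open(path, encoding="utf-8") as fh:
            yield from fh
```

A renamed file such as "HG-U133A.psl" would be rejected by `is_link_psl`, which mirrors why renaming the downloads breaks recognition in IGB.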
Understanding the Genomic Alignment PSL Files
It is not necessary to understand the format of these files to use them with IGB. This section is included for completeness only.
The alignment data files are provided in the PSL format, described at UCSC.
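A PSL data line has 21 tab-separated columns, as defined in the UCSC format description. The sketch below parses one line into a record. It keeps only the columns needed to follow an alignment: strand, query and target names, sizes and ranges, and the three comma-separated block lists. The sample line is made up for illustration. It is not taken from an actual NetAffx file.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PslRecord:
    matches: int
    strand: str
    q_name: str       # query, e.g. a consensus sequence name
    q_size: int
    q_start: int
    q_end: int
    t_name: str       # target, e.g. a chromosome
    t_size: int
    t_start: int
    t_end: int
    block_sizes: List[int]
    q_starts: List[int]
    t_starts: List[int]


def _int_list(field: str) -> List[int]:
    # PSL list columns are comma-separated and usually end with a trailing comma.
    return [int(x) for x in field.split(",") if x]


def parse_psl_line(line: str) -> PslRecord:
    f = line.rstrip("\n").split("\t")
    if len(f) < 21:
        raise ValueError(f"expected 21 PSL columns, got {len(f)}")
    return PslRecord(
        matches=int(f[0]),
        strand=f[8],
        q_name=f[9], q_size=int(f[10]), q_start=int(f[11]), q_end=int(f[12]),
        t_name=f[13], t_size=int(f[14]), t_start=int(f[15]), t_end=int(f[16]),
        block_sizes=_int_list(f[18]),   # f[17] is blockCount
        q_starts=_int_list(f[19]),
        t_starts=_int_list(f[20]),
    )


# Made-up example: a 500-base consensus aligned to chr1 in two blocks
# separated by a 200-base gap on the genome.
SAMPLE_LINE = "\t".join([
    "500", "0", "0", "0", "0", "0", "1", "200",
    "+", "HG-U133A:200732_S_AT", "500", "0", "500",
    "chr1", "1000000", "1000", "1700", "2",
    "200,300,", "0,200,", "1000,1400,",
])
```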
Our files begin with comment lines indicating the array and genome version, followed by four data sections, each introduced by a track line.
For example, the file for the "HG-U133A" chip contains the following sections:
- The section "HG-U133A netaffx consensus" contains mappings of consensus/exemplar sequences onto the genome. Each consensus sequence is labeled with the name of the corresponding probe set, in the format "HG-U133A:200732_S_AT". When a consensus sequence maps to more than one genomic location, each alignment is listed on a separate line.
- The section "HG-U133A netaffx probesets" indicates where the probes of each probe set are located on the consensus sequence, not on the genome. For example, it maps the sequence "P.HG-U133A:200732_S_AT" to 11 locations, each 25 bases long, on the sequence named "HG-U133A:200732_S_AT".
- The section "HG-U133A netaffx poly_a_sites" indicates where the poly-A sites are located on the consensus sequence. Poly-A sites have length 1. Not all consensus sequences have poly-A sites.
- The section "HG-U133A netaffx poly_a_stacks" indicates where the poly-A stacks are located on the consensus sequence. Poly-A stacks have length 1. Not all consensus sequences have poly-A stacks.
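The section layout described above can be processed by grouping data lines under the most recent track line. The sketch below does this under two assumptions. First, track lines follow the UCSC convention, for example `track name="HG-U133A netaffx consensus"`, with the section name in the `name` attribute. Second, header comment lines start with "#". The exact attributes on the NetAffx track lines are not specified here. The function name `split_sections` is hypothetical.

```python
import shlex
from typing import Dict, Iterable, List, Optional


def split_sections(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group PSL data lines under the name of the preceding track line.

    Assumes UCSC-style track lines (track name="...") and '#' comments.
    Data lines that appear before any track line are ignored.
    """
    sections: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header comments
        if line.startswith("track "):
            # shlex removes the quotes, so name="a b" becomes name=a b
            attrs = dict(
                tok.split("=", 1) for tok in shlex.split(line)[1:] if "=" in tok
            )
            current = attrs.get("name", line)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections
```

Because the section names carry the array name, a caller would pick out, for example, the key ending in "netaffx consensus" to get only the genome alignments.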
Any program that can read PSL files should understand the first section of these files, the portion that maps consensus or exemplar sequences onto the genome. Most programs will ignore the sections mapping probe locations and poly-A locations onto the consensus sequence. IGB recognizes this particular structure based on the filename ending ".link.psl" or ".link.psl.zip".
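Probe and poly-A positions are given relative to the consensus sequence, so placing them on the genome means passing them through the consensus-to-genome alignment. The sketch below is one way to do that for a single position. It is not a description of how IGB implements it. It uses the block lists of one nucleotide PSL line and assumes a single-character strand ("+" or "-"). For "-" alignments, PSL measures qStarts on the reverse complement of the query, so the position is flipped first. The function name is hypothetical.

```python
from typing import List, Optional


def consensus_to_genome(pos: int, strand: str, q_size: int,
                        block_sizes: List[int], q_starts: List[int],
                        t_starts: List[int]) -> Optional[int]:
    """Map a 0-based position on the consensus (PSL query) to a 0-based
    genomic position (PSL target) using one alignment's blocks.

    Returns None when the position falls outside every aligned block.
    """
    # On '-' alignments qStarts refer to the reverse-complemented query.
    q = pos if strand == "+" else q_size - 1 - pos
    for size, qs, ts in zip(block_sizes, q_starts, t_starts):
        if qs <= q < qs + size:
            return ts + (q - qs)
    return None
```

With the made-up two-block alignment (blocks of 200 and 300 bases at consensus offsets 0 and 200, genome starts 1000 and 1400), a 25-base probe starting at consensus offset 190 begins at genome position 1190 and ends at 1414. This shows how a single probe can straddle a gap in the genomic alignment, such as an intron.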