Affymetrix
IVT Glossary
NetAffx™ Links

NetAffx links provide the detailed sequence data used to design the probe set in question. For a brief description of the 3' IVT design process, including definitions of the terms, see the NetAffx Annotation Whitepaper. A detailed description of the design process can be found in Array Design for the Human Genome U133 Set (pdf XXX MB).

See also the Sequence Section which describes the target sequence and the probes explicitly.

Cluster Members – accesses the sequences originally clustered as evidence for transcription when the probe set was designed. This link is especially useful for probe sets resulting from clustering of EST sequences when other sequence data may not be available.

Consensus/Exemplar Sequence – The sequence used at the time of design to represent the transcript that the GeneChip® probe set measures. A consensus sequence results from base-calling algorithms that align and combine sequence data into groups. An exemplar sequence is a representative cDNA sequence for each gene.

Group Members – If a family of transcripts is known for a probe set when the array is designed, its members are compiled into the Group Members list. Typically, group members are found with _a and _s probe sets, where a group of sequences is associated with one probe set.

GeneChip® Array Information

This section of the page displays the basic identifying information for the probe set.

Probe Set ID – The identifier that refers to a set of probe pairs selected to represent expressed sequences on an array. Designations are given at design time. See the probe specific information in this WHITEPAPER.

The probe set names never change, but they can give you an idea of what was known about the sequence at the time of design.

_at = all probes in the set hit one known transcript.
_a = all probes in the set hit alternate transcripts from the same gene.
_s = all probes in the set hit transcripts from different genes.
_x = some probes in the set hit transcripts from different genes.

For HG-U133, the _a designation was not used; an _s probe set on these arrays means the same as an _a probe set on arrays that do use the _a designation.
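
As a rough illustration of the naming scheme above, the following Python sketch maps a probe set ID to the description of its suffix. It assumes that IDs end in "_at", optionally preceded by "_a", "_s" or "_x" (for example "1007_s_at"); the function name and the sample IDs are illustrative only, and the HG-U133 exception is not modeled.

```python
import re

# Suffix meanings as listed in the glossary text.
SUFFIX_MEANINGS = {
    "": "all probes hit one known transcript",
    "a": "all probes hit alternate transcripts from the same gene",
    "s": "all probes hit transcripts from different genes",
    "x": "some probes hit transcripts from different genes",
}

# Assumption: every ID ends in "_at", optionally preceded by "_a", "_s" or "_x".
_SUFFIX_RE = re.compile(r"(?:_([asx]))?_at$")


def describe_probe_set_id(probe_set_id: str) -> str:
    """Return the design-time meaning of a probe set ID's suffix."""
    match = _SUFFIX_RE.search(probe_set_id)
    if match is None:
        return "unrecognized suffix"
    # group(1) is None when the ID ends in a plain "_at".
    return SUFFIX_MEANINGS[match.group(1) or ""]


if __name__ == "__main__":
    for example_id in ("1053_at", "1007_s_at", "215179_x_at"):
        print(example_id, "->", describe_probe_set_id(example_id))
```

Because probe set names never change, a lookup like this reflects only what was known at design time, not the current annotation.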

GeneChip® Array – The GeneChip probe array where the probe set is located. For more information, see the Array Product Page.

Organism Common Name – The genus and species of the organism represented by the probe set. This may be different from the organism indicated by the GeneChip Array field, since some chips contain probe sets for genes from accessory organisms. See the GeneChip's Material Data Sheet to determine its contents.

Probe Design Information

A record of the evidence compiled for the probe set. This information documents the design content that led to the choice of the probe sequences, so it is never changed or updated. As a result, entities such as UniGene clusters and transcript or EST accessions may no longer be active in their respective databases of origin. The current biological interpretation of the probe set is given in the remainder of the Details page.

Design Date – This date precedes the release of the chip and indicates when the design data may have been drawn from its bioinformatics data sources.

Transcript ID (Array Design) – The UniGene cluster, if any, associated with the probe set at the time of design.

Sequence Type – Indicates whether the design sequence for this probe set was a Consensus or Exemplar sequence. A Consensus sequence is usually the result of an aligned cluster of EST sequences. An Exemplar sequence is a representative sequence chosen from a gene group, indicating that a transcript was available at the time of design.

Representative Public ID – The accession number of the representative sequence on which the probe set is based. For UniGene-based arrays, this is usually a GenBank, dbEST or RefSeq accession used for sequence selection. Refer to the Sequence Source field under the Sequence section to determine the database used.

Target Description – Information accumulated about the probe set and the transcription evidence available at the time of design.

Archival Unigene Cluster – The archival UniGene cluster is the base name of the UniGene cluster used as evidence to design this probe set, which may have evolved since design time.

Cluster Evidence – An inventory of the sequence evidence that composed any UniGene Clusters that were referenced in the design of this probe set.

Probe Selection Region Evidence – An inventory of the sequence evidence used in the probe set design, including the UniGene or RefSeq consensus sequences.

Current Probe Set Information

This section describes the current information available for this probe set.

Annotation Date – The date of the last NetAffx update of this probe set. NetAffx updates are quarterly, but individual organisms may be updated less frequently if the genome and transcript record are not changing rapidly enough to warrant updates.

Annotation Description/Annotation Grade – NetAffx tracks five levels of relationships between IVT probe sets and the current transcript record. The Annotation Grade letter corresponds to the class of evidence described in the Annotation Description field, also summarized below. For more information see Annotation Grade below or the Transcript Assignment for NetAffx™ Annotation whitepaper.

  • Grade A – A majority of the probes from the probe set match this transcript perfectly.
  • Grade B – The transcript and the probe set's Target Sequence overlap on the genome. The probes do not match the transcript, presumably because the 3' end of the transcript is truncated in the record.
  • Grade C – The transcript and the probe set's Consensus/Exemplar Sequence overlap on the genome. The transcript does not overlap the target sequence, presumably due to truncation of the 3' end.
  • Grade E – No transcripts are known to correspond to this probe set at this time, but a UniGene Cluster is known to correspond to it.
  • Grade R – No transcript currently supports this probe set, though EST sequences are available from the design information.

Only the transcripts with the highest available assignment grade are referenced in NetAffx. That is, if transcripts that match probes (Grade A) are available, Grade B and C transcripts and matching EST data (Grades E and R) are not displayed.
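
The "highest available grade only" rule can be sketched in a few lines of Python. This is a minimal illustration, not the NetAffx implementation: it assumes the grades rank in the order they are listed above (A, B, C, E, R), and the accession strings are hypothetical placeholders.

```python
# Assumed ranking, strongest first, following the order of the grade list.
GRADE_ORDER = "ABCER"


def best_grade_assignments(assignments):
    """Keep only the (accession, grade) pairs that share the best grade present."""
    if not assignments:
        return []
    # The best grade is the one that appears earliest in GRADE_ORDER.
    best = min((grade for _, grade in assignments), key=GRADE_ORDER.index)
    return [(accession, grade) for accession, grade in assignments if grade == best]


if __name__ == "__main__":
    # Hypothetical accessions, for illustration only.
    candidates = [("TX1", "B"), ("TX2", "A"), ("TX3", "E"), ("TX4", "A")]
    print(best_grade_assignments(candidates))
```

With the sample input, only the two Grade A entries are kept; the Grade B and Grade E entries are dropped.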

Note that because of the sheer volume of information, EST evidence which may relate to a probe set is not updated as are transcript relationships in NetAffx.

Annotation Transcript Cluster – EntrezGene or UniGene transcript clusters available for the probe set. These records may represent families of transcripts and the strongest collection of evidence for a gene related to a probe set. After the accession, the number of matching probes is given in parentheses.

Transcript Assignments – The transcripts associated with a given probe set are displayed here. The Accession (Representative Transcript), the Description, the number of matching Probes (most helpful for ranking Grade A assignments) and the Related Probes are given.

The Related Probes field provides a link to see other probe sets that are also associated with this particular transcript, sorted by the Annotation Grade the transcript has for those other probe sets.

Annotation Notes – Additional notes for special cases in the annotations.

  • Cross Hybridizing Probe Sets – A list of transcripts with perfect matches to the probe set but with fewer perfect matches than Grade A. Transcripts with Grade B or C assignments are removed from this list.
  • Reverse Complement Probe Sets – Indicates that this probe set matches the reverse complement of some known transcript. The forward-strand probe set for the same gene is usually the one desired for expression analysis and is referenced in this field. Reverse complement probe sets are discussed in the Transcript Assignment for NetAffx™ Annotation whitepaper.
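
To make the reverse complement idea concrete, here is a small Python sketch that reverse-complements a DNA string and reports whether a query sequence occurs in a transcript in forward or reverse-complement orientation. The function names and the toy sequences are assumptions for illustration; real probe-to-transcript matching in NetAffx is more involved than exact substring search.

```python
# Translation table for the four DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def match_orientation(query: str, transcript: str) -> str:
    """Report whether query occurs in transcript as-is, reverse-complemented, or not at all."""
    if query in transcript:
        return "forward"
    if reverse_complement(query) in transcript:
        return "reverse"
    return "none"


if __name__ == "__main__":
    print(reverse_complement("ATGC"))
    print(match_orientation("GGA", "TTTCCAAA"))
```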

Genomic Alignment of Consensus/Exemplar Sequence

This section describes the current genomic location of the probe set's consensus or exemplar sequence.

Assembly – The Genome Assembly version of the current update is listed here.

Alignments – The chromosomal coordinates, cytoband location, and properties (identity and coverage) of the alignment of the consensus sequence to the genome. Genome Browser views of the alignment are also available through links.

Public Domain and Genome References

This section provides a summary of the annotation and transcript record from a number of public-domain databases. The entries depend on the species with which the probe set is associated.

Gene Title – The gene name is usually extracted from the Gene or UniGene databases. In some cases, specialty databases (such as WormBase) may provide the gene name.

Gene Symbol – Gene symbols are derived by different organizations for different species. Affymetrix data comes from the UniGene record for UniGene-based arrays such as human, mouse, and rat. For arrays that are not based on the UniGene database, Affymetrix obtains the gene symbol from various sources including FlyBase, WormBase, and the Saccharomyces Genome Database.

Chromosomal location – The cytoband location of the gene, derived from the UniGene record when available. This location may differ from the genomic location given elsewhere on the page for the Consensus/Exemplar Sequence.

EC Number – Derived from the NCBI or ENSEMBL entry, the Enzyme Commission (EC) family number describes enzymatic activity of the gene. The EC number is a hierarchical description of enzymatic activity with up to four levels in this format: A.B.C.D.

  • The first level (A) describes the general class of the enzyme (for example, oxidoreductases).
  • The second level (B) describes the chemical donor the enzyme uses.
  • The third level (C) describes the chemical acceptor the enzyme uses.
  • The fourth level (D) describes the specific family of enzymes.

For example, the three numbers in the EC designation 1.1.5 describe, respectively, an oxidoreductase, acting on CH-OH donor groups, with a quinone or similar compound as an acceptor.
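
The hierarchical A.B.C.D format can be split into its levels with a short Python sketch. This is an illustration under stated assumptions: the function name is hypothetical, a "-" is treated as an unspecified level (as in "1.1.5.-"), and preliminary EC numbers with non-numeric parts are not handled.

```python
def parse_ec_number(ec: str) -> dict:
    """Split an EC number into its specified levels, keyed A to D."""
    labels = ("A", "B", "C", "D")
    parts = ec.strip().split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"expected one to four levels, got {ec!r}")
    levels = {}
    for label, part in zip(labels, parts):
        if part == "-":
            # A dash marks this level (and any below it) as unspecified.
            break
        levels[label] = int(part)
    return levels


if __name__ == "__main__":
    print(parse_ec_number("1.1.5"))
```

For the designation 1.1.5 used in the example, the result holds three levels: A = 1, B = 1 and C = 5.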

The full description of the EC number can be found at the Expasy.org site.

OMIM – A link to the gene's description in Online Mendelian Inheritance in Man, a hand-curated database of diseases and genetic disorders, biomedical and biochemical information, and phenotypes associated with known human genes. OMIM indexes give the NetAffx user access to detailed descriptions of biomedical research associated with their genes of interest. Available only for probe sets for human genes.

MeSH terms (not currently included) – Medical Subject Headings are a controlled vocabulary of biomedical and health terms linked to the NCBI transcripts. Examples of MeSH terms are: Arteriosclerosis, Osteosarcoma, Coronary Disease, Inflammation, Leukemia, and Bipolar Disorder. Although MeSH terms are too numerous to display on the probe set details page, they are searchable in the All Descriptions field in the Standard Query.

Transcript Accessions
Transcripts from various sources may appear in this section, depending on the species of origin of the transcripts that the probe set detects.

AGI ID – A uniform gene nomenclature system for Arabidopsis created by the Arabidopsis Genome Initiative (AGI). AGI is an international effort to sequence the complete Arabidopsis genome.

AGI IDs are based on the following format:

  • At = organism
  • 1, 2, 3, 4, 5 = chromosome
  • g = gene
  • 00010 = gene ID
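
The AGI format described above can be checked and decomposed with a regular expression. The Python sketch below is an assumption-laden illustration: it accepts only the pattern spelled out here (organism "At", a chromosome from 1 to 5, "g", and a five-digit gene ID), matches case-insensitively, and uses a hypothetical function name.

```python
import re

# Pattern assembled from the components listed in the format description.
_AGI_RE = re.compile(r"^At([1-5])g(\d{5})$", re.IGNORECASE)


def parse_agi_id(agi_id: str):
    """Return the parts of an AGI ID as a dict, or None if it does not fit the format."""
    match = _AGI_RE.match(agi_id.strip())
    if match is None:
        return None
    return {
        "organism": "At",
        "chromosome": int(match.group(1)),
        "gene_id": match.group(2),
    }


if __name__ == "__main__":
    print(parse_agi_id("At1g00010"))
```

Identifiers that fall outside this pattern return None rather than raising an error, so the function can be used as a simple filter.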

Ensembl ID – A transcript identifier from the ENSEMBL project.

FLYBASE – A locus name from FlyBase, a database of the Drosophila genome.

MGI ID – A locus identifier from the Mouse Genome Informatics (MGI) database.

RefSeq – Reference Sequences (RefSeq) are obtained from NCBI's non-redundant and comprehensive sequence collection.

RGD ID – A locus from the Rat Genome Database (RGD).

SGD ID – A locus from the Saccharomyces Genome Database (SGD™).

SWISS–PROT – SWISS-PROT (sometimes known as SWALL) accession numbers of the peptide sequences corresponding to the mRNAs in the UniGene cluster represented by the probe set.

UniGene ID – The identifier of a cluster in the UniGene collection of sequences.

WORMBASE – A locus name from WormBase, a database of the genome and biology of C. elegans.

XDB – Xenopus Gene Database provides mappings between XGD IDs and Affymetrix probe set IDs.

Functional Annotations

This section contains annotations of gene function culled from the transcripts assigned to this probe set.

Pathways – Known gene pathways related to the transcripts for this probe set, from GenMAPP.org. Signaling and metabolic pathways are groups of genes known to work together in the cell. The NetAffx Analysis site offers links to GenMAPP, whose desktop application can also overlay expression data on pathway diagrams.

QTL – (Quantitative Trait Loci) Genetic linkage data that provide disease associations for some loci. These data come from RatMap at the Rat Genome Database, so these annotations appear only on Rat arrays.

Gene Ontology Annotations – Functional annotations for the product, encoded in the Gene Ontology, a biological function language maintained by the Gene Ontology™ Consortium.

GO is divided into three major sections:

  • Biological Process – functional terms related to cell biological processes (e.g. signal transduction and amino acid metabolism).
  • Cellular Component – functional terms related to cell physiological terminology (e.g. mitochondrial and proteome).
  • Molecular Function – functional terms related to biochemical terminology (e.g. hydrolase activity and hormone binding).

Fields in the Gene Ontology Annotations section are:

  • ID – A unique integer assigned to each GO term by the Gene Ontology Consortium.
  • Description – The description of the GO term that corresponds directly to the GO ID.
  • Evidence – The GO evidence type that substantiates a GO term assignment for a public mRNA. It is labeled either direct or extended.
      • Direct refers to the type of evidence provided by the GO Consortium for their curation of a relationship between the Entrez Gene accession and the GO term.
      • Extended refers to evidence that the ontology term was curated by Affymetrix Inc. based on similarity with genes that are annotated by the GO Consortium. Data in this field relate the GO term to a Pfam or EC number related to the reference sequence.

Protein Domains
All the protein domains identified through Pfam-A families for the transcripts are described here.

InterPro – This field gives the InterPro domain number and description.

PFAM – Pfam (protein families) is an extensive database of protein domains and the hidden Markov models (HMMs) designed to recognize them. Pfam entries on the NetAffx site include an ID and a Description. The description may end with an E-value.

TMHMM – Putative Transmembrane Helix domains as identified by the TMHMM program.

Orthologs/Homologs – References to probe sets on other Affymetrix GeneChip arrays where the reference sequences on which the two probe sets are based have a significant amount of similarity. The data on reference sequence similarity are derived from HomoloGene and then cross-referenced to Affymetrix probe sets.

Protein Similarities
For probe sets where no gene symbol is available for the transcripts assigned, NetAffx gives BLAST results comparing uncharacterized transcripts to the non-redundant sequence record.

BLASTP – BLASTP compares the predicted protein sequences of poorly characterized mRNA transcripts against NCBI's non-redundant (NR) protein collection. The first three results are given. NR contains all non-redundant GenBank CDS translations, RefSeq Proteins, and peptide sequences from the Protein Data Bank, SwissProt, PIR and PRF.

BLASTX – If a probe set has no known transcript associated with it, EST data from any known UniGene cluster is compared to NR using BLASTX. UniGene's BLASTX procedure is described here.

Probe Set Sequence Information

This section displays the sequences of the individual probes and the target sequences of the probe set.

Target Sequence – The target sequence is the portion of the Consensus or Exemplar sequence from which the probe sequences were selected. If desired, the BLASTn GenBank NR link will BLAST the target sequence against the GenBank non-redundant (NR) nucleotide database.

Probe Info – This section displays the individual probe sequences, the location on the GeneChip array (Probe X, Y), the starting location of the probe sequence on the Consensus/Exemplar sequence (Probe Interrogation Position), and the sense of the probe with respect to the detection target (usually an mRNA).


