NetAffx® IVT Glossary

IVT Expression GeneChip® Array Term Glossary

Probe Set Summary

The Probe Set Summary gives an at-a-glance report of the current annotation status for this probe set. Click the anchor links in the report to jump to the details below.

Probe Set ID - The identifier that refers to a set of probe pairs selected to represent expressed sequences on an array. Designations are given at design time. Additional information about probe set nomenclature is available here.

The probe set names never change, but they can give you an idea of what was known about the sequence at the time of design.

_at = all probes in the set hit one known transcript.
_a = all probes in the set hit alternate transcripts from the same gene.
_s = all probes in the set hit transcripts from different genes.
_x = some probes in the set hit transcripts from different genes.

For HG-U133, the _a designation was not used; an _s probe set on these arrays means the same as an _a on any of the non-HG-U133 arrays.
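The suffix conventions above can be read off a probe set ID mechanically. The following is a minimal sketch in Python, assuming the qualifier letter appears immediately before a trailing _at (for example _s_at). The function name, the returned labels, and the sample IDs are hypothetical and not part of NetAffx.

```python
import re

# Meaning of each suffix, paraphrased from the list above.
SUFFIX_MEANING = {
    "at": "all probes hit one known transcript",
    "a": "all probes hit alternate transcripts from the same gene",
    "s": "all probes hit transcripts from different genes",
    "x": "some probes hit transcripts from different genes",
}

# Assumption: an optional qualifier (_a, _s or _x) directly precedes the final _at.
_SUFFIX_PATTERN = re.compile(r"(?:_([asx]))?_at$")


def classify_probe_set(probe_set_id: str) -> str:
    """Return 'a', 's', 'x', 'at', or 'unknown' for a probe set ID."""
    match = _SUFFIX_PATTERN.search(probe_set_id)
    if match is None:
        return "unknown"
    # No qualifier letter means a plain _at probe set.
    return match.group(1) or "at"


if __name__ == "__main__":
    for example in ("12345_at", "12345_s_at", "12345_x_at"):
        kind = classify_probe_set(example)
        print(example, "->", SUFFIX_MEANING.get(kind, "unrecognized suffix"))
```

Because probe set names never change, the result reflects what was known at design time, not the current annotation.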

GeneChip® Array - The GeneChip probe array where the probe set is located. For more information, see the Array Product Page.

Organism Common Name - The genus and species of the organism represented by the probe set. This may be different from the organism indicated by the GeneChip Array field, since some chips contain probe sets for genes from accessory organisms. See the GeneChip's Material Data Sheet to determine its contents.

Current Probe Set Information

This section describes the current information available for this probe set.

Transcript Assignments - The transcripts associated with a given probe set are displayed here. The Accession (Representative Transcript), the Description, the number of matching Probes (most helpful for ranking Grade A assignments), and Related Probes are given.

The Related Probes field provides a link to see other probe sets that are also associated with this particular transcript, sorted by the Annotation Grade the transcript has for those other probe sets.

Annotation Description/Annotation Grade - NetAffx tracks five levels of relationships between IVT Probe sets and the current transcript record. The letter Annotation Grade corresponds to the class of evidence described in the Annotation Description Field, also summarized below. For more information see Annotation Grade below or the Transcript Assignment for NetAffx™ Annotation whitepaper.

  • Grade A - A majority of the probes from the probe set match this transcript perfectly.
  • Grade B - The transcript and the probe set's Target Sequence overlap on the genome. The probes do not match the transcript, presumably because the 3' end of the transcript is truncated in the record.
  • Grade C - The transcript and the probe set's Consensus/Exemplar Sequence overlap on the genome. The transcript does not overlap the target sequence, presumably due to truncation of the 3' end.
  • Grade E - No transcripts are known to correspond to this probe set at this time, but a UniGene Cluster is known to correspond to it.
  • Grade R - No transcript currently supports this probe set, though EST sequences are available from the design information.

Only the transcripts with the highest available assignment grade are referenced in NetAffx. That is, if transcripts that match the probes (Grade A) are available, Grade B and C transcripts and matching EST data (Grades E and R) will not be displayed.
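The display rule above (show only the best available grade, and rank Grade A assignments by the number of matching probes) can be sketched as follows. This is an illustration only: the Assignment record, its field names, and the sample accessions are hypothetical, and the grade order A, B, C, E, R is taken from the list above.

```python
from dataclasses import dataclass
from typing import List

# Grades from strongest to weakest evidence, in the order listed above.
GRADE_ORDER = "ABCER"


@dataclass
class Assignment:
    accession: str            # hypothetical field: transcript accession
    grade: str                # one of A, B, C, E, R
    matching_probes: int = 0  # number of probes that match the transcript


def displayed_assignments(assignments: List[Assignment]) -> List[Assignment]:
    """Keep only assignments with the best available grade.

    Within that grade, sort by the number of matching probes (descending).
    """
    if not assignments:
        return []
    best = min(assignments, key=lambda a: GRADE_ORDER.index(a.grade)).grade
    kept = [a for a in assignments if a.grade == best]
    return sorted(kept, key=lambda a: a.matching_probes, reverse=True)


if __name__ == "__main__":
    records = [
        Assignment("TX1", "B"),
        Assignment("TX2", "A", matching_probes=9),
        Assignment("TX3", "A", matching_probes=11),
    ]
    print([a.accession for a in displayed_assignments(records)])
```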

Note that because of the sheer volume of information, EST evidence that may relate to a probe set is not updated in the way that transcript relationships are in NetAffx.

Annotation Transcript Cluster - Entrez Gene or UniGene transcript clusters available for the probe set. These records may represent families of transcripts and the strongest collection of evidence for a gene related to a probe set. After the accession, the number of matching probes is given in parentheses.

Annotation Notes - Additional notes for special cases in the annotations.

  • Cross Hybridizing Probe Sets - A list of transcripts with perfect matches to the probe set but with fewer perfect matches than Grade A. Transcripts with Grade B or C assignments are removed from this list.
  • Reverse Complement Probe Sets - Indicates that this probe set matches the reverse complement of some known transcript. The forward complement probe set for the same gene is usually the one desired for expression analysis and is referenced in this field. Reverse complement probe sets are discussed in the Transcript Assignment for NetAffx™ Annotation whitepaper.

Transcript Accessions

Transcripts from various sources may appear in this section, varying depending on the species origin of the transcripts that the probe set detects.

AGI ID - A uniform gene nomenclature system for Arabidopsis created by the Arabidopsis Genome Initiative (AGI). AGI is an international effort to sequence the complete Arabidopsis genome.

AGI IDs are based on the following format:

  • At = organism
  • 1, 2, 3, 4, 5 = chromosome
  • g = gene
  • 00010 = gene id
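As an illustration of the AGI ID format, the sketch below splits an ID into its organism, chromosome, and gene id parts. It assumes the components are concatenated in the order given above (for example At1g00010, an ID assembled from those components for illustration and not necessarily a real locus) and accepts only chromosomes 1 to 5 as listed. The function and type names are hypothetical.

```python
import re
from typing import NamedTuple


class AgiId(NamedTuple):
    organism: str    # "At" for Arabidopsis
    chromosome: int  # 1 to 5, as listed in the format description
    gene_id: str     # five-digit gene id, kept as text to preserve leading zeros


# Assumption: organism code, chromosome digit, the letter g, then a five-digit id.
_AGI_PATTERN = re.compile(r"^(At)([1-5])g(\d{5})$", re.IGNORECASE)


def parse_agi_id(agi_id: str) -> AgiId:
    """Split an AGI ID of the form At<chromosome>g<gene id> into its parts."""
    match = _AGI_PATTERN.match(agi_id.strip())
    if match is None:
        raise ValueError(f"not in the At<chromosome>g<gene id> form: {agi_id!r}")
    return AgiId(match.group(1), int(match.group(2)), match.group(3))


if __name__ == "__main__":
    print(parse_agi_id("At1g00010"))
```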

Ensembl ID - A transcript identifier from the Ensembl project.

FLYBASE - A locus name from FlyBase, a database of the Drosophila genome.

MGI ID - A locus identifier from the Mouse Genome Informatics (MGI) database.

RefSeq - Reference Sequences (RefSeq) are obtained from NCBI's nonredundant and comprehensive sequence collection.

RGD ID - A locus from the Rat Genome Database (RGD).

SGD ID - A locus from the Saccharomyces Genome Database (SGD™).

SWISS-PROT - SWISS-PROT (sometimes known as SWALL) accession numbers of the peptide sequences corresponding to the mRNAs in the UniGene cluster represented by the probe set.

UniGene ID - The identifier of the cluster in the UniGene collection of sequences.

WORMBASE - A locus name from WormBase, a database of the genome and biology of C. elegans.

XDB - Xenopus Gene Database provides mappings between XGD IDs and Affymetrix probe set IDs.

Alternatively Spliced Variants of Detected Transcripts

Transcripts detected by the probe set that are related by alternative splicing are described in this section. All transcript variants (a.k.a. isoforms) are listed for each associated Entrez Gene record, though not all may be detectable by a given probe set. The probe set(s) detecting each transcript isoform are indicated.

Entrez - The NCBI Entrez Gene identifier associated with the detected transcripts.

Total Refseq Isoforms - The number of transcript variants in NCBI's RefSeq repository for the indicated Entrez Gene entry at the time of the current NetAffx annotation update.

Total Detected By This Probeset - The number of the total transcript variants that are detected by the probe set.

Refseq - The NCBI RefSeq accession for the transcript variant.

Probesets - Listing of all probe sets that detect a given transcript variant. Click on the probe set listing to view NetAffx annotation details for each probe set.

Genomic Alignment of Consensus/Exemplar Sequence

The current genomic location of the probe set's consensus or exemplar sequence is described in this section.

Assembly - The Genome Assembly version of the current update is listed here.

Alignments - Chromosomal coordinates, cytoband location, and properties (identity and coverage) of the alignment of the consensus sequence to the genome. Genome Browser views of the alignment are also available through links.

Public Domain and Genome References

This section provides a summary of the annotation and transcript record from a number of public domain databases. The entries depend on the species with which the probe set is associated.

Gene Title - The gene name is usually extracted from the Gene or UniGene databases. In some cases, specialty databases (such as WormBase, etc.) may provide the gene name.

Gene Symbol - Gene symbols are derived by different organizations for different species. Affymetrix data comes from the UniGene record for UniGene-based arrays such as human, mouse, and rat. For arrays that are not based on the UniGene database, Affymetrix obtains the gene symbol from various sources, including FlyBase, WormBase, and the Saccharomyces Genome Database.

Chromosomal location - The cytoband location of the gene derived from the UniGene record, as available. This location may differ from the genomic location given for the Consensus/Exemplar Sequence elsewhere on the page.

EC Number - Derived from the NCBI or ENSEMBL entry, the Enzyme Commission (EC) family number describes enzymatic activity of the gene. The EC number is a hierarchical description of enzymatic activity with up to four levels in this format: A.B.C.D.

  • The first level (A) describes the general class of the enzyme (for example, oxidoreductase).
  • The second level (B) describes the chemical donor the enzyme uses.
  • The third level (C) describes the chemical acceptor the enzyme uses.
  • The fourth level (D) describes the specific family of enzymes.

For example, the three numbers in the EC designation 1.1.5 describe, respectively, an oxidoreductase, acting on CH-OH donor groups, with a quinone or similar compound as an acceptor.
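Because the EC number is hierarchical, a partial designation such as 1.1.5 names a family that contains every more specific number beginning with the same levels. The sketch below splits an EC number into the A.B.C.D levels and tests that containment. It is a minimal illustration that assumes purely numeric levels separated by periods; the function names are hypothetical.

```python
from typing import Dict


def parse_ec_number(ec: str) -> Dict[str, int]:
    """Split an EC number such as '1.1.5' into its levels A, B, C, D.

    Only the levels that are present are returned.
    """
    parts = ec.strip().split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"an EC number has one to four levels: {ec!r}")
    levels: Dict[str, int] = {}
    for name, part in zip("ABCD", parts):
        # Assumption: every level is a plain integer (no placeholders).
        if not part.isdigit():
            raise ValueError(f"level {name} is not numeric in {ec!r}")
        levels[name] = int(part)
    return levels


def ec_is_within(ec: str, family: str) -> bool:
    """Return True if `ec` falls under the (possibly partial) EC `family`."""
    ec_levels = list(parse_ec_number(ec).values())
    family_levels = list(parse_ec_number(family).values())
    return ec_levels[: len(family_levels)] == family_levels


if __name__ == "__main__":
    print(parse_ec_number("1.1.5"))
    print(ec_is_within("1.1.5.2", "1.1.5"))
```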

The full description of the EC number can be found at the site.

OMIM - A link to the gene's description in Online Mendelian Inheritance in Man, a hand-curated database of disease and genetic disorders, biomedical and biochemical information, and phenotypes associated with known human genes. OMIM indexes give the NetAffx user access to detailed descriptions of biomedical research associated with their genes of interest. Only available for probe sets that correspond to human genes.

MeSH terms (not currently included) - Medical Subject Headings are a controlled vocabulary of biomedical and health terms linked to the NCBI transcripts. Examples of MeSH terms are: Arteriosclerosis, Osteosarcoma, Coronary Disease, Inflammation, Leukemia, and Bipolar Disorder. Although MeSH terms are too numerous to display on the probe set details page, they are searchable in the All Descriptions field in the Standard Query.

Functional Annotations

This section contains annotations of gene function culled from the transcripts assigned to this probe set.

Pathways - Known gene pathways related to the transcripts for this probe set. Signaling and metabolic pathways are groups of genes known to work together in the cell. The NetAffx Analysis site offers links to GenMAPP, whose desktop application can also overlay expression data on pathway diagrams.

QTL - (Quantitative Trait Loci) Genetic linkage data that provide disease associations for some loci. This data comes from RatMap at the Rat Genome Database, so these annotations appear only on rat arrays.

Gene Ontology Annotations - Functional annotations for the product, encoded in the Gene Ontology, a biological function language maintained by the Gene Ontology™ Consortium.

GO is divided into three major sections:

  • Biological Process - functional terms related to cell biological processes (e.g. signal transduction and amino acid metabolism)
  • Cellular Component - functional terms related to cell physiological terminology (e.g. mitochondrion and proteasome).
  • Molecular Function - functional terms related to biochemical terminology (e.g. hydrolase activity and hormone binding).

Fields in the Gene Ontology Annotations section are:

  • ID - A unique integer assigned to each GO term by the Gene Ontology Consortium.
  • Description - The description of the GO term that corresponds directly to the GO ID.
  • Evidence - The GO evidence type that substantiates a GO term assignment for a public mRNA, as described here, is labeled direct or extended.

Direct refers to the type of evidence provided by the GO Consortium for their curation of a relationship between the Entrez Gene accession and the GO term. Extended refers to evidence that the ontology term was curated by Affymetrix Inc. based on similarity with genes that are annotated by the GO Consortium. Data in this field relates the GO term to a Pfam or EC number related to the reference sequence.

Protein Domains

All the protein domains identified through Pfam Families A for the transcripts are described here.

InterPro - This field gives the InterPro domain number and description.

PFAM - Pfam (protein families) is an extensive database of protein domains and the Hidden Markov Models (HMMs) designed to recognize them. Pfam entries on the NetAffx site include an ID and a Description. The description may end with an E-value.

TMHMM - Putative Transmembrane Helix domains as identified by the TMHMM program.

Orthologs/Homologs - References to probe sets on other Affymetrix GeneChip arrays where the reference sequences on which the two probe sets are based have a significant amount of similarity. The data on reference sequence similarity are derived from HomoloGene and then cross-referenced to Affymetrix probe sets.

Protein Similarities

For probe sets where no gene symbol is available for the transcripts assigned, NetAffx gives BLAST results comparing uncharacterized transcripts to the non-redundant transcript record.

BLASTP - BLASTP compares poorly characterized mRNA transcripts against NCBI's non-redundant (NR) transcript collection. The first three results are given. NR contains all non-redundant GenBank CDS translations, RefSeq Proteins, peptide sequences from the Protein Data Bank, SwissProt, PIR and PRF.

BLASTX - If a probe set has no known transcript associated with it, EST data from any known UniGene cluster is compared to NR using BLASTX. UniGene's BLASTX procedure is described here.

Probe Design Information

A record of the evidence compiled for the probe set. This information is provided to document the design content that led to the choice of the probe sequences, so it does not change or update in any way. As a result, entities such as UniGene clusters and transcript or EST accessions may no longer be active in their respective databases of origin. The current biological interpretation of the probe set is given in the remainder of the Details page.

Design Date - This date precedes the release of the chip and is indicative of the date when the design data may have been drawn from its bioinformatics data sources.

Transcript ID (Array Design) - The UniGene cluster, if any, associated with the probe set at the time of design.

Sequence Type - Indicates whether the design sequence for this probe set was a Consensus or Exemplar sequence. A Consensus sequence is usually the result of an aligned cluster of EST sequences. An Exemplar sequence is a cluster that includes a representative sequence from each gene group, indicating a transcript was available at the time of design.

Representative Public ID - The accession number of the representative sequence on which the probe set is based. For UniGene-based arrays, this is usually a GenBank, dbEST or RefSeq accession used for sequence selection. Refer to the Sequence Source field under the Sequence section to determine the database used.

Target Description - Information accumulated about the probe set and the transcription evidence available at the time of design.

Archival Unigene Cluster - The archival UniGene cluster is the base name of the UniGene cluster used as evidence to design this probe set, which may have evolved since design time.

Cluster Evidence - An inventory of the sequence evidence that composed any UniGene Clusters that were referenced in the design of this probe set.

Probe Selection Region Evidence - An inventory of the sequence evidence used in the probe set design, including the UniGene or RefSeq consensus sequences.

Probe Set Sequence Information

This section displays the sequences of the individual probes and the target sequences of the probe set.

Target Sequence - The target sequence is the portion of the Consensus or Exemplar sequence from which the probe sequences were selected. If desired, the BLASTn GenBank NR link will BLAST the target sequence against the GenBank non-redundant (NR) database.

Cluster Members - This link accesses the sequences originally clustered as evidence for transcription when the probe set was designed. It is especially useful for probe sets resulting from clustering of EST sequences, when other sequence data may not be available.

Consensus/Exemplar Sequence - The sequence used at the time of design to represent the transcript that the GeneChip® probe set measures. A consensus sequence results from base-calling algorithms that align and combine sequence data into groups. An exemplar sequence is a representative cDNA sequence for each gene.

Group Members - If a family of transcripts is known for a probe set when the array is designed, its members are compiled into the Group Members list. Typically, group members are found with _a and _s probe sets, when a group of sequences is associated with one probe set.

Probe Info - This section displays the individual probe sequences, the location on the GeneChip Array (Probe X, Y), the starting location of the probe sequence on the Consensus/Exemplar sequence (Probe Interrogation Position), and the sense of the probe with respect to the detection target (usually an mRNA).

