Exon and Gene Array Glossary

1. Overview (Summary)

The upper section of the NetAffx details page provides a condensed report of the key characteristics of a probe group and any assigned transcripts, based on the current NetAffx annotation release. This glossary defines all terms used in the fields on details pages, arranged alphabetically within each subsection of the page.

Annotation Genome – The build version and date of the genome assembly used for a given NetAffx annotation release of the given array. Public-domain transcripts and array design sequences (e.g., probes) are mapped to this version of the genome by the NetAffx annotation pipeline to help assign probe sets to the transcripts they detect. This genome version may differ from the Genome Source used at array design time.

Array – The GeneChip® probe array on which the probe group is located.

Category (Probe Set Type) – Array design category of a Gene Array transcript cluster. Corresponds to the Probe Set Types on an Exon Array.

Classification (Transcript) – An indication of the degree of experimental support for a public-domain transcript based on the current NetAffx annotation release. In the overview section of a details page, the classification field contains the class of the most well-annotated transcript among all assigned transcripts for the given probe group (if any). This classification is analogous to the design-time Evidence Level classification of Exon Array probe sets, except that the transcript classification is updated to reflect the evolution of the public transcript record.

Transcript classes:
  • Full-length – Well-curated RefSeq and Ensembl transcripts, as well as GenBank transcripts annotated as "complete CDS". Also includes RNAs from other curated sources, such as model organism databases and miRBase.
  • Partial – Moderately supported RefSeq and Ensembl transcripts, and GenBank transcripts not annotated as "complete CDS". Despite the name, not every sequence in this class is incomplete (though some are); "partial" refers mainly to the fact that these transcripts have less experimental support than those in the full-length class.
  • EST – Transcripts supported only by ESTs. Includes the GenBank high-throughput cDNA (HTC) subset, as well as all sequences from gene predictions that rely on EST data, such as transcripts from the Ensembl "novel" class and UCSC Genes.
  • Predicted – Computationally predicted transcripts. Includes RefSeq model and Ensembl ab initio transcripts.
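The overview classification can be sketched as picking the best-supported class among a probe group's assigned transcripts. This sketch assumes that the four classes above rank from Full-length (most support) to Predicted (least support), and that each assigned transcript's class is available as a string. `overview_classification` is a hypothetical helper, not part of any NetAffx interface.

```python
from typing import Iterable, Optional

# Transcript classes ordered from most to least experimental support (assumed ranking).
CLASS_ORDER = ["Full-length", "Partial", "EST", "Predicted"]


def overview_classification(assigned_classes: Iterable[str]) -> Optional[str]:
    """Return the best-supported class among the assigned transcripts, or None if none are assigned."""
    classes = list(assigned_classes)
    unknown = set(classes) - set(CLASS_ORDER)
    if unknown:
        raise ValueError(f"unknown transcript class(es): {sorted(unknown)}")
    if not classes:
        return None
    # A lower index in CLASS_ORDER means stronger experimental support.
    return min(classes, key=CLASS_ORDER.index)


print(overview_classification(["EST", "Predicted", "Partial"]))  # prints Partial
```

A probe group with no assigned transcripts yields `None`. This matches the "(if any)" qualifier in the Classification entry.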

Exon Cluster – A group of one or more probe sets that cover a contiguous stretch of putatively transcribed genomic sequence, typically corresponding to an exon within a transcript. Exon clusters exist only on Exon Arrays. The exon cluster is divided into more than one probe selection region when transcript evidence implies that a splice boundary or polyadenylation site occurs within the exon. For more information, see the Exon Array Design Technical Note.

Gene Symbol and Title – Identification of any associated gene(s) whose activity is detected by the probe group. Gene information is extracted from Entrez Gene or UniGene. In some cases, the gene name is provided by a specialty database such as FlyBase, WormBase, or the Saccharomyces Genome Database.

Location – The genomic location of the probe group on the genome assembly build given in the Annotation Genome field of the NetAffx details page. These coordinates begin at the first base of the first probe sequence and end at the last base of the last probe in the probe group.
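The following minimal Python sketch shows how such a span can be derived from individual probe coordinates. It assumes a hypothetical `Probe` record with 1-based inclusive coordinates, and it requires every probe in the group to lie on one chromosome and strand. The record and the function are illustrative only; they are not a NetAffx file format or API.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Probe:
    # Hypothetical probe record: chromosome, 1-based inclusive start/stop, strand.
    chrom: str
    start: int
    stop: int
    strand: str


def probe_group_location(probes: Iterable[Probe]) -> Tuple[str, int, int, str]:
    """Return (chrom, start, stop, strand) spanning every probe in the group."""
    probes = list(probes)
    if not probes:
        raise ValueError("probe group is empty")
    chroms = {p.chrom for p in probes}
    strands = {p.strand for p in probes}
    if len(chroms) != 1 or len(strands) != 1:
        raise ValueError("probes must share one chromosome and strand")
    # Span runs from the first base of the first probe to the last base of the last probe.
    start = min(p.start for p in probes)
    stop = max(p.stop for p in probes)
    return chroms.pop(), start, stop, strands.pop()


group = [Probe("chr1", 1050, 1074, "+"), Probe("chr1", 1000, 1024, "+"), Probe("chr1", 1200, 1224, "+")]
print(probe_group_location(group))  # prints ('chr1', 1000, 1224, '+')
```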

Pathways – Displays the GenMAPP pathway(s) in which the transcript has been found to play a role, if any appear in the GenMAPP collection. Pathways can be further visualized, with probe set data overlaid, using the GenMAPP application from genmapp.org.

Probe Group – A generic term for any grouping of related GeneChip® array probes from the array design. On Exon Arrays, a probe group can be a probe set, exon cluster, or transcript cluster. On Gene Arrays, the only kind of probe group is the transcript cluster. NetAffx details pages are provided for all probe groups of each type for Exon and Gene Arrays.

Probe Set – A set of synthetic oligonucleotide probes on Exon Arrays that interrogates gene expression from a single exon; a probe set typically contains four probes. All probe sets on an "ST" array, such as the Human Exon 1.0 ST array, are designed to anneal to the sense strand of the transcript and are therefore located on the array in an antisense orientation. Probe sets on Exon Arrays are distinguished by the cross-hybridization potential of their probes (see glossary term Hybridization Target) and are also classified into one of several Evidence Levels. On Gene Arrays, probes are grouped into transcript clusters rather than probe sets.

Species – The target organism for the probe group. This may differ from the organism indicated by the array, because any GeneChip® probe array may contain probe sets derived from sequences of other organisms. In particular, the control probe sets present on every array may come from different organisms. Some GeneChip® probe arrays are designed to detect transcripts from more than one related organism or strain. For information about a specific product, see the product information under Support on affymetrix.com.

Transcript Cluster – A group of one or more exon clusters (on Exon Arrays) or probes (on Gene Arrays) covering a region of the genome reflecting all the exonic transcription evidence known for the region and corresponding to a known or putative gene. The underlying exonic evidence can come from transcripts of well-annotated genes or predicted genes. A given transcript cluster may have a variety of assigned RNAs, classified according to the NetAffx transcript classification system.

2. Transcripts Detected

This section describes the public-domain transcripts or gene predictions that should be detected by a given probe group based on computational sequence analysis performed by the NetAffx™ annotation pipeline. Assigned transcripts are listed in groups based on the NetAffx transcript classification system.

Transcript assignments are made by compiling a non-redundant set of publicly available transcript sequences from GenBank, RefSeq, and Ensembl and then using sequence alignment programs and other tools to associate them with probe sequences. Details of methods used by this assignment pipeline will be available in a whitepaper on affymetrix.com.

The NetAffx transcript assignment methods derive a relationship between probe groups and the current public transcript record. The number of transcript sequences and Expressed Sequence Tags (ESTs) available in public databases continues to change after the original design date. The NetAffx website maintains a current view of the public transcripts that GeneChip® probe sets interrogate, updated in March, July, and November.

Accession ID (Source) – The unique identifier for the transcript, with the source database in parentheses.

Assign score – A quantitative measure of how well a probe group interrogates an assigned transcript. It is computed by dividing the number of probes in the group that perfectly match the transcript (shown as the numerator) by the number of probes that could potentially match the transcript (shown as the denominator), then multiplying by 100. The number of potentially matching probes is computed from the transcript-vs-genome or transcript-vs-transcript cluster alignment and the known genomic locations of the probes.
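The assign score arithmetic can be sketched as follows. `assign_score` is a hypothetical helper, and both counts are assumed to be already known from the alignment step described above. For example, 22 perfectly matching probes out of 25 potentially matching probes give an assign score of 88.

```python
def assign_score(matching_probes: int, potential_probes: int) -> float:
    """Percentage of potentially matching probes that perfectly match the transcript."""
    if potential_probes <= 0:
        raise ValueError("no probes can potentially match this transcript")
    if not 0 <= matching_probes <= potential_probes:
        raise ValueError("matching probes must lie between 0 and the number of potential matches")
    return 100.0 * matching_probes / potential_probes


print(assign_score(22, 25))  # prints 88.0
```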

Coverage – A quantitative measure of the degree of overlap between the probe group and the assigned transcript. This is determined by dividing the number of potentially matching probes (described in the assign score glossary entry) by the total number of probes in the probe group, then multiplying by 100. A low coverage number can occur with short but biologically meaningful transcript isoforms as well as with partial transcript sequences.
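Coverage uses the same percentage arithmetic but compares the potentially matching probes with the whole probe group. The sketch below uses a hypothetical helper and assumes both counts are already known.

```python
def coverage(potential_probes: int, total_probes: int) -> float:
    """Percentage of the probe group's probes that could potentially match the transcript."""
    if total_probes <= 0:
        raise ValueError("probe group has no probes")
    if not 0 <= potential_probes <= total_probes:
        raise ValueError("potential matches must lie between 0 and the probe group size")
    return 100.0 * potential_probes / total_probes


print(coverage(25, 40))  # prints 62.5
```

The two measures use different denominators. As a result, a short isoform whose few potentially matching probes all match perfectly can have an assign score of 100 while its coverage stays low.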

Gene Symbol and Title – Defined in the Overview section.

Entrez GeneID – Gene names, IDs, and symbols are extracted from Entrez Gene or UniGene. In some cases, the gene name is provided by a specialty database such as FlyBase, WormBase, or the Saccharomyces Genome Database.

Pathways – Defined in the Overview section.

Transcript – An RNA sequence produced from a genomic region corresponding to a known or predicted gene. The NetAffx™ annotation pipeline draws on GenBank, RefSeq, Ensembl, and other public databases to obtain currently available transcript records for mRNAs and non-protein-coding RNA genes. Transcripts with a variety of experimental support are included and are classified according to the NetAffx transcript classification system.

3. Design Information

This section records the information, from the time of array design, that substantiated the design of the given probe group. The current biological interpretation of the probe group is given in the Overview (Summary) and Transcripts Detected sections of the details page.

Design Date – The date the chip was released.

Exon Cluster Location – The genomic location of the exon cluster for the genome assembly used at array design time (see Genome Source). These coordinates begin at the first base of the first probe sequence and end at the last base of the last probe of the last probe set in the exon cluster.

Genome Source – The version of the genome assembly used to design the array. This version may differ from the Annotation Genome used when annotating the array for a given NetAffx release.

Number of Probe Sets (or Exon Clusters) – The number of probe sets (or Exon Clusters) in the probe group.

Number of Probes – The number of probes in this probe group. Probe sets on Exon Arrays contain only 1–4 probes, while exon or transcript clusters may contain up to hundreds of probe sets.

Overlapping Probes – Lists the actual probe sequences that overlap in genomic sequence, which can occur when a probe selection region is small.

Probe Set Bounded – Probe sets are grouped into transcript clusters based on spliced annotations (transcripts) that share splice sites and on single exons that have overlapping exonic sequence. Remaining single-exon content may be grouped into a transcript cluster if it is bounded by that cluster (i.e., falls within one of its introns). This flag indicates that a probe set was included by bounding. Bounded probe sets have lower confidence of correct transcript cluster placement.
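The following simplified sketch shows the bounding test. It rests on two assumptions: a transcript cluster can be represented by its exonic intervals (1-based, inclusive), and "bounded" means the probe set lies entirely within a gap between those intervals. The function names and this representation are hypothetical; this is not the actual NetAffx grouping code.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, stop), 1-based inclusive genomic coordinates


def introns(exonic: List[Interval]) -> List[Interval]:
    """Return the gaps between exonic intervals, merging any overlapping intervals first."""
    if not exonic:
        return []
    ordered = sorted(exonic)
    gaps: List[Interval] = []
    covered_to = ordered[0][1]  # rightmost base covered so far
    for start, stop in ordered[1:]:
        if start > covered_to + 1:
            gaps.append((covered_to + 1, start - 1))
        covered_to = max(covered_to, stop)
    return gaps


def is_bounded(probe_set: Interval, cluster_exonic: List[Interval]) -> bool:
    """True if the probe set lies entirely inside one intron of the cluster."""
    start, stop = probe_set
    return any(lo <= start and stop <= hi for lo, hi in introns(cluster_exonic))


exons = [(1000, 1100), (2000, 2100), (3000, 3100)]
print(is_bounded((1500, 1525), exons))  # prints True
print(is_bounded((3500, 3525), exons))  # prints False (outside the cluster)
```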

Probe Set Location – The genomic location of the probe set for the genome assembly used at array design time (see Genome Source). These coordinates begin at the first base of the first probe sequence and end at the last base of the last probe of the probe set.

Probe Set Overlaps CDS – Indicates whether the probe set overlapped the coding sequence of a supporting transcript sequence used when designing the probe set.

Probe Set Type (Category) – Indicates the array design category of the probe set (on Exon Arrays) or transcript cluster (on Gene Arrays). The possible values are as follows:

  • Main design – Part of the main design.
  • AFFX control – A standard AFFX control.
  • Chip control – A chip control.
  • Antigenomic background control – Contains antigenomic background probes.
  • Genomic background control – Contains genomic background probes.
  • Exonic normalization control – From an exonic region of a normalization control gene.
  • Intronic normalization control – From an intronic region of a normalization control gene.
  • Unmapped full-length transcript – Contains probes tiled across an mRNA transcript that either did not align to the genome or aligned poorly.

PSR Evidence – Lists the nucleotide sequences from different data sources that substantiated the creation of a probe selection region (PSR) to select probes for the given probe group at array design time. For a summary of the different data sources used to design the Exon Array, see the Evidence Level glossary entry as well as the Exon Array Design Technical Note. The searchable PSR evidence classes for the Human Exon Array include:

  • ncbi
  • fl
  • mrna
  • ensGene
  • est
  • est-fl
  • mouse-fl
  • mouse-mrna
  • rat-fl
  • mitomap
  • microRNAregistry
  • vegaGene
  • vegaPseudoGene
  • geneid
  • genscan
  • genscanSubOpt
  • exoniphy
  • rnaGene
  • sgpGene
  • twinscan

Transcript Cluster Location – The genomic location of the transcript cluster for the genome assembly used at array design time (see Genome Source). These coordinates begin at the first base of the first probe sequence and end at the last base of the last probe of the last probe set in the transcript cluster.

4. Related Probe Sets

Lists the constituent probe sets within a given exon cluster or transcript cluster.

Evidence Level – Classifies an Exon Array probe set according to the quality of evidence supporting the transcription of the genomic sequence it is designed to interrogate. Evidence level is fixed at array design time and will not change with each NetAffx annotation release. (See glossary entry for NetAffx Transcript Classification).

Evidence levels:
  • Core – Probe sets supported by the most reliable evidence: RefSeq and full-length GenBank mRNA records containing complete CDS information.
  • Extended – Probe sets supported by cDNA evidence beyond that used to support Core probe sets. Extended evidence comes from other GenBank mRNAs not annotated as full-length, EST sequences, Ensembl gene collections, syntenically mapped mRNAs from mouse, rat, or human, mitoMap mitochondrial genes, microRNA registry genes, and vegaGene and vegaPseudoGene records.
  • Full – Probe sets supported only by computational gene prediction evidence, from gene and exon prediction algorithms including GeneID, GenScan, GenScanSubOptimal, exoniphy, RNAGene, sgpGene, and Twinscan.
  • Free – Probe sets supported by annotations that were merged such that no single annotation (or evidence) contains the probe set.
  • Ambiguous – Probe sets that cannot be unambiguously assigned to a particular transcript cluster.

Evidence level is assigned according to the highest confidence evidence that supports the probe set. In order for a probe set to be labeled at the Core, Extended, or Full levels, it must be entirely contained within the bounds of an annotation at that level. For example, if half the probes of a probe set measure a Core annotation, but all of the probes measure an Extended annotation, then the probe set is labeled at the Extended level.

An evidence level of Ambiguous results when two different genes have overlapping transcripts. A probe set that lies within this overlap region is given an Ambiguous level tag, since the gene it belongs to cannot be determined at design time. An exception is made for Core annotations, however. If a probe set is contained within the overlap region of multiple genes, but within the Core region of only one of the genes, then the probe set is labeled Core.

A probe set is labeled Free if the probe set is not contained in any annotations.
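The rules above can be expressed as a small decision procedure. This is a simplified Python sketch, not the actual design pipeline. It assumes each supporting annotation is a hypothetical `Annotation` record carrying a gene label, an evidence level, and 1-based inclusive coordinates. It also treats a probe set as supported by an annotation only when that annotation entirely contains it.

```python
from dataclasses import dataclass
from typing import Iterable

# Higher rank means higher-confidence evidence.
RANK = {"Core": 3, "Extended": 2, "Full": 1}


@dataclass(frozen=True)
class Annotation:
    gene: str   # hypothetical gene label used to detect overlapping genes
    level: str  # "Core", "Extended" or "Full"
    start: int  # 1-based inclusive genomic coordinates
    stop: int


def evidence_level(ps_start: int, ps_stop: int, annotations: Iterable[Annotation]) -> str:
    # Only annotations that entirely contain the probe set count as support.
    containing = [a for a in annotations if a.start <= ps_start and ps_stop <= a.stop]
    if not containing:
        return "Free"
    genes = {a.gene for a in containing}
    if len(genes) > 1:
        # Overlap between genes: the Core exception applies if exactly one gene gives Core support.
        core_genes = {a.gene for a in containing if a.level == "Core"}
        return "Core" if len(core_genes) == 1 else "Ambiguous"
    # Single gene: take the highest-confidence containing annotation.
    return max(containing, key=lambda a: RANK[a.level]).level


anns = [Annotation("G1", "Core", 1000, 1012), Annotation("G1", "Extended", 950, 1100)]
print(evidence_level(1000, 1025, anns))  # prints Extended
```

The usage line reproduces the example from the text. The Core annotation covers only part of the probe set, while the Extended annotation covers all of it, so the probe set is labeled Extended.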

Additional details about the probe set evidence level and grouping procedure are found in the whitepaper Exon Probeset Annotations and Transcript Cluster Groupings.

Hybridization Target – Describes the cross-hybridization potential of the probe set determined at the time of array design. This field is based on computational sequence alignment against all known and putatively transcribed array design content, which includes all transcribed regions of the genome and other transcribed sequences that could not be mapped to the genome. This field has one of three possible values:

  • Unique – All probes in the probe set perfectly match only one sequence in the putatively transcribed array design content. The vast majority (>80%) of probe sets are unique.
  • Identical – All probes in the probe set perfectly match more than one sequence in the putatively transcribed array design content. (This class is also referred to as "Similar".)
  • Mixed – The probes in the probe set either perfectly match or partially match more than one sequence in the putatively transcribed array design content.
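These definitions can be read as code if the input is, for each probe, the number of design-content sequences that the probe matches perfectly. The source does not spell out the exact rule for Mixed. This sketch therefore treats any probe set that is neither uniformly single-match nor uniformly multi-match as Mixed, and it ignores partial matches. The function name is hypothetical.

```python
from typing import Sequence


def hybridization_target(perfect_match_counts: Sequence[int]) -> str:
    """Classify a probe set from per-probe counts of perfectly matching sequences (hypothetical input)."""
    if not perfect_match_counts:
        raise ValueError("probe set has no probes")
    if any(n < 1 for n in perfect_match_counts):
        raise ValueError("each probe should match at least its own target sequence")
    if all(n == 1 for n in perfect_match_counts):
        return "Unique"
    if all(n > 1 for n in perfect_match_counts):
        return "Identical"
    return "Mixed"


print(hybridization_target([1, 1, 1, 1]))  # prints Unique
```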

Location – The chromosome name, genomic coordinates, and strand of the probe set. This information is based on the version of the genome assembly used for the current NetAffx annotation release (see Annotation Genome).

Probe Selection Region Evidence – The number of nucleotide sequences compiled as evidence for the probes described. The evidence may have been cDNA, mRNA, EST, or predicted gene sequences. For full information on the Exon Array design, see the Exon Array Design Technical Note.

5. Sequence

This section gives the sequence data for the probe group. For Exon Array probe sets and Gene Array transcript clusters, the sequences of the individual probes are provided. The sequence of a transcript cluster is the spliced-together genomic sequence of all constituent exon clusters, with any intronic sequence removed; thus the transcript cluster sequence is not necessarily a contiguous genomic sequence. For an exon cluster, the sequence is the contiguous nucleotide sequence spanning all constituent probe selection regions.
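The following minimal sketch assembles a transcript cluster sequence from exon cluster coordinates. It assumes 1-based inclusive coordinates on a single chromosome sequence. Reverse-complementing clusters on the minus strand is an additional assumption that this glossary does not state. The function name is hypothetical.

```python
from typing import List, Tuple

# Complement table for plain and soft-masked (lowercase) bases.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def transcript_cluster_sequence(
    chrom_seq: str, exon_clusters: List[Tuple[int, int]], strand: str = "+"
) -> str:
    """Join exon cluster sequences in genomic order, dropping the intronic sequence between them."""
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    # Convert 1-based inclusive (start, stop) to Python slices and concatenate.
    spliced = "".join(chrom_seq[start - 1:stop] for start, stop in sorted(exon_clusters))
    if strand == "-":
        # Assumption: minus-strand clusters are reported as the reverse complement.
        spliced = spliced.translate(_COMPLEMENT)[::-1]
    return spliced


chrom = "AAAACCCCGGGGTTTT"
print(transcript_cluster_sequence(chrom, [(9, 12), (1, 4)]))  # prints AAAAGGGG
```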

Probe Cross-Hybridizes – Indicates whether the probe was found, at the design date, to potentially hybridize to a transcript outside this transcript cluster.

Probe Self Cross-Hybridizes – Indicates whether the probe was found, at the design date, to potentially hybridize to another position within this transcript cluster.

Probe X – X (horizontal) coordinate for the location of the feature containing the probe on the GeneChip® array.

Probe Y – Y (vertical) coordinate for the location of the feature containing the probe on the GeneChip® array.

Location – Genomic location of the probe in the version of the genome assembly used at array annotation time (see Annotation Genome). Specified as "chromosome:start-stop (strand)".
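The following small parser handles this location format. It assumes the start and stop are plain integers with no thousands separators and that the strand is written as `+` or `-`. `parse_location` and `Location` are hypothetical names, and the example string is illustrative.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    chrom: str
    start: int
    stop: int
    strand: str


# "chromosome:start-stop (strand)", e.g. "chr1:1000-1024 (+)"
_LOCATION_RE = re.compile(r"^\s*([^:\s]+):(\d+)-(\d+)\s*\(([+-])\)\s*$")


def parse_location(text: str) -> Location:
    match = _LOCATION_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized location: {text!r}")
    chrom, start, stop, strand = match.groups()
    loc = Location(chrom, int(start), int(stop), strand)
    if loc.start > loc.stop:
        raise ValueError(f"start exceeds stop in {text!r}")
    return loc


print(parse_location("chr1:1000-1024 (+)"))  # prints Location(chrom='chr1', start=1000, stop=1024, strand='+')
```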
