Researchers who are new to GeneChip® technology are often
initially overwhelmed by the amount of high-quality data generated
by GeneChip arrays. To aid users in managing and mining this wealth
of data, Affymetrix has developed a set of analysis tools that help
transform raw hybridization signal measurements into biologically
meaningful results.
Analysis of GeneChip gene expression experiments begins with
Affymetrix® Microarray Suite (MAS) software. This software tool manages
both the acquisition and processing of GeneChip-generated data,
providing a seamless transition from assay performance to data analysis.
The software also provides indicators of sample integrity, assay
execution, and hybridization performance through the assessment
of control hybridizations. Therefore, users can determine the quality
of the raw data and tailor subsequent analyses accordingly.
At its most basic level of analysis, MAS evaluates the abundance
of each transcript represented on the array and labels it as
present, marginal, or absent. The algorithm identifies and removes
the contributions of stray hybridization signals, and combines the
results from probes that interrogate different fragments of a transcript
(see Probe Selection and Array Design). The statistical significance of
each detection call is indicated by an associated p-value.
This algorithm lets users balance sensitivity against specificity.
By adjusting its threshold parameters, researchers can move the
boundaries of the present, marginal, and absent categories. If the
primary goal of the experiment is high sensitivity, avoiding false
negatives while tolerating some miscalls, users can increase the
parameters, which relaxes the p-value required for a present call.
Conversely, if the primary goal is high specificity, avoiding false
positives while missing a few positive calls, users can decrease
the parameters. The software also generates
an estimate of the relative abundance of each transcript.
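The thresholding described above can be sketched in a few lines of Python. This is a minimal illustration, not the MAS implementation: the function name `detection_call`, the parameter names `alpha1` and `alpha2`, and their default values are assumptions introduced here, and the sketch takes the detection p-value as given rather than computing it from probe intensities.

```python
# Illustrative only: ALPHA1 and ALPHA2 are hypothetical names and values,
# not settings quoted from the text above.
ALPHA1 = 0.04  # assumed upper bound of the "present" category
ALPHA2 = 0.06  # assumed upper bound of the "marginal" category


def detection_call(p_value, alpha1=ALPHA1, alpha2=ALPHA2):
    """Map a detection p-value to a present/marginal/absent label."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie in [0, 1]")
    if alpha1 > alpha2:
        raise ValueError("alpha1 must not exceed alpha2")
    if p_value < alpha1:
        return "present"
    if p_value < alpha2:
        return "marginal"
    return "absent"


if __name__ == "__main__":
    for p in (0.001, 0.05, 0.30):
        print(p, detection_call(p))
    # Raising both thresholds favors sensitivity over specificity.
    print(0.05, detection_call(0.05, alpha1=0.08, alpha2=0.10))
```

Raising both thresholds, as in the last call, moves a borderline transcript into the present category; lowering them has the opposite effect.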
Most gene expression studies involve comparing data from two or
more arrays. To facilitate these comparisons, MAS enables users
to designate one of their arrays as a baseline and another as experimental.
Just as in the analysis of single arrays, comparison analysis relies
on algorithms that generate a qualitative output with an associated
p-value and a quantitative metric associated with a confidence interval.
The qualitative output indicates whether a transcript in the experimental
array is increased, decreased, or unchanged relative to its baseline counterpart.
The quantitative metric provides an estimate of the relative difference
in transcript abundance between the two arrays.
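As a rough illustration of the quantitative metric, the sketch below expresses the relative difference between two arrays as a base-2 log ratio. The text does not say which metric MAS reports, so the log ratio, the function names, and the signal values are all assumptions made for this example; the confidence interval and the qualitative call are not modeled.

```python
import math


def signal_log_ratio(baseline, experimental):
    """Base-2 log ratio of experimental to baseline signal (assumed metric)."""
    if baseline <= 0 or experimental <= 0:
        raise ValueError("signals must be positive")
    return math.log2(experimental / baseline)


def fold_change(log_ratio):
    """Signed fold change: +2.0 means doubled, -2.0 means halved."""
    if log_ratio >= 0:
        return 2.0 ** log_ratio
    return -(2.0 ** -log_ratio)


if __name__ == "__main__":
    # Hypothetical signal values for one transcript on two arrays.
    ratio = signal_log_ratio(baseline=100.0, experimental=400.0)
    print(ratio, fold_change(ratio))
```

A log scale treats increases and decreases symmetrically: a doubling and a halving have the same magnitude and opposite signs.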
To identify the most significant results in an experiment, researchers
then use the Affymetrix Data Mining Tool (DMT). This software tool
includes algorithms
to filter and sort expression results. Users can combine results
from replicate samples and apply statistical tests, such as Mann-Whitney
and t-test analyses. DMT also enables researchers to reduce the
time spent on analysis by combining multiple array data sets into
a single virtual array.
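To make the replicate comparison concrete, here is a minimal pure-Python sketch of the Mann-Whitney U statistic for one transcript measured in two groups of replicates. It is not DMT code: the function name and the signal values are hypothetical, and turning U into a p-value (through exact tables or a normal approximation) is left out.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, counting ties as half a win."""
    if not x or not y:
        raise ValueError("both groups need at least one value")
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0   # x replicate outranks y replicate
            elif xi == yj:
                u_x += 0.5   # tie: split the pair evenly
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


if __name__ == "__main__":
    # Hypothetical replicate signals for a single transcript.
    baseline = [110.0, 95.0, 102.0]
    treated = [240.0, 198.0, 215.0]
    print(mann_whitney_u(treated, baseline))
```

Because the statistic depends only on the ordering of the values, it is less sensitive to outlying intensities than a t-test, which is one reason to offer both.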
Most importantly, the package provides two clustering algorithms
for grouping together samples or genes with similar expression patterns.
These algorithms can be used to address a variety of research questions,
such as searching for new disease classes or novel relationships
between genes.
Because expression experiments generate large amounts of data (72
MB for a typical array), it is critical to establish consistent procedures
for data storage and management. Affymetrix has developed two systems
for creating databases: Affymetrix MicroDB for users conducting low
to moderate throughput analyses, and Affymetrix Laboratory Information
Management System (LIMS) for those performing moderate to high
throughput analyses. Both systems offer the flexibility of the open
architecture design provided by the Affymetrix Analysis Data Model,
a database schema that stores array results
in a format that can be easily recognized and used by many software
programs to analyze and exchange data.
When researchers using GeneChip arrays identify interesting results,
they often want to learn more about the probe sets used on the array.
Using the NetAffx Analysis Center, users can correlate their GeneChip
gene expression results
with array design and annotation information.
This web-based tool provides access to public databases, including
GenBank®, dbEST, RefSeq, and UniGene, as well as proprietary databases
containing probe sequences for GeneChip arrays and bioinformatics
annotations.
In addition to searching array probe sets for sequences of interest
and examining gene and protein annotations, researchers use the
NetAffx Analysis Center to sort transcripts by various criteria,
such as involvement in metabolic pathways or disease association.
Using this tool in conjunction with the CustomExpress array design
program, researchers can go online to design and order custom arrays
with up to 1,000 probe sets.
Analyzing, storing, and mining the bounty of genomic data generated
by GeneChip arrays can be a challenge. With the aid of powerful
analysis tools such as those developed by Affymetrix, however, scientists
are successfully generating and mining their own GeneChip array
data to identify new therapeutic targets, develop improved diagnostic
tests, and gain a better understanding of how biological systems
work.