Automating a Typical Affymetrix Workflow


The following white papers were developed to help software developers write applications for the Affymetrix platform.

  • Integration with Sample Databases – The first step in interacting with Affymetrix Instrument Control software is to register a sample. Sample registration is flexible: it accepts either the minimum of a file name and array type, or as much information as required, drawing on controlled vocabularies and user-defined sample templates. Affymetrix provides several ways to register samples programmatically in both GCOS and AGCC. Methods and tools exist to import one XML file per sample or a tab-delimited text file with one row per experiment. Using the SOAP interface on AGCC Workgroup, batch sample registration can be performed over the internet.

  • Remote Instrumentation Monitoring – The Affymetrix auto-loader allows unattended array scanning. However, any responsible lab manager will want to keep tabs on the lab’s activities. The Affymetrix system supports email notification of system performance through custom, configurable error messaging in either GCOS or AGCC.

  • Automated Monitoring of File Creation – Many bioinformatics data pipelines start when their input file, such as the probe intensity CEL file, is created. The Affymetrix Developers’ Network offers suggested processes, tools, and sample code for monitoring file creation and initiating custom downstream data processing.

  • Automated Data Processing – Once a probe intensity CEL file has been created, the individual probe intensities must be processed to produce estimates of molecular levels or states within the cell. This is the step that takes data from the CEL file to the CHP file. Programmatic control of this processing varies by application.

    • Automated Data Processing for Genotyping – The Affymetrix Genotyping Console software lets users perform primary data analysis, ending in CHP files, within a Windows graphical user interface (GUI). The engine that drives Genotyping Console is available in the cross-platform, open-source, command-line Affymetrix Power Tools (APT) v1.8. Using the APT framework, a data pipeline for 100K, 500K, SNP 5, and SNP 6 microarrays can be scripted on a variety of operating systems with model-based algorithms such as BRLMM, BRLMM-P, and Birdseed.

    • Automated Data Processing for Expression – As with genotyping, a graphical user interface application, Expression Console, is available for expression data processing. The Affymetrix Power Tools (APT) engine also drives Expression Console and is available for primary data analysis of all Affymetrix GeneChip expression microarrays, including our Exon and whole-transcript gene arrays. APT offers the PLIER, RMA, DABG, and IterPLIER algorithms for expression arrays.

    • Automated Data Processing for Tiling – Affymetrix Tiling Analysis Software (TAS) is available for Windows GUI users. The Affymetrix Developers’ Network offers the Tiling SDK, which provides the ANSI C++ source code for the TAS software. This allows informaticists to incorporate the TAS algorithm into their tiling analysis data pipelines.

    • Automated Data Processing for Copy Number – Copy number workflows within a Windows GUI environment depend on GCOS, GTYPE, and the Affymetrix Copy Number Analysis Tool (CNAT). However, the Affymetrix Developers’ Network offers the Copy Number Pipeline, a set of command-line tools that process 10K, 100K, and 500K data from CHP file to CNT file. Tools for copy number analysis on both SNP 5 and SNP 6 are currently in development.

  • Programmatic Access to Microarray Data – Affymetrix stores microarray data in binary files for storage efficiency. Any developer can access these data using the open-source Fusion SDK, available in both ANSI C++ and Java. Alternatively, Affymetrix publishes its file format specifications and offers collections of sample data for testing.

  • Automated Retrieval of Microarray Annotations – Interpreting microarray data is often the largest challenge. To support this effort, the Affymetrix Developers’ Network offers the NetAffx SDK. NetAffx is an online collection of annotations associated with each of the Affymetrix microarray designs, and its entire contents are programmatically accessible through the SDK. NetAffx content helps researchers translate Affymetrix identifiers into gene names, SNP IDs, or genomic coordinates. The most current publicly available genomic annotations are also provided.
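
As an illustration of the file-creation monitoring described above, the following is a minimal Python sketch of a polling watcher. It is not the sample code from the Affymetrix Developers’ Network. The function names (poll_once, watch), the polling interval, and the rule that a file is treated as complete once its size is unchanged between two consecutive polls are all assumptions made for this example; the handler passed to watch stands in for whatever downstream step a given pipeline needs.

```python
import time
from pathlib import Path
from typing import Callable, Dict, List, Optional, Set


def poll_once(watch_dir: Path, last_sizes: Dict[Path, int], done: Set[Path]) -> List[Path]:
    """Return CEL files whose size has not changed since the previous poll.

    Assumption for this sketch: an unchanged size between two polls means
    the instrument software has finished writing the file.
    """
    ready = []
    for path in sorted(watch_dir.iterdir()):
        # Only consider regular files with a .cel extension (any case)
        # that have not already been handed to the pipeline.
        if not path.is_file() or path.suffix.lower() != ".cel" or path in done:
            continue
        size = path.stat().st_size
        if last_sizes.get(path) == size:
            done.add(path)
            ready.append(path)
        else:
            # First sighting, or the file is still growing: remember its size.
            last_sizes[path] = size
    return ready


def watch(
    watch_dir: Path,
    handle: Callable[[Path], None],
    interval: float = 5.0,
    max_polls: Optional[int] = None,
) -> None:
    """Poll watch_dir and call handle(path) once for each completed CEL file.

    max_polls limits the number of polls; None means run until interrupted.
    """
    last_sizes: Dict[Path, int] = {}
    done: Set[Path] = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for path in poll_once(watch_dir, last_sizes, done):
            handle(path)
        polls += 1
        time.sleep(interval)
```

A call such as watch(Path("/data/cel"), print) would print each new CEL file path once it appears stable; the directory shown is hypothetical. Polling is used here only because it needs nothing beyond the standard library. A pipeline that must react faster could replace it with an operating-system file notification mechanism.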
