MANUAL: apt-geno-qc (1.14.3)

Introduction

apt-geno-qc is a program that statistically assesses the experimental quality of Affymetrix SNP microarrays using the model-based DM algorithm.

To assess Axiom™ arrays, see Axiom™ Single-Sample QC Analysis.

Because DM is a single-sample, model-based algorithm, it processes one CEL file at a time, even when given a batch of multiple CEL files. DM requires mismatch (MM) probes, so apt-geno-qc operates only on SNPs tiled with mismatches. It therefore works on all SNPs on Mapping500K, but only on a small subset of SNPs on SNP5 and SNP6 that are tiled for QC purposes (FQC SNPs).

The current option to output DM calls (see --dm-out) is experimental. This feature may go away or it may be moved into a separate program to generate DM calls.

Quick Start

We illustrate the most basic way to run apt-geno-qc with a few examples. Each example runs an analysis under the default parameter settings and generates a report file containing QC statistics for each CEL file under each specified method (or the same method with different parameters). Library files can be downloaded from http://www.affymetrix.com/estore/browse/level_one_category_template_one.jsp?category=35796.

An Axiom™ example:

  apt-geno-qc \
    --analysis-files-path /library/file/path \
    --xml-file Axiom_GW_Hu_SNP.r4.apt-geno-qc.AxiomQC1.xml \
    --cel-files cel.txt \
    --out-file qc.txt

For more details on the use of apt-geno-qc with Axiom™ arrays, see the vignette on Axiom™ Single-Sample QC Analysis.
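
A pre-flight check can catch a missing library file before a run starts. The sketch below is illustrative only: check_library_files is a hypothetical helper, not part of APT, and it assumes the library search path is a single directory.

```shell
# check_library_files DIR FILE...: report every FILE that is not found in DIR.
# Hypothetical helper, not part of APT. Assumes DIR is a single directory.
check_library_files() {
  lib_dir=$1
  shift
  lib_missing=0
  for lib_file in "$@"; do
    if [ ! -f "$lib_dir/$lib_file" ]; then
      echo "missing: $lib_dir/$lib_file" >&2
      lib_missing=1
    fi
  done
  return "$lib_missing"   # 0 when every file exists, 1 otherwise
}

# Example (paths are placeholders):
#   check_library_files /library/file/path \
#     Axiom_GW_Hu_SNP.r4.apt-geno-qc.AxiomQC1.xml \
#     && echo "library files found"
```

The function only tests for existence; it does not check that a file is valid for the array type in use.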

A SNP6 example:

apt-geno-qc \
  --cdf-file GenomeWideSNP_6.cdf \
  --qca-file GenomeWideSNP_6.r2.qca \
  --qcc-file GenomeWideSNP_6.r2.qcc \
  --chrX-probes GenomeWideSNP_6.chrXprobes \
  --chrY-probes GenomeWideSNP_6.chrYprobes \
  --cel-files cel_file_list.txt \
  --out-file results.txt

A SNP5 example:

apt-geno-qc \
  --cdf-file GenomeWideSNP_5.cdf \
  --qca-file GenomeWideSNP_5.qca \
  --qcc-file GenomeWideSNP_5.qcc \
  --cel-files cel_file_list.txt \
  --out-file results.txt

A 500K example:

apt-geno-qc \
  --cdf-file Mapping250K_Sty.cdf \
  --qca-file Mapping250K_Sty.qca \
  --qcc-file Mapping250K_Sty.qcc \
  --cel-files cel_file_list.txt \
  --out-file results.txt
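
The --cel-files option expects a text file that lists one CEL file per line, with 'cel_files' as the first line (see Options). A minimal sketch for generating such a file from a directory might look like this; make_cel_list is a hypothetical helper, and matching on the .CEL extension is an assumption about how the files are named.

```shell
# make_cel_list DIR OUT: write a --cel-files list for every *.CEL file in DIR.
# Hypothetical helper, not part of APT.
make_cel_list() {
  cel_dir=$1
  cel_list=$2
  printf 'cel_files\n' > "$cel_list"     # required first line
  for cel in "$cel_dir"/*.CEL; do
    [ -e "$cel" ] || continue            # no matches: leave only the header
    printf '%s\n' "$cel" >> "$cel_list"
  done
}

# Example (paths are placeholders):
#   make_cel_list /path/to/cels cel_file_list.txt
```

The resulting file can then be passed as the argument to --cel-files.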

The Report File

apt-geno-qc creates a report file whose name is given by the --out-file option. The report file contains QC statistics that measure the experimental quality of each CEL file. The file is tab-delimited text: a header, then a column-header line, then one line per CEL file analyzed, with one column for each method specified in the configuration file. The header records which CDF file was used, which methods generated the QC statistics, and the parameters associated with each method. The column entries are:

  1. The CEL file name.
  2. The statistics for each method: a variable number of columns, one per method in the configuration file.
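
As a rough illustration of this layout, the sketch below reshapes a report into one line per CEL file and method. It assumes that header lines begin with '#', which this manual does not state; adjust the pattern to match the actual file. print_qc_report is a hypothetical helper name.

```shell
# print_qc_report FILE: print "CEL <tab> method <tab> value" for each entry.
# Hypothetical helper. Assumes file header lines start with '#'.
print_qc_report() {
  awk -F '\t' '
    /^#/    { next }                     # skip file header lines (assumed form)
    NF == 0 { next }                     # skip blank lines
    !seen   {                            # first remaining line: column header
      for (i = 1; i <= NF; i++) name[i] = $i
      seen = 1
      next
    }
    {                                    # one data line per CEL file
      for (i = 2; i <= NF; i++) printf "%s\t%s\t%s\n", $1, name[i], $i
    }
  ' "$1"
}

# Example (file name is a placeholder):
#   print_qc_report results.txt
```

Column 1 is treated as the CEL file name and every later column as one method's statistic, following the column description above.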

Options:

apt-geno-qc - a single-chip-based genotyping chip quality control tool,
it reports one or more metrics for each chip analyzed.  The metrics
computed are determined by one of the supplied input files (the qca file).

usage:
  apt-geno-qc --cdf-file my_chip.cdf --qcc-file my_chip.qcc \
    --qca-file my_chip.qca --out-file my_output.txt \
    --cel-files my_cel_files.txt

  apt-geno-qc -c my_chip.cdf -q my_chip.qcc -a my_chip.qca \
    --out-file my_output.txt --cel-files my_cel_files.txt


options:
 Common Options (not used by all programs)
   -h, --help Display program options and extra
                          documentation about possible analyses. See
                          -explain for information about a specific
                          operation. [default 'false'] 
   -v, --verbose How verbose to be with status messages 0 -
                          quiet, 1 - usual messages, 2 - more
                          messages. [default '1'] 
     --console-off Turn off the default messages to the 
                          console but not logging or sockets. 
                          [default 'false'] 
     --use-socket Host and port to print messages over in
                          localhost:port format [default ''] 
     --version Display version information. [default
                          'false'] 
   -f, --force Disable various checks including chip 
                          types. Consider using --chip-type option
                          rather than --force. [default 'false'] 
     --throw-exception Throw an exception rather than calling
                          exit() on error. Useful for debugging. This
                          option is intended for command line use
                          only. If you are wrapping an Engine and 
                          want exceptions thrown, then you should 
                          call Err::setThrowStatus(true) to ensure
                          that all Err::errAbort() calls result in an
                          exception. [default 'false'] 
     --analysis-files-path Search path for analysis library files. 
                          Will override AFFX_ANALYSIS_FILES_PATH
                          environment variable. [default ''] 
     --xml-file Input parameters in XML format (Will
                          override command line settings). [default
                          ''] 
     --temp-dir Directory for temporary files when working
                          off disk. Using network mounted drives is
                          not advised. When not set, the output 
                          folder will be used. The default is
                          typically the output directory or the
                          current working directory. [default ''] 
   -o, --out-dir Directory for output files. Defaults to
                          current working directory. [default '.'] 
     --log-file The name of the log file. Generally 
                          defaults to the program name in the out-dir
                          folder. [default ''] 
 Engine Options (Not used on command line)
     --command-line The command line executed. [default ''] 
     --exec-guid The GUID for the process. [default ''] 
     --program-name The name of the program [default ''] 
     --program-company The company providing the program [default
                          ''] 
     --program-version The version of the program [default ''] 
     --program-cvs-id The CVS version of the program [default ''] 
     --version-to-report The version to report in the output files.
                          [default ''] 
     --free-mem-at-start How much physical memory was available when
                          the engine run started. [default '0'] 
     --meta-data-info Meta data in key=value pair that will be
                          output in headers. [default ''] 
 Input Options
   -c, --cdf-file File defining probe sets. [default ''] 
     --spf-file SPF File defining probe sets. [default ''] 
   -q, --qcc-file File defining QC probesets. [default ''] 
   -a, --qca-file File defining QC analysis methods. [default
                          ''] 
     --cel-files Text file specifying cel files to process,
                          one per line with the first line being
                          'cel_files'. [default ''] 
     --chrX-probes File containing probe_id (1-based) of 
                          probes on chrX. Used for copy number probe
                          chrX/Y ratio gender calling. [Experimental]
                          [default ''] 
     --chrY-probes File containing probe_id (1-based) of 
                          probes on chrY. Used for copy number probe
                          chrX/Y ratio gender calling. [Experimental]
                          [default ''] 
     --chrZ-probes File containing probe_id (1-based) of 
                          probes on chrZ. Used for copy number probe
                          chrZ/W ratio avian gender calling.
                          [Experimental] [default ''] 
     --chrW-probes File containing probe_id (1-based) of 
                          probes on chrW. Used for copy number probe
                          chrZ/W ratio avian gender calling.
                          [Experimental] [default ''] 
     --probe-class-file File containing probe_id (1-based) of 
                          probes and a 'class' designation. Used to
                          compute mean probe intensity by class for
                          report file. [default ''] 
     --target-sketch Sketch file. [default ''] 
     --channel-file Channel file. [default ''] 
     --reagent-kit-discriminator list of probeset names with pc1s and means
                          to use for classifying the reagent kits.
                          [default ''] 
 Output Options
     --out-file Name to use for the output file. [default
                          'apt-geno-qc.report.txt'] 
     --dm-out Folder to use for DM output. Enables DM
                          output. One per CEL file. [experimental]
                          [default ''] 
 Analysis Options
     --dm-het-mult DM Het Mult to use for DM output. [default
                          '1.25'] 
     --dm-thresh DM threshold to use for making no calls.
                          [default '0.33'] 
     --female-thresh Threshold for calling females when using
                          cn-probe-chrXY-ratio or 
                          cn-probe-chrZW-ratio method. [default
                          '0.48'] 
     --male-thresh Threshold for calling males when using
                          cn-probe-chrXY-ratio or 
                          cn-probe-chrZW-ratio method. [default
                          '0.71'] 
 Engine Options (Not used on command line)
     --cels Cel files to process. [default ''] 
     --time-start The time the engine run was started 
                          [default ''] 
     --time-end The time the engine run ended [default ''] 
     --time-run-minutes The run time in minutes. [default ''] 
     --analysis-guid The GUID for the analysis run. [default ''] 

Frequently Asked Questions

Q. What is a probe_id?

A. See the FAQ item on probe IDs for more info.