MANUAL: apt-canary (apt-1.10.1)

Introduction

apt-canary is the Affymetrix Power Tools (APT) implementation of the Canary clustering algorithm for calling genotypes of predefined copy number variable (CNV) regions. The Canary algorithm was developed by David Altshuler's group at The Broad Institute as part of a larger package called Birdsuite. At the time of this non-official release, the Birdsuite site is www.broad.mit.edu/mpg/birdsuite/.

The APT implementation requires files with prior information on how the probe intensity summaries of individual CNV regions will cluster. For an implementation that does not require priors, see the Broad Institute release at www.broad.mit.edu/mpg/birdsuite/.

Input Files

Several input files are required by apt-canary. These input files are provided by Affymetrix at the GenomeWideSNP_6 array support page (www.affymetrix.com).

  1. The region file, GenomeWideSNP_6.canary-v1.region, contains the names of CNV regions and lists of both copy number and SNP probes, designated as Smart Probes by the Broad Institute, matching the regions.

  2. The prior file, GenomeWideSNP_6.canary-v1.prior, contains the names of CNV regions as well as empirically derived prior information about cluster location, dispersion (variance) and relative frequency of membership for each cluster of each CNV region.

  3. The normalization file, GenomeWideSNP_6.canary-v1.normalization, contains a list of probes used for chip-by-chip scale normalization of probe intensities.

  4. The bed (a.k.a. map) file, GenomeWideSNP_6.canary-v1.bed, contains the names of CNV regions as well as the chromosomal locations spanned by the CNV regions for NCBI build 36.1 of the human genome. The bed file is not required to run Canary, but it is useful for locating CNV regions within the genome.

To run canary with the above input files, use the CDF file GenomeWideSNP_6.cdf. CEL files should be compatible with this CDF.
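
Before a run it can help to confirm that the files listed above are in place. The following is only a minimal sketch: the variables LIB_DIR and CDF_FILE and their default values are assumptions chosen to mirror the paths in the Quick Start example, not locations defined by apt-canary.

```shell
# Hypothetical locations; adjust to wherever the library files were unpacked.
lib_dir="${LIB_DIR:-inputs}"
cdf_file="${CDF_FILE:-../regression-data/data/lib/GenomeWideSNP_6/GenomeWideSNP_6.cdf}"

# Count the required files that cannot be read.
missing=0
for f in \
  "$lib_dir/GenomeWideSNP_6.canary-v1.region" \
  "$lib_dir/GenomeWideSNP_6.canary-v1.prior" \
  "$lib_dir/GenomeWideSNP_6.canary-v1.normalization" \
  "$cdf_file"
do
  if [ ! -r "$f" ]; then
    echo "missing required file: $f" >&2
    missing=$((missing + 1))
  fi
done

# The bed file is optional, so its absence is only reported as a note.
if [ ! -r "$lib_dir/GenomeWideSNP_6.canary-v1.bed" ]; then
  echo "note: optional bed file not found in $lib_dir" >&2
fi

echo "required files missing: $missing"
```

A count of 0 means all four required files were found and are readable.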

Custom CNV Maps

CNV maps other than those derived from the Broad Institute's set of CNV regions can be used by supplying the appropriate region and prior files. The clustering pattern of a CNV region, and consequently the information in the prior file, is sensitive to the set of probes selected for that region. For this reason, the user should be wary of any results obtained after changing the probe selection in the region file without recomputing the priors.

Quick Start

To run canary on a set of CEL files with the default algorithm parameters, use:

apt-canary \
  --out-dir canary-results \
  --cdf-file ../regression-data/data/lib/GenomeWideSNP_6/GenomeWideSNP_6.cdf \
  --cnv-region-file inputs/GenomeWideSNP_6.canary-v1.region \
  --cnv-normalization-file inputs/GenomeWideSNP_6.canary-v1.normalization \
  --cnp-prior-file inputs/GenomeWideSNP_6.canary-v1.prior \
  --cnv-map-file inputs/GenomeWideSNP_6.canary-v1.bed \
  --cel-files inputs/celfiles.txt
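
The file passed to --cel-files is a text file with one CEL file per line, whose first line is 'cel_files'. One way to build it is sketched below. The directory inputs/cel and the variables CEL_DIR and LIST_FILE are assumptions for illustration only.

```shell
# Hypothetical locations; adjust to your own layout.
cel_dir="${CEL_DIR:-inputs/cel}"
list_file="${LIST_FILE:-inputs/celfiles.txt}"

mkdir -p "$(dirname "$list_file")"

# The first line must be the literal header 'cel_files'.
printf 'cel_files\n' > "$list_file"

# Append one CEL file path per line, sorted for a reproducible order.
# If the directory is absent or empty, only the header is written.
find "$cel_dir" -type f \( -name '*.CEL' -o -name '*.cel' \) 2>/dev/null \
  | sort >> "$list_file"

echo "wrote $(($(wc -l < "$list_file") - 1)) CEL file path(s) to $list_file"
```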

Options:

apt-canary - Call copy number states for defined regions using the canary algorithm

options:
 Basic Info and Control Options
   -h, --help                  This message. [default 'false']
     --explain                 Explain a particular operation (i.e.
                               --explain canary). [default '']
     --verbose                 How verbose to be with status messages 0 -
                               quiet, 1 - usual messages, 2 - more
                               messages. [default '1']
     --version                 Output program version and quit. [default
                               'false']
     --force                   Disable various checks including chip types
                               and map file versions. Consider using
                               --chip-type option rather than --force.
                               [default 'false']
 Input Options
     --cel-files               Text file specifying cel files to process,
                               one per line with the first line being
                               'cel_files'. [default '']
     --cdf-file                File defining probe sets. Use either
                               --cdf-file or --spf-file [default '']
     --spf-file                File defining probe sets in spf (simple
                               probe format) which is like a text cdf
                               file. [default '']
     --cnv-region-file         File defining CNV regions and what
                               probesets to use for each CNV region.
                               [default '']
     --cnp-prior-file          File defining the canary priors for a given
                               CNV regions file. [default '']
     --cnv-map-file            File (bed format) used for visualizing CNV
                               regions in other applications. This arg
                               causes the map file name to be included in
                               the CHP meta info. [default '']
     --cnv-normalization-file  File containing probesets to use
                               (restricted to) for doing probe level
                               normalization. [default '']
     --chip-type               Chip types to check library and CEL files
                               against. Can be specified multiple times.
                               The first one is propagated as the chip
                               type in the output files. Warning, use of
                               this option will override the usual check
                               between chip types found in the library
                               files and cel files. You should use this
                               option instead of --force when possible.
                               [default '']
 Output Options
     --out-dir                 Directory to write result files into. Any
                               previous results in directory will be
                               overwritten. [default '.']
     --table-output            Output matching matrices of tab delimited
                               genotype calls and confidences. [default
                               'true']
     --cc-chp-output           Output resulting calls in binary CHP
                               format. This makes one AGCC Multi Data CHP
                               file per cel file analyzed. [default
                               'false']
 Analysis Options
     --apt-summarize-analysis  String representing analysis parameters for
                               the apt-probeset-summarize step a.k.a.
                               pre-canary [default '']
     --apt-canary-analysis     String representing analysis parameters for
                               canary. See --explain canary for more info.
                               [default '']
 Execution Control Options
     --block-size              How many probesets to process at once,
                               useful when memory is limited. If set to 0
                               program attempts to guess available RAM and
                               set appropriately. [default '0']
     --precision               Precision after decimal place [default '4']
     --analysis-name           Set the name of the analysis. [default '']
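
The parameters accepted by --apt-canary-analysis are described by the program itself through --explain canary. A small sketch for looking them up follows. It uses only the --version and --explain options listed above, and the PATH check is an added convenience, not something apt-canary requires.

```shell
# Check whether apt-canary is on the PATH before asking it for details.
if command -v apt-canary >/dev/null 2>&1; then
  have_apt=1
  # Print the program version, then the description of the canary analysis.
  apt-canary --version
  apt-canary --explain canary
else
  have_apt=0
  echo "apt-canary not found on PATH" >&2
fi
```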


Generated on Mon Nov 3 12:21:42 2008 for Affymetrix Power Tools by  doxygen 1.5.3