MANUAL: apt-canary (1.14.3)

Introduction

apt-canary is the Affymetrix Power Tools (APT) implementation of the Canary clustering algorithm for calling genotypes of predefined copy number variable (CNV) regions. The Canary algorithm was developed by David Altshuler's group at The Broad Institute as part of a larger package called Birdsuite. At the time of this non-official release, the Birdsuite site is www.broad.mit.edu/mpg/birdsuite/.

The APT implementation requires files with prior information on how probe intensity summaries of individual CNV regions will cluster. For an implementation that does not require priors, see the Broad Institute release at www.broad.mit.edu/mpg/birdsuite/.

Input Files

apt-canary requires several input files, which Affymetrix provides on the GenomeWideSNP_6 array support page (www.affymetrix.com).

  1. The region file, GenomeWideSNP_6.canary-v1.region, contains the names of CNV regions and lists of both copy number and SNP probes, designated as Smart Probes by the Broad Institute, matching the regions.

  2. The prior file, GenomeWideSNP_6.canary-v1.prior, contains the names of CNV regions as well as empirically derived prior information about cluster location, dispersion (variance), and relative frequency of membership for each cluster of each CNV region.

  3. The normalization file, GenomeWideSNP_6.canary-v1.normalization, contains a list of probes used for chip-by-chip scale normalization of probe intensities.

  4. The bed (a.k.a. map) file, GenomeWideSNP_6.canary-v1.bed, contains the names of CNV regions as well as the chromosomal locations spanned by the CNV regions for NCBI build 36.1 of the human genome. The bed file is not required to run Canary, but it is useful for locating CNV regions within the genome.
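As an illustration of using the bed file to locate a CNV region, the following shell sketch looks up a region by name. It assumes the file follows the standard BED column order (chromosome, start, end, name), which has not been verified against the Affymetrix file; the function name cnv_region_location is hypothetical and not part of APT.

```shell
#!/bin/sh
# Print the location of a named CNV region from a BED file.
# ASSUMPTION: standard BED columns: chrom, start, end, name.
cnv_region_location() {
    # $1 = CNV region name, $2 = path to the bed file
    awk -v region="$1" '
        /^(track|browser|#)/ { next }            # skip optional BED header lines
        $4 == region { print $1 ":" $2 "-" $3 }  # chrom:start-end as stored in the file
    ' "$2"
}
```

For example, cnv_region_location SOME_REGION inputs/GenomeWideSNP_6.canary-v1.bed, where SOME_REGION stands for a region name taken from the region file. Coordinates are printed as stored, without conversion.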

To run Canary with the above input files, use the CDF file GenomeWideSNP_6.cdf. CEL files should be compatible with this CDF.

Custom CNV Maps

CNV maps other than those derived from the Broad Institute's set of CNV regions can be used by supplying the appropriate region and prior files. The clustering patterns of CNV regions, and consequently the information in the prior files, are sensitive to the set of probes selected for each CNV region. For this reason, be wary of any results obtained after changing the probe selection in the region file without recomputing the priors.

Quick Start

To run Canary on a set of CEL files with the default algorithm parameters, use:

apt-canary \
  --out-dir canary-results \
  --cdf-file ../regression-data/data/lib/GenomeWideSNP_6/GenomeWideSNP_6.cdf \
  --cnv-region-file inputs/GenomeWideSNP_6.canary-v1.region \
  --cnv-normalization-file inputs/GenomeWideSNP_6.canary-v1.normalization \
  --cnv-prior-file inputs/GenomeWideSNP_6.canary-v1.prior \
  --cnv-map-file inputs/GenomeWideSNP_6.canary-v1.bed \
  --cel-files inputs/celfiles.txt
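
The file passed to --cel-files is a text file whose first line is 'cel_files', followed by one CEL file per line (see Input Options). A minimal shell sketch for generating such a file follows; the function name make_cel_list and the directory layout are assumptions for illustration and are not part of APT.

```shell
#!/bin/sh
# Write a list file for --cel-files: the header line 'cel_files'
# followed by one CEL file path per line, in sorted order.
make_cel_list() {
    # $1 = directory to scan for CEL files (hypothetical layout)
    # $2 = list file to write
    {
        echo "cel_files"
        find "$1" -type f \( -name '*.CEL' -o -name '*.cel' \) | sort
    } > "$2"
}
```

For example, make_cel_list inputs/cel inputs/celfiles.txt would write the list used in the command above, assuming the CEL files are stored under inputs/cel.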

Options:

apt-canary - Call copy number states for defined regions using the canary algorithm

options:
 Common Options (not used by all programs)
   -h, --help                  Display program options and extra
                               documentation about possible analyses. See
                               -explain for information about a specific
                               operation. [default 'false']
   -v, --verbose               How verbose to be with status messages 0 -
                               quiet, 1 - usual messages, 2 - more
                               messages. [default '1']
     --console-off             Turn off the default messages to the
                               console but not logging or sockets.
                               [default 'false']
     --use-socket              Host and port to print messages over in
                               localhost:port format [default '']
     --version                 Display version information. [default
                               'false']
   -f, --force                 Disable various checks including chip
                               types. Consider using --chip-type option
                               rather than --force. [default 'false']
     --throw-exception         Throw an exception rather than calling
                               exit() on error. Useful for debugging. This
                               option is intended for command line use
                               only. If you are wrapping an Engine and
                               want exceptions thrown, then you should
                               call Err::setThrowStatus(true) to ensure
                               that all Err::errAbort() calls result in an
                               exception. [default 'false']
     --analysis-files-path     Search path for analysis library files.
                               Will override AFFX_ANALYSIS_FILES_PATH
                               environment variable. [default '']
     --xml-file                Input parameters in XML format (Will
                               override command line settings). [default
                               '']
     --temp-dir                Directory for temporary files when working
                               off disk. Using network mounted drives is
                               not advised. When not set, the output
                               folder will be used. The default is
                               typically the output directory or the
                               current working directory. [default '']
   -o, --out-dir               Directory for output files. Defaults to
                               current working directory. [default '.']
     --log-file                The name of the log file. Generally
                               defaults to the program name in the out-dir
                               folder. [default '']
 Engine Options (Not used on command line)
     --command-line            The command line executed. [default '']
     --exec-guid               The GUID for the process. [default '']
     --program-name            The name of the program [default '']
     --program-company         The company providing the program [default
                               '']
     --program-version         The version of the program [default '']
     --program-cvs-id          The CVS version of the program [default '']
     --version-to-report       The version to report in the output files.
                               [default '']
     --free-mem-at-start       How much physical memory was available when
                               the engine run started. [default '0']
     --meta-data-info          Meta data in key=value pair that will be
                               output in headers. [default '']
 Input Options
     --cel-files               Text file specifying cel files to process,
                               one per line with the first line being
                               'cel_files'. [default '']
     --cdf-file                File defining probe sets. Use either
                               --cdf-file or --spf-file [default '']
     --spf-file                File defining probe sets in spf (simple
                               probe format) which is like a text cdf
                               file. [default '']
     --cnv-region-file         File defining CNV regions and what
                               probesets to use for each CNV region.
                               [default '']
     --cnv-prior-file          File defining the canary priors for a given
                               CNV regions file. [default '']
     --cnv-map-file            File (bed format) used for visualizing CNV
                               regions in other applications. This arg
                               causes the map file name to be included in
                               the CHP meta info. [default '']
     --cnv-normalization-file  File containing probesets to use
                               (restricted to) for doing probe level
                               normalization. [default '']
     --chip-type               Chip types to check library and CEL files
                               against. Can be specified multiple times.
                               The first one is propagated as the chip
                               type in the output files. Warning, use of
                               this option will override the usual check
                               between chip types found in the library
                               files and cel files. You should use this
                               option instead of --force when possible.
                               [default '']
 Output Options
     --table-output            Output matching matrices of tab delimited
                               genotype calls and confidences. [default
                               'true']
     --cc-chp-output           Output resulting calls in binary CHP
                               format. This makes one AGCC Multi Data CHP
                               file per cel file analyzed. [default
                               'false']
 Analysis Options
     --apt-summarize-analysis  String representing analysis parameters for
                               the apt-probeset-summarize step a.k.a.
                               pre-canary [default '']
     --apt-canary-analysis     String representing analysis parameters for
                               canary. [default '']
 Execution Control Options
     --precision               Precision after decimal place [default '4']
     --analysis-name           Set the name of the analysis. [default '']
     --use-disk                Store CEL intensities to be analyzed on
                               disk. [default 'true']
     --disk-cache              Size of intensity memory cache in millions
                               of intensities (when --use-disk=true).
                               [default '50']
 Engine Options (Not used on command line)
     --cels                    Cel files to process. [default '']
     --result-files            CHP file names to output. Must be paired
                               with cels. [default '']
     --time-start              The time the engine run was started
                               [default '']
     --time-end                The time the engine run ended [default '']
     --time-run-minutes        The run time in minutes. [default '']
     --analysis-guid           The GUID for the analysis run. [default '']

Canary Parameters:

    The canary algorithm does copy number calling on defined
    regions using priors. The following parameters are 
    accessible using the --apt-canary-analysis option.
    Use key1=val1,key2=val2,... string format.

        af-weight 
        TOL 
        hwe_tol 
        hwe_tol2 
        fraction-giveaway-0 
        fraction-giveaway-1 
        fraction-giveaway-2 
        fraction-giveaway-3 
        fraction-giveaway-4 
        min-fill-prop 
        conf-interval-half-width 
        inflation 
        min-cluster-variance 
        pseudopoint-factor 
        regularize_variance_factor
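
The key1=val1,key2=val2,... string for --apt-canary-analysis can be assembled from individual settings, as in the shell sketch below. The helper join_params is hypothetical and not part of APT. The parameter names are taken from the list above, but the values shown are placeholders chosen only to illustrate the format; they are not defaults or recommended settings.

```shell
#!/bin/sh
# Join key=value settings with commas, producing the
# key1=val1,key2=val2,... format described above.
join_params() (
    # Runs in a subshell so the IFS change does not leak to the caller.
    IFS=,
    printf '%s\n' "$*"
)

# PLACEHOLDER VALUES: for illustration only, not defaults or recommendations.
canary_params=$(join_params "af-weight=1" "min-fill-prop=0.5")
echo "$canary_params"
```

The resulting string can then be passed on the command line as --apt-canary-analysis "$canary_params" together with the other options shown in Quick Start.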