apt-canary is the Affymetrix Power Tools (APT) implementation of the Canary clustering algorithm for calling genotypes of predefined copy number variable (CNV) regions. The Canary algorithm was developed by David Altshuler's group at the Broad Institute as part of a larger package called Birdsuite. At the time of this non-official release, the Birdsuite site is www.broad.mit.edu/mpg/birdsuite/.
The APT implementation requires files with prior information on how the probe intensity summaries of individual CNV regions will cluster. For an implementation that does not require priors, see the Broad Institute release at www.broad.mit.edu/mpg/birdsuite/.
Several input files are required by apt-canary. These input files are provided by Affymetrix on the GenomeWideSNP_6 array support page (www.affymetrix.com).
The region file, GenomeWideSNP_6.canary-v1.region, contains the names of the CNV regions and, for each region, lists of the matching copy number and SNP probes, which the Broad Institute designates as Smart Probes.
The prior file, GenomeWideSNP_6.canary-v1.prior, contains the names of the CNV regions as well as empirically derived prior information about the location, dispersion (variance), and relative frequency of membership of each cluster for each CNV region.
The normalization file, GenomeWideSNP_6.canary-v1.normalization, contains a list of probes used for chip-by-chip scale normalization of probe intensities.
To run canary with the above input files, use GenomeWideSNP_6.cdf as the CDF file. CEL files should be compatible with this CDF.
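Because a run depends on several library files at once, a small pre-flight check can catch a missing file before any analysis starts. The following is a minimal sketch in POSIX shell; the function name check_inputs and the lib/ directory in the example call are hypothetical and not part of APT.

```shell
# Report any required apt-canary input file that is missing.
# Returns 0 when every listed file exists, 1 otherwise.
# (check_inputs is a hypothetical helper, not an APT command.)
check_inputs() {
    missing=0
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            printf 'missing input file: %s\n' "$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example call (the lib/ directory is a placeholder path):
# check_inputs lib/GenomeWideSNP_6.cdf \
#     lib/GenomeWideSNP_6.canary-v1.region \
#     lib/GenomeWideSNP_6.canary-v1.prior \
#     lib/GenomeWideSNP_6.canary-v1.normalization
```

The check only confirms that the files exist; it does not verify that the CEL files match the CDF.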
CNV maps other than those derived from the Broad Institute's set of CNV regions can be used by supplying the appropriate region and prior files. The clustering patterns of CNV regions, and consequently the information in the prior files, are sensitive to the set of probes selected for each CNV region. For this reason, be wary of any results obtained after changing the probe selection in the region file without recomputing the priors.
To run canary on a set of CEL files using the default algorithm parameters, use:
apt-canary \
    --out-dir canary-results \
    --cdf-file ../regression-data/data/lib/GenomeWideSNP_6/GenomeWideSNP_6.cdf \
    --cnv-region-file inputs/GenomeWideSNP_6.canary-v1.region \
    --cnv-normalization-file inputs/GenomeWideSNP_6.canary-v1.normalization \
    --cnv-prior-file inputs/GenomeWideSNP_6.canary-v1.prior \
    --cnv-map-file inputs/GenomeWideSNP_6.canary-v1.bed \
    --cel-files inputs/celfiles.txt
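The file passed to --cel-files is a text list with one CEL file per line and 'cel_files' as the first line. A minimal sketch of generating such a list in POSIX shell follows; the function name make_cel_list, the .CEL extension match, and the paths in the example call are assumptions for illustration.

```shell
# Write a --cel-files list: the header line 'cel_files' followed by
# one CEL file path per line.
# (make_cel_list is a hypothetical helper, not an APT command.)
make_cel_list() {
    cel_dir=$1      # directory holding the CEL files
    list_file=$2    # list file to create
    {
        printf 'cel_files\n'
        for cel in "$cel_dir"/*.CEL; do
            # Skip the unexpanded pattern when no file matches.
            if [ -e "$cel" ]; then
                printf '%s\n' "$cel"
            fi
        done
    } > "$list_file"
}

# Example call (both paths are placeholders):
# make_cel_list /data/cels inputs/celfiles.txt
```

The sketch matches only the upper-case .CEL extension; adjust the pattern if your files are named differently.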
apt-canary - Call copy number states for defined regions using the canary algorithm
options:
  Common Options (not used by all programs)
    -h, --help                    Display program options and extra
                                  documentation about possible analyses. See
                                  -explain for information about a specific
                                  operation. [default 'false']
    -v, --verbose                 How verbose to be with status messages 0 -
                                  quiet, 1 - usual messages, 2 - more
                                  messages. [default '1']
        --console-off             Turn off the default messages to the
                                  console but not logging or sockets.
                                  [default 'false']
        --use-socket              Host and port to print messages over in
                                  localhost:port format [default '']
        --version                 Display version information. [default
                                  'false']
    -f, --force                   Disable various checks including chip
                                  types. Consider using --chip-type option
                                  rather than --force. [default 'false']
        --throw-exception         Throw an exception rather than calling
                                  exit() on error. Useful for debugging. This
                                  option is intended for command line use
                                  only. If you are wrapping an Engine and
                                  want exceptions thrown, then you should
                                  call Err::setThrowStatus(true) to ensure
                                  that all Err::errAbort() calls result in an
                                  exception. [default 'false']
        --analysis-files-path     Search path for analysis library files.
                                  Will override AFFX_ANALYSIS_FILES_PATH
                                  environment variable. [default '']
        --xml-file                Input parameters in XML format (Will
                                  override command line settings). [default
                                  '']
        --temp-dir                Directory for temporary files when working
                                  off disk. Using network mounted drives is
                                  not advised. When not set, the output
                                  folder will be used. The default is
                                  typically the output directory or the
                                  current working directory. [default '']
    -o, --out-dir                 Directory for output files. Defaults to
                                  current working directory. [default '.']
        --log-file                The name of the log file. Generally
                                  defaults to the program name in the out-dir
                                  folder. [default '']
  Engine Options (Not used on command line)
        --command-line            The command line executed. [default '']
        --exec-guid               The GUID for the process. [default '']
        --program-name            The name of the program [default '']
        --program-company         The company providing the program [default
                                  '']
        --program-version         The version of the program [default '']
        --program-cvs-id          The CVS version of the program [default '']
        --version-to-report       The version to report in the output files.
                                  [default '']
        --free-mem-at-start       How much physical memory was available when
                                  the engine run started. [default '0']
        --meta-data-info          Meta data in key=value pair that will be
                                  output in headers. [default '']
  Input Options
        --cel-files               Text file specifying cel files to process,
                                  one per line with the first line being
                                  'cel_files'. [default '']
        --cdf-file                File defining probe sets. Use either
                                  --cdf-file or --spf-file [default '']
        --spf-file                File defining probe sets in spf (simple
                                  probe format) which is like a text cdf
                                  file. [default '']
        --cnv-region-file         File defining CNV regions and what
                                  probesets to use for each CNV region.
                                  [default '']
        --cnv-prior-file          File defining the canary priors for a given
                                  CNV regions file. [default '']
        --cnv-map-file            File (bed format) used for visualizing CNV
                                  regions in other applications. This arg
                                  causes the map file name to be included in
                                  the CHP meta info. [default '']
        --cnv-normalization-file  File containing probesets to use
                                  (restricted to) for doing probe level
                                  normalization. [default '']
        --chip-type               Chip types to check library and CEL files
                                  against. Can be specified multiple times.
                                  The first one is propagated as the chip
                                  type in the output files. Warning, use of
                                  this option will override the usual check
                                  between chip types found in the library
                                  files and cel files. You should use this
                                  option instead of --force when possible.
                                  [default '']
  Output Options
        --table-output            Output matching matrices of tab delimited
                                  genotype calls and confidences. [default
                                  'true']
        --cc-chp-output           Output resulting calls in binary CHP
                                  format. This makes one AGCC Multi Data CHP
                                  file per cel file analyzed. [default
                                  'false']
  Analysis Options
        --apt-summarize-analysis  String representing analysis parameters for
                                  the apt-probeset-summarize step a.k.a.
                                  pre-canary [default '']
        --apt-canary-analysis     String representing analysis parameters for
                                  canary. [default '']
  Execution Control Options
        --precision               Precision after decimal place [default '4']
        --analysis-name           Set the name of the analysis. [default '']
        --use-disk                Store CEL intensities to be analyzed on
                                  disk. [default 'true']
        --disk-cache              Size of intensity memory cache in millions
                                  of intensities (when --use-disk=true).
                                  [default '50']
  Engine Options (Not used on command line)
        --cels                    Cel files to process. [default '']
        --result-files            CHP file names to output. Must be paired
                                  with cels. [default '']
        --time-start              The time the engine run was started
                                  [default '']
        --time-end                The time the engine run ended [default '']
        --time-run-minutes        The run time in minutes. [default '']
        --analysis-guid           The GUID for the analysis run. [default '']
Canary Parameters:
  The canary algorithm does copy number calling on defined
  regions using priors. The following parameters are
  accessible using the --apt-canary-analysis option.
  Use key1=val1,key2=val2,... string format.
    af-weight
    TOL
    hwe_tol
    hwe_tol2
    fraction-giveaway-0
    fraction-giveaway-1
    fraction-giveaway-2
    fraction-giveaway-3
    fraction-giveaway-4
    min-fill-prop
    conf-interval-half-width
    inflation
    min-cluster-variance
    pseudopoint-factor
    regularize_variance_factor
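The key1=val1,key2=val2,... string expected by --apt-canary-analysis can be assembled from individual key=value pairs. The sketch below does this in POSIX shell; the function name join_params is hypothetical, and the numeric values are placeholders chosen only to show the format, not recommended or default settings.

```shell
# Join key=value arguments with commas to form an analysis string.
# (join_params is a hypothetical helper, not an APT command.)
join_params() {
    # Inside the subshell, "$*" joins the arguments with the first
    # character of IFS, here a comma.
    ( IFS=,; printf '%s\n' "$*" )
}

# Placeholder values for illustration only.
analysis=$(join_params "af-weight=1" "inflation=1")
printf '%s\n' "$analysis"   # prints: af-weight=1,inflation=1
```

The resulting string would then be passed on the command line as --apt-canary-analysis "$analysis".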
1.7.1