MANUAL: apt-copynumber-workflow (1.20.0)

Introduction

apt-copynumber-workflow is a program for finding de novo copy number changes and Loss of Heterozygosity (LOH) on a per-sample basis with respect to a reference set of samples. The copy number algorithm it implements assumes that the reference set comprises a mix of normal human males (with XY chromosomes) and normal human females (with XX chromosomes). The algorithms assume that, in this reference, the predominant copy number of each autosomal marker (SNP or copy number probe) is 2, and that the copy number on the sex chromosomes is determined by the gender of the sample.
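
The reference assumption can be sketched as a small lookup. This is an illustration only and not part of APT: the helper name expected_copy_number and the chromosome labels are hypothetical, and the values for X and Y simply follow from the normal XX and XY karyotypes named above.

```shell
#!/bin/sh
# Illustration only: expected copy number of a marker in a normal
# reference sample. expected_copy_number is a hypothetical helper,
# not an APT command.
expected_copy_number() {
    chromosome=$1   # 1..22, X, or Y
    gender=$2       # male or female
    case "$chromosome" in
        X) if [ "$gender" = female ]; then echo 2; else echo 1; fi ;;
        Y) if [ "$gender" = female ]; then echo 0; else echo 1; fi ;;
        *) echo 2 ;;   # autosomes: predominant copy number is 2
    esac
}

expected_copy_number 7 male     # prints 2
expected_copy_number X female   # prints 2
expected_copy_number X male     # prints 1
expected_copy_number Y female   # prints 0
```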

apt-copynumber-workflow implements two distinct workflows: a batch workflow that uses the set of input CEL files as its reference, and a single-sample workflow that conceptually compares each CEL file to a pre-computed reference. For computational efficiency the single-sample workflow processes a set of input CEL files at a time, but the output for any CEL file is unaffected by the other CEL files.

Quick Start

The basic requirements for a run of apt-copynumber-workflow are:

On unix systems a basic command using the default parameters to do a batch run on GenomeWide SNP 6.0 data would look like:

apt-copynumber-workflow \
	-v 1 \
	--cnchp-output true \
	--cdf-file GenomeWideSNP_6.cdf \
	--chrX-probes GenomeWideSNP_6.chrXprobes \
	--chrY-probes GenomeWideSNP_6.chrYprobes \
	--special-snps GenomeWideSNP_6.specialsnps \
	--annotation-file GenomeWideSNP_6.na28.annot.db \
	--reference-output GenomeWideSNP_6.hapmap270.na28.r1.a5.ref \
	--out-dir results_dir \
	*.CEL

The output will consist of a report file with summary statistics about each chip analyzed, a CNCHP file per chip, and a reference file. The 'a5' in the reference file name is a convention used by APT to refer to binary files saved in HDF5 format.

To write a text file per chip in addition to the CNCHP files, add the argument --text-output true. To suppress the CNCHP files, change the argument to --cnchp-output false.

WARNING: apt-copynumber-workflow will overwrite any existing output files it finds. If you wish to keep existing results make sure to specify a different output directory name.
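
One way to avoid overwriting earlier results is to choose an output directory name that is not yet in use before each run. The sketch below is only a suggestion: new_out_dir is a hypothetical helper, not part of APT, and the numeric suffix scheme is an arbitrary choice.

```shell
#!/bin/sh
# Return the first name among BASE, BASE_1, BASE_2, ... that does not
# exist yet. new_out_dir is a hypothetical helper, not an APT command.
new_out_dir() {
    base=$1
    candidate=$base
    n=1
    while [ -e "$candidate" ]; do
        candidate="${base}_${n}"
        n=$((n + 1))
    done
    echo "$candidate"
}

OUT_DIR=$(new_out_dir results_dir)
echo "Pass this to the run: --out-dir $OUT_DIR"
```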

NOTE: On Windows the DOS prompt does not support wildcard expansion; the preferred method is to supply a text file listing the paths to the CEL files via the '--cel-files' option (see below for details of the file format).
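
The file expected by '--cel-files' is described in the option help as one CEL file per line, with the first line being 'cel_files'. A minimal sketch for producing such a file from a Unix-like shell follows. The helper name make_cel_list and the output name celfiles.txt are illustrative choices, and the pattern only matches the upper-case .CEL extension.

```shell
#!/bin/sh
# Write a list for --cel-files: the header line 'cel_files', then one
# CEL file path per line. make_cel_list is a hypothetical helper.
make_cel_list() {
    cel_dir=$1
    out_file=$2
    printf 'cel_files\n' > "$out_file"
    for f in "$cel_dir"/*.CEL; do
        # Skip the unexpanded pattern when the directory has no CEL files.
        [ -e "$f" ] || continue
        printf '%s\n' "$f" >> "$out_file"
    done
}

# Example: list the CEL files found in the current directory.
make_cel_list . celfiles.txt
```

The resulting file can then be passed as --cel-files celfiles.txt in place of the *.CEL wildcard.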

NOTE: Unlike Unix shells, the Windows DOS prompt does not allow a command to be continued with the '\' character. In the examples shown here, omit the '\' characters and enter everything on a single line.

NOTE: Enabling text output will increase the runtime. As an alternative to the text output option, you can dump the CHP files as text after the fact using apt-chp-to-txt.

To run the single-sample workflow using an existing reference, replace

--reference-output

with

--reference-input ExistingReference.a5.ref

where "ExistingReference.a5.ref" is the filename of an existing reference.
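
Put together, a single-sample run might look like the sketch below. It reuses the library file names from the batch example and assumes they sit in the current directory. ExistingReference.a5.ref is the placeholder name used above, and celfiles.txt is an assumed CEL list in the '--cel-files' format. The script only prints the assembled command so it can be reviewed first.

```shell
#!/bin/sh
# Assemble a single-sample command line. All file names are placeholders
# taken from the examples in this manual; adjust them to your own files.
REFERENCE=ExistingReference.a5.ref

set -- apt-copynumber-workflow \
    -v 1 \
    --cnchp-output true \
    --cdf-file GenomeWideSNP_6.cdf \
    --chrX-probes GenomeWideSNP_6.chrXprobes \
    --chrY-probes GenomeWideSNP_6.chrYprobes \
    --special-snps GenomeWideSNP_6.specialsnps \
    --annotation-file GenomeWideSNP_6.na28.annot.db \
    --reference-input "$REFERENCE" \
    --out-dir results_dir \
    --cel-files celfiles.txt

# Print the command for review. To execute it, run "$@" instead.
CMD="$*"
echo "$CMD"
```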

Runtime Performance

As a performance estimate, running the 270 HapMap samples from local disk on an 8-processor 2 GHz Dual-Core AMD Opteron 870 machine with 16 GB of RAM under a 64-bit Linux OS took 801 minutes. RAM usage was 14 GB.
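
For rough planning, the figure above works out to about three minutes per sample. The calculation below only divides the two reported numbers; it assumes nothing about how runtime scales with batch size or hardware, so treat the result as an order-of-magnitude guide.

```shell
#!/bin/sh
# Average minutes per sample for the benchmark run quoted above.
TOTAL_MINUTES=801
SAMPLES=270
PER_SAMPLE=$(awk -v t="$TOTAL_MINUTES" -v n="$SAMPLES" \
    'BEGIN { printf "%.2f", t / n }')
echo "About $PER_SAMPLE minutes per sample"   # About 2.97 minutes per sample
```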

Options:

apt-copynumber-workflow - A program to compute copy number 
results from DNA analysis arrays.

usage:
	./apt-copynumber-workflow  \
       --adapter-type-normalization true \
       --text-output false \
       --reference-output CNReference.a5 \
       --set-analysis-name TestReference \
       --cdf-file GenomeWideSNP_6.cdf \
       --chrX-probes GenomeWideSNP_6.chrXprobes \
       --chrY-probes GenomeWideSNP_6.chrYprobes \
       --special-snps GenomeWideSNP_6.specialsnps \
       --netaffx-snp-annotation-file snp_annot_2.csv \
       --netaffx-cn-annotation-file cn_annot_2.csv \
       --o results --cel-files celfiles.txt

options:
 Common Options (not used by all programs)
   -h, --help                           Display program options and extra
                          documentation about possible analyses. See
                          -explain for information about a specific
                          operation. [default 'false'] 
   -v, --verbose How verbose to be with status messages 0 -
                          quiet, 1 - usual messages, 2 - more
                          messages. [default '1'] 
     --console-off Turn off the default messages to the 
                          console but not logging or sockets. 
                          [default 'false'] 
     --use-socket Host and port to print messages over in
                          localhost:port format [default ''] 
     --version Display version information. [default
                          'false'] 
   -f, --force Disable various checks including chip 
                          types. Consider using --chip-type option
                          rather than --force. [default 'false'] 
     --throw-exception Throw an exception rather than calling
                          exit() on error. Useful for debugging. This
                          option is intended for command line use
                          only. If you are wrapping an Engine and 
                          want exceptions thrown, then you should 
                          call Err::setThrowStatus(true) to ensure
                          that all Err::errAbort() calls result in an
                          exception. [default 'false'] 
     --analysis-files-path Search path for analysis library files. 
                          Will override AFFX_ANALYSIS_FILES_PATH
                          environment variable. [default ''] 
     --xml-file Input parameters in XML format (Will
                          override command line settings). [default
                          ''] 
     --temp-dir Directory for temporary files when working
                          off disk. Using network mounted drives is
                          not advised. When not set, the output 
                          folder will be used. The default is
                          typically the output directory or the
                          current working directory. [default ''] 
   -o, --out-dir Directory for output files. Defaults to
                          current working directory. [default '.'] 
     --log-file The name of the log file. Generally 
                          defaults to the program name in the out-dir
                          folder. [default ''] 
 Engine Options (Not used on command line)
     --command-line The command line executed. [default ''] 
     --exec-guid The GUID for the process. [default ''] 
     --program-name The name of the program [default ''] 
     --program-company The company providing the program [default
                          ''] 
     --program-version The version of the program [default ''] 
     --program-cvs-id The CVS version of the program [default ''] 
     --version-to-report The version to report in the output files.
                          [default ''] 
     --free-mem-at-start How much physical memory was available when
                          the engine run started. [default '0'] 
     --meta-data-info Meta data in key=value pair that will be
                          output in headers. [default ''] 
 Input Options
     --config-file The configuration file name as passed from
                          GTC or the Cyto Browser. [default ''] 
     --reference-input Input reference file name. [default ''] 
     --cdf-file File defining probe sets. [default ''] 
     --spf-file spf format file defining probe sets.
                          [default ''] 
     --qcc-file File defining QC probesets. [default ''] 
     --qca-file File defining QC analysis methods. [default
                          ''] 
     --cel-files Text file specifying cel files to process,
                          one per line with the first line being
                          'cel_files'. [default ''] 
     --special-snps File containing all snps of unusual copy
                          (chrX,mito,Y) [default ''] 
     --chrX-probes File containing probe_id (1-based) of 
                          probes on chrX. Used for copy number probe
                          chrX/Y ratio gender calling. [default ''] 
     --chrY-probes File containing probe_id (1-based) of 
                          probes on chrY. Used for copy number probe
                          chrX/Y ratio gender calling. [default ''] 
     --target-sketch File specifying a target distribution to 
                          use for quantile normalization. [default 
                          ''] 
     --use-feat-eff File defining a feature effect for each
                          probe. Note that precomputed effects should
                          only be used for an appropriately similar
                          analysis (i.e. feature effects for pm-only
                          may be different than for pm-mm). [default
                          ''] 
     --read-models-brlmmp File to read precomputed BRLMM-P snp
                          specific models from. [default ''] 
     --minSegSeparation Value used to skip over centromere in LOH.
                          [default '1000000000'] 
 Output Options
     --reference-output Output reference file name. [default ''] 
     --file-name-prefix CYCHP file name prefix. [default ''] 
     --file-name-suffix CYCHP file name suffix. [default ''] 
     --file-name-ext CYCHP file name extension. [default 
                          'cychp'] 
 Analysis Options
     --adapter-type-normalization Adapter Type Normalization option. true =
                          perform adapter type normalization. 
                          [default 'true'] 
     --normalization-type Normalization option. 0 = none, 1 =
                          'quant-norm', 2 = 'med-norm.target=1000'
                          [default '1'] 
     --adapter-parameters Parameters to use when running adapter type
                          normalization. [default ''] 
     --brlmmp-parameters Parameters to use when running brlmmp.
                          [default ''] 
     --allele-peaks-reporter-method String representing allele peaks reporter
                          pathway desired. [default
                          'allele-peaks-reporter-method'] 
     --gc-correction-bin-count The number of bins to use for GC content.
                          [default '25'] 
     --allele-peaks-kernel Allele Peaks Kernel [default 'Gaussian
                          Kernel'] 
     --prior-size How many probesets to use for determining
                          prior. [default '10000'] 
 Misc Options
     --explain Explain a particular operation (i.e.
                          --explain cn-state or --explain loh).
                          [default ''] 
 Execution Control Options
     --mem-usage How many MB of memory to use for this run.
                          [default '0'] 
     --use-disk Store CEL intensities to be analyzed on
                          disk. [default 'true'] 
     --disk-cache Size of intensity memory cache in millions
                          of intensities (when --use-disk=true).
                          [default '50'] 
 Advanced Options
     --run-geno-qc Run the GenoQC engine. [default 'false'] 
     --run-probeset-genotype Run the Probeset Genotype engine. (For
                          testing purposes only.) [default 'true'] 
     --wave-correction-reference-method String representing wave correction pathway
                          desired. [default 'none'] 
     --keep-intermediate-data Set to true, this option will keep all,
                          intensity values computed while invoking 
                          any intensity adjustment method. [default
                          'false'] 
     --reference-chromosome Reference chromosome [default '2'] 
     --xx-cutoff XX cutoff [default '0.8'] 
     --xx-cutoff-high XX cutoff high [default '1.07'] 
     --y-cutoff Y cutoff [default '0.65'] 
     --wave-correction-log2ratio-adjustment-method String representing wave correction
                          log2ratio adjustment pathway desired.
                          [default 'none'] 
     --waviness-block-size marker count [default '50'] 
     --waviness-genomic-span genomic segment length [default '0'] 
     --cn-calibrate-parameters SmoothSignal calibration parameters 
                          [default ''] 
 Engine Options (Not used on command line)
     --cels CEL files to process. [default ''] 
     --arrs ARR files to process. Must be paired with
                          cels. [default ''] 
     --result-files Names (with path) of CHP files to output.
                          Must be paired with cels. [default ''] 
     --male-gender-ratio-cutoff Male gender ratio cutoff [default '0.71'] 
     --female-gender-ratio-cutoff Female gender ratio cutoff [default '0.48'] 
 Additional CNReferenceEngine Options
     --probeset-ids Tab delimited file with column 
                          'probeset_id' specifying probesets to
                          summarize. [default ''] 
     --annotation-file NetAffx Annotation file. [default ''] 
     --xChromosome X Chromosome [default '24'] 
     --yChromosome Y Chromosome [default '25'] 
 Additional CNLog2RatioEngine Options
   -a, --analysis String representing analysis pathway
                          desired. [default ''] 
     --delete-files Delete extra output files after the run has
                          completed. [default 'false'] 
     --log2-input Input Allele Summaries are in log2. 
                          [default 'false'] 
     --gc-content-override-file Input file used to override the GC content
                          read from the annotation file (Two columns
                          with header line, ProbeSetName/GCContent).
                          [default ''] 
 Additional CNAnalysisEngine Options
     --geno-qc-file The file output from GenoQC. [default ''] 
     --cyto2 Processing CYTO2 chip. [default 'false'] 
     --array-name Array name or type to use. [default ''] 
     --set-analysis-name Analysis name to use as prefix for output
                          files. [default ''] 
     --text-output Output data in ASCII text format in 
                          addition to calvin format. [default 
                          'false'] 
     --cnchp-output Report CNCHP files [default 'true'] 
     --cychp-output Report CYCHP files [default 'false'] 
     --time-start The time the engine run was started 
                          [default ''] 
     --time-end The time the engine run ended [default ''] 
     --time-run-minutes The run time in minutes. [default ''] 
     --analysis-guid The GUID for the analysis run. [default ''] 

Data transformations:
   pdnn-reference-method                         CopyNumber PDNN
   wave-correction-reference-method              CopyNumber WaveCorrection
   additional-waves-reference-method             CopyNumber AdditionalWaves
   pdnn-intensity-adjustment-method              Copynumber PDNN Intensity Adjustment
   high-pass-filter-intensity-adjustment-method  CopyNumber HighPassFilter
   wave-correction-log2ratio-adjustment-method   Copynumber Wave Correction Log2Ratio Adjustment
   log2ratio-adjustment-method-high-pass-filter  CopyNumber HighPassFilter Log2Ratio Adjustment
   cn-state                                      CopyNumber CNState
   cn-cyto2                                      CopyNumber CNCyto2
   log2-ratio                                    Copynumber Log2Ratio
   log2-ratio-cyto2                              Copynumber Log2RatioCyto2
   allelic-difference                            Copynumber AllelicDifference
   allelic-difference-CytoScan                   Copynumber AllelicDifference CytoScan
   gaussian-smooth                               CopyNumber GaussianSmooth
   genotype                                      Genotype
   kernel-smooth                                 CopyNumber KernelSmooth
   loh                                           CopyNumber LOH
   loh-cyto2                                     CopyNumber LOH Cyto2
   lohCytoScan                                   CopyNumber LOH CytoScan
   cn-neutral-loh                                Copynumber CNNeutralLOH
   normal-diploid                                Copynumber NormalDiploid
   mosaicism                                     Copynumber Mosaicism
   cn-gender                                     Copynumber CNGender
   cn-cyto2-gender                               Copynumber CNCyto2Gender
   cn-segment                                    Copynumber SegmentCN
   loh-segment                                   Copynumber SegmentLOH
   allele-peaks                                  Copynumber AllelePeaks
   chipstream                                    Copynumber Chipstream
   covariate-signal-adjuster                     Covariate Signal Adjuster
   covariate-lr-adjuster                         Covariate log2ratio Adjuster
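
The option help gives '--explain cn-state' and '--explain loh' as examples of looking up a single operation. To look up several at once, a loop like the following sketch can generate the command lines. It assumes that the other operation names listed above are accepted by '--explain' in the same way, which the help text does not state explicitly. explain_commands is a hypothetical helper that only prints the commands.

```shell
#!/bin/sh
# Print one --explain command line per operation name given.
# explain_commands is a hypothetical helper, not an APT command.
explain_commands() {
    for op in "$@"; do
        echo "apt-copynumber-workflow --explain $op"
    done
}

explain_commands cn-state loh log2-ratio
# To run the commands instead of printing them, pipe the output to sh:
#   explain_commands cn-state loh | sh
```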
