apt-copynumber-workflow is a program for finding de novo copy number changes and Loss of Heterozygosity (LOH) on a per-sample basis with respect to a reference set of samples. The copy number algorithm it implements assumes that the reference set comprises a mix of normal human males (with XY chromosomes) and normal human females (with XX chromosomes). The algorithms assume that, in this reference, the predominant copy number for each autosomal marker (SNP or copy number probe) is 2, and that the copy number for the sex chromosomes is determined by gender.
apt-copynumber-workflow implements two distinct workflows: a batch workflow that uses the set of input CEL files as its reference, and a single-sample workflow that, conceptually, compares each CEL file to a pre-computed reference. For computational efficiency the single-sample workflow operates on a set of input CEL files at a time, but the output for any CEL file is unaffected by the other CEL files.
The basic requirements for a run of apt-copynumber-workflow are:
On Unix systems, a basic command using the default parameters to do a batch run on GenomeWide SNP 6.0 data would look like:
apt-copynumber-workflow \
-v 1 \
--cnchp-output false \
--text-output true \
--cdf-file GenomeWideSNP_6.cdf \
--chrX-probes GenomeWideSNP_6.chrXprobes \
--chrY-probes GenomeWideSNP_6.chrYprobes \
--special-snps GenomeWideSNP_6.specialsnps \
--annotation-file GenomeWideSNP_6.na28.annot.db \
--reference-output GenomeWideSNP_6.hapmap270.na28.r1.a5.ref \
--o results_dir \
*.CEL
The output will consist of a report file with some summary statistics about each chip analyzed, a text file per chip, and a reference file. The 'a5' in the reference file name is a convention used by APT to refer to binary files saved in HDF5 format.
To output CNCHP files instead of text files, remove the argument --cnchp-output false. To suppress text file output, remove the --text-output argument.
WARNING: apt-copynumber-workflow will overwrite any existing output files it finds. If you wish to keep existing results make sure to specify a different output directory name.
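Because existing output is overwritten, one way to protect earlier results is to pick an output directory name that does not exist yet. The following is a minimal shell sketch; the helper name fresh_out_dir and its numeric-suffix naming scheme are illustrative assumptions and are not part of APT.

```shell
# Hypothetical helper (not part of APT): print the first directory name,
# starting from the given base name, that does not exist yet.
fresh_out_dir() {
    base="$1"
    dir="$base"
    n=1
    # Append _1, _2, ... until an unused name is found.
    while [ -e "$dir" ]; do
        dir="${base}_${n}"
        n=$((n + 1))
    done
    printf '%s\n' "$dir"
}
```

For example, out=$(fresh_out_dir results_dir) yields results_dir if that name is unused and results_dir_1 otherwise; "$out" can then be passed as the output directory.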
NOTE: On Windows the DOS prompt does not support wildcard expansion; the preferred method is to supply a text file listing the paths to the CEL files via the '--cel-files' option (see below for details of the file format).
NOTE: Unlike Unix, the Windows DOS prompt also does not allow a command to be continued with the '\' character. In the examples shown here, omit the '\' characters and enter everything on a single line.
NOTE: Enabling text output will increase the runtime. As an alternative to the text output option, you can dump the CHP files as text after the fact using apt-chp-to-txt.
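The list file expected by '--cel-files' (a first line reading 'cel_files', then one CEL file path per line, as described in the option listing) can also be generated on Unix. The sketch below is one way to do that; the function name make_cel_list is hypothetical, and it assumes the files use an upper-case .CEL extension.

```shell
# Hypothetical helper (not part of APT): write a --cel-files list.
# $1 = directory containing CEL files, $2 = list file to create.
make_cel_list() {
    cel_dir="$1"
    list_file="$2"
    # The first line must be the literal header 'cel_files'.
    printf 'cel_files\n' > "$list_file"
    for f in "$cel_dir"/*.CEL; do
        # Skip the unexpanded pattern when no CEL files are present.
        if [ -e "$f" ]; then
            printf '%s\n' "$f" >> "$list_file"
        fi
    done
}
```

Calling make_cel_list /path/to/cels celfiles.txt and then passing --cel-files celfiles.txt replaces the *.CEL wildcard in the batch example.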
To run the single-sample workflow using an existing reference, replace
--reference-output
with
--reference-input ExistingReference.a5.ref
where "ExistingReference.a5.ref" is the filename of an existing reference.
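Put together, a single-sample run might be wrapped as in the sketch below. It rests on several assumptions: the function name run_single_sample and the APT_BIN variable are hypothetical, the library file names are copied from the batch example above, and the output format options are left at their defaults. Only options that appear in the program's option listing are used.

```shell
# APT_BIN is a hypothetical override point; by default the program is
# looked up on PATH under its usual name.
APT_BIN="${APT_BIN:-apt-copynumber-workflow}"

# Hypothetical wrapper (not part of APT).
# $1 = existing reference file, $2 = output directory, $3 = CEL list file.
run_single_sample() {
    ref="$1"
    out="$2"
    list="$3"
    # --reference-input (rather than --reference-output) selects the
    # single-sample workflow against the pre-computed reference.
    "$APT_BIN" \
        -v 1 \
        --cdf-file GenomeWideSNP_6.cdf \
        --chrX-probes GenomeWideSNP_6.chrXprobes \
        --chrY-probes GenomeWideSNP_6.chrYprobes \
        --special-snps GenomeWideSNP_6.specialsnps \
        --annotation-file GenomeWideSNP_6.na28.annot.db \
        --reference-input "$ref" \
        --out-dir "$out" \
        --cel-files "$list"
}
```

A call such as run_single_sample ExistingReference.a5.ref results_dir celfiles.txt then processes every CEL file in the list against the same reference.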
As a performance estimate, running the 270 HapMap samples from local disk on an 8-processor 2 GHz Dual-Core AMD Opteron Processor 870 system with 16 GB of RAM and a 64-bit Linux OS took 801 minutes. Memory usage was 14 GB.
apt-copynumber-workflow - A program to compute copy number
results from DNA analysis arrays.
usage:
./apt-copynumber-workflow \
--adapter-type-normalization true \
--text-output false \
--reference-output CNReference.a5 \
--set-analysis-name TestReference \
--cdf-file GenomeWideSNP_6.cdf \
--chrX-probes GenomeWideSNP_6.chrXprobes \
--chrY-probes GenomeWideSNP_6.chrYprobes \
--special-snps GenomeWideSNP_6.specialsnps \
--netaffx-snp-annotation-file snp_annot_2.csv \
--netaffx-cn-annotation-file cn_annot_2.csv \
--o results --cel-files celfiles.txt
options:
Common Options (not used by all programs)
-h, --help Display program options and extra
documentation about possible analyses. See
-explain for information about a specific
operation. [default 'false']
-v, --verbose How verbose to be with status messages 0 -
quiet, 1 - usual messages, 2 - more
messages. [default '1']
--console-off Turn off the default messages to the
console but not logging or sockets.
[default 'false']
--use-socket Host and port to print messages over in
localhost:port format [default '']
--version Display version information. [default
'false']
-f, --force Disable various checks including chip
types. Consider using --chip-type option
rather than --force. [default 'false']
--throw-exception Throw an exception rather than calling
exit() on error. Useful for debugging. This
option is intended for command line use
only. If you are wrapping an Engine and
want exceptions thrown, then you should
call Err::setThrowStatus(true) to ensure
that all Err::errAbort() calls result in an
exception. [default 'false']
--analysis-files-path Search path for analysis library files.
Will override AFFX_ANALYSIS_FILES_PATH
environment variable. [default '']
--xml-file Input parameters in XML format (Will
override command line settings). [default
'']
--temp-dir Directory for temporary files when working
off disk. Using network mounted drives is
not advised. When not set, the output
folder will be used. The defaut is
typically the output directory or the
current working directory. [default '']
-o, --out-dir Directory for output files. Defaults to
current working directory. [default '.']
--log-file The name of the log file. Generally
defaults to the program name in the out-dir
folder. [default '']
Engine Options (Not used on command line)
--command-line The command line executed. [default '']
--exec-guid The GUID for the process. [default '']
--program-name The name of the program [default '']
--program-company The company providing the program [default
'']
--program-version The version of the program [default '']
--program-cvs-id The CVS version of the program [default '']
--version-to-report The version to report in the output files.
[default '']
--free-mem-at-start How much physical memory was available when
the engine run started. [default '0']
--meta-data-info Meta data in key=value pair that will be
output in headers. [default '']
Input Options
--config-file The configuration file name as passed from
GTC or the Cyto Browser. [default '']
--reference-input Input reference file name. [default '']
--cdf-file File defining probe sets. [default '']
--spf-file spf format file defining probe sets.
[default '']
--qcc-file File defining QC probesets. [default '']
--qca-file File defining QC analysis methods. [default
'']
--cel-files Text file specifying cel files to process,
one per line with the first line being
'cel_files'. [default '']
--special-snps File containing all snps of unusual copy
(chrX,mito,Y) [default '']
--chrX-probes File containing probe_id (1-based) of
probes on chrX. Used for copy number probe
chrX/Y ratio gender calling. [default '']
--chrY-probes File containing probe_id (1-based) of
probes on chrY. Used for copy number probe
chrX/Y ratio gender calling. [default '']
--target-sketch File specifying a target distribution to
use for quantile normalization. [default
'']
--use-feat-eff File defining a feature effect for each
probe. Note that precomputed effects should
only be used for an appropriately similar
analysis (i.e. feature effects for pm-only
may be different than for pm-mm). [default
'']
--read-models-brlmmp File to read precomputed BRLMM-P snp
specific models from. [default '']
--minSegSeparation Value used to skip over centromere in LOH.
[default '1000000000']
Output Options
--reference-output Output reference file name. [default '']
--file-name-prefix CYCHP file name prefix. [default '']
--file-name-suffix CYCHP file name suffix. [default '']
--file-name-ext CYCHP file name extension. [default
'cychp']
Analysis Options
--adapter-type-normalization Adapter Type Normalization option. true =
perform adapter type normalization.
[default 'true']
--normalization-type Normalization option. 0 = none, 1 =
'quant-norm', 2 = 'med-norm.target=1000'
[default '1']
--adapter-parameters Parameters to use when running adapter type
normalization. [default '']
--brlmmp-parameters Parameters to use when running brlmmp.
[default '']
--allele-peaks-reporter-method String representing allele peaks reporter
pathway desired. [default
'allele-peaks-reporter-method']
--gc-correction-bin-count The number of bins to use for GC content.
[default '25']
--allele-peaks-kernel Allele Peaks Kernel [default 'Gaussian
Kernel']
--prior-size How many probesets to use for determining
prior. [default '10000']
Misc Options
--explain Explain a particular operation (i.e.
--explain cn-state or --explain loh).
[default '']
Execution Control Options
--mem-usage How many MB of memory to use for this run.
[default '0']
--use-disk Store CEL intensities to be analyzed on
disk. [default 'true']
--disk-cache Size of intensity memory cache in millions
of intensities (when --use-disk=true).
[default '50']
Advanced Options
--run-geno-qc Run the GenoQC engine. [default 'false']
--run-probeset-genotype Run the Probeset Genotype engine. (For
testing purposes only.) [default 'true']
--wave-correction-reference-method String representing wave correction pathway
desired. [default 'none']
--keep-intermediate-data Set to true, this option will keep all,
intensity values computed while invoking
any intensity adjustment method. [default
'false']
--reference-chromosome Reference chromosome [default '2']
--xx-cutoff XX cutoff [default '0.8']
--xx-cutoff-high XX cutoff high [default '1.07']
--y-cutoff Y cutoff [default '0.65']
--wave-correction-log2ratio-adjustment-method String representing wave correction
log2ratio adjustment pathway desired.
[default 'none']
--waviness-block-size marker count [default '50']
--waviness-genomic-span genomic segment length [default '0']
--cn-calibrate-parameters SmoothSignal calibration parameters
[default '']
Engine Options (Not used on command line)
--cels CEL files to process. [default '']
--arrs ARR files to process. Must be paired with
cels. [default '']
--result-files Names (with path) of CHP files to output.
Must be paired with cels. [default '']
--male-gender-ratio-cutoff Male gender ratio cutoff [default '0.71']
--female-gender-ratio-cutoff Female gender ratio cutoff [default '0.48']
Additional CNReferenceEngine Options
--probeset-ids Tab delimited file with column
'probeset_id' specifying probesets to
summarize. [default '']
--annotation-file NetAffx Annotation file. [default '']
--xChromosome X Chromosome [default '24']
--yChromosome Y Chromosome [default '25']
Additional CNLog2RatioEngine Options
-a, --analysis String representing analysis pathway
desired. [default '']
--delete-files Delete extra output files after the run has
completed. [default 'false']
--log2-input Input Allele Summaries are in log2.
[default 'false']
--gc-content-override-file Input file used to override the GC content
read from the annotation file (Two columns
with header line, ProbeSetName/GCContent).
[default '']
Additional CNAnalysisEngine Options
--geno-qc-file The file output from GenoQC. [default '']
--cyto2 Processing CYTO2 chip. [default 'false']
--array-name Array name or type to use. [default '']
--set-analysis-name Analysis name to use as prefix for output
files. [default '']
--text-output Output data in ASCII text format in
addition to calvin format. [default
'false']
--cnchp-output Report CNCHP files [default 'true']
--cychp-output Report CYCHP files [default 'false']
--time-start The time the engine run was started
[default '']
--time-end The time the engine run ended [default '']
--time-run-minutes The run time in minutes. [default '']
--analysis-guid The GUID for the analysis run. [default '']
Data transformations:
pdnn-reference-method CopyNumber PDNN
wave-correction-reference-method CopyNumber WaveCorrection
additional-waves-reference-method CopyNumber AdditionalWaves
pdnn-intensity-adjustment-method Copynumber PDNN Intensity Adjustment
high-pass-filter-intensity-adjustment-method CopyNumber HighPassFilter
wave-correction-log2ratio-adjustment-method Copynumber Wave Correction Log2Ratio Adjustment
log2ratio-adjustment-method-high-pass-filter CopyNumber HighPassFilter Log2Ratio Adjustment
cn-state CopyNumber CNState
cn-cyto2 CopyNumber CNCyto2
log2-ratio Copynumber Log2Ratio
log2-ratio-cyto2 Copynumber Log2RatioCyto2
allelic-difference Copynumber AllelicDifference
allelic-difference-CytoScan Copynumber AllelicDifference CytoScan
gaussian-smooth CopyNumber GaussianSmooth
genotype Genotype
kernel-smooth CopyNumber KernelSmooth
loh CopyNumber LOH
loh-cyto2 CopyNumber LOH Cyto2
lohCytoScan CopyNumber LOH CytoScan
cn-neutral-loh Copynumber CNNeutralLOH
normal-diploid Copynumber NormalDiploid
mosaicism Copynumber Mosaicism
cn-gender Copynumber CNGender
cn-cyto2-gender Copynumber CNCyto2Gender
cn-segment Copynumber SegmentCN
loh-segment Copynumber SegmentLOH
allele-peaks Copynumber AllelePeaks
chipstream Copynumber Chipstream
covariate-signal-adjuster Covariate Signal Adjuster
covariate-lr-adjuster Covariate log2ratio Adjuster