MANUAL: apt-copynumber-workflow (apt-1.10.1)

Introduction

apt-copynumber-workflow is a program for finding de novo copy number changes and Loss of Heterozygosity (LOH) on a per-sample basis with respect to a reference set of samples. The copy number algorithm it implements assumes that the reference set comprises a mix of normal human males (XY) and normal human females (XX). In particular, the algorithms assume that in this reference the predominant copy number of each autosomal marker (SNP or copy number probe) is 2, and that the copy number of each sex chromosome marker is determined by the sample's gender.
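As a minimal sketch of the reference assumption above (an illustration, not APT's implementation), the expected copy number of a marker in a normal reference sample depends only on its chromosome and the sample's gender. The chromosome codes are assumed from the --xChromosome and --yChromosome defaults in the option list (24 and 25), and markers that need special handling, such as those in the --special-snps file, are ignored.

```python
# Illustrative model of the reference assumptions; not APT source code.
# Chromosome codes follow the --xChromosome / --yChromosome defaults
# listed under "Options" (24 for X, 25 for Y); autosomes are 1-22.
X_CHROMOSOME = 24
Y_CHROMOSOME = 25


def expected_copy_number(chromosome: int, gender: str) -> int:
    """Predominant copy number assumed for a marker in a normal reference sample."""
    if gender not in ("male", "female"):
        raise ValueError(f"gender must be 'male' or 'female', got {gender!r}")
    if chromosome == X_CHROMOSOME:
        return 2 if gender == "female" else 1   # XX female, XY male
    if chromosome == Y_CHROMOSOME:
        return 0 if gender == "female" else 1   # no Y chromosome in females
    if 1 <= chromosome <= 22:
        return 2                                 # autosomes are diploid
    raise ValueError(f"unsupported chromosome code: {chromosome}")


if __name__ == "__main__":
    for chrom in (1, X_CHROMOSOME, Y_CHROMOSOME):
        print(chrom, expected_copy_number(chrom, "male"),
              expected_copy_number(chrom, "female"))
```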

apt-copynumber-workflow implements two distinct workflows: a batch workflow, which uses the set of input CEL files as the reference, and a single-sample workflow, which compares each CEL file to a pre-computed reference. For computational efficiency the single-sample workflow processes a set of input CEL files at a time, but conceptually each CEL file is analyzed on its own: the output for any CEL file is unaffected by the other CEL files in the run.

Quick Start

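A basic run of apt-copynumber-workflow requires a set of CEL files plus the library and annotation files for the array type: the CDF file, the chrX and chrY probe files, the special SNPs file, and the NetAffx SNP and CN annotation files.

On Unix systems, a basic batch run on GenomeWide SNP 6.0 data using the default parameters would look like this: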

apt-copynumber-workflow \
    --adapter-type-normalization true \
    --reference-output results_dir/MySamplesReference.a5.ref \
    --set-analysis-name MySamples \
    --cdf-file GenomeWideSNP_6.cdf \
    --chrX-probes GenomeWideSNP_6.chrXprobes \
    --chrY-probes GenomeWideSNP_6.chrYprobes \
    --special-snps GenomeWideSNP_6.specialSNPs \
    --netaffx-snp-annotation-file GenomeWideSNP_6.na25.annot.csv \
    --netaffx-cn-annotation-file GenomeWideSNP_6.cn.na25.annot.csv \
    --delete-files true \
    --o results_dir \
    --text-output \
    --cnchp-output false \
    --cel-files *.CEL

The output will consist of a report file with summary statistics for each chip analyzed, a text file per chip, and a reference file. The '.a5' in the reference file name is the convention APT uses for binary files saved in HDF5 format.

To output CNCHP files, remove the argument '--cnchp-output false'; CNCHP output is enabled by default. To suppress text file output, remove the argument '--text-output'. Removing both arguments gives CNCHP files instead of text files.

WARNING: apt-copynumber-workflow will overwrite any existing output files it finds. If you wish to keep existing results make sure to specify a different output directory name.

NOTE: On Windows the DOS prompt does not support wildcard expansion, so the preferred method is to supply, via the '--cel-files' option, a text file listing the paths to the CEL files (see the option description below for the file format).

NOTE: Unlike Unix shells, the Windows DOS prompt also does not allow a command to be continued onto the next line with the '\' character. When running the examples shown here on Windows, omit the '\' characters and enter each command on a single line.
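Where wildcard expansion is unavailable, the CEL list file can be generated with a short script. This is a minimal sketch, assuming all CEL files sit in one directory; the function name write_cel_list and the output name celfiles.txt are hypothetical. It writes the header line 'cel_files' followed by one path per line, as the --cel-files option description requires.

```python
import sys
from pathlib import Path


def write_cel_list(cel_dir: str, list_file: str) -> list[str]:
    """Write a --cel-files list: header 'cel_files', then one CEL path per line.

    Returns the paths that were written, sorted for reproducible output.
    """
    # Match both .CEL and .cel, since pattern matching is case-sensitive on Unix.
    paths = sorted(
        str(p) for p in Path(cel_dir).iterdir()
        if p.is_file() and p.suffix.lower() == ".cel"
    )
    with open(list_file, "w") as out:
        out.write("cel_files\n")
        for path in paths:
            out.write(path + "\n")
    return paths


if __name__ == "__main__":
    # Usage: python make_cel_list.py <cel-directory> <output-list-file>
    cel_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    list_file = sys.argv[2] if len(sys.argv) > 2 else "celfiles.txt"
    written = write_cel_list(cel_dir, list_file)
    print(f"wrote {len(written)} CEL paths to {list_file}")
```

The resulting file is then passed as '--cel-files celfiles.txt' in place of the wildcard.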

To run the single-sample workflow using an existing reference, replace the option (together with its file name argument)

--reference-output

with

--reference-input ExistingReference.a5.ref

where "ExistingReference.a5.ref" is the file name of an existing reference, for example one written by an earlier batch run with --reference-output.
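When runs are driven from a script, the two workflows differ only in the reference option. The sketch below is a hypothetical helper (build_command is not part of APT) that assembles the Quick Start argument list and swaps '--reference-output' for '--reference-input'; the list file celfiles.txt is likewise an assumed name. It does not execute anything.

```python
import shlex

# Options shared by both workflows, copied from the Quick Start example.
COMMON_ARGS = [
    "--adapter-type-normalization", "true",
    "--set-analysis-name", "MySamples",
    "--cdf-file", "GenomeWideSNP_6.cdf",
    "--chrX-probes", "GenomeWideSNP_6.chrXprobes",
    "--chrY-probes", "GenomeWideSNP_6.chrYprobes",
    "--special-snps", "GenomeWideSNP_6.specialSNPs",
    "--netaffx-snp-annotation-file", "GenomeWideSNP_6.na25.annot.csv",
    "--netaffx-cn-annotation-file", "GenomeWideSNP_6.cn.na25.annot.csv",
    "--delete-files", "true",
    "--o", "results_dir",
    "--text-output",
    "--cnchp-output", "false",
]


def build_command(cel_list: str, reference: str, single_sample: bool) -> list[str]:
    """Assemble an apt-copynumber-workflow argument list (hypothetical helper).

    Batch workflow: write a new reference with --reference-output.
    Single-sample workflow: read an existing one with --reference-input.
    """
    ref_option = "--reference-input" if single_sample else "--reference-output"
    return ["apt-copynumber-workflow", *COMMON_ARGS,
            ref_option, reference, "--cel-files", cel_list]


if __name__ == "__main__":
    batch = build_command("celfiles.txt", "results_dir/MySamplesReference.a5.ref", False)
    single = build_command("celfiles.txt", "ExistingReference.a5.ref", True)
    print(shlex.join(batch))
    print(shlex.join(single))
```

To launch a run, the list could be passed to subprocess.run(cmd, check=True) rather than typed at a prompt, which sidesteps the line-continuation differences between Unix and Windows noted above.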

Runtime Performance

As a performance estimate, running the 270 HapMap samples from local disk on an 8-processor system (2 GHz Dual-Core AMD Opteron Processor 870, 16 GB of RAM, 64-bit Linux) took 801 minutes, with RAM usage of 14 GB.
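The benchmark works out to about 3 minutes per sample (801 / 270). A rough planning estimate can scale this linearly, which is an assumption: actual run time depends on hardware and disk, and may not grow exactly in proportion to the number of CEL files. The function name estimate_minutes is hypothetical.

```python
# Benchmark from the text: 270 HapMap samples in 801 minutes on the system described.
BENCH_SAMPLES = 270
BENCH_MINUTES = 801.0


def estimate_minutes(n_samples: int) -> float:
    """Scale the benchmark run time linearly to n_samples (a rough assumption)."""
    return BENCH_MINUTES * n_samples / BENCH_SAMPLES


if __name__ == "__main__":
    print(f"{estimate_minutes(1):.2f} min per sample")              # 2.97 min per sample
    print(f"{estimate_minutes(1000) / 60:.1f} h for 1000 samples")  # 49.4 h for 1000 samples
```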

Options:

apt-copynumber-workflow - A program to compute copy number 
results from DNA analysis arrays.

usage:
    apt-copynumber-workflow.exe \
       --adapter-type-normalization true \
       --text-output false \
       --reference-output CNReference.a5 \
       --set-analysis-name TestReference \
       --cdf-file GenomeWideSNP_6.cdf \
       --chrX-probes GenomeWideSNP_6.chrXprobes \
       --chrY-probes GenomeWideSNP_6.chrYprobes \
       --special-snps GenomeWideSNP_6.specialsnps \
       --netaffx-snp-annotation-file snp_annot_2.csv \
       --netaffx-cn-annotation-file cn_annot_2.csv \
       --o results --cel-files celfiles.txt

options:
 Basic Info and Control Options
   -h, --help                           This message. [default 'false'] 
     --explain Explain a particular operation (i.e.
                          --explain cn-state or --explain loh).
                          [default ''] 
   -v, --verbose How verbose to be with status messages 0 -
                          quiet, 1 - usual messages, 2 - more
                          messages. [default '1'] 
     --version Output program version and quit. [default
                          'false'] 
 Input Options
     --xml-file Input parameters in XML format (Will
                          override command line settings). [default
                          ''] 
     --reference-input Input reference file name. [default ''] 
     --reference-output Output reference file name. [default ''] 
     --cdf-file File defining probe sets. [default ''] 
   -f, --force Don't check the chip types, just assume 
                          they match. [default 'false'] 
     --qcc-file File defining QC probesets. [default ''] 
     --qca-file File defining QC analysis methods. [default
                          ''] 
     --cel-files Text file specifying cel files to process,
                          one per line with the first line being
                          'cel_files'. [default ''] 
     --special-snps File containing all snps of unusual copy
                          (chrX,mito,Y) [default ''] 
     --chrX-probes File containing probe_id (1-based) of 
                          probes on chrX. Used for copy number probe
                          chrX/Y ratio gender calling. [default ''] 
     --chrY-probes File containing probe_id (1-based) of 
                          probes on chrY. Used for copy number probe
                          chrX/Y ratio gender calling. [default ''] 
     --target-sketch File specifying a target distribution to 
                          use for quantile normalization. [default 
                          ''] 
     --use-feat-eff File defining a feature effect for each
                          probe. Note that precomputed effects should
                          only be used for an appropriately similar
                          analysis (i.e. feature effects for pm-only
                          may be different than for pm-mm). [default
                          ''] 
     --read-models-brlmmp File to read precomputed BRLMM-P snp
                          specific models from. [default ''] 
 Output Options
   -o, --out-dir Directory to write result files into. Any
                          previous results in directory will be
                          overwritten. [default '.'] 
 Analysis Options
   -a, --analysis String representing analysis pathway
                          desired. [default ''] 
     --med-polish Use median polish summarization method
                          instead of plier. [default 'false'] 
     --adapter-type-normalization Adapter Type Normalization option. true =
                          perform adapter type normalization. 
                          [default 'true'] 
     --normalization-type Normalization option. 0 = none, 1 =
                          'quant-norm', 2 = 'med-norm.target=1000'
                          [default '1'] 
     --adapter-parameters Parameters to use when running adapter type
                          normalization. [default ''] 
     --brlmmp-parameters Parameters to use when running brlmmp.
                          [default ''] 
 Execution Control Options
     --mem-usage How many MB of memory to use for this run.
                          [default '0'] 
     --block-size How many probesets to process at once,
                          useful when memory is limited. If set to 0
                          program attempts to guess available RAM and
                          set appropriately. [default '0'] 
     --run-geno-qc Run the GenoQC engine. [default 'true'] 
     --run-probeset-genotype Run the Probeset Genotype engine. (For
                          testing purposes only.) [default 'true'] 
     --prior-size How many probesets to use for determining
                          prior. [default '10000'] 
     --use-disk Use disk based representation to avoid
                          excessive RAM use. [default 'true'] 
     --disk-dir Directory for temporary files when working
                          off disk. Using network mounted drives is
                          not advised. [default ''] 
     --disk-cache Size of memory cache when working off disk
                          in megabytes. [default '100'] 
     --arrs ARR files to process. Must be paired with
                          cels. [default ''] 
     --cychps CYCHP files to output. Must be paired with
                          cels. [default ''] 
 CNReferenceEngine Options
     --probeset-ids Tab delimited file with column 
                          'probeset_id' specifying probesets to
                          summarize. [default ''] 
     --netaffx-snp-annotation-file NetAffx SNP Annotation file. [default ''] 
     --netaffx-cn-annotation-file NetAffx CN Annotation file. [default ''] 
     --xChromosome X Chromosome [default '24'] 
     --yChromosome Y Chromosome [default '25'] 
 CNLog2RatioEngine Options
     --delete-files Delete extra output files after the run has
                          completed. [default 'false'] 
     --log2-input Input Allele Summaries are in log2. 
                          [default 'false'] 
     --median-autosome-median-normalization Perform the median autosomal median
                          normalization step. [default 'true'] 
     --yTarget Y Target [default '0.6748'] 
     --allelic-difference-outlier-trim Allele Diff Outlier Trim [default '3'] 
     --gc-correction Apply the GC correction to the Log2Ratios
                          and Allelic Differences. [default 'true'] 
     --gc-content-override-file Input file used to override the GC content
                          read from the annotation files (Two columns
                          with header line, ProbeSetName/GCContent).
                          [default ''] 
     --gc-correction-bin-count The number of bins to use when applying the
                          gc-correction. [default '25'] 
     --geno-qc-file The file output from GenoQC. [default ''] 
     --cyto2 Processing CYTO2 chip. [default 'false'] 
     --CN2Gender-MAPD-threshold The MAPD cutoff threshold for CN2 gender
                          calling. [default '0.5'] 
     --CN2Gender-male-ChrX-lower-threshold The male CN call lower threshold for
                          chromosome X CN2 gender calling. [default
                          '0.8'] 
     --CN2Gender-male-ChrX-upper-threshold The male CN call upper threshold for
                          chromosome X CN2 gender calling. [default
                          '1.3'] 
     --CN2Gender-male-ChrY-lower-threshold The male CN call lower threshold for
                          chromosome Y CN2 gender calling. [default
                          '0.8'] 
     --CN2Gender-male-ChrY-upper-threshold The male CN call upper threshold for
                          chromosome Y CN2 gender calling. [default
                          '1.2'] 
     --CN2Gender-female-ChrX-lower-threshold The female CN call lower threshold for
                          chromosome X CN2 gender calling. [default
                          '1.9'] 
     --CN2Gender-female-ChrX-upper-threshold The female CN call upper threshold for
                          chromosome X CN2 gender calling. [default
                          '2.1'] 
     --CN2Gender-female-ChrY-lower-threshold The female CN call lower threshold for
                          chromosome Y CN2 gender calling. [default
                          '0'] 
     --CN2Gender-female-ChrY-upper-threshold The female CN call upper threshold for
                          chromosome Y CN2 gender calling. [default
                          '0.4'] 
     --array-name Array name or type to use. [default ''] 
     --set-analysis-name Analysis name to use as prefix for output
                          files. [default ''] 
     --text-output Output data in ASCII text format in 
                          addition to calvin format. [default 
                          'false'] 
     --cnchp-output Report CNCHP files [default 'true'] 
     --cychp-output Report CYCHP files [default 'false'] 

Data transformations:
   cn-state          CopyNumber CNState 
   gaussian-smooth   CopyNumber GaussianSmooth 
   loh               CopyNumber LOH 
   cn-neutral-loh    Copynumber CNNeutralLOH 
   normal-diploid    Copynumber NormalDiploid 
   mosaicism         Copynumber Mosaicism 
   no-call           Copynumber NoCall 

version: apt-1.10.1 $Id: apt-copynumber-workflow.cpp,v 1.67 2008/10/24 06:08:52 awilli Exp $
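The CN2Gender options above suggest a threshold-based gender call. The sketch below shows one plausible reading using the default values, assuming that a sample is called only when its MAPD is at or below --CN2Gender-MAPD-threshold and its chrX and chrY copy number estimates both fall inside the male ranges or both inside the female ranges. How the engine actually combines these values, whether the bounds are inclusive, and how the per-chromosome copy number estimates are computed are not documented here, so treat this as an illustration of the defaults rather than APT's logic. The names Range and call_gender are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        # Inclusive bounds are an assumption.
        return self.lower <= value <= self.upper


# Defaults copied from the CN2Gender-* options.
MAPD_THRESHOLD = 0.5
MALE_X = Range(0.8, 1.3)
MALE_Y = Range(0.8, 1.2)
FEMALE_X = Range(1.9, 2.1)
FEMALE_Y = Range(0.0, 0.4)


def call_gender(mapd: float, chrx_cn: float, chry_cn: float) -> str:
    """Return 'male', 'female' or 'unknown' under the assumed decision rule."""
    if mapd > MAPD_THRESHOLD:
        return "unknown"  # sample too noisy to call, under the assumed rule
    if MALE_X.contains(chrx_cn) and MALE_Y.contains(chry_cn):
        return "male"
    if FEMALE_X.contains(chrx_cn) and FEMALE_Y.contains(chry_cn):
        return "female"
    return "unknown"


if __name__ == "__main__":
    # Illustrative inputs only, not measured data.
    print(call_gender(0.25, 1.02, 0.97))  # male
    print(call_gender(0.25, 2.00, 0.05))  # female
    print(call_gender(0.65, 2.00, 0.05))  # unknown (MAPD above threshold)
```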


Generated on Mon Nov 3 12:21:42 2008 for Affymetrix Power Tools by doxygen 1.5.3