MANUAL: apt-dmet-genotype (1.14.3)

Quick Start:

Genomic Samples in Single Sample Mode (Static Clustering):

Here is a basic example of analyzing genomic samples in single sample mode (static clustering) with the apt-dmet-genotype command-line program.

    apt-dmet-genotype \
        --cdf-file=DMET_Plus.v1.cdf  \
        --chrX-probes=DMET_Plus.v1.chrXprobes  \
        --chrY-probes=DMET_Plus.v1.chrYprobes  \
        --special-snps=DMET_Plus.v1.specialSNPs  \
        --reference-input=DMET_Plus.v1.genomic.ref.a5 \
        --cn-region-gt-probeset-file=DMET_Plus.v1.cn-gt.ps  \
        --probeset-ids=DMET_Plus.v1.genomic.gt.ps  \
        --region-model=DMET_Plus.v1.cn-region-models.txt  \
        --probeset-model=DMET_Plus.v1.cn-probeset-models.txt \
        --cc-chp-output \
        --sample-type=genomic  \
        --probeset-ids-reported=consent.txt  \
        --cel-files=cel_files.txt  \
        --out-dir=results
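
The --cel-files option expects a text file listing one CEL file per line, with 'cel_files' as the first line (see Input Options). A minimal sketch of building that list with standard shell tools follows. The directory name cel_dir and the sample file names are placeholders chosen for illustration and are not part of the DMET library files.

```shell
# Hypothetical input directory; replace with the folder holding your CEL files.
cel_dir=cel_dir
mkdir -p "$cel_dir"

# Empty placeholder files so the sketch runs end to end.
# A real run would point at actual CEL files instead.
: > "$cel_dir/sample_01.CEL"
: > "$cel_dir/sample_02.CEL"

# Header line required by --cel-files, then one CEL path per line.
{
    printf 'cel_files\n'
    find "$cel_dir" -name '*.CEL' | sort
} > cel_files.txt

cat cel_files.txt
```

The resulting cel_files.txt can be passed as --cel-files=cel_files.txt in the command above.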

Plasmid Control Samples in Single Sample Mode (Static Clustering):

Single sample mode (static clustering) on plasmid controls requires the following changes:

        --reference-input=DMET_Plus.v1.plasmid.ref.a5 \
        --probeset-ids=DMET_Plus.v1.plasmid.gt.ps  \
        --sample-type=plasmid \

and one addition:

        --run-cn-engine=false
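
Putting the changes together, the full plasmid command can be sketched as below. It is the genomic Quick Start command with the three substitutions and the one addition applied, and all other options carried over unchanged. The command is only printed here so it can be reviewed first; that dry-run step is an addition for illustration, not something the program requires.

```shell
# Genomic Quick Start options with the plasmid substitutions applied
# and --run-cn-engine=false added.
set -- \
    --cdf-file=DMET_Plus.v1.cdf \
    --chrX-probes=DMET_Plus.v1.chrXprobes \
    --chrY-probes=DMET_Plus.v1.chrYprobes \
    --special-snps=DMET_Plus.v1.specialSNPs \
    --reference-input=DMET_Plus.v1.plasmid.ref.a5 \
    --cn-region-gt-probeset-file=DMET_Plus.v1.cn-gt.ps \
    --probeset-ids=DMET_Plus.v1.plasmid.gt.ps \
    --region-model=DMET_Plus.v1.cn-region-models.txt \
    --probeset-model=DMET_Plus.v1.cn-probeset-models.txt \
    --cc-chp-output \
    --sample-type=plasmid \
    --run-cn-engine=false \
    --probeset-ids-reported=consent.txt \
    --cel-files=cel_files.txt \
    --out-dir=results

# Print the command for review; remove "echo" to actually run it.
echo apt-dmet-genotype "$@"
```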

Genomic Samples in Multi-Sample Mode (Dynamic Clustering):

Multi-sample mode (dynamic clustering) on genomic samples requires the following additions:

        --reference-output=foo.a5 \
        --batch-name=cluster-run-1

Here, batch-name is any unique name for the batch of CEL files clustered together.
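
Since each batch needs a unique name, one option is to derive it from the run time. The sketch below uses a naming convention assumed purely for illustration (a prefix, a timestamp, and the shell process ID). apt-dmet-genotype accepts any unique string, and reusing the batch name for the reference output file is likewise just a convenience.

```shell
# Assumed naming convention: prefix, timestamp, and shell PID for uniqueness.
batch_name="cluster-run-$(date +%Y%m%d-%H%M%S)-$$"

# The two additions for multi-sample mode; append them to the genomic
# Quick Start command.
printf '%s\n' "--reference-output=${batch_name}.a5" "--batch-name=${batch_name}"
```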

Options:

apt-dmet-genotype - A program for analyzing DMET 3.0 CEL files. 

usage:
   apt-dmet-genotype ... 


options:
 Common Options (not used by all programs)
   -h, --help             Display program options and extra
                          documentation about possible analyses. See
                          -explain for information about a specific
                          operation. [default 'false']
   -v, --verbose          How verbose to be with status messages 0 -
                          quiet, 1 - usual messages, 2 - more
                          messages. [default '1']
     --console-off        Turn off the default messages to the
                          console but not logging or sockets.
                          [default 'false']
     --use-socket         Host and port to print messages over in
                          localhost:port format [default '']
     --version            Display version information. [default
                          'false']
   -f, --force            Disable various checks including chip
                          types. Consider using --chip-type option
                          rather than --force. [default 'false']
     --throw-exception    Throw an exception rather than calling
                          exit() on error. Useful for debugging. This
                          option is intended for command line use
                          only. If you are wrapping an Engine and
                          want exceptions thrown, then you should
                          call Err::setThrowStatus(true) to ensure
                          that all Err::errAbort() calls result in an
                          exception. [default 'false']
     --analysis-files-path Search path for analysis library files.
                          Will override AFFX_ANALYSIS_FILES_PATH
                          environment variable. [default '']
     --xml-file           Input parameters in XML format (Will
                          override command line settings). [default
                          '']
     --temp-dir           Directory for temporary files when working
                          off disk. Using network mounted drives is
                          not advised. When not set, the output
                          folder will be used. The default is
                          typically the output directory or the
                          current working directory. [default '']
   -o, --out-dir          Directory for output files. Defaults to
                          current working directory. [default '.']
     --log-file           The name of the log file. Generally
                          defaults to the program name in the out-dir
                          folder. [default '']
 Engine Options (Not used on command line)
     --command-line       The command line executed. [default '']
     --exec-guid          The GUID for the process. [default '']
     --program-name       The name of the program [default '']
     --program-company    The company providing the program [default
                          '']
     --program-version    The version of the program [default '']
     --program-cvs-id     The CVS version of the program [default '']
     --version-to-report  The version to report in the output files.
                          [default '']
     --free-mem-at-start  How much physical memory was available when
                          the engine run started. [default '0']
     --meta-data-info     Meta data in key=value pair that will be
                          output in headers. [default '']
 Input Options
     --cel-files          Text file specifying cel files to process,
                          one per line with the first line being
                          'cel_files'. [default '']
   -c, --cdf-file         File defining probe sets. Use either
                          --cdf-file or --spf-file. [default '']
     --spf-file           File defining probe sets in spf (simple
                          probe format) which is like a text cdf
                          file. [default '']
     --special-snps       File containing all snps of unusual copy
                          (chrX,mito,Y) [default '']
     --chrX-probes        File containing probe_id (1-based) of
                          probes on chrX. Used for copy number probe
                          chrX/Y ratio gender calling. [default '']
     --chrY-probes        File containing probe_id (1-based) of
                          probes on chrY. Used for copy number probe
                          chrX/Y ratio gender calling. [default '']
     --reference-input    Reference file with cluster prior
                          information. [default '']
   -s, --probeset-ids     Tab delimited file with column
                          'probeset_id' specifying probesets to
                          analyze. [default '']
     --probeset-ids-reported Tab delimited file with column
                          'probeset_id' specifying probesets to
                          report. This should be a subset of those
                          specified with --probeset-ids if that
                          option is used. [default '']
     --chip-type          Chip types to check library and CEL files
                          against. Can be specified multiple times.
                          The first one is propagated as the chip
                          type in the output files. Warning, use of
                          this option will override the usual check
                          between chip types found in the library
                          files and cel files. You should use this
                          option instead of --force when possible.
                          [default '']
     --region-model       Regions model parameters. [default '']
     --probeset-model     Probeset model parameters. [default '']
     --cn-region-gt-probeset-file Tab delimited file mapping probeset ids to
                          copynumber regions. [default '']
 Output Options
     --cc-chp-output      Output resulting calls in directory called
                          'chp' under out-dir. This makes one AGCC
                          Multi Data CHP file per cel file analyzed.
                          [default 'false']
     --reference-output   File to write reference values to.
                          Specifying this option will turn on dynamic
                          clustering. WARNING: Currently the
                          resulting reference file is not really
                          usable as a reference. See the manual for
                          more info. [default '']
     --batch-name         The name of the batch for the dynamic
                          cluster analysis. [default '']
 Analysis Options
     --set-analysis-name  Explicitly set the analysis name. This
                          affects output file names (ie prefix) and
                          various meta info. [default 'dmet']
     --ps-analysis        Explicitly set the ProbesetSummarizeEngine
                          analysis string. [default '']
     --gt-analysis        Explicitly set the ProbesetGenotypeEngine
                          analysis string. [default '']
     --gt-qmethod-spec    Explicitly set the ProbesetGenotypeEngine
                          quant spec. [default '']
     --sample-type        Set the type of samples being processed. eg
                          genomic, plasmid. [default 'unknown']
     --batch-info         Indicates whether or not information about
                          other cel files in the batch should be
                          reported in CHP headers. [default 'false']
     --null-context       Indicates whether or not context info
                          should be populated in the CHP files.
                          [default 'true']
     --run-cn-engine      Indicates if the CN engine should be run or
                          not. [default 'true']
     --pra-thresh         The threshold for calling PRAs based on the
                          cluster mean strength. [default '3']
     --geno-call-thresh   The confidence threshold for reporting
                          calls in the CHP file. [default '0.1']
 Gender Options
     --female-thresh      Threshold for calling females when using
                          cn-probe-chrXY-ratio method. [default
                          '0.17']
     --male-thresh        Threshold for calling males when using
                          cn-probe-chrXY-ratio method. [default
                          '0.68']
 Advanced Options
     --call-coder-max-alleles For encoding/decoding calls, the max number
                          of alleles per marker to allow. [default
                          '6']
     --call-coder-type    The data size used to encode the call.
                          [default 'UCHAR']
     --call-coder-version The version of the encoder/decoder to use
                          [default '1.0']
 Execution Control Options
     --use-disk           Store CEL intensities to be analyzed on
                          disk. [default 'true']
     --disk-cache         Size of intensity memory cache in millions
                          of intensities (when --use-disk=true).
                          [default '50']
 Engine Options (Not used on command line)
     --cels               Cel files to process. [default '']
     --report             Probesets to report. eg consented. [default
                          '']
     --time-start         The time the engine run was started
                          [default '']
     --time-end           The time the engine run ended [default '']
     --time-run-minutes   The run time in minutes. [default '']
     --analysis-guid      The GUID for the analysis run. [default '']
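
The --probeset-ids-reported description says the reported probesets should be a subset of those given with --probeset-ids. A minimal sketch of checking this before a run is shown below, assuming both files are tab delimited with a 'probeset_id' header column as described. The file names analyzed.txt and reported.txt and the probeset IDs are made up for illustration.

```shell
# Hypothetical probeset IDs and file names, for illustration only.
printf 'probeset_id\nPS_0001\nPS_0002\nPS_0003\n' > analyzed.txt
printf 'probeset_id\nPS_0001\nPS_0003\n' > reported.txt

# Collect IDs that appear in reported.txt but not in analyzed.txt.
# The 'probeset_id' column is located by name in each file's header line.
missing=$(awk -F '\t' '
    FNR == 1 { for (i = 1; i <= NF; i++) if ($i == "probeset_id") col = i; next }
    NR == FNR { seen[$col]; next }
    !($col in seen) { print $col }
' analyzed.txt reported.txt)

if [ -z "$missing" ]; then
    echo "reported probesets are a subset of analyzed probesets"
else
    echo "not in --probeset-ids file: $missing" >&2
fi
```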

Dynamic vs Static Clustering

apt-dmet-genotype supports both static and dynamic clustering. With dynamic clustering, the genotype calling algorithm updates the priors (cluster centers, variances, etc.) before making genotype calls. With static clustering, the priors are not updated and genotype calls are based on the original input.

Dynamic clustering is enabled by the --reference-output option. Without this option, static clustering is used.

When using --reference-output, you must also provide a batch name using the --batch-name option.

WARNING: The resulting reference file when using --reference-output is not currently suitable for use with the --reference-input option. This may change in a later release.

WARNING: You should always provide an input reference file, regardless of whether you are doing dynamic or static clustering.
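
The rules above can be sketched as a small pre-flight check in a wrapper script. The function name check_clustering_args is hypothetical and not part of APT, and the sketch only recognizes the --option=value form used in the Quick Start examples.

```shell
# check_clustering_args ARGS...
# Hypothetical helper: reports the clustering mode implied by the options,
# or fails if the combination breaks the rules stated in this section.
check_clustering_args() {
    ref_in=no; ref_out=no; batch=no
    for arg in "$@"; do
        case "$arg" in
            --reference-input=?*)  ref_in=yes ;;
            --reference-output=?*) ref_out=yes ;;
            --batch-name=?*)       batch=yes ;;
        esac
    done
    if [ "$ref_in" = no ]; then
        echo "error: --reference-input is required" >&2
        return 1
    fi
    if [ "$ref_out" = yes ] && [ "$batch" = no ]; then
        echo "error: --reference-output requires --batch-name" >&2
        return 1
    fi
    if [ "$ref_out" = yes ]; then
        echo "dynamic clustering"
    else
        echo "static clustering"
    fi
}

check_clustering_args --reference-input=DMET_Plus.v1.genomic.ref.a5
check_clustering_args --reference-input=DMET_Plus.v1.genomic.ref.a5 \
    --reference-output=foo.a5 --batch-name=cluster-run-1
```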

Sub-Engine Parameter Specification

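apt-dmet-genotype allows users to pass options directly into the sub-engines it calls. These sub-engines are, in the order their options are given on the command line: ProbesetSummarizeEngine, DmetCopyNumberEngine, ProbesetGenotypeEngine, and DmetCHPWriter.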

One can specify these parameters on the command line by separating them from other parameters with "--". For example


  apt-dmet-genotype --cdf-file=....  ...  \
     -- [ProbesetSummarizeEngine options] -- [DmetCopyNumberEngine options] \
     -- [ProbesetGenotypeEngine options] -- [DmetCHPWriter options]

See apt-engine-wrapper for more information about the options available for each engine. For example, adding the following to the end of your apt-dmet-genotype command line:

   -- --summaries=true --feat-effects -- --text-output -- --table-output --feat-effects --summaries

will result in additional text output from the first three analysis engines.
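
To make the grouping explicit, the sketch below splits an argument list on "--" and prints each option next to the engine it would be routed to. The positional mapping is inferred from the usage line in this section rather than taken from the program itself, and show_groups is a hypothetical helper, not an APT command.

```shell
# Engine order as given in the usage line of this section.
engines="ProbesetSummarizeEngine DmetCopyNumberEngine ProbesetGenotypeEngine DmetCHPWriter"

# show_groups ARGS...: print each option with the name of its "--" group.
show_groups() {
    group=0
    for arg in "$@"; do
        if [ "$arg" = "--" ]; then
            group=$((group + 1))
            continue
        fi
        if [ "$group" -eq 0 ]; then
            # Options before the first "--" belong to the main program.
            name="apt-dmet-genotype"
        else
            name=$(echo "$engines" | cut -d ' ' -f "$group")
        fi
        echo "$name: $arg"
    done
}

show_groups --out-dir=results \
    -- --summaries=true --feat-effects -- --text-output \
    -- --table-output --feat-effects --summaries
```

With the example options, the first group is attributed to ProbesetSummarizeEngine, the second to DmetCopyNumberEngine, and the third to ProbesetGenotypeEngine; no options reach DmetCHPWriter because there is no fourth group.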