VIGNETTES: Genotype clustering for Axiom™ Arrays

Date: 2009-10-12

Introduction

The Axiom™ SNP-genotype clustering technology uses pre-defined, SNP-specific clusters that are then dynamically adapted to the observed data. Each marker has a specific starting cluster whose properties have been learned from training data. By default, the clustering algorithm adapts these clusters to the observed data and then makes the genotype call. This section describes how to use the Axiom™ clustering algorithm (Axiom™ GT1) with a set of high-quality samples.

When genotyping with Axiom™ GT1, both the pre-defined SNP-specific cluster data and the observed data contribute to the final posterior cluster positions and thus to the genotype call. Because the algorithm is Bayesian, the observed data receive more weight as the number of samples grows.
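The shift in weight from prior to data can be illustrated with a generic conjugate-normal update. This is only a sketch of the general Bayesian idea, not the actual Axiom GT1 model: the helper name posterior_center, the prior weight of 10 pseudo-observations, and the cluster centers 0.0 and 1.0 are all made up for the example.

```shell
# Illustration only: a generic conjugate-normal update, NOT the AxiomGT1 model.
# posterior_center PRIOR_CENTER PRIOR_WEIGHT OBSERVED_MEAN N_SAMPLES
# The prior counts as PRIOR_WEIGHT pseudo-observations; the data count as N_SAMPLES.
posterior_center() {
  awk -v m0="$1" -v k="$2" -v xbar="$3" -v n="$4" \
    'BEGIN { printf "%.3f\n", (k * m0 + n * xbar) / (k + n) }'
}

# Hypothetical numbers: prior center 0.0 (weight 10), observed mean 1.0.
for n in 1 10 100 1000; do
  printf 'n=%s posterior center=%s\n' "$n" "$(posterior_center 0.0 10 1.0 "$n")"
done
```

With these made-up numbers the posterior center moves from close to the prior (n=1) to close to the observed mean (n=1000), which is the qualitative behavior described above.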

Because information from all samples is used to update the SNP cluster models, including poor-quality experiments in the clustering process can degrade the results. Samples should first be assessed using the single-sample analysis method described in the vignette on Axiom™ Single-Sample QC Analysis. After excluding samples that fail DQC, perform a first round of clustering as described in this vignette. If any of the clustered samples have a call rate below 97%, exclude them and perform a second, final round of clustering.
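The sample list for the second round can be derived from the first-round results. The following is a minimal sketch, assuming the first round produced a tab-separated report with columns named cel_files and call_rate (call rate in percent), preceded by "#"-prefixed metadata lines. The helper name passing_cels and the sample values are hypothetical.

```shell
# Hypothetical helper: read a report on stdin and print a CEL list (header
# "cel_files") of the samples whose call rate is at least $1 percent.
# Assumes tab-separated columns named cel_files and call_rate.
passing_cels() {
  awk -F '\t' -v min="$1" '
    /^#/ { next }                        # skip metadata lines
    !seen_header {                       # first remaining line is the header
      for (i = 1; i <= NF; i++) {
        if ($i == "cel_files") cel = i
        if ($i == "call_rate") rate = i
      }
      if (!cel || !rate) { print "expected columns not found" > "/dev/stderr"; exit 1 }
      seen_header = 1
      print "cel_files"
      next
    }
    ($rate + 0) >= (min + 0) { print $cel }   # keep samples at or above the threshold
  '
}

# Example with made-up values: only a.CEL passes the 97% threshold.
printf '#%%meta=example\ncel_files\tcall_rate\na.CEL\t99.2\nb.CEL\t95.1\n' | passing_cels 97
```

In practice the function would read the first-round report file and its output would be redirected to a new list file, which is then passed to --cel-files for the final round.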

It is also important to consider the number and type of samples used when genotyping. For example, even if a genotype is never observed in the samples analyzed, Axiom™ GT1 will still estimate the location of the unobserved genotype cluster. Moreover, Axiom™ GT1 tends to be more successful in clustering SNPs for which all three clusters are observed. With these aspects in mind, it is generally best to use as large and diverse a collection of high-quality samples as possible, in particular to optimize performance on rare alleles.

This vignette provides an overview of genotype clustering for Axiom™ arrays using apt-probeset-genotype. For a more complete description of apt-probeset-genotype, refer to the apt-probeset-genotype manual.

All library files referenced in this vignette can be found at the Axiom™ product page.

Running a Clustering Analysis

Axiom™ GT1 is designed to cluster by dynamically adjusting SNP-specific models to the observed data. This default behavior can be obtained by using apt-probeset-genotype in the following manner:

  apt-probeset-genotype \
    --analysis-files-path ${AXIOM_LIB_PATH} \
    --xml-file Axiom_GW_Hu_SNP.r2.apt-probeset-genotype.AxiomGT1.xml \
    --out-dir out \
    --cel-files cel_file_list.txt

The above command illustrates a typical use of an input parameter file via the --xml-file option. The Axiom™ GT1 clustering analysis has many options and is highly configurable. To simplify keeping track of these settings across multiple runs and data sets, support for XML input parameter files was introduced in APT-1.12.0. These files can control all the settings of the APT tools and gather all salient input parameters in one place, thereby shortening and clarifying command lines.

The --analysis-files-path option specifies the folder containing the XML file and the other library files referenced therein. For example, if the Axiom™ library files are stored in /mypath/, then the command line would be:

  apt-probeset-genotype \
    --analysis-files-path /mypath/ \
    --xml-file Axiom_GW_Hu_SNP.r2.apt-probeset-genotype.AxiomGT1.xml \
    --out-dir out \
    --cel-files cel_file_list.txt
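Both commands read the CEL file paths from cel_file_list.txt. A minimal sketch for generating such a list follows, assuming the usual APT list-file layout of a cel_files header line followed by one path per line. The helper name make_cel_list and the directory path are hypothetical.

```shell
# Hypothetical helper: print a CEL list for every *.CEL file under directory $1.
make_cel_list() {
  printf 'cel_files\n'                      # header line of the list file
  find "$1" -type f -name '*.CEL' | sort    # one CEL path per line, sorted
}

# Usage (the directory name is a placeholder):
# make_cel_list /path/to/cel_dir > cel_file_list.txt
```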

The XML Input Parameter file

All of the genotyping controls in this vignette are stored in an XML input parameter file distributed as part of the library file package available on the Axiom™ product page. It is recommended that you use the official XML file included in the library file package rather than creating your own from the example below.

<?xml version="1.0"?>
<ParameterSet subgroupType="Analysis" subgroupName="Axiom_GW_Hu_SNP.apt-probeset-genotype.AxiomGT1" executableName="apt-probeset-genotype">
    <Parameters>
        <Parameter name="set-analysis-name"  currentValue="AxiomGT1" />
        <Parameter name="chip-type"          currentValue="Axiom_GW_Hu_SNP" />
        <Parameter name="chip-type"          currentValue="Axiom_GW_Hu_SNP.r2" />
        <Parameter name="analysis"           currentValue="..." />
        <Parameter name="qmethod-spec"       currentValue="med-polish.expon=true" />
        <Parameter name="read-models-brlmmp" currentValue="Axiom_GW_Hu_SNP.r2.AxiomGT1.models" />
        <Parameter name="use-feat-eff"       currentValue="Axiom_GW_Hu_SNP.r2.AxiomGT1.feature-effects" />
        <Parameter name="cdf-file"           currentValue="Axiom_GW_Hu_SNP.r2.cdf" />
        <Parameter name="special-snps"       currentValue="Axiom_GW_Hu_SNP.r2.specialSNPs" />
        <Parameter name="chrX-probes"        currentValue="Axiom_GW_Hu_SNP.r2.chrXprobes" />
        <Parameter name="chrY-probes"        currentValue="Axiom_GW_Hu_SNP.r2.chrYprobes" />
        <Parameter name="target-sketch"      currentValue="Axiom_GW_Hu_SNP.r2.AxiomGT1.sketch" />
        <Parameter name="set-gender-method"  currentValue="cn-probe-chrXY-ratio" />
        <Parameter name="em-gender"          currentValue="false" />
        <Parameter name="female-thresh"      currentValue="0.54" />
        <Parameter name="male-thresh"        currentValue="1.0" />
    </Parameters>
</ParameterSet>

Here the value of the analysis option is the following string with the spaces removed (the spaces appear only for readability on this page):

  artifact-reduction. ResType=2. Clip=0.4. Close=2. Open=2. Fringe=4. CC=2, quant-norm. target=1000. sketch=50000, pm-only, brlmm-p. CM=1. bins=100. mix=1. bic=2. lambda=1.0. HARD=3. SB=0.75. transform=MVA. copyqc=0.00000. wobble=0.05. MS=0.15. copytype=-1. clustertype=2. ocean=0.00001. CSepPen=0.1. CSepThr=4
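The value of the analysis option is printed above with spaces that must be removed before use. The sketch below strips the spaces from the first part of that string and then prints one comma-separated segment per line. Reading the commas as separators between analysis stages is an assumption about the format, and the variable names are illustrative.

```shell
# First part of the analysis value as printed in this vignette (spaces included).
printed='artifact-reduction. ResType=2. Clip=0.4. Close=2. Open=2. Fringe=4. CC=2, quant-norm. target=1000. sketch=50000, pm-only'

# Remove every space to obtain the compact form.
compact=$(printf '%s' "$printed" | tr -d ' ')
printf '%s\n' "$compact"

# Print one comma-separated segment per line (assumed to be one stage each).
printf '%s\n' "$compact" | tr ',' '\n'
```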

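The input file specifies a range of genotyping controls, such as the analysis name (set-analysis-name), the accepted chip types (chip-type), the analysis chain (analysis), the library files to read, and the gender-calling settings.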

The output report

The above command genotypes the Axiom™ data for the CEL files listed in cel_file_list.txt. It generates three output files in TSV format.

Genotyping with Signature SNPs

Signature SNPs provide a fast and convenient method to generate a genotype "fingerprint" with enough information content to reliably confirm sample identity. This can be useful for sample tracking in laboratory information systems. The signature SNPs are a set of 83 to 116 SNPs, depending on the array, that have consistently high heterozygosity in multiple populations and are spread throughout the genome. In this procedure, each sample is genotyped individually in single-sample mode for the signature SNPs, so that the calls made for any one CEL file are not influenced in any way by the other CEL files in the batch.
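Once fingerprints are available, two samples can be compared by counting the signature SNPs on which their calls agree. The sketch below is one way to do this. It assumes a tab-separated calls table with a probeset_id column followed by one column per CEL file, and integer call codes in which -1 means no call. The helper name fingerprint_concordance and the sample data are hypothetical.

```shell
# Hypothetical helper: compare two sample columns of a calls table read on stdin.
# fingerprint_concordance COLUMN_A COLUMN_B
# Assumes tab-separated data, a header row naming the columns, and -1 = no call.
fingerprint_concordance() {
  awk -F '\t' -v a="$1" -v b="$2" '
    /^#/ { next }                          # skip metadata lines
    !seen_header {                         # locate the two sample columns
      for (i = 2; i <= NF; i++) {
        if ($i == a) ca = i
        if ($i == b) cb = i
      }
      if (!ca || !cb) { print "sample column not found" > "/dev/stderr"; exit 1 }
      seen_header = 1
      next
    }
    $ca != -1 && $cb != -1 {               # ignore SNPs with a no-call in either sample
      total++
      if ($ca == $cb) same++
    }
    END { if (total) printf "%d of %d called SNPs agree\n", same, total }
  '
}

# Example with made-up calls: SNP1 agrees, SNP2 differs, SNP3 has a no-call.
printf 'probeset_id\ts1.CEL\ts2.CEL\nSNP1\t0\t0\nSNP2\t1\t2\nSNP3\t-1\t1\n' |
  fingerprint_concordance s1.CEL s2.CEL
```

The agreement level needed to declare two samples identical is not specified in this vignette and has to be chosen by the user.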

Use the following command to run signature SNP genotyping:

  apt-probeset-genotype \
    --analysis-files-path ${AXIOM_LIB_PATH} \
    --xml-file Axiom_GW_Hu_SNP.r2.apt-probeset-genotype.AxiomSS1.xml \
    --out-dir out \
    --cel-files cel_file_list.txt

The command is similar to the basic genotyping run but uses a different XML input parameter file (Axiom_GW_Hu_SNP.r2.apt-probeset-genotype.AxiomSS1.xml), which is distributed as part of the library file package available on the Axiom™ product page.

Affymetrix Power Tools (APT) Release 1.19.0